# ⭐Altair⭐ Walkthrough 

Michael Colaresi

# Plotting a Course 📍

There are a number of useful plotting libraries in Python. 

0- matplotlib is probably the most used. It is covered in PH4MJ. The pyplot submodule is the most popular way of interacting with matplotlib. Matplotlib is extensive and has a great deal of documentation, see [here](https://matplotlib.org).

1- seaborn is a nice module that sits on top of matplotlib and enables nicer looking plots more quickly than matplotlib. See [here](http://seaborn.pydata.org).

2- plotnine is useful if you come from an R and ggplot2 world. Information [here](https://plotnine.readthedocs.io/en/latest/).

3- plotly is available for python, see [here](https://plotly.com/python/).

4- Bokeh is an exciting project to keep an eye on for interactive graphics, see the docs [here](https://bokeh.org).


We will focus on Altair (project webpage is [here](https://altair-viz.github.io)) for a few reasons. 

a) It is built on top of Vega and Vega-lite (more information [here](https://vega.github.io/vega/) and [here](https://vega.github.io/vega-lite/) on those).

b) Wait, why is (a) important? Because Vega and Vega-lite include a visual grammar for modern graphics. Where a static grammar of graphics powers ggplot2 and plotnine, we need different verbs and mechanisms for interactive web graphics. Vega and Vega-lite provide these.

c) Altair is declarative, which means that you declare **what** you want it to produce (which columns map to which visual channels) and it largely figures out how to do it. This is in contrast to being imperative, where you spell out **how** to draw something step by step.

d) Vega and Vega-lite, and thus Altair, produce visualizations that are stored as json objects. These look like yaml as we talked about. Therefore the underlying structure of the plot is separate from the rendering of the plot. This is crucial for interactive graphics in particular because there is not just "one" render of the information, but a structure.

e) Altair is a nice middle ground between developing a full interactive web-application for every graphic (if you want to do that check out Dash and Flask), being stuck with the plotly api (which can be limiting), or missing the opportunities for interactivity altogether.

I would add that I believe these benefits, not specifically for Altair, are crucial for successful computational social science. Graphics need to convey layers of context and patterns. The careful crafting of interactions, such as tooltips and brushing or the linking together of plots, can convey much more insight than layers that must always be present on top of each other (as in a static plot).
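To make point (d) concrete, here is a minimal sketch of what a plot stored as a json object can look like. The dictionary below is hand-written for illustration (the two data values are made up) and only uses Python's standard `json` module; it is not output produced by Altair.

```python
import json

# Hand-written, Vega-Lite-style spec: data, mark, and encoding are plain nested dicts.
spec = {
    "data": {"values": [{"precipitation": 0.0}, {"precipitation": 10.9}]},
    "mark": "tick",
    "encoding": {"x": {"field": "precipitation", "type": "quantitative"}},
}

text = json.dumps(spec, indent=2)  # the structure as json text
print(text)

round_trip = json.loads(text)      # and back to Python objects, unchanged
```

Nothing in this structure says how to draw pixels; a renderer decides that later, which is why the same structure can support more than one view.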


# Installation ⚙️ 

Make sure you have `altair` and `vega_datasets` installed. There are directions [here](https://altair-viz.github.io/getting_started/installation.html).

# Import modules 📦 

Let's set up our dependencies. Remember, the following step `import`s code and object types into the memory/imagination of our computer so we can use them.

We will use the `alt` alias for `altair` and import the `data` object directly into our project. This way we do not have to type `vega_datasets.data` every time.

We are going to grab the data on Seattle weather from `vega_datasets`; this is a common place to start for Vega projects.

The cost of the direct import is that the `data` name in the global namespace is now taken.

In [2]:
import altair as alt # use altair, alias as alt
from vega_datasets import data # directly import vega_datasets.data 
import inspect

We can see a little bit about this object by peeking into it with the `getdoc` function from the `inspect` module. This will get the docstring for the object provided as an argument.

In [3]:
print(inspect.getdoc(data.seattle_weather))

Loader for the seattle-weather dataset.

This dataset contains precipitation totals, temperature extremes, wind
speed, and weather type recorded daily in Seattle from 2012 to
2015. The dataset is drawn from public-domain `NOAA data
<https://www.weather.gov/disclaimer>`_, and transformed using
scripts available at http://github.com/vega/vega_datasets/.

This dataset is bundled with vega_datasets; it can be loaded without web access.
Dataset source: https://vega.github.io/vega-datasets/data/seattle-weather.csv

Usage
-----

    >>> from vega_datasets import data
    >>> seattle_weather = data.seattle_weather()
    >>> type(seattle_weather)
    <class 'pandas.core.frame.DataFrame'>

Equivalently, you can use

    >>> seattle_weather = data('seattle-weather')

To get the raw dataset rather than the dataframe, use

    >>> data_bytes = data.seattle_weather.raw()
    >>> type(data_bytes)
    bytes

To find the dataset url, use

    >>> data.seattle_weather.url
    'https://vega.github.io/veg

That is all good to know. In particular we now know what the data is about, as well as the fact that running the loader will produce a pd.DataFrame object for us from the csv.

Now let's load it.

In [4]:
df = data.seattle_weather()
type(df)

pandas.core.frame.DataFrame

Here we have run the loader with `data.seattle_weather()`; this is like instantiating the class. In this case `seattle_weather` has inherited from a more general class of objects that are defined as `Dataset`. You can see the attributes described at the bottom of the docstring a few cells up.

We then checked the type with `type(df)` to make sure we got what we expected. Let's look a little more closely using the `.head()` method for pd.DataFrames.

In [5]:
df.head()

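|   | date | precipitation | temp_max | temp_min | wind | weather |
|---|------|---------------|----------|----------|------|---------|
| 0 | 2012-01-01 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle |
| 1 | 2012-01-02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
| 2 | 2012-01-03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
| 3 | 2012-01-04 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
| 4 | 2012-01-05 | 1.3 | 8.9 | 2.8 | 6.1 | rain |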


## Our first simple charts 📈 

We use the functions in Altair to build up a declarative statement about what we want to visualize.

0) So we will start things by using our **Altair** alias, `alt`. 

1) We will then have to reach into the `alt` code base, so we will need to add a dot `alt.`

2) To instantiate a **chart**, we use `alt.Chart()`

3) The **Chart** object takes arguments, the first of which is the data that will be charted, so we will provide that `alt.Chart(df)`

4) A Chart is just a blank (potentially interactive) canvas; we need to make **marks** on the canvas to represent information. There are many types of `mark` we can make. One type is `mark_tick`, a method of the `alt.Chart(df)` instance, so we will type `alt.Chart(df).mark_tick()`

5) Marks **encode** data in visual form, so we need to tell `mark_tick()` what to encode. An encoding is like a map from a visual feature (like the x-axis position) to data (usually a column of scalar values). We can do this with the `.encode()` method. In our case, let's encode precipitation to x, with: `alt.Chart(df).mark_tick().encode(x = 'precipitation')`


Let's see what that looks like:


In [8]:
alt.Chart(df).mark_tick().encode(
    x = 'precipitation'
)

We can look at the frequency of values within `bins` by making three changes to this code.

Instead of marks being `ticks`, let's make bars. Bars will represent bins on the x-axis (so ranges of precipitation) and we will add an encoding for y, where the height of the bar will be encoded to the frequency of the values within the range/bin for the bar.

We will do this:

0) Again, we start with `alt.Chart(df)`

1) Now we switch to `.mark_bar`, to get `alt.Chart(df).mark_bar()` 

2) We again `.encode`, but this time we are going to have an encoding for the x and y values for each bar; `alt.Chart(df).mark_bar().encode()` 

3) We will use the special function `alt.X` to bin the x-values and the **data transformation** `count()` to count the values in each bin:

```python
alt.Chart(df).mark_bar().encode(
    alt.X('precipitation', bin=True),
    y = 'count()'
)
```

In [9]:
alt.Chart(df).mark_bar().encode(
    alt.X('precipitation', bin=True),
    y = 'count()'
)

This is a common combination, where **transformations** are applied while encoding visual elements, like `y` to data.

The `alt.X` and `alt.Y` calls simply set up the encodings for their channels.

In [116]:
?alt.X

Init signature:
alt.X(
    shorthand=Undefined,
    aggregate=Undefined,
    axis=Undefined,
    band=Undefined,
    bin=Undefined,
    field=Undefined,
    impute=Undefined,
    scale=Undefined,
    sort=Undefined,
    stack=Undefined,
    timeUnit=Undefined,

This means we can make the same plot above that we produced with:

```python
alt.Chart(df).mark_bar().encode(
    alt.X('precipitation', bin=True),
    y = 'count()'
)
```

by using `alt.Y` too:

```python
alt.Chart(df).mark_bar().encode(
    alt.X('precipitation', bin=True),
    alt.Y('count()')
)
```

We are just setting the encoding for that axis.

In [10]:
alt.Chart(df).mark_bar().encode(
    alt.X('precipitation', bin=True),
    alt.Y('count()')
)

We can make a horizontal bar chart just by flipping the visual encoding so that the precipitation bins are now visually encoded to y instead of x, and the count is now on x.

In [11]:
alt.Chart(df).mark_bar().encode(
    alt.Y('precipitation', bin=True),
    alt.X('count()')
)

# Transformations and Aggregations 🚛

If we want to plot the percentage of the data instead of the counts, we can aggregate and transform the data in the Chart object.

0) set up the Chart and the data, `alt.Chart(df)`

1) We need the total number of records, so we create that as an **aggregate** and then **join** that aggregate to our data object, with `transform_joinaggregate`:

```python

alt.Chart(df).transform_joinaggregate(
    TotalRecords = 'count(*)',
)
```

The `'count(*)'` says count all the records. This is an aggregation/summary of the data.

There is more information on this type of summarization [here](https://vega.github.io/vega/docs/transforms/joinaggregate/).

We can then create a new value for each record, which we will call `pct`. Each record contributes `1 / TotalRecords`, so summing `pct` within a bin gives that bin's share of the total.


```python
alt.Chart(df).transform_joinaggregate(
    TotalRecords = 'count(*)',
).transform_calculate(
    pct = '1 / datum.TotalRecords'
)

```  

That is all we do to the underlying data, now we just plot almost as before. 


```python
alt.Chart(df).transform_joinaggregate(
    TotalRecords = 'count(*)',
).transform_calculate(
    pct = '1 / datum.TotalRecords'
).mark_bar().encode(
    alt.Y('precipitation:Q', bin=True),
    alt.X('sum(pct):Q')
)

```  

The only difference here is that I have added a specific annotation `:Q` telling Altair that the columns are quantitative, since it was not obvious from the values.

Finally, I will make the percentage axis a little bit prettier by formatting it as a percent.

```python
alt.Chart(df).transform_joinaggregate(
    TotalRecords = 'count(*)',
).transform_calculate(
    pct = '1 / datum.TotalRecords'
).mark_bar().encode(
    alt.Y('precipitation:Q', bin=True),
    alt.X('sum(pct):Q', axis=alt.Axis(format="%"))
)

``` 




In [12]:
alt.Chart(df).transform_joinaggregate(
    TotalRecords = 'count(*)',
).transform_calculate(
    pct = '1 / datum.TotalRecords'
).mark_bar().encode(
    alt.Y('precipitation:Q', bin=True),
    alt.X('sum(pct):Q', axis=alt.Axis(format='%'))
)

Notice that this transformation could also be done in the original data frame:

In [13]:
df["pct"] = 1 / df.shape[0] #df.shape returns the shape attribute for the DataFrame, rows by columns
alt.Chart(df).mark_bar().encode(
    alt.Y('precipitation:Q', bin=True),
    alt.X('sum(pct):Q', axis=alt.Axis(format='%'))
)

In [14]:
df.shape

(1461, 7)
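The arithmetic behind `sum(pct)` can be checked by hand. The sketch below uses plain Python on made-up precipitation values (the real frame has 1461 rows) and a bin width of 10 chosen only for illustration; it mimics the steps rather than calling Altair.

```python
from collections import Counter

# Made-up precipitation values, for illustration only.
precipitation = [0.0, 0.8, 1.3, 4.9, 10.9, 12.0, 20.3, 25.1]

total_records = len(precipitation)         # the count(*) that joinaggregate attaches to every row
pct = [1 / total_records] * total_records  # the per-row value from transform_calculate

step = 10  # illustrative bin width
share_by_bin = Counter()
for value, weight in zip(precipitation, pct):
    bin_start = int(value // step) * step
    share_by_bin[bin_start] += weight      # sum(pct) within each bin

for bin_start in sorted(share_by_bin):
    print(f"[{bin_start}, {bin_start + step}): {share_by_bin[bin_start]:.1%}")
```

Because every row carries the same weight, the bin shares always add up to 100%.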

# Titles 🤴 

We can also add and change titles to communicate more clearly what is being plotted:

In [15]:
df["pct"] = 1 / df.shape[0] #df.shape returns the shape attribute for the DataFrame, rows by columns
alt.Chart(df).mark_bar().encode(
    alt.Y('precipitation:Q', bin=True, title='Precipitation(cm)'),
    alt.X('sum(pct):Q', axis=alt.Axis(format='%'), title='Percentage of total (%)')
)

# Time ⏳

Time can be tricky, in terms of dates and plotting. When you have observations over time, you should make sure the intervals are regular and meaningful. It is usually a good idea to encode the information as a date/time.

Altair has some time-handling capabilities since it leverages Vega.

Let's look at average precipitation by month.



0) We will mark the information with a `line`, so we start with 

```python
alt.Chart(df).mark_line().encode(
...
)
```

1) We will use the `month()` transform, and tell Altair this is a temporal unit (`:T`), this pulls out the month from a properly formatted date as we have in the df.`date` field. Formally, we are using timeUnit binning as discussed [here](https://altair-viz.github.io/user_guide/transform/timeunit.html#user-guide-timeunit-transform). We will encode month to x.

```python
alt.Chart(df).mark_line().encode(
    x='month(date):T',
    ...
)
```

2) Then we will average precipitation and encode that on y, telling Altair this is quantitative.

```python
alt.Chart(df).mark_line().encode(
    x='month(date):T',
    y='average(precipitation):Q'
)
```

Let's take a look:


In [16]:
alt.Chart(df).mark_line().encode(
    x='month(date):T',
    y='average(precipitation):Q'
)

We can look at these cycles annually by binning observations not by month but by year and month.

We do this by simply replacing `month()` with `yearmonth()` as the data we are encoding to x.

In [124]:
alt.Chart(df).mark_line().encode(
    x='yearmonth(date):T',
    y='average(precipitation):Q'
)
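The difference between the two time units is only in how rows are grouped before averaging. As a rough analogy in plain Python (made-up readings, and a hypothetical helper `average_by`, neither of which comes from Altair):

```python
from collections import defaultdict
from datetime import date

# Made-up daily readings spanning two years.
readings = [
    (date(2012, 1, 1), 0.0),
    (date(2012, 1, 2), 10.0),
    (date(2012, 2, 1), 4.0),
    (date(2013, 1, 1), 2.0),
    (date(2013, 2, 1), 8.0),
]

def average_by(key):
    """Group readings by key(day) and average the precipitation in each group."""
    groups = defaultdict(list)
    for day, rain in readings:
        groups[key(day)].append(rain)
    return {k: sum(v) / len(v) for k, v in groups.items()}

by_month = average_by(lambda d: d.month)                # analogous to month(date)
by_yearmonth = average_by(lambda d: (d.year, d.month))  # analogous to yearmonth(date)

print(by_month)
print(by_yearmonth)
```

With `month`, every January lands in the same group regardless of year; with `yearmonth`, each January stays separate, which is what lets the line show year-to-year cycles.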

# Scatter Plot 👉

Scatter plots are a basic form of plot that is often useful as background.

To see the relationship between temperature and precipitation in Seattle (it rains a lot when it is hotter), we can encode minimum temperature to x and precipitation to y.

In [17]:
alt.Chart(df).mark_point().encode(
    x='temp_min:Q',
    y='precipitation:Q'
)

# Combining Layers 🍰 

It is often useful to be able to mark both the raw data and the average, so the raw data provide context for the aggregations.

In Altair we can layer marks on top of each other.

Let's go back to our average precipitation over time plot and add the raw data as points.


Instead of simply displaying the plots, let's save them with names:

```python
avgPlot = alt.Chart(df).mark_line().encode(
    x='yearmonth(date):T',
    y='average(precipitation):Q'
)
```

If we were only going to use the raw points, and save the plotting object, we would:

```python
rawDataPlot = alt.Chart(df).mark_point().encode(
    x='date:T',
    y='precipitation:Q'
)
```


Now we need to tell Altair that we want layers and which order we want the layers. 

The easiest way is with `+`, adding layers:

```python
rawDataPlot + avgPlot
```



In [18]:
rawDataPlot = alt.Chart(df).mark_point().encode(
    x='date:T',
    y='precipitation:Q'
)


avgPlot = alt.Chart(df).mark_line().encode(
    x='yearmonth(date):T',
    y='average(precipitation):Q'
)

rawDataPlot + avgPlot

I think the raw data is too pronounced so I am going to set the opacity for the points to a lower percentage. This will make them lighter.

In [19]:
rawDataPlot = alt.Chart(df).mark_point(opacity=.3).encode(
    x='date:T',
    y='precipitation:Q'
)


avgPlot = alt.Chart(df).mark_line().encode(
    x='yearmonth(date):T',
    y='average(precipitation):Q'
)

rawDataPlot + avgPlot

We can do the same thing by directly calling `alt.layer`:

In [128]:
alt.layer(
    rawDataPlot,
    avgPlot
)

# Concatenating Plots 🔗

## Horizontal

It is often useful to communicate information about multiple related plots together. We can do this by concatenating plots.

Let's create a horizontal histogram for precipitation, like what we started with, but we will add the `.properties` method to set the height and the width manually.

In [23]:
df["pct"] = 1 / df.shape[0] #df.shape returns the shape attribute for the DataFrame, rows by columns
hHist = alt.Chart(df).mark_bar().encode(
    alt.Y('precipitation:Q', bin=True),
    alt.X('sum(pct):Q', axis=alt.Axis(format='%'))
).properties(height=300, width=100)
hHist

In [24]:

mainPlot = alt.layer(
    rawDataPlot,
    avgPlot).properties(height=300, width=300)
mainPlot

Then we will horizontally concatenate them with `|`, as in `mainPlot | hHist`

In [25]:
mainPlot | hHist

This is pretty good, but the bin sizes are too big by default. Let's fix those right now.

In [27]:
hHist = alt.Chart(df).mark_bar().encode(
    alt.Y('precipitation:Q', bin={"step": 5}, title=None),
    alt.X('sum(pct):Q', axis=alt.Axis(format='%'), title="% of days")
).properties(height=300, width=100)
hHist

In [28]:
hplot =(mainPlot | hHist).resolve_scale(y='shared')
hplot

Instead of using `left | right` you can also use `alt.hconcat(left, right)`


## Vertical

We can also place a plot under or over a plot.

Let's look at temperature over time as another line graph and place it under the first graph, which I saved as `hplot`.

In [29]:
tempPlot = alt.Chart(df).mark_line().encode(
    x = 'date:T',
    y = 'temp_max:Q'
).properties(height=20, width=300)
tempPlot

In [30]:
hplot & tempPlot

Again, there is another way: instead of `upper & lower`, we could have done `alt.vconcat(upper, lower)`.

# Adding additional encodings like color 🎨 

Color, opacity, and size are often used to improve the plots. 

We used opacity above; let's add color.

Specifically, days can have qualitatively different weather, so we can encode the `df.weather` column as `color` in the raw data.

In [32]:
rawDataPlot = alt.Chart(df).mark_point(opacity=.3).encode(
    x='date:T',
    y='precipitation:Q',
    color='weather:N'
)
rawDataPlot

It would be better to control which colors are mapped to which values. We can do that with `alt.Scale`:

`scale = alt.Scale(domain=['sun', 'fog', 'drizzle', 'rain', 'snow'],
                  range=['#e7ba52', '#c7c7c7', '#aec7e8', '#1f77b4', '#9467bd'])`
                  
- domain - the values that the variable/column takes
- range - what visual values each is encoded to

So let's create a scale and then use it.

I will also explicitly set the legend title.

In [33]:
scale = alt.Scale(domain=['sun', 'fog', 'drizzle', 'rain', 'snow'],
                  range=['#e7ba52', '#c7c7c7', '#aec7e8', '#1f77b4', '#9467bd'])
rawDataPlot = alt.Chart(df).mark_point(opacity=.3).encode(
    x='date:T',
    y='precipitation:Q',
    color=alt.Color('weather', legend=alt.Legend(title='Weather type'), scale=scale),
)
rawDataPlot
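The `domain` and `range` lists are matched up by position, so the first weather type gets the first color, and so on. A quick way to double-check a mapping before plotting is to zip the two lists in plain Python (the name `color_for` is just an illustrative variable):

```python
# Same two lists that are passed to alt.Scale above.
domain = ['sun', 'fog', 'drizzle', 'rain', 'snow']
color_range = ['#e7ba52', '#c7c7c7', '#aec7e8', '#1f77b4', '#9467bd']

# Pair values with colors by position, as the scale does.
color_for = dict(zip(domain, color_range))

for weather, color in color_for.items():
    print(f"{weather:>8} -> {color}")
```

If the two lists have different lengths, `zip` silently drops the extras, so it is worth checking that `len(domain) == len(color_range)`.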

We can add color to a bar chart too. `count()` will act within, not across, color when we set that channel to a variable.

Later I am going to move the legend below the plot, so I will show that option here: `orient='bottom'`. It also takes 'left' and 'top', but 'right' is the default.

In [35]:
newLowerPlot = alt.Chart(df).mark_bar().encode(
    x=alt.X('yearmonth(date):N', title='Month of the year'),
    y='count()',
    color = alt.Color('weather', legend=alt.Legend(title='Weather type', orient="bottom"), scale=scale),
).properties(height=100, width=300)
newLowerPlot

Notice above that, because I started with named arguments for the encoding (`x=..., y=...`), I also had to give `color=alt.Color(...)` by name.


Let's put the three plots together.

We do this in three steps:

0) Create a mainPlot with rawDataPlot underneath avgPlot, creating the correct sizes.

1) horizontally concatenating mainPlot with the hHist (using `|`)

2) vertically concatenating the previous step (1) with the newLowerPlot (using &)

`resolve_scale` keeps two scales of the same type aligned when `shared` is set for a given axis.

In [36]:
mainPlot = alt.layer(
    rawDataPlot,
    avgPlot).properties(height=300, width=300)
hplot =(mainPlot | hHist).resolve_scale(y='shared')
hplot
(hplot & newLowerPlot)

# Interactive tooltips 🤩 

One introductory feature of interactivity is the tooltip: information that appears when you hover over or click on a visual artifact.

We can provide the points in our main plot with tooltips.

We simply use the `tooltip` encoding channel and map it to data, specifically a list of column names.

Tooltips appear on hover as soon as the channel is encoded. Chaining `.interactive()` onto a chart additionally lets us pan and zoom.

In [37]:
scale = alt.Scale(domain=['sun', 'fog', 'drizzle', 'rain', 'snow'],
                  range=['#e7ba52', '#c7c7c7', '#aec7e8', '#1f77b4', '#9467bd'])
rawDataPlot = alt.Chart(df).mark_point(opacity=.3).encode(
    x='date:T',
    y='precipitation:Q',
    color=alt.Color('weather', legend=alt.Legend(title='Weather type'), scale=scale),
    tooltip = ['date', 'precipitation', 'weather', 'temp_max', 'temp_min']
)
rawDataPlot

In [38]:
mainPlot = alt.layer(
    rawDataPlot,
    avgPlot).properties(height=300, width=300)
hplot =(mainPlot | hHist).resolve_scale(y='shared')
(hplot & newLowerPlot).resolve_scale(x='shared')

Let's add some color to the right plot, the one named `hHist`.

In [39]:
hHist = alt.Chart(df).mark_bar().encode(
    alt.Y('precipitation:Q', bin={"step": 5}, title=None),
    alt.X('sum(pct):Q', axis=alt.Axis(format='%'), title="% of days"),
    alt.Color('weather', legend=alt.Legend(title='Weather type'), scale=scale)
).properties(height=300, width=100)
hHist

Then putting this into the full array of plots:

In [40]:
mainPlot = alt.layer(
    rawDataPlot,
    avgPlot).properties(height=300, width=300)
hplot =(mainPlot | hHist).resolve_scale(y='shared')
(hplot & newLowerPlot).resolve_scale(x='shared')

We can add a tooltip to the bottom histogram so that a user can zoom in on that information. We just provide a `tooltip` encoding for the `newLowerPlot` (remember that is just a name, it could have been `Common`).

In [41]:
newLowerPlot = alt.Chart(df).mark_bar().encode(
    x=alt.X('yearmonth(date):N', title='Month of the year'),
    y='count()',
    color=alt.Color('weather', legend=alt.Legend(title='Weather type', orient="bottom"), scale=scale),
    tooltip=['yearmonth(date):N','weather', 'count()']
).properties(height=100, width=300)
newLowerPlot

Now we put them all together again:

In [45]:
(hplot & newLowerPlot).resolve_scale(x='shared')

# Selecting elements across plots 🌌 

Note that this only works in Altair 4 and above...

To show a more advanced example of what Altair can do, we allow a user to use the legend to select parts of the visualization to focus on by weather.

We do this by using a multiple selection tool, `selection_multi`, which means that more than one visual object can be selected at a time (by shift-clicking). We will bind the `selection_multi` to the `legend`, so that is where the user will click to change things.

Then we can use `alt.condition`, which tracks whether an object has been selected or not. We can use that condition within an encoding to **change** the encoding based on the user behavior.



In [46]:
selection = alt.selection_multi(fields=['weather'], bind='legend')


newLowerPlot = alt.Chart(df).mark_bar().encode(
    x=alt.X('yearmonth(date):N', title='Month of the year'),
    y='count()',
    color=alt.Color('weather', legend=alt.Legend(title='Weather type', orient="bottom"), scale=scale),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.1)),
    tooltip=['yearmonth(date):N','weather', 'count()']
).add_selection(
    selection
).properties(height=100, width=300)
newLowerPlot

Click on the legend to see what happens. We will add that to each plot.

In [47]:
scale = alt.Scale(domain=['sun', 'fog', 'drizzle', 'rain', 'snow'],
                  range=['#e7ba52', '#c7c7c7', '#aec7e8', '#1f77b4', '#9467bd'])

selection = alt.selection_multi(fields=['weather'], bind='legend')


rawDataPlot = alt.Chart(df).mark_point(opacity=.3).encode(
    x='date:T',
    y='precipitation:Q',
    color=alt.Color('weather', legend=alt.Legend(title='Weather type'), scale=scale),
    opacity=alt.condition(selection, alt.value(.3), alt.value(0.1)),
    tooltip = ['date', 'precipitation', 'weather', 'temp_max', 'temp_min']
).add_selection(
    selection
)

avgPlot = alt.Chart(df).mark_line(color="black").encode(
    x='yearmonth(date):T',
    y='average(precipitation):Q'
).transform_filter(
    selection
)


hHist = alt.Chart(df).mark_bar().encode(
    alt.Y('precipitation:Q', bin={"step": 5}, title=None),
    alt.X('sum(pct):Q', axis=alt.Axis(format='%'), title="% of days"),
    alt.Color('weather', legend=alt.Legend(title='Weather type'), scale=scale),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.1)),
).add_selection(
    selection
).properties(height=300, width=100)


newLowerPlot = alt.Chart(df).mark_bar().encode(
    x=alt.X('yearmonth(date):N', title='Month of the year'),
    y='count()',
    color=alt.Color('weather', legend=alt.Legend(title='Weather type', orient="bottom"), scale=scale),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.1)),
    tooltip=['yearmonth(date):N','weather', 'count()']
).add_selection(
    selection
).properties(height=100, width=400)


mainPlot = alt.layer(
    rawDataPlot,
    avgPlot).properties(height=300, width=400)
hplot =(mainPlot | hHist).resolve_scale(y='shared')
finalPlot = (hplot & newLowerPlot).resolve_scale(x='shared')
finalPlot

I also added a `transform_filter` to the average so that the avg is recalculated based on what is selected. 

In [150]:
alt.__version__

'4.1.0'

# Saving plots

Altair's job is to translate your python code and objects into a json representation that is then displayed using the vega-lite schema.

# json

You can use the `chart.to_json()` method (where `chart` is an `alt.Chart` object) to see this. We will save that json object to a file here with `chart.save()` and look at it later.

In [151]:
finalPlot.save('finalPlot.json')

# html

Generally, we will not want to figure out how to embed the code in html ourselves, we would like Altair to do that for us (like jupyter notebooks render the plots for us).

We can also do that with `chart.save()`

In [103]:
finalPlot.save('finalPlot.html')

To render the interactive plot as a scalable vector graphic (svg) within the html, you can use

`finalPlot.save('finalPlot.html', embed_options={'renderer':'svg'})`

# other formats

More information can be found [here](https://altair-viz.github.io/user_guide/saving_charts.html) on saving. Remember that only html is likely to keep all of the interaction effects. 

# Organization

As a final step, we wrap the steps into functions that document what we did, plus a main function that calls each of them. 

We will use the cell magic `%%writefile` to save the cell to a file and then `%pycat` to read and check it.

In [48]:
%%writefile CreateWeatherGraphic.py

"""Create plots
Parameters:
  sys.argv[1]: name of plot to save, with extension (eg html)
"""

import sys
import altair as alt
from vega_datasets import data


def load_data():
    """load seattle weather data
    assumptions: data imported from vega_datasets
    """
    return data.seattle_weather()


def add_pct(df):
    """add column so pct can be calculated, as side effect
    """
    df["pct"] = 1 / df.shape[0]  # df.shape returns the shape attribute for the DataFrame, rows by columns


def make_plot(df):
    """create weather plot
    Parameters
      df: data frame to use (from load_data)
    """
    scale = alt.Scale(domain=['sun', 'fog', 'drizzle', 'rain', 'snow'],
                      range=['#e7ba52', '#c7c7c7', '#aec7e8', '#1f77b4', '#9467bd'])

    selection = alt.selection_multi(fields=['weather'], bind='legend')


    rawDataPlot = alt.Chart(df).mark_point(opacity=.3).encode(
        x='date:T',
        y='precipitation:Q',
        color=alt.Color('weather', legend=alt.Legend(title='Weather type'), scale=scale),
        opacity=alt.condition(selection, alt.value(.3), alt.value(0.1)),
        tooltip=['date', 'precipitation', 'weather', 'temp_max', 'temp_min']
    ).add_selection(
        selection
    )

    avgPlot = alt.Chart(df).mark_line(color="black").encode(
        x='yearmonth(date):T',
        y='average(precipitation):Q'
    ).transform_filter(
        selection
    )


    hHist = alt.Chart(df).mark_bar().encode(
        alt.Y('precipitation:Q', bin={"step": 5}, title=None),
        alt.X('sum(pct):Q', axis=alt.Axis(format='%'), title="% of days"),
        alt.Color('weather', legend=alt.Legend(title='Weather type'), scale=scale),
        opacity=alt.condition(selection, alt.value(1), alt.value(0.1)),
    ).add_selection(
        selection
    ).properties(height=300, width=100)


    newLowerPlot = alt.Chart(df).mark_bar().encode(
        x=alt.X('yearmonth(date):N', title='Month of the year'),
        y='count()',
        color=alt.Color('weather', legend=alt.Legend(title='Weather type', orient="bottom"), scale=scale),
        opacity=alt.condition(selection, alt.value(1), alt.value(0.1)),
        tooltip=['yearmonth(date):N', 'weather', 'count()']
    ).add_selection(
        selection
    ).properties(height=100, width=400)


    mainPlot = alt.layer(
        rawDataPlot,
        avgPlot).properties(height=300, width=400)
    hplot = (mainPlot | hHist).resolve_scale(y='shared')
    finalPlot = (hplot & newLowerPlot).resolve_scale(x='shared')
    return finalPlot


def main(fname):
    """run everything, save plot with name
    """
    df = load_data()
    add_pct(df)
    finalPlot = make_plot(df)
    finalPlot.save(fname)


if __name__ == '__main__':
    NAME = sys.argv[1]
    main(NAME)

Overwriting CreateWeatherGraphic.py
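The script would then be run as, for example, `python CreateWeatherGraphic.py finalPlot.html`. Since the output format is taken from the file extension, one optional addition is a small guard that rejects unexpected extensions before any plotting work is done. The helper `check_fname` and the set `KNOWN_EXTENSIONS` below are hypothetical, and the list of extensions is an assumption to be checked against the saving documentation for your Altair version.

```python
from pathlib import Path

# Assumed set of extensions; verify against the Altair saving docs for your version.
KNOWN_EXTENSIONS = {".html", ".json", ".png", ".svg", ".pdf"}

def check_fname(fname):
    """Return fname unchanged, or raise ValueError if its extension is not recognised."""
    suffix = Path(fname).suffix.lower()
    if suffix not in KNOWN_EXTENSIONS:
        raise ValueError(f"unsupported extension {suffix!r} in {fname!r}")
    return fname

print(check_fname("finalPlot.html"))
```

In the script, `main(check_fname(NAME))` would be one place to call it.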


In [49]:
%pycat CreateWeatherGraphic.py


"""Create plots
Parameters:
  sys.argv[1]: name of plot to save, with extension (eg html)
"""

import sys
import altair as alt
from vega_datasets import data


def load_data():
    """load seattle weather data
    assumptions: data imported from vega_datasets
    """
    return data.seattle_weather()


def add_pct(df):
    """add column s