# Data Visualization with Altair

Altair is a Python plotting package that differs from other plotting solutions because it takes a **declarative** approach as opposed to an **imperative** one. This means you write code that describes **what** you want to plot, instead of **how** the program should draw it.

## Installation

This notebook requires the following `pip` installable packages:

`pip install pandas altair altair_saver`

## Setup


In [None]:
import pandas as pd
import altair as alt


We will use a common example dataset that describes the historical performance of various cars. As with any new data set, our first step should always be to load the data and look at it to understand how it is organized.

In [None]:
mpg = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv')
mpg.head()


## Basic Plots

`altair` is built around three key concepts:

1. **Charts** are what represent the data you want to display and can be thought of as the individual figure you are creating.
2. **Marks** are used to describe how the data should be displayed on the chart
3. **Encodings** configure the position, size, color, and other properties of each mark

Here is an example:

In [None]:
# Chart objects can be constructed directly from a DataFrame, Easy!
chart = alt.Chart(mpg)

# The `mark_point` method represents a scatter plot
mark = chart.mark_point()

# Now we tell the marks where they should be
mark.encode(
    x='mpg',
    y='horsepower'
)


In practice, a short example like this is usually written as a single chained expression:

In [None]:
alt.Chart(mpg).mark_point().encode(
    x='mpg',
    y='horsepower'
)


Next, let's add some color to the plot to represent the different countries of origin for each car. We could try to iterate over each of the points and manually specify their colors, but this is very **imperative** thinking. Let's be **declarative** instead by telling `altair` to color the points according to the `origin` column.

In [None]:
alt.Chart(mpg).mark_point().encode(
    x='mpg',
    y='horsepower',
    color='origin'
)

#### Exercise:

In the cell below, create a scatter plot of the weight vs. acceleration of each car. Set the `color` of the points to represent the `horsepower` column and the `shape` of the points to represent the `origin` column.


#### Exercise:

Here is a table of a few other **marks** that are available with `altair`. Change the kind of mark that is used in the following cell. Does the result look like what you expected?

Mark Name |  Method                |        Description                              
----------|------------------------|-------------------------------------------------
area      |  `Chart.mark_area`     | A filled area plot.                             
bar       |  `Chart.mark_bar`      | A bar plot.                                     
circle    |  `Chart.mark_circle`   | A scatter plot with filled circles.             
line      |  `Chart.mark_line`     | A line plot.                                    
point     |  `Chart.mark_point`    | A scatter plot with configurable point shapes.  
square    |  `Chart.mark_square`   | A scatter plot with filled squares.             
tick      |  `Chart.mark_tick`     | A vertical or horizontal tick mark.             


In [None]:
alt.Chart(mpg).mark_point().encode(
    x='mpg',
    y='horsepower',
    color='origin'
)


If you ever need to see all the available options, a quick and easy way to do this is using auto-complete. Place your text cursor at the end of the following code cell and press tab.

In [None]:
alt.Chart.mark_

Encodings carry over when you build on an existing chart. Each call to `encode` returns a new chart that keeps the encodings already set, so you can define a partially encoded chart once and then add or change encodings as needed.

In [None]:
point = alt.Chart(mpg).mark_point().encode(
    y='horsepower',
    color='origin'
)

In [None]:
point.encode(x='displacement')


In [None]:
point.encode(x='acceleration')


#### Exercise:

Re-encode the `point` variable so that miles per gallon is on the x-axis and points are colored by acceleration.


In [None]:
# Put your answer here

## Aggregating Data

Altair makes it easy to plot aggregated data, such as the kind you would represent with a histogram. 
Here is a quick example:

In [None]:
alt.Chart(mpg).mark_bar().encode(
    alt.X('horsepower'),
    alt.Y('horsepower', aggregate='count')
)


Notice that we have used a new type of plotting syntax. Instead of specifying keyword arguments, we declare our `x` and `y` encodings as objects. With this syntax, the order of the arguments does not matter: `altair` knows what to do with each argument by its type.

#### Exercise:

Use the `alt.Color` class to change the color of the following plot. Set the color to represent the `origin` column.

In [None]:
# Change the following code
alt.Chart(mpg).mark_bar().encode(
    alt.X('mpg'),
    alt.Y('mpg', aggregate='count')
)


There are a couple of different ways to change the bin sizes that are used in the plot. Here are some examples you can try:

In [None]:
maximum_bins = alt.Bin(maxbins=50)
explicit_bins = alt.Bin(extent=[0, 50], step=2.5)
nice_bins = alt.Bin(nice=True)  # Auto select "human friendly" bins

alt.Chart(mpg).mark_bar().encode(
    alt.X('mpg', bin=nice_bins),
    alt.Y('mpg', aggregate='count'),
    alt.Color('origin')
)

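For intuition about what fixed-width binning does, here is a rough pandas analogue of `alt.Bin(extent=[0, 50], step=2.5)`. It is only a sketch on made-up values: it is not how Altair computes bins internally, and the choice of left-closed intervals is an assumption made for this example.

```python
import numpy as np
import pandas as pd

# Made-up mpg values for illustration
values = pd.Series([9.0, 14.5, 18.0, 21.0, 22.4, 26.0, 31.5, 44.3], name='mpg')

# Bin edges every 2.5 units from 0 to 50 inclusive
edges = np.arange(0, 50 + 2.5, 2.5)

# Assign each value to a left-closed interval [start, start + 2.5)
binned = pd.cut(values, bins=edges, right=False)

# Count rows per bin, keeping the bins in order
counts = binned.value_counts(sort=False)
print(counts[counts > 0])
```

Each non-empty interval in the output corresponds to one bar of a histogram drawn with the same step.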

#### Exercise:

Create a histogram of acceleration values. Color code the values by horsepower.


In [None]:
# Put your answer here


We should note at this point that Altair also provides a shorthand for aggregating data. A full list is available in the [official docs](https://altair-viz.github.io/user_guide/encoding.html#binning-and-aggregation).

In [None]:
alt.Chart(mpg).mark_bar().encode(
    alt.X('mpg', bin=nice_bins),
    alt.Y('count(mpg)'),  # <- Note the shorthand here
    alt.Color('origin')
)


#### Exercise:

Create a scatter plot of the average mpg per year (with or without the Altair shorthand).


In [None]:
# Put your answer here


## Basic Formatting


By default, `altair` will try to include zero on each quantitative axis. For example:


In [None]:
alt.Chart(mpg).mark_point().encode(
    alt.X('acceleration'),
    alt.Y('horsepower')
)

We can disable this by setting the axis scale:

In [None]:
alt.Chart(mpg).mark_point().encode(
    alt.X('acceleration', scale=alt.Scale(zero=False)),
    alt.Y('horsepower')
)


Alternatively, we can also set the axis limits directly:

In [None]:
alt.Chart(mpg).mark_point().encode(
    alt.X('acceleration', scale=alt.Scale(domain=(5, 30))),
    alt.Y('horsepower')
)


#### Exercise: 

Create a duplicate of the above plot with an x-range from 0 to 15. Hint: you may need to pass `clip=True` to your `mark_point()` object.

In [None]:
# Put your answer here

In practice, column names from a pandas DataFrame don't always make good axis labels. We can change them by specifying an `Axis` object.

In [None]:
alt.Chart(mpg).mark_point().encode(
    alt.X('acceleration', axis=alt.Axis(title='New Acceleration Label')),
    alt.Y('horsepower',  axis=alt.Axis(title='New Horsepower Label'))
)


And as a final formatting note, we can also change the size of our figure. 

In [None]:
alt.Chart(mpg).mark_point().encode(
    alt.X('acceleration'),
    alt.Y('horsepower')
).properties(
    width=500,
    height=500
)


## Subplots

Altair provides the `hconcat()` and `vconcat()` functions that combine charts into rows and columns:

In [None]:
chart1 = alt.Chart(mpg).mark_point().encode(
    alt.X('model_year', scale=alt.Scale(domain=(65, 85))),
    alt.Y('mpg'),
    color='origin'
)

chart2 = alt.Chart(mpg).mark_point().encode(
    alt.X('model_year', scale=alt.Scale(domain=(65, 85))),
    alt.Y('horsepower'),
    color='origin'
)

alt.hconcat(chart1, chart2)


#### Exercise:

Altair provides shortcuts for combining subplots together. Pause to think what the output of the following cell will be and then run it to see the output. What happens when you use the `&` operator instead?

In [None]:
chart1 | chart2

#### Exercise: 

A set of incomplete charts is provided below. Complete them as follows.

1. In the first chart plot the binned `model_year` vs. the binned `horsepower` and color code by `'count()'`.
1. In the second chart plot the same data (not binned) and color code by `origin`.
1. What do you get when you add the two plots together using the `+` operator?


In [None]:
counts = alt.Chart(mpg).mark_rect(opacity=0.3).encode(
    # Put the first figure values here
)

points = alt.Chart(mpg).mark_point().encode(
    # Put the second figure values here
)

# Put your final sum here


## Interactive Plots

Altair supports interactive plots through a few different approaches. The `interactive` method is the simplest and adds basic zooming and panning:

In [None]:
alt.Chart(mpg).mark_point().encode(
    alt.X('horsepower'),
    alt.Y('mpg'),
    alt.Color('origin')
).interactive()


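The `properties` method allows us to connect additional behaviors, such as a selection, that control how data is encoded. (This reflects the Altair version this notebook was written for; newer releases attach selections with `add_params` instead.)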

In [None]:
interval = alt.selection_interval()

alt.Chart(mpg).mark_point().encode(
    alt.X('horsepower'),
    alt.Y('mpg'),
    color=alt.condition(interval, 'origin', alt.value('lightgray'))
).properties(
    selection=interval
)


#### Exercise:

1. Create one scatter plot of horsepower vs. weight.
2. Create a second scatter plot of horsepower vs. acceleration.
3. Add an interval selection tool as demonstrated above to both plots. Use the same interval for both plots.
4. Display the plots side by side.

In [None]:
# Put your answer here


The `transform_filter` method allows us to filter the plotted data:

In [None]:
interval = alt.selection_interval()

base = alt.Chart(mpg).mark_point().encode(
    alt.X('horsepower'),
    alt.Y('mpg'),
    color=alt.condition(interval, 'origin', alt.value('lightgray'))
).properties(
    selection=interval
)

hist = alt.Chart(mpg).mark_bar().encode(
    x='count(origin)',
    y='origin',
    color='origin'
).transform_filter(
    interval
)

interactive_demo = (base & hist)
interactive_demo


## Saving plots

Altair charts are Vega-Lite JSON specifications under the hood, so the plot **and** data can be saved and shared together in a single JSON file. We can also access the Vega-Lite source from the menu in the top right-hand corner of our plots.

In [None]:
interactive_demo.save('chart.json')

with open('chart.json') as infile:
    print(infile.readline()[:1000])
    

Alternatively, you can save and share a completely rendered version as HTML.

In [None]:
interactive_demo.save('chart.html')


Unfortunately, `altair` can't save our results to common image formats like PDF or PNG. However, this can be accomplished through the [altair_saver](https://github.com/altair-viz/altair_saver/) package.