# Advanced Graphics with Altair

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', 10)
import datetime as dt
import altair as alt

## Automobile Dataset

We will use the Automobile Data Set [https://archive.ics.uci.edu/ml/datasets/automobile] from the UCI Machine Learning Repository [https://archive-beta.ics.uci.edu/]. It includes categorical and continuous variables. 

Defining the headers

In [2]:
# Defining the headers
headers = ["symboling", "normalized_losses", "make", "fuel_type", "aspiration",
           "num_doors", "body_style", "drive_wheels", "engine_location",
           "wheel_base", "length", "width", "height", "curb_weight",
           "engine_type", "num_cylinders", "engine_size", "fuel_system",
           "bore", "stroke", "compression_ratio", "horsepower", "peak_rpm",
           "city_mpg", "highway_mpg", "price"]

In [3]:
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
                  header=None, names=headers, na_values="?" )
df.head()

Unnamed: 0,symboling,normalized_losses,make,fuel_type,aspiration,...,horsepower,peak_rpm,city_mpg,highway_mpg,price
0,3,,alfa-romero,gas,std,...,111.0,5000.0,21,27,13495.0
1,3,,alfa-romero,gas,std,...,111.0,5000.0,21,27,16500.0
2,1,,alfa-romero,gas,std,...,154.0,5000.0,19,26,16500.0
3,2,164.0,audi,gas,std,...,102.0,5500.0,24,30,13950.0
4,2,164.0,audi,gas,std,...,115.0,5500.0,18,22,17450.0
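
The `na_values="?"` argument in the cell above matters because the raw file marks missing entries with `?`. The sketch below illustrates the effect on a few made-up rows in the same header-less layout; the rows and the four-column subset are assumptions for illustration, not part of the real file.

```python
import io
import pandas as pd

# Three made-up rows; "?" marks a missing value, as in imports-85.data.
raw = io.StringIO(
    "3,?,alfa-romero,13495\n"
    "2,164,audi,13950\n"
    "1,?,audi,?\n"
)
cols = ["symboling", "normalized_losses", "make", "price"]  # illustrative subset
toy = pd.read_csv(raw, header=None, names=cols, na_values="?")

print(toy.dtypes)        # columns that contained "?" are parsed as floats
print(toy.isna().sum())  # number of missing values per column
```

Without `na_values`, the `?` entries would force those columns to be read as strings, and quantitative encodings such as `horsepower:Q` would not behave as expected.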


## Linking Brush to Scatterplot

The following examples show how to add a simple brush to a scatter plot. By clicking and dragging on the graph, you can highlight points within the range.

### Brush: interval selection

We will create an interval selection.

Notice that the color encoding is conditional on the brush: points inside the rectangular brush keep their `body_style` color, while points outside are drawn as grey circles.

Defining `brush` as `alt.selection_interval()`

In [4]:
brush = alt.selection_interval()

Adding `brush` to a simple scatter plot.

In [5]:
alt.Chart(df).mark_circle(size=80).encode(
    x='horsepower:Q',
    y='highway_mpg:Q',
    color=alt.condition(brush, 'body_style:N', alt.value('grey')),
).add_selection(brush)

### Brush: encoding x

Each selection type has attributes for customizing its behavior. We could connect our brush to the "x" encoding. Let's see it!

Defining `brush_x`

In [6]:
brush_x = alt.selection_interval(encodings=['x'])

Adding `brush_x` to the same scatter plot. 

Select an interval and notice the difference: the brush now spans the full height of the chart and only constrains the x range.

In [7]:
alt.Chart(df).mark_circle(size=80).encode(
    x='horsepower:Q',
    y='highway_mpg:Q',
    color=alt.condition(brush_x, 'body_style:N', alt.value('grey')),
).add_selection(brush_x)

### Brush: encoding y

We can do the same thing, now using the y values.

In [8]:
brush_y = alt.selection_interval(encodings=['y'])

In [9]:
alt.Chart(df).mark_circle(size=80).encode(
    x='horsepower:Q',
    y='highway_mpg:Q',
    color=alt.condition(brush_y, 'body_style:N', alt.value('grey')),
).add_selection(brush_y)
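
To make the difference between the three brushes concrete, the sketch below emulates their selection logic in plain pandas. It is not how Altair implements selections (that happens in the browser); `in_brush` is a hypothetical helper and the rows and ranges are made up.

```python
import pandas as pd

# Made-up toy rows; the column names match the notebook.
toy = pd.DataFrame({
    "horsepower":  [70, 95, 110, 150, 200],
    "highway_mpg": [38, 34, 30, 25, 19],
})

def in_brush(data, x_range=None, y_range=None):
    """Boolean mask of rows inside the brushed extent (hypothetical helper).

    A range of None leaves that axis unconstrained, which mirrors a brush
    restricted with encodings=['x'] or encodings=['y'].
    """
    mask = pd.Series(True, index=data.index)
    if x_range is not None:
        mask &= data["horsepower"].between(*x_range)
    if y_range is not None:
        mask &= data["highway_mpg"].between(*y_range)
    return mask

both = in_brush(toy, x_range=(90, 160), y_range=(28, 40))  # like `brush`
x_only = in_brush(toy, x_range=(90, 160))                   # like `brush_x`
y_only = in_brush(toy, y_range=(28, 40))                    # like `brush_y`
print(both.sum(), x_only.sum(), y_only.sum())  # 2 3 3
```

The rectangular brush selects the fewest points because both ranges must hold at once.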

## Scatterplot - Bars

Now we can concatenate a bar chart to the scatterplot and add a brush selection tool such that the bar plot reflects the content of the selection.

In [10]:
# We already have a brush defined; this creates a fresh one
brush = alt.selection_interval()

Applying the `brush` to the scatterplot

In [11]:
points = alt.Chart(df).mark_circle(size=80).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    color=alt.condition(brush, 'body_style:N', alt.value('grey'))     
).properties(width=350, height=300,
    title={"text":"Horsepower vs. Highway (mpg)",
           "fontSize":16}
).add_selection(brush)

Creating a bar chart and applying `transform_filter(brush)` to it.

In [12]:
bars = alt.Chart(df).mark_bar().encode(
    alt.X('make:N', axis=alt.Axis(title='Make')),
    alt.Y('count()', axis=alt.Axis(title='Count')),
    alt.Color('body_style')
).properties(
    width=350,
    height=300,
    title={"text":"Count of body style by Make",
           "fontSize":16}
).transform_filter(brush)

Visualizing both graphs

Brush the scatterplot, and the bar chart will update accordingly.

In [13]:
# Horizontal concatenation
points | bars

In [14]:
# Vertical concatenation
points & bars
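
As a rough check of what the filtered bar chart displays, the same result can be approximated in pandas: keep the rows inside the brushed rectangle, then count by `make` and `body_style`. The rows and the brushed ranges below are made up for illustration.

```python
import pandas as pd

# Made-up toy rows; the column names match the notebook.
toy = pd.DataFrame({
    "make":        ["audi", "audi", "bmw", "bmw", "bmw"],
    "body_style":  ["sedan", "wagon", "sedan", "sedan", "hatchback"],
    "horsepower":  [102, 115, 101, 121, 182],
    "highway_mpg": [30, 22, 29, 28, 22],
})

# Pretend the brush covers horsepower 100-125 and highway_mpg 25-32.
selected = toy[toy["horsepower"].between(100, 125)
               & toy["highway_mpg"].between(25, 32)]

# Equivalent of y='count()' split by make (x) and body_style (color).
counts = selected.groupby(["make", "body_style"]).size()
print(counts)
```

Only the combinations present in the selection get a bar, which is why bars disappear from the chart as the brush shrinks.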

## Scatterplot - Scatterplot

We want to link two scatterplots by sharing one brush between them and coloring the points with `alt.condition(predicate, if_true, if_false)`.

Let's start by creating `base_scatter` with:
- `price` as the y-axis
- `body_style` as the color
- the `brush` selection defined earlier

Notice we have not yet defined the x-axis.

In [15]:
base_scatter= alt.Chart(df).mark_circle(size=80).encode(
    alt.Y('price'),
    color=alt.condition(brush, 'body_style:N', alt.value("gray"))
).properties(
    width=250,
    height=280
).add_selection(brush)

Let's define `highway` and `city` from `base_scatter`, adding `highway_mpg` and `city_mpg`, respectively, as the x-axis.

In [16]:
highway = base_scatter.encode(
    alt.X('highway_mpg',scale=alt.Scale(domain=[10,55]))
).properties(
        title={"text":"Car Price vs. Highway (mpg)",
               "fontSize":16}
) 

In [17]:
city = base_scatter.encode(
    alt.X('city_mpg',scale=alt.Scale(domain=[10,55]))
).properties(
        title={"text":"Car Price vs. City (mpg)",
               "fontSize":16}
) 

Select points in any scatter plot, and the other will update accordingly.

In [18]:
highway | city

### Adding a third scatterplot

In [19]:
horsepower = base_scatter.encode(
    alt.X('horsepower')
    ).properties(
        title={"text":"Car Price vs. Horsepower",
               "fontSize":16}
    )

Select points in any scatterplot and the others will update accordingly.

In [20]:
# Select points in any scatter plot, and the others will update accordingly.
highway | city | horsepower

## Interactive Crossfilter

Define the `base_hist` chart with the parts common to both layers.

In [21]:
base_hist = alt.Chart().mark_bar().encode(
    x=alt.X(alt.repeat('column'), type='quantitative', bin=alt.Bin(maxbins=20)),
    y='count()'
).properties(
    width=250,
    height=250
) 

Gray background with selection

In [22]:
# gray background with selection
background = base_hist.encode(
    color=alt.value('lightgrey')
).add_selection(brush_x)

Blue highlights on the transformed data

In [23]:
# blue highlights on the transformed data
highlight = base_hist.transform_filter(brush_x)

Layer the charts & repeat

In [24]:
# layer the charts & repeat
alt.layer(
    background,
    highlight,
    data=df
).repeat(column=["length", "width", "height"])
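
The layering idea can be sketched outside Altair with NumPy: the background histogram counts every row, while the highlight counts only the rows inside the brushed range of another column, using the same bin edges. The data below is randomly generated, and the fixed 20 bins only approximate `maxbins=20`, which lets Vega-Lite choose rounded bin boundaries.

```python
import numpy as np

rng = np.random.default_rng(0)
length = rng.uniform(140, 210, size=200)                  # made-up values
width = 40 + 0.14 * length + rng.normal(0, 1, size=200)   # made-up values

# Shared bin edges so both layers line up bar for bar.
edges = np.histogram_bin_edges(width, bins=20)

# Background layer: every row.
background, _ = np.histogram(width, bins=edges)

# Highlight layer: only rows whose length falls inside the brushed range.
brushed = (length >= 170) & (length <= 190)
highlight, _ = np.histogram(width[brushed], bins=edges)

print(background)
print(highlight)
```

Because the highlight is a subset of the background, each highlighted bar can never be taller than the grey bar behind it.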

## Sunspots Dataset

Sunspots are impermanent phenomena on the Sun's photosphere that appear darker than the surrounding areas. They usually occur in pairs of opposite magnetic polarity. Their number varies according to the approximately 11-year solar cycle.


Source: https://en.wikipedia.org/wiki/Sunspot

Data from the SIDC (Solar Influences Data Analysis Center), the solar physics research department of the Royal Observatory of Belgium.

In [25]:
dfs = pd.read_csv('Sunspots.csv', index_col=0)
dfs.head()

Unnamed: 0,Date,Monthly Mean Total Sunspot Number
0,1749-01-31,96.7
1,1749-02-28,104.3
2,1749-03-31,116.7
3,1749-04-30,92.8
4,1749-05-31,141.7


In [26]:
dfs.rename(columns = {'Monthly Mean Total Sunspot Number':'Sunspots'}, inplace=True)
dfs.head()

Unnamed: 0,Date,Sunspots
0,1749-01-31,96.7
1,1749-02-28,104.3
2,1749-03-31,116.7
3,1749-04-30,92.8
4,1749-05-31,141.7


In [27]:
dfs.Date = pd.to_datetime(dfs.Date)

In [28]:
# Extract the date parts directly with the .dt accessor
dfs['Year'] = dfs.Date.dt.year
dfs['Month'] = dfs.Date.dt.month
dfs['Day'] = dfs.Date.dt.day
dfs.head()

Unnamed: 0,Date,Sunspots,Year,Month,Day
0,1749-01-31,96.7,1749,1,31
1,1749-02-28,104.3,1749,2,28
2,1749-03-31,116.7,1749,3,31
3,1749-04-30,92.8,1749,4,30
4,1749-05-31,141.7,1749,5,31


In [29]:
dfs.dtypes

Date        datetime64[ns]
Sunspots           float64
Year                 int32
Month                int32
Day                  int32
dtype: object

## Interactive Mean

The plot below uses an interval selection: brushing a date range recomputes the mean of the selected values, which is drawn as a horizontal brown line.

In [30]:
bars2 = alt.Chart().mark_bar(color='skyblue').encode(
    alt.X('Date:T'),
    alt.Y('mean(Sunspots):Q'),
    opacity=alt.condition(brush_x, alt.OpacityValue(1), alt.OpacityValue(0.5)),
).properties(
    width=1000,
    height=400,
    title={"text":"Sunspots by Year"},
).add_selection(brush_x)

In [31]:
line_mean = alt.Chart().mark_rule(color='brown').encode(
    y='mean(Sunspots):Q',
    size=alt.SizeValue(3)
).transform_filter(brush_x)

In [32]:
alt.layer(bars2, line_mean, data=dfs[dfs.Year > 1980])

### Adding min and max lines

In [33]:
line_min = alt.Chart().mark_rule(color='black').encode(
    y='min(Sunspots):Q',
    size=alt.SizeValue(1)
).transform_filter(brush_x)

In [34]:
line_max = alt.Chart().mark_rule(color='black').encode(
    y='max(Sunspots):Q',
    size=alt.SizeValue(1)
).transform_filter(brush_x)

In [35]:
alt.layer(bars2, line_min, line_mean, line_max, data=dfs[dfs.Year > 1980])
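
The three rules correspond to simple aggregates over the brushed date range. As a sanity check, they can be reproduced in pandas; the sketch below uses the five rows shown by `dfs.head()` and pretends the brush spans February through April 1749 (the window is an assumption for illustration).

```python
import pandas as pd

# The five rows shown by dfs.head() above.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["1749-01-31", "1749-02-28", "1749-03-31",
                            "1749-04-30", "1749-05-31"]),
    "Sunspots": [96.7, 104.3, 116.7, 92.8, 141.7],
})

# Pretend the brush spans February through April 1749.
start, end = pd.Timestamp("1749-02-01"), pd.Timestamp("1749-04-30")
window = toy[toy["Date"].between(start, end)]

# Same aggregates as the min, mean and max rules.
stats = window["Sunspots"].agg(["min", "mean", "max"])
print(stats)
```

Moving or resizing the brush in the chart is equivalent to changing `start` and `end` here and recomputing the aggregates.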

## References

- https://altair-viz.github.io/user_guide/interactions
- https://vega.github.io/vega-lite/docs/aggregate.html#ops