# Introduction to Altair

Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite.

In [1]:
import pandas as pd
pd.set_option('display.max_columns', 10)
import altair as alt

## Automobile Dataset

We will use the [Automobile Data Set](https://archive.ics.uci.edu/ml/datasets/automobile) from the [UCI Machine Learning Repository](https://archive-beta.ics.uci.edu/). It includes categorical and continuous variables.

The raw file has no header row, so we define the column names ourselves.

In [2]:
# Defining the headers
headers = ["symboling", "normalized_losses", "make", "fuel_type", "aspiration", "num_doors", "body_style", 
        "drive_wheels", "engine_location", "wheel_base", "length", "width", "height", "curb_weight", 
        "engine_type", "num_cylinders", "engine_size", "fuel_system", "bore", "stroke", "compression_ratio", 
        "horsepower", "peak_rpm", "city_mpg", "highway_mpg", "price"]

In [3]:
# The file has no header row and uses "?" for missing values
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data",
                 header=None, names=headers, na_values="?")
df.head()

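|   | symboling | normalized_losses | make | fuel_type | aspiration | ... | horsepower | peak_rpm | city_mpg | highway_mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | NaN | alfa-romero | gas | std | ... | 111.0 | 5000.0 | 21 | 27 | 13495.0 |
| 1 | 3 | NaN | alfa-romero | gas | std | ... | 111.0 | 5000.0 | 21 | 27 | 16500.0 |
| 2 | 1 | NaN | alfa-romero | gas | std | ... | 154.0 | 5000.0 | 19 | 26 | 16500.0 |
| 3 | 2 | 164.0 | audi | gas | std | ... | 102.0 | 5500.0 | 24 | 30 | 13950.0 |
| 4 | 2 | 164.0 | audi | gas | std | ... | 115.0 | 5500.0 | 18 | 22 | 17450.0 |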
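Why `na_values="?"` matters: the raw file writes missing values as `?`, and without that argument pandas keeps them as strings, so the whole column becomes `object` instead of numeric. The sketch below shows this on a hypothetical two-row, four-column excerpt built from values in the output above (the real file has all 26 columns).

```python
import io

import pandas as pd

# Hypothetical excerpt in the raw file's layout: no header row, "?" for missing.
raw = "3,?,alfa-romero,13495\n2,164,audi,13950\n"
cols = ["symboling", "normalized_losses", "make", "price"]

# Without na_values, "?" stays a string and the column dtype becomes object.
as_text = pd.read_csv(io.StringIO(raw), header=None, names=cols)

# With na_values="?", the placeholder becomes NaN and the column is numeric.
as_nan = pd.read_csv(io.StringIO(raw), header=None, names=cols, na_values="?")

print(as_text["normalized_losses"].dtype)  # object
print(as_nan["normalized_losses"].dtype)   # float64
```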


Altair lets you specify the field name, an aggregate, and the data type in a compact short-hand string. The table shows examples of short-hand specifications and their long-hand equivalents.

Short-hand | Long-hand 
---------|-----
x='varname' | alt.X('varname')
x='varname:Q' | alt.X('varname', type='quantitative')
x='sum(varname)' | alt.X('varname', aggregate='sum')
x='sum(varname):Q' | alt.X('varname', aggregate='sum', type='quantitative')
x='count():Q' | alt.X(aggregate='count', type='quantitative')

## Scatterplots

A scatterplot using short-hand syntax

In [4]:
# A scatterplot using short-hand syntax
alt.Chart(df).mark_point().encode(
    x='horsepower',
    y='highway_mpg',
)

A scatterplot using long-hand syntax

In [5]:
# A scatterplot using long-hand syntax
alt.Chart(df).mark_point().encode(
    alt.X('horsepower'),
    alt.Y('highway_mpg'),
)

Both syntaxes produce the same chart. Notice that we did not specify the variable types; when the data is a pandas DataFrame, Altair infers them from the column dtypes, so both numeric columns are treated as quantitative. We can also specify the types explicitly.

A scatterplot using short-hand syntax and specifying the type of variables

In [6]:
# A scatterplot using short-hand syntax and specifying the type of variables
alt.Chart(df).mark_point().encode(
    x='horsepower:Q',
    y='highway_mpg:Q',
)

A scatterplot with circles

In [7]:
# A scatterplot with circles
alt.Chart(df).mark_circle().encode(
    x='horsepower:Q',
    y='highway_mpg:Q',
)

A scatterplot with squares

In [8]:
# A scatterplot with squares
alt.Chart(df).mark_square().encode(
    x='horsepower:Q',
    y='highway_mpg:Q',
)

In [9]:
print('Min horsepower: ', df['horsepower'].min())
print('Min highway_mpg:', df['highway_mpg'].min())

Min horsepower:  48.0
Min highway_mpg: 16


As the output shows, no car has horsepower below 48 or highway_mpg below 16. By default, however, quantitative scales include zero, so the graph has an empty area in the lower-left corner. To avoid this, we can set `zero=False` on each axis scale.

In [10]:
# Changing the scale of the axes
alt.Chart(df).mark_point().encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False))
)    

Next, let's bring in the variable `num_cylinders` by mapping it to the `Size` encoding channel.

In [11]:
# Introducing Size feature
alt.Chart(df).mark_point().encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
)

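The legend shows the values of `num_cylinders` out of numeric order: the column stores words ('four', 'six', ...), so Altair treats it as a nominal variable and sorts the legend alphabetically. Let's replace the words with their corresponding numbers.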

In [12]:
df.num_cylinders.unique()

array(['four', 'six', 'five', 'three', 'twelve', 'two', 'eight'],
      dtype=object)

Creating a dictionary that maps each word to its number of cylinders

In [13]:
# creating a dictionary of the number of cylinders
num_cyl = {
    'four'  : 4,
    'six'   : 6,
    'five'  : 5,
    'eight' : 8,
    'two'   : 2,
    'three' : 3,
    'twelve':12,
}

Replacing the words with numbers using `Series.map` and the dict `num_cyl`

In [14]:
df['num_cylinders'] = df['num_cylinders'].map(num_cyl)

In [15]:
df.num_cylinders.unique()

array([ 4,  6,  5,  3, 12,  2,  8], dtype=int64)
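`Series.map` with a dict returns NaN for any value that is not a key, which also silently turns the column into floats. The sketch below shows both cases on toy Series; "seven" is a hypothetical bad value used only to trigger the miss.

```python
import pandas as pd

num_cyl = {"four": 4, "six": 6, "five": 5, "eight": 8,
           "two": 2, "three": 3, "twelve": 12}

# Every word is a key: the result is an integer Series.
ok = pd.Series(["four", "six", "twelve"]).map(num_cyl)

# "seven" is not a key: map() yields NaN there, which forces float64.
gap = pd.Series(["four", "seven"]).map(num_cyl)

print(ok.tolist(), ok.dtype.kind)  # [4, 6, 12] i
print(gap.tolist(), gap.dtype)     # [4.0, nan] float64

# A cheap guard worth running on the real column after mapping.
assert ok.notna().all()
```

The `int64` dtype in the output above is therefore a quick signal that every word in `num_cylinders` was found in `num_cyl`.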

Plotting again with the numeric `num_cylinders` mapped to `Size`

In [16]:
# Introducing the Size feature
alt.Chart(df).mark_point().encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
)

The graph is easier to read now: the bigger the circle, the more cylinders the car has.

Filling the point markers

In [17]:
# Filling the point markers
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
)

Altair also provides a `Color` encoding channel. We can map it to the same variable, `num_cylinders`, or to a different one, such as `fuel_type`.

Coloring by `num_cylinders`

In [18]:
# Coloring by num_cylinders
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
    alt.Color('num_cylinders'),
)

Coloring by `fuel_type`

In [19]:
# Coloring by fuel_type
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
    alt.Color('fuel_type'),
)

Adding the `Shape` channel

In [20]:
# Adding the Shape channel
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
    alt.Color('fuel_type'),
    alt.Shape('engine_location')
)

Adding some transparency

In [21]:
# Adding some transparency
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
    alt.Color('fuel_type'),
    alt.Shape('engine_location'),
    alt.OpacityValue(0.5)
)

Let's add a tooltip that shows useful information when the mouse cursor hovers over a point.

Adding a tooltip that shows the price and make of each car

In [22]:
# Adding a tooltip that shows the price and make of each car
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
    alt.Color('fuel_type'),
    alt.Shape('engine_location'),
    alt.OpacityValue(0.5),
    tooltip = [
        alt.Tooltip('price'),
        alt.Tooltip('make')
        ]       
)

The graph is still crowded in some locations. We can add interactivity to zoom in and get a better view.

Adding basic interactivity with `interactive()`

In [23]:
# Adding a simple interactive() functionality
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
    alt.Color('fuel_type'),
    alt.Shape('engine_location'),
    alt.OpacityValue(0.5),
    tooltip = [
        alt.Tooltip('price'),
        alt.Tooltip('make')
        ]       
).interactive()

Calling `.interactive()` binds the x and y scales to the mouse: scroll to zoom into a particular area and drag to pan. The axis ranges update automatically.

Changing the `width` and `height` of the plot

In [24]:
# Changing width and height of the plot
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
    alt.Color('fuel_type'),
    alt.Shape('engine_location'),
    alt.OpacityValue(0.5),
    tooltip = [
        alt.Tooltip('price'),
        alt.Tooltip('make')
        ]       
).properties(
    width=500, 
    height=350
)

Adding a title

In [25]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
    alt.Color('fuel_type'),
    alt.Shape('engine_location'),
    alt.OpacityValue(0.5),
    tooltip = [
        alt.Tooltip('price'),
        alt.Tooltip('make')
        ]       
).properties(width=500, height=350,
    title={"text":"Horsepower vs Highway MPG"}
)

Modifying the title style

In [26]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False)),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False)),
    alt.Size('num_cylinders'),
    alt.Color('fuel_type'),
    alt.Shape('engine_location'),
    alt.OpacityValue(0.5),
    tooltip = [
        alt.Tooltip('price'),
        alt.Tooltip('make')
        ]       
).properties(width=500, height=350,
    title={"text":"Horsepower vs Highway MPG", 
           "fontSize":20,
           "fontWeight":"bold",
           "color":"grey"}
)

Modifying the axis and legend titles

In [27]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False), title='Horsepower'),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False), title='Highway (mpg)'),
    alt.Size('num_cylinders', title='Number of cylinders'),
    alt.Color('fuel_type', title='Fuel type'),
    alt.Shape('engine_location', title='Engine location'),
    alt.OpacityValue(0.5),
    tooltip = [
        alt.Tooltip('price'),
        alt.Tooltip('make')
        ]       
).properties(width=500, height=350,
    title={"text":"Horsepower vs Highway MPG", 
           "fontSize":20,
           "fontWeight":"bold",
           "color":"grey"}
)

Renaming the tooltip fields

In [28]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False), title='Horsepower'),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False), title='Highway (mpg)'),
    alt.Size('num_cylinders', title='Number of cylinders'),
    alt.Color('fuel_type', title='Fuel type'),
    alt.Shape('engine_location', title='Engine location'),
    alt.OpacityValue(0.5),
    tooltip = [
        alt.Tooltip('price', title='Price'),
        alt.Tooltip('make', title='Make')
        ]       
).properties(width=500, height=350,
    title={"text":"Horsepower vs Highway MPG", 
           "fontSize":20,
           "fontWeight":"bold",
           "color":"grey"}
)

Increasing the size of the axis titles and labels

In [29]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False), title='Horsepower'),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False), title='Highway (mpg)'),
    alt.Size('num_cylinders', title='Number of cylinders'),
    alt.Color('fuel_type', title='Fuel type'),
    alt.Shape('engine_location', title='Engine location'),
    alt.OpacityValue(0.5),
    tooltip = [
        alt.Tooltip('price', title='Price'),
        alt.Tooltip('make', title='Make')
        ]       
).properties(width=500, height=350,
    title={"text":"Horsepower vs Highway MPG", 
           "fontSize":20,
           "fontWeight":"bold",
           "color":"grey"}
).configure_axis(
    titleFontSize=16, 
    titleColor="grey",
    labelFontSize=12
    )

Removing the grid lines

In [30]:
alt.Chart(df).mark_point(filled=True).encode(
    alt.X('horsepower',scale=alt.Scale(zero=False), title='Horsepower'),
    alt.Y('highway_mpg',scale=alt.Scale(zero=False), title='Highway (mpg)'),
    alt.Size('num_cylinders', title='Number of cylinders'),
    alt.Color('fuel_type', title='Fuel type'),
    alt.Shape('engine_location', title='Engine location'),
    alt.OpacityValue(0.5),
    tooltip = [
        alt.Tooltip('price', title='Price'),
        alt.Tooltip('make', title='Make')
        ]       
).properties(width=500, height=350,
    title={"text":"Horsepower vs Highway MPG", 
           "fontSize":20,
           "fontWeight":"bold",
           "color":"grey"}
).configure_axis(
    titleFontSize=16, 
    titleColor="grey",
    labelFontSize=12,
    grid=False
    )
