# Altair: Vega-Lite in Python

In this notebook we will learn about Vega-Lite in Python

There are multiple Vega and Vega-Lite wrappers in Python

The one we will learn about (which is also the most popular) is called `altair`

The purpose of this notebook is to introduce the core concepts of Altair

Further exploration and experimentation will be left as an exercise

> Note: we borrow heavily from the [official documentation](https://altair-viz.github.io/getting_started/overview.html) in this notebook. We strongly encourage you to review the documentation yourself for more examples and details on how to utilize altair in your workflow

In [1]:
# evaluate this cell to install altair and vega_datasets (skip it if they are already installed)
%pip install --user altair vega_datasets


Note: you may need to restart the kernel to use updated packages.


## Overview

Core concepts:

- `alt.Chart`: The *container* for your chart specification. Typically all charts start with `alt.Chart(df: DataFrame)`
- `marks`: A *type* of visual element -- perhaps a line, circle, star, bar, etc.
- `encodings`: A *mapping* between the columns in your dataset and "visual encoding channels" -- perhaps x, y, color, etc.
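
To make the three concepts concrete, here is a rough sketch of how they line up with the top-level keys of a Vega-Lite specification. The dictionary is written by hand for illustration (it is not produced by Altair) and mirrors the shape of the JSON printed later in this notebook; the URL is the cars dataset used in a later cell.

```python
import json

# Hand-written illustration of a Vega-Lite spec; Altair normally builds this for us
spec = {
    # alt.Chart(...): the container, which holds the data source
    "data": {"url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/cars.json"},
    # .mark_point(): the type of visual element to draw
    "mark": "point",
    # .encode(...): the mapping from columns to visual channels
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
        "color": {"field": "Origin", "type": "nominal"},
    },
}

print(json.dumps(spec, indent=2))
```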

### Example: cars scatter

In [2]:
import altair as alt

# load a simple dataset as a pandas DataFrame
from vega_datasets import data
cars = data.cars()
cars.head()

Unnamed: 0,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
0,chevrolet chevelle malibu,18.0,8,307.0,130.0,3504,12.0,1970-01-01,USA
1,buick skylark 320,15.0,8,350.0,165.0,3693,11.5,1970-01-01,USA
2,plymouth satellite,18.0,8,318.0,150.0,3436,11.0,1970-01-01,USA
3,amc rebel sst,16.0,8,304.0,150.0,3433,12.0,1970-01-01,USA
4,ford torino,17.0,8,302.0,140.0,3449,10.5,1970-01-01,USA


In [3]:
c_cars = alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
).interactive()
c_cars

## Marks and Encodings

Let's dive into more detail about marks and encodings

We'll use the following dataset as an example

In [4]:
import pandas as pd
data = pd.DataFrame(
    {
        "a": list("CCCDDDEEE"),
        "b": [2, 7, 4, 1, 2, 6, 8, 4, 7],
        "c": [1, 2, 3]*3,
    }
)

data

Unnamed: 0,a,b,c
0,C,2,1
1,C,7,2
2,C,4,3
3,D,1,1
4,D,2,2
5,D,6,3
6,E,8,1
7,E,4,2
8,E,7,3


### Marks

The mark property is how altair tells Vega what *type* of element to draw

These are set on the `Chart` object using a method named `.mark_TYPE` where `TYPE` is the type of the mark:

In [5]:
alt.Chart(data).mark_point()

# mark_point tells Vega-Lite to draw a point mark for each row

In [6]:
alt.Chart(data).mark_rect()

In [7]:
c2 = alt.Chart(data).mark_circle()
print(type(c2))
c2

<class 'altair.vegalite.v4.api.Chart'>


In these examples there is actually one mark per row in the dataset (9 marks)

However, all the marks are plotted on top of one another because we haven't specified where they should be plotted

To fix this we need to *encode* variables (columns) of our dataset as visual channels

To do this we use the `Chart.encode` method (notice that `Chart.mark_TYPE` returns `Chart`, so we can chain the `.encode` method call)

Below we'll instruct altair to map the column named `a` to the `x` channel, which controls the horizontal or x position of the mark

In [8]:
alt.Chart(data).mark_point().encode(x="a")

# column `a` is now mapped to the x channel (horizontal position)

In this chart we can see three distinct marks

There are actually three points at each of `C`, `D`, and `E`

To see all 9 points we also need to encode the `y` channel:

In [9]:
c2 = (
    alt.Chart(data)
    .mark_point()
    .encode(
        x="a",
        y="b"
    )
)
c2

# .encode defines the mapping from columns to visual channels
# other channels, such as size and color, can be mapped too

We can now map the `c` column to another channel...

> Note: `.encode` also returns a `Chart` so we can call `.encode` again to add more mappings

In [10]:
c2.encode(color="c")

In [11]:
c2.encode(size="c")

### Aggregations

When specifying the encoding for the chart, we mapped keyword arguments (like `x` and `y`) to strings

Above, each of those strings was a column name

Altair has a mini-language for expressing other types of operations in the strings

We'll demonstrate this via examples

#### Example: Plotting mean over `a`

**Want**: Plot the mean of the values in column `b`, for each value in column `a`

Being pandas experts, we might first think to do a groupby then plot:

In [12]:
# given the want, we could plot the mean of values in column b for each value in column a

(
    alt.Chart(data.groupby("a").mean().reset_index())
    .mark_point()
    .encode(x="a", y="b")
)

This certainly works, but we can actually let altair do the aggregation for us:

In [13]:
# however, altair can compute the same result for us
# altair also labels the axis as the average of b

c3 = (
    alt.Chart(data)
    .mark_point()
    .encode(x="a", y="average(b)")
)

c3
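
For reference, the values plotted for `average(b)` are simply the per-group means of `b`. A small pure-Python check, using the same values as the example dataset (the variable names are chosen only for this sketch):

```python
from collections import defaultdict
from statistics import mean

a = list("CCCDDDEEE")
b = [2, 7, 4, 1, 2, 6, 8, 4, 7]

# collect the b values that belong to each value of a
groups = defaultdict(list)
for key, value in zip(a, b):
    groups[key].append(value)

# one mean per group, which is what the chart plots on the y axis
averages = {key: mean(values) for key, values in groups.items()}
for key, value in averages.items():
    print(key, round(value, 2))
```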

There are a few benefits to doing things this way:

1. The y-axis label was set to "Average of b" instead of just "b"
2. We can leverage further Altair operations that might not be as straightforward when aggregating in pandas first and then plotting with Altair
3. The aggregations or transformations happen in a context that is aware of the rest of the chart, allowing for other optimizations or conveniences (similar to setting y-axis title)

Our chart above did what we said we wanted, but looks a bit odd...

Usually an aggregation like an average is represented via bars instead of points

To make a bar chart instead we need to use the `bar` mark:

In [14]:
# reuse the encodings from c3 and switch the mark to bars
c3.mark_bar()

Another tweak we might make to this chart would be to make a horizontal bar chart

To do this we need only swap the map for the `x` and `y` channels:

In [15]:
# we just swap x and y

c4 = (
    alt.Chart(data)
    .mark_bar()
    .encode(y="a", x="average(b)")
)

c4

### Viewing Chart JSON

The main purpose of the altair library is to make it convenient for Python users to create Vega-Lite compliant JSON specifications from pandas DataFrames

Altair can report back the JSON that it generated using the `to_json` method

In [16]:
print(c4.to_json())

# printing the JSON can help when debugging a chart

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-ce6f61104972b1d87c22003b1869b8dd"
  },
  "datasets": {
    "data-ce6f61104972b1d87c22003b1869b8dd": [
      {
        "a": "C",
        "b": 2,
        "c": 1
      },
      {
        "a": "C",
        "b": 7,
        "c": 2
      },
      {
        "a": "C",
        "b": 4,
        "c": 3
      },
      {
        "a": "D",
        "b": 1,
        "c": 1
      },
      {
        "a": "D",
        "b": 2,
        "c": 2
      },
      {
        "a": "D",
        "b": 6,
        "c": 3
      },
      {
        "a": "E",
        "b": 8,
        "c": 1
      },
      {
        "a": "E",
        "b": 4,
        "c": 2
      },
      {
        "a": "E",
        "b": 7,
        "c": 3
      }
    ]
  },
  "encoding": {
    "x": {
      "aggregate": "average",
      "field": "b",
      "type": "qua

Viewing the chart JSON can be a useful debugging tool when trying to learn from the Altair or Vega-Lite documentation

## Data in Altair

Let's take a closer look at the `encoding` section of the Vega-Lite JSON for `c4` from above:

In [17]:
print(c4.encoding.to_json())

# nominal: unordered categories, e.g. USA, Europe, Japan
# ordinal: ordered categories, e.g. good, better, best
# temporal: dates and times
# geojson: geographic shapes


{
  "x": {
    "aggregate": "average",
    "field": "b",
    "type": "quantitative"
  },
  "y": {
    "field": "a",
    "type": "nominal"
  }
}


Notice that both "x" and "y" have a `type` field

Vega-Lite requires that all encoding channels have a type

Altair took care of these for us based on the dtype of the DataFrame column
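
As a rough mental model of this inference, the sketch below guesses a type from plain Python values. `guess_vegalite_type` is a hypothetical helper written for this illustration; Altair's real logic inspects pandas dtypes and covers more cases.

```python
from datetime import date, datetime


def guess_vegalite_type(values):
    """Hypothetical helper: guess a Vega-Lite encoding type from a list of values."""
    # booleans are categories, so check them before the numeric test (bool is a subclass of int)
    if all(isinstance(v, bool) for v in values):
        return "nominal"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative"
    if all(isinstance(v, (date, datetime)) for v in values):
        return "temporal"
    # strings and anything mixed fall back to unordered categories
    return "nominal"


print(guess_vegalite_type(["C", "D", "E"]))
print(guess_vegalite_type([2, 7, 4]))
print(guess_vegalite_type([date(1970, 1, 1)]))
```

This matches what we saw for `c4`: the string column `a` became nominal, and the numeric column `b` became quantitative.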

There are 5 core types of encoding, summarized in the table below:

<table class="docutils" border="1">
<colgroup>
<col width="16%">
<col width="19%">
<col width="65%">
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Data Type</th>
<th class="head">Shorthand Code</th>
<th class="head">Description</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>quantitative</td>
<td><code class="docutils literal"><span class="pre">Q</span></code></td>
<td>a continuous real-valued quantity</td>
</tr>
<tr class="row-odd"><td>ordinal</td>
<td><code class="docutils literal"><span class="pre">O</span></code></td>
<td>a discrete ordered quantity</td>
</tr>
<tr class="row-even"><td>nominal</td>
<td><code class="docutils literal"><span class="pre">N</span></code></td>
<td>a discrete unordered category</td>
</tr>
<tr class="row-odd"><td>temporal</td>
<td><code class="docutils literal"><span class="pre">T</span></code></td>
<td>a time or date value</td>
</tr>
<tr class="row-even"><td>geojson</td>
<td><code class="docutils literal"><span class="pre">G</span></code></td>
<td>a geographic shape</td>
</tr>
</tbody>
</table>

Using the shorthand code we can give altair a hint about the type of our columns

We'll see that the type has significant implications for some encoding channels...

In [18]:
c2.encode(color="c:Q").mark_bar()

# Q marks the column as quantitative, so color uses a continuous scale
# c only takes the discrete values 1, 2, 3, so ordinal (O) would also be a reasonable choice


In [19]:
c2.encode(color="c:O").mark_bar()

In [20]:
c2.encode(color="c:N").mark_bar()

# nominal values have no order, so the colors do not follow a natural progression

The shorthand for specifying the type of an encoding also works when using an aggregation:

In [21]:
(
    alt.Chart(data)
    .mark_bar()
    .encode(x="a", y="average(b):Q")

    # the type shorthand also works together with an aggregation: y="average(b):Q"
)

In [22]:
(
    alt.Chart(data)
    .mark_bar()
    .encode(x="a", y="average(b):N")
)

In [23]:
(
    alt.Chart(data)
    .mark_bar()
    .encode(x="a", y="average(b):O")
)
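
The shorthand strings used so far follow a small pattern: an optional aggregate wrapped around a column name, followed by an optional `:` and a type code. The sketch below parses that pattern with a regular expression. The name `parse_shorthand_sketch` and the expression itself are made up for this illustration; Altair's own parser handles more cases than this.

```python
import re

# type codes from the table above
TYPE_CODES = {
    "Q": "quantitative",
    "O": "ordinal",
    "N": "nominal",
    "T": "temporal",
    "G": "geojson",
}

# either aggregate(field) or a bare field, then an optional :CODE suffix
SHORTHAND = re.compile(
    r"^(?:(?P<aggregate>\w+)\((?P<agg_field>[^)]*)\)|(?P<field>[^:]+))"
    r"(?::(?P<type>[QONTG]))?$"
)


def parse_shorthand_sketch(text):
    """Hypothetical helper: split a shorthand string into its parts."""
    match = SHORTHAND.match(text)
    if match is None:
        raise ValueError(f"unrecognized shorthand: {text!r}")
    parsed = {"field": match["agg_field"] if match["aggregate"] else match["field"]}
    if match["aggregate"]:
        parsed["aggregate"] = match["aggregate"]
    if match["type"]:
        parsed["type"] = TYPE_CODES[match["type"]]
    return parsed


print(parse_shorthand_sketch("a"))
print(parse_shorthand_sketch("c:O"))
print(parse_shorthand_sketch("average(b):Q"))
```

The last call produces the same three keys (`aggregate`, `field`, `type`) that appeared in the `x` encoding of `c4`.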

Sometimes the `keyword=STRING` shorthand isn't flexible enough for a particular application

Altair also lets you construct the encoding channels using `alt.CHANNEL` types

These types are passed as unordered positional arguments before any keyword arguments


In [24]:
(
    alt.Chart(data)
    .mark_bar()
    .encode(
        alt.X("a"),  # same as x="a"
        alt.Y("b", aggregate="average", type="quantitative"),  # same as y="average(b):Q"
        # alt.Color("c") would be the positional form of the keyword argument below
        color="c",
    )
)


### Data from files

In addition to setting data for our charts by passing in a DataFrame, we could also pass a url to a remote dataset

> Note: when not using a DataFrame, we **must** specify column types, because Altair cannot inspect the data to infer them

In [25]:
# a URL can be passed to alt.Chart in place of a DataFrame

url_cars = "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/cars.json"

c_cars_url = alt.Chart(url_cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N',
).interactive()

c_cars_url

In [26]:
url_aapl = "https://raw.githubusercontent.com/plotly/datasets/master/2014_apple_stock.csv"

c_aapl = (
    alt.Chart(url_aapl)
    .mark_line()  # draw a line
    .encode(x="AAPL_x:T", y="AAPL_y:Q")  # treat AAPL_x as temporal (T) and AAPL_y as quantitative (Q)
).interactive()

c_aapl

This is not much easier than generating a DataFrame using `pd.read_csv(url_aapl)` and then passing the DataFrame to Altair

So, why would we do it?

The benefit here is that the JSON spec for the chart can actually contain a URL which will be handled by the Vega-Lite runtime when rendering the chart

With a DataFrame, all the data is written out/hard-coded into the JSON spec before Vega-Lite sees it

In [27]:
# with a URL, altair passes the URL through to Vega-Lite
# with a DataFrame, every row is written into the JSON spec

print(len(c_cars.to_json()))  # not going to print the whole thing... too long

120649


In [28]:
print(len(c_cars_url.to_json()))

686


In [29]:
print(c_cars_url.to_json())

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/cars.json"
  },
  "encoding": {
    "color": {
      "field": "Origin",
      "type": "nominal"
    },
    "x": {
      "field": "Horsepower",
      "type": "quantitative"
    },
    "y": {
      "field": "Miles_per_Gallon",
      "type": "quantitative"
    }
  },
  "mark": "point",
  "selection": {
    "selector002": {
      "bind": "scales",
      "encodings": [
        "x",
        "y"
      ],
      "type": "interval"
    }
  }
}


In [30]:
print(c_cars.to_json())

# the output shown below is cut off because the full spec is very long

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
  "config": {
    "view": {
      "continuousHeight": 300,
      "continuousWidth": 400
    }
  },
  "data": {
    "name": "data-f02450ab61490a1363517a0190416235"
  },
  "datasets": {
    "data-f02450ab61490a1363517a0190416235": [
      {
        "Acceleration": 12.0,
        "Cylinders": 8,
        "Displacement": 307.0,
        "Horsepower": 130.0,
        "Miles_per_Gallon": 18.0,
        "Name": "chevrolet chevelle malibu",
        "Origin": "USA",
        "Weight_in_lbs": 3504,
        "Year": "1970-01-01T00:00:00"
      },
      {
        "Acceleration": 11.5,
        "Cylinders": 8,
        "Displacement": 350.0,
        "Horsepower": 165.0,
        "Miles_per_Gallon": 15.0,
        "Name": "buick skylark 320",
        "Origin": "USA",
        "Weight_in_lbs": 3693,
        "Year": "1970-01-01T00:00:00"
      },
      {
        "Acceleration": 11.0,
        "Cylinders": 8,
        "Displacement": 318.0,
      

The smaller spec size makes a Vega-Lite chart more suitable for sharing, loading into websites, or tracking with version control systems

### Other features

There are many other features we didn't cover:

- Chart Types: maps, candlesticks, compound chart types (multiple marks), heatmaps, area chart, scatter charts, etc...
- Compound Charts: multiple subplots in one figure
- Interactivity: linked brushing
- Customization: colors, labels

The best way to learn these concepts is by practice and study of the documentation

We'll provide opportunity for both on the upcoming homework

### Saving to webpage

The last thing we will show is how straightforward it is to include an Altair chart on a webpage

The `Chart` type has a `to_html` method that will generate an html document

This can be used directly as a standalone webpage, or parts can be copied and pasted into an existing page



In [31]:
print(c_aapl.to_html())

# to_html returns a complete HTML page as a string
# it can be saved to a file and opened in a browser

<!DOCTYPE html>
<html>
<head>
  <style>
    .error {
        color: red;
    }
  </style>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega@5"></script>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-lite@4.8.1"></script>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>
    (function(vegaEmbed) {
      var spec = {"config": {"view": {"continuousWidth": 400, "continuousHeight": 300}}, "data": {"url": "https://raw.githubusercontent.com/plotly/datasets/master/2014_apple_stock.csv"}, "mark": "line", "encoding": {"x": {"type": "temporal", "field": "AAPL_x"}, "y": {"type": "quantitative", "field": "AAPL_y"}}, "selection": {"selector003": {"type": "interval", "bind": "scales", "encodings": ["x", "y"]}}, "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json"};
      var embedOpt = {"mode": "vega-lite"};

      function showError(el, error){
    

In [32]:
with open("aapl_altair_chart.html", "w") as f:
    f.write(c_aapl.to_html())

    # the context manager (with) opens the file in write mode and closes it afterwards
    # the saved HTML file can then be opened in any browser
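
For the copy-and-paste route, the essential pieces are a target element, the three script tags, and a call to `vegaEmbed`. The sketch below assembles such a fragment by hand with the standard library. The spec is copied from the `to_html` output above (without the interactive selection), and the element id `aapl-chart` is a hypothetical name chosen for this example.

```python
import json

# spec copied from the to_html output above (the AAPL line chart)
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.8.1.json",
    "data": {"url": "https://raw.githubusercontent.com/plotly/datasets/master/2014_apple_stock.csv"},
    "mark": "line",
    "encoding": {
        "x": {"type": "temporal", "field": "AAPL_x"},
        "y": {"type": "quantitative", "field": "AAPL_y"},
    },
}

# fragment to paste into an existing page; `aapl-chart` is a hypothetical element id
fragment = f"""<div id="aapl-chart"></div>
<script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-lite@4.8.1"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
<script>
  vegaEmbed("#aapl-chart", {json.dumps(spec)});
</script>
"""

print(fragment)
```

Because the spec points at a URL, the fragment stays small no matter how many rows the dataset has.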