<a href="https://colab.research.google.com/github/APWright/CSC477-Fall2025/blob/main/In-Class/Intro_Altair.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Visualization with Altair

The previous lectures covered data wrangling, marks, and channels.

  - Data are the abstractions we choose to represent visually to allow users to perform some analytic, exploratory, or explanatory task.
  - Marks are the shapes or "glyphs" drawn in the visualization.
  - Channels are the perceptible visual properties of marks, like their horizontal or vertical positions, color, size, etc.

With a basic framework of data, marks, and channels, we can design fairly sophisticated visualizations.

The tool we will use in this class, Altair (the Python binding for Vega-Lite), uses this framework to provide a simple declarative grammar in which we can specify our visualizations.

In [None]:
import altair as alt
from vega_datasets import data
import polars as pl

## Dataset

We'll use the cars dataset from the vega-datasets collection. It has a good mix of data types to help showcase some of Vega-Lite's core capabilities. Importantly, it is a *tidy dataset*: one row per observation and one column per variable. Altair assumes tidy data and will only work properly with it.
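To make "tidy" concrete, here is a minimal sketch in plain Python. The car names, column names, and values are made up for illustration and are not taken from the cars dataset. It reshapes a "wide" table, where one variable is spread across several columns, into a tidy one.

```python
# Hypothetical "wide" table: one row per car, one column per model year.
wide = [
    {"name": "car_a", "mpg_1970": 18.0, "mpg_1971": 19.5},
    {"name": "car_b", "mpg_1970": 25.0, "mpg_1971": 26.0},
]

# Tidy ("long") form: one row per (car, year) observation,
# with each variable (name, year, mpg) in its own column.
tidy = [
    {"name": row["name"], "year": int(col.split("_")[1]), "mpg": value}
    for row in wide
    for col, value in row.items()
    if col.startswith("mpg_")
]

for record in tidy:
    print(record)
```

In the tidy form, a field such as `year` or `mpg` can be mapped directly to a channel. In the wide form there is no single column to map.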

In [None]:
cars = pl.DataFrame(data.cars())
cars

## Our first chart

To begin, let's make a simple scatterplot. The image below is a screenshot of our target visualization.

![example scatterplot](https://static.observableusercontent.com/files/f4a132c88d02a4d753acdf11b8a9f8ad3e320b6b9bfab64c8c7417e2a4bae210cb14c461269e0e63c9517a3abfcb00e4901210b6549b03f79950ba58fae42ae9)


First, stop and ask:

  - What mark is being drawn?
  - What data is being visualized?
  - Which visual channels are being used to communicate those data?


In [None]:
(
    alt.Chart(cars)
      .mark_circle()
      .encode(
          x='Miles_per_Gallon',
          y='Horsepower'
      )
)

In [None]:
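# Pass the data by URL and state each field's type explicitly (:Q = quantitative).
chart = alt.Chart(data.cars.url).mark_point().encode(
    x='Miles_per_Gallon:Q',
    y='Horsepower:Q'
)

# Print the Vega-Lite JSON specification that Altair generates.
print(chart.to_json(indent=2))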

**Can we make improvements over the initial chart?**

Right now it looks like there's a fair bit of overlap in the middle due to "overplotting". Lots of points are drawn on top of each other, obscuring the points underneath.

Increasing the size of the plot may help.

In [None]:
(
    alt.Chart(cars)
      .mark_circle()
      .encode(
          x='Miles_per_Gallon',
          y='Horsepower'
      )
      .properties(
          width=800,
          height=400
      )
)

The "Miles_per_Gallon" title for the x-axis is a little ugly. There's no reason why our table's column names need to appear in our charts. You can set axis options for the position encodings.

In [None]:
(
    alt.Chart(cars)
      .mark_circle()
      .encode(
          x=alt.X(
              'Miles_per_Gallon',
              axis={'title':'Miles per Gallon (mpg)'}),
          y='Horsepower'
      )
      .properties(
          width=800,
          height=400
      )
)

## Nominal fields

Our dataset contains cars from the following origins.

| Origin | Count |
| -- | -- |
| USA | 254 |
| Europe | 73 |
| Japan | 79 |

Let's incorporate that information into the chart.

**Which visual channel do you think we should use?**

In [None]:
(
    alt.Chart(cars)
      .mark_circle()
      .encode(
          x=alt.X(
              'Miles_per_Gallon',
              axis={'title':'Miles per Gallon (mpg)'}),
          y='Horsepower',
          color='Origin'
      ).properties(
          width=800,
          height=400
      )
)

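## Faceting

Mapping `Origin` to the `column` channel draws one small scatterplot per origin, side by side. The legend is turned off because the column headers already label each group.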

In [None]:
(
    alt.Chart(cars)
      .mark_circle()
      .encode(
          x=alt.X(
              'Miles_per_Gallon',
              axis={'title':'Miles per Gallon (mpg)'}),
          y='Horsepower',
          color=alt.Color('Origin',
                          legend=None),
          column='Origin'
      )
)

## Ordinal fields

Is `Cylinders` quantitative or ordinal?

In [None]:
cars

In [None]:
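# Count the cars for each Cylinders value. With no type code,
# Altair treats the integer column as quantitative.
alt.Chart(cars.group_by('Cylinders').len()).mark_bar().encode(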
    x='Cylinders',
    y='len',
)

In [None]:
# Same chart with wider bars: size sets the bar width in pixels.
alt.Chart(cars.group_by('Cylinders').len()).mark_bar(size=40).encode(
    x='Cylinders',
    y='len',
)

In [None]:
# Declare Cylinders as ordinal (:O) so each distinct value gets its own band.
alt.Chart(cars.group_by('Cylinders').len()).mark_bar().encode(
    x='Cylinders:O',
    y='len',
)

In [None]:
# Mean acceleration for each Cylinders value, computed in Polars before plotting.
alt.Chart(
    cars.group_by('Cylinders').agg(pl.col('Acceleration').mean())
).mark_bar().encode(
    x='Cylinders:O',
    y='Acceleration',
)

In [None]:
# Bin both fields and color each cell by the number of cars that fall in it.
alt.Chart(cars).mark_bar().encode(
    alt.X('Miles_per_Gallon').bin(True),
    alt.Y('Horsepower').bin(True),
    alt.Color(aggregate='count')
)

Explore more!

https://altair-viz.github.io/

https://colab.research.google.com/github/altair-viz/altair-tutorial/blob/master/notebooks/Index.ipynb

