# 10.3 The Grammar of Graphics

The grammar of graphics makes it easy to specify information-dense graphics. The key idea is to map "aesthetics" of a plot (e.g., color, size, $x$-axis, $y$-axis) to variables in the data set. Libraries based on the grammar of graphics include `ggplot2` in `R` and [Altair](https://altair-viz.github.io/) in Python. In this section, you will learn to use Altair to quickly make a complex graphic.

In [None]:
from altair import *
import pandas as pd

housing = pd.read_csv("/data301/data/AmesHousing.txt", sep="\t")

## Altair Basics

Let's make a scatterplot that shows the relationship between house price and square footage, where each point is colored according to its building type.

Every Altair command starts with `Chart(your_data_frame)`. Then, you have to specify two elements of the graphic:
- the mark (a.k.a. geometric object)
- the encoding channels (a.k.a. aesthetic mappings)

For a scatterplot, the "mark" is a circle.

In [None]:
Chart(housing).mark_circle().encode(
    x="Gr Liv Area",
    y="SalePrice",
    color="Bldg Type"
)

To display this information in "small multiples" format (i.e., a series of plots, one per category, arranged side by side or stacked), we can map a variable to the `column` or `row` aesthetic.

In [None]:
Chart(housing).mark_circle().encode(
    x="Gr Liv Area",
    y="SalePrice",
    column="Bldg Type"
)

## Customizing Plots

In the plots above, we mapped variables to aesthetics by simply specifying the column names. Although this is convenient, it does not allow for further customization of the aesthetics. To customize an aesthetic, we have to use the verbose method of specifying the aesthetic.

Each aesthetic has an associated Python class. The name of the class is usually just the name of the aesthetic, but capitalized. For example, the `x` aesthetic is associated with the `X` class, and the `color` aesthetic is associated with the `Color` class. The constructor for each class takes as arguments the name of a variable, along with any relevant customizations.

For example, suppose we want to change the $x$-axis limits to go from 0 to 4000, and we want the tick labels on the $y$-axis to print 4e+5 instead of 400,000. Here's how to do this in Altair.

In [None]:
Chart(housing).mark_circle().encode(
    x=X("Gr Liv Area", scale=Scale(domain=(0, 4000))),
    y=Y("SalePrice", axis=Axis(format="e")),
    column="Bldg Type"
)

Notice that the axis format is specified using a [D3 format string](https://github.com/d3/d3-format/blob/master/README.md#formatPrefix); here, `"e"` requests exponent notation.

# Exercises

**Exercise 1.** Use Altair to make a graphic that shows the relationship between house price and square footage, using color to represent the lot area and using row and column facets to represent the building type and roof style. How does Altair handle color for a quantitative variable, like lot area?

In [None]:
# ENTER YOUR CODE HERE.

**Exercise 2.** Use Altair to make a graphic that communicates the information in the Tips data set (`/data301/data/tips.csv`).

In [None]:
# ENTER YOUR CODE HERE.