# Information Visualization I 
## School of Information, University of Michigan

## Altair Review and Tutorial

In this tutorial, we're going to walk you through the construction of visualizations using Altair--a tool to encode data using a Grammar of Graphics implementation (one called Vega-Lite). [Altair](https://altair-viz.github.io/) is a Python API to [Vega-Lite](https://vega.github.io/vega-lite/). When we code in Altair we are ultimately producing a data structure that tells Vega-Lite what to draw. Whatever is doable in Vega-Lite should be doable in Altair so you can always look at examples in one and figure out how to generate them in the other ([Vega-Lite Examples](https://vega.github.io/vega-lite/examples/) and [Altair Examples](https://altair-viz.github.io/gallery/index.html)). 

As a side note, Vega-Lite is a lighter variant of [Vega](https://vega.github.io/). Altair and Vega-Lite are equally capable. Vega, on the other hand, is much more full-featured but there is no API in Python at the moment. 

Internally, we are describing the visualization in a JSON object--a kind of data structure that has lists and dictionaries. This will allow the rendering system (the thing that draws on the screen) to interpret your Grammar of Graphics specification. You can always ask Altair to produce the Vega-Lite JSON object (and you can experiment directly with this JSON object in the [Vega/Vega-Lite Online editor](https://vega.github.io/editor/#/custom/vega)).

In addition to Python, there are other language layers for Vega-Lite, the most popular of which is the [JavaScript library](https://github.com/vega/vega-lite-api). It's very similar to the Python version, so translating between the two is fairly straightforward. We prefer Altair over the JavaScript implementation largely because of the data analysis ecosystem we have in Python: tools like Pandas make data manipulation, cleaning, and analysis much easier.

We're going to try to cover some highlights of using Altair here, but very much encourage you to make use of online resources. Specifically:

* A whole [online course](https://github.com/uwdata/visualization-curriculum) by the creators of Vega-Lite. In fact, they show you how to use both Altair and the JavaScript API.
* The [Altair tutorial](https://github.com/altair-viz/altair-tutorial) from the creator of Altair, Jake VanderPlas.

Both will be useful resources if you get stuck or need more examples.

This particular tutorial will not cover interactivity or really sophisticated filtering/transformation. For the most part, we will either use data that's already in a form we want or get it into shape using Pandas.

We will be using two datasets here: the standard "cars" dataset (which has miles per gallon, horsepower, weight, transmission type, etc.) and an ["indicators" dataset](http://www.bigcitieshealth.org/) that has various pieces of data for many cities in the US over time, covering everything from demographics to the number of food poisonings. We've cleaned up the data a little bit for this exercise, but you can grab the full version from the link above.

In this tutorial we will often 'slice-and-dice' data so it's suitable for the examples we want to show. We will do this using [Pandas](https://pandas.pydata.org/pandas-docs/stable/index.html), a data manipulation library (which you should be familiar with by now). Each block of code contains comments that explain the Pandas operations being used.

There are a few places where we ask you to try to write some code. There's a button that will pop up the answers. We have noticed that it doesn't work with some security configurations. If the button doesn't seem to work you can also find a file with all the answers [here](assets/altair/altair_tutorial_answers.txt).

In [1]:
# load up the libraries we need
import pandas as pd
import altair as alt

# load up a helper to process the data and for tutorial guidance
exec(open("tutorial_helper.py").read())

In [2]:
# If you are running this in a jupyter notebook, run the following line
alt.renderers.enable('default')

# see https://altair-viz.github.io/user_guide/troubleshooting.html if 
# you start seeing error messages in showing charts. You need to install
# the right version for your notebook environment.

RendererRegistry.enable('default')

In [3]:
# uses intermediate json files to speed things up
alt.data_transformers.enable('json')

DataTransformerRegistry.enable('json')

## How we build a visualization

Before we start building a chart, it will be helpful to understand what the Altair library does. As you write code in Altair, the system translates your code to the JSON format that Vega-Lite understands (and can render).

For example, let's say we want to use color to encode the transmission type of the car (automatic or manual, stored in the data as 0 or 1). To do so, we use the `Color` class of the Altair library.

Run the code below to see what happens:

In [4]:
alt.Color('transmission',type='nominal')

Color({
  shorthand: 'transmission',
  type: 'nominal'
})

What you are seeing is a representation of the JSON corresponding to a "snippet" of Grammar of Graphics (Vega-Lite) code. Remember, we need to describe: <font color='#55AAE7'>data</font>, <font color="#7F7C1C">encoding/channels</font>, and <font color="#AC491A">marks</font>. Here, we're describing the <font color='#55AAE7'>data</font> ("transmission" which is a "nominal" variable) and the <font color="#7F7C1C">encoding/channel</font> (color). In English: "encode the <font color='#55AAE7'>nominal transmission</font> variable using <font color="#7F7C1C">color</font>."

A couple of things to note:

First, although Vega-Lite and Altair will try to infer the type of the data (is transmission ordinal? nominal? time?), we can override the decision. In this case, our data file uses 0 and 1 to indicate automatic or manual. So we want to make sure that Altair understands that it's a nominal (category) variable and not ordinal or quantitative.

This is important: Altair/Vega-Lite will use defaults or its best guess unless you override it. Most often, you will be happy with those defaults and best guesses but there will be times when you want to force the system to do something else (use different colors, transform the data in a certain way, use different kinds of axes, etc.).

Second, notice we haven't actually loaded any data yet. Altair won't give you error messages until you try to render the chart.  Here's an example that is encoding the MPG (miles per gallon) using the X coordinate.

In [5]:
alt.X("MPG")

X({
  shorthand: 'MPG'
})

Ok, let's try to build an actual visualization. First, let's load some data:

In [6]:
mtcars = pd.read_csv("assets/altair/mtcars.csv")

# and let's see what's inside
mtcars.sample(5)

Unnamed: 0,model,MPG,cylinders,displacement,HP,rear_axle_ratio,weight,qsec,vs,transmission,gears,carb
28,Ford Pantera L,15.8,8,351.0,264,4.22,3.17,14.5,0,1,5,4
10,Merc 280C,17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
8,Merc 230,22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
31,Volvo 142E,21.4,4,121.0,109,4.11,2.78,18.6,1,1,4,2
3,Hornet 4 Drive,21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1


Remember, to get a reasonable chart we need to describe three things: the <font color='#55AAE7'>data</font>, the <font color="#7F7C1C">encoding/channels</font>, and the <font color="#AC491A">marks</font>.

To pick the kind of visualization we want, we need to make the right choice on the "mark." Do we want points (for a scatter plot)? lines (for a time series)? or bars (for a bar chart or heatmap)? See your choices [here](https://altair-viz.github.io/user_guide/marks.html).

Let's start easy. Let's make a scatterplot. We will tell Altair that we want a new Chart and that we will be using the mtcars dataframe. We will also, at minimum, need to pick the kind of mark. Scatterplots use points (a dot for every car in this case), so here's the code:

In [7]:
alt.Chart(mtcars).mark_point()

Not very interesting... what we see is something like this:
    
![](simplestscatter.png)

(If you try to click on the little icon you may not even be able to see anything because the cell will be cut off)

The chart is exactly what we asked for. There is one dot for every car in the dataset. Unfortunately, because we didn't say how to encode the data (or even what actual pieces of the data we wanted), Altair went with some defaults: all points were placed at coordinate 0,0 and rendered with the default style (a blue outlined circle).

If you're curious you can take a look at what JSON Altair generated for this call:

In [8]:
RenderJSON(alt.Chart(mtcars).mark_point().to_json())

Ok, let's go with something a bit more interesting. We'll build a scatterplot comparing miles per gallon to car weight, but let's just start with one variable, MPG, that we'll put on the x-axis.  Abstractly:

* <font color="#AC491A">Mark:    Point</font>
* <font color='#55AAE7'>Data:</font>
    * <font color='#55AAE7'>MPG:		quantitative</font>
* <font color="#7F7C1C">Encoding:</font>
    * <font color="#7F7C1C">MPG:		x position</font>

Or in English: "Draw a 1-D scatterplot with a point for every car where the miles per gallon of the car determine the placement of the point."

Or in code:

In [9]:
alt.Chart(mtcars).mark_point().encode(
    alt.X('MPG',type='quantitative')
)

To break this apart: We grabbed the Altair library (`alt`) and told it to create a chart using the mtcars data frame (`.Chart(mtcars)`). We then said, for this chart, use points as the mark style (`.mark_point`). We want to encode the points using the X location (`.encode(alt.X(`) as determined by the miles per gallon (`'MPG'`) variable which is a quantitative variable (`,type='quantitative'))`).

A side note: strictly speaking, we could have used `mark_circle` instead of `mark_point` if all we ever wanted were circle shapes. We're going to switch the shape later so we'll just keep using `mark_point`.

Notice that we get a nice visualization with an axis and axis label, and we didn't need to do much work to get it. In fact, we could even do less work. There's a shortcut to specify the variable type (the `:Q` at the end of the variable name):

In [10]:
alt.Chart(mtcars).mark_point().encode(
    alt.X('MPG:Q')
)

In fact, if we're happy with the default inference Altair makes about the data type (it's going to guess MPG is quantitative), we could do:

```Python
alt.Chart(mtcars).mark_point().encode(
    alt.X('MPG')
)
```

Or even shorter (without the alt.X... stuff): 

```Python
alt.Chart(mtcars).mark_point().encode(
    x='MPG'
)
```

Or some combination:

```Python
alt.Chart(mtcars).mark_point().encode(
    x='MPG:Q'
)
```

All these versions will produce the same visualization. The longest form has extra arguments that we will later use to have finer control of the look and feel or the transformations on the data. But you may see all these variants out in the wild.

Last step: let's make this a true scatter plot with 2 dimensions. We will use the weight for the Y-axis.

In [11]:
alt.Chart(mtcars).mark_point().encode(
    x='MPG',
    y='weight'
)

## Variants

Let's try some basic variants on the plot. 

Maybe we want to compare different transmission types. For this, we can use the color encoding for the transmission data: 

In [12]:
alt.Chart(mtcars).mark_point(filled=True).encode(
    x='MPG',
    y="weight",
    color="transmission"
)

This, unfortunately, isn't quite right. Transmission in our dataset is 0 or 1. Altair understood this to mean that the data was quantitative and encoded color on the default blue gradient. We want 'transmission' to be a categorical or nominal variable so we can use:

In [13]:
alt.Chart(mtcars).mark_point().encode(
    x='MPG',
    y="weight",
    color="transmission:N"
)

This would be a bit easier to read if the dots were filled so we could better see the colors. Luckily we can easily override the "global" properties of points to make them filled (`.mark_point(filled=True)`). You can read more about this and see more options on the [Altair website](https://altair-viz.github.io/user_guide/customization.html).

In [14]:
alt.Chart(mtcars).mark_point(filled=True).encode(
    x='MPG',
    y="weight",
    color="transmission:N"
)

Let's say we're working with a black and white printer so color isn't the best encoding. Modify the chart definition to use shape instead of color (you might want to sneak a peek at the different encoding channels you have access to on the [Altair site](https://altair-viz.github.io/user_guide/encoding.html)). Try to make the default size a bit larger so the points are easier to see.

If you run the next cell you *should* get a button that will show you the answer, but we have noticed that it doesn't work with some security configurations so you can also find a file with all the answers [here](assets/altair/altair_tutorial_answers.txt).

In [15]:
answerButton('assets/altair/5aa41aeb8c53470fad4abdca0b4dfcc6')


In [16]:
alt.Chart(mtcars).mark_point(filled=True).encode(
    x='MPG',
    y="weight",
    shape="transmission:N"
)

Try to make it the "best of both worlds." Encode transmission using both color and shape.

In [17]:
answerButton('assets/altair/2b4770a5c0a145878848f35704eae131')


In [18]:
alt.Chart(mtcars).mark_point(filled=True).encode(
    x='MPG',
    y="weight",
    shape="transmission:N",
    color = "transmission:N"
)

## Bars and Binning

Let's try to build some bar charts. These will require switching to a different kind of mark (`mark_bar` in this case).  Let's plot the MPG for each car:

In [19]:
alt.Chart(mtcars).mark_bar().encode(
    x='model',
    y='MPG'
)

It's a bit hard to read the text. See if you can rotate the plot:

In [20]:
alt.Chart(mtcars).mark_bar().encode(
    y='model',
    x='MPG'
)

Often bar charts are used to "bin" data. For example, let's say we want to plot a bar chart of the number of cars in our dataset that are either manual or automatic. To do this, we would need to bin based on transmission type. Here's one way to do that with Pandas:

In [21]:
binned = mtcars.groupby("transmission").count().reset_index()[['transmission','model']]
binned = binned.rename(columns={'model':'count'})
binned

Unnamed: 0,transmission,count
0,0,19
1,1,13


From this, we can build our bar chart:

In [22]:
alt.Chart(binned).mark_bar().encode(
    x='transmission:N',
    y='count'
)
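
The Pandas counting step above can also be written a little more directly with `groupby(...).size()`. This is only a sketch on five rows taken from the mtcars sample shown earlier, so the counts differ from the full dataset; `binned_alt` is an illustrative name.

```python
import pandas as pd

# Five rows from the mtcars sample shown earlier
cars = pd.DataFrame({
    "model": ["Ford Pantera L", "Merc 280C", "Merc 230", "Volvo 142E", "Hornet 4 Drive"],
    "transmission": [1, 0, 0, 1, 0],
})

# size() counts the rows in each group; reset_index(name=...) turns the
# result back into a DataFrame with a named count column
binned_alt = cars.groupby("transmission").size().reset_index(name="count")
print(binned_alt)
```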

Because this kind of operation is quite common, the Grammar of Graphics (generally), and Altair (specifically), provide an easier way to do this. 

We can tell Altair that we would like to bin our data (the default behavior for a nominal variable is to bin based on the categorical value... in this case 0 or 1).

`x=alt.X('transmission:N',bin=True)`

This will group all cars into two buckets: 1 and 0. At that point, we have to tell Altair what we want to know about each of those buckets. In this case we just want to know how many cars of each type we have. So we can use the special operation `count()`:

In [23]:
alt.Chart(mtcars).mark_bar().encode(
    x=alt.X('transmission:N',bin=True),
    y=alt.Y('count()'),
)

Note that we can ask lots of other things about data in each bin. For example, we can ask for the average MPG (replace `count()` with `mean(MPG)`).

Or we could ask for the max horsepower (`max(HP)`) or the median cylinders or any number of other aggregations. (see more on the [Altair site](https://altair-viz.github.io/user_guide/transform.html)). Try some of your own here:

In [24]:
alt.Chart(mtcars).mark_bar().encode(
    x=alt.X('transmission:N',bin=True),
    y=alt.Y('max(HP)')
)
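
If you want to sanity-check what an aggregate like `mean(MPG)` or `max(HP)` should produce, you can compute the same quantity in Pandas. This is a sketch on five rows copied from the mtcars sample above (so the numbers will not match the full dataset); `summary`, `mean_MPG`, and `max_HP` are illustrative names.

```python
import pandas as pd

# Five rows from the mtcars sample shown earlier
cars = pd.DataFrame({
    "model": ["Ford Pantera L", "Merc 280C", "Merc 230", "Volvo 142E", "Hornet 4 Drive"],
    "MPG": [15.8, 17.8, 22.8, 21.4, 21.4],
    "HP": [264, 123, 95, 109, 110],
    "transmission": [1, 0, 0, 1, 0],
})

# One row per transmission type, one column per aggregate
summary = cars.groupby("transmission").agg(
    mean_MPG=("MPG", "mean"),
    max_HP=("HP", "max"),
)
print(summary)
```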

We can bin data based on numerical properties as well. For example, we can generate a histogram of weights for our cars:

In [25]:
alt.Chart(mtcars).mark_bar().encode(
    x=alt.X('weight',bin=True),
    y=alt.Y('count()')
)

As with most things in Altair, bins can be modified by passing an extra object that generates additional grammar of graphics commands. Bins, in particular, can be modified with the [BinParams](https://altair-viz.github.io/user_guide/generated/core/altair.BinParams.html#altair.BinParams) class. A simple example is making more bins:

In [26]:
alt.Chart(mtcars).mark_bar().encode(
    x=alt.X('weight',bin=alt.BinParams(maxbins=20)),
    y=alt.Y('count()')
)

`mark_bar` based plots can also have various encoding channels. Try to create a plot with two histograms on weight, one for each transmission type:

In [30]:
answerButton('assets/altair/83f8b167a0f346eebcd1a3a35034a228')


In [32]:
# your answer here
alt.Chart(mtcars).mark_bar().encode(
    x=alt.X('weight',bin=alt.BinParams(maxbins=20)),
    y=alt.Y('count()'),
    color=alt.Color('transmission:N')
)

## Sorting
In addition to binning, Altair also supports other transformations. Sorting is one of the most useful ones and is fairly easy. Below, we take our original bar chart showing the MPG for each car and add a sort on the Y-axis (the cars). We use the `EncodingSortField` class, indicating that we want the models sorted by MPG (in descending order). More information on sorting can be found [here](https://altair-viz.github.io/user_guide/generated/core/altair.EncodingSortField.html).

In [33]:
alt.Chart(mtcars).mark_bar().encode(
    x='MPG',
    y=alt.Y(
        'model',
        sort=alt.EncodingSortField(
            field="MPG",  # The field to use for the sort
            order="descending"  # The order to sort in
        )
    )
)

## Other marks
Before we leave our cars dataset, we illustrate one other mark type, `mark_rect`, which can be used for heatmap style visualizations. This structure is useful when we want to bin in two different dimensions. For example, we might want to know how many cars have certain combinations of MPG and HP. This would be:

In [34]:
alt.Chart(mtcars).mark_rect().encode(
    x=alt.X('MPG', bin=True),
    y=alt.Y('HP', bin=True),
    color='count()'
)

## Additional Worked Examples
We're going to work through a more sophisticated dataset to illustrate a few other key features. To do so, we'll make use of an indicator dataset that tracks various properties of cities over time (demographic, health, financial, etc.).

In [35]:
# load and clean the dataset a bit
indic = pd.read_csv('assets/altair/indicator_tutorial.csv')
indic.sample(5)

Unnamed: 0,Indicator,Year,Value,Place
4410,Gonorrhea_Rate,2016,201.9,"Phoenix, AZ"
1406,Heart_Disease_Mortality_Rate,2011,301.4,"Detroit, MI"
5608,High_School_Students_Who_Are_Obese,2014,0.2,"Seattle, WA"
4602,High_Housing_to_Income,2016,20.7,"Portland, OR"
259,Adults_Who_Currently_Smoke,2013,18.4,"Boston, MA"


For our next analysis, we want to compare the percentage of adults who are obese to the heart disease mortality rate. We would like to see if there is a relationship between the two, and also whether some states are more extreme than others. To make our analysis easier, we're just going to grab the data for 2014.

In [36]:
# we're going to use a helper function that will transform the 'long' form 
# to a wide form so that the two indicators we care about sit in the same
# row based on matching the year and place
health1 = mergeIndic(indic,["Heart_Disease_Mortality_Rate","Adults_Who_Are_Obese"])

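# grab the data for 2014 (.copy() gives us an independent DataFrame, so adding
# a column below does not trigger a Pandas SettingWithCopyWarning)
health1 = health1[health1.Year == 2014].copy()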

# finally, we're going to add a state column (Place is city and state)
health1['State'] = health1['Place'].str[-2:]

health1.sample(5)

Unnamed: 0,Place,Year,Adults_Who_Are_Obese,Heart_Disease_Mortality_Rate,State
56,"New York City, NY",2014,24.7,178.0,NY
16,"Columbus, OH",2014,35.3,207.5,OH
61,"Oakland, CA",2014,19.8,133.7,CA
68,"Philadelphia, PA",2014,33.0,211.0,PA
80,"San Antonio, TX",2014,32.1,374.7,TX
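
`mergeIndic` comes from the tutorial helper file and its implementation isn't shown here. To give a rough idea of what a long-to-wide reshape looks like in plain Pandas, here is a sketch using `pivot_table`, followed by the same state extraction as above. This is an assumption about the general approach, not the helper's actual code; the four values are copied from the Columbus and Oakland rows in the output above.

```python
import pandas as pd

# Long form: one row per (place, year, indicator); values from the output above
long_df = pd.DataFrame({
    "Indicator": ["Adults_Who_Are_Obese", "Heart_Disease_Mortality_Rate",
                  "Adults_Who_Are_Obese", "Heart_Disease_Mortality_Rate"],
    "Year": [2014, 2014, 2014, 2014],
    "Value": [35.3, 207.5, 19.8, 133.7],
    "Place": ["Columbus, OH", "Columbus, OH", "Oakland, CA", "Oakland, CA"],
})

# Wide form: one row per (place, year), one column per indicator
wide = long_df.pivot_table(
    index=["Place", "Year"], columns="Indicator", values="Value"
).reset_index()
wide.columns.name = None  # drop the leftover "Indicator" label on the columns

# Place ends in a two-letter state code, so the last two characters are the state
wide["State"] = wide["Place"].str[-2:]
print(wide)
```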


Let's create a scatterplot for our data:

In [37]:
alt.Chart(health1).mark_point(filled=True,size=90).encode(
    x=alt.X('Heart_Disease_Mortality_Rate'),
    y=alt.Y('Adults_Who_Are_Obese'),
    color=alt.Color('State')
)

Notice that because there are so many states with very similar colors, it's hard to find the specific points we want. If we used a "text" mark instead of a point, this might be easier. Here's an example of this encoding:

In [38]:
alt.Chart(health1).mark_text().encode(
    x=alt.X('Heart_Disease_Mortality_Rate'),
    y=alt.Y('Adults_Who_Are_Obese'),
    color=alt.Color('State'),
    text='State'
)

A particular power of the grammar of graphics framing is that we can combine two marks to get the best of both worlds. Let's make a scatter plot *with* the text label next to it.

To do so we will create two charts and then use the `+` operator to tell Altair to combine them. This is known as operator overloading: where `+` usually means addition, Altair redefines it so that `+` means 'layer these charts on top of each other':

In [41]:
# first, we will make our scatter plot and 
# instead of plotting it will keep it in a variable 
# called "points"
points = alt.Chart(health1).mark_point(filled=True,size=90).encode(
    x=alt.X('Heart_Disease_Mortality_Rate'),
    y=alt.Y('Adults_Who_Are_Obese'),
    color=alt.Color('State')
)

# next we will create our text marks. We need to 
# move these a smidge so they don't sit at exactly
# the same place. Put this chart in "text"
text = alt.Chart(health1).mark_text(
    align='left',  # left-align the label and shift it
    dx=7           # 7 pixels to the right of the point
).encode(
    x=alt.X('Heart_Disease_Mortality_Rate'),
    y=alt.Y('Adults_Who_Are_Obese'),
    color=alt.Color('State'),
    text='State'
)

# combine the two and generate the plot
points+text

This is a bit more verbose than necessary. We don't need to create two visualizations completely from scratch because `text` repeats much of what we already set up in `points`. Here's a more concise form. We will take the points chart and overwrite features of it (i.e., the mark type) and add a few features to it (e.g., `text='State'`). The rest (the X, Y, and color encoding) will stay the same.

In [42]:
points = alt.Chart(health1).mark_point(filled=True,size=90).encode(
    x=alt.X('Heart_Disease_Mortality_Rate'),
    y=alt.Y('Adults_Who_Are_Obese'),
    color=alt.Color('State')
)

text = points.mark_text(
    align='left',
    dx=7
).encode(
    text='State'
)

points + text

As another layering example, consider the basic time series. We want to see how E-coli infections have changed over time. We start by getting the data:

In [43]:
ecoli = getIndic(indic,indicator="E-Coli")  # grab all the e-coli data
ecoli = ecoli.dropna() # drop missing data
ecoli.sample(5)

Unnamed: 0,Indicator,Year,Value,Place
1508,E-Coli,2013,0.0,"Detroit, MI"
1304,E-Coli,2014,2.7,"Denver, CO"
5073,E-Coli,2014,1.1,"San Diego County, CA"
1835,E-Coli,2010,0.3,"Houston, TX"
3231,E-Coli,2014,0.7,"Miami, FL"


We then create a basic time series using `mark_line`. The x-axis will be the year and the Y will be the mean infection rate for that year:

In [44]:
alt.Chart(ecoli).mark_line().encode(
    alt.X('Year:O'),
    alt.Y('average(Value)')
).properties(height=200)

This is ok, but it's a bit hard to read because we often want points at key places in the time series so we can find the values on the axes more easily (imagine a time series with a dot at every bend). We can do this by layering:

In [45]:
line = alt.Chart(ecoli).mark_line().encode(
    alt.X('Year:O'),
    alt.Y('average(Value)')
).properties(height=200)

dots = line.mark_point(filled=True)

line+dots
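
The `average(Value)` aggregate in these charts is a per-year mean across cities. If you want to check the numbers being plotted, the equivalent Pandas computation is a `groupby` on the year. The sketch below uses only the five rows from the `ecoli` sample above, so its result will differ from the full data; `yearly_mean` is an illustrative name.

```python
import pandas as pd

# The five rows from the ecoli sample shown above
ecoli_sample = pd.DataFrame({
    "Year": [2013, 2014, 2014, 2010, 2014],
    "Value": [0.0, 2.7, 1.1, 0.3, 0.7],
    "Place": ["Detroit, MI", "Denver, CO", "San Diego County, CA",
              "Houston, TX", "Miami, FL"],
})

# Mean value per year, which is what average(Value) plots against Year
yearly_mean = ecoli_sample.groupby("Year")["Value"].mean()
print(yearly_mean)
```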

## Long versus Wide Data
One issue with Altair is that certain visualizations are easier or harder, depending on how the data is organized. For example, to get a bar chart that overlays two measures when our data is organized in a wide format, we have to create two different visualizations and layer them:

In [46]:
# here's the data
health3 = mergeIndic(indic,["Adults_Who_Are_Obese","Heart_Disease_Mortality_Rate"])
health3 = health3[health3.Year == 2014]
health3.sample(5)

Unnamed: 0,Place,Year,Heart_Disease_Mortality_Rate,Adults_Who_Are_Obese
26,"Denver, CO",2014,154.6,17.3
99,"Philadelphia, PA",2014,211.0,33.0
55,"Kansas City, MO",2014,181.0,34.7
32,"Detroit, MI",2014,307.1,32.2
92,"Oakland, CA",2014,133.7,19.8


In [47]:
# make one bar chart for heart disease
c1 = alt.Chart(health3).mark_bar(opacity=0.7).encode(
    x = alt.X('Place'),
    y = alt.Y('Heart_Disease_Mortality_Rate')
)

# and another for obesity
c2 = alt.Chart(health3).mark_bar(opacity=0.7).encode(
    x = alt.X('Place'),
    y = alt.Y('Adults_Who_Are_Obese')
)

# and put them together
c1 + c2

The problem is that with more classes of data this layering becomes tedious. We also don't get quite the right colors because the two charts don't know what the other has picked. Thus, depending on the task we may need to switch from the "wide" format to the "long" format (see [more here](https://altair-viz.github.io/user_guide/data.html#long-form-vs-wide-form-data)). 

To get the data back into the long form, we'll use the Pandas operation [melt](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html).

In [48]:
health4 = health3.melt(id_vars=['Place','Year'])
health4.head(5)

Unnamed: 0,Place,Year,variable,value
0,"Columbus, OH",2014,Heart_Disease_Mortality_Rate,207.5
1,"Denver, CO",2014,Heart_Disease_Mortality_Rate,154.6
2,"Detroit, MI",2014,Heart_Disease_Mortality_Rate,307.1
3,"Fort Worth, TX",2014,Heart_Disease_Mortality_Rate,158.6
4,"Indianapolis, IN",2014,Heart_Disease_Mortality_Rate,169.3
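
Here is a self-contained sketch of the same `melt` call on two rows copied from the `health3` output above. It also shows the optional `var_name` and `value_name` arguments, which let you pick column names other than the default `variable` and `value`; the names `wide` and `long_df` are illustrative.

```python
import pandas as pd

# Two rows copied from the health3 output above (wide form)
wide = pd.DataFrame({
    "Place": ["Denver, CO", "Detroit, MI"],
    "Year": [2014, 2014],
    "Heart_Disease_Mortality_Rate": [154.6, 307.1],
    "Adults_Who_Are_Obese": [17.3, 32.2],
})

# id_vars stay as columns; every other column becomes (name, value) rows
long_df = wide.melt(
    id_vars=["Place", "Year"],
    var_name="Indicator",
    value_name="Value",
)
print(long_df)
```

Each wide row turns into one long row per measure, so two cities with two measures give four rows.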


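Here's the overlaid bar chart from above in one short command (`stack=None` tells Altair to draw the bars on top of each other instead of stacking them):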

In [49]:
alt.Chart(health4).mark_bar(opacity=0.7).encode(
    x='Place',
    y=alt.Y('value',stack=None),
    color='variable'
)

We can also build a bar chart for every city using the `column` argument that will "facet" the data.

In [50]:
alt.Chart(health4).mark_bar().encode(
    x='variable',
    y='value',
    color='variable',
    column='Place'
)

In the case above, all the bars are actually part of the same bar chart. Altair has the ability to facet explicitly based on some aspect of the data so that each facet is in its own chart. This may give us finer control over the look:

In [51]:
alt.Chart(health4).mark_bar().encode(
    x='variable',
    y='value',
    color='variable'
).properties(
    width=80,
    height=180
).facet(
    column='Place'
)

Another cool aspect of Altair is that it's easy to put visualizations together. The "|" operation will put two charts side by side, whereas the "&" will put one on top of the other. You can combine these in various ways.

In [52]:
houston = getIndic(indic,indicator="Total_Population",place="Houston, TX")
houston = houston.dropna()

dallas = getIndic(indic,indicator="Total_Population",place="Dallas, TX")
dallas = dallas.dropna()

houston_chart = alt.Chart(houston).mark_bar().encode(
    x="Year:O",
    y="Value:Q"
).properties(
    title='Houston',
    height=200
)

dallas_chart = alt.Chart(dallas).mark_bar().encode(
    x="Year:O",
    y="Value:Q"
).properties(
    title='Dallas',
    height=200
)

In [53]:
houston_chart | dallas_chart

In [54]:
houston_chart & dallas_chart

Here's another example where we show a scatter plot and place the two distributions (for the x and y) along the top and right.

In [55]:
health5 = mergeIndic(indic,["Heart_Disease_Mortality_Rate","Adults_Who_Are_Obese"])

points = alt.Chart(health5).mark_point(filled=True,size=90).encode(
    x=alt.X('Heart_Disease_Mortality_Rate'),
    y=alt.Y('Adults_Who_Are_Obese')
)

distribright= alt.Chart(health5).mark_bar().encode(
    y = alt.Y('Adults_Who_Are_Obese',bin=alt.BinParams(maxbins=20)),
    x = alt.X('count()'),
).properties(width=30)

distribtop = alt.Chart(health5).mark_bar().encode(
    x = alt.X('Heart_Disease_Mortality_Rate',bin=alt.BinParams(maxbins=20)),
    y = alt.Y('count()'),
).properties(height=30)

distribtop & (points | distribright)


Notice that the visualization doesn't look quite right. There are axis labels and underscore characters and other features that make this somewhat visually unappealing. Altair lets you fix many of these things up (we'll talk more about this below), but here's a simple example where we modify the axes and labels.

In [56]:
points = alt.Chart(health5).mark_point(filled=True,size=90).encode(
    x=alt.X('Heart_Disease_Mortality_Rate',axis=alt.Axis(title='Heart Disease Mortality Rate')),
    y=alt.Y('Adults_Who_Are_Obese',axis=alt.Axis(title='Percent Obese'))
)

# remove the axes
distribright= alt.Chart(health5).mark_bar().encode(
    y = alt.Y('Adults_Who_Are_Obese',bin=alt.BinParams(maxbins=20),axis=None),
    x = alt.X('count()',axis=None),
).properties(width=30)

distribtop = alt.Chart(health5).mark_bar().encode(
    x = alt.X('Heart_Disease_Mortality_Rate',bin=alt.BinParams(maxbins=20),axis=None),
    y = alt.Y('count()',axis=None),
).properties(height=30)

distribtop & (points | distribright)

## Styling

We end with a note that while many of the defaults of Altair are reasonable, there are situations where we need to override them for either aesthetic or more practical reasons (see https://altair-viz.github.io/user_guide/configuration.html). Take a look at the Altair documentation. Chances are you can modify the chart to look the way you want. Here's an example where we override the axes and titles, add some sorting, etc.

In [57]:
# get the population data for 2014
population = getIndic(indic,indicator="Total_Population",year=2014)
population = population.dropna()
population.sample(5)

Unnamed: 0,Indicator,Year,Value,Place
1547,Total_Population,2014,680281.0,"Detroit, MI"
4382,Total_Population,2014,1537045.0,"Phoenix, AZ"
4536,Total_Population,2014,776712.0,"Portland, OR"
760,Total_Population,2014,389524.0,"Cleveland, OH"
2338,Total_Population,2014,470816.0,"Kansas City, MO"


In [58]:
alt.Chart(population).mark_point(color='firebrick',size=90).encode(
    x=alt.X('Value',scale=alt.Scale(type='log'),axis=alt.Axis(title='Log-Scaled Values')),
    y=alt.Y(
        'Place',
        axis=alt.Axis(title='City Name'),
        sort=alt.EncodingSortField(
            field="Value",  # The field to use for the sort
            order="descending"  # The order to sort in
        )
    )
)


In [59]:
# here's an example where we take the chart from the previous example and 
# modify it by globally changing the background color, making the grid white,
# and setting the axis label and title fonts
(distribtop & (points | distribright)).configure(
    background='#DCDCDC',
).configure_axis(
    labelFontSize=10,
    labelFont='Courier',
    titleFontSize=20,
    titleFont='Helvetica',
    gridColor='white'
)

Altair and Vega-Lite have a number of built-in 'themes' that you can enable. These will set global fonts, colors and other features to look like your favorite newspaper or application.  Take a look at how to enable themes (or configure your own) here: https://altair-viz.github.io/user_guide/configuration.html#altair-themes and here: https://github.com/vega/vega-themes/

In [60]:
alt.themes.enable('dark')
(distribtop & (points | distribright))

In [61]:
# to go back to the regular theme, use 'default'
alt.themes.enable('default')
(distribtop & (points | distribright))