# Altair Practice
For the CDCS Python Library Series, Course 3, Week 2, December 7-11 2020, with Lucy Havens
***

Before we get started, we'll import the libraries we'll need for visualizing data with Altair:

In [None]:
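# Work around SSL certificate errors when downloading the sample datasets
# (note: this turns off HTTPS certificate verification, so only use it if you need to)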
import os, ssl
if (not os.environ.get('PYTHONHTTPSVERIFY', '') and
    getattr(ssl, '_create_unverified_context', None)):
    ssl._create_default_https_context = ssl._create_unverified_context

import pandas as pd  # Altair works with data stored in a pandas DataFrame (a type of table)
import altair as alt
from vega_datasets import data as vega_data  # the vega_datasets package provides sample datasets you can practice with

## PART II
Let's explore how Altair's customization capabilities help us to follow principles of good data visualization design!

We'll reference a couple of sections of the Altair documentation:
* [Customizing Visualizations](https://altair-viz.github.io/user_guide/customization.html)
* [Top-Level Chart Configuration](https://altair-viz.github.io/user_guide/configuration.html)

To play with a different type of visualization in Altair, we'll use a **bar chart** from an Altair case study, [Exploring Seattle Weather](https://altair-viz.github.io/case_studies/exploring-weather.html).

In [2]:
sw_df = vega_data.seattle_weather()
sw_df.head()

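|   | date | precipitation | temp_max | temp_min | wind | weather |
|---|------|---------------|----------|----------|------|---------|
| 0 | 2012-01-01 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle |
| 1 | 2012-01-02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
| 2 | 2012-01-03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
| 3 | 2012-01-04 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
| 4 | 2012-01-05 | 1.3 | 8.9 | 2.8 | 6.1 | rain |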


In [10]:
print(sw_df.shape)  # (rows, columns)
print(sw_df['date'].min())  # earliest date in the dataset
print(sw_df['date'].max())  # latest date in the dataset

(1461, 6)
2012-01-01 00:00:00
2015-12-31 00:00:00


To make a bar chart, we specify our chart's marks to be of type bar using `.mark_bar()`:

In [12]:
viz = alt.Chart(sw_df).mark_bar().encode(
    alt.X('precipitation', bin=True, title="Precipitation (binned)"),
    y = 'count()'
).properties(title="Precipitation in Seattle from 2012-15")

viz.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)

Can you guess what `bin` does?

Let's see what happens if we change the value of `bin` to `False`...

In [8]:
viz = alt.Chart(sw_df).mark_bar().encode(
    alt.X('precipitation', bin=False),
    y = 'count()'
).properties(title="Precipitation in Seattle")

viz.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)
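If you'd like to check your guess: binning groups nearby values into intervals ("bins"), and `count()` then counts the rows that fall into each bin. The pandas sketch below mimics this on the five precipitation values printed in the table above. It is only an approximation: the bin edges here are picked by hand, whereas `bin=True` lets Altair choose its own bin boundaries.

```python
import pandas as pd

# Precipitation values from the first five rows of sw_df shown above
precip = pd.Series([0.0, 10.9, 0.8, 20.3, 1.3], name="precipitation")

# Group the values into intervals of width 10; right=False makes each
# interval include its left edge, e.g. [0, 10)
bins = pd.cut(precip, bins=[0, 10, 20, 30], right=False)

# Count how many values fall into each interval (similar to y='count()')
counts = bins.value_counts().sort_index()
print(counts)
```

Three of the five days land in the lowest bin, which is why the binned chart is dominated by its first bar.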

Let's try adjusting the **scale** of our chart to remove the empty space between 0 and -5 on the x axis, since it doesn't make sense for there to be negative precipitation! We can pass `alt.Scale(domain=[...])` to the `scale` parameter of `alt.X` to set the minimum and maximum x axis values (the `domain`), and set `clip=True` on the mark to hide any bars that fall outside that range. Keep in mind that the domain `[2, 30]` used below also hides `precipitation` values below 2 (including all the dry days, where the value is 0) and above 30:

In [27]:
viz = alt.Chart(sw_df).mark_bar(clip=True).encode(
    alt.X('precipitation', bin=False, scale=alt.Scale(domain=[2,30])),
    y = 'count()',
).properties(title="Precipitation in Seattle")

viz.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)
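To see which values the domain `[2, 30]` keeps, here is a pandas sketch that applies the same range as a filter to the five precipitation values printed earlier. Filtering the DataFrame before passing it to `alt.Chart` (for example `sw_df[sw_df['precipitation'].between(2, 30)]`) is an alternative to clipping; note that `between` includes both endpoints by default.

```python
import pandas as pd

# Precipitation values from the first five rows of sw_df
precip = pd.Series([0.0, 10.9, 0.8, 20.3, 1.3], name="precipitation")

# True where a value lies inside the domain [2, 30] (both ends included)
in_domain = precip.between(2, 30)

kept = precip[in_domain]     # values that remain visible
hidden = precip[~in_domain]  # values that would be clipped away
print("kept:", kept.tolist())
print("hidden:", hidden.tolist())
```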

Now, to get a better sense of how precipitation falls *over time*, let's put the dates along our x axis and use the y axis for the total precipitation.

In [30]:
viz = alt.Chart(sw_df).mark_bar().encode(
    alt.X('date', bin=True),
    alt.Y('sum(precipitation)', title='Total Precipitation')  # sum adds up the values; count would only count days
).properties(title="Precipitation in Seattle from 2012-15")

viz.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)

Hmmm. That's not a very useful way to display the dates... Altair seems to be treating them as continuous timestamps rather than as years, months, and days. Let's group the dates by month with the `yearmonth` time unit, written `yearmonth(date)`, and declare the field's data type with `T`, the abbreviation for `temporal`. Likewise, `Q` marks the precipitation values as `quantitative`.

In [43]:
viz = alt.Chart(sw_df).mark_bar().encode(
    alt.X('yearmonth(date):T', title='Month'),
    alt.Y('sum(precipitation):Q', title='Total Precipitation')  # total precipitation per month
).properties(title="Precipitation in Seattle from 2012-15")

viz.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)
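`yearmonth(date)` groups the rows by calendar month, and the aggregate then summarizes each group. In Vega-Lite, `count` counts the rows (here, days) in each group, while `sum` adds up a field's values, so only `sum(precipitation)` gives a total amount of precipitation. The pandas sketch below shows the difference on the five rows printed at the top of this section, which all fall in January 2012; `dt.to_period('M')` plays the role of `yearmonth`.

```python
import pandas as pd

# The first five rows of sw_df shown earlier (all in January 2012)
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03",
                            "2012-01-04", "2012-01-05"]),
    "precipitation": [0.0, 10.9, 0.8, 20.3, 1.3],
})

# Truncate each date to its year and month, similar to yearmonth(date)
month = df["date"].dt.to_period("M")
by_month = df.groupby(month)["precipitation"]

print(by_month.count())  # number of days per month, like count(precipitation)
print(by_month.sum())    # total precipitation per month, like sum(precipitation)
```

The count is 5 (five days) no matter how much it rained, while the sum reflects the actual amounts.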

So far we've been showing the total precipitation in each month, but what if we want a better sense of the day-to-day experience of precipitation in Seattle? We could take the *average* precipitation for each month, rather than the total! We'll need to update our Y axis label and chart title, too.

In [44]:
viz = alt.Chart(sw_df).mark_bar().encode(
    alt.X('yearmonth(date):T'),
    alt.Y('average(precipitation):Q', title='Average Precipitation')
).properties(title="Average Monthly Precipitation in Seattle from 2012-15")

viz.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)
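The monthly average is simply the month's total divided by its number of days. A quick pandas check on the five January 2012 rows from the table at the top of this section (a sketch; Altair does this aggregation for you):

```python
import pandas as pd

# The first five rows of sw_df (all in January 2012)
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03",
                            "2012-01-04", "2012-01-05"]),
    "precipitation": [0.0, 10.9, 0.8, 20.3, 1.3],
})

# Group by year and month, similar to yearmonth(date)
by_month = df.groupby(df["date"].dt.to_period("M"))["precipitation"]

avg = by_month.mean()  # like average(precipitation)
print(avg)

# The average equals the total divided by the number of days
assert (avg - by_month.sum() / by_month.count()).abs().max() < 1e-9
```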

Often when you're studying changes over time with lots of time periods, it's more helpful to visualize data in a line chart rather than a bar chart, so let's change our chart to use `.mark_line()`:

In [52]:
viz = alt.Chart(sw_df).mark_line().encode(
    alt.X('yearmonth(date):T', title='Month'),
    alt.Y('average(precipitation):Q', title='Average Precipitation')
).properties(title="Average Monthly Precipitation in Seattle from 2012-15")

viz.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)

Notice what happens when we change the `date` data type from `T` (temporal) to `O` (ordinal) or `N` (nominal)...

In [53]:
viz = alt.Chart(sw_df).mark_line().encode(
    alt.X('yearmonth(date):N', title='Month'),
    alt.Y('average(precipitation):Q', title='Average Precipitation')
).properties(title="Average Monthly Precipitation in Seattle from 2012-15")

viz.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)
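Altair's shorthand strings such as `'yearmonth(date):T'` pack up to three things together: an optional aggregate or time unit, the field name, and an optional one-letter type code (`Q` quantitative, `O` ordinal, `N` nominal, `T` temporal). The helper below is hypothetical (it is not part of Altair, and it ignores edge cases such as field names containing colons or parentheses); it just pulls a shorthand string apart to make the pattern visible.

```python
# Hypothetical helper, not part of Altair: split an encoding shorthand
# string like "yearmonth(date):T" into its parts.
TYPE_CODES = {"Q": "quantitative", "O": "ordinal", "N": "nominal", "T": "temporal"}


def split_shorthand(shorthand):
    # Separate the optional ":X" type code from the rest
    body, _, code = shorthand.partition(":")
    # Separate "function(field)" into the function name and the field
    if "(" in body and body.endswith(")"):
        function, _, field = body[:-1].partition("(")
    else:
        function, field = None, body
    return {
        "function": function,
        "field": field or None,           # count() has no field
        "type": TYPE_CODES.get(code),     # None when no type code is given
    }


for s in ["yearmonth(date):T", "yearmonth(date):N",
          "average(precipitation):Q", "count()", "weather"]:
    print(s, "->", split_shorthand(s))
```

Only the type code differs between the last two line charts, which is why the same data ends up on a continuous time axis in one and as separate named categories in the other.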

The year-month combinations are treated as named categories, so they all appear on the X axis with their own tick marks!

The data we have shows other types of weather besides precipitation, so let's use a *stacked bar chart* to visualize that part of our dataset.  We'll combine all years of each month (i.e. January 2012, January 2013, January 2014, and January 2015 will all be under the category January).

In [59]:
weather = alt.Chart(sw_df).mark_bar().encode(
    x='month(date):O',
    y='count()',
    color='weather'
).properties(title="Types of Weather in Seattle from 2012-2015")

weather.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)
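Because `month(date)` ignores the year, each bar combines four Januaries, four Februaries, and so on. Assuming the dataset has exactly one row per day (consistent with its 1,461 rows running from 2012-01-01 to 2015-12-31), we can predict the total height of each stacked bar with pandas and contrast it with the separate groups that `yearmonth(date)` would produce:

```python
import pandas as pd

# Assumption: one row per day from 2012-01-01 to 2015-12-31
dates = pd.date_range("2012-01-01", "2015-12-31", freq="D")

# month(date): keep only the month number, so all years are combined
days_per_month = pd.Series(dates.month).value_counts().sort_index()
print(days_per_month)

# yearmonth(date): keep the year too, so each month of each year is its own group
n_year_months = dates.to_period("M").nunique()
print(n_year_months)
```

February's bar is shorter than January's simply because it has fewer days (29 + 28 + 28 + 28), not necessarily because of the weather.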

Let's customize our colors! We'll use the `scale` parameter again, this time to specify our colors with [hexadecimal](https://www.w3schools.com/colors/colors_picker.asp) values...

*P.S. I like to use [colorbrewer2.org](https://colorbrewer2.org/#type=qualitative&scheme=Set3&n=5) to quickly look for color schemes that are color-blind friendly, if possible.  If you're interested in learning more about color theory from a design perspective, [Canva](https://www.canva.com/colors/color-wheel/) offers a nice introduction!*

In [71]:
scale = alt.Scale(domain=['sun', 'fog', 'drizzle', 'rain', 'snow'],
                  range=['#ffffb3','#8dd3c7', '#bebada', '#80b1d3', '#fb8072'])

weather = alt.Chart(sw_df).mark_bar().encode(
    x=alt.X('month(date):T', title='Month'),
    y=alt.Y('count()', title='Days with Weather Type'),
    color=alt.Color('weather', legend=alt.Legend(title='Weather Type'), scale=scale),
).properties(title="Types of Weather in Seattle from 2012-2015")

weather.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)
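With an explicit `domain` and `range`, each weather type is paired with the color in the same position of the two lists. A plain Python sketch of that pairing, which is handy for double-checking that the lists line up before charting:

```python
# The same lists passed to alt.Scale above
domain = ["sun", "fog", "drizzle", "rain", "snow"]
colors = ["#ffffb3", "#8dd3c7", "#bebada", "#80b1d3", "#fb8072"]

# Check that every weather type has a matching color
assert len(domain) == len(colors)

# Pair each category with the color in the same position
color_for = dict(zip(domain, colors))
print(color_for["rain"])  # → #80b1d3
```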

Notice that we've also given the axes and the legend more descriptive titles to help our viewers interpret the data visualization!

Instead of grouping the data by month, we could also group it by year...

In [70]:
scale = alt.Scale(domain=['sun', 'fog', 'drizzle', 'rain', 'snow'],
                  range=['#ffffb3','#8dd3c7', '#bebada', '#80b1d3', '#fb8072'])

weather = alt.Chart(sw_df).mark_bar().encode(
    x=alt.X('year(date):N', title='Year'),
    y=alt.Y('count()', title='Days with Weather Type'),
    color=alt.Color('weather', legend=alt.Legend(title='Weather Type'), scale=scale),
).properties(title="Types of Weather in Seattle from 2012-2015")

weather.configure_title(
    fontSize=18,
    font='Helvetica',
    anchor='start',
    color='black'
)
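Grouping with `year(date)` gives one bar per year. Assuming one row per day, each bar should total 365 days, except 2012, which was a leap year; that also explains why the DataFrame has 1,461 rows (366 + 3 × 365). A pandas sketch of the expected bar totals:

```python
import pandas as pd

# Assumption: one row per day from 2012-01-01 to 2015-12-31
dates = pd.date_range("2012-01-01", "2015-12-31", freq="D")

# year(date): group by the year alone
days_per_year = pd.Series(dates.year).value_counts().sort_index()
print(days_per_year)
print(len(dates))  # → 1461
```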