In [None]:
# Workaround for SSL certificate errors: this disables HTTPS certificate verification
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Visualization

## Concepts

1. Grammar of graphics
    1. Mapping
    2. Geometry
    3. Scales
    4. Faceting
    5. Statistics
    6. Annotations
    7. Themes
2. Visualization as storytelling
    1. Remove distractions ("chart junk")
    2. Be truthful
    3. Highlight elements critical to the story
    4. Avoid saturated colors
    5. Make it easy for the viewer

## Matplotlib

This is the basic package for visualization in Python. Many other visualization libraries are built on top of it.

- [Matplotlib gallery](https://matplotlib.org/stable/gallery/index)
    - You can download all examples as [Jupyter notebooks](https://matplotlib.org/stable/_downloads/fcaddee3a42ae2e2c41e00ae08d70347/gallery_jupyter.zip)
- Many [packages](https://matplotlib.org/mpl-third-party/) extend Matplotlib

### Data wrangling

In [None]:
import glob

Data from https://aqs.epa.gov/aqsweb/airdata/download_files.html

In [None]:
data_files = glob.glob('data/annual_aqi_by_county*zip')
data_files[:3]

In [None]:
data = pd.concat([pd.read_csv(f, compression='zip') for f in data_files])

In [None]:
data.shape

In [None]:
data.iloc[0]

In [None]:
data.sample(3)

#### Get data for Durham, NC

In [None]:
data.State.unique()

In [None]:
data.loc[data.State == 'North Carolina', 'County'].unique()

In [None]:
df = data.query("State == 'North Carolina' and County == 'Durham'")

In [None]:
df.head()

In [None]:
df = df.sort_values('Year')

### Basic plotting

In GoG terms, we map the data features `Year` and `Median AQI` to the graphical features `x coordinate` and `y coordinate`, and display them using the line, scatter, and bar chart geometries. For the line plot, we also add axis labels and a title.

In [None]:
plt.plot('Year', 'Median AQI', data=df)
plt.xlabel('Year')
plt.ylabel('Median AQI')
plt.title('Air Quality in Durham, NC')
pass

In [None]:
plt.scatter('Year', 'Median AQI', data=df)
pass

In [None]:
plt.bar('Year', 'Median AQI', data=df);

`Matplotlib` was not designed around GoG, so it is more common to pass the data arrays directly in a more imperative function call.

In [None]:
x = df['Year']
y = df['Median AQI']
plt.plot(x, y, 'b-o')
pass

## Plot within `pandas`

Pandas can generate Matplotlib plots conveniently. Since they are Matplotlib objects, you can work with them using Matplotlib functions.
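
As a minimal sketch of what "they are Matplotlib objects" means: the pandas plotting methods return a Matplotlib `Axes`, which can then be customized with the usual `Axes` methods. The small `toy` data frame below is made up and only stands in for the Durham data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for the Durham data (values are made up)
toy = pd.DataFrame({'Year': [2018, 2019, 2020, 2021],
                    'Median AQI': [40, 38, 35, 37]})

# DataFrame.plot.line returns a Matplotlib Axes object ...
ax = toy.plot.line(x='Year', y='Median AQI', legend=False)

# ... so the usual Axes methods apply
ax.set_ylabel('Median AQI')
ax.set_title('Toy example')
ax.set_xticks(toy['Year'])  # one tick per year instead of fractional years
```

Keeping a reference to the returned `Axes` is an alternative to calling `plt.title` and similar functions, which act on whichever axes is currently active.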

In [None]:
df.plot.line(x='Year', y='Median AQI')
plt.title('Air quality in Durham, NC')
pass

In [None]:
df.plot.scatter(x='Year', y='Median AQI')
pass

In [None]:
df.plot.bar(x='Year', y='Median AQI')
pass

In [None]:
df_nc = data.query("State == 'North Carolina' and County in ['Durham', 'Mecklenburg', 'Wake']")
df_nc.head(3)

In [None]:
fig, ax = plt.subplots()
df_nc.groupby('County').plot.line(x='Year', y='Median AQI', ax=ax)
pass

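The plot above looks wrong because the rows are not ordered by `Year`: the concatenated files are not guaranteed to be in year order, and `query` keeps whatever row order it is given, so the lines zigzag. The legend is also unhelpful, since every line is labeled `Median AQI` instead of its county.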

In [None]:
df_nc = df_nc.sort_values('Year')

In [None]:
fig, ax = plt.subplots()
for name, group in df_nc.groupby('County'):
    group.plot.line(x='Year', y='Median AQI', ax=ax, label=name)
pass
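
An alternative to sorting and looping over groups is to reshape the data to wide format first, with one column per county, and let pandas draw one line per column. The sketch below uses a small made-up data frame (`toy`) that is deliberately out of order; the column names mirror the EPA data but the values are invented.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy long-format data, deliberately out of order (values are made up)
toy = pd.DataFrame({
    'County': ['Wake', 'Durham', 'Wake', 'Durham', 'Wake', 'Durham'],
    'Year': [2020, 2019, 2018, 2020, 2019, 2018],
    'Median AQI': [41, 38, 44, 36, 42, 40],
})

# Reshape: one row per year, one column per county
wide = toy.pivot(index='Year', columns='County', values='Median AQI')

# Make the ordering by year explicit before plotting
wide = wide.sort_index()

# One line per column; the legend entries are the county names
ax = wide.plot.line()
ax.set_ylabel('Median AQI')
```

Note that `pivot` requires each (`Year`, `County`) pair to appear at most once; that holds for this toy data and is an assumption for any real data set.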

## Themes

In [None]:
plt.style.available

In [None]:
x = df['Year']
y = df['Median AQI']

In [None]:
with plt.style.context('ggplot'):
    plt.plot(x, y)
pass

In [None]:
with plt.style.context('Solarize_Light2'):
    plt.plot(x, y)
pass

This is a special theme that is called differently.

In [None]:
with plt.xkcd():
    plt.plot(x, y)
pass
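
The `with` blocks above apply a style only temporarily. A small sketch to check this: it reads one setting (`axes.facecolor`, chosen here only as an example) before, inside, and after a style context.

```python
import matplotlib.pyplot as plt

before = plt.rcParams['axes.facecolor']

with plt.style.context('ggplot'):
    inside = plt.rcParams['axes.facecolor']   # value set by the ggplot style
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [1, 3, 2])

after = plt.rcParams['axes.facecolor']        # restored on leaving the block

print(before, inside, after)
```

By contrast, `plt.style.use(...)` changes the settings for the rest of the session, which is why the next section resets to `'default'`.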

## Multiple plots

In [None]:
plt.style.use('default')
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12,4))
for ax, (name, group) in zip(axes, df_nc.groupby('County')):
    group.plot.line(x='Year', y='Median AQI', ax=ax, label=name)
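
When panels are meant to be compared, a shared y-axis and a title per panel make that easier for the viewer. The following is a sketch on made-up data (`toy`), not the EPA data; it uses the `sharey=True` option of `plt.subplots` and puts the county name in each panel title instead of a legend.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy data for three counties (values are made up)
years = np.arange(2015, 2021)
toy = pd.DataFrame({
    'County': np.repeat(['Durham', 'Mecklenburg', 'Wake'], len(years)),
    'Year': np.tile(years, 3),
    'Median AQI': np.concatenate([
        np.linspace(40, 35, len(years)),
        np.linspace(48, 42, len(years)),
        np.linspace(42, 38, len(years)),
    ]),
})

# sharey=True puts all panels on the same y scale
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 4), sharey=True)
for ax, (name, group) in zip(axes, toy.groupby('County')):
    group.plot.line(x='Year', y='Median AQI', ax=ax, legend=False)
    ax.set_title(name)  # panel title replaces the legend

axes[0].set_ylabel('Median AQI')
fig.tight_layout()
```

This is faceting in GoG terms: the same mapping and geometry repeated for each value of `County`.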

## Maps

Plotting maps is more complex. We will use the `geopandas` package; here we show how to generate static figures. For interactive maps, see

- [Folium](https://github.com/python-visualization/folium)
- [Interactive mapping with geopandas](https://geopandas.org/en/stable/docs/user_guide/interactive_mapping.html)

You can also use [plotly](https://plotly.com/python/choropleth-maps/) to plot maps.

In [None]:
import geopandas

We need geographic information from the US Census in shapefile format. I downloaded these from the [US Census Cartographic Boundary Files - Shapefile](https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.2018.html)

More recent and more detailed shapefiles can also be found at https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html

### Read shape file using `geopandas`

In [None]:
states = geopandas.read_file('data/cb_2018_us_state_5m/cb_2018_us_state_5m.shp')

Reproject the coordinates to World Mercator (EPSG:3395)

In [None]:
states = states.to_crs("EPSG:3395")
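
To make the reprojection less of a black box, here is a rough sketch of the forward Mercator formula on the WGS 84 ellipsoid, which is what EPSG:3395 is based on. It assumes longitude and latitude in degrees and ignores datum differences, so `to_crs` remains the right tool for real data. The function name `world_mercator` is made up for this illustration.

```python
import numpy as np

A = 6378137.0                 # WGS 84 semi-major axis in metres
F = 1 / 298.257223563         # WGS 84 flattening
E = np.sqrt(2 * F - F ** 2)   # first eccentricity of the ellipsoid

def world_mercator(lon_deg, lat_deg):
    """Forward ellipsoidal Mercator: degrees in, metres out (illustrative only)."""
    lam = np.radians(lon_deg)
    phi = np.radians(lat_deg)
    x = A * lam
    s = E * np.sin(phi)
    y = A * np.log(np.tan(np.pi / 4 + phi / 2) * ((1 - s) / (1 + s)) ** (E / 2))
    return x, y

# Roughly the location of Durham, NC
x, y = world_mercator(-78.9, 36.0)
print(x, y)
```

In this projection, equal steps in latitude are stretched more and more toward the poles, which is why Mercator maps exaggerate areas at high latitudes.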

In [None]:
states.head()

### Merge cartographic data with EPA air quality data

In [None]:
states[['NAME', 'STUSPS']]

For display purposes, use only the lower 48 states and DC.

In [None]:
states = states.query("STUSPS not in ['AK', 'AS', 'HI', 'GU', 'MH', 'MP', 'PR', 'VI']")

In [None]:
states.shape

In [None]:
mean_aqi_by_state = data.groupby('State')[['Median AQI']].mean().reset_index()
mean_aqi_by_state

In [None]:
states = states.merge(mean_aqi_by_state, left_on='NAME', right_on='State')
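
Because `merge` defaults to an inner join, any state whose name is spelled differently in the two tables is dropped without warning. One way to check for this, sketched on made-up tables with a hypothetical capitalization mismatch, is an outer merge with `indicator=True`.

```python
import pandas as pd

# Toy stand-ins for the two tables; names and values are made up
shapes = pd.DataFrame({'NAME': ['North Carolina', 'District of Columbia', 'Virginia']})
aqi = pd.DataFrame({'State': ['North Carolina', 'District Of Columbia', 'Virginia'],
                    'Median AQI': [38.0, 45.0, 40.0]})

# The default inner merge silently keeps only exact name matches
inner = shapes.merge(aqi, left_on='NAME', right_on='State')

# An outer merge with indicator=True shows which rows failed to match
check = shapes.merge(aqi, left_on='NAME', right_on='State',
                     how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched[['NAME', 'State', '_merge']])
```

If `unmatched` is not empty, the names need to be harmonized (for example by normalizing case) before the real merge.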

In [None]:
# Create the axes we will plot on
fig, ax = plt.subplots(figsize=(25,15))

# Basic plot colored by Median AQI using the Wistia colormap
states.plot(ax=ax, column='Median AQI', cmap='Wistia')

# Add state boundaries
states.boundary.plot(ax=ax, color='grey', linewidth=0.4)

# Pass in the state name and AQI as f-string to the annotate method to label map
states.apply(lambda x: ax.annotate(
    f"{x.NAME}\n{x['Median AQI']:.1f}", 
    xy=x.geometry.centroid.coords[0],
    ha='center', 
    fontsize=14,
    color='black',
),axis=1);

# Remove uninformative ticks (geographical coordinates)
plt.xticks([])
plt.yticks([])

# Add title
plt.title('Mean AQI', fontsize=24)

# Suppress return output
pass

### Use plotly

https://plotly.com/python/choropleth-maps/