# The Grammar of Graphics
If you want to type along with me, use [this notebook](https://humboldt.cloudbank.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fbethanyj0%2Fdata271_sp24&branch=main&urlpath=tree%2Fdata271_sp24%2Fdemos%2Fdata271_demo17_live.ipynb) instead. 
If you don't want to type and want to follow along just by executing the cells, stay in this notebook. 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from plotnine import *
from plotnine.data import midwest
import warnings 
warnings.filterwarnings('ignore') 

**NOTE:** If you get errors when you run the cell above, open a terminal and run the following commands:
```bash
pip install plotnine
pip install matplotlib==3.8.3
```

Then come back to this notebook and try again. (You might have to restart your kernel.)

In [None]:
df = midwest
df

In [None]:
# Visualize the relationship between the percent who went to college, and the percent who got a professional degree
# With matplotlib
plt.scatter('percollege','percprof',data = df)
plt.xlabel('percollege')
plt.ylabel('percprof')
plt.show()
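
When `plt.scatter` receives `data=df`, the string arguments `'percollege'` and `'percprof'` are looked up as column names in `df`. The minimal sketch below checks that this gives the same points as passing the columns directly. It uses a small DataFrame with hypothetical values, not the real `midwest` data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data with the same column names as midwest
toy = pd.DataFrame({"percollege": [20.0, 35.5, 41.2],
                    "percprof": [3.1, 5.0, 7.4]})

fig, ax = plt.subplots()
# Column names as strings, resolved through the data= keyword
by_name = ax.scatter("percollege", "percprof", data=toy)
# The same columns passed explicitly as Series
by_value = ax.scatter(toy["percollege"], toy["percprof"])

# Both calls place the points at identical (x, y) positions
same = np.allclose(np.asarray(by_name.get_offsets()),
                   np.asarray(by_value.get_offsets()))
print(same)
plt.close(fig)  # in the notebook you would call plt.show() instead
```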

In [None]:
# Look up the documentation for plotnine's ggplot
ggplot?

The rest of the plotnine documentation is [here](https://plotnine.org/reference/) (including examples). 

In [None]:
# Do the same with a plotnine ggplot
(ggplot(df, aes(x='percollege',y='percprof'))
+geom_point())

Does the relationship vary by state? Let's make each state a different color.

In [None]:
# With matplotlib, first list the states so we can split the data by state
df.state.unique()

In [None]:
# Separate our data
IL = df[df.state == 'IL']
IN = df[df.state == 'IN']
MI = df[df.state == 'MI']
OH = df[df.state == 'OH']
WI = df[df.state == 'WI']

In [None]:
# Create our scatter plots
plt.scatter('percollege','percprof',data = IL,label='IL')
plt.scatter('percollege','percprof',data = IN,label='IN')
plt.scatter('percollege','percprof',data = MI,label='MI')
plt.scatter('percollege','percprof',data = OH,label='OH')
plt.scatter('percollege','percprof',data = WI,label='WI')
plt.xlabel('percollege')
plt.ylabel('percprof')
plt.legend()
plt.show()
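
The five subsets and five `scatter` calls above can also be produced with a loop over `groupby('state')`, which yields each state's name together with its rows. Here is a sketch that assumes a small hypothetical DataFrame in place of `midwest`:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data with the same column names as midwest
toy = pd.DataFrame({
    "state": ["IL", "IL", "IN", "IN", "MI"],
    "percollege": [20.0, 35.5, 18.2, 27.9, 30.1],
    "percprof": [3.1, 5.0, 2.4, 4.2, 4.8],
})

fig, ax = plt.subplots()
# groupby yields (state name, sub-DataFrame) pairs, sorted by state by default
for state, rows in toy.groupby("state"):
    ax.scatter("percollege", "percprof", data=rows, label=state)
ax.set_xlabel("percollege")
ax.set_ylabel("percprof")
ax.legend()

# One legend entry per state, in groupby order
labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(labels)
plt.close(fig)  # in the notebook you would call plt.show() instead
```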

In [None]:
# do the same thing with a plotnine ggplot
(ggplot(df, aes('percollege','percprof',color = 'state'))
+geom_point())

These are all on top of each other. Let's split up the visualizations.

In [None]:
# with matplotlib
fig, ax = plt.subplots(1,5,figsize = (12,3))
ax[0].scatter('percollege','percprof',data = IL)
ax[0].set_title('IL')
ax[1].scatter('percollege','percprof',data = IN)
ax[1].set_title('IN')
ax[2].scatter('percollege','percprof',data = MI)
ax[2].set_title('MI')
ax[3].scatter('percollege','percprof',data = OH)
ax[3].set_title('OH')
ax[4].scatter('percollege','percprof',data = WI)
ax[4].set_title('WI')
ax[0].set_xlabel('percollege')
ax[0].set_ylabel('percprof')
plt.show()
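
The repeated `ax[i]` lines can be driven by a `groupby` loop that uses `zip` to pair each state with its own axes. Each title then comes from the group name. `sharex=True` and `sharey=True` put every panel on a common scale, which is also what `facet_wrap` does by default. Here is a sketch with hypothetical toy data:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data with the same column names as midwest
toy = pd.DataFrame({
    "state": ["IL", "IL", "IN", "MI", "MI"],
    "percollege": [20.0, 35.5, 18.2, 30.1, 25.0],
    "percprof": [3.1, 5.0, 2.4, 4.8, 3.9],
})

groups = toy.groupby("state")
# One panel per state; shared axes make the panels directly comparable
fig, axes = plt.subplots(1, groups.ngroups, figsize=(9, 3), sharex=True, sharey=True)
for ax, (state, rows) in zip(axes, groups):
    ax.scatter("percollege", "percprof", data=rows)
    ax.set_title(state)  # title taken from the group name
axes[0].set_xlabel("percollege")
axes[0].set_ylabel("percprof")

titles = [ax.get_title() for ax in axes]
print(titles)
plt.close(fig)  # in the notebook you would call plt.show() instead
```

With a single group, `plt.subplots` returns one Axes rather than an array. Passing `squeeze=False` would keep the loop working in that case.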

In [None]:
# with a plotnine ggplot
(ggplot(df,aes('percollege','percprof',color = 'state'))
+geom_point()
+facet_wrap('state'))

In [None]:
# add statistical transformations
(ggplot(df,aes('percollege','percprof',color = 'state'))
+geom_point()
+facet_wrap('state')
+stat_smooth())
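
`stat_smooth()` is a statistical transformation. Before anything is drawn, it fits a model to each group (here, each state) and plots the fitted curve with a confidence band. plotnine's default `method='auto'` chooses the smoother based on the number of points, and `method='lm'` requests a straight-line least-squares fit. The sketch below uses hypothetical toy data and `numpy.polyfit` to show the per-state line such a linear smooth represents. It reproduces only the fitted line, not plotnine's confidence band.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same column names as midwest
toy = pd.DataFrame({
    "state": ["IL", "IL", "IL", "IN", "IN", "IN"],
    "percollege": [10.0, 20.0, 30.0, 10.0, 20.0, 30.0],
    "percprof": [2.0, 4.0, 6.0, 1.0, 1.5, 2.0],
})

# Least-squares fit of percprof = slope * percollege + intercept, one per state
fits = {
    state: np.polyfit(rows["percollege"], rows["percprof"], deg=1)
    for state, rows in toy.groupby("state")
}
for state, (slope, intercept) in fits.items():
    print(f"{state}: slope={slope:.3f}, intercept={intercept:.3f}")
```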

What if we want to visualize the number of counties in each state?

In [None]:
# with matplotlib 
counties_per_state = df.state.value_counts()
plt.bar(counties_per_state.index, counties_per_state.values)
plt.show()
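
`value_counts()` sorts states by their number of counties, so the matplotlib bars appear in count order. plotnine's discrete x axis typically lists plain text values alphabetically. Calling `sort_index()` on the counts gives alphabetical bars in matplotlib too. Here is a sketch with hypothetical toy data, one row per county:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: one row per county
toy = pd.DataFrame({"state": ["WI", "IL", "IL", "OH", "OH", "OH"]})

by_count = toy["state"].value_counts()               # ordered by count: OH, IL, WI
by_state = toy["state"].value_counts().sort_index()  # ordered by name: IL, OH, WI

fig, ax = plt.subplots()
ax.bar(by_state.index, by_state.values)
ax.set_xlabel("state")
ax.set_ylabel("count")  # matches the y-axis title geom_bar adds
print(list(by_count.index), list(by_state.index))
plt.close(fig)  # in the notebook you would call plt.show() instead
```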

In [None]:
# with ggplot
(ggplot(df,aes(x='state'))
+geom_bar())