# More Grammar of Graphics
If you want to type along with me, use [this notebook](https://humboldt.cloudbank.2i2c.cloud/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fbethanyj0%2Fdata271_sp24&branch=main&urlpath=tree%2Fdata271_sp24%2Fdemos%2Fdata271_demo18_live.ipynb) instead. 
If you don't want to type and want to follow along just by executing the cells, stay in this notebook. 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from plotnine import *
from plotnine.data import *
import warnings 
warnings.filterwarnings('ignore') 

**NOTE** If you get errors when you run the cell above, go to the terminal and type the following:
```bash
pip install plotnine
pip install matplotlib==3.8.3
```

Then come back to this notebook and try again. (You might have to restart your kernel.)

In [None]:
df = midwest
df

## Basics

In [None]:
# Visualize the relationship between the percent who went to college, and the percent who got a professional degree
# With matplotlib
plt.scatter('percollege','percprof',data = df)
plt.xlabel('percollege')
plt.ylabel('percprof')
plt.show()

The full plotnine documentation, including examples, is [here](https://plotnine.org/reference/).

In [None]:
# Do the same with a plotnine ggplot
(ggplot(df, aes(x='percollege',y='percprof'))
+geom_point())

In [None]:
# to make the output pretty, use .draw() (kind of like .show() in matplotlib)
(ggplot(df, aes(x='percollege',y='percprof'))
+geom_point())...

## Adding additional variables with aesthetic mappings (aes)
Does the relationship vary by state? Let's make each state a different color.

In [None]:
# Do this with matplotlib


In [None]:
# Separate our data
IL = ...
IN = ...
MI = ...
OH = ...
WI = ...

In [None]:
# Create our scatter plots
plt.scatter(...)
plt.scatter(...)
plt.scatter(...)
plt.scatter(...)
plt.scatter(...)
plt.xlabel('percollege')
plt.ylabel('percprof')
plt.legend()
plt.show()

In [None]:
# do the same thing with a plotnine ggplot
(
    

).draw()

## Scales
Maybe we don't like the default color scale. We can change the scales in a number of ways. One way is to choose one of the [discrete color scales from matplotlib](https://matplotlib.org/stable/users/explain/colors/colormaps.html#qualitative).

In [None]:
# Change the color scale using a matplotlib colormap
(ggplot(df, aes('percollege','percprof',color = 'state'))
+geom_point()
+...).draw() 

In [None]:
# Or you can change the color scale manually
(ggplot(df, aes('percollege','percprof',color = 'state'))
+geom_point()
+...).draw()

## Geometric objects (geom)
What if we wanted to visualize the number of counties in each state?

In [None]:
# with matplotlib 
counties_per_state = ...
plt.bar(...)
plt.show()

In [None]:
# with ggplot
(
    
).draw()

In [None]:
# To adjust the order of the bars, we adjust the x-axis scale
(ggplot(df, aes(x='state'))
+geom_bar()
...).draw()

In [None]:
# Histograms
(
    
).draw()

## Facetting
In the plots above, a lot of points fell on top of each other. Let's split up the visualizations for each state.

In [None]:
# with matplotlib
fig, ax = plt.subplots(1,5,figsize = (12,3))
ax[0].scatter('percollege','percprof',data = IL)
ax[0].set_title('IL')
ax[1].scatter('percollege','percprof',data = IN)
ax[1].set_title('IN')
ax[2].scatter('percollege','percprof',data = MI)
ax[2].set_title('MI')
ax[3].scatter('percollege','percprof',data = OH)
ax[3].set_title('OH')
ax[4].scatter('percollege','percprof',data = WI)
ax[4].set_title('WI')
ax[0].set_xlabel('percollege')
ax[0].set_ylabel('percprof')
plt.show()

In [None]:
# with a plotnine ggplot
(
    ggplot(df,aes('percollege','percprof',color = 'state'))
    +geom_point()
    ...
).draw()

In [None]:
# adjusting the number of rows in your facetting
(ggplot(df,aes('percollege','percprof',color = 'state'))
+geom_point()
+facet_wrap('state',...)).draw()

In [None]:
# facetting by more than one variable
(ggplot(df,aes('percollege','percprof',color = 'state'))
+geom_point()
+facet_wrap('state',nrow=1)
...).draw()

In [None]:
# adjust the figure size
(ggplot(df,aes('percollege','percprof',color = 'state'))
+geom_point()
...
+theme(figure_size=(8,16))).draw()

## Statistical transformations (stat)

In [None]:
# add statistical transformations
(ggplot(df,aes('percollege','percprof',color = 'state'))
+geom_point()
+facet_wrap('state')
...).draw()

## Layer-specific mappings

In [None]:
# Use different aesthetics for different parts of graphic
(ggplot(df,aes('percollege','percprof'))
+geom_point(...)
+facet_wrap('state')
+stat_smooth()).draw()

## Themes
There are several things we can adjust about the figure that don't fall into the specific grammar of graphics components. These often fall into the `theme` category. For example, the graph below shows how we can adjust the angle of the x-tick labels. See the documentation for other options.

In [None]:
(ggplot(df, aes(x='percollege'))
+geom_histogram()
+theme(axis_text_x  = element_text(angle = 45, hjust = 1))).draw()

## Activity

The `plotnine.data` module includes a dataset called `diamonds`, which contains the prices and other attributes of almost 54,000 diamonds.

1. Create a grid of histograms showing the distribution of prices facetted by `cut` and `clarity`. Adjust the number of bins as needed.

2. Plot the number of carats vs the price. Think about how you might gain additional insights with additional aesthetic mappings, facetting, statistical transformations, etc. 