https://plotnine.readthedocs.io/en/stable/api.html

In [None]:
#Import packages
import pandas as pd
from plotnine import *

In [None]:
#Instruct Jupyter to display plots inline
%matplotlib inline

## 1. Loading the data into Python

In [None]:
#Read the WaterUse and Population worksheets into data frames
dfWaterUse = pd.read_excel('../data/State_Data_Formatted.xlsx','WaterUse')
dfPopulation = pd.read_excel('../data/State_Data_Formatted.xlsx','Population')

## 2. Joining tables

In [None]:
#Join the tables
dfAll = pd.merge(left=dfWaterUse,
                 right=dfPopulation,
                 how='inner',
                 left_on = 'State',
                 right_on = 'STATE'
                )
dfAll.head()
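
The join above matches the `State` column in one table to the `STATE` column in the other. An inner join silently drops any key that appears in only one table, so it can help to check for unmatched keys. The sketch below uses two small made-up frames (hypothetical stand-ins, not the workbook's actual contents) and the `indicator=True` option of `pd.merge` to list them:

```python
import pandas as pd

# Hypothetical stand-ins for the two worksheets (values are made up)
water = pd.DataFrame({'State': ['NC', 'SC', 'VA'],
                      'Withdrawal_MGD': [10.0, 20.0, 30.0]})
population = pd.DataFrame({'STATE': ['NC', 'SC', 'GA'],
                           'Population': [100, 200, 300]})

# Inner join: keeps only the states present in both tables
inner = pd.merge(left=water, right=population,
                 how='inner', left_on='State', right_on='STATE')

# Outer join with an indicator column, to find keys missing from one side
check = pd.merge(left=water, right=population,
                 how='outer', left_on='State', right_on='STATE',
                 indicator=True)
unmatched = check[check['_merge'] != 'both']

print(len(inner), len(unmatched))
```

If `unmatched` is empty for the real data, the inner join has not discarded any states.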

## 3. Construct the visualization
Recall the Tableau stacked bar plot of water withdrawal by type and category:
![Plot1](https://datadevils.github.io/DataBootCamp/media/tableau/StackedBarPlot1.PNG)
Here we recreate the same plot using Plotnine's `ggplot` interface.

### 3a. Organizing our data
In Tableau, we had to organize our data. Here, our data exist in a tidy format (one and only one row for each observation, with a column for each property of that observation). `ggplot` is designed to work with data in that format, so no need to further organize our data!
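
For contrast, here is a minimal sketch of what reshaping would look like if the data were *not* already tidy. The "wide" table below is hypothetical (made-up values, with one column per water source); `DataFrame.melt` stacks those columns into one row per observation:

```python
import pandas as pd

# Hypothetical "wide" table: one column per water source (values are made up)
wide = pd.DataFrame({'Category': ['Irrigation', 'Industrial'],
                     'Surface': [5.0, 3.0],
                     'Groundwater': [2.0, 1.0]})

# melt() turns the source columns into one row per Category/Source pair
tidy = wide.melt(id_vars='Category',
                 var_name='Source',
                 value_name='Withdrawal_MGD')

print(tidy)
```

The result has one row per observation and one column per property, which is the shape `ggplot` expects.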

### 3b. Plotting our data
The next step is to construct our plots, which we will do with the `Plotnine` package. This package enables us to use the `ggplot` interface, which has a bit of a steep learning curve. So, let's pause and introduce the grammar of graphics...

---
### ►`ggplot` _and the Grammar of Graphics_ ◄
Plots have numerous options in presentation, design, and content. The Grammar of Graphics ("GG") is a set of structural rules defining the components of a plotting language; it aims to simplify designing plots in code despite that wide array of options. Even so, it takes some getting used to, so let's examine it from its foundations.

The ["CheatSheet"](https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf) for `ggplot2` (the "R" version of what we are doing) provides a quick introduction to GG. In it, we see that plots are divided into a set of components that our code must specify:
* First is the source of the **data** for our plot. 
* Associated with the data object are its **aesthetics**. The aesthetics include the columns that we wish to plot, as well as the size and color attributes we want to assign to these features. 
* Then we specify a **`geom`** object, which indicates how we want to plot our data. We can specify multiple `geom` objects.

There are some more elements in the grammar of graphics, but let's start there and experiment. First, let's see if we can replicate the stacked bar plot we did in Tableau.

---

### Stacked bar plot
Take a look at the ggplot2 cheatsheet. Specifically, look at the various graphical primitives, or "geoms", we have at our disposal. They are organized by the types of data we wish to plot. In our example, we want to plot one discrete value ("`Category`") and one continuous one ("`Withdrawal_MGD`"). The option for `geom_col` appears to serve our purpose. Let's try that: 

This is a two-step process. First we create the base `ggplot` object, telling it to use the `dfAll` dataframe as the source of data. Then, we add our `geom_col` object, using its `aes` object to define which columns to use in the plot. 

In [None]:
#Create the ggplot object, defining it to use our dfAll dataframe as the source of data
myPlot = ggplot(data=dfAll)

In [None]:
#Now add the geom_col object to our ggplot object along with some aesthetics, and it will plot
myPlot + geom_col(aes(x='Category', y='Withdrawal_MGD'))

Not perfect, but a good start. To recap what we just did: `geom_col` stacked the `Withdrawal_MGD` values of all rows that share a `Category`, so each bar shows the total withdrawal for that category.
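
One way to check the bar heights is to compute the same totals directly in pandas. This is a minimal sketch on made-up rows (hypothetical values), assuming non-negative withdrawals so that stacking amounts to a sum:

```python
import pandas as pd

# Hypothetical toy rows (values are made up)
toy = pd.DataFrame({
    'Category': ['Irrigation', 'Irrigation', 'Industrial'],
    'Source': ['Surface', 'Groundwater', 'Surface'],
    'Withdrawal_MGD': [5.0, 2.0, 3.0]})

# Sum the withdrawals within each category; these are the stacked bar heights
totals = toy.groupby('Category')['Withdrawal_MGD'].sum()

print(totals)
```

Running the same `groupby` on `dfAll` should give the heights of the bars in the plot above.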

Next, we can modify our plot by tweaking the aesthetics. Here we'll use the `fill` property, which colors the bars based on the column we specify. (I've also now split the command across multiple lines to improve readability.)

In [None]:
#Now add the geom_col object
myPlot + geom_col(aes(x='Category',
                      y='Withdrawal_MGD',
                      fill='Source',
                     ))

#### Facets
This is more interesting and closer to our ultimate goal, but we also want separate plots for Fresh vs Saline water withdrawals. This is done using **facets**. Facets split data on values in one or more fields, creating a separate plot for each subset of data. Below, the `facet_grid` command creates a separate plot for each value in the `Type` field. (*I've again modified the format of the Python command to make it more readable...*)

In [None]:
#Create facets based on the Type column
(ggplot(data=dfAll) + 
 geom_col(aes(x='Category',
              y='Withdrawal_MGD',
              fill='Source')) + 
 facet_grid(facets='~Type'))

Closer to what we want, but we still need to (1) arrange the data so the bars are horizontal, and (2) tidy up the text. 

In [None]:
options.figure_size

In [None]:
#Flip the axes so the bars are horizontal, and add a title and an axis label
(ggplot(data=dfAll) + 
 geom_col(aes(x='Category',
              y='Withdrawal_MGD',
              fill='Source')) + 
 facet_grid(facets='~Type') +
 coord_flip() + 
 ggtitle('Withdrawal by category') + 
 ylab("Water withdrawals (MGD)")
)

In [None]:
(ggplot(data=dfAll,mapping=aes(x='Category',y='Withdrawal_MGD',fill='Source')) +
  geom_col() +
  facet_wrap(facets = 'Type') + 
  theme(axis_text = element_text(angle=90)) +
  ggtitle('Withdrawal by category') + 
  ylab('Total withdrawal (MGD)')
)

![Transposed](https://datadevils.github.io/DataBootCamp/media/tableau/StackedBarPlot2.PNG)

In [None]:
(ggplot(data=dfAll[dfAll.Category != 'Total'],
        mapping=aes(x='Category',y='Withdrawal_MGD',fill='Source')) +
  geom_col() +
  coord_flip() +
  facet_wrap(facets = 'Type',ncol=1) + 
  ylab('Total withdrawal (MGD)') +
  theme(axis_text = element_text(angle=0)) 
)

### 3c. Filter for just Fresh

In [None]:
#Select the Fresh records, then drop the 'Total' category rows using df_Fresh's own mask
df_Fresh = dfAll.query('Type == "Fresh"')
(ggplot(data=df_Fresh[df_Fresh.Category != 'Total'],
        mapping=aes(x='Category',y='Withdrawal_MGD',fill='Source')) +
  geom_col() +
  coord_flip() +
  #facet_wrap(facets = 'Type',ncol=1) + 
  ylab('Total withdrawal (MGD)') +
  theme(axis_text = element_text(angle=0)) 
)
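
Filtering on two conditions at once can be written either as a single `query` expression or as a boolean mask built from the same data frame. The sketch below uses made-up rows (hypothetical values) to show that the two forms select the same records:

```python
import pandas as pd

# Hypothetical toy rows (values are made up)
toy = pd.DataFrame({
    'Type': ['Fresh', 'Fresh', 'Saline', 'Fresh'],
    'Category': ['Irrigation', 'Total', 'Total', 'Industrial'],
    'Withdrawal_MGD': [7.0, 10.0, 4.0, 3.0]})

# Option 1: both conditions in one query expression
by_query = toy.query('Type == "Fresh" and Category != "Total"')

# Option 2: a boolean mask built from the same frame being filtered
mask = (toy['Type'] == 'Fresh') & (toy['Category'] != 'Total')
by_mask = toy[mask]

print(by_query.equals(by_mask))
```

Building the mask from the frame being filtered keeps the row index of the mask and the data aligned.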