<img src=../images/gdd-logo.png width=300px align=right>

# Plotting in Pandas

A great feature of Pandas is that it is really easy to make plots and visualisations directly from DataFrames.

In [None]:
import pandas as pd

- [Pandas Plots](#1)
    - [<mark> Exercise: line </mark>](#line)
    - [<mark> Exercise: bar </mark>](#bar)   
- [Further Customisation](#2)
- [Advanced Pandas Plots](#3)
    - [<mark> Exercise: Growth for diet 1 </mark>](#diet)  
    - [<mark> Assignment </mark>](#assignment)  

<a id='1' ></a>
## Pandas Plots

You can use the [`.plot()`](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.plot.html) method on a DataFrame to create a plot, e.g.

    *dataframe*.plot(x = ..., y =..., kind =..., ...)
    
The `.plot()` method takes several optional parameters, most notably the *kind* parameter, which accepts eleven different string values and determines which kind of plot you’ll create:

1. "area" is for area plots.
2. "bar" is for vertical bar charts.
3. "barh" is for horizontal bar charts.
4. "box" is for box plots.
5. "hexbin" is for hexbin plots.
6. "hist" is for histograms.
7. "kde" is for kernel density estimate charts.
8. "density" is an alias for "kde".
9. "line" is for line graphs.
10. "pie" is for pie charts.
11. "scatter" is for scatter plots.
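
As a rough sketch of how `kind` changes the output, the snippet below draws the same toy DataFrame twice. The data and the name `toy` are made up for illustration and are not part of the course datasets.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 15, 30]})

# Same data, two different plot kinds; each call returns its own Axes.
line_ax = toy.plot(x="x", y="y", kind="line")
bar_ax = toy.plot(x="x", y="y", kind="bar")
```

Only the `kind` string differs between the two calls; the `x` and `y` arguments stay the same.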

Let's demonstrate a few of these on some toy datasets.

### Scatter

In [None]:
#Dataset tracking unemployment rate against the stock index price.

data1 = {'Unemployment_Rate': [6.1,5.8,5.7,5.7,5.8,5.6,5.5,5.3,5.2,5.2],
        'Stock_Index_Price': [1500,1520,1525,1523,1515,1540,1545,1560,1555,1565]
       }
  
df1 = pd.DataFrame(data1)
df1

In [None]:
df1.plot(x = 'Stock_Index_Price', y = 'Unemployment_Rate', kind='scatter');

You can suppress the excess output with a semicolon. Assigning the result to a dummy variable (e.g. `_ = df.plot(...)`) has the same effect.
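
The "excess output" is the object that `.plot()` returns, which a notebook echoes when it is the last expression in a cell. Here is a minimal sketch on a made-up DataFrame (`toy` is a hypothetical name). Assigning the result keeps the cell quiet and also gives you a handle for later tweaks.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [3, 1, 2]})

# .plot() returns a matplotlib Axes; assigning it stops the notebook echoing it.
ax = toy.plot(x="a", y="b", kind="scatter")

# The returned Axes can be reused, e.g. to change an axis label afterwards.
# The trailing semicolon suppresses the echo of this call's return value.
ax.set_ylabel("b (toy values)");
```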

<a id='line' ></a>
### <mark> Exercise: Line </mark>

1. Produce a line chart for the following dataset tracking yearly unemployment rate. 

2. Customise it with a `color` argument (the input to which should be a string).

In [None]:
data2 = {'Year': [1920,1930,1940,1950,1960,1970,1980,1990,2000,2010],
        'Unemployment_Rate': [9.8,12,8,7.2,6.9,7,6.5,6.2,5.5,6.3]
       }
df2 = pd.DataFrame(data2)
df2

<a id='bar' ></a>
### <mark> Exercise: Bar </mark>

1. Make a bar chart with the data below. 
2. See if you can add a title.

In [None]:
#Dataset for countries GDP per Capita.
data3 = {'Country': ['USA','Canada','Germany','UK','France'],
        'GDP_Per_Capita': [45000,42000,52000,49000,47000]
       }
  
df3 = pd.DataFrame(data3)
df3

### <mark> Exercise: Pie </mark>

1. Make a pie chart with the data below. 
2. Try adding `startangle=90` as argument. What changes?

<details>
    
  <summary><span style="color:blue">Show hint</span></summary>
  
A pie chart doesn't have an x-axis.

</details>

In [None]:
#Dataset tracking tasks completed
data4 = {'Tasks': [300,500,700]}
df4 = pd.DataFrame(data4, index = ['Tasks Pending',
                                 'Tasks Ongoing',
                                 'Tasks Completed']
                 )
df4

#### Answers

In [None]:
# %load ../answers/plotting_with_pandas/basic-plots.py

<a id='3' ></a>
## Advanced Pandas Plots

Often a dataset needs some processing before you can make your visualisations.

To demonstrate this, let's bring in the chickweight dataset.

In [None]:
chickweight = (
    pd.read_csv('../data/chickweight.csv') 
      .rename(str.lower, axis='columns')
)
chickweight.head()

In [None]:
chickweight.plot();

Imagine you wanted to explore how weight increased over time. Simply plotting weight vs. time would not be that insightful.

In [None]:
chickweight.plot(x='time', y='weight', kind='scatter');

It may be more interesting to plot the mean weight over time.

In [None]:
#First we'll collect this information
(
    chickweight
    .groupby('time')
    ['weight'].mean()
)

In [None]:
(
    chickweight
    .groupby('time')
    ['weight'].mean()
).plot()

This is good, but originally you wanted to investigate how the different diets affect the chicks.

So, let's see how the average weight at different time periods compares for the different diets. 

i.e. let's plot four different average weight vs. time graphs, one for each diet.

In [None]:
# First let's get the average weight at different time periods for the different diets.

mean_diet_weight_per_timestep = (
    chickweight
    .groupby(['time', 'diet'])
    ['weight'].mean()
    .reset_index()
    .rename(columns={'weight': 'mean_weight'})
)

In [None]:
mean_diet_weight_per_timestep.head()

<a id='diet' ></a>
### <mark> Exercise: Growth for diet 1 </mark>

Select the data only for diet 1 and plot this information.

In [None]:
mean_diet_weight_per_timestep

Ok, now let's see one for each diet.

To do this, you could use a for loop to repeat the above, but for the different diets.

Alternatively, the `.unstack()` method allows you to plot everything in one go!

In [None]:
(
    chickweight
    .groupby(['time', 'diet'])
    ['weight'].mean()
    .unstack()
)

In [None]:
(
    chickweight
    .groupby(['time', 'diet'])
    ['weight'].mean()
    .unstack()
).plot()

Note that the index (time) is the x-axis, and that each column gets charted.
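
To make this concrete, here is a small sketch on invented data (the frame `toy` and its values are hypothetical, not the chickweight data). It shows `.unstack()` moving the inner index level into columns, which is why `.plot()` then draws one line per group.

```python
import pandas as pd

# Hypothetical toy data: two groups measured at three time steps.
toy = pd.DataFrame({
    "time": [0, 0, 1, 1, 2, 2],
    "group": ["a", "b", "a", "b", "a", "b"],
    "value": [1.0, 2.0, 3.0, 5.0, 4.0, 9.0],
})

# Grouping on two keys gives a Series with a two-level index (time, group).
means = toy.groupby(["time", "group"])["value"].mean()

# unstack() pivots the inner level (group) into columns; time stays as the index.
wide = means.unstack()
print(wide)

# One line per column, with the index on the x-axis.
ax = wide.plot()
```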

<a id='assignment' ></a>
## <mark>Assignment: rate of growth</mark>

You have now seen how the weight of the chickens increases over time. But at what stage do the chickens grow the most? 
Also, does the growth per time depend on the diet?  To answer these questions you need to investigate the *rate* at which the chickens grow.


## Part A

To get started, you'll add a column to show weight increase for the chicks.

1) Create a column for the weight difference between each row using the `.diff()` method. 

To test whether you have done this correctly, use `.head(15)` and check whether the weight difference for Chick 2 at time step 0 makes sense.

*Hint: look back at the Creating Columns notebook on how to do this.*

In [None]:
# %load ../answers/plotting_with_pandas/weight-increase-1.py

2) Fill any missing values with 0.

*Hint: use the `.fillna()` method on your previous chain.*

In [None]:
# %load ../answers/plotting_with_pandas/weight-increase-2.py
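
If `.diff()` and `.fillna()` are unfamiliar, the sketch below shows how they behave on a small made-up Series (the name `weights` and its values are hypothetical). It is not the assignment answer, only an illustration of the two methods.

```python
import pandas as pd

# Hypothetical toy values, invented purely for illustration.
weights = pd.Series([40, 50, 65, 42, 55])

# diff() subtracts the previous row from each row; the first row has no
# predecessor, so it becomes NaN.
steps = weights.diff()

# fillna(0) replaces that NaN with 0.
steps_filled = steps.fillna(0)
print(steps_filled.tolist())  # [0.0, 10.0, 15.0, -23.0, 13.0]
```

Note that a plain `.diff()` only looks at the previous row. It has no notion of where one chick ends and the next begins, which is worth keeping in mind when you check the first row of Chick 2.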

## Part B

1) Now that you have the code to create a weight increase column, calculate the average weight increase over time for all the chicks combined. This means you should calculate the average weight increase at time 0, at time 2, etc. 
2) Plot this average increase!

*Hint: You will need to do another `.groupby()` and use `.agg()`.*
<br></br>
<details>
    
  <summary><span style="color:blue">Click here to see what the end result should look like</span></summary>
    
<img src=../images/answer-example2.png width=300px align=left>

</details>

When do chickens grow the most?

In [None]:
# %load ../answers/plotting_with_pandas/growth-rate-1.py

### Bonus: Challenge
1) After adding the difference column, calculate the average weight increase *per diet* for all the chicks combined over time. You should have a row for the combination of time 0 and diet 1, the combination of time 0 and diet 2 ... up to the combination of time 21 and diet 4. 
2) Plot the result!

*Hint: use `.unstack()` before plotting.*
<br></br>
<details>
    
  <summary><span style="color:blue">Click here to see what the end result should look like</span></summary>
    
<img src=../images/answer-example1.png width=300px align=left>

</details>


Does the growth per time depend on the diet?

In [None]:
# %load ../answers/plotting_with_pandas/growth-rate-2.py