<img src='img/logo.png'>

<img src='img/title.png'>

<img src='img/py3k.png'>

# Plotting with Pandas

Following Pandas operations with `.plot()` is a powerful way to interact with your data.

# Table of Contents
* [Plotting with Pandas](#Plotting-with-Pandas)
* [Learning Objectives:](#Learning-Objectives:)
* [Using Pandas .plot()](#Using-Pandas-.plot%28%29)
	* [Line plots](#Line-plots)
		* [Timeseries](#Timeseries)
	* [Scatter](#Scatter)
	* [Box Plots](#Box-Plots)
	* [Histograms](#Histograms)
	* [2D Histograms](#2D-Histograms)
	* [Bar Charts](#Bar-Charts)
* [Exercise](#Exercise)


# Learning Objectives:

After completion of this module, learners should be able to:

* generate statistical plots from Pandas DataFrames with `.plot()`
  * Bar charts, box plots, line plots, scatter plots and histograms

In [None]:
import numpy as np
import pandas as pd
from pandas_datareader import data

In [None]:
# seaborn provides a modern look to the plots
import seaborn as sns
%matplotlib inline

# Using Pandas .plot()

Pandas plotting with `.plot()` is driven by Matplotlib. Many plot options are set by default. Some plot styles perform statistical operations.

There are several plot types available using the `kind=` keyword argument.
* `line` : line plot (default)
* `bar` : vertical bar plot
* `barh` : horizontal bar plot
* `hist` : histogram
* `box` : boxplot
* `kde` : Kernel Density Estimation plot
* `density` : same as 'kde'
* `area` : area plot
* `pie` : pie plot
* `scatter` : scatter plot
* `hexbin` : hexbin plot

See the [Pandas documentation](http://pandas.pydata.org/pandas-docs/stable/visualization.html).
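
As a minimal sketch of the `kind=` keyword, the example below draws the same small DataFrame twice with different plot types. The column names and values are invented for illustration and are not part of the course data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import pandas as pd

# Hypothetical data, invented for this illustration
df = pd.DataFrame({"a": [1, 3, 2, 4], "b": [2, 2, 3, 5]})

line_ax = df.plot()            # kind='line' is the default
bar_ax = df.plot(kind="barh")  # horizontal bars from the same data

n_lines = len(line_ax.get_lines())  # one line per column
n_bars = len(bar_ax.patches)        # one bar per cell (4 rows x 2 columns)
```

Each call returns a Matplotlib `Axes` object, which is what the later sections customize.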

## Line plots

By default, `.plot()` draws every column at once, which can crowd the plot.

In [None]:
degrees = pd.read_csv('data/percent-bachelors-degrees-women-usa.csv', index_col='Year')
degrees.plot()

We can make a subset of the columns.

In [None]:
stem=['Computer Science', 'Math and Statistics', 'Engineering', 'Physical Sciences', 'Biology']
degrees[stem].plot(figsize=(15,7))

<div class='alert alert-info'>
<img src='img/topics/Essential-Concept.png' align='left' style='padding:10px'>
<br>
<big><tt>figsize</tt> is a tuple of width and height of the plot in inches</big>
<br><br>
</div>
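
One way to confirm what `figsize` does is to read the size back from the figure that `.plot()` creates. This is a small sketch on a made-up DataFrame, not on the degrees data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import pandas as pd

# Hypothetical data, invented for this illustration
df = pd.DataFrame({"y": [1, 4, 9, 16]})

ax = df.plot(figsize=(15, 7))

# The Axes knows its parent figure; its size is reported in inches
width, height = ax.figure.get_size_inches()
```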

### Timeseries

Timeseries formatting is handled gracefully with Pandas.

In [None]:
aapl = data.DataReader('AAPL','yahoo', '2007-1-1', '2007-12-31')
aapl[['Close','Volume']].plot(figsize=(15,8), subplots=True)

<div class='alert alert-info'>
<img src='img/topics/Essential-Concept.png' align='left' style='padding:10px'>
<br>
<big><tt>subplots=True</tt> works for any plot over more than one column of data</big>
<br><br>
</div>
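
Because `DataReader` needs network access, the sketch below uses a synthetic daily series instead. The `Close` and `Volume` values are random numbers generated for illustration only. It shows that `subplots=True` returns one `Axes` per column, all on the same figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for price data; the values are random, not real quotes
rng = np.random.default_rng(0)
dates = pd.date_range("2007-01-01", periods=60, freq="D")
prices = pd.DataFrame(
    {
        "Close": 100 + rng.normal(size=60).cumsum(),
        "Volume": rng.integers(1_000, 5_000, size=60),
    },
    index=dates,
)

# One subplot per column, stacked vertically and sharing the date axis
axes = prices.plot(subplots=True, figsize=(15, 8))
```

Separate subplots are useful here because the two columns have very different scales.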

## Scatter

Plot bivariate `x` and `y` data stored in columns.

In [None]:
auto = pd.read_csv('data/auto-mpg.csv')

In [None]:
auto.plot(kind='scatter', x='hp', y='mpg')

In [None]:
auto.plot(kind='scatter', x='hp', y='mpg', color='green')

<div class='alert alert-info'>
<img src='img/topics/Essential-Concept.png' align='left' style='padding:10px'>
<br>
<big>All plot styles accept <tt>color=</tt>, which can be a single colorname or a list</big>
<br><br>
</div>
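
As a sketch of passing a list of colors, the example below gives each column of a made-up DataFrame its own color in a line plot and then reads the colors back from the drawn lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
from matplotlib.colors import to_rgba
import pandas as pd

# Hypothetical data, invented for this illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})

# One color per column, in column order
ax = df.plot(color=["green", "orange"])

# Normalize to RGBA tuples so the comparison does not depend on the color spelling
line_colors = [to_rgba(line.get_color()) for line in ax.get_lines()]
```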

<div class='alert alert-success'>
<img src='img/topics/Advanced-Concept.png' align='left' style='padding:10px'>
<br>
<big>The <tt>statsmodels</tt> package can be used to fit regressions and curves.</big>
<br><br>
</div>

## Box Plots

Box Plots provide a quick statistical overview of column data.

In [None]:
auto[['mpg','hp','weight']].plot(kind='box', subplots=True, figsize=(20,8), sym='k.');

We can also use the `.boxplot()` method to perform the statistical analysis after a groupby operation.

In [None]:
auto.boxplot(column='mpg', by='origin', figsize=(12,8), sym='k.')

<div class='alert alert-info'>
<img src='img/topics/Essential-Concept.png' align='left' style='padding:10px'>
<br>
<big>Only <tt>.boxplot()</tt> supports grouping</big>
<br><br>
</div>
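
To make the grouping concrete, here is a minimal sketch on a tiny made-up table (the `mpg` and `origin` values are invented, not taken from `auto-mpg.csv`). The `by=` argument splits the column into one box per group, and the line inside each box is the group median, which can be checked against an explicit `groupby`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import pandas as pd

# Hypothetical data, invented for this illustration
cars = pd.DataFrame(
    {
        "mpg": [18, 21, 24, 30, 33, 36, 27, 29, 31],
        "origin": ["US"] * 3 + ["Asia"] * 3 + ["Europe"] * 3,
    }
)

# One box per distinct value of 'origin'
ax = cars.boxplot(column="mpg", by="origin")

# The same grouping done explicitly; these are the values drawn as median lines
medians = cars.groupby("origin")["mpg"].median()
```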

Here I'm using a timeseries and grouping/resampling by month.

Putting the grouper in a list is required to suppress an error message.

In [None]:
pit = pd.read_csv('data/pittsburgh2013.csv', parse_dates=['Date'], index_col='Date')
# pd.TimeGrouper was removed from Pandas; pd.Grouper is its replacement
pit.boxplot(column='Mean TemperatureF', by=[pd.Grouper(freq='M')], figsize=(12,8), rot=45, sym='k.')

## Histograms

In [None]:
pit['Max TemperatureF'].plot(kind='hist', figsize=(15,8))

The Kernel Density Plot is closely related to a histogram.

In [None]:
pit['Max TemperatureF'].plot(kind='kde', figsize=(15,8))
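
The number of histogram bins can be set with `bins=` (Pandas uses 10 by default). The sketch below uses synthetic temperatures drawn from a normal distribution, not the Pittsburgh data, and checks that every observation lands in one of the bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import numpy as np
import pandas as pd

# Synthetic temperatures, generated for illustration only
rng = np.random.default_rng(1)
temps = pd.Series(rng.normal(loc=60, scale=15, size=365), name="temp")

# 20 equal-width bins instead of the default 10
ax = temps.plot(kind="hist", bins=20)

# Each bar's height is the number of observations in that bin
bar_heights = [patch.get_height() for patch in ax.patches]
```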

## 2D Histograms

2D histograms (also known as hexbins) have a reasonable default bin size, which can be changed with `gridsize=`, the number of hexagons along the x-axis.

In [None]:
auto.plot(kind='hexbin', x='weight', y='hp', gridsize=10, figsize=(12,8))
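
In Matplotlib's `hexbin`, which Pandas calls here, the `gridsize` argument sets how many hexagons span the x-axis. The sketch below compares a coarse and a fine grid on random points generated for illustration, counting the hexagons through the collection that `hexbin` adds to the axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import numpy as np
import pandas as pd

# Random points, generated for illustration only
rng = np.random.default_rng(2)
pts = pd.DataFrame({"x": rng.uniform(size=2000), "y": rng.uniform(size=2000)})

coarse = pts.plot(kind="hexbin", x="x", y="y", gridsize=5)
fine = pts.plot(kind="hexbin", x="x", y="y", gridsize=25)

# The first collection on each Axes holds the hexagons, one offset per hexagon
n_coarse = len(coarse.collections[0].get_offsets())
n_fine = len(fine.collections[0].get_offsets())
```

A smaller `gridsize` gives fewer, larger hexagons, which smooths sparse data at the cost of detail.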

## Bar Charts

Bar charts can be constructed from one or more columns of numeric data. The important part is that the X-axis data must be in the Index.

In [None]:
auto.head()

Pandas groupby is a convenient way to assign an index along with performing aggregation. What was the average MPG per year?

In [None]:
auto.groupby('yr')['mpg'].mean().plot(kind='bar', figsize=(16,8))

<div class='alert alert-info'>
<img src='img/topics/Best-Practice.png' align='left' style='padding:10px'>
<br>
<big>Separating Pandas operations from plotting is highly recommended. Greater customization of the plot can be achieved.</big>
<br><br>
</div>
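
A minimal sketch of this separation, using a tiny made-up table rather than `auto-mpg.csv`: the aggregation is stored in its own variable, plotted, and then the returned `Axes` is customized.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import pandas as pd

# Hypothetical data, invented for this illustration
cars = pd.DataFrame(
    {
        "yr": [70, 70, 71, 71, 72, 72],
        "mpg": [18.0, 20.0, 21.0, 25.0, 24.0, 28.0],
    }
)

mean_mpg = cars.groupby("yr")["mpg"].mean()  # step 1: Pandas operation
ax = mean_mpg.plot(kind="bar")               # step 2: plotting
ax.set_ylabel("mean mpg")                    # step 3: customization

# The bar heights are exactly the aggregated values, in index order
bar_heights = [patch.get_height() for patch in ax.patches]
```

Keeping `mean_mpg` around also makes it easy to inspect the numbers behind the chart before plotting them.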

The standard deviation can be added as an error bar.

In [None]:
auto.groupby('yr')['mpg'].agg(['mean','std']).plot(kind='bar', yerr='std', figsize=(16,8))
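
The same pattern on a tiny made-up table shows what `yerr='std'` does: the named column supplies the error-bar lengths, and the remaining `mean` column supplies the bar heights. The data below is invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import pandas as pd

# Hypothetical data, invented for this illustration
cars = pd.DataFrame(
    {
        "yr": [70, 70, 71, 71, 72, 72],
        "mpg": [18.0, 20.0, 21.0, 25.0, 24.0, 28.0],
    }
)

# Two aggregations at once: columns 'mean' and 'std', indexed by year
stats = cars.groupby("yr")["mpg"].agg(["mean", "std"])

# 'std' is looked up by name and drawn as error bars on the 'mean' bars
ax = stats.plot(kind="bar", yerr="std")

# The first bars drawn correspond to the 'mean' column, in index order
mean_heights = [patch.get_height() for patch in ax.patches[:3]]
```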

Multiple columns can be plotted alongside each other.

In [None]:
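# select several columns with a list (double brackets); a bare tuple of names is no longer accepted
monthly = pit.resample('M')[['Min TemperatureF', 'Mean TemperatureF', 'Max TemperatureF']].mean()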
monthly.plot(kind='bar', figsize=(16,8), rot=45)

<div class='alert alert-info'>
<img src='img/topics/Best-Practice.png' align='left' style='padding:10px'>
<br>
<big>In the Matplotlib section we'll learn how to manipulate and combine plots made with <tt>.plot()</tt></big>
<br><br>
</div>

# Exercise

<img src='img/topics/Exercise.png' align='left' style='padding:10px'>
<br><big>
Make bar charts using the Olympic Medals data set.</big>
<br>
<a href="./Pandas_bar_ex.ipynb" class='btn btn-primary'>Bar Charts</a>


<img src='img/copyright.png'>