# Plotting in Python

## What We're Covering Today

- Matplotlib
- Seaborn
- Other Popular Libraries

# Matplotlib

In [None]:
import matplotlib.pyplot as plt

It may look a little strange, but this is the conventional way to import matplotlib's plotting interface.

In [None]:
# First plot
plt.figure()
x = [1, 2, 3, 4, 5]
y = [1, 6, 9, 6, 1]
plt.plot(x, y)
plt.show()

What is happening here:

- `plt.figure()`: Create a new figure
- `plt.plot(...)`: Add a plot to this figure
- `plt.show()`: Render the plot

When you run a script from a terminal, `plt.show()` opens a new window, and code execution pauses until you close that window.

If you use **IPython**, you can run the `%matplotlib` magic to enable interactive plotting.

In a Jupyter notebook, use `%matplotlib inline` or `%matplotlib notebook`
- Then there's no need for `plt.show()`: any in-progress plots are shown automatically when a cell finishes executing

In [None]:
#%matplotlib inline
plt.figure()
plt.plot(x,y)

In [None]:
%matplotlib notebook
plt.figure()
plt.plot(x,y)

Why use `plt.figure`?

If you don't create a new figure explicitly, new plots are added to the current figure.
- If no figure exists yet, one is created automatically
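A small sketch of this behavior on made-up data: `plt.get_fignums()` lists the numbers of the currently open figures, and `Axes.get_lines()` returns the lines drawn on an Axes.

```python
import matplotlib.pyplot as plt

plt.close("all")                      # start from a clean slate

plt.plot([1, 2, 3], [1, 4, 9])        # no figure exists yet, so one is created
plt.plot([1, 2, 3], [9, 4, 1])        # reuses the current figure and axes
lines_on_first = len(plt.gca().get_lines())

plt.figure()                          # explicitly start a second figure
plt.plot([1, 2, 3], [2, 2, 2])        # this line goes on the new figure only
lines_on_second = len(plt.gca().get_lines())

open_figures = plt.get_fignums()
print(open_figures, lines_on_first, lines_on_second)   # [1, 2] 2 1
plt.close("all")
```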

In [None]:
plt.figure()
x = [1, 2, 3, 4, 5]
y = [1, 6, 9, 6, 1]
y2 = [5, 6, 7, 6, 5]
plt.plot(x,y)
plt.plot(x,y2)

Now, let's load some more interesting data.

In [None]:
from __future__ import print_function # For the python2 people
import pandas as pd # This is typically how pandas is loaded
airlines = pd.read_table("airlines.txt")
airports = pd.read_table("airports.txt")
flights = pd.read_table("flights.txt")
planes = pd.read_table("planes.txt")
weather = pd.read_table("weather.txt")

Let's look at the weather table. It contains information on the weather at each of the three origin airports for every hour of 2013.

In [None]:
weather

In [None]:
# Let's get the total precipitation for each month
monthly_precip = weather.groupby(['origin', 'month'])['precip'].sum().reset_index()
ewr_precip = monthly_precip.loc[monthly_precip.origin == 'EWR'].sort_values(['month']).precip.values
lga_precip = monthly_precip.loc[monthly_precip.origin == 'LGA'].sort_values(['month']).precip.values
jfk_precip = monthly_precip.loc[monthly_precip.origin == 'JFK'].sort_values(['month']).precip.values

print(ewr_precip)
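The three filter-and-sort lines can also be collapsed into a single `pivot_table` call, which produces a month-by-airport table. A sketch on a tiny made-up stand-in for `weather` (same column names, fake numbers):

```python
import pandas as pd

# Hypothetical stand-in for the weather table: a few hourly rows per airport
weather_demo = pd.DataFrame({
    "origin": ["EWR", "EWR", "JFK", "JFK", "LGA", "LGA", "EWR", "JFK", "LGA"],
    "month":  [1, 1, 1, 2, 1, 2, 2, 2, 2],
    "precip": [0.1, 0.2, 0.0, 0.3, 0.05, 0.1, 0.4, 0.0, 0.2],
})

# Rows: month, columns: airport, values: total precipitation in that month
monthly = weather_demo.pivot_table(index="month", columns="origin",
                                   values="precip", aggfunc="sum")
print(monthly)

# Each column plays the role of ewr_precip / jfk_precip / lga_precip above
ewr_demo = monthly["EWR"].values
```

`monthly.plot()` would then draw one line per airport, with the actual month numbers (rather than 0 to 11) on the x-axis.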

In [None]:
# Let's add multiple line plots to the same axes
plt.figure()
plt.plot(lga_precip)
plt.plot(ewr_precip)
plt.plot(jfk_precip)

In [None]:
# Let's change their marker style
plt.figure()
plt.plot(lga_precip, 'o', markersize=10)
plt.plot(ewr_precip, 'v', markersize=10)
plt.plot(jfk_precip, '*', markersize=10)

### Line/Marker styles in Matplotlib

There are two ways to set the style of the line and of the markers drawn at each data point:

1. Specify each property as its own argument in the plot function

    ```python
    plt.plot(x, y,
            linestyle='solid', linewidth=10, color='blue',
            marker='o', markersize=5, markerfacecolor='green', markeredgecolor='red'
            )
    ```

2. Use a format-string abbreviation ([documented here](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot))

    ```python
    plt.plot(x, y, '-ob')  # Solid line, 'o' marker, blue
    ```
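To check that the two forms really are equivalent, a small sketch can draw the same style both ways and compare the resulting `Line2D` properties (the data is made up; `to_rgba` is used only to normalize `'b'` and `'blue'` to the same RGBA tuple):

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [1, 2, 3]
y = [3, 1, 2]

fig, ax = plt.subplots()
# The same style, once as a format string and once as keyword arguments
(short,) = ax.plot(x, y, "-ob")
(explicit,) = ax.plot(x, y, linestyle="solid", marker="o", color="blue")

for line in (short, explicit):
    # 'solid' is stored as '-', so both lines report the same properties
    print(line.get_linestyle(), line.get_marker(), to_rgba(line.get_color()))
plt.close(fig)
```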

In [None]:
# Let's add a legend
plt.figure()
plt.plot(lga_precip, 'o', markersize=10, label='LGA')
plt.plot(ewr_precip, 'v', markersize=10, label='EWR')
plt.plot(jfk_precip, '*', markersize=10, label='JFK')
plt.legend()

Let's add a bit more information to the plot

In [None]:
plt.xlabel('Month')
plt.ylabel('Total Precipitation (inches)')
plt.title('Precipitation Over Time')

In [None]:
# Easy to save to a variety of formats (the format is inferred from the file extension)
plt.savefig('precipitation.pdf')
plt.savefig('precipitation.svg')
plt.savefig('precipitation.png')
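`plt.savefig` saves the *current* figure, so when several figures are open it can be clearer to call `savefig` on a specific Figure object. A sketch that writes the same three formats into a temporary directory; the `dpi=150` and `bbox_inches="tight"` settings are illustrative choices, not requirements:

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])

out_dir = tempfile.mkdtemp()
saved = []
for ext in ("pdf", "svg", "png"):
    path = os.path.join(out_dir, "precipitation." + ext)
    # Format comes from the extension; dpi mainly matters for raster formats
    # such as PNG, and bbox_inches="tight" trims extra whitespace around the plot
    fig.savefig(path, dpi=150, bbox_inches="tight")
    saved.append(path)
plt.close(fig)
print(saved)
```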

### Scatter plots

`plt.scatter` vs. `plt.plot`:
- Use a scatter plot when you want each point to have its own size or color
- Let's see an example
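A sketch of what "a different size or color for every point" means, using made-up random data: `s` takes one marker area per point, and `c` takes one value per point that is mapped to a color through the colormap.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)          # made-up data
x = rng.uniform(0, 10, 50)
y = rng.uniform(0, 10, 50)
sizes = rng.uniform(10, 200, 50)        # s is the marker *area* in points**2
values = x + y                          # mapped to colors through the colormap

fig, ax = plt.subplots()
sc = ax.scatter(x, y, s=sizes, c=values, cmap="viridis")
fig.colorbar(sc, ax=ax)                 # the colorbar explains the color scale
plt.close(fig)
```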

In [None]:
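# Used like this, it looks much like plt.plot(x, y, 'o') with small markers
# (note: scatter's `s` is a marker area in points**2, while plot's markersize is a size in points)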
plt.figure()
plt.scatter(weather.temp, weather.dewp, s=5)

In [None]:
plt.figure()
plt.scatter(weather.temp, weather.dewp, c=weather.humid, s=5)

In [None]:
# Let's add an x=y line and a colorbar
plt.plot([0, 100], [0, 100], '--', color='#aaaaaa', linewidth=.5)
plt.colorbar()

**Here we can learn something about temperature, dewpoint, and humidity**

- Dewpoint is the temperature at which water vapor condenses onto a surface
- Dewpoint rises roughly linearly with temperature, but the mapping is not one-to-one
- Adding color shows that humidity accounts for the difference: the more humid the air, the closer the dewpoint is to the temperature
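The relationship can be made concrete with the Magnus approximation, a standard empirical formula (not something from this notebook's data) that estimates dewpoint from temperature and relative humidity. The constants b = 17.62 and c = 243.12 °C are one commonly used set, so treat the output as an approximation:

```python
import math

def dewpoint_f(temp_f, rel_humidity):
    """Approximate dewpoint (F) from temperature (F) and relative humidity (%),
    using the Magnus formula with b = 17.62, c = 243.12 C."""
    b, c = 17.62, 243.12
    t = (temp_f - 32) * 5 / 9                       # work in Celsius
    gamma = math.log(rel_humidity / 100) + b * t / (c + t)
    td = c * gamma / (b - gamma)
    return td * 9 / 5 + 32                          # back to Fahrenheit

# At a fixed temperature, lower humidity means a lower dewpoint
for rh in (100, 75, 50, 25):
    print(rh, round(dewpoint_f(70, rh), 1))
```

At 100% humidity the log term vanishes and the formula returns the temperature itself, which is why the most humid points sit on the x=y line.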

### Multiple plots in a figure - using subplot

In [None]:
plt.figure() # Create a new figure
plt.subplot(1, 2, 1) # 1 row, 2 columns, plot #1

plt.plot(lga_precip, 'o', markersize=10, label='LGA')
plt.plot(ewr_precip, 'v', markersize=10, label='EWR')
plt.plot(jfk_precip, '*', markersize=10, label='JFK')
plt.legend()
plt.xlabel('Month')
plt.ylabel('Total Precipitation (inches)')
plt.title('Precipitation Over Time')

plt.subplot(1, 2, 2) # 1 row, 2 columns, plot #2

plt.scatter(weather.temp, weather.dewp, c=weather.humid, s=5)
plt.plot([0, 100], [0, 100], '--', color='#aaaaaa', linewidth=.5)
plt.colorbar()
plt.xlabel('Temperature (F)')
plt.ylabel('Dewpoint (F)')
plt.title('Temp vs Dewpoint')

### Adjusting subplots

You'll notice that while we have two plots here, they aren't positioned very nicely.

There are three ways to fix this:

1. Give the plots more space to start with
    - When you call `plt.figure`, you can specify the figure size in inches with `figsize=(width, height)` (demo with `figsize=(10, 5)`)<br><br>
    
2. Manually specify the spacing using `plt.subplots_adjust`, which accepts these parameters:
    - left = 0.125
        - the left side of the subplots of the figure
    - right = 0.9
        - the right side of the subplots of the figure
    - bottom = 0.1
        - the bottom of the subplots of the figure
    - top = 0.9
        - the top of the subplots of the figure
    - wspace = 0.2
        - the amount of width reserved for blank space between subplots, expressed as a fraction of the average axis width
    - hspace = 0.2
        - the amount of height reserved for white space between subplots, expressed as a fraction of the average axis height<br><br>
3. Call `plt.tight_layout()` and let matplotlib work out the spacing
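The three options sketched in object-oriented form (`fig.subplots_adjust` and `fig.tight_layout` are the Figure-method counterparts of the `plt.` functions). The numbers are arbitrary; in practice you would usually pick only one of options 2 and 3, since `tight_layout` recomputes the spacing and overrides earlier manual settings.

```python
import matplotlib.pyplot as plt

# Option 1: ask for a bigger figure up front (width, height in inches)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.set_title("Left")
ax2.set_title("Right")
size_inches = fig.get_size_inches()

# Option 2: set the spacing by hand
fig.subplots_adjust(left=0.08, right=0.95, wspace=0.4)
wspace_manual = fig.subplotpars.wspace

# Option 3: let matplotlib compute the spacing (replaces the manual values)
fig.tight_layout()
plt.close(fig)
```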

In [None]:
plt.close('all') # Close all interactive plots currently open

### Histograms

Let's look at some histograms in matplotlib.

To do this, we'll use some of the data in the 'flights' table

In [None]:
flights

In [None]:
daily_departures = (flights.groupby(['origin', 'year', 'month', 'day'])
    .size()
    .reset_index())

ewr_departures = daily_departures.loc[daily_departures.origin == 'EWR'][0]  # .size() put its result in column 0
plt.figure()
plt.hist(ewr_departures, bins=30);

In [None]:
# Add multiple histograms - use 'alpha' to overlay

jfk_departures = daily_departures.loc[daily_departures.origin == 'JFK'][0]  # .size() put its result in column 0
lga_departures = daily_departures.loc[daily_departures.origin == 'LGA'][0]  # .size() put its result in column 0
plt.figure()
plt.hist(ewr_departures, bins=30, label='EWR', alpha=.6)
plt.hist(jfk_departures, bins=30, label='JFK', alpha=.6)
plt.hist(lga_departures, bins=30, label='LGA', alpha=.6)
plt.legend()
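One caveat with overlaid histograms: `bins=30` makes each call pick its own 30 bins from its own data range, so the bars of the three airports don't line up exactly. Passing the same array of bin edges to every call avoids that. A sketch with made-up daily counts standing in for the three `*_departures` series:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)   # made-up daily departure counts
samples = {
    "EWR": rng.normal(330, 30, 365),
    "JFK": rng.normal(305, 25, 365),
    "LGA": rng.normal(285, 35, 365),
}

# 30 equal-width bins spanning the combined range of all three series
all_values = np.concatenate(list(samples.values()))
edges = np.linspace(all_values.min(), all_values.max(), 31)

fig, ax = plt.subplots()
results = {}
for name, values in samples.items():
    # hist returns (counts, bin_edges, patches)
    counts, bins, _ = ax.hist(values, bins=edges, label=name, alpha=0.6)
    results[name] = (counts, bins)
ax.legend()
plt.close(fig)
```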

### Alternate plot styles

You can use `plt.style.use` to change the plotting style.

Matplotlib comes with several options built-in

In [None]:
plt.style.available

In [None]:
plt.style.use('grayscale')
plt.figure()
plt.hist(ewr_departures, bins=30, label='EWR', alpha=.6)
plt.hist(jfk_departures, bins=30, label='JFK', alpha=.6)
plt.hist(lga_departures, bins=30, label='LGA', alpha=.6)
plt.legend()
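Note that `plt.style.use` changes the settings for the rest of the session. If you only want a style for one figure, `plt.style.context` applies it temporarily inside a `with` block. A minimal sketch:

```python
import matplotlib.pyplot as plt

before = plt.rcParams["axes.prop_cycle"]

with plt.style.context("grayscale"):
    # Inside the block the grayscale settings are active
    fig, ax = plt.subplots()
    ax.hist([1, 2, 2, 3, 3, 3], bins=3)
    inside = plt.rcParams["axes.prop_cycle"]
    plt.close(fig)

# On exit, the previous settings are restored
after = plt.rcParams["axes.prop_cycle"]
print(after == before)
```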

# Seaborn

Seaborn is a plotting library that is built on top of **matplotlib**

Anything seaborn does can also be done with plain matplotlib commands; seaborn just makes it much less tedious.

Seaborn is useful for:

- Heatmaps
- Statistical plots
- Dealing with categorical variables

## Learning seaborn conventions with *jointplot*

`jointplot` is a handy function for plotting the joint distribution of two variables

In [None]:
import seaborn as sns  # This is how seaborn is usually abbreviated
tips = sns.load_dataset('tips')
tips

In [None]:
sns.jointplot(x=tips.total_bill, y=tips.tip)
#sns.jointplot(x=tips.total_bill, y=tips.tip, kind='hex')

Here's our plot! Some things to notice:

- Seaborn has already filled in the x and y labels
- Under the hood, this is an ordinary matplotlib Figure with three Axes (subplots)
- If we zoom in, the histograms zoom to follow

In [None]:
sns.jointplot(x=tips.total_bill, y=tips.tip, kind='kde')


### DataFrames and Seaborn

Seaborn makes it easy to work with DataFrames directly.

Recall that our `tips` DataFrame has columns 'total_bill' and 'tip'.

Seaborn's plotting functions also accept column names plus a `data=` argument, so you can pass the DataFrame itself:

In [None]:
# Use dataframe columns directly
sns.jointplot(x='total_bill', y='tip', data=tips)
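A minimal sketch of the two calling conventions side by side, using a tiny made-up stand-in for `tips` so it runs without downloading the dataset. `sns.scatterplot` is used here instead of `jointplot` only because it draws on a single Axes; either way, seaborn takes the axis labels from the column names.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical mini version of the tips table (made-up numbers)
tips_demo = pd.DataFrame({
    "total_bill": [10.3, 21.0, 23.7, 24.6, 25.3],
    "tip":        [1.7, 3.5, 3.3, 3.6, 4.7],
})

fig, (ax1, ax2) = plt.subplots(1, 2)
# Passing Series objects...
sns.scatterplot(x=tips_demo["total_bill"], y=tips_demo["tip"], ax=ax1)
# ...or passing column names plus data=: the same plot
sns.scatterplot(x="total_bill", y="tip", data=tips_demo, ax=ax2)

labels = (ax2.get_xlabel(), ax2.get_ylabel())
print(labels)
plt.close(fig)
```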


In [None]:
# Pairplot
sns.pairplot(vars=['total_bill', 'tip', 'size'], data=tips)

In [None]:
# Pairplot - add hue
sns.pairplot(vars=['total_bill', 'tip', 'size'], hue='time', data=tips)
#plt.subplots_adjust(right=.85)

### Box/violin

Box plots and violin plots both show distributions.

Let's use a new dataset for this one

In [None]:
titanic = sns.load_dataset("titanic")
titanic

In [None]:
plt.figure()
sns.boxplot(x='class', y='fare', data=titanic)
plt.ylim(0, 200);

In [None]:
plt.figure()
sns.boxplot(x='class', y='fare', hue='alive', data=titanic)
plt.ylim(0, 160)

In [None]:
plt.figure()
sns.violinplot(x='class', y='fare', hue='alive', data=titanic)
plt.ylim(0, 200);

### Styles in Seaborn

One of seaborn's early uses was simply to make matplotlib plots prettier.

Matplotlib's old defaults were ugly, but just importing seaborn used to replace them with something nicer-looking.

Matplotlib 2.0 and up now has decent-looking defaults, but seaborn still offers some nice options:

- sns.set_style(*style_name*) switches between plotting styles
    - *style_name* can be 'white', 'dark', 'whitegrid', 'darkgrid', or 'ticks'<br><br>
- sns.despine() removes the top and right axes spines<br><br>
- sns.set_context(*context_name*) scales plot elements such as text and lines
    - *context_name* can be 'paper', 'notebook', 'talk', or 'poster' (from smallest to largest)<br><br>
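To see the scaling concretely, `sns.plotting_context` returns the settings a context would apply without applying them. A small sketch comparing one of those settings, the axis-label font size, across the four contexts (seaborn's default context is 'notebook'):

```python
import seaborn as sns

# plotting_context(name) returns a dict of rc settings for that context
sizes = {name: sns.plotting_context(name)["axes.labelsize"]
         for name in ("paper", "notebook", "talk", "poster")}
print(sizes)
```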

In [None]:
sns.set_context('paper') # Make text smaller
sns.set_style("dark")
plt.figure()
sns.boxplot(x='class', y='fare', hue='alive', data=titanic)
plt.ylim(0, 160)

In [None]:
sns.set_context('talk') # Make text bigger
sns.set_style("ticks") # Like 'white', but with tick marks on the axes
plt.figure()
sns.boxplot(x='class', y='fare', hue='alive', data=titanic)
plt.ylim(0, 160)
plt.subplots_adjust(left=.2, bottom=.2)
sns.despine(offset=25)

### Using *FacetGrid* to create plots for every level of a categorical variable

Let's visualize the distribution of the number of departures per day, in a separate plot for each month


In [None]:
flights_per_day = flights.groupby(['origin', 'month', 'day']).size().reset_index()
flights_per_day

In [None]:
plt.close('all')
g = sns.FacetGrid(data=flights_per_day, col='month', col_wrap=4)
g.map(plt.hist, 0)

In [None]:
# And we can add a hue (because why not?)
g = sns.FacetGrid(data=flights_per_day, col='month', col_wrap=4, hue='origin')
g.map(plt.hist, 0, range=(200, 400), alpha=.6)

In [None]:
# That's still a little hard to read, so let's use box plots instead
g = sns.FacetGrid(data=flights_per_day, col='month', col_wrap=4)
g.map(sns.boxplot, "origin", 0, order=['EWR', 'JFK', 'LGA'])  # a fixed order keeps the boxes consistent across panels

### Heatmaps

Heatmaps are an area where seaborn really makes things easier.

You can use `plt.pcolormesh` to plot a grid of colors in matplotlib, but you'll have to set up all the tick labels manually.
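For comparison, here is a rough sketch of the matplotlib-only route. `manual_heatmap` is a hypothetical helper written for this example; the key detail is that when `pcolormesh` is given only the value grid, cell (i, j) spans x in [j, j+1] and y in [i, i+1], so tick labels belong at the cell centers (+0.5).

```python
import numpy as np
import matplotlib.pyplot as plt

def manual_heatmap(values, row_labels, col_labels, cmap="YlGnBu"):
    """Hypothetical helper: a seaborn-style heatmap built from plain matplotlib."""
    values = np.asarray(values)
    fig, ax = plt.subplots()
    mesh = ax.pcolormesh(values, cmap=cmap)
    # Put one tick at the center of each cell, then attach the labels
    ax.set_xticks(np.arange(values.shape[1]) + 0.5)
    ax.set_yticks(np.arange(values.shape[0]) + 0.5)
    ax.set_xticklabels(col_labels)
    ax.set_yticklabels(row_labels)
    ax.invert_yaxis()            # first row at the top, as seaborn's heatmap does
    fig.colorbar(mesh, ax=ax)
    return fig, ax

fig, ax = manual_heatmap([[1, 2, 3], [4, 5, 6]], ["Mon", "Tue"], ["6", "7", "8"])
fig.canvas.draw()
ytick_text = [t.get_text() for t in ax.get_yticklabels()]
plt.close(fig)
```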

Here's an example plotting a heatmap with seaborn

In [None]:
# Number of departures per hour
# Rows - day of the week
# Columns - hour of the day
# Values - avg # of flights

def day_of_week_2013(month, day):
    """
    2013 was NOT a leap year and started on a Tuesday
    """
    days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    day_num = (month-1)*31 + day
    if month > 2: day_num -= 3
    if month > 4: day_num -= 1
    if month > 6: day_num -= 1
    if month > 9: day_num -= 1
    if month > 11: day_num -= 1
    
    return days[(day_num) % 7]

flights['Weekday'] = [day_of_week_2013(month, day) for month, day in zip(flights.month, flights.day)]

counts = flights.groupby(['Weekday', 'hour', 'day', 'month']).size().reset_index()
weekday_counts = counts.groupby(['Weekday', 'hour'])[0].mean()
weekday_counts = weekday_counts.reset_index()
weekday_counts = weekday_counts.pivot(index='Weekday', columns='hour', values=0)
weekday_counts = weekday_counts.fillna(0)
weekday_counts = weekday_counts.loc[['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']] # Sort the weekdays
weekday_counts
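Hand-rolled date arithmetic is easy to get subtly wrong, so it's worth checking `day_of_week_2013` against Python's `datetime` for every day of 2013. The function is copied here so the sketch is self-contained; `date.weekday()` returns 0 for Monday through 6 for Sunday, independent of locale.

```python
import datetime

def day_of_week_2013(month, day):
    """Copy of the helper above: 2013 was not a leap year and started on a Tuesday."""
    days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    day_num = (month - 1) * 31 + day
    if month > 2: day_num -= 3
    if month > 4: day_num -= 1
    if month > 6: day_num -= 1
    if month > 9: day_num -= 1
    if month > 11: day_num -= 1
    return days[day_num % 7]

DAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Compare against the standard library for all 365 days
start = datetime.date(2013, 1, 1)
mismatches = []
for offset in range(365):
    d = start + datetime.timedelta(days=offset)
    if day_of_week_2013(d.month, d.day) != DAYS[d.weekday()]:
        mismatches.append(d)
print(len(mismatches))   # → 0
```

In pandas, a vectorized alternative would be to build the dates with `pd.to_datetime(flights[['year', 'month', 'day']])` and call `.dt.day_name()` on the result.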

In [None]:
plt.figure()
sns.heatmap(weekday_counts)

That's... OK, but it still needs a little work.

In [None]:
plt.figure()
data2 = weekday_counts.drop([1, 5, 22, 23], axis='columns')
sns.heatmap(data2, cmap='YlGnBu', vmin=15, vmax=75)
plt.yticks(rotation=20)
plt.ylabel("")
plt.tight_layout()

In [None]:
plt.figure()
data2 = weekday_counts.drop([1, 5, 22, 23], axis='columns')
sns.heatmap(data2, cmap='YlGnBu', vmin=15, vmax=75, annot=True, square=True, cbar=False)
plt.yticks(rotation=20)
plt.ylabel("")
plt.tight_layout()

## Clustermap - Heatmap+Dendrogram

Let's see if we can cluster the days based on how they deviate from the average number of flights for that weekday and hour.

In [None]:
# Get an idea of how every day deviates from the average number of flights in that day of the week
counts = flights.groupby(['Weekday', 'hour', 'day', 'month']).size().reset_index()
counts['avg'] = [weekday_counts.loc[weekday, hour] for weekday, hour in zip(counts.Weekday, counts.hour)]
counts['deviation'] = counts[0] - counts['avg']
counts['date'] = ["{}/{}".format(month, day) for month, day in zip(counts.month, counts.day)]
counts
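The row-by-row lookup into `weekday_counts` can also be written with `groupby(...).transform('mean')`, which broadcasts each group's mean back onto every row of that group; for the (weekday, hour) groups present in `counts`, it should give the same averages. A sketch on made-up data (the count column is called `n` here instead of `0`):

```python
import pandas as pd

# Made-up counts: flights per (weekday, hour, date), shaped like `counts` above
counts_demo = pd.DataFrame({
    "Weekday": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday"],
    "hour":    [6, 6, 7, 6, 6],
    "date":    ["1/7", "1/14", "1/7", "1/1", "1/8"],
    "n":       [50, 40, 30, 20, 24],
})

# Mean of n within each (Weekday, hour) group, aligned to the original rows
counts_demo["avg"] = counts_demo.groupby(["Weekday", "hour"])["n"].transform("mean")
counts_demo["deviation"] = counts_demo["n"] - counts_demo["avg"]
print(counts_demo)
```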

In [None]:
data = counts.pivot(index='date', columns='hour', values='deviation').fillna(0)
cm = sns.clustermap(data, col_cluster=False, vmin=-10, vmax=10)
plt.sca(cm.ax_heatmap) # More on this in a minute
plt.yticks(rotation=0);

# Advanced Matplotlib




## Pyplot vs Objects

There are two ways to interact with Matplotlib plots

When we call `pyplot.plot` (usually written `plt.plot`) to draw a plot, we're using the pyplot state machine.

This was developed to give a MATLAB-like interface to the plotting system in matplotlib.

`plt.plot` creates a plot using the **current axes** in the **current figure**.

We could also call methods on these Figures and Axes directly.
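A minimal side-by-side sketch of the two styles; the titles are placeholders. `plt.subplots()` creates a Figure and an Axes in one call and returns both, which is the usual starting point for the object-oriented style.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 6, 9, 6, 1]

# State-machine style: pyplot tracks the "current" figure and axes for us
plt.figure()
plt.plot(x, y)
plt.title("pyplot style")
pyplot_title = plt.gca().get_title()

# Object-oriented style: keep references and call methods on them
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("object style")   # plt.title -> ax.set_title, plt.xlabel -> ax.set_xlabel, ...

print(pyplot_title, ax.get_title())
plt.close("all")
```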

### Figures and Axes

The image below is a single Figure object, with multiple Axes.

![Figures/Axes](subplots.png)

When we call `plt.plot`, it's actually calling the `plot` method of the current Axes object.


*Snippet from the matplotlib.pyplot source on Github*
![Pyplot_plot](pyplot_plot.png)

Focusing on the red-underlined parts, you can see the function really does just two things:

1. Get the **current axes** by calling `gca()`
2. Call `ax.plot(...)` on that Axes, passing the arguments through

*The rest has to do with the 'hold' state, which determines whether new plots are added to a figure or replace the current plot (`hold` was deprecated in matplotlib 2.0 and removed in 3.0)*
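Those two steps can be mimicked with a tiny wrapper. `my_plot` is a hypothetical name, and this is a simplified sketch of the idea rather than matplotlib's actual source:

```python
import matplotlib.pyplot as plt

def my_plot(*args, **kwargs):
    """Simplified stand-in for plt.plot: find the current Axes, then delegate."""
    ax = plt.gca()                   # 1. get (or create) the current Axes
    return ax.plot(*args, **kwargs)  # 2. forward everything to Axes.plot

plt.figure()
lines = my_plot([1, 2, 3], [3, 1, 2], "--r")
same_axes = lines[0].axes is plt.gca()   # the new line lives on the current Axes
print(same_axes)   # True
plt.close("all")
```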

In [None]:
plt.figure()
ax = plt.gca() # 'gca - Get Current Axes'
ax.plot([1, 2, 3, 4, 5], [1, 3, 5, 3, 1])

In [None]:
fig = plt.gcf() # gcf - 'Get Current Figure'
repr(fig)

The Figure and Axes objects are the handles through which we can access and modify the various parts of a plot:

![Anatomy of a Figure](anatomy_of_a_figure.png)

In [None]:
fig = plt.figure(figsize=(8, 5)) # Create a new figure
ax1 = plt.subplot(1, 2, 1) # Create a subplot (returns an axes)
ax2 = plt.subplot(1, 2, 2) # Create the other subplot (returns an axes)

sns.boxplot(x='class', y='fare', hue='alive', data=titanic, ax=ax2) # Create our titanic plot, tell seaborn to use axes 2

ax1.plot(lga_precip, 'o', markersize=10)
ax1.plot(ewr_precip, 'v', markersize=10)
ax1.plot(jfk_precip, '*', markersize=10)

ax1.set_xlabel("Month") # same as plt.xlabel
ax1.set_ylabel("Precipitation") # same as plt.ylabel
ax1.set_title("Precipitation per month")

ax2.set_title("Titanic Survival")

fig.suptitle("Some plots!")

fig.subplots_adjust(wspace=.5)

Seaborn draws onto these same Figure and Axes objects, so this also gives us a way to modify the plots that seaborn creates.

In [None]:
# Use dataframe columns directly
jp = sns.jointplot(x='total_bill', y='tip', data=tips)
ax = jp.ax_joint # jointplot has 3 axes.  The main one is in this variable

rot = 0
for tick in ax.get_yticklabels(): # Rotate each Y tick label 45 degrees more than the previous one
    tick.set_rotation(rot)
    rot += 45 
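
The loop above deliberately gives each label a different angle. To rotate all of them by the same amount, `plt.setp` can set a property on a whole list of artists in one call. A sketch on made-up data:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [10, 20, 30])
fig.canvas.draw()                      # make sure the tick labels exist

# Each tick label is a Text object, so setp can update them all at once
plt.setp(ax.get_yticklabels(), rotation=45)
rotations = [t.get_rotation() for t in ax.get_yticklabels()]
print(rotations)
plt.close(fig)
```

`ax.tick_params(axis='y', labelrotation=45)` is an alternative that also applies to tick labels created later, for example after zooming.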
    

## Gridspec for more complicated figure layouts

So far, our subplots have all used Axes that evenly divide the figure area.

If you want a more complicated layout (like the one seaborn builds for `jointplot` above), you can use `GridSpec`.

In [None]:
from matplotlib import gridspec
plt.figure()
gs = gridspec.GridSpec(2, 2,
                       width_ratios=[1,2],
                       height_ratios=[4,1]
                       )

ax1 = plt.subplot(gs[0])
ax2 = plt.subplot(gs[1], sharey=ax1)
ax3 = plt.subplot(gs[2], sharex=ax1)
ax4 = plt.subplot(gs[3], sharex=ax2, sharey=ax3)

for ax in [ax1, ax2, ax3, ax4]:
    ax.plot(lga_precip, 'o', markersize=10)
    ax.plot(ewr_precip, 'v', markersize=10)
    ax.plot(jfk_precip, '*', markersize=10)


In [None]:
# Or even crazier layouts

plt.figure()
gs = gridspec.GridSpec(3, 3)

ax1 = plt.subplot(gs[0, :])   # Use row 1, all columns
ax2 = plt.subplot(gs[1,:-1])  # Use row 2, all columns but the last one
ax3 = plt.subplot(gs[1:, -1]) # Use row 2&3, only the last column
ax4 = plt.subplot(gs[-1,0])   # Use the last row, first column
ax5 = plt.subplot(gs[-1,-2])  # Use the last row, second column

plt.subplots_adjust(wspace=.3, hspace=.3)
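Putting the pieces together, here is a sketch of a jointplot-style layout built by hand: a scatter plot with marginal histograms that share its axes. It uses `fig.add_subplot(gs[...])`, the object-oriented counterpart of `plt.subplot(gs[...])`. The ratios, spacing, and made-up data are arbitrary choices, and seaborn's own implementation differs in detail.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec

rng = np.random.default_rng(2)            # made-up correlated data
x = rng.normal(size=500)
y = x + rng.normal(scale=0.5, size=500)

fig = plt.figure(figsize=(6, 6))
# 2x2 grid: a big joint panel with thin marginal panels on top and right
gs = gridspec.GridSpec(2, 2, width_ratios=[4, 1], height_ratios=[1, 4],
                       wspace=0.05, hspace=0.05)
ax_joint = fig.add_subplot(gs[1, 0])
ax_top = fig.add_subplot(gs[0, 0], sharex=ax_joint)
ax_right = fig.add_subplot(gs[1, 1], sharey=ax_joint)

ax_joint.scatter(x, y, s=5)
ax_top.hist(x, bins=30)
ax_right.hist(y, bins=30, orientation="horizontal")

# Hide marginal tick labels that would duplicate the joint axes
ax_top.tick_params(labelbottom=False)
ax_right.tick_params(labelleft=False)
plt.close(fig)
```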

# Other plotting tools

- [Plotly](https://plot.ly/python/) (Interactive HTML/JS plots)
- [Bokeh](http://bokeh.pydata.org/en/latest/docs/gallery/les_mis.html) (More interactive HTML/JS plots)
- [ggpy](https://github.com/yhat/ggpy) (ggplot-style plotting in Python)