# CHAPTER 9
# Plotting and Visualization
- Making informative visualizations is one of the most important tasks in data analysis. 
- It may be part of the exploratory process (for example, to help identify outliers or needed data transformations) or a way of generating ideas for models.
- This chapter will focus on [**matplotlib**](https://matplotlib.org/) and [**seaborn**](http://seaborn.pydata.org/).
- The matplotlib gallery and documentation are the best resource for learning advanced features.
- To use interactive plotting in a Jupyter notebook, run the following magic command in a cell:
        %matplotlib notebook

## A Brief matplotlib API Primer

In [29]:
# Import libraries
import matplotlib.pyplot as plt
import numpy as np
from numpy.random import randn
%matplotlib notebook

In [30]:
# Create dummy data for a line plot
data = np.arange(10)

plt.plot(data)

[<matplotlib.lines.Line2D at 0x1982e277a00>]

### Figures and Subplots
- Plots in **matplotlib** reside within a **Figure** object. 
- You can create a new figure with **plt.figure**.
- **plt.figure** has a number of options; for example, **figsize** sets the figure's width and height in inches, which guarantees a certain size and aspect ratio when the figure is saved to disk.
- You can’t make a plot with a blank figure; you first have to create one or more subplots using **add_subplot**.
- With the default inline backend in **Jupyter notebooks**, plots are reset after each cell is evaluated, so for more complex plots you must put all of the plotting commands in a single notebook cell.
- When you issue a plotting command like **plt.plot**, matplotlib draws on the last figure and subplot used (creating one if necessary), thus hiding the figure and subplot creation.
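A minimal sketch of the first points above: **figsize** is given in inches, and a new figure contains no subplots until **add_subplot** is called. It assumes a standalone script, so it selects the non-interactive Agg backend with **matplotlib.use("Agg")**; inside the notebook you would skip that line and keep **%matplotlib notebook**.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running as a plain script, not in the notebook
import matplotlib.pyplot as plt

# figsize is (width, height) in inches
fig = plt.figure(figsize=(8, 4))
print(fig.get_size_inches())  # [8. 4.]

# A freshly created figure is blank: it contains no Axes yet
print(len(fig.axes))  # 0

# add_subplot creates the Axes that data will actually be drawn on
ax = fig.add_subplot(1, 1, 1)
print(len(fig.axes))  # 1
```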

In [31]:
# Create a new empty Figure object
fig = plt.figure()

# Create subplots in a 2x2 grid (room for 4 subplots; only 3 are created here)
ax1 = fig.add_subplot(2, 2, 1) # first subplot
ax2 = fig.add_subplot(2, 2, 2) # second subplot
ax3 = fig.add_subplot(2, 2, 3) # third subplot

In [32]:
# When you run plt.plot, matplotlib draws on the last figure and subplot used
# In our case this is the third subplot created above (ax3)
plt.plot(np.random.randn(50).cumsum(), 'k--')

[<matplotlib.lines.Line2D at 0x1982e296df0>]

- The **'k--'** is a style option instructing matplotlib to plot a black **dashed** line. 
- The objects returned by **fig.add_subplot** are **AxesSubplot objects**; you can plot directly on the other, still empty subplots by calling each one’s instance methods.
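The sketch below restates the two cells above in one place and counts what landed on each subplot: the pyplot-level **plt.plot** call goes to the most recently created subplot, while instance methods target a subplot explicitly. As before, **matplotlib.use("Agg")** is an assumption for running outside the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running as a plain script, not in the notebook
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)

# The pyplot-level call draws on the most recently created subplot (ax3)
plt.plot(np.random.randn(50).cumsum(), 'k--')

# Instance methods draw on the subplot they are called on
ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))

# 20 histogram bars on ax1, 1 scatter collection on ax2, 1 line on ax3
print(len(ax1.patches), len(ax2.collections), len(ax3.lines))  # 20 1 1
```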

In [33]:
# Plot a histogram in the first AxesSubplot object from above
ax1.hist(np.random.randn(100), bins=20, color='k', alpha=0.3)

# Plot a scatter plot in the second AxesSubplot object from above
ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.randn(30))

<matplotlib.collections.PathCollection at 0x1982f9938e0>

- Creating a figure with a grid of subplots is a very common task, so matplotlib includes a convenience method, **plt.subplots**, that creates a new figure and returns a NumPy array containing the created subplot objects.
- This is very useful, as the axes array can be easily indexed like a two-dimensional array; for example, axes[0, 1]. 
- You can also indicate that subplots should have the same x- or y-axis using **sharex** and **sharey**, respectively.
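A short sketch of **plt.subplots** with shared axes: the returned **axes** array is indexed like any 2-D NumPy array, and with **sharex=True** changing the x-limits of one subplot changes them for every subplot. The Agg backend line is again an assumption for script use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running as a plain script, not in the notebook
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 3, sharex=True, sharey=True)
print(axes.shape)  # (2, 3)

# Index the array like a 2-D NumPy array: row 0, column 1
axes[0, 1].plot(np.arange(10))

# With sharex=True, setting the x-limits on one subplot updates all of them
axes[0, 0].set_xlim(0, 5)
print(axes[1, 2].get_xlim())
```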

In [34]:
# Create a grid of subplots
fig, axes = plt.subplots(2, 3)
axes

array([[<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>],
       [<AxesSubplot:>, <AxesSubplot:>, <AxesSubplot:>]], dtype=object)

**TABLE**: pyplot.subplots options

| Argument | Description |
| :--- | :--- |
| nrows | Number of rows of subplots |
| ncols | Number of columns of subplots |
| sharex | All subplots should use the same x-axis ticks (adjusting the xlim will affect all subplots) |
| sharey | All subplots should use the same y-axis ticks (adjusting the ylim will affect all subplots) |
| subplot_kw | Dict of keywords passed to the add_subplot call used to create each subplot |
| \*\*fig_kw | Additional keyword arguments to subplots are passed on when creating the figure, such as plt.subplots(2, 2, figsize=(8, 6)) |
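A minimal sketch of the last two table rows: **figsize** is not a parameter of **subplots** itself, so it travels through **\*\*fig_kw** to the figure, while **subplot_kw** is forwarded to every **add_subplot** call. Requesting polar axes via **projection** is just one illustrative choice, and the Agg backend line is an assumption for script use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running as a plain script, not in the notebook
import matplotlib.pyplot as plt

# figsize goes to the figure (**fig_kw); subplot_kw goes to each add_subplot call
fig, axes = plt.subplots(1, 2, figsize=(8, 3),
                         subplot_kw={'projection': 'polar'})

print(fig.get_size_inches())     # [8. 3.]
print([ax.name for ax in axes])  # ['polar', 'polar']
```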

### Adjusting the spacing around subplots
- By default **matplotlib** leaves a certain amount of **padding** around the outside of the subplots and spacing between subplots. 
- This spacing is all specified relative to the **height and width** of the plot, so that if you resize the plot either programmatically or manually using the GUI window, the plot will dynamically adjust itself. 
- You can change the spacing using the **subplots_adjust** method on Figure objects, also available as a top-level function:
        subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)
- **wspace** and **hspace** control the spacing between subplots as a fraction of the average subplot width and height, respectively.
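A small sketch of the Figure-method form of **subplots_adjust**. The current values can be read back from **fig.subplotpars**; all of them are fractions rather than pixels. The specific numbers are arbitrary, and the Agg backend line is an assumption for script use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running as a plain script, not in the notebook
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2)

# Figure-method form of subplots_adjust; arguments left as None keep their values
fig.subplots_adjust(left=0.05, right=0.95, wspace=0, hspace=0.5)

params = fig.subplotpars
print(params.left, params.right, params.wspace, params.hspace)
```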

In [35]:
# Create a grid of 4 subplots
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for i in range(2):
    for j in range(2):
        axes[i, j].hist(np.random.randn(500), bins=50, color='k', alpha=0.5)

# Adjust the spacing all the way to zero
plt.subplots_adjust(wspace=0, hspace=0)

### Colors, Markers, and Line Styles
- Matplotlib’s main **plot** function accepts arrays of x and y coordinates and optionally a string abbreviation indicating color and line style.
- There are a number of **color abbreviations** provided for commonly used colors, but you can use any color on the spectrum by specifying its hex code (for example, '#CECECE').
        ax.plot(x, y, 'g--')  # plot x vs y with green dashes
        ax.plot(x, y, linestyle='--', color='g')  # specify linestyle & color separately
- Line plots can additionally have **markers** to highlight the actual data points. The marker can be part of the **style string**, which conventionally lists the color, then the marker type, then the line style (for example, 'ko--').
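A sketch showing that a style string and the equivalent keyword arguments produce the same line, and that a hex code works for **color**. The data is made up for illustration, and the Agg backend line is an assumption for script use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running as a plain script, not in the notebook
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
import numpy as np

x = np.arange(10)
y = x ** 2
fig, ax = plt.subplots()

# Style string: green ('g'), circle markers ('o'), dashed line ('--')
line1, = ax.plot(x, y, 'go--')
# The same style spelled out with keyword arguments
line2, = ax.plot(x, y, color='g', marker='o', linestyle='dashed')
# Any color can also be given as a hex code
line3, = ax.plot(x, y + 10, color='#CECECE')

for line in (line1, line2):
    print(to_hex(line.get_color()), line.get_marker(), line.get_linestyle())
# prints "#008000 o --" for both lines
print(to_hex(line3.get_color()))  # #cecece
```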

In [36]:
# Line plot with markers: 'k' = black color, 'o' = filled circle markers, '--' = dashed line

# Create a new Figure object
plt.figure()
# Equivalent style-string form: plt.plot(randn(30).cumsum(), 'ko--')
plt.plot(randn(30).cumsum(), color='k', linestyle='dashed', marker='o')

[<matplotlib.lines.Line2D at 0x198315b5f70>]

In [38]:
# By default, consecutive points are linearly interpolated
# This can be altered with the drawstyle option

# Create an array
data = np.random.randn(30).cumsum()

# Create a new Figure object
plt.figure()

# Line plot with default linearly interpolated points
plt.plot(data, 'k--', label='Default')

# Line plot with step-wise interpolated points
plt.plot(data, 'k-', drawstyle='steps-post', label='steps-post')

# Add a legend
plt.legend(loc='best')

<matplotlib.legend.Legend at 0x19835602700>
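To compare the available step variants side by side, the sketch below draws the same random walk with each **drawstyle** and reads the legend labels back with **get_legend_handles_labels**. The Agg backend line is an assumption for script use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running as a plain script, not in the notebook
import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(30).cumsum()
fig, ax = plt.subplots()

# drawstyle controls how consecutive points are connected
for style in ['default', 'steps-pre', 'steps-mid', 'steps-post']:
    ax.plot(data, drawstyle=style, label=style)

ax.legend(loc='best')

handles, labels = ax.get_legend_handles_labels()
print(labels)  # ['default', 'steps-pre', 'steps-mid', 'steps-post']
print([line.get_drawstyle() for line in ax.lines])
```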

### Ticks, Labels, and Legends
#### Setting the title, axis labels, ticks, and ticklabels

In [41]:
# Create a new Figure object and plot a random walk
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(np.random.randn(1000).cumsum())

[<matplotlib.lines.Line2D at 0x198366df790>]

In [42]:
# To change the x-axis ticks, it’s easiest to use set_xticks and set_xticklabels
ticks = ax.set_xticks([0, 250, 500, 750, 1000])

# Set the labels for the ticks
# The rotation option sets the x tick labels at a 30-degree rotation
labels = ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'],
                            rotation=30, fontsize='small')

# Set the plot title
ax.set_title('My first matplotlib plot')

# Set the x-axis label
ax.set_xlabel('Stages')

Text(0.5, 36.283333333333324, 'Stages')
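The same plot can be configured more compactly with **Axes.set**, which passes each keyword to the matching **set_*** method (title, xlabel, xticks). This is a sketch of an alternative spelling, not a replacement for the cells above; the Agg backend line is an assumption for script use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: running as a plain script, not in the notebook
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.plot(np.random.randn(1000).cumsum())

# Axes.set applies several properties in one call (each key maps to a set_* method)
ax.set(title='My first matplotlib plot', xlabel='Stages',
       xticks=[0, 250, 500, 750, 1000])
ax.set_xticklabels(['one', 'two', 'three', 'four', 'five'],
                   rotation=30, fontsize='small')

print(ax.get_title(), '|', ax.get_xlabel())  # My first matplotlib plot | Stages
print(list(ax.get_xticks()))
```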

#### Adding legends