# Graphics and Visualization in Python

Python has a bewildering number of graphics libraries, catering to different needs. If you want a better understanding of the Python visualization landscape, see the following series of blog posts:

- [Python Data Visualization 2018: Why So Many Libraries?](https://www.anaconda.com/blog/developer-blog/python-data-visualization-2018-why-so-many-libraries/)
- [Python Data Visualization 2018: Moving Toward Convergence](https://www.anaconda.com/blog/developer-blog/python-data-visualization-moving-toward-convergence/)
- [Python Data Visualization 2018: Where Do We Go From Here?](https://www.anaconda.com/blog/developer-blog/python-data-visualization-2018-where-do-we-go-from-here/)

For simple statistical plots, however, the `matplotlib` and `seaborn` libraries suffice most of the time:

- [Matplotlib tutorial](https://matplotlib.org/3.1.1/tutorials/index.html)
- [Matplotlib gallery](http://matplotlib.org/1.2.1/gallery.html)
- [Seaborn gallery](http://stanford.edu/~mwaskom/software/seaborn/examples/index.html#example-gallery)

As shown below, `pandas` also has useful plotting functionality.

In [None]:
%matplotlib inline
import numpy as np
import numpy.random as rng
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns

In [None]:
import warnings
warnings.filterwarnings("ignore")

## Pandas

You can plot directly from a `Series` or `DataFrame`; see the [visualization docs](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html). Some interesting plots available in pandas are:

- Scatter matrix plots
- Andrews curves
- Autocorrelation plots
- Bootstrap plots
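
The four plot types listed above live in the `pandas.plotting` module. The following is a minimal sketch on synthetic data; the `demo` frame, its column names, and the `group` labels are made up for illustration and are not part of the original notebook.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import (andrews_curves, autocorrelation_plot,
                             bootstrap_plot, scatter_matrix)

# Synthetic data (an assumption for illustration): two groups, three features
gen = np.random.default_rng(0)
demo = pd.DataFrame(gen.normal(size=(60, 3)), columns=["a", "b", "c"])
demo["group"] = np.repeat(["g1", "g2"], 30)

# Pairwise scatter plots with histograms on the diagonal
sm_axes = scatter_matrix(demo[["a", "b", "c"]], figsize=(6, 6), diagonal="hist")

# One curve per row, colored by the class column
fig, ax = plt.subplots()
andrews_curves(demo, "group", ax=ax)

# Autocorrelation of a single series across lags
fig, ax = plt.subplots()
autocorrelation_plot(demo["a"], ax=ax)

# Bootstrap distributions of the mean, median and midrange
boot_fig = bootstrap_plot(demo["a"], size=30, samples=200)
```

Each function draws with Matplotlib, so the usual `figsize`, `ax`, and `plt.tight_layout()` controls still apply.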

In [None]:
url = 'https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv'
df = pd.read_csv(url)

In [None]:
df.head()

In [None]:
fig, axes = plt.subplots(1,3,figsize=(10, 6))
df.sort_values('wt').plot('model', 'wt', kind='barh', ax=axes[0])
df.sort_values('wt').plot('wt', 'mpg', kind='scatter', ax=axes[2])
df.sort_values('wt').plot('wt', 'mpg', kind='area', ax=axes[1])
plt.tight_layout()
pass

## Matplotlib

Matplotlib has a "functional" interface, similar to MATLAB's, via the `pyplot` module for simple interactive use, as well as an object-oriented interface that is useful for more complex graphics.
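
To make the difference between the two interfaces concrete, here is a small sketch that draws the same curve both ways. The variable name `t` and the titles are chosen only for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(0, 2 * np.pi, 100)

# pyplot interface: each call acts on the implicit "current" figure and axes
plt.figure(figsize=(4, 3))
plt.plot(t, np.sin(t))
plt.title("pyplot interface")

# Object-oriented interface: hold explicit Figure and Axes objects
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(t, np.sin(t))
ax.set_title("object-oriented interface")
```

The explicit `ax` handle becomes more convenient once a figure has several panels, because every call names the axes it modifies.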

### Types of plots

In [None]:
plt.hist(np.random.randn(1000), bins=np.linspace(-4,4,11))
pass

In [None]:
xs = [np.random.normal(mu, 0.5, (100)) for mu in range(5)]

In [None]:
for x in xs:
    plt.hist(x, bins=15, alpha=0.4)

In [None]:
plt.boxplot(np.random.random((6,10)))
pass

In [None]:
plt.scatter(*np.random.uniform(0.1, 0.9, (2,100)),
            s=np.random.randint(10, 200, 100), 
            c=np.random.random(100))
pass

In [None]:
x = y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2
plt.contourf(X, Y, Z, cmap=plt.cm.RdPu)
plt.axis('square')
pass

In [None]:
plt.stem(np.random.random(8))
plt.margins(0.05)
pass

In [None]:
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)

In [None]:
plt.plot(x, y)
plt.axis([0, 2*np.pi, -1.05, 1.05,])
pass

In [None]:
xs = np.c_[np.zeros(10), np.random.choice([-1,1], (10, 100)).cumsum(axis=1)]
plt.plot(xs.T)
plt.title('10 random walks', fontsize=14)
pass

### Displaying arrays

In [None]:
x = np.random.random((80, 80, 3))

In [None]:
plt.imshow(x)
pass

In [None]:
plt.imshow(x, interpolation='bicubic')
pass

In [None]:
plt.imshow(x.mean(axis=-1), cmap='bone')
pass

In [None]:
plt.imshow(x.mean(axis=-1), cmap='Reds')
plt.xticks(range(0, x.shape[1], 4))
plt.yticks(range(0, x.shape[0], 4))
plt.grid(color='white')
ax = plt.gca()
ax.set_xticklabels([])
ax.set_yticklabels([])
ax.xaxis.set_ticks_position('none')
ax.yaxis.set_ticks_position('none')
pass

### Colors

In [None]:
plt.scatter(*np.random.uniform(0.1, 0.9, (2,100)),
            s=np.random.randint(10, 200, 100), 
            c=np.random.random(100))
pass

In [None]:
plt.scatter(*np.random.uniform(0.1, 0.9, (2,100)),
            s=np.random.randint(10, 200, 100), 
            c=np.random.random(100), cmap='summer')
pass

In [None]:
plt.scatter(*np.random.uniform(0.1, 0.9, (2,100)),
            s=np.random.randint(10, 200, 100), 
            c=np.random.random(100), cmap='hsv')
pass

#### Getting a list of colors from a colormap

Calling a colormap with a value `x` in the range 0.0 ≤ x ≤ 1.0 returns the corresponding interpolated color as an RGBA tuple.

In [None]:
# sample 100 evenly spaced colors from the winter colormap and show the first three
colors = plt.cm.winter(np.linspace(0, 1, 100))
colors[:3]
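
As a small sketch of calling a colormap directly, the snippet below picks out the bottom, middle and top colors of `winter`, then samples a short palette. The names `bottom`, `middle`, `top` and `palette` are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np

# A colormap is callable: a float in [0, 1] maps to one RGBA tuple
bottom, middle, top = (plt.cm.winter(v) for v in (0.0, 0.5, 1.0))

# An array of floats returns one RGBA row per value
palette = plt.cm.winter(np.linspace(0, 1, 5))
print(palette.shape)  # (5, 4)
```

The rows of `palette` can be passed as the `c` argument of `plt.scatter` or used one at a time as the `color` of a line.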

In [None]:
plt.scatter(*np.random.uniform(0.1, 0.9, (2,100)),
            s=np.random.randint(10, 200, 100), 
            c=colors)
pass

### Styles

In [None]:
plt.style.available

In [None]:
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)

In [None]:
with plt.style.context('classic'):
    plt.plot(x, y)
    plt.axis([0, 2*np.pi, -1.05, 1.05,])

In [None]:
with plt.style.context('fivethirtyeight'):
    plt.plot(x, y)
    plt.axis([0, 2*np.pi, -1.05, 1.05,])

In [None]:
with plt.style.context('ggplot'):
    plt.plot(x, y)
    plt.axis([0, 2*np.pi, -1.05, 1.05,])

In [None]:
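# Matplotlib 3.6 renamed the seaborn styles; pick whichever name is installed
new_name, old_name = 'seaborn-v0_8-darkgrid', 'seaborn-darkgrid'
style = new_name if new_name in plt.style.available else old_name
with plt.style.context(style):
    plt.plot(x, y)
    plt.axis([0, 2*np.pi, -1.05, 1.05,])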

In [None]:
with plt.xkcd():
    plt.plot(x, y)
    plt.axis([0, 2*np.pi, -1.05, 1.05,])

### Creating your own style

Matplotlib exposes a large number of configurable options through `plt.rcParams`.

In [None]:
plt.rcParams
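
Individual settings can also be overridden temporarily instead of globally. The following is a minimal sketch using `mpl.rc_context`; the specific values chosen here are arbitrary.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

# Read one setting from the global configuration
default_width = mpl.rcParams["lines.linewidth"]

# Override settings temporarily; the previous values return when the block exits
with mpl.rc_context({"lines.linewidth": 4, "axes.grid": True}):
    fig, ax = plt.subplots()
    (line,) = ax.plot([0, 1, 2], [0, 1, 4])

restored_width = mpl.rcParams["lines.linewidth"]
```

This keeps one-off tweaks local to a single figure, whereas `plt.rcParams.update(...)` changes every subsequent plot in the session.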

In [None]:
%%file foo.mplstyle
axes.grid: True
axes.titlesize : 24
axes.labelsize : 20
lines.linewidth : 3
lines.markersize : 10
xtick.labelsize : 16
ytick.labelsize : 16

In [None]:
with plt.style.context('foo.mplstyle'):
    plt.plot(x, y)
    plt.axis([0, 2*np.pi, -1.05, 1.05,])

### Customizing plots

In [None]:
plt.rcParams.update({'font.size': 22})

fig = plt.figure(figsize=(8,6))
ax = plt.subplot(1,1,1)
plt.plot(x, y, color='red', linewidth=2, linestyle='dashed', label='sine curve')
plt.plot(x, np.cos(x), 'b-', label='cosine curve')
plt.legend(loc='best', fontsize=14)
plt.axis([0, 2*np.pi, -1.05, 1.05,])
plt.xlabel('x')
plt.ylabel('y')  # the axes show both sin(x) and cos(x)
plt.xticks([0,0.5*np.pi,np.pi,1.5*np.pi,2*np.pi], 
           [0, r'$\frac{\pi}{2}$', r'$\pi$', r'$\frac{3\pi}{2}$', r'$2\pi$'])
plt.title('Sine and Cosine Plots')
plt.text(0.45, 0.9, 'Empty space', transform=ax.transAxes, ha='left', va='top')
pass

In [None]:
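# Use a separate name so the `x` and `y` of the sine curve stay available for later cells
samples = np.random.randn(100)
plt.hist(samples, bins=25, histtype='step', density=True)
mu, sigma = stats.norm.fit(samples)
xp = np.linspace(*plt.xlim(), 100)
plt.plot(xp, stats.norm(mu, sigma).pdf(xp))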
plt.xlabel('x')
plt.ylabel('Density')
plt.title('MLE fit for normal distribution', fontsize=14)
pass

### Layouts

In [None]:
fig, axes = plt.subplots(2,2,figsize=(8,8))
axes[0,0].plot(x,y, 'r')
axes[0,1].plot(x,y, 'g')
axes[1,0].plot(x,y, 'b')
axes[1,1].plot(x,y, 'k')
for ax in axes.ravel():
    ax.margins(0.05)
pass

In [None]:
ax1 = plt.subplot2grid((3,3), (0,0), colspan=3)
ax2 = plt.subplot2grid((3,3), (1,0), colspan=2)
ax3 = plt.subplot2grid((3,3), (1,2), rowspan=2)
ax4 = plt.subplot2grid((3,3), (2,0), colspan=2)
axes = [ax1, ax2, ax3, ax4]
colors = ['r', 'g', 'b', 'k']
for ax, c in zip(axes, colors):
    ax.plot(x, y, c)
    ax.margins(0.05)
plt.tight_layout()

## Seaborn

Seaborn is a library for statistical visualization built on top of Matplotlib.

In [None]:
sns.set_context("notebook", font_scale=1.5, rc={"lines.linewidth": 2.5})

### Density plots

In [None]:
xs = rng.normal(0,1,100)

fig, axes = plt.subplots(1, 2, figsize=(8,4))
# Note: distplot is deprecated since seaborn 0.11 in favor of displot, histplot and kdeplot
sns.distplot(xs, hist=False, rug=True, ax=axes[0])
sns.distplot(xs, hist=True, ax=axes[1])
pass

### Kernel density estimate

In [None]:
sns.kdeplot(np.r_[rng.normal(0,1,50), rng.normal(4,0.8,100)])
pass

In [None]:
iris = sns.load_dataset('iris')

In [None]:
iris.head()

### Joint distribution plot

In [None]:
sns.jointplot(x='petal_length', y='petal_width', data=iris, kind='kde')
pass

### Box and violin plots

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(8,4))

sns.boxplot(x='species', y='petal_length', data=iris, ax=axes[0])
sns.violinplot(x='species', y='petal_length', data=iris, ax=axes[1])
pass

### Composite plots

In [None]:
url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv'
titanic = pd.read_csv(url)

In [None]:
titanic.head()

In [None]:
sns.lmplot(x='fare', y='survived', col='alone', row='sex', data=titanic, logistic=True)
pass

In [None]:
g = sns.PairGrid(titanic,
                 y_vars=['fare', 'age'],
                 x_vars=['sex', 'class', 'embark_town'],
                 aspect=1, height=5.5)  # `size` was renamed to `height` in seaborn 0.9
g.map(sns.stripplot, jitter=True, palette="bright")
pass

### Sometimes you need to manipulate the data frame before plotting

In [None]:
iris_tall = pd.melt(iris, id_vars='species')
iris_tall.head()
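
To see what `pd.melt` does to the shape of the data, here is a sketch on a tiny hand-made table. The values in `wide` are invented for illustration and are not taken from the iris data set.

```python
import pandas as pd

# Tiny wide table (made-up values): one column per measurement
wide = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "sepal_length": [5.1, 6.3],
    "petal_length": [1.4, 6.0],
})

# Each measurement column becomes (variable, value) rows, keyed by species
tall = pd.melt(wide, id_vars="species")
print(tall)
```

The tall layout has one row per observation and measurement, which is the form that `x='variable'` and `y='value'` arguments expect.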

In [None]:
g = sns.catplot(data=iris_tall, 
                x='variable', 
                y='value', 
                col='species',
                kind='swarm')
g.set_xticklabels(rotation=45)
pass

### Seaborn styles (themes)

In [None]:
sns.set_style('ticks')

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(8,4))
sns.distplot(xs, hist=False, rug=True, ax=axes[0]);
sns.distplot(xs, hist=True, ax=axes[1])
pass

In [None]:
sns.set_style('dark')

In [None]:
xs = rng.normal(0,1,100)

fig, axes = plt.subplots(1, 2, figsize=(8,4))
sns.distplot(xs, hist=False, rug=True, ax=axes[0]);
sns.distplot(xs, hist=True, ax=axes[1])
pass

In [None]:
sns.set_style('darkgrid')

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(8,4))
sns.distplot(xs, hist=False, rug=True, ax=axes[0]);
sns.distplot(xs, hist=True, ax=axes[1])
pass

## plotnine


If you just want a clone of `ggplot2` in Python, check out [plotnine](https://plotnine.readthedocs.io/en/latest/gallery.html).

### Grammar of Graphics

![](https://blog.gramener.com/wp-content/uploads/2018/11/7-layers-of-grammar-of-graphics-to-tell-powerful-data-stories-3.png)

Source: http://blog.gramener.com/grammar-of-graphics-data-stories/

In [None]:
from plotnine import ggplot, geom_point, aes, stat_smooth, facet_wrap, theme_xkcd
from plotnine.data import mtcars

(
    ggplot(mtcars, aes('wt', 'mpg', color='factor(gear)'))
    + geom_point()
    + stat_smooth(method='lm')
    + facet_wrap('~gear')
    + theme_xkcd()
)