# Common plots

## Introduction

In this chapter, we'll look at some of the most common plots that you might want to make, and how to create them using the most popular libraries. If you need an introduction to these libraries, see the previous chapter.

Throughout, we'll assume that the data are in a tidy format (one row per observation, one variable per column). First, though, let's import the libraries we'll need.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *
import altair as alt
from vega_datasets import data

# Set seed for reproducibility
np.random.seed(10)
# Set max rows displayed for readability
pd.set_option('display.max_rows', 6)
# Nicer matplotlib fonts
plt.style.use({'mathtext.fontset': 'stix',
               'font.family': 'STIXGeneral',
               'figure.figsize': (8, 3)})
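
As a quick illustration of the tidy format assumed throughout this chapter, here is a minimal sketch that reshapes a small wide table into tidy form with `melt`. The table and its column names (`year`, `country`, `value`) are made up for the example and are not used elsewhere in the chapter.

```python
import pandas as pd

# Hypothetical wide table: one column per country (made-up values)
wide = pd.DataFrame({
    'year': [2020, 2021],
    'US': [1.0, 2.0],
    'UK': [3.0, 4.0],
})

# Tidy version: one row per (year, country) observation,
# one column per variable
tidy = wide.melt(id_vars='year', var_name='country', value_name='value')
print(tidy)
```

Each row of `tidy` is now a single observation, which is the shape that the seaborn, plotnine, and Altair examples in this chapter expect.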

## Scatter plot

In this example, we see a simple scatter plot with categories using the cars data:

In [None]:
cars = data.cars()
cars.head()

### Matplotlib

In [None]:
colormap = plt.cm.Set1
colorst = [colormap(i) for i in
           np.linspace(0, 0.9, len(cars['Origin'].unique()))]
fig, ax = plt.subplots()
for i, origin in enumerate(cars['Origin'].unique()):
    cars_sub = cars[cars['Origin'] == origin]
    ax.scatter(cars_sub['Horsepower'],
               cars_sub['Miles_per_Gallon'],
               color=colorst[i],
               label=origin,
               edgecolor='grey')
ax.set_ylabel('Miles per Gallon')
ax.set_xlabel('Horsepower')
ax.legend()
plt.show()

### Seaborn

In [None]:
fig, ax = plt.subplots()
sns.scatterplot(data=cars,
                x="Horsepower",
                y="Miles_per_Gallon",
                hue="Origin",
                ax=ax)
ax.set_ylabel('Miles per Gallon')
ax.set_xlabel('Horsepower')
plt.show()

### Plotnine

In [None]:
(
    ggplot(cars, aes(x="Horsepower",
                     y="Miles_per_Gallon",
                     color='Origin'))
    + geom_point()
    + ylab('Miles per Gallon')
)

### Altair

(with interactivity)

In [None]:
alt.Chart(cars).mark_circle(size=60).encode(
    x='Horsepower',
    y=alt.Y('Miles_per_Gallon', axis=alt.Axis(title='Miles per Gallon')),
    color='Origin',
    tooltip=['Name', 'Origin', 'Horsepower', 'Miles_per_Gallon']
).interactive()

## Line plot

First, let's get some data on GDP growth:

In [None]:
import pandas_datareader.data as web

ts_start_date = pd.to_datetime('1999-01-01')

df = pd.concat([web.DataReader('ticker=RGDP' + x, 'econdb', start=ts_start_date)
                for x in ['US', 'UK']],
               axis=1)
df.columns = ['US', 'UK']
df.index.name = 'Date'
df = 100*df.pct_change(4)
df = pd.melt(df.reset_index(),
             id_vars=['Date'],
             value_vars=df.columns,
             value_name='Real GDP growth, %',
             var_name='Country')
df = df.set_index('Date')
df.head()
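
The `pct_change(4)` call above compares each observation with the one four rows earlier. Assuming the series is quarterly (an assumption, since the frequency is not stated here), that gives year-on-year growth. A minimal sketch with made-up index levels rather than real GDP data:

```python
import pandas as pd

# Hypothetical quarterly levels (made-up numbers, not real GDP data)
levels = pd.Series(
    [100.0, 101.0, 102.0, 103.0, 104.0, 106.05],
    index=pd.date_range('2020-01-01', periods=6, freq='QS'),
    name='level',
)

# Percentage change relative to the same quarter one year (4 rows) earlier
growth = 100 * levels.pct_change(4)
print(growth)
```

The first four entries are missing because there is no observation four quarters earlier to compare them with.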

### Matplotlib

Note that **Matplotlib** prefers wide data, with one series per column (here, one column per country), in which case we could have just run

```python
fig, ax = plt.subplots()
df.plot(ax=ax)
ax.set_title('Real GDP growth, %', loc='right')
ax.yaxis.tick_right()
```

but we are working with tidy data here, so we'll do the plotting slightly differently.

In [None]:
colormap = plt.cm.Set1
colorst = [colormap(i) for i in
           np.linspace(0, 0.9, len(df['Country'].unique()))]
fig, ax = plt.subplots()
for i, country in enumerate(df['Country'].unique()):
    df_sub = df[df['Country'] == country]
    ax.plot(df_sub.index,
            df_sub['Real GDP growth, %'],
            color=colorst[i],
            label=country,
            lw=2)
ax.set_title('Real GDP growth, %', loc='right')
ax.yaxis.tick_right()
ax.legend()
plt.show()
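
The wide-format shortcut mentioned at the start of this Matplotlib subsection is still available when the data are tidy: the frame can be pivoted back to one column per country first. Here is a minimal sketch on a small made-up frame that reuses the column names from this section. The values are illustrative only.

```python
import pandas as pd

# Small tidy frame with the same column names as above (made-up values)
tidy = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-01', '2020-04-01'] * 2),
    'Country': ['US', 'US', 'UK', 'UK'],
    'Real GDP growth, %': [1.0, 2.0, 3.0, 4.0],
})

# Pivot to wide format: one row per date, one column per country
wide = tidy.pivot(index='Date',
                  columns='Country',
                  values='Real GDP growth, %')
print(wide)
```

With the data in this shape, `wide.plot(ax=ax)` would draw one line per column, at the cost of an extra reshaping step.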

### Seaborn

Note that **seaborn** prefers not to work with an index, so we use `df.reset_index()` to turn the 'Date' index into a regular column in the snippet below:

In [None]:
fig, ax = plt.subplots()
sns.lineplot(x="Date", y="Real GDP growth, %",
             hue="Country",
             data=df.reset_index(),
             ax=ax)
ax.yaxis.tick_right()
plt.show()

### Plotnine

In [None]:
(
    ggplot(df.reset_index(), aes(x='Date',
                                 y='Real GDP growth, %',
                                 color='Country'))
    + geom_line()
)

### Altair

In [None]:
alt.Chart(df.reset_index()).mark_line().encode(
    x='Date:T',
    y='Real GDP growth, %',
    color='Country',
    strokeDash='Country',
)

## Bar chart

Let's see a bar chart, using the 'barley' dataset.

In [None]:
barley = data.barley()
barley = pd.DataFrame(barley.groupby(['site'])['yield'].sum())
barley.head()

### Matplotlib

In [None]:
fig, ax = plt.subplots()
ax.bar(barley.index, barley['yield'], 0.35)
ax.set_title('Yield', loc='left')
plt.show()

### Seaborn

In [None]:
sns.catplot(
    data=barley.reset_index(),
    kind="bar",
    x="site", y="yield"
)

### Plotnine

In [None]:
(
    ggplot(barley.reset_index(), aes(x='site', y='yield'))
    + geom_col()
)

### Altair

In [None]:
alt.Chart(barley.reset_index()).mark_bar().encode(
    x='site',
    y='yield',
).properties(
    width=alt.Step(40)  # controls width of bar.
)

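## Kernel density estimate

Here we plot a kernel density estimate of diamond carat for each cut, using a random sample of 1,000 rows from the 'diamonds' dataset.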

In [None]:
diamonds = sns.load_dataset("diamonds").sample(1000)
diamonds.head()

In [None]:
alt.Chart(diamonds).transform_density(
    density='carat',
    as_=['carat', 'density'],
    groupby=['cut']
).mark_area(fillOpacity=0.5).encode(
    x='carat:Q',
    y='density:Q',
    color='cut:N',
)
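
To make clear what a kernel density estimate is, here is a minimal NumPy sketch of a one-dimensional Gaussian KDE. It is an illustration under stated assumptions, not necessarily how Altair computes the density in the chart above: the function name `gaussian_kde_1d`, the synthetic data, and the normal-reference bandwidth rule are all choices made for this example.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # One Gaussian bump per sample, averaged over samples
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth


rng = np.random.default_rng(10)
samples = rng.normal(loc=1.0, scale=0.5, size=500)  # synthetic stand-in data
grid = np.linspace(-2, 4, 601)

# Normal-reference rule of thumb for the bandwidth (one common choice)
bandwidth = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)
density = gaussian_kde_1d(samples, grid, bandwidth)

# Trapezoidal check that the estimate integrates to roughly one
area = ((density[:-1] + density[1:]) / 2 * np.diff(grid)).sum()
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one gives a smoother curve that can hide real features.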