# Common Plots

## Introduction

In this chapter, we'll look at some of the most common plots that you might want to make--and how to create them using the most popular libraries. If you need an introduction to these libraries, see the previous chapter.

Bear in mind that for many of the **matplotlib** examples, using the `df.plot.*` syntax can get the plot you want more quickly! To be more comprehensive, the solution for any kind of data is shown in the examples below.

Throughout, we'll assume that the data are in a tidy format (one row per observation, one variable per column). Remember that all Altair plots can be made interactive by adding `.interactive()` at the end.

First, though, let's import the libraries we'll need.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *
import altair as alt
from vega_datasets import data
import os

# Set seed for reproducibility
np.random.seed(10)
# Set max rows displayed for readability
pd.set_option('display.max_rows', 6)
# Nicer matplotlib fonts
plt.style.use({'mathtext.fontset': 'stix',
               'font.family': 'STIXGeneral',
               'figure.figsize': (10, 5.5),
               'xtick.labelsize': 20,
               'ytick.labelsize': 20,
               'font.size': 20})
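
As a quick illustration of the `df.plot.*` shortcut mentioned above, here's a minimal sketch. The `toy` data frame and its column names are made up for the example; any data frame with two numeric columns would do.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical toy data, standing in for any data frame with numeric columns
rng = np.random.default_rng(10)
toy = pd.DataFrame({"x": rng.normal(size=50)})
toy["y"] = 2 * toy["x"] + rng.normal(size=50)

fig, ax = plt.subplots()
# pandas draws on the Axes we pass in and hands that same Axes back
returned_ax = toy.plot.scatter(x="x", y="y", ax=ax)
ax.set_title("Quick scatter via df.plot.scatter", loc="right")
```

Because the result is an ordinary matplotlib `Axes`, all of the tweaks used in the rest of this chapter still apply.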

## Facets

This applies to all plots, so in some sense is common! Facets, aka panels or small multiples, are ways of showing the same chart multiple times. Let's see how to achieve them in a few of the most popular plotting libraries.

We'll use the tips dataset for this.

In [None]:
df = sns.load_dataset('tips')
df.head()

### Matplotlib

There are many ways to create facets using Matplotlib, and you can get facets in any shape or size you like.

The easiest way, though, is to specify the number of rows and columns. This is achieved by specifying `nrows` and `ncols` when calling `plt.subplots`. It returns a `Figure` and an array of `Axes` objects; when both `nrows` and `ncols` are greater than 1, that array has shape `(nrows, ncols)`, so you'll usually want to flatten it to a vector before iterating over it.

In [None]:
# This part just to get some colours
colormap = plt.cm.Dark2

fig, axes = plt.subplots(nrows=1, ncols=4, sharex=True, sharey=True)
flat_axes = axes.flatten() # Not needed with 1 row or 1 col, but good to be aware of

facet_grp = list(df['day'].unique())
colorst = [colormap(i) for i in
           np.linspace(0, 0.9, len(facet_grp))]
for i, ax in enumerate(flat_axes):
    sub_df = df.loc[df['day'] == facet_grp[i]]
    ax.scatter(sub_df['tip'],
               sub_df['total_bill'],
               s=30,
               color=colorst[i])
    ax.set_title(facet_grp[i])
fig.text(0.5, 0.01, 'Tip', ha='center')
fig.text(0.0, 0.5, 'Total bill', va='center', rotation='vertical')
plt.tight_layout()
plt.show()
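
To make the point about array shapes concrete, here's a small sketch with a 2-by-3 grid (the panel titles are just placeholders). It also shows that, with a single row, the default `squeeze=True` setting already gives back a one-dimensional array.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=3, sharex=True, sharey=True)
print(axes.shape)  # the Axes come back as a 2D grid
flat_axes = axes.flatten()  # a 1D vector is easier to loop over
for i, ax in enumerate(flat_axes):
    ax.set_title(f"Panel {i}")

# With one row (or one column), the array is already 1D
fig_row, axes_row = plt.subplots(nrows=1, ncols=4)
print(axes_row.shape)
```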

Different facet sizes are possible in numerous ways. In practice, it's often better to have evenly sized facets laid out in a grid, especially when each facet shares the same x and y axes. But, just to show it's possible, here's an example that gives more space to the weekend than to weekdays using the tips dataset:

In [None]:
# This part just to get some colours
colormap = plt.cm.Dark2

fig = plt.figure(constrained_layout=True)
ax_dict = fig.subplot_mosaic(
    [['Thur', 'Fri', 'Sat', 'Sat', 'Sun', 'Sun']])
facet_grp = list(df['day'].unique())
colorst = [colormap(i) for i in
           np.linspace(0, 0.9, len(facet_grp))]
for i, grp  in enumerate(facet_grp):
    sub_df = df.loc[df['day'] == facet_grp[i]]
    ax_dict[grp].scatter(sub_df['tip'],
               sub_df['total_bill'],
               s=30,
               color=colorst[i])
    ax_dict[grp].set_title(facet_grp[i])
    ax_dict[grp].set_ylim(0., df['total_bill'].max())
    ax_dict[grp].set_xlim(0., df['tip'].max())
fig.text(0.5, 0.01, 'Tip', ha='center')
fig.text(0.0, 0.5, 'Total bill', va='center', rotation='vertical')
plt.tight_layout()
plt.show()
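
Another of the "numerous ways" is to pass relative widths to `plt.subplots` through `gridspec_kw`. The sketch below only sets up empty panels; the day names are reused from the tips data as titles and the 1:1:2:2 ratios are an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(
    nrows=1, ncols=4, sharey=True,
    # Relative widths of the four columns: weekend panels twice as wide
    gridspec_kw={"width_ratios": [1, 1, 2, 2]},
)
for ax, day in zip(axes, ["Thur", "Fri", "Sat", "Sun"]):
    ax.set_title(day)

# Width of each panel as a fraction of the figure width
widths = [ax.get_position().width for ax in axes]
```

This keeps the familiar array of `Axes`, whereas `subplot_mosaic` gives a dictionary keyed by label, which tends to be handier for irregular layouts.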

As well as using lists, you can also specify the layout using an array or using text, e.g.:

In [None]:
axd = plt.figure(constrained_layout=True).subplot_mosaic(
    """
    ABD
    CCD
    CC.
    """
)
kw = dict(ha="center", va="center", fontsize=60, color="darkgrey")
for k, ax in axd.items():
    ax.text(0.5, 0.5, k, transform=ax.transAxes, **kw)

### Seaborn

Seaborn makes it easy to quickly create facet plots. Note the use of `col_wrap`.

In [None]:
sns.relplot(
    data=df, y="total_bill", x="tip",
    col="day", hue="day", col_wrap=2,
    kind="scatter"
);

A nice feature of seaborn that is much more fiddly in (base) matplotlib is the ability to specify rows and columns separately. Here the rows are split by `smoker` and the columns by `day`:

In [None]:
sns.relplot(
    data=df, y="total_bill", x="tip",
    col="day", row="smoker", hue="smoker",
    kind="scatter"
);

### Plotnine

Plotnine has several ways to wrap facets but perhaps the most delightful is to specify a formula for the variations to be encoded in the facets.

In [None]:
(
    ggplot(df, aes(x='tip', y='total_bill', color='smoker'))
    + geom_point()
    + facet_wrap('~ smoker + day', nrow=2) # use ~ + to add additional faceting variables
)

### Altair



In [None]:
alt.Chart(df).mark_point().encode(
    x='tip:Q',
    y='total_bill:Q',
    color='smoker:N',
    facet=alt.Facet('day:N', columns=2),
).properties(
    width=200,
    height=100,
)

## Scatter plot

In this example, we see a simple scatter plot with categories using the cars data:

In [None]:
cars = data.cars()
cars.head()

### Matplotlib

In [None]:
fig, ax = plt.subplots()
for origin in cars['Origin'].unique():
    cars_sub = cars[cars['Origin'] == origin]
    ax.scatter(cars_sub['Horsepower'],
               cars_sub['Miles_per_Gallon'],
               label=origin)
ax.set_ylabel('Miles per Gallon')
ax.set_xlabel('Horsepower')
ax.legend()
plt.show()

### Seaborn

In this first example, I'll also show how to tweak the labels by using the underlying matplotlib `Axes` object, here called `ax`.

In [None]:
fig, ax = plt.subplots()
sns.scatterplot(data=cars,
                x="Horsepower",
                y="Miles_per_Gallon",
                hue="Origin",
                ax=ax)
ax.set_ylabel('Miles per Gallon')
ax.set_xlabel('Horsepower')
plt.show()

### Plotnine

In [None]:
(
    ggplot(cars, aes(x="Horsepower",
                     y="Miles_per_Gallon",
                     color='Origin'))
    + geom_point()
    + ylab('Miles per Gallon')
)

### Altair

For this first example, we'll also show how to make the altair plot interactive with movable axes and more info on mouse-hover.

In [None]:
alt.Chart(cars).mark_circle(size=60).encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    tooltip=['Name', 'Origin', 'Horsepower', 'Miles_per_Gallon']
).interactive()

## Bubble plot

This is a scatter plot where the size of the point carries an extra dimension of information.

### Matplotlib



In [None]:
fig, ax = plt.subplots()
scat = ax.scatter(cars['Horsepower'],
               cars['Miles_per_Gallon'],
               s = cars['Displacement'],
               alpha=0.4)
ax.set_ylabel('Miles per Gallon')
ax.set_xlabel('Horsepower')
ax.legend(*scat.legend_elements(prop="sizes", num=4), loc="upper right", title="Displacement", frameon=False)
plt.show()

### Seaborn



In [None]:
sns.scatterplot(data=cars,
                x="Horsepower",
                y="Miles_per_Gallon",
                size="Displacement");

### Plotnine

In [None]:
(
    ggplot(cars, aes(x="Horsepower",
                     y="Miles_per_Gallon",
                     size='Displacement'))
    + geom_point()
)

### Altair


In [None]:
alt.Chart(cars).mark_circle().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    size='Displacement'
)

## Line plot

First, let's get some data on GDP growth:

In [None]:
import pandas_datareader.data as web

ts_start_date = pd.to_datetime('1999-01-01')

df = pd.concat([web.DataReader('ticker=RGDP' + x, 'econdb', start=ts_start_date) for x in ['US', 'UK']], axis=1)
df.columns = ['US', 'UK']
df.index.name = 'Date'
df = 100*df.pct_change(4)
df = pd.melt(df.reset_index(),
             id_vars=['Date'],
             value_vars=df.columns,
             value_name='Real GDP growth, %',
             var_name='Country')
df = df.set_index('Date')
df.head()

### Matplotlib

Note that **Matplotlib** (via the **pandas** `df.plot` shortcut) works most naturally with wide data, with one column per series (here, one per country), in which case we could have just run

```python
fig, ax = plt.subplots()
df.plot(ax=ax)
ax.set_title('Real GDP growth, %', loc='right')
ax.yaxis.tick_right()
```

but we are working with tidy data here, so we'll do the plotting slightly differently.

In [None]:
colormap = plt.cm.Set1
colorst = [colormap(i) for i in
           np.linspace(0, 0.9, len(df['Country'].unique()))]
fig, ax = plt.subplots()
for i, country in enumerate(df['Country'].unique()):
    df_sub = df[df['Country'] == country]
    ax.plot(df_sub.index,
               df_sub['Real GDP growth, %'],
               color=colorst[i],
               label=country,
               lw=2)
ax.set_title('Real GDP growth, %', loc='right')
ax.yaxis.tick_right()
ax.legend()
plt.show()
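
If you'd rather use the wide-data shortcut, one option is to pivot the tidy data first. Here's a minimal sketch on a made-up data frame that borrows the column names used above; the numbers are placeholders, not real GDP figures. With the real data, you'd call `df.reset_index()` first so that `Date` is a regular column.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical tidy data with placeholder values (not real GDP growth)
tidy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-31", "2020-06-30"] * 2),
    "Country": ["US", "US", "UK", "UK"],
    "Real GDP growth, %": [1.0, 2.0, 3.0, 4.0],
})

# One row per date, one column per country
wide = tidy.pivot(index="Date", columns="Country", values="Real GDP growth, %")

fig, ax = plt.subplots()
wide.plot(ax=ax)
ax.set_title("Real GDP growth, %", loc="right")
ax.yaxis.tick_right()
```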

### Seaborn

Note that **seaborn** prefers not to work with an index value so we use `df.reset_index()` to make the 'Date' index into a regular column in the snippet below:

In [None]:
fig, ax = plt.subplots()
sns.lineplot(x="Date", y="Real GDP growth, %",
             hue="Country",
             data=df.reset_index(),
             ax=ax)
ax.yaxis.tick_right()
plt.show()

### Plotnine

In [None]:
(
    ggplot(df.reset_index(), aes(x='Date',
                                 y='Real GDP growth, %',
                                 color='Country'))
    + geom_line()
)

### Altair

In [None]:
alt.Chart(df.reset_index()).mark_line().encode(
    x='Date:T',
    y='Real GDP growth, %',
    color='Country',
    strokeDash='Country',
)

## Bar chart

Let's see a bar chart, using the 'barley' dataset.

In [None]:
barley = data.barley()
barley = pd.DataFrame(barley.groupby(['site'])['yield'].sum())
barley.head()

### Matplotlib

Just remove the 'h' in `ax.barh` to get a vertical plot.

In [None]:
fig, ax = plt.subplots()
ax.barh(barley['yield'].index, barley['yield'], 0.35)
ax.set_xlabel('Yield')
plt.show()
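
For completeness, here's what the vertical version looks like. This is a sketch on a made-up data frame (`toy`, with invented site names and yields) that has the same layout as `barley`; note that the axis label moves to the y-axis too.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical totals, indexed by site like the `barley` frame above
toy = pd.DataFrame({"yield": [3.0, 5.0, 4.0]},
                   index=pd.Index(["A", "B", "C"], name="site"))

fig, ax = plt.subplots()
# ax.bar instead of ax.barh: the third argument is now the bar width
bars = ax.bar(toy.index, toy["yield"], 0.35)
ax.set_ylabel("Yield")
```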

### Seaborn

Just switch x and y variables to get a vertical plot.

In [None]:
sns.catplot(
    data=barley.reset_index(),
    kind="bar",
    y="site", x="yield",
)

### Plotnine

Just omit `coord_flip()` to get a vertical plot.

In [None]:
(
    ggplot(barley.reset_index(), aes(x='site', y='yield'))
    + geom_col()
    + coord_flip()
)

### Altair

Just switch x and y to get a vertical plot.

In [None]:
alt.Chart(barley.reset_index()).mark_bar().encode(
    y='site',
    x='yield',
).properties(
    width=alt.Step(40)  # controls width of bar.
)

## Grouped bar chart



In [None]:
barley = data.barley()
barley = pd.DataFrame(barley.groupby(['site', 'year'])['yield'].sum()).reset_index()
barley.head()

### Matplotlib

In [None]:
labels = barley['site'].unique()
y = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots()
ax.barh(y - width/2, barley.loc[barley['year'] == 1931, 'yield'], width, label='1931')
ax.barh(y + width/2, barley.loc[barley['year'] == 1932, 'yield'], width, label='1932')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_xlabel('Yield')
ax.set_yticks(y)
ax.set_yticklabels(labels)
ax.legend(frameon=False)
plt.show()

### Seaborn

In [None]:
sns.catplot(
    data=barley,
    kind="bar",
    y="site", x="yield",
    hue="year"
)

### Plotnine

In [None]:
(
    ggplot(barley, aes(x='site', y='yield', fill='factor(year)'))
    + geom_col(position='dodge')
    + coord_flip()
)

### Altair


In [None]:
alt.Chart(barley.reset_index()).mark_bar().encode(
    y='year:O',
    x='yield',
    color='year:N',
    row='site:N'
).properties(
    width=alt.Step(40)  # controls width of bar.
)

## Stacked bar chart



### Matplotlib 

In [None]:
labels = barley['site'].unique()
y = np.arange(len(labels))  # the label locations
width = 0.35  # the width (or height) of the bars

fig, ax = plt.subplots()
ax.barh(y, barley.loc[barley['year'] == 1931, 'yield'], width, label='1931')
ax.barh(y, barley.loc[barley['year'] == 1932, 'yield'], width, label='1932', left=barley.loc[barley['year'] == 1931, 'yield'])

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_xlabel('Yield')
ax.set_yticks(y)
ax.set_yticklabels(labels)
ax.legend(frameon=False)
plt.show()
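
The `left=` trick above extends to more than two groups if you keep a running total of where each new bar should start. Here's a sketch of that idea on made-up data (two sites, three years, invented yields); pivoting to wide format first is my choice for the example rather than anything the approach requires.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical tidy data: 2 sites x 3 years, made-up yields
toy = pd.DataFrame({
    "site": ["A", "A", "A", "B", "B", "B"],
    "year": [1, 2, 3, 1, 2, 3],
    "yield": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
wide = toy.pivot(index="site", columns="year", values="yield")

fig, ax = plt.subplots()
left = np.zeros(len(wide))  # where the next segment starts, per site
for year in wide.columns:
    ax.barh(wide.index, wide[year], 0.35, left=left, label=str(year))
    left = left + wide[year].to_numpy()  # move the starting point along
ax.set_xlabel("Yield")
ax.legend(frameon=False)
```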

### Seaborn

As far as I know, there's no easy way of doing this.

### Plotnine



In [None]:
(
    ggplot(barley, aes(x='site', y='yield', fill='factor(year)'))
    + geom_col()
    + coord_flip()
)

### Altair

In [None]:
alt.Chart(barley.reset_index()).mark_bar().encode(
    y='site',
    x='yield',
    color='year:N',
).properties(
    width=alt.Step(40)  # controls width of bar.
)

## Kernel density estimate

We'll use the diamonds dataset to demonstrate this.

In [None]:
diamonds = sns.load_dataset("diamonds").sample(1000)
diamonds.head()

### Matplotlib

Technically, there is a way to do this but it's pretty inelegant if you want a quick plot. That's because **matplotlib** doesn't do the density estimation itself. [Jake Vanderplas](https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html) has a nice example but as it relies on a few extra libraries, I won't reproduce it here.
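
To give a flavour of what the density estimation step involves, here's a rough sketch that uses only **numpy**. It is not the approach from the linked example: it is a hand-rolled Gaussian kernel estimate on made-up data, and the bandwidth follows Scott's rule of thumb, which is just one common choice.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(10)
sample = rng.normal(loc=1.0, scale=0.5, size=500)  # hypothetical data

# Scott's rule of thumb for the bandwidth (an assumption; other rules exist)
bandwidth = sample.std(ddof=1) * len(sample) ** (-1 / 5)
grid = np.linspace(sample.min() - 3 * bandwidth,
                   sample.max() + 3 * bandwidth, 400)

# Average a Gaussian bump centred on every observation
z = (grid[:, None] - sample[None, :]) / bandwidth
density = np.exp(-0.5 * z**2).sum(axis=1) / (
    len(sample) * bandwidth * np.sqrt(2 * np.pi))

fig, ax = plt.subplots()
ax.plot(grid, density)
ax.fill_between(grid, density, alpha=0.3)
ax.set_xlabel("Value")
ax.set_ylabel("Density")
```

For grouped data like the diamonds example, you'd repeat this once per group, which is exactly the kind of bookkeeping the other libraries do for you.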

### Seaborn



In [None]:
sns.displot(diamonds,
            x="carat", kind="kde", hue='cut',
            fill=True);

### Plotnine



In [None]:
(
   ggplot(diamonds, aes(x='carat', fill = 'cut', colour = 'cut')) +
   geom_density(alpha=0.5)
)

### Altair

In [None]:
alt.Chart(diamonds).transform_density(
    density='carat',
    as_=['carat', 'density'],
    groupby=['cut']
).mark_area(fillOpacity=0.5).encode(
    x='carat:Q',
    y='density:Q',
    color='cut:N',
)

## Histogram or probability density function

For this, let's go back to the penguins dataset.

In [None]:
penguins = sns.load_dataset("penguins")
penguins.head()

### Matplotlib

The `density=` keyword parameter decides whether to create counts or a probability density function.

In [None]:
fig, ax = plt.subplots()
ax.hist(penguins['flipper_length_mm'], bins=30, density=True, edgecolor='k')
ax.set_xlabel('Flipper length (mm)')
ax.set_ylabel('Probability density')
fig.tight_layout()
plt.show()
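
To see what `density=` changes, the sketch below draws the same made-up sample both ways. With counts, the bar heights add up to the number of observations; with `density=True`, it is the bar areas (height times bin width) that add up to one.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(10)
values = rng.normal(size=200)  # hypothetical data

fig, (ax_count, ax_dens) = plt.subplots(ncols=2)
# ax.hist returns the bar heights, the bin edges, and the drawn patches
counts, edges, _ = ax_count.hist(values, bins=30, edgecolor="k")
dens, _, _ = ax_dens.hist(values, bins=30, density=True, edgecolor="k")
ax_count.set_ylabel("Count")
ax_dens.set_ylabel("Probability density")
fig.tight_layout()

print(counts.sum())                   # total number of observations
print((dens * np.diff(edges)).sum())  # total area under the bars
```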

### Seaborn

In [None]:
sns.histplot(data=penguins, x="flipper_length_mm", bins=30, stat='density');

### Plotnine



In [None]:
(
    ggplot(penguins, aes(x='flipper_length_mm', y=after_stat('density')))
    + geom_histogram(bins=30) # specify the number of bins
)

### Altair



In [None]:
alt.Chart(penguins).mark_bar().encode(
    alt.X("flipper_length_mm:Q", bin=True),
    y='count()',
)

## Marginal histograms



### Matplotlib

[Jake VanderPlas's excellent notes](https://jakevdp.github.io/PythonDataScienceHandbook/04.08-multiple-subplots.html) have a great example of this, but now there's an easier way to do it with Matplotlib's new `constrained_layout` options.

In [None]:
fig = plt.figure(constrained_layout=True)
# Create a layout with 3 panels in the given ratios
axes_dict = fig.subplot_mosaic([['.', 'histx'], ['histy', 'scat']],
                         gridspec_kw={'width_ratios': [1, 7],
                                      'height_ratios': [2, 7]})
# Glue all the relevant axes together
axes_dict['histy'].invert_xaxis()
axes_dict['histx'].sharex(axes_dict['scat'])
axes_dict['histy'].sharey(axes_dict['scat'])
# Plot the data
axes_dict['scat'].scatter(penguins['bill_length_mm'], penguins['bill_depth_mm'])
axes_dict['histx'].hist(penguins['bill_length_mm'])
axes_dict['histy'].hist(penguins['bill_depth_mm'], orientation='horizontal');

### Seaborn

In [None]:
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm");

### Plotnine

I couldn't find an easy way to do this in plotnine but you can make rug plots, which have some similarities in terms of information conveyed.

In [None]:
(ggplot(penguins, aes(x='bill_length_mm', y='bill_depth_mm')) +
  geom_point() +
  geom_rug())

### Altair

This is a bit fiddly.

In [None]:
base = alt.Chart(penguins)

xscale = alt.Scale(domain=(20, 60))
yscale = alt.Scale(domain=(10, 30))

area_args = {'opacity': .5, 'interpolate': 'step'}

points = base.mark_circle().encode(
   alt.X('bill_length_mm', scale=xscale),
   alt.Y('bill_depth_mm', scale=yscale)
)

top_hist = base.mark_area(**area_args).encode(
    alt.X('bill_length_mm:Q',
          # when using bins, the axis scale is set through
          # the bin extent, so we do not specify the scale here
          # (which would be ignored anyway)
          bin=alt.Bin(maxbins=30, extent=xscale.domain),
          stack=None,
          title=''
         ),
    alt.Y('count()', stack=None, title='')
).properties(height=60)

right_hist = base.mark_area(**area_args).encode(
    alt.Y('bill_depth_mm:Q',
          bin=alt.Bin(maxbins=30, extent=yscale.domain),
          stack=None,
          title='',
         ),
    alt.X('count()', stack=None, title=''),
).properties(width=60)

top_hist & (points | right_hist)

## Heatmap

Heatmaps, sometimes known as correlation maps, represent data in 3 dimensions: two axes form a grid, and the colour of each cell corresponds to a (usually) continuous value.

We'll use the flights data to show the number of passengers by month-year:

In [None]:
flights = sns.load_dataset("flights")
flights = flights.pivot(index="month", columns="year", values="passengers").T
flights.head()

### Matplotlib



In [None]:
fig, ax = plt.subplots()
im = ax.imshow(flights.values, cmap='inferno')
cbar = ax.figure.colorbar(im, ax=ax)
ax.set_xticks(np.arange(len(flights.columns)))
ax.set_yticks(np.arange(len(flights.index)))
# Labels
ax.set_xticklabels(flights.columns, rotation=90)
ax.set_yticklabels(flights.index)
plt.show()

### Seaborn

In [None]:
sns.heatmap(flights);

### Plotnine

Plotnine uses tidy data, rather than the wide data preferred by **matplotlib**, so we need to first get the original format of the flights data back:

In [None]:
flights = sns.load_dataset("flights")
(ggplot(flights, aes('month', 'factor(year)', fill='passengers'))
 + geom_tile()
 + scale_y_reverse()
)

### Altair

In [None]:
alt.Chart(flights).mark_rect().encode(
    x=alt.X('month', type='nominal', sort=None),
    y='year:O',
    color='passengers:Q'
)

## Boxplot

Let's use the tips dataset:

In [None]:
tips = sns.load_dataset("tips")
tips.head()

### Matplotlib

There isn't a very direct way to create multiple box plots of different data in matplotlib in the case where the groups are unbalanced, so we create several different boxplot objects.


In [None]:
colormap = plt.cm.Set1
colorst = [colormap(i) for i in
           np.linspace(0, 0.9, len(tips['time'].unique()))]

fig, ax = plt.subplots()
for i, grp in enumerate(tips['time'].unique()):
    bplot = ax.boxplot(tips.loc[tips['time']==grp, 'tip'],
               positions=[i],
               vert=True,  # vertical box alignment
               patch_artist=True,  # fill with color
               labels=[grp]) # X label
    for patch in bplot['boxes']:
        patch.set_facecolor(colorst[i])

ax.set_ylabel('Tip')
plt.show()
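
One alternative worth knowing about: `ax.boxplot` also accepts a list of arrays, and these can have different lengths, so unbalanced groups can go into a single call. The sketch below uses made-up groups (the names and sizes are invented) and sets the tick labels separately.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(10)
# Hypothetical unbalanced groups of different sizes
groups = {"Lunch": rng.gamma(2.0, 1.5, size=40),
          "Dinner": rng.gamma(2.0, 1.5, size=100)}

fig, ax = plt.subplots()
# One call, one box per array in the list
bplot = ax.boxplot(list(groups.values()), patch_artist=True)
ax.set_xticklabels(list(groups.keys()))
for patch, colour in zip(bplot["boxes"], plt.cm.Set1([0.0, 0.9])):
    patch.set_facecolor(colour)
ax.set_ylabel("Tip")
```

The loop in the main example gives you a separate handle on each group's artists, which can be convenient if you want to style them very differently.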

### Seaborn


In [None]:
sns.boxplot(y=tips['tip'], x=tips['time']);

### Plotnine



In [None]:
(
    ggplot(tips)
    + geom_boxplot(aes(y='tip', x='time', fill='time'))
)

### Altair

In [None]:
alt.Chart(tips).mark_boxplot(size=50).encode(
    x='time:N',
    y='tip:Q',
    color='time:N'
).properties(width=300)

## Violin plot

We'll use the same data as before, the tips dataset.

### Matplotlib

In [None]:
colormap = plt.cm.Set1
colorst = [colormap(i) for i in
           np.linspace(0, 0.9, len(tips['time'].unique()))]

fig, ax = plt.subplots()
for i, grp in enumerate(tips['time'].unique()):
    vplot = ax.violinplot(tips.loc[tips['time']==grp, 'tip'],
               positions=[i],
               vert=True)
labels = list(tips['time'].unique())
ax.set_xticks(np.arange(len(labels)))
ax.set_xticklabels(labels)
ax.set_ylabel('Tip')
plt.show()

### Seaborn

In [None]:
sns.violinplot(y=tips['tip'], x=tips['time']);

### Plotnine

In [None]:
(
    ggplot(tips, aes(x='time', y='tip', fill='time'))
    + geom_violin()
)

### Altair

In [None]:
alt.Chart(tips).transform_density(
    'tip',
    as_=['tip', 'density'],
    groupby=['time']
).mark_area(orient='horizontal').encode(
    y='tip:Q',
    color='time:N',
    x=alt.X(
        'density:Q',
        stack='center',
        impute=None,
        title=None,
        axis=alt.Axis(labels=False, values=[0],grid=False, ticks=True),
    ),
    column=alt.Column(
        'time:N',
        header=alt.Header(
            titleOrient='bottom',
            labelOrient='bottom',
            labelPadding=0,
        ),
    )
).properties(
    width=100
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
)

## Lollipop

In [None]:
df = (sns.load_dataset('planets')
         .groupby('year')['number']
         .count())
df.head()

### Matplotlib


In [None]:
fig, ax = plt.subplots()
ax.stem(df.index, df)
ax.yaxis.tick_right()
ax.set_ylim(0, 200)
ax.set_title('Number of exoplanets discovered per year', loc='right')
plt.show()

### Seaborn

I couldn't find a way to do this in **seaborn**, but the **matplotlib** version is so simple that there doesn't seem to be much call for it.

### Plotnine

In [None]:
(
  ggplot(df.reset_index(), aes(x='year', y='number')) +
  geom_point() + 
  geom_segment(aes(x='year', xend='year', y=0, yend='number')) +
  ggtitle('Number of exoplanets discovered per year')
)

### Altair

I couldn't find an easy way to do this in Altair.

## Stacked Area plot

For this, let's look at the dominance of the three most used methods for detecting exoplanets.

In [None]:
df = sns.load_dataset('planets')
most_pop_methods = (df.groupby(['method'])['number']
                      .sum()
                      .sort_values(ascending=False)
                      .index[:3]
                      .values)
df = df[df['method'].isin(most_pop_methods)]
df.head()

### Matplotlib

The easiest way to do this in matplotlib is to adjust the data a bit first and then use the built-in **pandas** plot function. (This is true in other cases too, but in this case it's much more complex otherwise).

In [None]:
(df.groupby(['year', 'method'])['number']
   .sum()
   .unstack()
   .plot
   .area(alpha=0.6)
   .set_title('Planets discovered by top 3 methods', loc='left'));
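
If you do want to stay in pure matplotlib, `ax.stackplot` will stack the series for you once the data are in wide format. Here's a sketch on a made-up wide data frame (the column names `A`, `B`, `C` and the values are invented), shaped like the output of the `unstack()` step.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical wide data: one row per year, one column per method
wide = pd.DataFrame({"A": [1, 2, 3], "B": [2, 2, 2], "C": [0, 1, 4]},
                    index=pd.Index([2000, 2001, 2002], name="year"))

fig, ax = plt.subplots()
# stackplot expects one row per series, hence the transpose
polys = ax.stackplot(wide.index, wide.to_numpy().T,
                     labels=list(wide.columns), alpha=0.6)
ax.legend(loc="upper left", frameon=False)
```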

### Seaborn

I couldn't find an option for this.

### Plotnine



In [None]:
(
    ggplot(df.groupby(['year', 'method'])['number'].sum().reset_index(),
           aes(x='year', y='number', fill='method', order='method')) + 
    geom_area(alpha=0.5)
)

### Altair


In [None]:
alt.Chart(df.groupby(['year', 'method'])['number'].sum().reset_index()).mark_area().encode(
    x="year:O",  # year is an integer, so treat it as ordinal rather than as a timestamp
    y="number:Q",
    color="method:N"
)

## Slope chart

A slope chart has two points connected by a line and is good for indicating how relationships between variables have changed over time.

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv")
df = (pd.melt(df, id_vars=['continent'], value_vars=df.columns[1:],
              value_name='GDP per capita', var_name='Year')
        .rename(columns={'continent': 'Continent'}))
df.head()

### Matplotlib

There isn't an off-the-shelf way to do this in matplotlib but the example below shows that, with matplotlib, where there's a will there's a way! It's where the 'build-what-you-want' approach comes into its own. Note that the function that's defined returns an `Axes` object so that you can do further processing and tweaking as you like.

In [None]:
from matplotlib import lines as mlines

def slope_plot(data, x, y, group, before_txt='Before', after_txt='After'):
    if(len(data[x].unique())!=2):
        raise ValueError('Slope plot must have two unique periods.')
    wide_data = data[[x, y, group]].pivot(index=group, columns=x, values=y)
    x_names = list(wide_data.columns)
    klass = ['red' if (y1-y2) < 0
             else 'green' for y1, y2 in zip(wide_data[x_names[0]], wide_data[x_names[1]])]
    fig, ax = plt.subplots()
    def newline(p1, p2, color='black'):
        ax = plt.gca()
        l = mlines.Line2D([p1[0], p2[0]], [p1[1], p2[1]],
                          color='red' if p1[1] - p2[1] > 0 else 'green',
                          marker='o', markersize=6)
        ax.add_line(l)
        return l
    
    # Vertical Lines
    y_min = data[y].min()
    y_max = data[y].max()
    ax.vlines(x=1, ymin=y_min, ymax=y_max, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
    ax.vlines(x=3, ymin=y_min, ymax=y_max, color='black', alpha=0.7, linewidth=1, linestyles='dotted')
    # Points
    ax.scatter(y=wide_data[x_names[0]], x=np.repeat(1, wide_data.shape[0]), s=15, color='black', alpha=0.7)
    ax.scatter(y=wide_data[x_names[1]], x=np.repeat(3, wide_data.shape[0]), s=15, color='black', alpha=0.7)
    # Line segments and annotation
    for p1, p2, c in zip(wide_data[x_names[0]], wide_data[x_names[1]], wide_data.index):
        newline([1,p1], [3,p2])
        ax.text(1-0.05, p1, c, horizontalalignment='right', verticalalignment='center', fontdict={'size':14})
        ax.text(3+0.05, p2, c, horizontalalignment='left', verticalalignment='center', fontdict={'size':14})
    # 'Before' and 'After' Annotations
    ax.text(1-0.05, y_max + abs(y_max)*.1, before_txt, horizontalalignment='right',
            verticalalignment='center', fontdict={'size':18, 'weight':700})
    ax.text(3+0.05, y_max + abs(y_max)*.1, after_txt, horizontalalignment='left',
            verticalalignment='center', fontdict={'size':18, 'weight':700})
    # Decoration
    ax.set(xlim=(0, 4), ylabel=y, ylim=(y_min - 0.1*abs(y_min), y_max + abs(y_max)*.1))
    ax.set_xticks([1, 3])
    ax.set_xticklabels(x_names)
    # Lighten borders
    for ax_pos in ['top', 'bottom', 'right', 'left']:
        ax.spines[ax_pos].set_visible(False)
    return ax

slope_plot(df, x='Year', y='GDP per capita', group='Continent');

### Seaborn

In [None]:
sns.pointplot(x="Year", y="GDP per capita", hue='Continent', data=df)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);

### Plotnine

In [None]:
(
ggplot(df, aes(x='Year', y = 'GDP per capita', group = 'Continent')) +
  geom_line(aes(color = 'Continent', alpha = 1), size = 2) +
  geom_point(aes(color = 'Continent', alpha = 1), size = 4)
)

### Altair

In [None]:
alt.Chart(df).mark_line().encode(
    x='Year:O',
    y='GDP per capita',
    color='Continent'
)

## Dumbbell Plot

These are excellent for showing a change in time with a large number of categories, as we will do here with continents and mean GDP per capita.

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/gdppercap.csv")
df = (pd.melt(df, id_vars=['continent'], value_vars=df.columns[1:],
              value_name='GDP per capita', var_name='Year')
        .rename(columns={'continent': 'Continent'}))
df.head()

### Matplotlib

Again, no off-the-shelf method--but that's no problem when you can build it yourself.

In [None]:
from matplotlib import lines as mlines

def dumbbell_plot(data, x, y, change):
    if(len(data[x].unique())!=2):
        raise ValueError('Dumbbell plot must have two unique periods.')
    if(type(data[y].iloc[0])!=str):
        raise ValueError('Dumbbell plot y variable only works with category values.')
    wide_data = data[[x, y, change]].pivot(index=y, columns=x, values=change)
    x_names = list(wide_data.columns)
    y_names = list(wide_data.index)
    def newline(p1, p2, color='black'):
        ax = plt.gca()
        l = mlines.Line2D([p1[0],p2[0]], [p1[1],p2[1]], color='skyblue', zorder=0)
        ax.add_line(l)
        return l
    
    fig, ax = plt.subplots()
    # Points
    ax.scatter(y=range(len(y_names)), x=wide_data[x_names[1]],
               s=50, color='#0e668b', alpha=0.9, zorder=2,
               label=x_names[1])
    ax.scatter(y=range(len(y_names)), x=wide_data[x_names[0]],
               s=50, color='#a3c4dc', alpha=0.9, zorder=1,
               label=x_names[0])
    # Line segments
    for i, p1, p2 in zip(range(len(y_names)), wide_data[x_names[0]], wide_data[x_names[1]]):
        newline([p1, i], [p2, i])
    ax.set_yticks(range(len(y_names)))
    ax.set_yticklabels(y_names)
    # Decoration
    # Lighten borders
    for ax_pos in ['top', 'bottom', 'right', 'left']:
        ax.spines[ax_pos].set_visible(False)
    ax.set_xlabel(change)
    ax.legend(frameon=False, loc='lower right')
    plt.show()


dumbbell_plot(df, x='Year', y='Continent', change='GDP per capita')

## Polar

I'm not sure I've ever seen a polar plot in economics, but you never know.

Let's generate some polar data first:


In [None]:
r = np.arange(0, 2, 0.01)
theta = 2 * np.pi * r
polar_data = pd.DataFrame({'r': r, 'theta': theta})
polar_data.head()

### Matplotlib


In [None]:
ax = plt.subplot(111, projection='polar')
ax.plot(polar_data['theta'], polar_data['r'])
ax.set_rmax(2)
ax.set_rticks([0.5, 1, 1.5, 2])  # Fewer radial ticks
ax.set_rlabel_position(-22.5)  # Move radial labels away from plotted line
ax.grid(True)
plt.show()

### Seaborn



In [None]:

ax = plt.subplot(111, projection='polar')
sns.lineplot(x=polar_data['theta'], y=polar_data['r'], ax=ax);

## Radar (or spider) chart

Let's generate some synthetic data for this one.

In [None]:
df = (pd.DataFrame(dict(zip(['var' + str(i) for i in range(1, 6)],
                            [np.random.randint(30, size=(4)) for i in range(1, 6)]))
                   )
      )
df.head()

In [None]:
from math import pi


def radar_plot(data, variables): 
    n_vars = len(variables)
    # Plot the first line of the data frame.
    # Repeat the first value to close the circular graph:
    values=data.loc[data.index[0], variables].values.flatten().tolist()
    values += values[:1]
    # What will be the angle of each axis in the plot? (we divide / number of variable)
    angles = [n / float(n_vars) * 2 * pi for n in range(n_vars)]
    angles += angles[:1]
    # Initialise the spider plot
    ax = plt.subplot(111, polar=True)
    # Draw one axis per variable + add labels
    plt.xticks(angles[:-1], variables)
    # Draw ylabels
    ax.set_rlabel_position(0)
    # Plot data
    ax.plot(angles, values, linewidth=1, linestyle='solid')
    # Fill area
    ax.fill(angles, values, 'b', alpha=0.1)
    return ax


radar_plot(df, df.columns);
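
The function above draws only the first row of the data frame. If you want one polygon per row, a loop does the job; the sketch below shows one way to do it on a small made-up data frame (values invented), reusing the same trick of repeating the first angle and value to close each shape.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: two rows to compare across three variables
toy = pd.DataFrame({"var1": [1, 4], "var2": [2, 3], "var3": [3, 2]})
variables = list(toy.columns)

# Evenly spaced angles, one per variable, then repeat the first to close the loop
angles = np.linspace(0, 2 * np.pi, len(variables), endpoint=False).tolist()
angles += angles[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for idx, row in toy.iterrows():
    values = row[variables].tolist()
    values += values[:1]  # close the polygon
    ax.plot(angles, values, linewidth=1, label=f"row {idx}")
    ax.fill(angles, values, alpha=0.1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(variables)
ax.legend(frameon=False)
```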

## Wordcloud

These should be used sparingly. Let's grab part of a famous text from Project Gutenberg:

In [None]:
# Read the whole book, closing the file when done
with open(os.path.join('data', 'smith_won.txt'), 'r') as f:
    book_text = f.read()
# Print some lines
print('\n'.join(book_text.splitlines()[107:117]))

In [None]:
from wordcloud import WordCloud
wordcloud = WordCloud(width=700, height=400).generate(book_text)
fig, ax = plt.subplots(facecolor='k')
ax.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.tight_layout();

We can also create a 'mask' for the wordcloud to shape it how we like, here in the shape of a book.

In [None]:
from PIL import Image
mask = np.array(Image.open(os.path.join("data", "book_mask.png")))
wc = WordCloud(width=700, height=400,
               mask=mask,
               background_color='white')              
wordcloud = wc.generate(book_text)
fig, ax = plt.subplots(facecolor='white')
ax.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.tight_layout();