# Plotting Basics

A plot is a graphical representation of a data set, showing the relationship between two or more variables. Plots are useful because they let us quickly gain an understanding that would not come from raw lists of values. Inspecting a chart can make large amounts of information easy to digest and interpret.

The main purpose of a chart is to convey information clearly and accurately. Aesthetically pleasing charts often make consumption more enjoyable and can help deliver the message more effectively. But charts exist to carry meaning: clarity should never be sacrificed for visual embellishments.

Always try to keep your charts simple. Allow the audience to focus on the essential information, without overwhelming them with unnecessary noise.

> Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away

---

Semantic differences:
 - **graph**: shows a mathematical function (a line, usually continuous)
 - **plot**: observations marked on a coordinate system (points, usually continuous x/y axes)
 - **chart**: graphic representation of data (e.g.: bars, axes usually discrete)
 - **diagram**: an illustrative figure
 
but the terms are mostly interchangeable.

---

### Table of contents

- Matplotlib
  - Fundamental Charts
    - Line
    - Bar
    - Scatter
    - No Pie
  - Options
    - Line Properties
    - Marker Properties
    - Figure Properties
    - Axes Limits
    - Figure Size and Aspect Ratio
    - Derivative Chart Types
  - Multiple Plots
    - Same Figure
    - Multiple Figures
    - Subplots
    - Twin Axes
- Seaborn
  - Relationship Plots
    - Line
    - Scatter
  - Categorical Plots
    - Swarmplots
    - Boxplots
    - Violinplots
  - Distribution Plots
    - Histograms
    - KDE
    - Bivariate
  - Multiple Plots
    - Plot Grids
    - Jointplots
- Pandas
  - Plotting Shortcuts
  - Table Styling
- Other Packages
  - Venn Diagrams
  - Joyplots
  - Network Graphs
  - More Types

---

In [None]:
import numpy as np # we'll use numpy to generate dummy data

**ℹ️ Tip**: packages often make use of other packages. Sometimes a developer does not keep up with changes in those dependencies, leading to deprecation warnings. Since they are intended for the package's maintainer, there is nothing you can do to resolve them. You can silence all warnings of a given category (even ones caused by your own code) like this:

In [None]:
import warnings
warnings.simplefilter('ignore', FutureWarning)
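
As a minimal sketch of what such a filter does, the snippet below records warnings inside a `warnings.catch_warnings` block, which keeps the change local to that block. The `legacy_call` function is hypothetical and only stands in for a package that emits warnings. It suggests that only the named category is hidden, while other warnings still get through:

```python
import warnings


def legacy_call():
    """Hypothetical stand-in for a package function that emits warnings."""
    warnings.warn('this behaviour will change', FutureWarning)
    warnings.warn('something looks off', UserWarning)
    return 42


# catch_warnings restores the previous filters when the block exits
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')                 # record everything by default...
    warnings.simplefilter('ignore', FutureWarning)  # ...except FutureWarning
    result = legacy_call()

caught_categories = [w.category.__name__ for w in caught]
print(result, caught_categories)
```

Filters added later take precedence, which is why the `ignore` rule wins over the broader `always` rule here.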

## Matplotlib

Matplotlib is the de facto Python library for creating attractive charts. It has a friendly interface, making basic plotting simple but also allowing power users to fully customize plots.

**👾 Trivia**: the package's name comes from _MATLAB-style plotting library_. This is because the syntax and functionality were initially heavily inspired by MATLAB's `plot` command.

In [None]:
import matplotlib.pyplot as plt

# magic command to display charts inline, between cells
%matplotlib inline

**ℹ️ Tip**: increase the DPI of figures for crisper charts (at the cost of a larger image size):

In [None]:
import matplotlib
matplotlib.rcParams['figure.dpi'] = 100
%config InlineBackend.figure_format = 'retina'

---

Plot some values:

In [None]:
plt.plot([1, 5, 4, 8]);

**ℹ️ Tip**: in other environments, you would need to call `plt.show()` to display the chart, but thanks to the `%matplotlib inline` magic, that is not needed here. For cells that display multiple charts, `plt.show()` still has to be called after each chart. We also use the trailing `;` trick to suppress the cell output, which would otherwise be the text representation of the returned Matplotlib object.

Provide both axes values:

In [None]:
x = 1, 2, 4
y = 5, 4, 7

In [None]:
plt.plot(x, y);

### Fundamental Charts

How and when to use the various types of basic charts

#### Line

Line charts display information as a series of data points connected by straight lines. They are used to visualize a trend in data over a sequential variable, such as time. The horizontal axis is a (discretized) continuous variable and the vertical axis represents a measured value.

In [None]:
years = [2015, 2016, 2017, 2018, 2019, 2020]
price = [  70,   72,   79,   80,   85,   77]

In [None]:
plt.plot(years, price);

#### Bar

Bar charts present categorical data with bars of lengths proportional to the values they represent. They are used to compare discretely-indexed values. The horizontal axis shows a discrete variable and the vertical axis represents a measured value.

In [None]:
companies = ['Stark Ind.', 'Wayne Ent.', 'Lexcorp', 'Oscorp', 'ACME']
valuation = [31, 27, 21, 19, 17]

In [None]:
plt.bar(companies, valuation);

**ℹ️ Tip**: unless there is a specific reason not to, order the data points by value (here, from largest to smallest).
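
When the data does not arrive sorted, the bars can be ordered before plotting. The sketch below assumes that ordering means sorting by value, largest first, and uses a hypothetical shuffled copy of the companies data (`raw_companies` and `raw_valuation` are illustrative names):

```python
import matplotlib.pyplot as plt

# hypothetical, unsorted input (same companies as above, shuffled)
raw_companies = ['Oscorp', 'Stark Ind.', 'ACME', 'Lexcorp', 'Wayne Ent.']
raw_valuation = [19, 31, 17, 21, 27]

# sort (valuation, company) pairs from largest to smallest, then unzip them
pairs = sorted(zip(raw_valuation, raw_companies), reverse=True)
sorted_valuation, sorted_companies = zip(*pairs)

plt.bar(sorted_companies, sorted_valuation);
```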

#### Scatter

A scatter plot displays values for two variables as a collection of points. It is used to identify the relationship between two variables. The position of each point is determined by one variable on the horizontal axis and the other variable on the vertical axis.

In [None]:
height = [146, 152, 161, 164, 166, 171, 174, 178, 185, 195]
weight = [ 54,  59,  79,  75,  73,  82,  85,  84,  92,  94]

In [None]:
plt.scatter(weight, height);

#### No Pie

Pie charts appear deceptively friendly, but in reality, the data can almost always be presented in a better format. The human eye has evolved to be able to compare linear distances. It is difficult to tell the difference between angles, especially when precision is important.

In [None]:
plt.pie(valuation, labels=companies)
plt.gca().set_aspect('equal')

Which one is larger, _ACME_ or _Lexcorp_? As shown before, the data can be presented much more efficiently as a bar chart. Even for so few data points, angular distances are hard to interpret. The only time pie charts can be efficient is when you have very few slices:

In [None]:
plt.pie([.73, .27], labels=['Female', 'Male'])
plt.gca().set_aspect('equal')

Even in this case, just showing the single number is more effective: _73% Female_. 

**AVOID PIE CHARTS!**

**ℹ️ Tip**: area charts, in general, are less efficient, because it is harder to compare areas than lengths. We usually underestimate the size of bigger shapes and overestimate the size of the smaller ones, because we instinctively judge the lengths or widths and not their areas.

### Options

Matplotlib features a wealth of customization options, which allow you to draw plots just the way you need them. All of the options in the following sub-sections can be combined together.

#### Line Properties

Common line (trace) options:

In [None]:
x = np.linspace(-np.pi, np.pi, num=50)
y = np.sin(x)

plt.plot(
    x,
    y,

    color='C3',
    linestyle='--',
    linewidth=5,
);

Options can also be quickly given in short-form:

In [None]:
plt.plot([3, 1, 6], 'g--', lw=3);  # green, dashed line, width of 3

Default color palette:


![pic](https://i.imgur.com/aGT8APH.png)

**👾 Trivia**: the default color palette is designed to be readable and pleasing, with distinctive colors. It was originally developed at [Tableau](https://www.tableau.com), a powerful standalone data visualization suite. You can find out more about the choice of default style in Matplotlib [here](https://www.youtube.com/watch?v=xAoljeRJ3lU).
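
The `'C0'`, `'C3'` style names used in the examples are shorthands for positions in the active color cycle. A small sketch of how they resolve, assuming the color cycle is read from `rcParams` (the variable names are illustrative):

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# colors of the active property cycle (the default palette, unless a style changed it)
cycle_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

# 'C0', 'C1', ... are shorthands for positions in that cycle
resolved = [to_hex(f'C{i}') for i in range(len(cycle_colors))]
print(resolved[:4])
```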

#### Marker Properties

Common marker (points) options. Every property can be constant, the same for all points (like `alpha`, `marker` or `edgecolor` below) or per-point (like `s` or `color` below):

In [None]:
n_points = 20
plt.scatter(
    x=np.random.randn(n_points), 
    y=np.random.randn(n_points), 
    
    s=np.random.uniform(100, 1_000, n_points), # size
    color=np.random.choice(['C0', 'C1'], n_points),
    
    alpha=.3,  # transparency
    marker='p',  # pentagon
    edgecolor='gray',
);

**ℹ️ Tip**: markers can also be displayed for line plots. See available [marker shapes](https://matplotlib.org/api/markers_api.html).

**👾 Trivia**: both the American and British spellings, `gray` and `grey`, are accepted. XKCD (the web-comic) ran a survey to see which labels people assign to various colors. Read about the [results](https://blog.xkcd.com/2010/05/03/color-survey-results/) or explore them in this [interactive visualization](https://colors.luminoso.com).

#### Figure Properties

Nearly every element of the chart can be customized:

![pic](https://matplotlib.org/_images/anatomy.png)

**ℹ️ Tip**: refer to the [Parts of a Figure](https://matplotlib.org/tutorials/introductory/usage.html#parts-of-a-figure) for more detailed descriptions about each part of the figure.

In [None]:
plt.plot([2, 1450, 18400, 5700, 9000],
         label='Line name')

plt.title('Figure Title')
plt.legend(title='Legend title', loc='lower right')

plt.xlabel('X-axis label')
plt.ylabel('Y-axis label')

plt.grid(alpha=.25)
plt.gca().set_yscale('log')

plt.gca().set_xticklabels([f'{x:.0%}' for x in plt.gca().get_xticks()], rotation=45);

**ℹ️ Tip**: by default, the legend is placed in the location that overlaps the least with the plotted data. Place it manually (even outside of the axes) with `plt.legend(loc='upper left', bbox_to_anchor=(1, 1.2))`, where `(1, 1.2)` is the point at which the legend's upper-left corner is anchored (in axes coordinates, i.e. fractions of the axes width and height, with `(0, 0)` being the bottom left).
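
A minimal sketch of manual legend placement, assuming the legend should sit just outside the right edge of the axes. Here `loc` names the corner of the legend box that gets pinned, and `bbox_to_anchor` gives the point it is pinned to, in axes coordinates:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 5, 4, 8], label='Line name')

# pin the legend's upper-left corner to the top-right corner of the axes;
# bbox_to_anchor uses axes coordinates: (0, 0) is bottom left, (1, 1) is top right
legend = ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
```

When saving such a figure, passing `bbox_inches='tight'` to `savefig` usually keeps an outside legend from being cropped.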

**ℹ️ Tip**: aim for a high [data-ink ratio](https://infovis-wiki.net/wiki/Data-Ink_Ratio): avoid encumbering the chart with too many elements.

**ℹ️ Tip**: don't let the audience guess what the data represents. Annotate charts with a title and axis labels at the least. If a person randomly opens the book and lands on your chart, they should be able to understand what's going on. If possible, add a short description and an accompanying conclusion.

**ALWAYS PROVIDE LABELS!**

---

Remove all spines and ticks:

In [None]:
plt.plot([1, 5, 4, 8])
plt.axis('off');

**ℹ️ Tip**: remove just the tick markers and tick labels (keeping the spines and axis labels) with `plt.xticks([])`. Grid lines are drawn at tick positions, so they disappear for that axis too.

#### Axes Limits

Change axes limits with `plt.xlim`/`plt.ylim`:

In [None]:
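from matplotlib.ticker import StrMethodFormatter

def plot_companies_worth():
    plt.bar(companies, valuation)
    plt.ylabel('Worth (billions)')
    # a formatter keeps labels in sync with the ticks, even after the limits change
    plt.gca().yaxis.set_major_formatter(StrMethodFormatter('${x:.0f}'))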

plot_companies_worth()
plt.title('Zero-baselined')
plt.show()

plot_companies_worth()
plt.title('Truncated')
plt.ylim(16);

**ℹ️ Tip**: Bar charts ask viewers to compare the relative lengths of the bars. Showing just the tips of the bars exaggerates differences in the data, because their relative lengths change. People are then either misled and take away the wrong message, or end up having to read the numbers, which defeats the purpose of the chart. There are exceptions.

**ℹ️ Tip**: provide effective labels:
- do not confuse the measurement with its units: in the above example, we are showing the _worth_, not the _dollars_, of each company
- declutter the tick labels while providing the same information: in the above example, since all valuations are given in billions, it is easier to state that once in the axis label
- show appropriate units on the tick labels (e.g.: `$`, `★`, etc)
- show percentage values as percentages (e.g.: show `50%` instead of the way it is stored, `0.5`)
- show an appropriate number of decimals: more decimal digits might suggest more precision, but they reduce readability without adding actual value
- sometimes it is useful to provide the value labels at the top of each bar
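
Several of these points can be sketched together on the companies chart. The snippet assumes Matplotlib 3.4 or newer for `Axes.bar_label`, and the label formats are illustrative choices rather than the only option:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

companies = ['Stark Ind.', 'Wayne Ent.', 'Lexcorp', 'Oscorp', 'ACME']
valuation = [31, 27, 21, 19, 17]

fig, ax = plt.subplots()
bars = ax.bar(companies, valuation)

# measurement (and magnitude) in the axis label, unit on the tick labels
ax.set_ylabel('Worth (billions)')
ax.yaxis.set_major_formatter(StrMethodFormatter('${x:.0f}'))

# value labels on top of each bar (Axes.bar_label needs Matplotlib 3.4+)
value_labels = ax.bar_label(bars, fmt='$%d', padding=2)
```

Using a formatter instead of fixed label strings means the tick labels stay correct if the axis limits change later.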

#### Figure Size and Aspect Ratio

Change the figure size (and aspect ratio):

In [None]:
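from matplotlib.ticker import StrMethodFormatter

def plot_price_evolution():
    plt.plot(years, price)
    # a formatter labels whichever ticks end up being drawn
    plt.gca().yaxis.set_major_formatter(StrMethodFormatter('${x:.0f}'))
    plt.ylabel('Price')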

plt.figure(figsize=(3, 7))  # in "inches"
plot_price_evolution()
plt.title('Elongated')

plt.figure(figsize=(8, 1.5))
plot_price_evolution()
plt.title('Widened');

**ℹ️ Tip**: Line charts show a trend. Stretching the height of the graph can exaggerate the message, while stretching the width can understate it. Either way, the audience is misled.

#### Derivative Chart Types

The basic charts can be expanded with options into derivative ones.

##### Stacked Bar Charts

Show two series for the same index:

In [None]:
companies = ['Stark Ind.', 'Wayne Ent.', 'Lexcorp', 'Oscorp', 'ACME']
valuation = [31, 27, 21, 19, 17]
potential = [ 3,  9,  6,  6,  4]

In [None]:
plt.bar(companies, valuation, label='worth')
plt.bar(companies, potential, bottom=valuation, label='growth')

plt.legend();

##### Grouped Bar Charts

Same data, but highlight differences in the second series, not in the overall sum:

In [None]:
width = .3
x = np.arange(len(companies))

plt.bar(x,         valuation, width=width, label='worth')
plt.bar(x + width, potential, width=width, label='growth')

plt.gca().set_xticks(x + width / 2)
plt.gca().set_xticklabels(companies)
plt.legend();

##### Area Chart

Line chart where the bottom is filled:

In [None]:
plt.fill_between(years, price)
plt.ylim(min(price) * .9);

##### Stacked Area Chart

Similar to stacked bar charts, but for continuously-indexed series:

In [None]:
years = [2015, 2016, 2017, 2018, 2019, 2020]
price = [  70,   72,   79,   80,   85,   77]
tax   = [   2,    4,    3,    9,    1,    2]

In [None]:
plt.stackplot(years, price, tax, 
              labels=['price', 'tax'])

plt.ylim(min(price) * .9)
plt.legend();

### Multiple Plots

Chart multiple sets of data at once, either in the same figure, different ones, or composite figures.

#### Same Figure

Plot multiple lines/points in a single figure by placing multiple statements in the same cell (or between `plt.show()` calls):

In [None]:
plt.plot([0, 9])
plt.plot([5, 2]);

#### Multiple Figures

Preserve the same Y-axis limits:

In [None]:
plt.plot([0, 9])
y_limits = plt.ylim()
plt.show()

plt.ylim(y_limits)
plt.plot([5, 2], c='C1');

#### Subplots

A figure is composed of one or more axes. `plt.` functions always act on the current figure and axes; when a figure has a single axes, that is the one they refer to. You can get the figure and axes handles by instantiating a new figure with `fig = plt.figure()`, by creating subplots (see below), or from the artists returned by a plotting call (for example, `line, = plt.plot(...)` followed by `line.axes`).

In [None]:
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, 
                               sharey=True,  # both plots use the same y-axis
                               figsize=(12, 4))  # size of the entire figure, not of individual plots

ax1.plot([0, 9])  # note that we plot using `ax.plot` not `plt.plot`
ax2.plot([5, 2], c='C1')

plt.subplots_adjust(wspace=.3)

ax1.set_title('Left Title')  # note the slightly changed syntax `ax.set_title(...)` instead of `plt.title(...)`
ax2.set_title('Right Title')
plt.suptitle('Super Title');

**ℹ️ Tip**: get the current axis with `plt.gca()` and the current figure with `plt.gcf()`.
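
To make the notion of a "current" axes concrete, here is a small sketch of how the handles relate to one another (the variable names are illustrative). It assumes a fresh two-panel figure:

```python
import matplotlib.pyplot as plt

fig, (ax_left, ax_right) = plt.subplots(ncols=2)

# the most recently created axes is the "current" one that `plt.` calls act on
current_is_right = plt.gca() is ax_right

# make the left axes current, then draw on it through the pyplot interface
plt.sca(ax_left)
line, = plt.plot([0, 9])

line_on_left = line.axes is ax_left     # every artist knows its axes...
same_figure = line.figure is plt.gcf()  # ...and its figure
```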

Alternatively, in short-form:

In [None]:
plt.subplot(121)  # now it's `subplot`, not `subplots`
plt.plot([0, 9])  # now it's `plt.plot`
plt.title('Left Title')

plt.subplot(122)
plt.plot([5, 2], c='C1');
plt.title('Right Title')

plt.suptitle('Super Title');

**ℹ️ Tip**: read more about [working with multiple figures and axes](https://matplotlib.org/tutorials/introductory/pyplot.html#working-with-multiple-figures-and-axes).

#### Twin Axes

Plot two sets of data on completely different scales:

In [None]:
x = np.linspace(0, 10, 100)
y1 = np.exp(x)
y2 = np.sin(x)

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.plot(x, y1, 'C0')
ax1.set_ylabel('exp', color='C0')
ax1.tick_params('y', colors='C0')

ax2.plot(x, y2, 'C3')
ax2.set_ylabel('sin', color='C3')
ax2.tick_params('y', colors='C3')

**ℹ️ Tip**: charting with two different scales can be confusing and suggest a relationship that may not exist. The viewer compares the magnitude of values between the two sets of data, which is meaningless given that the scales (and potentially units) are different.

**👾 Trivia**: check out more (oftentimes humorous) [spurious correlations](http://www.tylervigen.com/spurious-correlations) such as this one:

![a](http://www.tylervigen.com/chart-pngs/2.png)

---

**ℹ️ Tip**: simplicity is most effective: remove anything that doesn't support the message and design for comprehension. Matplotlib provides sensible defaults, but excessive coloring and other customizations can still degrade a chart.

Example transformation (could also reduce the precision, and do away with the color):

![pic](https://132q6j40a81047nmwg1az6v8-wpengine.netdna-ssl.com/wp-content/uploads/2017/06/data-visualization-design-5-1.png)

---

Save a figure to disk with `plt.savefig(path, dpi=..., format=...)`, or drag and drop it directly from the Jupyter interface.
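
A minimal sketch of saving a chart programmatically. The temporary directory and the `chart.png` file name are placeholders, and `bbox_inches='tight'` is an optional extra that trims surrounding whitespace:

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 5, 4, 8])
ax.set_title('Saved Figure')

# hypothetical output location; any writable path works
path = os.path.join(tempfile.mkdtemp(), 'chart.png')

# dpi controls the resolution; bbox_inches='tight' trims surrounding whitespace
fig.savefig(path, dpi=150, format='png', bbox_inches='tight')
saved_bytes = os.path.getsize(path)
```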

## Seaborn

Seaborn is a library for statistical data visualization, built on top of Matplotlib. It provides a higher-level interface for more complex visualizations, and a slightly changed style.

In [None]:
import seaborn as sns
sns.set()  # apply Seaborn's style to future charts

- Relationship
  - scatter
  - line
- Categorical
  - between categories
  - within a category

### Relationship Plots

To make examples more meaningful, throughout this section, we'll plot actual datasets. One such dataset is `tips`, which logs the bills and tips in a restaurant:

In [None]:
tips = sns.load_dataset('tips')
tips.head()

#### Scatter

Similar to Matplotlib's counterpart, but with a slightly changed style:

In [None]:
sns.relplot(data=tips, x='total_bill', y='tip');

Quickly add additional information such as the time of the meal (color), the customer's gender (shape) and the party size (size of marker), from the underlying dataset:

In [None]:
sns.relplot(
    data=tips,
    x='total_bill', 
    y='tip', 
    hue='time',
    style='sex',
    size='size',
);

#### Line

Another example dataset, of continuous measurements (over time):

In [None]:
fmri = sns.load_dataset('fmri')
fmri.head()

In [None]:
sns.relplot(data=fmri, x='timepoint', y='signal');

Aggregating it into a line, with mean and confidence interval (95%) is more informative:

In [None]:
sns.relplot(data=fmri, x='timepoint', y='signal', kind='line');
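
Seaborn estimates this confidence interval by bootstrapping. The idea can be sketched with NumPy alone, for a single timepoint; the measurements below are synthetic, and the number of resamples is an assumption made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical repeated measurements of the signal at a single timepoint
signal = rng.normal(loc=0.1, scale=0.05, size=50)

# resample with replacement many times and record the mean of each resample
n_boot = 1000
resamples = rng.choice(signal, size=(n_boot, signal.size), replace=True)
boot_means = resamples.mean(axis=1)

# the middle 95% of the bootstrapped means approximates the confidence interval
lower, upper = np.percentile(boot_means, [2.5, 97.5])
mean = signal.mean()
```

The shaded band in the line plot corresponds to `lower` and `upper` computed this way at every timepoint.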

Show additional information: the region (color) and event (line style):

In [None]:
sns.relplot(
    kind='line',
    data=fmri,
    x='timepoint',
    y='signal',
    hue='region',
    style='event',
    alpha=.75,
);

### Categorical Plots

We'll illustrate these with the same `tips` dataset.

#### Between categories

The x axis is categorical, so points are grouped together and jittered slightly so as not to overlap, while still showing the number of points in each category and total bill segment. This is called a swarmplot:

In [None]:
sns.catplot(x='day', y='total_bill', kind='swarm', data=tips, color='C0');

Show additional information:

In [None]:
sns.catplot(
    kind='swarm',
    data=tips,
    x='day',
    y='total_bill',
    hue='time',
);

#### Distribution

Boxplot: the box shows the three quartiles, and the whiskers extend to the smallest and largest values, except for outliers, which are plotted separately.

The three quartiles are:
 1. lower quartile (25% of elements are less than it)
 2. median (50% of elements are less than it)
 3. upper quartile (75% of elements are less than it)
 
A point is considered an outlier if it lies more than 1.5 IQR below the lower quartile or above the upper quartile.
The IQR, the inter-quartile range, is simply the distance between the lower and upper quartiles.
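
The outlier rule can be worked through by hand. The sketch below uses a small, made-up list of bills and NumPy's default (linearly interpolated) percentiles, so the exact quartile values may differ slightly from other conventions:

```python
import numpy as np

# hypothetical bills, with one unusually large value
bills = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])

q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# fences: anything outside them is drawn as an individual outlier point
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = bills[(bills < lower_fence) | (bills > upper_fence)]
print(q1, median, q3, iqr, outliers)
```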

In [None]:
tips.groupby('day').total_bill.describe()

In [None]:
sns.catplot(kind='box', data=tips, x='day', y='total_bill');

A violin plot is similar to the boxplot, but shows more information about the distribution. Instead of the quartiles and ranges, it shows a KDE (kernel density estimate). Think of it as a continuous histogram. Its symmetric shape can be split to show two types of observations for each categorical value on the x axis:

In [None]:
sns.catplot(
    kind='violin',
    split=True,
    
    data=tips,
    x='day',
    y='total_bill',
    hue='sex',
    scale='count',  # width reflects the number of observations (named `density_norm` in seaborn 0.13+)
);

The width of each KDE shows the number of observations falling in that segment.

### Distributions

We'll illustrate these with the well-known `iris` dataset, containing measurements of several species of iris flowers:

In [None]:
iris = sns.load_dataset('iris')
iris.head()

#### Univariate

A **histogram** (the columns) shows how many observations fall in each _bin_.

A **KDE**, Kernel Density Estimation, fits a probability density function over the distribution. You can think of it as a continuous approximation of the histogram.
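
Both ideas can be sketched with NumPy on a handful of made-up values. The Gaussian kernel and the fixed `bandwidth` are assumptions made for illustration; plotting libraries typically choose the bandwidth automatically:

```python
import numpy as np

# hypothetical sepal lengths, in centimeters
lengths = np.array([4.9, 5.0, 5.1, 5.8, 6.0, 6.3, 6.4, 7.0])

# histogram: count the observations falling in each of 4 equal-width bins
counts, bin_edges = np.histogram(lengths, bins=4)

# KDE: place a Gaussian bump on every observation and average them
bandwidth = 0.3  # assumed smoothing width
grid = np.linspace(3, 9, 601)
z = (grid[:, None] - lengths[None, :]) / bandwidth
kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
density = kernels.mean(axis=1)

# a density integrates to (approximately) 1 over the grid
total_area = density.sum() * (grid[1] - grid[0])
```

A larger bandwidth gives a smoother curve that hides detail, while a smaller one follows individual observations more closely.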

In [None]:
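# `distplot` is deprecated; `histplot` with `kde=True` draws the same combination
sns.histplot(iris.sepal_length, kde=True, stat='density')
plt.gca().xaxis.grid(False)
plt.ylabel('Density');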

**ℹ️ Tip**: it looks like our distribution is a mixture of several underlying distributions. Since the data comes from natural phenomena, we expect each of them to be roughly normally shaped. Plotting the KDE for each species reveals the underlying distributions:

In [None]:
for species, sub_df in iris.groupby('species'):
    sns.kdeplot(sub_df.sepal_length, label=species)

plt.legend(title='Species')
plt.xlabel('Sepal length')
plt.ylabel('Amount of samples')
plt.yticks([])
plt.title('Length Distribution by Species');

#### Bivariate

Scatterplot in the center with univariate histograms on the sides:

In [None]:
sns.jointplot(data=iris, x='sepal_length', y='sepal_width');

Bivariate (2D) analogue of the KDE:

In [None]:
# `shade_lowest` is deprecated; `fill=True` shades the contours (seaborn 0.11+)
sns.jointplot(kind='kde', data=iris, x='sepal_length', y='sepal_width', fill=True);

Similarly, we can decompose the distributions:

In [None]:
with sns.axes_style('white'):
    # one filled 2D KDE per species; `hue` also adds the legend (seaborn 0.11+)
    sns.kdeplot(data=iris, x='sepal_length', y='sepal_width', hue='species',
                fill=True, alpha=.5)

plt.title('Length and Width Distribution by Species');

More than two variables: just have multiple pairwise plots:

In [None]:
g = sns.PairGrid(iris, diag_sharey=False, hue='species')

g.map_diag(sns.kdeplot)
g.map_upper(plt.scatter, alpha=.5)
g.map_lower(sns.kdeplot, alpha=.75, fill=True)  # `fill` replaces the deprecated `shade`

g.add_legend(title='Species');

#### Linear relationships

Best-fit line and confidence interval:

In [None]:
sns.regplot(data=tips, x='total_bill', y='tip');

Show histograms on the sides:

In [None]:
sns.jointplot(kind='reg', data=tips, x='total_bill', y='tip');

#### Heatmaps

We'll use the `flights` dataset, which contains the monthly number of airline passengers over a period of several years:

In [None]:
flights = sns.load_dataset('flights').pivot(index='month', columns='year', values='passengers')

Present data in three dimensions. The z-axis (color intensity) represents the number of passengers:

In [None]:
sns.heatmap(flights, cbar_kws=dict(label='# Passengers'));

**ℹ️ Tip**: it is intuitive to represent larger values by darker colors:

In [None]:
sns.heatmap(
    flights, 
    cbar_kws=dict(label='# Passengers'),
    cmap='Blues',
);

**ℹ️ Tip**: Reverse any colormap by appending `_r` to its name.
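
A quick check of what the `_r` suffix does, sketched on the `Blues` colormap (the variable names are illustrative, and the tolerance is a loose assumption to allow for rounding):

```python
import numpy as np
import matplotlib.pyplot as plt

blues = plt.get_cmap('Blues')
blues_reversed = plt.get_cmap('Blues_r')

# a colormap maps numbers in [0, 1] to RGBA colors
lightest = blues(0.0)
darkest = blues(1.0)

# the reversed map returns the same colors in the opposite order
ends_swapped = (np.allclose(blues_reversed(0.0), darkest, atol=0.01)
                and np.allclose(blues_reversed(1.0), lightest, atol=0.01))
```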

Other sequential colormaps:

![pic](https://i.imgur.com/oqfPvJX.png)

---

Sometimes you have diverging data, such as the correlation: two variables can be correlated either positively (both increase and decrease at the same time) or negatively (when one increases, the other decreases). For such data, a diverging colormap is appropriate, with distinct colors on either side of a neutral midpoint.
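
To make the sign of a correlation concrete, here is a tiny sketch on made-up, perfectly linear series (real data is noisier, so the values fall between the two extremes):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
rises_with_x = 2 * x + 1    # increases whenever x increases
falls_with_x = -3 * x + 20  # decreases whenever x increases

# np.corrcoef returns the correlation matrix; [0, 1] is the x-vs-y entry
positive = np.corrcoef(x, rises_with_x)[0, 1]
negative = np.corrcoef(x, falls_with_x)[0, 1]
```

Centering the colormap on 0 then gives positive and negative correlations visibly different hues.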

In [None]:
crashes = sns.load_dataset('car_crashes')
crashes.head()

In [None]:
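corr = crashes.select_dtypes('number').corr()  # skip the non-numeric `abbrev` column

# hide the upper triangle (including the diagonal), which mirrors the lower one
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True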

with sns.axes_style("white"):
    ax = sns.heatmap(corr, 
                     mask=mask,
                     cbar_kws=dict(label='Correlation'),
                     cmap='RdYlGn', 
                     center=0,
                     annot=True, 
                     fmt='.1f', 
                     lw=1,
                     square=True)

Other diverging colormaps:

![pic](https://i.imgur.com/9H9J71j.png)

**ℹ️ Tip**: see the rest of available colormaps and color palettes: [matplotlib](https://matplotlib.org/tutorials/colors/colormaps.html), [seaborn](https://seaborn.pydata.org/tutorial/color_palettes.html). Use tools such as [Color Brewer](http://colorbrewer2.org) to help pick color schemes. [Adobe Color Wheel](https://color.adobe.com/create/color-wheel/) is a good tool for general-purpose palette selection. Online [palette generators](https://coolors.co/app) make exploring colors easy.

---

Set the style back to the original Matplotlib defaults:

In [None]:
sns.reset_orig()

## Pandas

In [None]:
import pandas as pd

In [None]:
weather = pd.read_csv('example_files/weather.csv')

month_names = 'January February March April May June July August September October November December'.split()
cities = ['New York City', 'Los Angeles']

### Plotting Shortcuts

Pandas DataFrames integrate directly with Matplotlib, providing convenient plotting shortcuts:

In [None]:
weather[weather.month == 1][cities].plot()

# continue customizing the chart
plt.title('January Temperature')
plt.xlabel('Day of Month')
plt.ylabel('Temperature (°F)');

They make labeling, grouping and other tedious tasks easier:

In [None]:
weather.groupby('month')[cities].mean().plot(
    kind='barh',  # horizontal bar chart
    figsize=(4, 7),
    title='Average Temperatures',
)

plt.gca().set_yticklabels(month_names)
plt.xlabel('Temperature (°F)');

Most non-specialty chart types are supported:

In [None]:
weather['Los Angeles'].plot(kind='hist')

plt.title('Year-long Temperature Distribution')
plt.xlabel('Temperature (°F)')
plt.ylabel('#Days observed');

**ℹ️ Tip**: [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html) about supported chart types and options

### Table Styling

Lightweight visualizations can also be incorporated directly inside tables:

In [None]:
df = pd.DataFrame(np.random.randn(7, 3), columns=list('ABC'))
df.iloc[1, 1] = np.nan

df

Set a caption for your table:

In [None]:
df.style.set_caption('Example Data')

Modify the precision:

In [None]:
df.round(3)

Set global options:

In [None]:
pd.set_option('display.precision', 2)

**ℹ️ Tip**: [read more](https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html) about Pandas options

---

Change the style of specific elements:

In [None]:
df.style.highlight_null()

Restrict to only a subset of rows/columns:

In [None]:
df.style.highlight_max(subset=['A', 'B'], axis=0)

Arbitrary functions and function chaining:

In [None]:
def highlight_negatives(val):
    """ Make negative values bold red """
    color = 'red' if val < 0 else 'black'
    weight = 'bold' if val < 0 else 'normal'
    return f'color: {color}; font-weight: {weight}'  # css syntax

In [None]:
df.style\
    .format(precision=3)\
    .applymap(highlight_negatives)  # `applymap` is named `map` in pandas 2.1+

---

Even inline charts:

In [None]:
df.style.bar(subset='C')

In [None]:
df.style.background_gradient(cmap='Greens')

**ℹ️ Tip**: read more about [table styling](https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html)

**ℹ️ Tip**: watch a [short animation](http://i.imgur.com/ZY8dKpA.gif) on (slightly overdone) table styling

## Other Packages

While Matplotlib is the most widely used library (followed by Seaborn), there are many others, most with overlapping functionality (line charts, bar charts, etc.). But there are also those that offer specific kinds of visualizations.

### Venn Diagrams

Show logical relations between a finite collection of sets:

In [None]:
from matplotlib_venn import venn2

In [None]:
venn2(subsets=(10, 5, 2), set_labels=('Group A', 'Group B'));

### Joyplots

Joyplots show distributions over an ordinal variable or discretized time:

In [None]:
from joypy import joyplot

**👾 Trivia**: joyplots, otherwise known as ridgeplots, got their name from Joy Division's [album](https://itunes.apple.com/us/album/unknown-pleasures-remastered/544363171) that used such a plot as its cover. More recently, they were popularized by TensorFlow's display of weight distributions over time.

In [None]:
fig, axes = joyplot(
    weather, by='month', column=['New York City', 'Los Angeles'],
    alpha=.75, range_style='own', grid='y', linecolor='white', 
    figsize=(6, 6), title='Monthly Temperature', legend=True,
)

axes[-1].set_xlabel('Temperature (°F)')
for month, ax in zip(month_names, axes):
    ax.set_yticklabels([month])

### Network Graphs

NetworkX is the de facto library for storing graphs.

In [None]:
import networkx as nx

It provides simple plotting:

In [None]:
G = nx.gnm_random_graph(7, 15)
nx.draw(G)

But also complex customizations:

In [None]:
G = nx.random_geometric_graph(200, 0.125)
# position is stored as node attribute data for random_geometric_graph
pos = nx.get_node_attributes(G, 'pos')

# find node near center (0.5,0.5)
dmin = 1
ncenter = 0
for n in pos:
    x, y = pos[n]
    d = (x - 0.5)**2 + (y - 0.5)**2
    if d < dmin:
        ncenter = n
        dmin = d

# color by path length from node near center
p = dict(nx.single_source_shortest_path_length(G, ncenter))

plt.figure(figsize=(8, 8))
nx.draw_networkx_edges(G, pos, nodelist=[ncenter], alpha=0.4)
nx.draw_networkx_nodes(G, pos, nodelist=list(p.keys()),
                       node_size=80,
                       node_color=list(p.values()),
                       cmap=plt.cm.Reds_r)

plt.xlim(-0.05, 1.05)
plt.ylim(-0.05, 1.05)
plt.axis('off');

### More Types

The largest areas we haven't touched upon are interactive charts and map charts.  While this workshop focused on static charts, we can take advantage of the Jupyter environment and plot these as well. 

The most widely used library for this is [Plotly](https://plot.ly/python/). Check out these features if interested:
 - [zooming, slicing, hover information](https://plot.ly/python/line-charts/)
 - [advanced Jupyter features](https://plot.ly/python/ipython-notebook-tutorial/)
 - more chart types: 
    - [3D scatterplot](https://plot.ly/python/3d-network-graph/) (navigable)
    - [sankey](https://plot.ly/python/parallel-categories-diagram/)
    - [choropleth](https://plot.ly/python/maps/)
    - [chord diagram](https://plot.ly/python/filled-chord-diagram/)
    - [treemap](https://plot.ly/python/treemaps/)
    - [wind rose](https://plot.ly/python/wind-rose-charts/)

## Further Reading
 - Python Data Visualization Packages: [talk](https://www.youtube.com/watch?v=FytuB8nFHPQ)
 - Python Graphs: [gallery](https://python-graph-gallery.com) (not exclusively Matplotlib/Seaborn)
 - Articles providing concise and useful tips and caveats:
  - https://www.geckoboard.com/learn/data-literacy/data-visualization-tips/
  - https://www.data-to-viz.com/caveats.html
  - https://www.columnfivemedia.com/25-tips-to-upgrade-your-data-visualization-design
  - https://www.lovesdata.com/blog/data-visualization-tips
  - https://www.tableau.com/about/blog/2016/5/5-tips-effective-visual-data-communication-54174
  - https://www.dataquest.io/blog/design-tips-for-data-viz/
 - Collections of beautiful, effective visualizations:
  - [r/DataIsBeautiful](https://www.reddit.com/r/dataisbeautiful/top/?sort=top&t=all) (user voted)
  - [Tableau Gallery](https://public.tableau.com/en-us/s/gallery) (more intricate)
  - [The Pudding](https://pudding.cool) (visual essays)
  - [Seeing Theory](https://seeing-theory.brown.edu) (visual introduction to probability and statistics)
 - Books:
  - [Edward Tufte](https://www.edwardtufte.com/tufte/books_vdqi) (renowned statistician and data visualization pioneer)
  - [How to Lie with Statistics](https://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728) (explores mis-uses of data visualizations)
  - [Universal Principles of Design](https://www.amazon.com/Universal-Principles-Design-William-Lidwell/dp/1592530079) (pertaining to an abundance of fields and contexts, not just data visualization)