# Visualization

The visualization landscape is scattered.<br>
There are many libraries available:<br><br>
<div><img src="plots/PythonVisLandscape.jpg" width="700" align="left"/></div>

There are roughly three areas of interest:
1. [Matplotlib](https://matplotlib.org/3.2.1/contents.html) and all derived packages like [Seaborn](https://seaborn.pydata.org/)
2. packages built around JavaScript, which offer interactivity in the browser. There are a lot of different packages; below are a couple that I feel are important.
  * [Bokeh](https://docs.bokeh.org/en/latest/index.html) which has its own BokehJS JavaScript library.
  * [HoloViews](https://holoviews.org/Reference_Manual/index.html) a high-level API that interfaces with lower-level libraries like Bokeh and Matplotlib.
  * [Altair](https://altair-viz.github.io/) built on top of the JavaScript Vega library.
3. packages built around OpenGL (or Vulkan) and WebGL (like [Pyodide](https://github.com/iodide-project/pyodide/)), which offer speed at the price of complexity (useful in high-load applications such as real-time 3D image rendering for games, medical imaging, ...). This group of tools is out of scope here.

Note: both HoloViews and Altair work more or less like ggplot in the R world (of which there is a Python [version](https://github.com/yhat/ggpy)).<br>
They follow the process outlined in the book **Grammar of Graphics**, a grammar used to describe and create a wide range of statistical graphics.<br><br>

Before diving in: this notebook focusses on visualizing data and not on the related topic of dashboards.<br>
Bokeh provides its own server and is definitely capable of producing interactive web applications, offering full two-way interaction between client and server. Towards the end we'll be tinkering around the edges of this.<br>
A big name in this space is [Plotly](https://plotly.com/), a commercial company offering the very powerful [Dash](https://plotly.com/dash/) (Nationwide is using the free version on AWS cloud to build some POC examples) and [cufflinks](https://plotly.com/python/cufflinks/) (a third-party wrapper around plotly).<br><br>

Development is fast, causing a scattered landscape (as is clear from the plot above). There have been multiple efforts to unify this landscape, but none very successful (I guess one of the few downsides of open source). An initiative with a lot of clout, trying to encourage and bring together a diverse set of tools under a single website, is [PyViz](https://pyviz.org/index.html). The goal is to make everyone doing viz in Python more productive.<br><br>

This notebook is concerned with exposing you to a range of tools, so you can start your own journey and go deeper into some parts when the need arises.<br><br>

This notebook consists of:
1. Some plotting functionality that the pandas library provides. This functionality is built on top of Matplotlib, and gives a quick way to do some basic (and not so basic) plotting.
2. A dive into the more low-level Matplotlib library. Matplotlib has an extensive codebase that can be daunting. However, most of Matplotlib can be understood with a fairly simple conceptual framework and knowledge of a few important points.
3. Once the basics are known, we'll come out at the other end by highlighting another, more user-friendly API built on top of Matplotlib: seaborn.
4. We'll finish by going into some libraries that interface with modern JavaScript visualization, enabling high-quality interactive plots.

But first, let's start with what we have already seen ...

# Visualization Using Pandas

Let's start with some of the plotting functionality that comes with [pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html) and that we've seen before.

In [None]:
import numpy as np
import pandas as pd

In [None]:
help(pd.DataFrame.plot)

## Simple scatter plot

Give the x & y coordinates of the plot. You can also give a size and a color column.<br>
When using size and color here, they are passed in as Series and not as column names (in the pandas version used, passing column names did not work; I suspect this is a bug that will get fixed).

In [None]:
df = pd.DataFrame({'x':np.random.randn(25), 'y':np.random.randn(25)})

In [None]:
df.plot.scatter(x='x', y='y')

In [None]:
# s: size & c: color
df.plot.scatter(x='x', y='y', s=100, c='red')

In [None]:
df = pd.DataFrame({'x':  np.random.randn(25), 
                   'y':  np.random.randn(25), 
                   'sz': np.random.choice([10,100,1000], size=25),
                   'cl': np.random.choice(['red','blue','green','orange'], size=25)
                  })

In [None]:
df.plot.scatter(x='x', y='y', s=df.sz, c=df.cl, alpha=0.4, figsize=(20,10), title='Some Cheerful Bubbles')

## TimeSeries

Each column is a timeseries.

In [None]:
df = pd.read_csv('data/GOOG.csv', index_col='Date')

In [None]:
df.index.dtype

In [None]:
df.index = pd.to_datetime(df.index)

In [None]:
df[['Low','High']].plot.line(figsize=(20,5))
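
The dtype check and the `pd.to_datetime` conversion above can also be folded into the read itself. A minimal sketch, assuming a CSV with a `Date` column like the GOOG file; the three rows below are made-up values for illustration, not real quotes, and the name `prices` is only used here:

```python
import io
import pandas as pd

# made-up sample rows standing in for data/GOOG.csv (illustrative values only)
csv_text = """Date,Low,High
2020-01-02,100.0,105.0
2020-01-03,101.5,106.0
2020-01-06,99.0,104.5
"""

# parse_dates=True asks pandas to parse the index column as dates while reading
prices = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

print(prices.index.dtype)                          # a datetime64 dtype instead of object
print(isinstance(prices.index, pd.DatetimeIndex))
```

With a datetime index in place, the line plot gets proper date ticks on the x-axis.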

## Area plot

Each column is a timeseries. The series get stacked on top of each other, and the area between them is filled.

In [None]:
dates = pd.date_range(start='2018/01/01', end='2018/05/01', freq='W')
df = pd.DataFrame({
    'soup':     np.random.choice(np.arange(10,21,5), size=len(dates)),
    'chips':    np.random.choice(np.arange(10,31,5), size=len(dates)),
    'sandwich': np.random.choice(np.arange(10,31,2), size=len(dates))
}, index=dates)

In [None]:
df[:3]

In [None]:
df.plot.area(color=['red','blue','orange'], title='Total Sales')
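
To see what 'stacked' means numerically: the upper edge of each band is the running total across the columns. A small sketch with made-up sales figures (the frame `sales` and its values are assumptions for illustration only):

```python
import pandas as pd

# made-up weekly sales, three weeks by three products (illustrative values only)
sales = pd.DataFrame(
    {'soup': [10, 15, 20], 'chips': [25, 10, 30], 'sandwich': [12, 18, 14]},
    index=pd.date_range('2018-01-07', periods=3, freq='W'),
)

# in a stacked area plot the upper edge of each band is the running total across columns
upper_edges = sales.cumsum(axis=1)
print(upper_edges)

# the top of the last band equals the row total, i.e. total sales per week
print((upper_edges['sandwich'] == sales.sum(axis=1)).all())
```

This is why the title 'Total Sales' fits: the outline of the whole stack is the weekly total.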

## Pie Chart

Produces a pie chart for a column `y` (or one pie per column when `subplots=True`), where each row is a slice labelled with its row index.

In [None]:
df = pd.DataFrame({'count':  [0.4, 0.6], 'ca_balance': [2e9, 0.5e9]}, index=['committed', 'not committed'])
df

In [None]:
df.plot.pie(subplots=True, figsize=(10,10))

In [None]:
# or, if we start with raw data ...
df = pd.DataFrame({'committed': np.random.choice(['yes','no'], p=[0.7,0.3], size=100),
                   'main':      np.random.choice(['yes','no','unknown'], p=[0.3,0.6,0.1], size=100)
                  })

In [None]:
df.head()

In [None]:
# aggregate and then plot
df.committed.value_counts().plot.pie(title='Committed')

## Heatmaps - Hexbin

Use heatmaps when scatterplots get saturated because too much data is involved ...

In [None]:
df = pd.concat([
    pd.DataFrame({'x': +1 + 0.8 * np.random.randn( 5_000), 'y': +1 + 0.8 * np.random.randn( 5_000)}),
    pd.DataFrame({'x': -1 + 0.8 * np.random.randn( 5_000), 'y': -1 + 0.8 * np.random.randn( 5_000)}),
    pd.DataFrame({'x':  0 + 0.2 * np.random.randn(10_000), 'y':  0 + 0.2 * np.random.randn(10_000)}),
])

In [None]:
df.plot.scatter(x='x', y='y')

In [None]:
## using alpha helps in this scenario
df.plot.scatter(x='x', y='y', alpha=0.1)

In [None]:
## using alpha helps in this scenario
df.plot.scatter(x='x', y='y', alpha=0.1, figsize=(20,6))

In [None]:
## but it is better to use a heatmap
df.plot.hexbin(x='x', y='y', gridsize=100)

A common issue with heatmaps is that a few high values drown out all the detail in the lower counts.<br>
Here, I generated data from a mixture of three components, where the one in the middle is more frequent and less variable. It drowns out the other two ...<br>
A good way to regain that detail is to use a logarithmic normalisation of the counts.<br>
In the next section we'll be going into the specifics of Matplotlib; here we just use the fact that the Pandas plotting is built on top of Matplotlib.<br>
This means we can adapt the Pandas plots using all the power of Matplotlib.

In [None]:
from matplotlib.colors import LogNorm
from matplotlib.cm import Oranges

In [None]:
df.plot.hexbin(x='x', y='y', gridsize=100, norm=LogNorm(), figsize=(20,6), cmap=Oranges)
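
To get a feel for what the logarithmic normalisation does to the counts, here is a small sketch comparing it with the default linear scaling. The counts are made up and the `vmin`/`vmax` values are chosen by hand for illustration:

```python
import numpy as np
from matplotlib.colors import LogNorm, Normalize

# made-up bin counts spanning three orders of magnitude
counts = np.array([1, 10, 100, 1000])

linear = Normalize(vmin=1, vmax=1000)(counts)
logged = LogNorm(vmin=1, vmax=1000)(counts)

print(np.round(linear, 3))   # low counts are squeezed near 0 on a linear scale
print(np.round(logged, 3))   # evenly spread over [0, 1] on a log scale
```

The colormap is applied to these normalised values, so on the log scale a bin with 10 points gets a visibly different colour from a bin with 1 point.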

Heatmaps are great tools and become more powerful when there is more data, which is exactly the case where scatterplots become less useful. And, as a huge added bonus, they use a lot less memory when exported to many formats, because only the bin counts are stored.

## Histogram - KDE

In [None]:
df.x.plot.hist(bins=50, figsize=(20,5))

In [None]:
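## pandas also gives a convenient way to get the Kernel Density Estimation of a variable ...
import matplotlib.pyplot as plt  ## needed here for plt.subplots (introduced properly in the Matplotlib section)
fig, ax = plt.subplots(figsize=(20,5))
##df.x.plot.hist(bins=300, figsize=(20,5), ax=ax)
df.x.plot.kde(figsize=(20,5), ax=ax)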
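
For intuition, a kernel density estimate is just the average of small Gaussian bumps, one centred on each observation. The sketch below computes it by hand with NumPy. It assumes a Gaussian kernel with Scott's rule for the bandwidth (as far as I know, the default of the SciPy routine that pandas relies on); the sample is made up:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=500)                       # made-up data for illustration

# Scott's rule for the bandwidth in one dimension: h = std * n**(-1/5)
n = sample.size
h = sample.std(ddof=1) * n ** (-1 / 5)

# evaluate the average of n Gaussian bumps, one centred on each observation
grid = np.linspace(-6, 6, 601)
z = (grid[:, None] - sample[None, :]) / h
density = np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# a density should integrate to (approximately) one
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller `h` gives a wigglier curve, a larger `h` a smoother one; that is the same trade-off as choosing the number of bins in a histogram.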

## Before We Dive Into Matplotlib ...

The good folk of Anaconda have made it possible to use HoloViews (which uses Bokeh) as the plotting 'backend' of Pandas, so HoloViews plots can be produced directly from a DataFrame (as far as I know this requires the `hvplot` package to be installed).<br>
I don't want to spend too much time on this, but it is worth mentioning here, because HoloViews gives you some useful goodies for 'free'.

In [None]:
pd.options.plotting.backend = 'holoviews'

In [None]:
df = pd.read_csv('data/GOOG.csv', index_col='Date')
df.index = pd.to_datetime(df.index)
df[['Low','High']].plot(width=1500, height=400)

# Matplotlib

In the examples above Pandas offered a convenient way to plot using the Matplotlib library (with the exception of the last plot).<br>
Matplotlib has been the core library for plotting for many years. It is very powerful and flexible, and consequently also complex.<br>
Although more convenient APIs exist that wrap the Matplotlib functionality, it pays to have a good understanding of Matplotlib.<br>
It will enable you to fine-tune your visuals.

## Basics

Everything in matplotlib is organized in a hierarchy. At the top of the hierarchy is **matplotlib.pyplot**, usually referred to with the alias **plt**.<br>
plt provides the entry point into the object-oriented API of matplotlib; it is used only for a few functions such as figure creation.

In [None]:
import matplotlib.pyplot as plt

The user explicitly creates and keeps track of the **figure** and **axes** objects.<br>
Before matplotlib can render the output to the notebook we need to tell matplotlib to use the *inline backend* (or *notebook backend*)

In [None]:
## make plot appear right inside the notebook ...
%matplotlib inline
## also, to make plots high quality, I usually run the following
%config InlineBackend.figure_format = 'retina'

Let's start by creating an empty figure ...

In [None]:
fig = plt.figure()
print(fig.__doc__)

Note that this figure contains 0 Axes. Axes are the entities that hold the 'plots' in Matplotlib. To add an Axes:

In [None]:
## the argument [left, bottom, width, height] are quantities in fractions of figure width and height
ax1 = fig.add_axes([0.0, 0.0, 1.0, 1.0])
ax2 = fig.add_axes([1.0, 1.0, 1.0, 1.0])
print(ax1.__doc__)

In [None]:
fig

In [None]:
ax1.plot(np.arange(5), np.arange(5), 'xb')
ax2.plot(np.arange(5), np.arange(5)[::-1], 'or')
fig

In [None]:
fig.get_children()

The **figure** is the whole figure, and it functions as a container for child axes.<br>
Each **axes** holds a 'plot'.<br>
The axes contains two (or three in the case of 3D) **axis** objects, which take care of the data limits.<br>
To complete the concepts, everything you can see on the figure is an **artist** (even the Figure, Axes, and Axis objects).<br>
This includes Text objects, Line2D objects, collection objects, Patch objects ... (you get the idea).<br><br>
So the figure holds Axes that contain, for instance, Line2D and Axis objects, which in turn contain titles, labels, ticks, ..., which contain fonts, ...<br><br>
![the components that make up the basic object hierarchy of matplotlib](plots/anatomy.png)
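
The hierarchy described above can be checked directly. A minimal sketch (the names `fig_h`, `ax_h` and `line_h` are only used for this illustration):

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig_h, ax_h = plt.subplots()
(line_h,) = ax_h.plot([0, 1, 2], [0, 1, 4])

# Figure, Axes, Axis and the plotted line are all Artists
for obj in (fig_h, ax_h, ax_h.xaxis, line_h):
    print(type(obj).__name__, isinstance(obj, Artist))

# parent/child links: the Figure holds the Axes, the Axes holds the Axis objects and the line
print(ax_h in fig_h.get_children())
print(ax_h.xaxis in ax_h.get_children(), line_h in ax_h.get_children())
plt.close(fig_h)
```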

The way the Axes ax1 & ax2 were created above, can get rather complex. You need to leave room for tickmarks, labels, ...<br>
So in practice, another way is used to create the Axes:
```python
fig, ax = plt.subplots(...)
```
This creates the Axes and returns a handle to the *figure* and the *axes* (or array of axes when nrows & ncols is used)<br><br>

To make the figure a little more interesting, let's add a 2x2 grid of axes the 'easy' way:

In [None]:
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(7,7))

In [None]:
type(ax[0][0])

In [None]:
ax[0][0].scatter(x=[0.25,0.50,0.75], y=[0.25,0.50,0.75])
fig
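
The object returned for the axes is a NumPy array, so all the usual indexing applies. A short sketch of the shapes you can expect (variable names are only for this illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

fig_g, axs_g = plt.subplots(nrows=2, ncols=3, figsize=(9, 5))
print(type(axs_g), axs_g.shape)      # a NumPy array of Axes with shape (2, 3)

# NumPy-style indexing works, and .flat walks over every Axes in row-major order
axs_g[1, 2].set_title('bottom right')
for i, ax_i in enumerate(axs_g.flat):
    ax_i.plot(np.arange(5), np.arange(5) * i)

# with a single row or column the array is 1-D; squeeze=False keeps it 2-D
_, axs_row = plt.subplots(ncols=3)
_, axs_2d = plt.subplots(ncols=3, squeeze=False)
print(axs_row.shape, axs_2d.shape)
```

Passing `squeeze=False` is handy in functions that must work for any grid size, because the result can always be indexed as `axs[row, col]`.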

In [None]:
size=5
x = np.arange(1,6)
def random_y(): return(np.random.randint(1,5,size))
##
fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(18,6))
##
ax[0].plot(x, random_y(), '-g', x, random_y(), '--r')
ax[0].legend(labels=['green solid line', 'red dashed line'])
ax[1].bar(x, random_y(), color='#000080')
ax[2].pie(random_y(), labels=x, startangle=90)
ax[3].hexbin(np.random.randn(100), np.random.randn(100), gridsize=(10,10))

Once everything is set up, the axes provide a convenient way into the hierarchy of objects, where basically everything can be tweaked and changed.

In [None]:
fig, ax = plt.subplots(figsize=(14,7))
yr = random_y()
ax.plot(x, yr, '-')

In [None]:
## Let's dive a bit deeper, it really helps to become more effective
ax.get_children()

In [None]:
## take the first 'spine' and see what this allows us to do, and play around with it ...
[m for m in dir(ax.get_children()[1]) if 'set_' in m]

In [None]:
ax.get_children()[3].set_linewidth(10)
ax.get_children()[3].set_linestyle('-')
ax.get_children()[3].set_color('red')
fig

In [None]:
[m for m in dir(ax.get_children()[0]) if 'set_' in m]

In [None]:
ax.get_children()[0].set_marker('o')
ax.get_children()[0].set_markerfacecolor('red')
ax.get_children()[0].set_markersize(25)
ax.get_children()[0].set_linewidth(5)
ax.get_children()[0].set_linestyle('-.')
fig

This last step is to show that you can basically get to everything through the object hierarchy & change it.<br>
Often the properties of your plot that you care about can be set when you create it, but it's incredibly powerful to become self-sufficient in tweaking your graphics.<br>
Remember, the components in matplotlib are organized in an object hierarchy (where each element has children) and you can always ask Python to list what is available using *dir* ...

In [None]:
## for instance to list all set_<...> methods on the first child of axes -> ax.get_children()[0]
[m for m in dir(ax.get_children()[0]) if m.startswith('set_')]
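
Indexing into `get_children()` is fine for exploring, but the positions depend on what has been drawn. A sketch of a more robust variant of the same idea: keep the handle that `plot` returns and reach the spines by name (all names below are only for this illustration):

```python
import matplotlib.pyplot as plt

fig_s, ax_s = plt.subplots()
(line_s,) = ax_s.plot([1, 2, 3], [2, 1, 3])   # keep the Line2D handle instead of indexing get_children()

# one setter at a time ...
line_s.set_linewidth(5)
# ... or several properties in one call with Artist.set
line_s.set(color='red', marker='o', linestyle='-.')

# every set_<property> has a matching get_<property>
print(line_s.get_linewidth(), line_s.get_color(), line_s.get_marker(), line_s.get_linestyle())

# spines can be reached by name rather than by position in the child list
ax_s.spines['top'].set_visible(False)
```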

The 'normal' way of using Matplotlib consists of roughly three steps:
```python
fig, ax = plt.subplots(...)                        ## setting up the axes
ax.scatter(...) -or- ax.bar(...) -or- ax.hist(...) ## create the plots
ax.set_<some_property>(...)                        ## fine-tune the plot if necessary
```
It is not always clear how to tweak the thing you want to change ... Google / Stack Overflow are your friends.<br>
But you now know quite a bit more than 80% of the folks posting questions on Stack Overflow after they copied some example.<br>
Which, of course, is the right thing to do :-)
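
A concrete, runnable instance of the three steps, using made-up data (the names `values`, `fig_r` and `ax_r` are only for this illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
values = rng.normal(size=200)             # made-up data for illustration

fig_r, ax_r = plt.subplots(figsize=(8, 3))                 ## 1. set up the axes
ax_r.hist(values, bins=20, color='#0066cc')                ## 2. create the plot
ax_r.set_title('200 draws from a standard normal')         ## 3. fine-tune
ax_r.set_xlabel('value')
ax_r.set_ylabel('count')
```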

## Saving Figures To File 

To save a figure to file, use: 
```python
figure.savefig(filename, ...) 
```
Most of the time SVG will give very good results, because the resulting figure is scalable and not too big.<br>
This works well, unless you have a lot of data in your plot. For instance a scatterplot with a lot of data.<br>
In these cases an SVG can become large, and a PNG will be better. In such cases it might be worth playing with the dpi= parameter.<br><br>

Finally, it is worth noting that information like axis labels, titles, ..., can get cropped in the saved image.<br>
In general, I almost always use the **bbox_inches='tight'** parameter to prevent this from happening.

In [None]:
fig.savefig('plots/example_plot.svg', bbox_inches='tight')
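
To compare formats for your own figure without cluttering the disk, you can save into an in-memory buffer and look at the size. A sketch with a made-up scatterplot of 5,000 points; the `dpi=150` value is an arbitrary choice for illustration:

```python
import io
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
fig_f, ax_f = plt.subplots(figsize=(6, 4))
ax_f.scatter(rng.normal(size=5_000), rng.normal(size=5_000), alpha=0.2)   # made-up data
ax_f.set_xlabel('x')

sizes = {}
for fmt in ('svg', 'png'):
    buf = io.BytesIO()                  # write to memory instead of a file on disk
    fig_f.savefig(buf, format=fmt, dpi=150, bbox_inches='tight')
    sizes[fmt] = buf.getbuffer().nbytes

print(sizes)                            # bytes per format; compare the two
```

Rerunning this with more points or a different `dpi` shows quickly where the trade-off between the two formats lies for a given plot.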

In [None]:
!open plots/example_plot.svg

Now that the basics are covered, let's look at some examples of predefined functions. Please note that Matplotlib is a vast library.<br>
There are many parts that will not get covered here. But the goal here is not to be exhaustive, for which I would be woefully inept :-)<br>
Finally, Matplotlib gets some flak for being complex, which is unfair taking into account the depth & flexibility of the library.

## Scatter Plots

In [None]:
## generate some random points around (-1, -1)
x1 = np.random.randn(50) - 1
y1 = np.random.randn(50) - 1
## set up figure
fig, ax = plt.subplots(figsize=(20,3))
## create scatter plot
ax.scatter(x=x1, y=y1, marker='*', s=100)

In [None]:
## add some more points to the same axes
x2 = np.random.randn(50) + 1
y2 = np.random.randn(50) + 1
ax.scatter(x=x2, y=y2, marker='+', c='red', s=100)
fig

In [None]:
## fine-tune plot
fig, ax = plt.subplots(figsize=(20,5))

## plot the points
ax.scatter(x=x1, y=y1, marker='*', c='#cc66ff', s=100)
ax.scatter(x=x2, y=y2, marker='X', c='#ace600', s=100)

## set axis limits
ax.set_xlim(-3,3)
ax.set_ylim(-3,3)

## add title
ax.set_title('A WONDERFULLY COLORFUL SCATTERPLOT', fontdict={'family':'bauhaus 93', 'color':'orange', 'weight':'normal', 'size': 24, })

## add some x & y axis labels
ax.set_xlabel('x-coord of random noise', fontdict={'family':'courier new', 'color':'#339933',  'weight':'bold', 'size': 16, })
ax.set_ylabel('y-coord of random noise', fontdict={'family':'courier new', 'color':'#ff6600',  'weight':'bold', 'size': 16, })

## change ticklabel formatting (x labels from the x ticks, y labels from the y ticks)
ax.set_xticklabels(ax.get_xticks(), fontdict={'family':'serif', 'color':'purple',  'weight':'bold', 'size': 16, })
ax.set_yticklabels(ax.get_yticks(), fontdict={'family':'serif', 'color':'#00aaff', 'weight':'bold', 'size': 16, });

## Line Plots

In [None]:
ax.plot([-3,+3], [+3,-3], '--', c='green', linewidth=5)
fig

It's also possible to fill the area between two lines

In [None]:
ymin  = ax.get_ylim()[0]
##                   x,     y1,           y2
ax.fill_between([-3,3],[+3,-3], [ymin, ymin], color='lightblue', alpha=0.7)
fig

In [None]:
fig, ax = plt.subplots(ncols=3, figsize=(25,5))
x  = np.linspace(0,2*np.pi,num=100)
y0 = np.zeros(100)
ys = np.sin(x)
yc = np.cos(x)
ax[0].fill_between(x, ys, alpha=0.3);
ax[1].fill_between(x, yc, alpha=0.3);
ax[2].fill_between(x, ys, yc, alpha=0.3);

In [None]:
def sigmoid(x, beta): return np.exp(x - beta) / (1 + np.exp(x - beta))
x = np.linspace(1,10)
y = sigmoid(x, 3.5)
## another way to obtain a handle to an axes is by: 
fig = plt.figure(figsize=(20,5))
ax  = fig.add_subplot(111)
##
ax.plot(x, sigmoid(x,5), '-b');
## draw the guide lines on the axes itself rather than through plt
ax.axvline(x=5.0, ymin=0, ymax=1, color='#4d79ff', linestyle=':')
ax.axhline(y=0.5, xmin=0, xmax=1, color='#4d79ff', linestyle=':')

In [None]:
ax.set_title('SIGMOID FUNCTION', fontdict={'size':28, 'color':'#000066'})
ax.text(x=2, y=0.75, s=r'$\frac{e^{x - 5}}{1 + e^{x - 5}}$', fontdict={'size':24, 'color':'#000066'})
ax.grid(False)
fig.savefig('plots/sigmoid_fun.svg', format='svg')
fig

## Histograms

In [None]:
x = np.random.randn(500)
fig, ax = plt.subplots(figsize=(20,5))
ax.hist(x, bins=20);

In [None]:
from matplotlib.mlab import GaussianKDE

In [None]:
kde = GaussianKDE(x)
fig, ax = plt.subplots(figsize=(20,5))
x4plot = np.arange(-3,3,step=0.1)
y4plot = kde.evaluate(np.arange(-3,3,step=0.1))
ax.plot(x4plot, y4plot, '-', c='#0066cc');
## ax.fill_between(x4plot, y4plot, alpha=0.1);

In [None]:
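## the pandas plotting backend was switched to HoloViews earlier, so switch back to Matplotlib first
pd.options.plotting.backend = 'matplotlib'
ax = pd.Series(x).plot.kde(figsize=(20,5))
ax.set_xlim((-3,3))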

## Grouped Bar Chart

In [None]:
df = pd.read_csv('data/iris.data', skiprows=0, names=['sepal_length','sepal_width','petal_length','petal_width','species'])

In [None]:
df.species = df.species.astype('category')

In [None]:
df_mean = df.groupby('species').mean()
df_mean

Let's assume we want to create a group of 3 bars (one for each species) for the 4 features:

In [None]:
np.linspace(0,12,4)

In [None]:
## x positions
x1 = np.linspace(0,12,4)+0.1
x2 = np.linspace(1,13,4)+0.1
x3 = np.linspace(2,14,4)+0.1
## setup plots
fig, ax = plt.subplots(figsize=(15,4))
## plot bars
ax.bar(x1, df_mean.loc['Iris-setosa'],     width=0.8, color='darkgreen',  label='Iris-setosa')
ax.bar(x2, df_mean.loc['Iris-versicolor'], width=0.8, color='darkorange', label='Iris-versicolor')
ax.bar(x3, df_mean.loc['Iris-virginica'],  width=0.8, color='darkblue',   label='Iris-virginica')
## add legend
ax.legend(prop={'size':16})
## ticks
ax.set_xticks([1,5,9,13])
ax.set_xticklabels(list(df_mean.columns), fontdict={'size':16})
##
ax.grid(False);
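
The x positions above were worked out by hand for 3 bars per group. A sketch of how the same layout can be computed for any number of series; the labels and values below are made up and do not come from the iris data:

```python
import numpy as np
import matplotlib.pyplot as plt

groups = ['feature A', 'feature B', 'feature C', 'feature D']     # hypothetical labels
series = {'one': [5.0, 3.4, 1.5, 0.2],                            # made-up values
          'two': [5.9, 2.8, 4.3, 1.3],
          'three': [6.6, 3.0, 5.6, 2.0]}

n_bars = len(series)
width = 0.8 / n_bars                       # the bars of one group share 0.8 of a unit
centers = np.arange(len(groups))           # one unit per group

fig_b, ax_b = plt.subplots(figsize=(10, 4))
for i, (label, heights) in enumerate(series.items()):
    # shift each series so the bars of a group sit symmetrically around its center
    offset = (i - (n_bars - 1) / 2) * width
    ax_b.bar(centers + offset, heights, width=width, label=label)

ax_b.set_xticks(centers)
ax_b.set_xticklabels(groups)
ax_b.legend()
```

The remaining 0.2 of each unit is the gap between neighbouring groups; change the 0.8 to make groups tighter or looser.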

## Stem Plots

In [None]:
## simple example
fig, ax = plt.subplots(figsize=(20,5))
x4plot = np.arange(-3,3,step=0.1)
y4plot = kde.evaluate(np.arange(-3,3,step=0.1))
## note: the use_line_collection argument seen in older examples has been removed in recent Matplotlib releases
ax.stem(x4plot, y4plot)

In [None]:
## timeline example
pandemics = pd.DataFrame([('COVID-19', 2020, 22e3),
                          ('Hong Kong Flu', 1968, 1e6),
                          ('Asian Flu', 1956, 2e6),
                          ('Influenza', 1918, 30e6),
                          ('Russian Flu',1889, 1e6),
                          ('3rd Cholera Pandemic', 1852, 1e6),
                          ('Black Death', 1346, 100e6),
                          ('Plague of Justinian', 541, 25e6),
                          ('Plague of Galen', -165, 5e6)
                         ], columns=['name', 'year', 'deathtoll']).set_index('name', drop=True)
fig, ax = plt.subplots(figsize=(30,10))
ax.stem(pandemics.year, pandemics.deathtoll);
ax.set_xticks([])
ax.set_yticks([])
ax.grid(False)

for name, (year, deathtoll) in pandemics.iterrows():
    ax.text(year, deathtoll+1_000_000, f'{name}\n{int(year)} & {deathtoll/1e6:_.0f}M', horizontalalignment='center', verticalalignment='bottom', fontsize=14, backgroundcolor=(1., 1., 1., .3))

## Build Your Own Slider

In [None]:
fig, ax = plt.subplots(figsize=(10,2))
## plot the line
ax.plot([0, 100], [0,0], '-', c='#a6a6a6', linewidth=10)

In [None]:
## position and make pretty
ax.plot([0, 100], [0,0], '-', c='#a6a6a6', linewidth=10)
ax.plot([0, 100], [0,0], '-', c='#4d4d4d', linewidth=5)
ax.set_ylim(bottom=-0.3, top=0.7)
fig

In [None]:
## plot the markers
ax.plot(50, 0, 'o', color='#4d4d4d', markersize=47, markeredgewidth=3, markeredgecolor='#a6a6a6')
ax.plot(55, 0, 'o', color='orange',  markersize=47, markeredgewidth=3, markeredgecolor='#4d4d4d')
fig

In [None]:
## add the label & remove the axis & border
ax.text(0,0.4,'Main Account', horizontalalignment='left', verticalalignment='bottom', fontsize=24, color='#4d4d4d')
ax.grid(False)
ax.axis('off')
fig

Once you're done, you can wrap it in a function, so next time you simply call the function!

In [None]:
def create_indexing_plot(name, scale_min, scale_max, pop_val, grp_val, grp_color, ax):
    ax.plot([scale_min, scale_max], [0,0], '-', c='#a6a6a6', linewidth=10)
    ax.plot([scale_min, scale_max], [0,0], '-', c='#4d4d4d', linewidth= 5)
    ax.set_ylim(bottom=-0.4, top=0.6)
    ## plot the markers
    ax.plot(pop_val, 0, 'o', color='#4d4d4d', markersize=22, markeredgewidth=3, markeredgecolor='#a6a6a6')
    ax.plot(grp_val, 0, 'o', color=grp_color, markersize=22, markeredgewidth=3, markeredgecolor='#4d4d4d')
    ## add the label
    ax.text(scale_min, 0.35, name, horizontalalignment='left', verticalalignment='bottom', fontdict={'family':'Arial', 'color':'#4d4d4d', 'weight':'bold', 'size': 18, })
    ## remove the axis & border
    ax.grid(False)
    ax.axis('off');

In [None]:
fig, ax = plt.subplots(figsize=(5,1.2))
create_indexing_plot('Committed', 0, 100, 50, 10, 'red', ax)

In [None]:
fig, axs = plt.subplots(nrows=5, figsize=(5,5*1.2))
create_indexing_plot('Committed',     0, 100, 50, 70, '#ffcc00', axs[0])
create_indexing_plot('Sole Account',  0, 100, 70, 80, '#ffcc00', axs[1])
create_indexing_plot('Joint Account', 0, 100, 40, 55, '#ffcc00', axs[2])
create_indexing_plot('Credit Card',   0, 100, 20, 30, '#ffcc00', axs[3])
create_indexing_plot('Mortgage',      0, 100,  5, 23, '#ffcc00', axs[4])
fig.savefig('plots/example_matplotlib_BYO.svg', format='svg')

In [None]:
!open plots/example_matplotlib_BYO.svg

## Build Your Own Spider

In [None]:
lbl = ['Aaaa','Bbbbbb','Ccc Cccc','Dd','Eeeeee eee']
val = [   0.4,     0.5,       0.3, 0.8,         0.6]

In [None]:
## create figure and add polar axes
fig = plt.figure(figsize=(6, 6))
## note: dimensions [left, bottom, width, height] of the new axes are in fractions of figure width and height.
ax  = fig.add_axes([0.2, 0.2, 0.8, 0.8], polar=True)

In [None]:
## remove some fluff not needed
ax.get_yaxis().set_ticks([])
ax.get_xaxis().set_ticks([])
ax.spines['polar'].set_visible(False)
ax.grid(False)
fig

In [None]:
## draw the frame: an 'axis' for each dimension
theta = np.linspace(0, 2*np.pi, len(lbl), endpoint=False) + np.pi/2
## plot the 'frame' of the spiderplot
for ix in range(len(lbl)): 
    a = theta[ix] % (2*np.pi)
    ## plot the spokes using polar coordinates (angle, length)
    ax.plot([a,a], [0,1], '-', c='gray', linewidth=2)
    ## plot label
    align = 'center' if ((np.abs(a-np.pi/2)<0.1) or (np.abs(a-np.pi*3/2)<0.1)) else \
            'right'  if ((a>(np.pi/2)) and (a<(np.pi*3/2))) else \
            'left'
    ax.text(theta[ix], 1.1, lbl[ix], ha=align, fontdict={'family':'Arial', 'size':14, 'color':'gray'})
## plot the outer polygon
ax.plot(np.append(theta,theta[0]), np.ones(len(theta)+1), '-', c='gray', linewidth=2)
fig

In [None]:
## now add the spider
ax.plot(np.append(theta,theta[0]), np.append(val,val[0]), 'blue', linewidth=2)
ax.fill(np.append(theta,theta[0]), np.append(val,val[0]), 'blue', alpha=0.2)
ax.set_title('SEGMENT 1\n-------------------------------------', fontdict={'family':'Arial', 'color':'#4d4d4d', 'weight':'bold', 'size': 18})
fig

Again, once you're done, you can wrap it in a function, and subsequently simply call the function ...

In [None]:
def spider(ttl, lbl, val, color, size=(6,6), fontsize=14, filename=None):
    # create figure
    fig = plt.figure(figsize=size)
    ax  = fig.add_axes([0.2, 0.2, 0.8, 0.8], polar=True)
    ## clean
    ax.get_yaxis().set_ticks([])
    ax.get_xaxis().set_ticks([])
    ax.spines['polar'].set_visible(False)
    ax.grid(False)
    # build frame
    theta = np.linspace(0, 2*np.pi, len(lbl), endpoint=False) + np.pi/2
    for ix in range(len(lbl)): 
        a = theta[ix] % (2*np.pi)
        ax.plot([a,a], [0,1], '-', c='gray', linewidth=2)
        align = 'center' if ((np.abs(a-np.pi/2)<0.1) or (np.abs(a-np.pi*3/2)<0.1)) else \
                'right'  if ((a>(np.pi/2)) and (a<(np.pi*3/2))) else \
                'left'
        ax.text(theta[ix], 1.1, lbl[ix], ha=align, fontdict={'family':'Arial', 'size':fontsize, 'color':'gray'})
    ## now add the spider
    ax.plot(np.append(theta,theta[0]), np.append(val,val[0]), color, linewidth=2)
    ax.fill(np.append(theta,theta[0]), np.append(val,val[0]), color, alpha=0.2)
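    ## add the title
    ax.set_title(ttl + '\n' + '-'*50, fontdict={'family':'Arial', 'color':'#4d4d4d', 'weight':'bold', 'size': fontsize+2})
    ## optionally save the figure when a filename is given
    if filename is not None:
        fig.savefig(filename, bbox_inches='tight')
    return ax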

In [None]:
s0 = spider('SEGMENT ONE',
            ['committed','main account','credit card','savings account','mortgage','personal loan'], 
            [        0.8,           0.6,          0.4,              0.7,       0.2,            0.1],
            '#ff3300', 
            size=(3,3), 
            fontsize=11
           )
s0.figure.savefig('plots/spider_start.svg', format='svg', bbox_inches='tight')

In [None]:
!open plots/spider_start.svg

## 3D

In [None]:
from mpl_toolkits.mplot3d import Axes3D

In [None]:
%matplotlib inline
fig = plt.figure()
ax  = fig.add_axes([0,0,1,1], projection='3d')

In [None]:
type(ax)

In [None]:
[m for m in dir(ax) if '3D' in m]

In [None]:
## changing to the 'notebook' backend gives extra functionality (an interactive figure)
%matplotlib notebook
fig  = plt.figure()
ax3d = fig.add_axes([0,0,1,1], projection='3d')
##
z_line = np.linspace(0, 15, 1000)
x_line = np.cos(z_line)
y_line = np.sin(z_line)
##
ax3d.plot3D(x_line, y_line, z_line, 'gray')

In [None]:
z_points = 15 * np.random.random(100)
x_points = np.cos(z_points) + 0.1 * np.random.randn(100)
y_points = np.sin(z_points) + 0.1 * np.random.randn(100)
pt_sizes = np.random.choice([10,100,1000], size=100)
ax3d.scatter3D(x_points, y_points, z_points, s=pt_sizes, c=z_points, cmap='hsv');
fig

Matplotlib has some pretty advanced functions for 3d plotting, see [mplot3d-toolkit](https://matplotlib.org/3.1.1/tutorials/toolkits/mplot3d.html#the-mplot3d-toolkit).<br>
For instance, use plot_trisurf to create a really smooth 3D surface using triangulation.

In [None]:
## define a grid on the (x, y)-plane using angles and radii
radii  = np.linspace(0.125, 1.0, 8)
angles = np.linspace(0, 2 * np.pi, 36, endpoint=False)
# Repeat all angles for each radius.
angles = np.repeat(angles[..., np.newaxis], len(radii), axis=1)
# Convert polar (radii, angles) coords to cartesian (x, y) coords.
x = np.append(0, (radii*np.cos(angles)).flatten())
y = np.append(0, (radii*np.sin(angles)).flatten())
z = np.sin(-1*x*y)
##
fig  = plt.figure()
ax3d = fig.add_axes([0,0,1,1], projection='3d')
##
ax3d.plot_trisurf(x, y, z, antialiased=True)

In [None]:
ax3d.figure.savefig('plots/3d_example.svg', format='svg', bbox_inches='tight')

In [None]:
!open plots/3d_example.svg

## Wrapping Up

I have only touched on the powerful capabilities matplotlib has on offer. There are many more types of plots predefined.<br>
But covering them would leave no time for other plotting libraries that, in all likelihood, will offer an easier API to get most things done in a few lines of code.<br><br>
A good place to dive deeper into Matplotlib is the elaborate [documentation](https://matplotlib.org/3.2.1/contents.html).

# Seaborn

Seaborn is a Python data visualization library based on matplotlib, and provides a high-level interface for drawing attractive and informative statistical graphics.<br>
The goal is to make it simple to produce insightful plots with only a few lines of code.<br>
Think of Seaborn as a very well written library of 'build your own' functions & classes that were written on top of Matplotlib to make your life a lot easier.<br>
It is often more difficult to tweak and change all aspects of the plot, but creating good-looking plots with a pretty layout on top of Matplotlib is hard work!<br><br>
It is pretty standard to use the alias **sns** and to also set a style immediately after loading:

In [None]:
import matplotlib.pyplot as plt

import seaborn as sns
sns.set(style="whitegrid")

For a good overview of all Seaborn has on offer, look at the [API documentation](https://seaborn.pydata.org/api.html#).<br>
Let's read some data to play with ...

In [None]:
## let's grab some example data
tips        = pd.read_csv('data/tips.csv')
tips.sex    = tips.sex.astype('category')
tips.smoker = tips.smoker.astype('category')
tips.day    = tips.day.astype('category')
tips.time   = tips.time.astype('category')
tips.head()

As a first example, use **relplot** to produce a figure showing the relationship between two variables with semantic mappings of subsets:
* scatterplot: kind="scatter"
* of bill x tip: *x="total_bill", y="tip"*
* columns split by the levels of time: *col="time"*
* and further semantic mappings:
  * colored by smoker: *hue="smoker"*
  * marker style by day: *style="day"*
  * sized by size: *size="size"*

In [None]:
sns.relplot(x="total_bill", y="tip", kind="scatter", col="time", hue="smoker", style="day", size="size", data=tips)

Seaborn makes it a lot easier to produce complex plots with just a few lines of code, but it is generally harder to tweak the figure afterwards using the underlying matplotlib functionality.<br>
On a positive note, most of the functions come with a wide range of parameters to tune the result.<br>
Also, Seaborn comes with a [seaborn.set()](https://seaborn.pydata.org/generated/seaborn.set.html) function you can use to configure it.

In [None]:
colors = ["#FF0B04", "#4374B3"]

sns.set(font_scale=1.5, palette=colors)

g = sns.relplot(x="total_bill", y="tip", kind="scatter", col="time", hue="smoker", style="day", size="size", sizes=(50, 500), alpha=0.5, data=tips, height=7, aspect=1.5)

## plt, as in the plt from: import matplotlib.pyplot as plt
## basically gives you an interface into the matplotlib world
plt.subplots_adjust(top=0.9) ## (1) make room for title
g.fig.suptitle('Total Bill -X- Tip -By- Time', fontdict={'color':'darkgreen', 'family':'Bauhaus 93'}) ## add the title

## undo change to font_scale for future plots
sns.set()

## FacetGrid

FacetGrid lets you set up a grid, and subsequently *map* a plotting function into each of the subplots in the grid ...

In [None]:
sns.set(font_scale=1.4, palette=colors)
g = sns.FacetGrid(tips, row="time", col="day", height=4, hue="sex", hue_kws={"marker": ["^", "v"]})
g.map(plt.scatter, "total_bill", "tip")
g.add_legend();
sns.set()

## Bubble Plot

In [None]:
sns.set(font_scale=1.5)
cmDay = {'Fri':'#66cc00', 'Sat':'#ffcc00', 'Sun':'#ff751a', 'Thur':'#b82e8a'}
_, ax = plt.subplots(figsize=(20,10))
sns.scatterplot(x='total_bill', y='tip', hue='day', size='size', sizes=(100,2000), palette=cmDay, alpha=0.5, data=tips, ax=ax)
ax.legend(loc='center left', bbox_to_anchor=(1.0, 0.5), ncol=2)
ax.set_title('Some cheerful bubbles !!!')
sns.set()

print(type(ax))

Please note that **sns.scatterplot** returns a **matplotlib.axes._subplots.AxesSubplot**.<br>
It’s important to know about a major distinction between seaborn plotting functions. All of them are either:
* **figure-level**: the function creates its own figure (e.g. **relplot** returns a **FacetGrid**), so you need to go through plt or the returned grid object to get entry into the Matplotlib world
* **axes-level**: in which case the recipe *ax = sns.some_plotting_function() ... ax.set_some_property()* works best
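
The distinction can be made concrete with a small sketch. It assumes a tiny made-up DataFrame (`demo`, not the tips data) so that it runs on its own, and it only illustrates what each kind of function hands back:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

## small made-up stand-in for the tips data (values are illustrative only)
demo = pd.DataFrame({
    "total_bill": [10.3, 21.0, 15.8, 30.4, 12.6, 25.1],
    "tip":        [1.7, 3.5, 2.2, 5.0, 2.0, 4.1],
    "time":       ["Lunch", "Dinner", "Lunch", "Dinner", "Lunch", "Dinner"],
})

## axes-level: draws into the Axes we supply and returns that same Axes
fig, ax = plt.subplots(figsize=(6, 4))
returned = sns.scatterplot(x="total_bill", y="tip", data=demo, ax=ax)
returned.set_title("axes-level: customize via the Axes")

## figure-level: creates its own figure and returns a FacetGrid
g = sns.relplot(x="total_bill", y="tip", col="time", data=demo, height=3)
g.set_axis_labels("Total bill", "Tip")                   ## FacetGrid helper
g.fig.suptitle("figure-level: customize via the grid")   ## underlying Matplotlib figure

print(type(returned).__name__, type(g).__name__)
```

Because an axes-level function accepts an `ax` argument, it can be placed in any subplot layout you build yourself; a figure-level function owns its layout, which is why it is customized through the returned grid.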

## Violin Plot

A violin plot gives insight into distributions split by some categorical variable. It is a refined version of the well-known box plot: on top of the summary statistics, it shows a kernel density estimate of the distribution.

In [None]:
fig, ax = plt.subplots(figsize=(15,5))
sns.violinplot(x='sex', y='tip', data=tips, ax=ax)

In [None]:
tips['gender_and_table_size'] = tips.sex.astype(str) + '_' + tips['size'].apply(str)

In [None]:
fig, ax = plt.subplots(figsize=(20,5))
sns.violinplot(x='gender_and_table_size', y='tip', data=tips.sort_values('gender_and_table_size'), ax=ax)

## Joint KDE Plot

A **jointplot** gives a nice bivariate scatterplot with the marginals on top and on the side.<br>
Use *kind='reg'* to obtain a scatterplot with a regression line:

In [None]:
sns.set(font_scale=1.5)
ax = sns.jointplot(x='total_bill', y='tip', data=tips, color='orange', kind='reg')
ax.fig.set_figwidth(10)
ax.fig.set_figheight(10)
sns.set(font_scale=1.0)
ax.fig.savefig('plots/sns_jointplot.svg')

In [None]:
!open plots/sns_jointplot.svg

Regrettably, this cannot be combined with *hue* to split by the different levels of a categorical variable.<br>
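
One possible workaround, offered as a sketch rather than a drop-in replacement, is **sns.lmplot**. It combines a regression plot with a FacetGrid and does accept *hue*, at the cost of losing the marginal distributions. The `demo` DataFrame below is made up so the snippet runs on its own:

```python
import pandas as pd
import seaborn as sns

## small made-up stand-in for the tips data (values are illustrative only)
demo = pd.DataFrame({
    "total_bill": [10.3, 21.0, 15.8, 30.4, 12.6, 25.1, 18.2, 27.9],
    "tip":        [1.7, 3.5, 2.2, 5.0, 2.0, 4.1, 3.0, 4.4],
    "day":        ["Sat", "Sun", "Sat", "Sun", "Sat", "Sun", "Sat", "Sun"],
})

## lmplot fits and draws one regression line per level of `hue`
g = sns.lmplot(x="total_bill", y="tip", hue="day", data=demo, height=4)
g.set_axis_labels("Total bill", "Tip")
```

Passing *col="day"* instead of *hue="day"* would give one panel per level rather than overlaying the fits.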

In [None]:
tips.day.cat.categories

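In [None]:
## jointplot is figure-level and cannot draw into an existing Axes,
## so use the axes-level regplot to get one regression panel per day
_, axs = plt.subplots(ncols=4, figsize=(30,7))
colors = ['#9900cc', '#3333cc', '#e60073', '#ff9933']
for ix,(grp,df) in enumerate(tips.groupby('day', observed=True)):
    sns.regplot(x='total_bill', y='tip', data=df, color=colors[ix], ax=axs[ix])
    axs[ix].set_title(grp)

In [None]:
axs.shape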

## Kernel Density Plot

## FacetGrid

In [None]:
ax = sns.FacetGrid(tips, col='day', height=5, aspect=0.6);
ax.map(sns.barplot, "sex", "total_bill", order=['Male','Female'], color='orange', alpha=0.5);

# Modern Visualization Using JavaScript

## Bokeh

Bokeh is an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords high-performance interactivity over large or streaming datasets.<br><br>

Bokeh can help anyone who would like to quickly and easily make interactive plots, dashboards, and data applications.<br><br>

Bokeh exposes two interface levels to users:
* bokeh.models: A low-level interface that provides the most flexibility to application developers.
* bokeh.plotting: A higher-level interface centered around composing visual glyphs.

I don't want to go too deep into the specifics, but it pays to have a rough understanding of the **components** and lingo of Bokeh.<br>
* <span style="color:blue;font-weight:bold">application</span>: a Bokeh application is a recipe for generating Bokeh documents. Typically this is Python code **run by a Bokeh server** when new sessions are created. The Bokeh server opens up a whole new world of interactive dashboards, but is out of scope here.
* <span style="color:blue;font-weight:bold">BokehJS</span>: the JavaScript client library that actually renders the visuals and handles the UI interactions for Bokeh plots and widgets in the browser. Typically, users will not have to think about this aspect of Bokeh much (“We write the JavaScript, so you don’t have to!”) but it is good to have basic knowledge of this dichotomy.
* <span style="color:blue;font-weight:bold">documents</span>: organizing data structure for Bokeh applications. Documents contain all the Bokeh Models and data needed to render an interactive visualization or application in the browser.
* <span style="color:blue;font-weight:bold">embedding</span>: various methods of including Bokeh plots and widgets into web apps and pages, or the IPython notebook. 
* <span style="color:blue;font-weight:bold">glyphs</span>: the basic visual building blocks of Bokeh plots, e.g. lines, rectangles, squares, wedges, patches, etc. The bokeh.plotting interface provides a convenient way to create plots centered around glyphs.
* <span style="color:blue;font-weight:bold">models</span>: lowest-level objects that comprise Bokeh “scenegraphs”. Most users will not use this level of interface to assemble plots directly. However, ultimately all Bokeh plots consist of collections of models, so it is important to understand them enough to configure their attributes and properties.
* <span style="color:blue;font-weight:bold">server</span>: the Bokeh server is an optional component that can be used for sharing and publishing Bokeh plots and apps, for handling streaming of large data sets, or for enabling sophisticated user interactions based off of widgets and selections.
* <span style="color:blue;font-weight:bold">widgets</span>: user interface elements outside of a Bokeh plot such as sliders, drop down menus, buttons, etc. Events and updates from widgets can inform additional computations, or cause Bokeh plots to update.

Let's start with a simple example:
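
A minimal sketch of such an example, assuming Bokeh is installed and using made-up data points. It uses the higher-level `bokeh.plotting` interface: create a figure, add glyphs to it, and let BokehJS render the result in the browser:

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

## made-up sample points, purely for illustration
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

## bokeh.plotting: create a figure, then add glyphs to it
p = figure(title="Simple Bokeh example", x_axis_label="x", y_axis_label="y")
p.line(x, y, line_width=2, legend_label="line glyph")
p.scatter(x, y, size=10, alpha=0.6, legend_label="scatter glyph")

## embedding: render the plot to a standalone HTML string (BokehJS is loaded from the CDN)
html = file_html(p, CDN, "Simple Bokeh example")

## in a notebook you would typically display it inline instead:
## from bokeh.io import output_notebook, show
## output_notebook(); show(p)
```

The line and the markers are two separate glyphs on the same figure, and the resulting plot comes with Bokeh's default pan, zoom and reset tools.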

## HoloViews