# Table of Contents 
- **[Python Data Visualization Landscape](#Python-Data-Visualization-Landscape)**
- **[Matplotlib: plotting examples](#Matplotlib:-plotting-examples)**
- **[Plotting with pandas](#Plotting-with-pandas)**
- **[Seaborn](#Seaborn)**
- **[Plotly Express: interactive plots](#Plotly-Express:-interactive-plots)**



# Python Data Visualization Landscape

**Data visualization** is one of the most important steps in the data mining process.
The choice of the correct plot depends on three aspects:
- which data you need to plot
- what the goal of the visualization is
- for whom the plot is intended

[Anaconda Blog](https://www.anaconda.com/blog/python-data-visualization-2018-why-so-many-libraries): Python Data Visualization 2018: Why So Many Libraries?

![dataviz](https://files.speakerdeck.com/presentations/a2d86983ff634ac3871ad4e5a308a67b/slide_32.jpg)

Most of the libraries fall into the "InfoVis" group, focusing on visualizations of information in arbitrary spaces, not necessarily the three-dimensional physical world. 

InfoVis libraries use the two dimensions of the printed page or computer screen to make abstract spaces interpretable, typically with axes and labels. The InfoVis libraries can be further broken down into numerous subgroups:
- **`Matplotlib`**: One of the oldest and by far the most popular of the InfoVis libraries, released in 2003, with a very extensive range of 2D plot types and output formats

- **Matplotlib-based**: A variety of tools have built on Matplotlib's 2D-plotting capability over the years, either using it as a rendering engine for a certain type of data or in a certain domain (`pandas`, `NetworkX`, `Cartopy`, `yt`, etc.), or providing a higher-level API on top to simplify plot creation (`ggplot`, `plotnine`, `HoloViews`, `GeoViews`), or extending it with additional types of plots (`seaborn`, etc.).
- **JavaScript**: Once HTML5 allowed rich interactivity in browsers, many libraries arose to provide interactive 2D plots for web pages and in Jupyter notebooks, either using custom JS (`Bokeh`, `Toyplot`) or primarily wrapping existing JS libraries like D3 (`Plotly`, `bqplot`).


The most basic plot types are shared between multiple libraries, but others are only available in certain libraries. 
- **Hint**: look at the example galleries for each library. 

As a rough guide:
- *Statistical plots* (scatter plots, lines, areas, bars, histograms): Covered well by nearly all InfoVis libraries, and the main focus of Seaborn, bqplot, Altair, ggplot2, plotnine
- *Images, regular grids, rectangular meshes*: Well supported by Bokeh, Datashader, HoloViews, Matplotlib, Plotly
- *Irregular 2D meshes* (triangular grids): Well supported by the SciVis libraries plus Matplotlib, Bokeh, Datashader, HoloViews
- *Geographical data*: Matplotlib (with Cartopy), GeoViews, ipyleaflet, Plotly
- *Networks/graphs*: NetworkX, Plotly, Bokeh, HoloViews, Datashader
- *3D (meshes, scatter, etc.)*: Fully supported by the SciVis libraries, plus some support in Plotly, Matplotlib, HoloViews, and ipyvolume.

# Matplotlib: plotting examples

Recommended reading: [sample plots in Matplotlib](https://matplotlib.org/stable/gallery/index.html)

Matplotlib is an excellent 2D and 3D graphics library for generating scientific figures. Some of the many advantages of this library include:
* Easy to get started
* Support for $\LaTeX$ formatted labels and texts
* Great control (programmatically) of every element in a figure, including figure size and DPI.
* High-quality output in many formats, including PNG, PDF, SVG, EPS, and PGF.


In [None]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import os

In [None]:
t = np.arange(0, 10, 0.1)
sin_t = np.sin(2 * np.pi * t / 5)
cos_t = np.cos(2 * np.pi * t / 5)

In [None]:
t

In [None]:
sin_t, cos_t

In [None]:
plt.plot(t, cos_t)
plt.plot(t, sin_t)

### Pyplot Functions

There are many `pyplot` functions available for us to customize our figures. For example:

| Function | Description |
| ---: | :--- |
| `plt.xlim` | set $x$ limits |
| `plt.ylim` | set $y$ limits |
| `plt.grid` | add grid lines |
| `plt.title` | add a title |
| `plt.xlabel` | add label to the horizontal axis |
| `plt.ylabel` | add label to the vertical axis |
| `plt.axis` | set axis properties (`equal`, `off`, `scaled`, etc.) |
| `plt.xticks` | set tick locations on the horizontal axis |
| `plt.yticks` | set tick locations on the vertical axis |
| `plt.legend` | display legend for several lines in the same figure |
| `plt.savefig` | save figure to a file (.png, .pdf, etc.) at the given path |
| `plt.figure` | create a new figure and set its properties |

See the [pyplot documentation](https://matplotlib.org/api/pyplot_summary.html) for a full list of functions.

In [None]:
plt.figure(figsize = (15, 5)) 
# figsize is a tuple of the width and height of the figure in inches

plt.plot(t, cos_t, '.r', label = r"$\cos(\theta)$") 
# ".r" is a format string: marker = '.', color = 'red', and no connecting line

plt.plot(t, sin_t, '--g', label = r"$\sin(\theta)$")
# "--g" is a format string: linestyle = dashed, color = 'green', and no marker

plt.title('two functions')
plt.xlabel(r"$\theta$")
plt.ylabel('values')
plt.ylim([-1.1, 1.1])
plt.legend()
plt.grid(axis = 'y')

# plt.legend(loc = 0) # let matplotlib decide the optimal location
# plt.legend(loc = 1) # upper right corner
# plt.legend(loc = 2) # upper left corner
# plt.legend(loc = 3) # lower left corner
# plt.legend(loc = 4) # lower right corner
# # .. many more options are available

os.makedirs('out', exist_ok = True) # savefig fails if the target directory does not exist
plt.savefig(os.path.join('out', 'example_figure.png'), format = 'png')
plt.show()

#### Colors

| Character | Color |
| :---: | :---: |
| `b` | blue |
| `g` | green |
| `r` | red |
| `c` | cyan |
| `m` | magenta |
| `y` | yellow |
| `k` | black |
| `w` | white |


#### Markers

| Character | Marker |
| :---: | :---: |
| `.` | point |
| `o` | circle |
| `v` | triangle down |
| `^` | triangle up |
| `s` | square |
| `p` | pentagon |
| `*` |	star |
| `+` | plus |
| `x` |	x |
| `D` | diamond |

#### Line Styles

| Character | Line Style |
| :---: | :---: |
| `-` | solid line style |
| `--` | dashed line style |
| `-.` | dash-dot line style |
| `:` | dotted line style |

See the [matplotlib.pyplot.plot documentation](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html) for more options.
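
The characters from the three tables above can be combined into a single format string. The cell below is a minimal sketch on synthetic data; the names `x`, `line_sin`, `line_cos` and `line_mix` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(0, 2 * np.pi, 30)

fig, ax = plt.subplots(figsize=(8, 4))

# ax.plot returns a list of Line2D objects; with a single y-series it has one element
(line_sin,) = ax.plot(x, np.sin(x), 'o-b', label="'o-b': circles, solid, blue")
(line_cos,) = ax.plot(x, np.cos(x), '^:m', label="'^:m': triangles up, dotted, magenta")
(line_mix,) = ax.plot(x, np.sin(x) * np.cos(x), 's--k', label="'s--k': squares, dashed, black")

ax.legend(loc=0)  # loc=0 lets Matplotlib choose the location
```

Each `Line2D` object keeps the parsed style, so `line_sin.get_marker()` and `line_sin.get_linestyle()` can be used to check how a format string was interpreted.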

### Subplots

We start by storing a reference to the newly created figure instance and the axes array in the `f` and `axarr` variables, respectively.

In [None]:
# Two subplots, the axes array is 1-d
f, axarr = plt.subplots(2, sharex = True) 
axarr[0].plot(t, cos_t, '.r')
axarr[1].plot(t, sin_t, '.b')
plt.tight_layout()
plt.show()
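
When both a number of rows and a number of columns are requested, the axes array becomes 2-d and is indexed as `axarr[row, column]`. A small sketch, assuming the same kind of time axis as above (the plotted functions are chosen only for illustration):

```python
import numpy as np
from matplotlib import pyplot as plt

t = np.arange(0, 10, 0.1)

# Two rows and two columns: the returned axes array is 2-d
fig, axarr = plt.subplots(2, 2, sharex=True, figsize=(10, 6))

axarr[0, 0].plot(t, np.sin(t), '-r')
axarr[0, 0].set_title('sin(t)')
axarr[0, 1].plot(t, np.cos(t), '-b')
axarr[0, 1].set_title('cos(t)')
axarr[1, 0].plot(t, np.sin(2 * t), '--g')
axarr[1, 0].set_title('sin(2t)')
axarr[1, 1].plot(t, np.cos(2 * t), ':k')
axarr[1, 1].set_title('cos(2t)')

# Axes methods carry a set_ prefix where pyplot uses plain names
# (plt.xlabel -> ax.set_xlabel, plt.title -> ax.set_title)
for ax in axarr[1, :]:
    ax.set_xlabel('t')

fig.tight_layout()
```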

# Plotting with pandas

Pandas builds on top of Matplotlib but exploits the knowledge included in DataFrames (column names, index, data types) to improve the default output. 

See the pandas [visualization user guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html) for details.


In [None]:
composers_df = pd.read_excel('dataset/composers.xlsx', sheet_name = 'Sheet5')
composers_df = composers_df.dropna()
composers_df

In [None]:
composers_df.head()

- The `plot` method on `Series` and `DataFrame` is just a simple wrapper around `plt.plot()`

In [None]:
composers_df.birth.plot()
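
Because pandas draws through Matplotlib, the object returned by `plot()` is a regular Matplotlib `Axes` that can be customized further. A minimal sketch, assuming a small hand-written `Series` named `years` as a stand-in for a numeric column such as `composers_df.birth`:

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical stand-in for a numeric column
years = pd.Series([1685, 1732, 1756, 1770, 1810, 1833], name='birth')

fig, ax = plt.subplots(figsize=(8, 4))

# Series.plot draws on the Axes it is given and returns that same Axes
returned = years.plot(ax=ax, style='.-r', title='Series.plot returns a Matplotlib Axes')

# From here on, plain Matplotlib calls apply
ax.set_xlabel('row index')
ax.set_ylabel('birth year')
ax.grid(axis='y')

# The plotted line stores the Series values as its y data
line = ax.lines[0]
```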

- On DataFrame, `plot()` is a convenience to plot all of the columns with labels

In [None]:
composers_df.plot()

- Plotting methods allow for a handful of plot styles other than the default line plot. These methods can be provided as the `kind` keyword argument to `plot()`, and include:

| kind | plot |
| :---: | :---: |
| `bar` / `barh` | bar plots |
| `hist` | histogram |
| `box` | boxplot |
| `kde` or `density` | density plot |
| `area` | area plots |
| `scatter` | scatter plots |
| `hexbin` | hexagonal bin plots |
| `pie` | pie plots |
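
Each `kind` is also available as a method on the `plot` accessor, so `df.plot(kind = 'hist')` and `df.plot.hist()` request the same plot. The sketch below compares the two on a synthetic DataFrame; `df_demo` and its columns are made up for this illustration.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical two-column numeric DataFrame
df_demo = pd.DataFrame({'a': rng.normal(0, 1, 200), 'b': rng.normal(2, 1, 200)})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# The same histogram requested in two equivalent ways
df_demo.plot(kind='hist', bins=20, alpha=0.5, ax=ax_left, title="plot(kind='hist')")
df_demo.plot.hist(bins=20, alpha=0.5, ax=ax_right, title='plot.hist()')

fig.tight_layout()
```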

- example of scatter plot

In [None]:
composers_df.plot(kind = 'scatter', x = 'birth', y = 'death')

- example of scatter plot with customization

In [None]:
composers_df.plot(kind = 'scatter', 
                  x = 'birth', 
                  y = 'death',
                  title = 'Composer birth and death',
                  grid = True,
                  fontsize = 15)
plt.show()

- example of histogram plot

In [None]:
composers_df.plot(kind = 'hist')

We can appreciate the advantage of using Pandas: without specifying anything, Pandas made a histogram of the two numeric columns, labelled the axis and even added a legend to the plot.


We can definitely improve visualization by adding transparency!

In [None]:
composers_df.plot(kind = 'hist', alpha = 0.4)

- We can also ask for subplots


In [None]:
composers_df.plot.hist(subplots = True, alpha = 0.5)  


- example of boxplot

In [None]:
composers_df.plot(kind = 'box')

In [None]:
composers_df.plot(kind = 'box', 
                  subplots = True, 
                  sharey = True)
plt.show()

In [None]:
composers_df.groupby('period').mean(numeric_only = True).plot(kind = 'bar')
plt.show()

In [None]:
composers_df['period'].value_counts().plot(kind = "bar")
plt.show()

In [None]:
composers_df['period'].value_counts().plot(kind = "pie")
plt.show()

# Seaborn

See the [overview](https://seaborn.pydata.org/tutorial/function_overview.html) and the [example gallery](https://seaborn.pydata.org/examples/index.html) for an overview on seaborn plotting options.

Seaborn is tightly integrated with matplotlib.

While you can be productive using only seaborn functions, full customization of your graphics will require some knowledge of matplotlib’s concepts and API. 

High quality data visualization products can be obtained by combining the two:
- **Seaborn** provides a powerful high-level interface for creating visually appealing plots quickly
- **Matplotlib** provides deep customizability 
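
One way to combine the two is to let an axes-level seaborn function draw on an `Axes` created with Matplotlib, then adjust that `Axes` directly. This is a sketch on synthetic data; the DataFrame `demo` and its columns `x`, `y` and `group` are hypothetical.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical data: three groups of 20 points each
demo = pd.DataFrame({
    'x': rng.normal(size=60),
    'group': np.repeat(['A', 'B', 'C'], 20),
})
demo['y'] = 2 * demo['x'] + rng.normal(scale=0.5, size=60)

fig, ax = plt.subplots(figsize=(8, 5))

# Seaborn: one call handles grouping, colors and the legend
sns.scatterplot(data=demo, x='x', y='y', hue='group', ax=ax)

# Matplotlib: fine-tune the same Axes
ax.set_title('seaborn draws, Matplotlib customizes')
ax.axhline(0, color='k', linestyle=':')
ax.set_xlim(-3.5, 3.5)
ax.grid(axis='y')
```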



In [None]:
import seaborn as sns

- example of `lmplot`: plot data and <ins>L</ins>inear <ins>M</ins>odel regression fits

In [None]:
g = sns.lmplot(x = "birth", y = "death", data = composers_df)

In [None]:
g = sns.lmplot(x = "birth", y = "death", data = composers_df, hue = "period") 
# hue = Grouping variable that will produce points with different colors

- example of `jointplot`: Draw a plot of two variables with bivariate and univariate graphs.


In [None]:
g = sns.jointplot(x = "birth", y = "death", data = composers_df)

Assigning a `hue` variable adds conditional colors to the scatter plot and draws separate density curves on the marginal axes.

Internally, `jointplot` uses `kdeplot()`, which plots univariate or bivariate distributions using kernel density estimation.
- A **kernel density estimate** (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions.


In [None]:
g = sns.jointplot(x = "birth", y = "death", hue = "period", data = composers_df)
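
To make the idea of a kernel density estimate concrete, the sketch below overlays a KDE curve on a normalized histogram of the same sample. It uses `scipy.stats.gaussian_kde` purely as an illustration and makes no claim about how seaborn computes its curves; the sample is synthetic and the names `sample`, `grid` and `density` are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical sample: values drawn from two clusters
sample = np.concatenate([rng.normal(1700, 30, 200), rng.normal(1850, 40, 300)])

# Gaussian kernels; the bandwidth follows SciPy's default (Scott's rule)
kde = gaussian_kde(sample)
grid = np.linspace(sample.min() - 100, sample.max() + 100, 500)
density = kde(grid)

fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(sample, bins=30, density=True, alpha=0.4, label='histogram (normalized)')
ax.plot(grid, density, '-r', label='KDE')
ax.legend()

# A probability density integrates to 1; check with a simple Riemann sum
area = float(density.sum() * (grid[1] - grid[0]))
```

The histogram depends on the choice of bins, while the KDE depends on the bandwidth; both are estimates of the same underlying distribution.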

In [None]:
composers_df['age'] = composers_df['death'] - composers_df['birth']

In [None]:
sns.pairplot(data = composers_df, hue = 'period')
plt.show()

### Correlation Analysis

`SciPy` is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data.

SciPy features include, but are not limited to:
- statistics
- linear algebra
- Fourier transforms
- optimization algorithms
- ...


In [None]:
from scipy.stats import pearsonr
pearsonr(composers_df.birth, composers_df.death)

The `pearsonr` function returns:
- The Pearson product-moment correlation coefficient.
- The p-value associated with the chosen alternative: it roughly indicates the probability of an uncorrelated system producing datasets that have a Pearson correlation at least as extreme as the one computed from these datasets.
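
As a sanity check on what the coefficient measures, it can be recomputed from its definition, $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$, and compared with the value returned by `pearsonr`. The sketch uses synthetic arrays `x` and `y`, which are assumptions made for this example.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
# Hypothetical data: y depends linearly on x, plus noise
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(scale=0.5, size=100)

# Pearson's r from its definition: co-variation divided by the product of the spreads
xc, yc = x - x.mean(), y - y.mean()
r_manual = np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

# pearsonr returns the coefficient and the p-value
r_scipy, p_value = pearsonr(x, y)

print(f"manual r = {r_manual:.4f}, scipy r = {r_scipy:.4f}, p-value = {p_value:.2e}")
```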

The Pearson correlation coefficient can also be obtained with `pandas.DataFrame.corr()`, which returns the full pairwise correlation matrix (without p-values).

In [None]:
composers_df[['age', 'birth', 'death']].corr()

In [None]:
f,ax = plt.subplots(figsize=(10, 8))
sns.heatmap(composers_df[['age','birth','death']].corr(), 
            annot=True, 
            linewidths=.5, 
            fmt= '.2f',
            ax=ax,
            vmin=-1, # important: otherwise the color code can be "misleading"
            vmax=1, # important: otherwise the color code can be "misleading"
            cmap = "coolwarm")
plt.show()

### Categorical Data

`seaborn` axes-level functions for [plotting categorical data](https://seaborn.pydata.org/tutorial/categorical.html):
- categorical scatter plots
    - `stripplot()`
    - `swarmplot()`
- distribution plots
    - `boxplot()`
    - `violinplot()`
    - `boxenplot()`
- estimate plots
    - `pointplot()`
    - `barplot()`
    - `countplot()`

In [None]:
sns.countplot(x = 'period', data = composers_df, hue = 'period', palette = "pastel")
plt.show()

`seaborn` also provides a figure-level interface, `catplot()`, that gives unified higher-level access to the axes-level functions.

In [None]:
sns.catplot(x = "period", 
            y = 'birth', 
            kind = "box", 
            hue = 'period',
            palette = "pastel",
            data = composers_df)

# Plotly Express: interactive plots

[Plotly Express](https://plotly.com/python/plotly-express/) is a terse, consistent, high-level API for creating figures. 

In [None]:
import plotly.express as px

ImportError? Install it!
```bash
conda install -c conda-forge plotly
```

In [None]:
df = px.data.iris()
df

- example of scatter plot

In [None]:
fig = px.scatter(df, 
                 x="sepal_width", 
                 y="sepal_length", 
                 color="species",
                 size='petal_length', 
                 hover_data=['petal_width'],
                 height=600
                )
fig.show()

Let's add a dimension!
- example of 3D scatter plot

In [None]:
fig = px.scatter_3d(df, 
                    x="sepal_width", 
                    y="sepal_length", 
                    z="petal_length", 
                    size="petal_width",
                    color="species",
                    height=600)
fig.show()

In [None]:
df

In [None]:
fig = px.scatter_matrix(df,
                        # dimensions takes the names of the columns to plot
                        dimensions = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
                        color = 'species',
                        height = 800,
                        width = 1000) 
fig.show()

- example of pie chart

In [None]:
df = px.data.gapminder().query("year == 2007 and continent == 'Europe'").copy() # copy, so the assignment below modifies an independent DataFrame
df.loc[df['pop'] < 2.e6, 'country'] = 'Other countries' # Group countries with fewer than 2 million inhabitants
fig = px.pie(df, 
             values = 'pop', 
             names = 'country', 
             title = 'Population of the European continent', 
             height = 600)
fig.show()

- example of sunburst charts

In [None]:
df = px.data.gapminder().query("year == 2007")
fig = px.sunburst(df, 
                  path = ['continent', 'country'], 
                  values = 'pop',
                  color = 'lifeExp', 
                  hover_data = ['iso_alpha'],
                  height = 600)
fig.show()

- example of GeoJSON maps

In [None]:
df = px.data.election()
df

In [None]:
geojson = px.data.election_geojson()
geojson

In [None]:

fig = px.choropleth_mapbox(df, 
                           geojson = geojson, 
                           color = "winner",
                           locations = "district", 
                           featureidkey = "properties.district",
                           center = {"lat": 45.5517, "lon": -73.7073},
                           mapbox_style = "carto-positron", 
                           zoom = 9,
                           height = 600)
fig.show()


In [None]:
df['coderre-joly'] = df.Coderre - df.Joly

In [None]:
fig = px.choropleth_mapbox(df, 
                           geojson = geojson, 
                           color = "coderre-joly",
                           locations = "district", 
                           featureidkey = "properties.district",
                           center = {"lat": 45.5517, "lon": -73.7073},
                           mapbox_style = "carto-positron", 
                           zoom = 9,
                           height = 600)
fig.show()

- example of outline symbol maps

In [None]:
df = px.data.gapminder()
df

In [None]:
fig = px.scatter_geo(df, 
                     locations = "iso_alpha", 
                     color = "continent", 
                     hover_name = "country", 
                     size = "pop",
                     animation_frame = "year", 
                     projection = "natural earth",
                     height = 600)
fig.show()