# __Data Visualization in Python__
## _Machine and Statistical Learning Club. Spring-2019_

This workshop introduces different ways to visualize and plot data with Python in the context of data science techniques.

<hr/>


In [None]:
import numpy as np   # np is now an alias for the numpy module
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.style as mplstyle

import seaborn as sns

# Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing __attractive and informative statistical graphics__.

https://seaborn.pydata.org


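Seaborn provides access to several small demonstration datasets through `sns.load_dataset()`, which downloads them from an online repository and caches them locally, so an internet connection is needed the first time. This workshop uses the FMRI, Tips and Iris datasets.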

#### FMRI Dataset
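The functional magnetic resonance imaging (fMRI) dataset contains signals measured in different regions of the brain. Each row records one `signal` value for a given `subject`, `timepoint`, `event` and `region`.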

<img width=30% src="./files/brain.png">

In [None]:
#Load fmri (Functional magnetic resonance imaging) dataset
fmri = sns.load_dataset("fmri")

In [None]:
fmri[90:100]

## Relationship plots
Statistical analysis is a process of understanding how variables in a dataset relate to each other and how those relationships depend on other variables. 
Visualization can be a core component of this process because, when data are visualized properly, the human visual system can see trends and patterns that indicate a relationship.

In [None]:
# Plotting subsets of data with semantic mappings
sns.relplot(x="timepoint", y="signal", hue="region", style="event", kind="line", data=fmri);
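
When several rows share the same x value (in `fmri`, one row per subject at each timepoint), `kind="line"` aggregates them: by default it draws the mean `signal` at each `timepoint` and shades a 95% confidence interval around it, with one line per combination of the `hue` and `style` variables. The pandas sketch below reproduces the means behind those lines on a small hypothetical table that borrows the `fmri` column names; the values are made up.

```python
import pandas as pd

# Hypothetical long-form data: one row per (subject, timepoint, region, event)
toy = pd.DataFrame({
    "subject":   ["s0", "s1", "s0", "s1", "s0", "s1", "s0", "s1"],
    "timepoint": [0, 0, 1, 1, 0, 0, 1, 1],
    "region":    ["parietal"] * 4 + ["frontal"] * 4,
    "event":     ["stim"] * 8,
    "signal":    [0.10, 0.30, 0.20, 0.40, -0.10, 0.10, 0.00, 0.20],
})

# One line per (region, event) combination; each point is the mean over subjects
line_points = (
    toy.groupby(["region", "event", "timepoint"])["signal"]
       .mean()
       .reset_index()
)
print(line_points)
```

The bootstrapped confidence band is not reproduced here; only the central line is.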

## Categorical Plots

In [None]:
#Tips Dataset
tips = sns.load_dataset("tips")
tips[:10]

#### Visualize the distribution of the data by day of the week

In [None]:
sns.catplot(x="day", y="total_bill", data=tips);
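
With no `kind` argument, `catplot` draws a strip plot: each observation becomes a point at its category's position, shifted by a small random horizontal offset ("jitter") so that points with similar values do not hide each other. The NumPy sketch below only illustrates the idea; the data are hypothetical and the jitter half-width of 0.1 is an arbitrary assumption, not seaborn's internal setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical categorical data: a day label and a bill amount per observation
days = np.array(["Thur", "Fri", "Fri", "Sat", "Sat", "Sat"])
bills = np.array([12.5, 20.0, 20.0, 15.0, 30.0, 30.0])

categories = ["Thur", "Fri", "Sat"]          # category order along the x axis
positions = np.array([categories.index(d) for d in days], dtype=float)

jitter = 0.1                                 # assumed half-width of the random offset
x_plot = positions + rng.uniform(-jitter, jitter, size=positions.size)

# Each (x_plot[i], bills[i]) pair is one point; duplicated bills no longer overlap exactly
print(np.column_stack([x_plot, bills]))
```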

#### Visualize the distribution of the data by day with a swarm arrangement

In [None]:
sns.catplot(x="day", y="total_bill", kind="swarm", data=tips);

#### Visualize the distribution of the data by day with a swarm arrangement, differentiating by sex

In [None]:
sns.catplot(x="day", y="total_bill", hue="sex", kind="swarm", data=tips);

#### Visualize the distribution of the data by party size, excluding parties of size 3

In [None]:
sns.catplot(x="size", y="total_bill", kind="swarm", data=tips.query("size != 3"));
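
`tips.query("size != 3")` uses pandas' `DataFrame.query`, which evaluates the string expression against the columns. The sketch below shows the equivalent boolean mask on a small hypothetical frame. Note the bracket access `df["size"]`: `df.size` is the DataFrame attribute holding the total number of elements, not the `size` column.

```python
import pandas as pd

# Hypothetical subset of the tips columns
df = pd.DataFrame({"size": [2, 3, 4, 3, 2],
                   "total_bill": [10.0, 21.5, 30.2, 18.0, 12.3]})

by_query = df.query("size != 3")     # string expression evaluated against the columns
by_mask = df[df["size"] != 3]        # same filter written as a boolean mask

print(by_query.equals(by_mask))      # True
print(df.size)                       # 10: number of elements (5 rows x 2 columns)
```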

#### Visualize the distribution of the tips separating smokers from non-smokers

In [None]:
sns.catplot(x="smoker", y="tip", order=["No", "Yes"], data=tips);

## Distribution Plots

### BoxPlots
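This kind of plot shows the three quartiles of the distribution: the edges of the box and the line inside it. The whiskers extend to the most extreme observations lying within 1.5 times the interquartile range (IQR) of the box, and points beyond them are drawn individually as outliers.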
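
The quantities drawn by a box plot can be computed directly with NumPy. This sketch assumes the common 1.5 IQR whisker rule (the default `whis=1.5` in seaborn and matplotlib) and NumPy's default linear interpolation for percentiles; the sample is made up.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])   # hypothetical sample with one extreme value

q1, median, q3 = np.percentile(data, [25, 50, 75])  # box edges and center line
iqr = q3 - q1                                        # interquartile range

# Whiskers reach the most extreme observations within 1.5 IQR of the box
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

outliers = data[(data < lower_fence) | (data > upper_fence)]  # drawn as individual points

print(q1, median, q3)                 # 3.25 5.5 7.75
print(whisker_low, whisker_high)      # 1 9
print(outliers)                       # [100]
```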


#### Visualize the relation between the weekday and the amount spent

In [None]:
sns.catplot(x="day", y="total_bill", kind="box", data=tips);

#### Visualize the relation between the weekday and the amount spent, differentiating smokers from non-smokers

In [None]:
sns.catplot(x="day", y="total_bill", hue="smoker", kind="box", data=tips);

#### Visualize the relation between sex and the mean amount spent, differentiating lunch from dinner (bar plot)

In [None]:
sns.catplot(x="sex", y="total_bill", hue="time", kind="bar", data=tips);
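
With `kind="bar"`, seaborn by default draws each bar at the mean of its group and adds an error bar showing a 95% confidence interval estimated by bootstrapping. A minimal NumPy sketch of a percentile bootstrap follows; the sample values, the 1000 resamples and the seed are assumptions chosen for illustration, and seaborn's own resampling details may differ.

```python
import numpy as np

rng = np.random.default_rng(42)
bills = np.array([12.0, 15.5, 9.8, 20.1, 17.3, 11.2, 14.6, 18.9])  # hypothetical group

bar_height = bills.mean()                    # what the bar shows

# Percentile bootstrap: resample with replacement and recompute the mean each time
n_boot = 1000
boot_means = np.array([
    rng.choice(bills, size=bills.size, replace=True).mean()
    for _ in range(n_boot)
])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])   # 95% interval

print(f"mean={bar_height:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```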

### Plotting univariate distributions

#### Distribution (histogram) of the tips in the dataset. 

In [None]:
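sns.histplot(data=tips, x="tip", kde=True);  # distplot is deprecated; histplot with kde=True draws a histogram plus a KDE curve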
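
A histogram counts how many observations fall into each bin. When it is normalized as a density (as the classic `distplot` did whenever a KDE was drawn, and as `histplot(..., stat="density")` does), each bar's height is count / (n * bin width), so the bar areas sum to 1. NumPy's `np.histogram` with `density=True` applies the same normalization; the tip values and the choice of 5 bins below are assumptions.

```python
import numpy as np

tips_sample = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 5.0, 6.5])  # hypothetical values

counts, edges = np.histogram(tips_sample, bins=5)               # raw counts per bin
density, _ = np.histogram(tips_sample, bins=5, density=True)    # normalized heights

widths = np.diff(edges)
print(counts.sum())                  # 10: every observation lands in some bin
print((density * widths).sum())      # approximately 1.0: total area of the bars
```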

### Plotting bivariate distributions

#### Distribution of the tips vs. the amount spent

In [None]:
sns.jointplot(x="tip", y="total_bill", data=tips);
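
The scatter in the joint plot shows how `total_bill` varies with `tip`. One common numeric companion (not something `jointplot` computes by default) is the Pearson correlation coefficient, available through `np.corrcoef`. The paired values below are hypothetical.

```python
import numpy as np

# Hypothetical (total_bill, tip) pairs
total_bill = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 40.0])
tip = np.array([1.5, 2.0, 3.2, 3.5, 4.8, 6.0])

r = np.corrcoef(total_bill, tip)[0, 1]   # off-diagonal entry of the 2x2 correlation matrix
print(f"Pearson r = {r:.3f}")            # close to +1: tips grow roughly linearly with the bill
```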

#### Adding a regression fit and kernel density estimates

In [None]:
sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg");  # recent seaborn versions require x and y as keywords
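
With `kind="reg"`, the joint axes add a linear regression line (with a confidence band) fitted to the points. The line is an ordinary least-squares fit of `y` on `x`, which can be reproduced with `np.polyfit` at degree 1. The data below are hypothetical and lie exactly on `tip = 0.15 * total_bill + 0.5`, so the fit should recover those coefficients.

```python
import numpy as np

# Hypothetical points lying exactly on tip = 0.15 * total_bill + 0.5
total_bill = np.array([10.0, 20.0, 30.0, 40.0])
tip = 0.15 * total_bill + 0.5

slope, intercept = np.polyfit(total_bill, tip, deg=1)   # least-squares line, highest power first
print(round(slope, 4), round(intercept, 4))              # 0.15 0.5
```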

#### Distribution of the tips vs. the amount spent with black-and-white hexagonal binning (honeycomb)

In [None]:
with sns.axes_style("white"):
    sns.jointplot(x=tips.tip, y=tips.total_bill, kind="hex", color="k");

#### Distribution of the tips vs. the amount spent with density estimation

In [None]:
sns.jointplot(x="tip", y="total_bill", data=tips, kind="kde");

In [None]:
sns.jointplot(x="tip", y="total_bill", data=tips, kind="kde", color="r");
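
A kernel density estimate places a smooth kernel (here a Gaussian) on every observation and averages them, giving a continuous estimate of the distribution; `kind="kde"` in `jointplot` does this in two dimensions. The one-dimensional NumPy sketch below assumes a Gaussian kernel with Scott's rule for the bandwidth (`std * n ** (-1/5)`), the default rule in SciPy's `gaussian_kde`; it is an illustration, not seaborn's implementation, and `gaussian_kde_1d` is a hypothetical helper.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian KDE of `samples` at each point of `grid` (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)        # Scott's rule in one dimension
    z = (grid[:, None] - samples[None, :]) / bandwidth      # standardized distances, shape (grid, n)
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)                             # average of the n kernels

tips_sample = np.array([1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 5.0])   # hypothetical values
grid = np.linspace(-5.0, 12.0, 2001)
density = gaussian_kde_1d(tips_sample, grid)

dx = grid[1] - grid[0]
print(density.sum() * dx)   # approximately 1: a density integrates to one
```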

#### Distribution of the tips vs. the amount spent with density estimation and data points

In [None]:
g = sns.jointplot(x="tip", y="total_bill", data=tips, kind="kde", color="r")
g.plot_joint(plt.scatter, color="grey", linewidth=1, marker="o", s=5)  # overlay the raw points; s is the marker size
g.ax_joint.collections[0].set_alpha(1)  # make the first collection drawn on the joint axes fully opaque
g.set_axis_labels("Tip", "Total bill");

## Visualizing pairwise relationships in a dataset
#### IRIS Dataset

The iris dataset is a well-known dataset for data analysis and machine learning. https://archive.ics.uci.edu/ml/datasets/iris

The dataset records four measurements of 150 iris flowers (50 from each of three species) together with the species of each sample.

<img width=30% src="./files/flower-labelled_med.jpeg">

1. sepal length in cm 
2. sepal width in cm 
3. petal length in cm 
4. petal width in cm 
5. class: 
  - Iris Setosa 
  - Iris Versicolour 
  - Iris Virginica

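In seaborn's copy, these columns are named `sepal_length`, `sepal_width`, `petal_length`, `petal_width` and `species`. The content looks as follows: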


In [None]:
iris = sns.load_dataset("iris")
iris[45:55]

### Pairwise relation of Iris Features

In [None]:
sns.pairplot(iris);

### Pairwise relation of Iris Features Highlighting the Species

In [None]:
sns.pairplot(iris, hue="species");
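
Setting `hue="species"` splits every panel by species, giving each species its own color. The same split can be summarized numerically with a pandas `groupby`; the sketch below uses a few hypothetical rows with the column names of seaborn's `iris` DataFrame.

```python
import pandas as pd

# Hypothetical rows using the column names of the iris DataFrame
flowers = pd.DataFrame({
    "sepal_length": [5.0, 5.2, 6.1, 6.3, 6.9, 7.1],
    "sepal_width":  [3.4, 3.6, 2.8, 2.9, 3.1, 3.0],
    "petal_length": [1.4, 1.5, 4.5, 4.7, 5.8, 6.0],
    "petal_width":  [0.2, 0.3, 1.4, 1.5, 2.1, 2.3],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# Per-species mean of each feature: one row per hue level
species_means = flowers.groupby("species").mean()
print(species_means)
```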

### Pairwise relation of Iris Features Highlighting the Species with further customization

In [None]:
sns.pairplot(iris, hue="species",
             vars=["sepal_width", "sepal_length", "petal_width"],  # subset of features to plot
             kind="reg",                # off-diagonal: scatter plus linear regression fit
                                        # (with hue set, the diagonal defaults to KDE)
             palette="husl",            # color palette: husl, PuBuGn_d, etc.
             markers=["o", "s", "D"],   # one marker per species
             height=5);                 # height of each facet, in inches

### Pairwise relation of Iris Features, customizing the diagonal and off-diagonal graphs

In [None]:
g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.kdeplot, levels=10);  # number of contour levels (older seaborn used n_levels)

In [None]:
g = sns.PairGrid(iris)
g.map_diag(sns.swarmplot)
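# fill=True replaces the deprecated shade=True; the default thresh leaves the
# lowest-density region unfilled, roughly like the old shade_lowest=False
g.map_offdiag(sns.kdeplot, levels=3, cmap="Blues", fill=True);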