
# **Introduction to Seaborn**

**Pre-requisites**: <br>
* Matplotlib
* Pandas
* Numpy
* Python

Matplotlib is a 2D plotting library that allows you to create publication-quality figures. Seaborn, which is based on Matplotlib, provides a high-level interface for drawing statistical graphics.

Seaborn helps with two common pain points in Matplotlib:<br>

* Default Matplotlib parameters
* Working with data frames

**Seaborn is specially designed for statistical plotting.**

Dependencies of Seaborn:

* Python
* numpy
* scipy
* pandas
* matplotlib

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
from scipy import stats

**List the Seaborn datasets** 

In [None]:
#List the datasets that are included with seaborn
sb.get_dataset_names()

**Load a dataset from the Seaborn library**

In [None]:
df = sb.load_dataset('tips')
print(df.head())

# **Assignment 1**: 
Load the dots dataset from Seaborn

# **Turn Matplotlib plots into Seaborn plots**

A matplotlib plot

In [None]:
#A matplotlib plot
def sinplot(flip = 1):
   x = np.linspace(0, 14, 100)
   for i in range(1, 5): 
      plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)
sinplot()
plt.show()

Call sb.set() before plotting to apply Seaborn's default theme to a Matplotlib plot

In [None]:
sb.set() #add to the matplotlib plot
sinplot()
plt.show()

Seaborn splits the Matplotlib parameters into two groups

* Plot styles
* Plot scale

# **Seaborn Figure Styles**
The interface for manipulating the styles is set_style(). <br>
Using this function you can set the theme of the plot. 

The available themes:

* darkgrid
* whitegrid
* dark
* white
* ticks

**Change the darkgrid to a light grid**

In [None]:
sb.set_style("whitegrid")
sinplot()
plt.show()


In the white and ticks themes, you can remove the top and right axis spines using the 
>despine() function.
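
The effect of despine() can be checked directly on the axes. The following is a minimal sketch, assuming a single sine curve drawn on an explicitly created axes (the variable names are illustrative); note that despine() is called after the plot is drawn, since it acts on axes that already exist.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

sb.set_style("white")

# Draw first: despine() modifies the spines of existing axes.
fig, ax = plt.subplots()
x = np.linspace(0, 14, 100)
ax.plot(x, np.sin(x))

# By default, despine() hides the top and right spines only.
sb.despine(ax=ax)

# Record which spines are still visible.
visible = {side: ax.spines[side].get_visible()
           for side in ("top", "right", "left", "bottom")}
print(visible)
```

Passing left=True or bottom=True to despine() hides those spines as well.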

# **Assignment 2**:
Plot the sinplot using a white theme and despine the plot. 

In [None]:
#Assignment 2

In [None]:
#@title 
def sinplot(flip=1):
   x = np.linspace(0, 14, 100)
   for i in range(1, 5):
      plt.plot(x, np.sin(x + i * .5) * (7 - i) * flip)
sb.set_style("white")
sinplot()
sb.despine()  # call after plotting so the spines of the drawn axes are removed
plt.show()

# **Customize Seaborn styles**
To customize the Seaborn styles, pass a dictionary of parameters to the **set_style()** function. <br>
The available parameters can be viewed with the **axes_style()** function.

In [None]:
#List the current style settings
sb.axes_style()

**Put the grid on top of the plot**<br>
Note: the overrides are passed to set_style() as a dictionary, hence the '{' and '}'.

In [None]:
sb.set_style("whitegrid")
sb.set_style({'axes.axisbelow':False})
sinplot()
plt.show()

In [None]:
sb.set_style({'axes.axisbelow':True})
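
As a hedged alternative to changing the style globally and resetting it afterwards, axes_style() can also be used as a context manager so that the override applies only inside the `with` block. The sketch below assumes the same 'axes.axisbelow' override as above; the variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# axes_style() returns the parameters of a named theme; the second
# argument overrides individual entries without applying anything yet.
custom = sb.axes_style("whitegrid", {"axes.axisbelow": False})
print(custom["axes.grid"], custom["axes.axisbelow"])

# Used as a context manager, the style is active only inside the block.
with sb.axes_style("whitegrid", {"axes.axisbelow": False}):
    fig, ax = plt.subplots()
    x = np.linspace(0, 14, 100)
    ax.plot(x, np.sin(x))
```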

# **Assignment 3**:
Experiment with different style settings. 


In [None]:
#Set different styles from the list above
sinplot()
plt.show()

# **Scaling Plot Elements**
Control the scale of the plot elements using the <br>
>set_context() function. 

There are four preset contexts. In order of relative size, they are named:

* paper
* notebook
* talk
* poster

By default, the context is set to notebook.
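
To see how the four contexts differ without changing the current settings, plotting_context() returns the parameters a context would apply. The sketch below compares only the 'font.size' entry, chosen here as one representative parameter.

```python
import seaborn as sb

# Look up the base font size that each preset context would apply.
sizes = {name: sb.plotting_context(name)["font.size"]
         for name in ("paper", "notebook", "talk", "poster")}
print(sizes)

# font_scale scales the font elements separately from the other parameters.
scaled = sb.plotting_context("notebook", font_scale=1.5)["font.size"]
print(scaled)
```

set_context() accepts the same font_scale argument, for example sb.set_context("notebook", font_scale=1.5).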

In [None]:
sb.set_context("talk")
sinplot()
sb.despine()
plt.show()

In [None]:
sb.set_context("notebook")
sinplot()
sb.despine()
plt.show()

# **Assignment 4:** 
Change the scale of the plot for: 
1. inclusion in a paper
2. inclusion on a poster

# **Violin and Box Plots**
Create violin plots with:
>violinplot(data= )
<br>

Create box plots with:<br>
>boxplot(data= )
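
Both functions also accept x and y column names together with data. The following is a minimal sketch on synthetic data; the column names `group` and `value` are made up for illustration and do not come from a Seaborn dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

# Synthetic long-form data: two groups with different centers and spreads.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b"], 50),
    "value": np.concatenate([rng.normal(0, 1, 50), rng.normal(2, 0.5, 50)]),
})

# Draw the same data as a violin plot and as a box plot, side by side.
fig, (ax_violin, ax_box) = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
sb.violinplot(x="group", y="value", data=df, ax=ax_violin)
sb.boxplot(x="group", y="value", data=df, ax=ax_box)
```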

In [None]:
df = sb.load_dataset('titanic')
print(df.columns)
df.pop('fare')
df.pop('age')
df.pop('sibsp')
df.pop('parch')
sb.violinplot(data=df)
sb.despine(offset=10, trim=True);

# **Assignment 5:**
1. Select a dataset from the seaborn datasets
2. Create a boxplot for the dataset

# **Color Palettes**
Seaborn lets you modify the colors of plots. <br><br>
To build a color palette, use: 
>color_palette(palette= , n_colors= , desat= )<br>
<br>

**palette** - the name of the palette. If None, the current palette is returned. <br>
<br>
**n_colors** - the number of colors in the palette. If None, the number depends on how the palette is specified. <br>
<br>
**desat** - the proportion by which to desaturate each color<br>

[ColorBrewer](https://colorbrewer2.org/#type=qualitative&scheme=Accent&n=6) is a web page that shows what various color palettes look like.
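
A short sketch of how these parameters interact, assuming the "Blues" palette as an arbitrary example:

```python
import seaborn as sb

# Four shades of the "Blues" palette, as a list of RGB tuples.
blues = sb.color_palette("Blues", n_colors=4)

# The same palette with each color partially desaturated.
muted_blues = sb.color_palette("Blues", n_colors=4, desat=0.5)

print(len(blues))
print(blues.as_hex())
print(muted_blues.as_hex())
```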


# **Palettes**
The palette names are: <br>
* deep
* muted
* bright
* pastel
* dark
* colorblind

Custom palettes can also be created. 

**Print the colors of the current palette in a line**

In [None]:
current_palette = sb.color_palette("bright")
sb.palplot(current_palette)
plt.show()

# **Assignment 7**
Print each of the Seaborn palettes 

**Color palette types:** <br>

color_palette() supports three types of palette:

* qualitative
* sequential
* diverging

In [None]:
current_palette = sb.color_palette()
sb.palplot(sb.color_palette("Purples"))
plt.show()

In [None]:
diverging_colors = sb.color_palette("RdBu",8)
sb.palplot(diverging_colors)

**Print the colors of a sequential color palette**<br>
Some of the available sequential palettes are:<br>
> Greens, Reds, Blues, Oranges, Greys, Purples

# **Assignment 6**
1. Print a sequential color palette with 10 shades of color. 
2. Print a diverging color palette with 8 shades of color.  

[Seaborn Bar charts](https://seaborn.pydata.org/generated/seaborn.barplot.html)

In [None]:
df = sb.load_dataset('tips')
current_palette = sb.color_palette("Blues")
sb.set_style("white")
sb.set_palette(current_palette)
sb.barplot(x="day", y="total_bill", data=df)

In [None]:
current_palette = sb.color_palette("bright")
sb.set_style("white")
sb.set_palette(current_palette)
sb.barplot(x="day", y="total_bill", hue="sex", data=df)

# **Assignment 7**
1. Pick one of the Seaborn datasets
2. Select a color palette
3. Create plots for the dataset


# **Visualizing the distribution of a dataset**

**Distribution Plots**

In [None]:
sb.set(color_codes=True)
x = np.random.normal(size=100)
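# distplot() is deprecated since seaborn 0.11; histplot() with kde=True is the closest replacement
sb.histplot(x=x, kde=True, stat="density");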

# **Assignment 8:**
1. Load the Titanic dataset
2. Plot the distribution of the passengers' ages. Use 5 bins for the ages and add a tick mark on the x axis for each data point. 

In [None]:
#@title 
df = sb.load_dataset('titanic')
current_palette = sb.color_palette("bright")
sb.set_style("dark")
sb.set_palette(current_palette)
x = df['age']
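# distplot() is deprecated since seaborn 0.11; histplot() plus rugplot() gives the same elements
sb.histplot(x=x, bins=5, kde=True, stat="density")
sb.rugplot(x=x);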

# **Assignment 9:**
1. Load the titanic dataset
2. Create a bar chart
3. Plot the embark_town vs survived

In [None]:
#@title
df = sb.load_dataset('titanic')
current_palette = sb.color_palette("Greens")
sb.set_style("white")
sb.set_palette(current_palette)
sb.barplot(x="survived", y="embark_town", data=df)

# **Scatterplots and Jointplots**
To create a scatterplot: <br>
>scatterplot(x=, y=, data=)
<br>

To create a jointplot:<br>
>jointplot(x=,y=,data=)


In [None]:
df = sb.load_dataset('planets')
sb.jointplot(x="orbital_period", y="mass", data=df)

# **Assignment 10:**<br>
1. Load the car_crashes dataset
2. Create a jointplot using speeding and alcohol 


# **Visualizing Pairwise Relationships**
To visualize pairwise relationships in a dataset, use:<br>
>pairplot(dataset name)
<br>

This won't work for every dataset and every data column. 

In [None]:
iris = sb.load_dataset("iris")
sb.pairplot(iris, hue='species');

# **Assignment 10**
1. Load the titanic dataset
2. Perform a pairplot comparison. Does it work on the titanic dataset? 
3. Select another dataset and see if the pairplot works on it. 

Why does pairplot not work on some datasets? 


# **Linear Regression**
Two main functions in seaborn are used to visualize a linear relationship as determined through regression. <br>
> regplot() <br>
lmplot()<br> 

They are closely related and share much of their core functionality. It is important to understand how they differ, however, so that you can quickly choose the correct tool for a particular job.

In [None]:
tips = sb.load_dataset("tips")

In the simplest invocation, both functions draw a scatterplot of two variables, x and y, then fit the regression model y ~ x and plot the resulting regression line with a 95% confidence interval for that regression.<br>
The resulting plots are identical, except that the figure shapes are different.

The main difference is:<br>
regplot() accepts the x and y variables in a variety of formats including simple numpy arrays, pandas Series objects, or as references to variables in a pandas DataFrame object passed to data. <br>

lmplot() has data as a required parameter and the x and y variables must be specified as strings. This data format is called “long-form” or “tidy” data. <br>

Other than this input flexibility, regplot() possesses a subset of lmplot()’s features.
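
The difference in accepted inputs can be sketched as follows. The data is synthetic and the names `x`, `y` and `df` are illustrative: regplot() is given plain numpy arrays and draws on an axes, while lmplot() is given column names plus a data frame and returns a FacetGrid.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

# Synthetic data with a linear trend.
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 2 * x + rng.normal(size=50)

# regplot(): arrays are accepted directly; the result is a Matplotlib axes.
fig, ax = plt.subplots()
ax = sb.regplot(x=x, y=y, ax=ax)

# lmplot(): needs a data frame and column names; the result is a FacetGrid.
df = pd.DataFrame({"x": x, "y": y})
g = sb.lmplot(x="x", y="y", data=df)

print(type(ax).__name__, type(g).__name__)
```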

In [None]:
sb.regplot(x="total_bill", y="tip", data=tips);

In [None]:
sb.lmplot(x="total_bill", y="tip", data=tips);

# **Assignment 11**
1. Select a dataset
2. Perform linear regression on two data columns in the dataset. 

# **Anscombe's Quartet**
Anscombe's quartet comprises four data sets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers and other influential observations on statistical properties. (Wikipedia)

In [None]:
anscombe = sb.load_dataset("anscombe")

In [None]:
sb.lmplot(x="x", y="y", data=anscombe.query("dataset == 'I'"),
           ci=None, scatter_kws={"s": 80});

In [None]:
sb.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
           ci=None, scatter_kws={"s": 80});

**Higher Order Relationships**<br>
In the presence of this kind of higher-order relationship, lmplot() and regplot() can fit a polynomial regression model to explore simple kinds of nonlinear trends in the dataset:
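
To make the benefit of a higher order concrete, the sketch below compares a straight-line fit with a quadratic fit on synthetic curved data. The comparison uses np.polyfit purely as an illustration and is not meant as a description of Seaborn's internals; the data and variable names are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# Synthetic data that follows a parabola plus a little noise.
rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 30)
y = x**2 - 2 * x + 1 + rng.normal(scale=0.2, size=x.size)

# Sum of squared residuals for a first-order and a second-order fit.
sse_linear = float(np.sum((y - np.polyval(np.polyfit(x, y, 1), x)) ** 2))
sse_quadratic = float(np.sum((y - np.polyval(np.polyfit(x, y, 2), x)) ** 2))
print(sse_linear, sse_quadratic)

# The same second-order fit drawn with regplot().
fig, ax = plt.subplots()
sb.regplot(x=x, y=y, order=2, ci=None, ax=ax)
```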

In [None]:
sb.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
           order=2, ci=None, scatter_kws={"s": 80});

# **Handling outliers**<br>
In the presence of outliers, it can be useful to fit a robust regression, which uses a different loss function to downweight relatively large residuals:

In [None]:
sb.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"),
           ci=None, scatter_kws={"s": 80});

In [None]:
sb.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"),
           robust=True, ci=None, scatter_kws={"s": 80});


When the y variable is binary, simple linear regression also “works” but provides implausible predictions.

In [None]:
tips["big_tip"] = (tips.tip / tips.total_bill) > .15
sb.lmplot(x="total_bill", y="big_tip", data=tips,
           y_jitter=.03);

The solution in this case is to fit a logistic regression, such that the regression line shows the estimated probability of y = 1 for a given value of x. <br>
This is computationally expensive, so use ci=None for faster performance. 

In [None]:
sb.lmplot(x="total_bill", y="big_tip", data=tips,
           logistic=True, y_jitter=.03);

# **Assignment 12**<br>
1. Select a dataset
2. Perform linear regression on two variables in the dataset. 

# **Conditioning on other variables**
lmplot() combines regplot() with FacetGrid to provide an easy interface to show a linear regression on “faceted” plots that allow you to explore interactions with up to three additional categorical variables.

The best way to separate out a relationship is to plot both levels on the same axes and to use color to distinguish them:

In [None]:
sb.lmplot(x="total_bill", y="tip", hue="smoker",
           col="time", row="sex", data=tips);
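
The same idea can be tried without downloading a dataset. The following is a minimal sketch on synthetic data; the column names `group` and `panel` are invented, with `group` mapped to color and `panel` mapped to separate columns of the grid.

```python
import numpy as np
import pandas as pd
import seaborn as sb

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "x": rng.uniform(0, 10, 120),
    "group": np.tile(["g1", "g2"], 60),         # shown with color (hue)
    "panel": np.repeat(["left", "right"], 60),  # shown as separate columns
})
# Give each group a different slope so the fitted lines differ.
slope = np.where(df["group"] == "g1", 1.0, 2.0)
df["y"] = slope * df["x"] + rng.normal(size=120)

g = sb.lmplot(x="x", y="y", hue="group", col="panel", data=df)
print(g.axes.shape)
```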


# **Assignment 13:**
1. Select a dataset
2. Use lmplot to plot conditional relationships within the dataset

In [None]:
df = sb.load_dataset('titanic')
current_palette = sb.color_palette("bright")
sb.set_style("dark")
sb.set_palette(current_palette)
x = df['age']
y = df['alive']
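# distplot() is deprecated since seaborn 0.11; histplot() plus rugplot() gives the same elements
sb.histplot(x=x, bins=5, kde=True, stat="density")
sb.rugplot(x=x);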

In [None]:
sb.boxenplot(x=x)  # only x is required; it is passed by keyword

Check out the [Seaborn Official Tutorial](https://seaborn.pydata.org/tutorial.html) 

https://medium.com/analytics-vidhya/5-lesser-known-seaborn-plots-most-people-dont-know-82e5a54baea8
