# <center>Data visualization using Seaborn</center>

## This notebook demonstrates the usage of Seaborn plots. The following types of plots are covered:
- <a href='#1'>Bar plot</a>
- <a href='#2'>Dist plot</a>
- <a href='#3'>Box plot</a>
- <a href='#4'>Strip plot</a>
- <a href='#5'>Pair Grid</a>
- <a href='#6'>Violin plot</a>
- <a href='#7'>Clustermap</a>
- <a href='#8'>Heatmap</a>
- <a href='#9'>Facet plot</a>
- <a href='#10'>Joint plot</a>
- <a href='#11'>Pair plot</a>

## Import libraries

In [None]:
import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

### Some of the palette names:
- coolwarm
- husl
- winter_r
- spring
- autumn

## <center> <a id='1'> Bar plot </a> </center>
- A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent.
- The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column chart.


In [None]:
tips = sns.load_dataset('tips')

In [None]:
tips.head()

### This plot describes the mean tip on each day.
- The mean of all the tips on a given day is calculated and plotted as the height of that day's bar.
- If no estimator is specified, barplot uses the mean by default.
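
The bar heights are simply a per-category aggregate. A minimal pandas sketch of the numbers barplot draws by default, using a small hypothetical table in place of the tips dataset so that nothing needs to be downloaded:

```python
import pandas as pd

# Small hypothetical stand-in for the 'tips' dataset
df = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat'],
    'tip': [2.0, 4.0, 1.0, 3.0, 5.0, 3.0],
})

# barplot's default estimator is the mean, applied to each category separately
bar_heights = df.groupby('day', sort=False)['tip'].mean()
print(bar_heights)
```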

In [None]:
sns.barplot(x='day', y='tip', data=tips)

### Visualize based on different categories in a column.
- 'hue' is used to visualize the data of different categories in one plot.
- 'palette' is used to change the colour of the plot.
- hue: 'sex' and palette: 'winter_r'
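
With 'hue', each x category is split into one bar per hue level, and each bar shows the estimator applied to that subset. A sketch of the underlying numbers with pandas, on hypothetical data shaped like the tips columns:

```python
import pandas as pd

# Hypothetical miniature version of the tips columns used in this section
df = pd.DataFrame({
    'day': ['Sat', 'Sat', 'Sat', 'Sun', 'Sun', 'Sun'],
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'total_bill': [20.0, 15.0, 30.0, 18.0, 25.0, 22.0],
})

# With hue='sex', every day gets one bar per sex: the mean of that (day, sex) subset
table = df.groupby(['day', 'sex'])['total_bill'].mean().unstack('sex')
print(table)
```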

In [None]:
sns.barplot(x='day', y='total_bill', data=tips, hue='sex', palette='winter_r')

### When the hue is assigned to 'smoker'

In [None]:
sns.barplot(x='day', y='total_bill', data=tips, hue='smoker', palette='winter_r')

### Visualize horizontally.
- Swap the x-axis and y-axis variables: with the numeric column on x and the categorical column on y, the bars are drawn horizontally.

In [None]:
sns.barplot(x='total_bill', y='day', data=tips, palette='spring')

### We can specify the order of the bars using the 'order' parameter. This is useful, for example, to show the weekend days first.

In [None]:
sns.barplot(x='day', y='tip', data=tips, palette='spring', order=['Sat', 'Sun', 'Thur', 'Fri'])

### Use the median instead of the mean to plot a bar graph. To do this, pass numpy's median function as the estimator.

In [None]:
from numpy import median
sns.barplot(x='day', y='total_bill', data=tips, estimator=median, palette='spring')
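
Why switch to the median? It is much less sensitive to a few extreme values than the mean. A tiny numpy illustration on hypothetical bills for a single day:

```python
import numpy as np

# Hypothetical bills for one day, including one unusually large bill
bills = np.array([10.0, 12.0, 11.0, 13.0, 50.0])

mean_height = np.mean(bills)      # pulled upward by the single large bill
median_height = np.median(bills)  # middle value; unaffected by how large 50.0 is
print(mean_height, median_height)
```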

### Combine the parameters covered so far in one plot.

In [None]:
sns.barplot(x='smoker', y='tip', data=tips, estimator=median, hue='sex', palette='coolwarm')

### Change confidence interval 'ci'

In [None]:
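sns.barplot(x='smoker', y='tip', data=tips, ci=100)
# ci: size (in %) of the bootstrapped confidence interval drawn as the error bar (default 95).
# In seaborn >= 0.12 the equivalent is errorbar=('ci', 100).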
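
The error bar is a bootstrapped confidence interval: the data are resampled with replacement, the estimator is recomputed many times, and the central ci percent of those estimates is kept. So ci=100 spans the full range of the bootstrap estimates and gives a wider bar. A minimal numpy sketch of that idea; `bootstrap_ci` is a hypothetical helper, not a seaborn function, and n_boot=1000 simply mirrors barplot's n_boot parameter:

```python
import numpy as np

def bootstrap_ci(values, estimator=np.mean, ci=95, n_boot=1000, seed=0):
    """Percentile bootstrap interval for `estimator` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Recompute the estimator on n_boot resamples drawn with replacement
    boots = np.array([
        estimator(rng.choice(values, size=values.size, replace=True))
        for _ in range(n_boot)
    ])
    # Keep the central `ci` percent of the bootstrap estimates
    return np.percentile(boots, [50 - ci / 2, 50 + ci / 2])

tips_sample = np.arange(1.0, 11.0)  # hypothetical tips: 1.0, 2.0, ..., 10.0
print(bootstrap_ci(tips_sample, ci=95))
print(bootstrap_ci(tips_sample, ci=100))  # full range of the bootstrap estimates
```

Using the same seed for both calls means both intervals come from the same resamples, so the 100% interval always contains the 95% one.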

### Change capsize of the error bars.

In [None]:
sns.barplot(x='day', y='total_bill', data=tips, capsize=0.3, palette='husl')

### See the difference when the capsize is changed.

In [None]:
sns.barplot(x='day', y='total_bill', data=tips, hue='sex', capsize=0.2, palette='husl')

In [None]:
tips.head()

### New palette name: 'autumn'

In [None]:
sns.barplot(x='size', y='tip', data=tips, capsize=0.15, palette='autumn')

### Change colour

In [None]:
sns.barplot(x='size', y='tip', data=tips, capsize=0.15, palette='husl')

### Use a single colour, together with a saturation value, to change the colour of the plot

In [None]:
sns.barplot(x='size', y='tip', data=tips, capsize=0.15, color='red', saturation=0.7)

## <center> <a id='2'> Dist plot </a> </center>
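- The distribution plot visualizes the distribution of a numerical variable.
- The distribution plot is suitable for comparing the range and distribution of groups of numerical data. Because it shows a summary of the distribution (a histogram and a smooth density curve), it is not suited to inspecting individual values in detail.
- Note: `distplot` is deprecated since seaborn 0.11; `histplot`, `kdeplot` and `displot` are its replacements.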
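
The two layers of a dist plot can be computed by hand: a histogram normalized to unit area, and a kernel density estimate, which places a small Gaussian bump on every observation and averages them. A standalone numpy sketch, assuming a Gaussian kernel with Scott's rule bandwidth (seaborn's exact KDE implementation and defaults have varied between versions):

```python
import numpy as np

def gaussian_kde_1d(data, grid):
    """Gaussian KDE with Scott's rule bandwidth (a simplified sketch)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    bw = data.std(ddof=1) * n ** (-1 / 5)     # Scott's rule in one dimension
    z = (grid[:, None] - data[None, :]) / bw  # distance of every grid point to every sample
    kernels = np.exp(-0.5 * z ** 2) / (bw * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)               # average of one Gaussian bump per sample

rng = np.random.default_rng(0)
num = rng.standard_normal(100)  # similar to np.random.randn(100) below

# Histogram layer: bar heights normalized so that the total area is 1
heights, edges = np.histogram(num, bins=10, density=True)

# KDE layer: a smooth curve evaluated on a fine grid
grid = np.linspace(-6, 6, 2001)
density = gaussian_kde_1d(num, grid)
```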

In [None]:
num = np.random.randn(100)

### Dist plot using seaborn - plots a histogram and a KDE (kernel density estimate) in one plot.

In [None]:
sns.distplot(num)

### Change colour of the plot

In [None]:
sns.distplot(num, color='red')

### Label the x-axis by passing a named pandas Series.

In [None]:
label_dist = pd.Series(num, name='variable x')

In [None]:
sns.distplot(label_dist)

### Change the orientation of the graph.

In [None]:
sns.distplot(label_dist, vertical=True, color='red')

### Plot only kde. Remove histogram plot.

In [None]:
# Univariate kernel density estimate (KDE) plot.
sns.distplot(label_dist, hist=False)

### Use rug to display the distribution of data points at the bottom of the plot.

In [None]:
sns.distplot(label_dist, rug=True, hist=False, color='green')

## <center> <a id='3'> Box plot </a> </center>
- A simple way of representing statistical data using quartiles. A rectangle (the box) spans the first quartile (Q1) to the third quartile (Q3), i.e. the interquartile range (IQR), and a line inside the box marks the median. The whiskers extend to the most extreme data points that lie within 1.5 times the IQR of the box (seaborn's default), and points beyond the whiskers are drawn individually as outliers.
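
The box-plot summary can be reproduced with numpy percentiles. A sketch on hypothetical data, assuming the default whisker reach of 1.5 times the IQR (seaborn's `whis=1.5`):

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)  # hypothetical data

q1, median, q3 = np.percentile(values, [25, 50, 75])  # box edges and middle line
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # whiskers may reach at most 1.5 * IQR beyond the box
upper_fence = q3 + 1.5 * iqr

inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()  # whiskers stop at the last point inside the fences
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(q1, median, q3, whisker_low, whisker_high, outliers)
```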

In [None]:
tips = sns.load_dataset('tips')
tips.head()

### A box plot describes the data well using quartiles, which makes outliers easy to find.

In [None]:
sns.boxplot(x=tips['size'])

In [None]:
sns.boxplot(x=tips['total_bill'])

In [None]:
tips['total_bill'].describe()

In [None]:
sns.boxplot(x='sex', y='total_bill', data=tips)

In [None]:
sns.boxplot(x='day', y='total_bill', hue='sex', data=tips, palette='husl')

In [None]:
tips.head()

In [None]:
sns.boxplot(x='day', y='total_bill', data=tips, hue='time')

In [None]:
sns.boxplot(x='day', y='total_bill', data=tips, order=['Sat', 'Sun', 'Thur', 'Fri'])

In [None]:
iris = sns.load_dataset('iris')

In [None]:
iris.head()

In [None]:
sns.boxplot(data=iris)

In [None]:
sns.distplot(iris.sepal_width)

In [None]:
sns.boxplot(data=iris, orient='h', palette='husl')

In [None]:
sns.boxplot(x='day', y='total_bill', data=tips, palette='husl')
sns.swarmplot(x='day', y='total_bill', data=tips)

In [None]:
sns.boxplot(x='day', y='total_bill', data=tips, palette='husl')
sns.swarmplot(x='day', y='total_bill', data=tips, color='black')

In [None]:
sns.boxplot(x='day', y='total_bill', data=tips, palette='husl')
sns.swarmplot(x='day', y='total_bill', data=tips, color='0.3')

## <center> <a id='4'> Strip plot </a> </center>
- A strip plot is a graphical data analysis technique for summarizing a univariate data set.
- Each observation is drawn as a point along the axis of the response variable (the horizontal axis when only x is given); when a categorical variable is added, one strip of points is drawn per category.
- It is typically used for small data sets (histograms and density plots are typically preferred for larger data sets).
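
In a categorical strip plot, points with similar values overlap. The `jitter` option used in the cells below spreads them out by adding a small random offset along the categorical axis. A numpy sketch of the idea; the uniform offset of at most 0.1 category widths is an assumption for illustration, and the exact default width seaborn uses may differ between versions:

```python
import numpy as np

days = np.array(['Thur', 'Fri', 'Sat', 'Sat', 'Sun', 'Sun', 'Sun'])  # hypothetical observations
categories = ['Thur', 'Fri', 'Sat', 'Sun']
codes = np.array([categories.index(d) for d in days], dtype=float)  # category -> 0, 1, 2, 3

jitter_width = 0.1  # assumed maximum offset on either side of the category position
rng = np.random.default_rng(0)
x_positions = codes + rng.uniform(-jitter_width, jitter_width, size=codes.size)
print(x_positions)
```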

In [None]:
tips=sns.load_dataset('tips')

In [None]:
tips.head()

In [None]:
sns.stripplot(x=tips['tip'], color='green')

In [None]:
sns.stripplot(x=tips['total_bill'], color='blue')

In [None]:
sns.boxplot(x='day', y='total_bill', data=tips)

In [None]:
sns.stripplot(x='day', y='total_bill', data=tips)

In [None]:
sns.stripplot(x='day', y='total_bill', data=tips, jitter=True)

In [None]:
sns.stripplot(x='day', y='total_bill', data=tips, jitter=0.2)

In [None]:
sns.stripplot(x='total_bill', y='day', data=tips, jitter=True)

In [None]:
sns.stripplot(x='total_bill', y='day', data=tips, linewidth=0.8, jitter=True)

In [None]:
sns.stripplot(x='day', y='total_bill', data=tips, hue='sex', jitter=True)

In [None]:
# 'split' was renamed to 'dodge' in seaborn 0.9
sns.stripplot(x='day', y='total_bill', data=tips, hue='sex', jitter=True, dodge=True)

In [None]:
sns.stripplot(x='day', y='total_bill', data=tips, hue='smoker', jitter=True, dodge=True)

In [None]:
sns.stripplot(x='sex', y='tip', hue='day', data=tips, marker='D', jitter=True)

In [None]:
sns.stripplot(x='sex', y='tip', hue='day', data=tips, marker='D', jitter=True, size=7)

In [None]:
sns.boxplot(x='day', y='total_bill', data=tips, palette='husl')
sns.stripplot(x='day', y='total_bill', data=tips)

In [None]:
sns.boxplot(x='day', y='total_bill', data=tips, palette='husl')
sns.stripplot(x='day', y='total_bill', data=tips, jitter=True)

In [None]:
sns.stripplot(x='day', y='total_bill', data=tips, jitter=True)
sns.violinplot(x='day', y='total_bill', data=tips, color='0.9')

## <center> <a id='5'> PairGrid plot </a>  </center>
- PairGrid allows us to draw a grid of subplots using the same plot type to visualize data. 
- Unlike FacetGrid, it uses a different pair of variables for each subplot.
- It forms a matrix of sub-plots. It is also sometimes called a “scatterplot matrix”.
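
Which subplot does each `map_*` method fill? Row i shows variable i on the y-axis and column j shows variable j on the x-axis; the diagonal (i == j) is handled by `map_diag`, the cells above it by `map_upper`, and the cells below it by `map_lower` (`map_offdiag` covers both). A small sketch that just enumerates this layout for the four iris measurements:

```python
from itertools import product

variables = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

# Row i plots variables[i] on the y-axis; column j plots variables[j] on the x-axis
layout = {}
for i, j in product(range(len(variables)), repeat=2):
    if i == j:
        layout[(i, j)] = 'diag'   # filled by map_diag (e.g. plt.hist)
    elif j > i:
        layout[(i, j)] = 'upper'  # filled by map_upper (or map_offdiag)
    else:
        layout[(i, j)] = 'lower'  # filled by map_lower (or map_offdiag)

counts = {kind: list(layout.values()).count(kind) for kind in ('diag', 'upper', 'lower')}
print(counts)
```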

In [None]:
iris = sns.load_dataset('iris')

In [None]:
x = sns.PairGrid(iris)
x = x.map(plt.scatter)

In [None]:
x = sns.PairGrid(iris)
x = x.map_diag(plt.hist)
x = x.map_offdiag(plt.scatter)

In [None]:
x = iris.petal_width.value_counts()
x = x.sort_index()
x.plot(kind='bar')

In [None]:
x = sns.PairGrid(iris, hue='species')
x = x.map_diag(plt.hist)
x = x.map_offdiag(plt.scatter)

In [None]:
iris.species.value_counts()

In [None]:
x = sns.PairGrid(iris, hue='species')
x = x.map_diag(plt.hist)
x = x.map_offdiag(plt.scatter)
x.add_legend()

In [None]:
x = sns.PairGrid(iris, hue='species', palette='winter_r') # coolwarm, husl, winter_r, RdBu.
x = x.map_diag(plt.hist)
x = x.map_offdiag(plt.scatter)
x.add_legend()

In [None]:
x = sns.PairGrid(iris, hue='species', palette='winter_r') # autumn, coolwarm, husl, winter_r, RdBu.
x = x.map_diag(plt.hist, histtype='step', linewidth=4)
x = x.map_offdiag(plt.scatter)
x.add_legend()

In [None]:
x = sns.PairGrid(iris, vars=['petal_length', 'petal_width'])
x = x.map(plt.scatter)

In [None]:
x = sns.PairGrid(iris, hue='species', vars=['petal_length', 'petal_width'])
x = x.map_diag(plt.hist, edgecolor='black')
x = x.map_offdiag(plt.scatter, edgecolor='black')
x = x.add_legend()

In [None]:
x = sns.PairGrid(iris, x_vars=['petal_length', 'petal_width'],
                y_vars=['sepal_length', 'sepal_width'])
x = x.map(plt.scatter)

In [None]:
x = sns.PairGrid(iris)
x = x.map_diag(plt.hist)
x = x.map_upper(plt.scatter)
x = x.map_lower(sns.kdeplot)

In [None]:
x = sns.PairGrid(iris, hue='species')
x = x.map_diag(plt.hist, edgecolor='black')
x = x.map_upper(plt.scatter)
x = x.map_lower(sns.kdeplot)
x = x.add_legend()

In [None]:
x = sns.PairGrid(iris, hue='species', hue_kws={'marker': ['D', 's', '+']})
x = x.map(plt.scatter)
x = x.add_legend()

## <center> <a id='6'> Violin plot </a> </center>
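- A violin plot is a method of plotting numeric data. It is similar to a box plot, with a rotated kernel density plot on each side.
- The outer shape is a kernel density estimate of the data: the wider the violin at a given value, the more common that value is.
- By default (inner='box'), seaborn draws a miniature box plot inside the violin: a marker for the median, a thick bar for the interquartile range and thin whisker lines. The cells below also use inner='quartile' and inner='stick'.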

In [None]:
tips = sns.load_dataset('tips')

In [None]:
tips.head()

In [None]:
sns.violinplot(x=tips['tip'])

In [None]:
sns.violinplot(x='size', y='total_bill', data=tips)

In [None]:
sns.violinplot(x='day', y='total_bill', hue='sex', data=tips)

In [None]:
sns.violinplot(x='day', y='total_bill', hue='sex', data=tips, split=True)

In [None]:
sns.violinplot(x='day', y='total_bill', hue='sex', data=tips, split=True, 
               inner='quartile')

In [None]:
sns.violinplot(x='day', y='total_bill', hue='sex', data=tips, split=True, 
               inner='quartile', scale='count')

In [None]:
sns.violinplot(x='day', y='total_bill', hue='sex', data=tips, 
               inner='quartile', scale='count')

In [None]:
sns.violinplot(x='day', y='total_bill', hue='smoker', data=tips, 
               inner='quartile', scale='count')

In [None]:
sns.violinplot(x='day', y='total_bill', hue='smoker', data=tips, 
               inner='stick', scale='count')

In [None]:
sns.violinplot(x='day', y='total_bill', hue='smoker', data=tips, 
               inner='stick')

In [None]:
# Here, we can compare the number of customers on different days by width of violin plot.
sns.violinplot(x='day', y='total_bill', hue='smoker', data=tips, 
               inner='stick', scale='count', scale_hue=False, split=True)
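
With scale='count' (and scale_hue=False), the widths of the violins encode how many observations fall in each category, so the busiest day gets the widest violin. A pandas sketch of the relative widths, assuming each violin's maximum width is proportional to its count (the day column below is hypothetical):

```python
import pandas as pd

# Hypothetical day column standing in for tips['day']
day = pd.Series(['Thur'] * 6 + ['Fri'] * 2 + ['Sat'] * 8 + ['Sun'] * 4)

counts = day.value_counts()
# Assumed scaling: the busiest day gets full width, the others shrink in proportion to their counts
relative_width = counts / counts.max()
print(relative_width)
```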

In [None]:
sns.violinplot(x='day', y='total_bill', hue='smoker', data=tips, 
               inner='stick', scale='count', scale_hue=False, split=True, bw=0.7)

In [None]:
sns.violinplot(x='day', y='total_bill', hue='smoker', data=tips, 
               inner='stick', scale='count', scale_hue=False, split=True, bw=0.1)

## <center> <a id='7'> Clustermap plot </a> </center>
- Plots a matrix dataset as a hierarchically-clustered heatmap.
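
The clustering reorders rows (and columns) so that similar ones end up next to each other, with the dendrogram drawn alongside. A sketch of the row reordering with SciPy, using clustermap's default `method='average'` and `metric='euclidean'` on a small hypothetical matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

# Hypothetical 4 x 2 matrix: rows 0 and 2 are similar, rows 1 and 3 are similar
data = np.array([[1.0, 2.0],
                 [10.0, 11.0],
                 [1.5, 2.5],
                 [10.5, 11.5]])

Z = linkage(data, method='average', metric='euclidean')  # agglomerative clustering of the rows
row_order = leaves_list(Z)  # dendrogram leaf order = order in which the rows are drawn
print(row_order)
```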

In [None]:
flights = sns.load_dataset('flights')

In [None]:
flights.head()

In [None]:
flights = flights.pivot(index='month', columns='year', values='passengers')

In [None]:
flights

In [None]:
sns.heatmap(flights)

In [None]:
sns.clustermap(flights)

In [None]:
sns.clustermap(flights, col_cluster=False)

In [None]:
sns.clustermap(flights, row_cluster=False)

In [None]:
sns.clustermap(flights, cmap='Blues_r', linewidth=1) # coolwarm, Blues_r

In [None]:
sns.clustermap(flights, cmap='coolwarm', linewidth=2, figsize=(8,6))

### Standardize rows or columns: standard_scale=0 rescales each row and standard_scale=1 each column to the range 0 to 1

In [None]:
sns.clustermap(flights, cmap='coolwarm', standard_scale=1) # 1 - columns

In [None]:
sns.clustermap(flights, cmap='coolwarm', standard_scale=0) # 0 - rows
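
standard_scale subtracts the minimum of each row or column and divides by its range, so every row (or column) runs from 0 to 1. A pandas sketch on a small hypothetical year-by-month table:

```python
import pandas as pd

# Hypothetical small month-by-year table
table = pd.DataFrame({1949: [112, 118, 132], 1950: [115, 126, 141]},
                     index=['Jan', 'Feb', 'Mar'])

# standard_scale=1: rescale each column to [0, 1]
col_scaled = (table - table.min()) / (table.max() - table.min())

# standard_scale=0: the same operation applied to each row (via the transpose)
t = table.T
row_scaled = ((t - t.min()) / (t.max() - t.min())).T
print(col_scaled)
```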

### Normalize with z-scores: z_score=0 normalizes each row and z_score=1 each column

In [None]:
sns.clustermap(flights, cmap='coolwarm', z_score=0) # 0 - rows
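
A z-score subtracts the mean and divides by the standard deviation, so each normalized row (or column) has mean 0 and standard deviation 1. A pandas sketch of z_score=0 on a hypothetical table:

```python
import pandas as pd

table = pd.DataFrame({1949: [112, 118, 132], 1950: [115, 126, 141], 1951: [145, 150, 178]},
                     index=['Jan', 'Feb', 'Mar'])  # hypothetical values

# z_score=0: for each row, subtract the row mean and divide by the row standard deviation
row_z = table.sub(table.mean(axis=1), axis=0).div(table.std(axis=1), axis=0)
print(row_z.round(2))
```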

## <center> <a id='8'> Heatmap plot </a> </center>
- A heat map is a graphical representation of data in which the individual values contained in a matrix are represented as colors.
- It is a bit like looking at a data table from above. It is useful for displaying a general view of numerical data, not for extracting specific data points.

In [None]:
normal = np.random.rand(12, 15)  # 12 x 15 matrix of uniform random values in [0, 1)

In [None]:
sns.heatmap(normal, cmap='coolwarm')

In [None]:
sns.heatmap(normal, annot=True, cmap='coolwarm')

In [None]:
sns.heatmap(normal, vmin=0, vmax=2, cmap='coolwarm')
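
vmin and vmax fix which data values are mapped to the two ends of the colour map; with vmin=0 and vmax=2, values in [0, 1) only use the lower half of 'coolwarm'. A sketch of that linear mapping with matplotlib's `Normalize`, on a small hypothetical array:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

values = np.array([[0.0, 0.5], [1.0, 2.0]])  # hypothetical cell values

norm = Normalize(vmin=0, vmax=2)  # linear map: vmin -> 0.0, vmax -> 1.0
cmap = plt.get_cmap('coolwarm')
positions = norm(values)          # position of each cell along the colour map
rgba = cmap(positions)            # one RGBA colour per cell
```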

In [None]:
sns.heatmap(flights, cmap='coolwarm', annot=True, fmt='d', linewidths=0.3)

In [None]:
sns.heatmap(flights, cmap='coolwarm', annot=True, fmt='d', 
            linewidths=0.3, vmin=100, vmax=650)

In [None]:
sns.heatmap(flights, cmap='RdBu', annot=True, fmt='d') 
# color maps: RdBu, summer, coolwarm, winter_r

### Center the map at a value

In [None]:
sns.heatmap(flights, center=flights.loc['Jun', 1954], annot=True,  # month labels are abbreviated in recent versions of the dataset
            fmt='d', cmap='coolwarm')

In [None]:
sns.heatmap(flights, center=flights.loc['Mar', 1959], annot=True, 
            fmt='d', cmap='coolwarm', cbar=False)

## <center> <a id='9'> FacetGrid plot </a></center>

In [None]:
tips = sns.load_dataset('tips')

In [None]:
tips.head()

In [None]:
sns.FacetGrid(row='smoker', col='time', data=tips)

In [None]:
x = sns.FacetGrid(row='smoker', col='time', data=tips)
x = x.map(plt.hist, 'total_bill', edgecolor='black')
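
Each facet receives only the rows for one combination of the row and column variables; the mapped function (plt.hist here) is then called once per subset. A pandas sketch of that split, using hypothetical rows with the same columns:

```python
import pandas as pd

# Hypothetical rows with the columns used by the FacetGrid above
df = pd.DataFrame({
    'smoker': ['Yes', 'No', 'No', 'Yes', 'No'],
    'time': ['Lunch', 'Lunch', 'Dinner', 'Dinner', 'Dinner'],
    'total_bill': [12.0, 15.0, 20.0, 30.0, 25.0],
})

# row='smoker', col='time': one facet per (smoker, time) combination
facets = {key: sub['total_bill'].tolist()
          for key, sub in df.groupby(['smoker', 'time'])}
print(facets)
```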

In [None]:
x = sns.FacetGrid(row='smoker', col='time', data=tips)
x = x.map(plt.hist, 'total_bill', edgecolor='black', color='green', 
          bins=15)

In [None]:
x = sns.FacetGrid(row='smoker', col='time', data=tips)
x = x.map(plt.scatter, 'total_bill', 'tip')

In [None]:
x = sns.FacetGrid(row='smoker', col='time', data=tips)
x = x.map(sns.regplot, 'total_bill', 'tip')

In [None]:
x = sns.FacetGrid(tips, col='time', hue='smoker')
x = x.map(plt.scatter, 'total_bill', 'tip')
x = x.add_legend()

In [None]:
x = sns.FacetGrid(tips, col='day')
x = x.map(sns.boxplot, 'total_bill', 'time')

In [None]:
x = sns.FacetGrid(tips, col='day', height=4, aspect=1)  # 'size' was renamed 'height'
x = x.map(sns.boxplot, 'time', 'total_bill')

In [None]:
x = sns.FacetGrid(tips, col='day', col_order=['Thur', 'Fri', 'Sat', 'Sun'], 
                  height=4, aspect=0.4)
x = x.map(sns.boxplot, 'time', 'total_bill', color='red')

In [None]:
x = sns.FacetGrid(tips, col='time', hue='smoker', palette='husl')
x = x.map(plt.scatter, 'total_bill', 'tip')
x = x.add_legend()

## <center> <a id='10'> Joint plot </a> </center>
- Draws a plot of two variables together with the distribution of each variable along the margins, all in one figure.

In [None]:
tips = sns.load_dataset('tips')
tips.head()

In [None]:
iris = sns.load_dataset('iris')
iris.head()

In [None]:
sns.jointplot(x='total_bill', y='tip', data=tips)

In [None]:
sns.jointplot(x='sepal_length', y='sepal_width', data=iris)

In [None]:
sns.jointplot(x='total_bill', y='tip', data=tips, kind='reg',
             color='green')

In [None]:
sns.jointplot(x='total_bill', y='tip', data=tips, kind='hex')

In [None]:
sns.jointplot(x='total_bill', y='tip', data=tips, kind='kde')

In [None]:
sns.jointplot(x='sepal_length', y='sepal_width', data=iris, kind='kde')

### Stat function

In [None]:
from scipy.stats import spearmanr

In [None]:
sns.jointplot(x='total_bill', y='size', data=tips)

In [None]:
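rho, p = spearmanr(tips['total_bill'], tips['size'])  # stat_func was removed from jointplot; compute the statistic separately
sns.jointplot(x='total_bill', y='size', data=tips).ax_joint.annotate(f'spearmanr = {rho:.2f}, p = {p:.2g}', xy=(0.05, 0.95), xycoords='axes fraction', va='top')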
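
Spearman's rank correlation, the statistic computed by `spearmanr`, is Pearson's correlation applied to the ranks of the data (ties receive their average rank), so it measures any monotonic relationship rather than only a linear one. A quick check on hypothetical values:

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical bill totals and party sizes (note the tie in size)
bill = pd.Series([10.0, 25.0, 15.0, 40.0, 30.0])
size = pd.Series([1, 3, 2, 4, 4])

# Pearson correlation of the ranks; pandas gives tied values their average rank
rho_by_hand = np.corrcoef(bill.rank(), size.rank())[0, 1]
rho_scipy, p_value = spearmanr(bill, size)
print(rho_by_hand, rho_scipy)
```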

In [None]:
sns.jointplot(x='total_bill', y='size', data=tips, ratio=4, height=5)

## <center><a id='11'> Pair plot </a> </center>

In [None]:
iris = sns.load_dataset('iris')

In [None]:
sns.pairplot(iris)

In [None]:
tips = sns.load_dataset('tips')

In [None]:
sns.pairplot(tips)

In [None]:
sns.pairplot(iris, hue='species')

In [None]:
sns.pairplot(iris, hue='species', palette='husl', markers=['o', 'D', 's'])

In [None]:
sns.pairplot(iris, vars=['sepal_length', 'sepal_width'])

In [None]:
sns.pairplot(iris, height=3, vars=['sepal_length', 'sepal_width'])

In [None]:
sns.pairplot(iris, x_vars=['petal_length', 'petal_width'], 
             y_vars=['sepal_length', 'sepal_width'], hue='species')

In [None]:
sns.pairplot(iris, diag_kind='kde', palette='husl', hue='species')

### Fit regression line.
- kind = 'reg'
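
With kind='reg', each off-diagonal panel gets a first-order least-squares line (plus a confidence band). The line itself can be reproduced with `np.polyfit`; the data below are hypothetical and lie exactly on a line so the fitted coefficients are easy to check:

```python
import numpy as np

# Hypothetical petal measurements lying exactly on the line y = 0.4 * x - 0.3
petal_length = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
petal_width = 0.4 * petal_length - 0.3

# Degree-1 least-squares fit: the straight line drawn by kind='reg'
slope, intercept = np.polyfit(petal_length, petal_width, deg=1)
print(slope, intercept)
```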

In [None]:
sns.pairplot(iris, diag_kind='kde', palette='husl', hue='species',
            kind='reg')

## References:
- [Link to the official Seaborn Documentation](https://seaborn.pydata.org/)
- [Python for Data Visualization - using Seaborn](https://www.youtube.com/playlist?list=PL998lXKj66MpNd0_XkEXwzTGPxY2jYM2d)

### I will add more plots in the future. I hope this kernel helps you visualize data better and improve your skills.