# Data Visualization with Seaborn 

## Import Python libraries


In this section, we import the necessary Python libraries NumPy, Pandas and Matplotlib under their usual aliases.

In [None]:
# Ignore warnings

import warnings

warnings.simplefilter(action = "ignore", category = FutureWarning)


In [None]:
# Import the dependencies

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Import Seaborn


We can import Seaborn as

`import seaborn`

The usual shorthand for Seaborn is **sns**, so we import it under that alias.

`import seaborn as sns`

In [None]:
# Import seaborn

import seaborn as sns

## Import datasets


**Importing data as Pandas dataframe**

In this section, we import a dataset from a CSV file. The **read_csv()** function loads it as a Pandas dataframe.


In [None]:
data = pd.read_csv("pollution.csv")

We can view the first five rows of the dataset with the following command. Also have a look at the other CSV file for the feature descriptions. Determine the number of observations, as well as the number of attributes.

In [None]:
print(data.head())
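
One possible way to count observations and attributes is the dataframe's `shape` attribute. The sketch below uses a small made-up dataframe (the values are invented placeholders, not the contents of pollution.csv) so that it runs on its own; with the real file, the same calls would apply to `data`.

```python
import pandas as pd

# Toy stand-in for the real dataset: values are invented for illustration only.
toy = pd.DataFrame({
    "Engine Size(L)": [2.0, 2.4, 3.5, 1.5],
    "Cylinders": [4, 4, 6, 4],
    "Fuel Type": ["Z", "Z", "X", "X"],
})

# shape is a (rows, columns) tuple: rows are observations, columns are attributes.
n_observations, n_attributes = toy.shape
print(f"{n_observations} observations, {n_attributes} attributes")

# dtypes shows which attributes are numeric and which are stored as text (object).
print(toy.dtypes)
```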

## Set aesthetic parameters with set() method


We can set the aesthetic parameters of Seaborn plots with the **set()** method. The aesthetic parameters are context, style, palette, font, font_scale, color_codes and a dictionary of rc parameters.

So, we can set the default aesthetic parameters by calling Seaborn's **set()** method as follows:-

In [None]:
# Set the plot styles with set() method

sns.set()
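
As a sketch of how some of these parameters can be passed explicitly, the cell below overrides a few of the defaults. The particular values ("whitegrid", "muted", 1.2) are arbitrary choices made for illustration, not recommendations from this notebook.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Each argument overrides one group of defaults; the values are arbitrary examples.
sns.set(context="notebook", style="whitegrid", palette="muted", font_scale=1.2)

# The chosen style is stored in Matplotlib's global rc parameters,
# so it applies to every plot drawn afterwards.
print(plt.rcParams["axes.grid"])           # whether grid lines are drawn
print(sns.axes_style()["axes.facecolor"])  # background color of the axes
```

Calling `sns.set()` again without arguments restores Seaborn's defaults.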

## Seaborn colour palette


Colour plays a very important role in data visualization. Used effectively, it adds further dimensions to a plot.
A palette originally means the flat surface on which a painter mixes paints; in plotting, it means the set of colours used in a figure.


Seaborn provides a function called **color_palette()**. It can be used to give colours to plots and add aesthetic value to them. It returns a list of colors defining a color palette.


There are several readily available Seaborn palettes. These are:-

•	Deep

•	Muted

•	Bright

•	Pastel

•	Dark

•	Colorblind

Besides these we can also create new palettes.

There is another function **seaborn.palplot()** which deals with color palettes. This function plots the color palette as a horizontal array.
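
As a minimal sketch of creating a new palette, **color_palette()** also accepts an explicit list of colors, and **light_palette()** builds a sequential ramp towards a single color. The hex codes and the color "navy" below are arbitrary examples.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# A custom qualitative palette from explicit hex codes (arbitrary example colors).
custom_palette = sns.color_palette(["#1b9e77", "#d95f02", "#7570b3"])

# Each entry is an (r, g, b) tuple with components between 0 and 1.
print(list(custom_palette))

# A custom sequential palette that blends from a light tone towards one color.
light_navy = sns.light_palette("navy", n_colors=5)

sns.palplot(custom_palette)
sns.palplot(light_navy)
plt.show()
```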


**Qualitative Color Palettes**

Qualitative or categorical palettes are best suited to plotting categorical data, as follows:-


In [None]:
current_palette1 = sns.color_palette()

sns.palplot(current_palette1)

plt.show()


We can see the desired number of colors by passing a value to the n_colors parameter. Here, the **palplot()** function is used to plot the array of colors horizontally.
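
For instance, a short sketch that requests only four colors from the "deep" palette (the palette name and the number four are arbitrary choices):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Ask for only the first four colors of the "deep" palette.
four_colors = sns.color_palette("deep", n_colors=4)

sns.palplot(four_colors)
plt.show()
```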



**Sequential Color Palettes**

Sequential palettes are suitable for data that ranges from relatively low values to relatively high values. In Seaborn, the names of these palettes are the plural form of a color, so the name ends in "s".

For example, we pass 'Greens' rather than 'Green', as follows:-


In [None]:
# Build the sequential "Greens" palette, then display it
current_palette2 = sns.color_palette("Greens")

sns.palplot(current_palette2)

plt.show()


**Diverging Color Palette**

Diverging palettes use two different colors. Each color represents variation in the value away from a common midpoint, in one direction or the other.

Suppose we plot data ranging from -1 to 1. The values from -1 to 0 take one color and the values from 0 to +1 take another.

The palette itself does not fix the midpoint. Functions that map data to colors, such as heatmap(), accept a center parameter for that purpose. The following cell shows seven colors from the "BrBG" palette:-


In [None]:
# Take seven colors from the diverging "BrBG" palette, then display them
current_palette3 = sns.color_palette("BrBG", 7)

sns.palplot(current_palette3)

plt.show()
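
Besides the named palettes, a diverging palette can also be built between two hues with **diverging_palette()**. The sketch below is only an illustration: the hue angles 220 and 20 (a bluish and a reddish hue) and the choice of seven colors are arbitrary.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Build a 7-color diverging palette between two hue angles.
# center="light" makes the midpoint pale; center="dark" would make it dark.
custom_diverging = sns.diverging_palette(220, 20, n=7, center="light")

sns.palplot(custom_diverging)
plt.show()
```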


**Default Color Palette**

We can set the default color palette of a Seaborn plot using **set_palette()** function.

The arguments are the same for both the **set_palette()** and **color_palette()** functions, but **set_palette()** changes the default Matplotlib
parameters so that the palette is used for all subsequent plots.


In [None]:
def sinplot(flip=1): 

    x = np.linspace(0, 15, 100) 

    for i in range(1, 10): 

        plt.plot(x, np.sin(x + i * .5) * (12 - i) * flip) 


sns.set_style("white")

sns.set_palette("husl") 

sinplot() 

plt.show()


## Plotting Univariate Distribution with displot()


The most important thing to do while analysing the data is to understand its distribution. Seaborn helps us to understand the univariate distribution of data. Note that we are going to visualize the "CO2 Emissions(g/km)" feature in the forthcoming examples, but you must consider the other features (if meaningful) for a proper analysis.

The **displot()** function provides a quick look at a univariate distribution. By default, it plots a histogram of the data. It replaces the older **distplot()** function, which is deprecated in recent Seaborn versions.

We can use the **displot()** function as follows:-


In [None]:
sns.displot(data['CO2 Emissions(g/km)']) 

plt.show()


## Seaborn Histogram


Histograms represent the data distribution by forming bins along the range of the data and then drawing bars to show 
the number of observations that fall in each bin.

We can use the same **displot()** function to plot a histogram as follows:-


In [None]:
sns.displot(data['CO2 Emissions(g/km)'], kde=True) 

plt.show()

Here the kde parameter is set to True, so a kernel density estimate curve is drawn over the histogram. If kde is left at its default value of False, only the histogram is plotted.
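
To make the binning step concrete, the following sketch computes a histogram by hand with NumPy. The values are synthetic (randomly generated, not taken from the dataset), and the choice of 10 bins is arbitrary.

```python
import numpy as np

# Synthetic values standing in for a numeric column (not real measurements).
rng = np.random.default_rng(0)
values = rng.normal(loc=250, scale=60, size=500)

# Split the range of the data into 10 equal-width bins
# and count the observations that fall in each bin.
counts, bin_edges = np.histogram(values, bins=10)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:7.1f} to {right:7.1f}: {count}")
```

The heights of the bars drawn by Seaborn correspond to such counts. The `bins` argument of `displot()` controls the number of bins in the same way.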


## Seaborn – Kernel Density Estimates


Kernel Density Estimation (KDE) is a way to estimate the probability density function of a continuous random variable. 
It is used for non-parametric analysis. 

Passing kind="kde" to the displot() function yields the kernel density estimation plot.


In [None]:
sns.displot(data['CO2 Emissions(g/km)'], kind="kde") 

plt.show()
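
The idea behind a KDE can be sketched in a few lines of NumPy: a small Gaussian bump is placed on every observation and the bumps are averaged. This is a simplified illustration on synthetic data, not Seaborn's own implementation, and the bandwidth rule used here (Scott's rule of thumb) is an assumption.

```python
import numpy as np

# Synthetic sample standing in for a numeric column (not real measurements).
rng = np.random.default_rng(0)
sample = rng.normal(loc=250, scale=60, size=300)

# Bandwidth from Scott's rule of thumb (an assumption; other rules exist).
bandwidth = sample.std(ddof=1) * len(sample) ** (-1 / 5)

# Evaluate the estimate on a grid that extends beyond the data on both sides.
grid = np.linspace(sample.min() - 5 * bandwidth, sample.max() + 5 * bandwidth, 1000)

# Place a Gaussian bump on every observation and average them at each grid point.
z = (grid[:, None] - sample[None, :]) / bandwidth
density = np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * bandwidth * np.sqrt(2 * np.pi))

# A probability density integrates to approximately one over the grid.
area = (density * (grid[1] - grid[0])).sum()
print(round(area, 3))
```

A smaller bandwidth gives a more wiggly curve and a larger one a smoother curve. In Seaborn, the `bw_adjust` argument of `kdeplot()` scales the bandwidth.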


## Plotting Bivariate Distribution with jointplot()

A bivariate distribution describes how two variables vary together. Analyzing it shows the relationship between
the two variables and how one variable behaves with respect to the other.

A convenient way to analyze a bivariate distribution in Seaborn is the **jointplot()** function.

**jointplot()** creates a multi-panel figure that shows the bivariate relationship between two variables together with the univariate distribution of each variable on separate axes.



In [None]:
sns.jointplot(x='CO2 Emissions(g/km)', y="Fuel Consumption Comb (mpg)", data=data)

plt.show()

## Seaborn - Scatter plot


A scatter plot can be used to show the relationship between two variables x and y. A simple scatter plot can be drawn as follows:-

In [None]:
sns.scatterplot(x='CO2 Emissions(g/km)', y="Fuel Consumption Comb (mpg)", data=data)

plt.show()

The relationship between the variables can be shown for different subsets of the data using the `hue`, `size` and `style` parameters.
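
As an illustration of these three parameters, the sketch below builds a synthetic dataframe that reuses some of the column names from this notebook. The values are randomly generated placeholders, not real measurements.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data: column names follow the notebook, values are invented.
rng = np.random.default_rng(0)
n = 120
toy = pd.DataFrame({
    "Engine Size(L)": rng.uniform(1.0, 6.0, n),
    "Cylinders": rng.choice([4, 6, 8], n),
    "Fuel Type": rng.choice(["X", "Z"], n),
})
toy["CO2 Emissions(g/km)"] = 120 + 35 * toy["Engine Size(L)"] + rng.normal(0, 15, n)

ax = sns.scatterplot(
    x="Engine Size(L)", y="CO2 Emissions(g/km)", data=toy,
    hue="Fuel Type",    # color encodes the category
    style="Fuel Type",  # marker shape repeats the same category
    size="Cylinders",   # marker size encodes a numeric variable
)

plt.show()
```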

## Visualizing pairwise relationship with pairplot()


Some datasets contain many variables. In such cases, the relationship between each pair of variables should be analyzed.
So, we need to plot pairwise relationships in a dataset.

To plot multiple pairwise bivariate distributions, we can use the **pairplot()** function. It shows the relationship for each of the $\binom{n}{2}$ pairs of numeric variables in a dataframe as a matrix of plots; the diagonal plots are the univariate distributions.

We can plot a pairplot as follows:-


In [None]:
sns.set_style("ticks")

sns.pairplot(data)

plt.show()

## Plotting categorical data

So far, we have covered histograms, scatter plots and KDE plots. They are used to analyze the continuous variables under study. These plots are not suitable when the variable under study is categorical.

When one or both of the variables under study are categorical, we can use plots like stripplot() and swarmplot() to plot categorical data.


## Seaborn – Strip Plot


A **stripplot()** is used to draw a scatterplot where one variable is categorical. The observations are grouped by category along one of the axes.

We can plot a stripplot as follows:-


In [None]:
sns.stripplot(x="Fuel Type", y="CO2 Emissions(g/km)", data=data)

plt.show()

We can add colour as a function of the number of cylinders:

In [None]:
sns.stripplot(x="Fuel Type", y="CO2 Emissions(g/km)", data=data, hue="Cylinders")

plt.show()
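
The **swarmplot()** function mentioned earlier is called in the same way as **stripplot()**, but it shifts the points sideways so that they do not overlap. The following sketch uses a small synthetic dataframe (invented values under the notebook's column names) to keep the example self-contained.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Small synthetic stand-in dataset (invented values).
rng = np.random.default_rng(0)
n = 60
toy = pd.DataFrame({
    "Fuel Type": rng.choice(["X", "Z", "D"], n),
    "CO2 Emissions(g/km)": rng.normal(250, 40, n),
})

# Points within each category are spread out horizontally to avoid overlap.
ax = sns.swarmplot(x="Fuel Type", y="CO2 Emissions(g/km)", data=toy)

plt.show()
```

A swarm plot works best for small to moderate numbers of points; for large datasets, a strip plot is usually more practical.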

## Linear relationships with Seaborn


Many datasets contain multiple quantitative variables, and the goal of an analysis is often to relate those variables to each other. We can use statistical models to estimate a simple relationship between two sets of observations. These are called regression models.


While building regression models, we often check for multicollinearity: we look at the correlation between all pairs of continuous variables and, if strong correlations exist, take the necessary action to remove them.


There are two main functions in Seaborn to visualize a linear relationship determined through regression. These functions are **regplot()** and **lmplot()**. There is a third function, **residplot()**, that plots the residuals of a linear regression model.


## Seaborn – Lm plot


The function **lmplot()** plots data and regression model fits across a FacetGrid. This function combines regplot() and FacetGrid. It is intended as a convenient interface to fit regression models across conditional subsets of a dataset.


We can plot a simple linear relationship between two variables using lmplot() as follows. Try to use the **hue** parameter on a third feature. What happens?

Replace the input variable, $x$, with other input variables, and keep the same output $y$. What can you say about the importance of the input features?


In [None]:
sns.lmplot(x="Engine Size(L)", y="CO2 Emissions(g/km)", markers="x", data=data)

plt.show()

The **regplot()** and **lmplot()** functions are closely related, but the former is an axes-level function while the latter 
is a figure-level function that combines **regplot()** and **FacetGrid**.
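
Because **regplot()** and **residplot()** are axes-level functions, they can be drawn side by side on axes that we create ourselves. The sketch below illustrates this on synthetic data (invented values under the notebook's column names).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data with a roughly linear relationship (invented values).
rng = np.random.default_rng(0)
n = 100
engine = rng.uniform(1.0, 6.0, n)
toy = pd.DataFrame({
    "Engine Size(L)": engine,
    "CO2 Emissions(g/km)": 120 + 35 * engine + rng.normal(0, 15, n),
})

# Axes-level functions draw onto whichever axes they are given.
fig, (ax_fit, ax_resid) = plt.subplots(1, 2, figsize=(10, 4))

# Left: scatter plot with the fitted regression line.
sns.regplot(x="Engine Size(L)", y="CO2 Emissions(g/km)", data=toy, ax=ax_fit)

# Right: residuals of the same linear fit, which should scatter around zero.
sns.residplot(x="Engine Size(L)", y="CO2 Emissions(g/km)", data=toy, ax=ax_resid)

plt.show()
```

A visible pattern in the residuals would suggest that a straight line does not describe the relationship well.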

## Seaborn – Heat map


The Seaborn heatmap() function plots rectangular data as a color-encoded matrix.


We can plot a correlation heatmap for the numeric columns of the dataframe as follows. Interpret the observed values.


In [None]:

sns.heatmap(data.corr(numeric_only=True))
plt.show()
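
A correlation heatmap is often easier to read with the values printed in the cells and a diverging colormap centered on zero. The following sketch shows one possible set of options on synthetic data with placeholder column names; the colormap "BrBG" is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic numeric data (placeholder names, invented values).
rng = np.random.default_rng(0)
n = 200
engine = rng.uniform(1.0, 6.0, n)
toy = pd.DataFrame({
    "engine_size": engine,
    "emissions": 120 + 35 * engine + rng.normal(0, 15, n),
    "mpg": 45 - 4 * engine + rng.normal(0, 3, n),
})

corr = toy.corr()

# annot prints each coefficient; vmin, vmax and center pin the color scale
# to the full correlation range so that zero maps to the neutral middle color.
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="BrBG", vmin=-1, vmax=1, center=0)

plt.show()
```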

Now that you know a number of visualization tools using Seaborn, you are able to conduct a light data analysis. You can also define one or two learning tasks on this data. According to the pairplot, choose an easy one and a complex one. You do not have to develop your own learning algorithms; use the ones provided by the package `scikit-learn`.