# Data Visualization with Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

For detailed documentation and examples of Seaborn, please visit: https://seaborn.pydata.org/

### Distribution plots with Seaborn
When dealing with a set of data, often the first thing you’ll want to do is get a sense for how the variables are distributed. Seaborn provides a wide range of plots for inspecting how data are distributed.

By convention, Seaborn is imported under the alias ```sns```.

In [None]:
import seaborn as sns

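Seaborn ships with loaders for several example datasets. To list the available dataset names, run ```sns.get_dataset_names()```. Both this function and ```sns.load_dataset()``` fetch data from the online seaborn-data repository, so they need an internet connection the first time (```load_dataset()``` caches the files locally).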

In [None]:
tips = sns.load_dataset("tips")

In [None]:
tips.head()

#### Plotting univariate distribution
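A quick way to look at a univariate distribution in seaborn is the distplot() function, which by default draws a [histogram](https://en.wikipedia.org/wiki/Histogram) and fits a [kernel density estimate (KDE)](https://en.wikipedia.org/wiki/Kernel_density_estimation). Note that distplot() has been deprecated since seaborn 0.11, which introduced histplot() (axes-level) and displot() (figure-level) as its replacements; the calls below still run in recent versions but emit a deprecation warning.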
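To make the KDE idea concrete, here is a minimal NumPy sketch of a Gaussian kernel density estimate: one Gaussian "bump" is placed on every observation and the bumps are averaged. It assumes Scott's rule for the bandwidth (h = std · n^(-1/5)), which is the default rule in `scipy.stats.gaussian_kde`; seaborn's own smoothing options (such as `bw_adjust`) are not reproduced, so treat this as an illustration of the technique rather than seaborn's implementation. The data are a synthetic stand-in for `tips['total_bill']`, and `gaussian_kde_1d` is a hypothetical helper.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: evaluate a Gaussian KDE of `samples` at each point in `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth in one dimension: h = std * n^(-1/5)
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample: shape (len(grid), n)
    z = (grid[:, None] - samples[None, :]) / h
    # One standard-normal bump per sample, averaged and rescaled so the curve integrates to 1
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * h)

# Synthetic, right-skewed stand-in for tips['total_bill'] (not the real data)
rng = np.random.default_rng(0)
bills = rng.gamma(shape=4.0, scale=5.0, size=244)

grid = np.linspace(bills.min() - 20, bills.max() + 20, 1000)
density = gaussian_kde_1d(bills, grid)

# Riemann-sum check: the estimated density should integrate to roughly 1
dx = grid[1] - grid[0]
print(round(density.sum() * dx, 3))
```

A larger bandwidth gives a smoother curve, a smaller one follows individual observations more closely.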

In [None]:
sns.distplot(tips['total_bill'], bins=30)  # bins: number of histogram bins
# sns.distplot(tips['total_bill'], kde=False, bins=30)  # kde=False disables the kernel density estimate curve

#### Bivariate distribution with jointplot()
It can also be useful to visualize a bivariate distribution of two variables. The easiest way to do this in seaborn is to just use the jointplot() function, which creates a multi-panel figure that shows both the bivariate (or joint) relationship between two variables along with the univariate (or marginal) distribution of each on separate axes.

In [None]:
sns.jointplot(x='total_bill', y='tip', data=tips)

The ```kind``` argument selects the kind of plot to draw. It can take the following values: ```"scatter"``` (default), ```"kde"```, ```"hist"```, ```"hex"```, ```"reg"```, and ```"resid"``` (```"hist"``` requires seaborn 0.11 or later).

In [None]:
# Hexbin plot
sns.jointplot(x='total_bill', y='tip', data=tips, kind='hex')

#### Visualizing pairwise relationships in a dataset
To plot multiple pairwise bivariate distributions in a dataset, you can use the ```pairplot()``` function. It creates a matrix of axes showing the relationship between each pair of numeric columns in a DataFrame and, by default, draws the univariate distribution of each variable on the diagonal axes.

In [None]:
# pairplot of the numeric tips columns, colored by customer sex (hue='sex')
sns.pairplot(tips, hue='sex')

### Plotting categorical data
In the tips dataframe, we have some categorical variables (sex, smoker, day, time). In seaborn, there are several different ways to visualize a relationship involving categorical data. For a detailed description of all kinds of categorical plots available in seaborn, see: https://seaborn.pydata.org/tutorial/categorical.html#categorical-tutorial

#### Barplot and Countplot
In seaborn, the barplot() function operates on a full dataset and applies an estimator function to each category to obtain the bar height (the mean by default). When a category contains multiple observations, it also uses bootstrapping to compute a confidence interval around the estimate (95% by default) and draws it as an error bar.

In [None]:
sns.barplot(x='sex', y='total_bill', data=tips)
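The two quantities barplot() draws can be reproduced by hand: the bar height is the per-category mean, and the error bar is a bootstrap confidence interval. The sketch below uses a simple percentile bootstrap with 1000 resamples (seaborn's default `n_boot` is also 1000), but seaborn uses its own random state, so its interval will differ slightly. The data are synthetic and `bootstrap_ci` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for tips[['sex', 'total_bill']] (not the real data)
rng = np.random.default_rng(4)
n = 244
df = pd.DataFrame({
    "sex": rng.choice(["Male", "Female"], size=n),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=n),
})

# The bar heights are the per-category means (the default estimator)
means = df.groupby("sex")["total_bill"].mean()
print(means)

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval for the mean."""
    values = np.asarray(values)
    boot_rng = np.random.default_rng(seed)
    # Resample with replacement n_boot times and record the mean of each resample
    idx = boot_rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - level) / 2
    return np.percentile(boot_means, [tail, 100 - tail])

cis = {sex: bootstrap_ci(group) for sex, group in df.groupby("sex")["total_bill"]}
print(cis)

fig, ax = plt.subplots()
sns.barplot(x="sex", y="total_bill", data=df, ax=ax)
```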

countplot() is similar to barplot(), except that instead of aggregating a numeric variable it counts the number of observations in each category, so only a single categorical variable is required.

In [None]:
sns.countplot(x='sex', data=tips)

The plot above shows the number of male and female customers in the dataset.
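The bar heights in a countplot are simply the category frequencies, which pandas can compute directly with `value_counts()`. A small sketch on synthetic data (a stand-in for the `sex` column of `tips`) that checks the two agree:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for tips['sex'] (not the real data)
rng = np.random.default_rng(5)
df = pd.DataFrame({"sex": rng.choice(["Male", "Female"], size=244, p=[0.64, 0.36])})

# Frequency of each category, computed without plotting
counts = df["sex"].value_counts()
print(counts)

fig, ax = plt.subplots()
sns.countplot(x="sex", data=df, ax=ax)
```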

#### Boxplots
A boxplot shows the three quartile values of the distribution along with extreme values. The “whiskers” extend to points that lie within 1.5 IQRs (interquartile ranges, Q3 - Q1) of the lower and upper quartile, and observations that fall outside this range are displayed individually. This means that the whisker ends and the individual outlier points each correspond to an actual observation in the data.

In [None]:
sns.boxplot(x='day', y='total_bill', hue='sex', data=tips)
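The box-and-whisker rule described above can be computed directly with NumPy. This sketch assumes the default 1.5 × IQR whisker length (seaborn's `whis=1.5`, inherited from matplotlib) and linearly interpolated percentiles (NumPy's default); synthetic data stand in for one group of `total_bill`.

```python
import numpy as np

# Synthetic stand-in for one group of tips['total_bill'] (not the real data)
rng = np.random.default_rng(6)
values = rng.gamma(shape=4.0, scale=5.0, size=244)

# The box: lower quartile, median, upper quartile
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations that still lie inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything beyond the fences is drawn as an individual outlier point
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(q1, median, q3, whisker_low, whisker_high, len(outliers))
```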

### Matrix plots
A matrix plot displays matrix-shaped data as a color-coded grid: rows and columns are labeled by variables or categories, and the color of each cell encodes its value. We can create a matrix plot in seaborn using the heatmap() function.

The heatmap() function takes the matrix data we want to plot, typically a two-dimensional DataFrame whose index and columns label the rows and columns of the grid.
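Data in "long" form, with one row per observation, usually has to be reshaped into a matrix first. A sketch using pandas `pivot_table` to build a day × time matrix of mean tips and pass it to heatmap(); the data are a synthetic stand-in for the `day`, `time`, and `tip` columns of `tips`.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for tips[['day', 'time', 'tip']] (not the real data)
rng = np.random.default_rng(7)
n = 244
df = pd.DataFrame({
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], size=n),
    "time": rng.choice(["Lunch", "Dinner"], size=n),
    "tip": rng.gamma(shape=3.0, scale=1.0, size=n),
})

# Reshape long data into a matrix: rows = day, columns = time, cells = mean tip
matrix = df.pivot_table(index="day", columns="time", values="tip", aggfunc="mean")

fig, ax = plt.subplots()
sns.heatmap(matrix, annot=True, fmt=".2f", ax=ax)
```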

In [None]:
tips.head()

In [None]:
# correlation matrix of the numeric columns of tips (numeric_only requires pandas >= 1.5)
tips_corr = tips.corr(numeric_only=True)
tips_corr

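The df.corr() method computes pairwise correlations (Pearson by default) between the numeric columns of a DataFrame. Passing numeric_only=True skips the categorical columns (sex, smoker, day, time), which would otherwise raise an error in pandas 2.0 and later.
We will plot a heatmap to visualize the correlation between the different attributes of the dataset.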

In [None]:
sns.heatmap(tips_corr, annot=True)
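Because correlations always lie in [-1, 1], a diverging colormap pinned to that range and centered on 0 usually makes a correlation heatmap easier to read: the sign shows up as hue and the strength as color intensity, and colors stay comparable across datasets. A sketch using heatmap()'s `vmin`, `vmax`, `center`, and `cmap` parameters on a synthetic stand-in for `tips` (`"coolwarm"` is just one reasonable choice of diverging colormap):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in with the same column layout as tips (not the real data)
rng = np.random.default_rng(8)
n = 244
total_bill = rng.gamma(shape=4.0, scale=5.0, size=n)
df = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(0, 1, size=n),
    "size": rng.integers(1, 7, size=n),
    "sex": rng.choice(["Male", "Female"], size=n),  # non-numeric, skipped by corr below
})

corr = df.corr(numeric_only=True)  # requires pandas >= 1.5

fig, ax = plt.subplots()
# Fix the color scale to [-1, 1] and center it on 0 so sign and strength are comparable
sns.heatmap(corr, annot=True, fmt=".2f", vmin=-1, vmax=1, center=0, cmap="coolwarm", ax=ax)
```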

The correlation between two variables measures the statistical relationship between them; the Pearson coefficient computed by corr() measures the strength of their linear relationship. Its value lies between -1 and +1. A positive correlation means both variables tend to move in the same direction, while a negative correlation means that when one variable increases, the other tends to decrease. Values near 0 indicate little or no linear relationship.
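The Pearson coefficient is the covariance of the two variables divided by the product of their standard deviations. A short NumPy sketch of that formula (`pearson_r` is a hypothetical helper, not a seaborn or pandas function), checked on two perfectly linear relationships:

```python
import numpy as np

def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Deviations from the mean of each variable
    dx = x - x.mean()
    dy = y - y.mean()
    # Covariance over the product of standard deviations (the 1/n factors cancel)
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

x = np.array([1, 2, 3, 4, 5])
print(pearson_r(x, 2 * x + 1))  # 1.0: perfect positive linear relationship
print(pearson_r(x, -3 * x))     # -1.0: perfect negative linear relationship
```

For real data, `np.corrcoef(x, y)[0, 1]` or pandas' `corr()` give the same value.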

### Seaborn Exercises
For this exercise we will use the famous Titanic dataset. The version bundled with seaborn contains data for 891 of the real Titanic passengers. Each row represents one person, and the columns describe different attributes of the person, including whether they survived.

For now we will use the dataset for creating useful visualizations.

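You don't need to download the dataset manually: ```sns.load_dataset()``` fetches it from the seaborn-data repository (an internet connection is needed the first time) and caches it locally.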

In [None]:
# do not change the following lines of code.
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('whitegrid')
titanic = sns.load_dataset('titanic') # load Titanic dataset as a pandas dataframe

titanic.head() # check the first 5 rows of the dataframe.

### Exercise 1
Create a seaborn jointplot to represent the bivariate distribution of the ```fare``` and ```age``` columns of the titanic dataframe. Label the x-axis as 'fare' and the y-axis as 'age'.

### Exercise 2
Plot a univariate distribution of the ```fare``` column of the titanic dataframe using seaborn ```distplot()``` (or ```histplot()``` in seaborn 0.11 and later). Use 30 bins and set the color to red.

### Exercise 3
Plot the number of male and female passengers using seaborn ```countplot()```.

### Exercise 4
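Create a correlation matrix plot of the numeric attributes of the Titanic dataset using seaborn ```heatmap()``` and give it the title "Titanic correlation". (Hint: compute the correlation matrix with ```titanic.corr(numeric_only=True)```, since the categorical columns cannot be correlated.)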
