# Introduction

Seaborn is a Python data visualization library based on matplotlib, providing a high-level interface for drawing attractive and informative statistical graphics.

Seaborn needs less code for common plots and ships with attractive default themes, while Matplotlib is more customizable because you work directly with its classes.

Which package to use therefore depends on your purpose.
See the following page for a comparison:
https://analyticsindiamag.com/comparing-python-data-visualization-tools-matplotlib-vs-seaborn/

Contents:
1. distplot
2. lineplot
3. How to set title & graph size in seaborn 
4. heatmap
5. countplot
6. stripplot
7. swarmplot
8. relplot
9. jointplot


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [None]:
train = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')
test = pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')

In [None]:
#check that every column of test also appears in train
test.columns.isin(train.columns).all()
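
The check above only confirms that every column of `test` also exists in `train`; it does not show which columns differ. A minimal sketch of how the difference could be listed, using two small made-up frames (`train_demo` and `test_demo` are hypothetical stand-ins, not the Kaggle files):

```python
import pandas as pd

# Toy stand-ins for train/test (made-up values, not the Kaggle data)
train_demo = pd.DataFrame({"Id": [1, 2], "LotArea": [8450, 9600], "SalePrice": [208500, 181500]})
test_demo = pd.DataFrame({"Id": [3, 4], "LotArea": [11622, 14267]})

# True when every test column is also present in train
all_test_cols_in_train = bool(test_demo.columns.isin(train_demo.columns).all())

# Columns that exist only in train (here, the target column)
only_in_train = train_demo.columns.difference(test_demo.columns).tolist()
print(all_test_cols_in_train, only_in_train)
```

On the real data the same `difference()` call should single out the target column, assuming the two files otherwise share their features.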

In [None]:
train

In [None]:
#check basic statistics
#SalePrice is used as the example below
train['SalePrice'].describe()

# distplot
- combines the matplotlib histogram with the seaborn kdeplot() and rugplot()
- kde: True by default; draws the Gaussian kernel density estimate (blue line)
- fit: pass a distribution such as scipy.stats.norm to overlay its fitted curve and see how far the data deviate from a normal distribution
- note: distplot() is deprecated since seaborn 0.11; histplot() and displot() are its replacements

In [None]:
from scipy.stats import norm

In [None]:
sns.distplot(train['SalePrice'],fit=norm)

Three characteristics can be seen:
- the distribution is not normal
- it is right-skewed (positive skew)
- the peak lies at low sale prices relative to the maximum sale price
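
The visual impression of right skew can also be checked numerically. A small sketch with synthetic log-normal data (an assumption, not the real `SalePrice` column): a positive `skew()` and a mean above the median both point to a longer right tail.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed sample (assumption: stands in for train['SalePrice'])
rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(mean=12, sigma=0.4, size=1000))

skewness = prices.skew()                              # > 0 means a longer right tail
mean_above_median = prices.mean() > prices.median()   # typical for right-skewed data
print(f"skew={skewness:.2f}, mean > median: {mean_above_median}")
```

The same two lines can be run on `train['SalePrice']` to back up the reading of the plot.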


# lineplot
see also: https://seaborn.pydata.org/generated/seaborn.lineplot.html

In [None]:
trasns_per_yr = train.groupby("YrSold").Id.count().reset_index().rename(columns={"Id":"transactions no"})
trasns_per_yr['YrSold'] = trasns_per_yr['YrSold'].astype(str)
sns.lineplot(data=trasns_per_yr, x="YrSold", y="transactions no")

# Set title & graph size

In [None]:
#Method 1
#assign fig,ax to adjust the graph size
#set title by ax.set_title()

fig,ax = plt.subplots(figsize = (7,5))
sns.lineplot(data=trasns_per_yr, x="YrSold", y="transactions no", ax=ax)
ax.set_title("Number of transactions from 2006 to 2010")

- the font size and weight of the title can be tuned with fontsize= and fontweight=

In [None]:
#Method 2
#adjust the graph size by plt.figure()
#set title by plt.title()

plt.figure(figsize=(7,5))
sns.lineplot(data=trasns_per_yr, x="YrSold", y="transactions no")
plt.title("Number of transactions from 2006 to 2010", fontsize=14, fontweight='bold')

# heatmap
- plot rectangular data as a color-encoded matrix.

see also: https://seaborn.pydata.org/generated/seaborn.heatmap.html

In [None]:
trasns_per_yr_month = train.groupby(["YrSold","MoSold"]).Id.count().reset_index().rename(columns={"Id":"transactions no"})

In [None]:
#index becomes the y-axis (rows), columns becomes the x-axis
#keyword arguments are required in pandas 2.0 and later
trasns_per_yr_month = trasns_per_yr_month.pivot(index="YrSold", columns="MoSold", values="transactions no")

In [None]:
sns.heatmap(trasns_per_yr_month)
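
The groupby and pivot steps above can probably be collapsed into a single call with `pd.crosstab`, which counts rows for each combination of two columns. A small sketch on made-up data (`sales` is a hypothetical frame, not the Kaggle file); note that combinations that never occur come out as 0 rather than NaN.

```python
import pandas as pd

# Made-up sales records (illustrative only)
sales = pd.DataFrame({
    "YrSold": [2006, 2006, 2006, 2007, 2007],
    "MoSold": [1, 1, 2, 2, 3],
})

# Rows = year, columns = month, values = number of records
counts = pd.crosstab(sales["YrSold"], sales["MoSold"])
print(counts)
```

The resulting table has the same year-by-month layout and can be passed to `sns.heatmap()` in the same way.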

In [None]:
#annotate each cell with its count & make the cells square
sns.heatmap(trasns_per_yr_month, annot=True, square=True)

In [None]:
#change the colour
sns.heatmap(trasns_per_yr_month, annot=True, square=True, cmap='Reds')

The example above uses the red colormap. Many other colormap names can be passed to cmap; some examples:

- YlGnBu: yellow (lowest) to blue (highest) gradient
- Blues: blue gradient
- BuPu: blue (lowest) to purple (highest) gradient
- Greens: green gradient
- YlOrRd: yellow (lowest) to red (highest) gradient
- rainbow: rainbow colours, purple (lowest) to red (highest)
- copper: darkest (lowest) to lightest (highest)
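
To find out which colormap names are available in a given installation, matplotlib's registry can be queried. A short sketch: `plt.colormaps()` returns the registered names, and each built-in map also has a reversed variant with an `_r` suffix (for example `Reds_r`).

```python
import matplotlib.pyplot as plt

names = plt.colormaps()   # list of all registered colormap names

# The names mentioned in this section
wanted = ["YlGnBu", "Blues", "BuPu", "Greens", "YlOrRd", "rainbow", "copper"]
missing = [n for n in wanted if n not in names]

# The "_r" suffix gives the reversed version of a colormap
has_reversed = "Reds_r" in names
print(len(names), missing, has_reversed)
```

Any of these names, including the `_r` variants, can be passed as `cmap=` to `sns.heatmap()`.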

# countplot
- Show the counts of observations in each categorical bin using bars.

see also: https://seaborn.pydata.org/generated/seaborn.countplot.html#seaborn.countplot

In [None]:
sns.countplot(x='MSSubClass', data=train)

In [None]:
#crosscheck
train.groupby('MSSubClass').Id.count()
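
`value_counts()` is another way to run the same crosscheck, and its index can be reused to sort the bars by frequency through the `order` parameter of `countplot()`. A sketch on a tiny made-up column (the values in `demo` are assumptions):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Tiny made-up sample of building classes
demo = pd.DataFrame({"MSSubClass": [20, 60, 20, 50, 20, 60]})

# Counts per category, sorted from most to least frequent
counts = demo["MSSubClass"].value_counts()

# Draw the bars in that same order
ax = sns.countplot(x="MSSubClass", data=demo, order=counts.index)
```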

# stripplot
- Draw a scatterplot where one variable is categorical.

A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.

see also: https://seaborn.pydata.org/generated/seaborn.stripplot.html

In [None]:
sns.stripplot(data=train, x="SalePrice", y="MSZoning")
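
As mentioned above, a strip plot works well on top of a box plot. A minimal sketch with synthetic data (the `demo` frame is an assumption that reuses the column names of the dataset): the box plot is drawn first, then the individual points are overlaid on the same Axes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data reusing the dataset's column names (values are made up)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "MSZoning": rng.choice(["RL", "RM", "FV"], size=150),
    "SalePrice": rng.lognormal(mean=12, sigma=0.3, size=150),
})

fig, ax = plt.subplots(figsize=(7, 4))
# Summary of each distribution; outlier markers are hidden because
# the strip plot already shows every observation
sns.boxplot(data=demo, x="SalePrice", y="MSZoning", color="lightgray",
            showfliers=False, ax=ax)
# Individual observations on top
sns.stripplot(data=demo, x="SalePrice", y="MSZoning", color="black",
              size=3, alpha=0.5, ax=ax)
```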

# swarmplot
- similar to stripplot(), but the points are adjusted (only along the categorical axis) so that they don’t overlap

see also: https://seaborn.pydata.org/generated/seaborn.swarmplot.html

In [None]:
plt.figure(figsize=(20,10))
ax = sns.swarmplot(data=train, x="MasVnrArea", y="MasVnrType")  # swarmplot returns an Axes, not a Figure

# relplot

- show the relationship between two variables
- similar to scatterplot()

see also: https://seaborn.pydata.org/generated/seaborn.relplot.html

In [None]:
#Method 3 for adjusting graph size
#set "height" & "aspect"
#only works for figure-level functions such as relplot, displot and catplot

sns.relplot(data=train, x="SalePrice", y="LotArea",
            height=5, # height of the facet in inches
            aspect=3) # width = 3 times height
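
A quick way to confirm how `height` and `aspect` translate into the figure size is to read the size back from the returned grid. This sketch uses random made-up data and assumes a seaborn version in which the grid exposes a `figure` attribute (older releases used `fig`).

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Random made-up data reusing the dataset's column names
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "SalePrice": rng.uniform(5e4, 5e5, 100),
    "LotArea": rng.uniform(2e3, 2e4, 100),
})

g = sns.relplot(data=demo, x="SalePrice", y="LotArea", height=5, aspect=3)

# Figure size in inches: width should be aspect * height for a single facet
width, height = g.figure.get_size_inches()
print(width, height)
```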

# jointplot

- plots two variables together, with the bivariate graph in the centre and the univariate graphs on the margins
- in the graphs below, "OverallQual" is more strongly associated with the sale price than "OverallCond"

see also: http://seaborn.pydata.org/generated/seaborn.jointplot.html

In [None]:
#OverallCond: Rates the overall condition of the house
sns.jointplot(data=train, x="SalePrice", y="OverallCond", kind="hex")

In [None]:
#OverallQual: Rates the overall material and finish of the house
sns.jointplot(data=train, x="SalePrice", y="OverallQual", kind="hex")
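
The visual comparison can be backed up with a rank correlation. The sketch below only illustrates the technique: the `demo` frame is synthetic and deliberately built so that quality drives the price while condition does not, so its output says nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data, constructed so that quality (not condition) drives price
rng = np.random.default_rng(0)
n = 500
qual = rng.integers(1, 11, size=n)   # ratings 1..10
cond = rng.integers(1, 10, size=n)   # ratings 1..9, independent of price
price = 20000 * qual + rng.normal(0, 30000, size=n)
demo = pd.DataFrame({"SalePrice": price, "OverallQual": qual, "OverallCond": cond})

# Spearman rank correlation suits ordinal ratings
corr = demo.corr(method="spearman")["SalePrice"].drop("SalePrice")
print(corr)
```

Running the last two lines on `train[["SalePrice", "OverallQual", "OverallCond"]]` would give the corresponding numbers for the real data.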