Seaborn is a library that builds on top of matplotlib.
- Integrated with pandas
- High level interface to plot data
- Additional functionality, better defaults
- Less code than matplotlib 
- All of the functionality is accessible at the top level.

Cons?

In [None]:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pydataset import data

Seaborn API Reference  
https://seaborn.pydata.org/api.html#

### Different types of plots in Seaborn

![seaborn_functions.png](attachment:seaborn_functions.png)

 - source: https://seaborn.pydata.org/tutorial/function_overview.html

In [None]:
# look for all datasets available in seaborn

print(sns.get_dataset_names())

In [None]:
# Option 1 : Load directly from sns dataset
tips = sns.load_dataset('tips')

In [None]:
# Option 2: Load using pydataset
tips = data('tips')

In [None]:
data('tips', show_doc = True)

In [None]:
tips.head()

In [None]:
tips.info()

#### Types of data:

- Continuous - Numeric data with possibly infinite resolution
    - Height
    - Weight
    - total_bill and tip
    - Temperature
- Categorical - Distinct categories
    - Weekdays
    - Gender
    - 'smoker'
    - Letter grades (A, B, C, ...)
- Discrete - Distinct numeric categories
    - party_size - numeric but discrete
    - Number of customer complaints
    - Number of flaws or defects
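
One rough way to apply these categories to a DataFrame is to look at each column's dtype. The sketch below uses a small made-up frame (not the real tips data) and a hypothetical helper, `kind_of`. The float/integer/other rule is only a heuristic, since an integer column can also encode categories.

```python
import pandas as pd

# Made-up rows for illustration only; column names mirror the tips dataset.
sample = pd.DataFrame({
    "total_bill": [12.50, 30.25, 18.00, 22.75],
    "tip": [2.00, 5.50, 3.00, 4.25],
    "smoker": ["No", "Yes", "No", "No"],
    "size": [2, 4, 3, 2],
})

def kind_of(series):
    """Heuristic: floats are continuous, integers discrete, anything else categorical."""
    if pd.api.types.is_float_dtype(series):
        return "continuous"
    if pd.api.types.is_integer_dtype(series):
        return "discrete"
    return "categorical"

# Use bracket access: sample.size would return the DataFrame's element count.
kinds = {col: kind_of(sample[col]) for col in sample.columns}
print(kinds)
```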

### Relational Plots 

In [None]:
# scatterplot in matplotlib



In [None]:
# Explore relationship between total_bill and tip using relplot



In [None]:
# Update defaults (rc params in matplotlib)


sns.set()  # Alias for sns.set_theme()
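
As a small illustration of what updating the defaults means, the sketch below calls `sns.set_theme()` with a style and an `rc` override and then reads the values back from matplotlib's `rcParams`. The chosen style (`whitegrid`) and figure size are arbitrary assumptions, not settings from this lesson. The change is global and affects every plot drawn afterwards.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Apply a seaborn style and override one matplotlib rc parameter.
sns.set_theme(style="whitegrid", rc={"figure.figsize": (8, 5)})

# The new defaults are now visible in matplotlib's global rcParams.
print(plt.rcParams["figure.figsize"], plt.rcParams["axes.grid"])
```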


In [None]:
# same plot above, but with different sns defaults (set using sns.set())



### hue, size and style arguments

In [None]:
# visualize if relationship is different for smoker vs non-smoker (use hue argument)

# sns.relplot(data = tips, x = 'total_bill', y = 'tip')

### Relplot with 'kind' argument

In [None]:
# lineplot for total bill vs tips - may not be appropriate in this case. Note use of 'kind' argument


# sns.relplot(data = tips, x = 'total_bill', y = 'tip')

In [None]:
# lineplot for discrete values. The lineplot draws a line through the mean value at each size.
# The shaded region is the 95% CI. Turn it off with 'ci = None' ('errorbar = None' in seaborn 0.12 and later).

# sns.relplot(data = tips, x = 'total_bill', y = 'tip')

https://en.wikipedia.org/wiki/Bootstrapping_(statistics)
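
To make the idea of a bootstrapped confidence interval concrete, here is a minimal NumPy sketch of a percentile bootstrap for a mean. The tip values are made up, and the number of resamples (1000) is an assumption for illustration. This shows the general technique, not seaborn's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up tip amounts for one party size (illustration only).
tips_for_size = np.array([2.0, 3.5, 3.0, 4.0, 2.5, 5.0, 3.2, 2.8])

# Resample with replacement many times and record the mean of each resample.
n_boot = 1000
boot_means = np.array([
    rng.choice(tips_for_size, size=len(tips_for_size), replace=True).mean()
    for _ in range(n_boot)
])

# The middle 95% of the resampled means gives the interval.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(ci_low, tips_for_size.mean(), ci_high)
```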

### Small multiples (https://en.wikipedia.org/wiki/Small_multiple)
- similar graphs or charts using the same scale and axes
- easy to compare


- Main idea: Pick a categorical feature and create a chart for each category
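
A minimal sketch of the idea, using a small made-up frame (`toy`, not the real tips data): passing a categorical column to `col` produces one panel per category, and the panels share their axis limits by default.

```python
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only; column names mirror the tips dataset.
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 15.0, 25.0, 35.0],
    "tip": [1.5, 3.0, 4.5, 2.0, 3.5, 5.0],
    "time": ["Lunch", "Lunch", "Lunch", "Dinner", "Dinner", "Dinner"],
})

# One scatterplot per value of 'time', laid out in columns.
grid = sns.relplot(data=toy, x="total_bill", y="tip", col="time")

# grid.axes is a 2D array of matplotlib Axes, one per panel.
print(grid.axes.shape)
```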

In [None]:
#  FacetGrid with 'col'

# sns.relplot(data = tips, x = 'total_bill', y = 'tip')

In [None]:
# FacetGrid with lineplot

# sns.relplot(data = tips, x = 'total_bill', y = 'tip')

#### Key takeaways?
- Tip amount generally increases with total_bill
- The waiter works evenings on Saturday and Sunday
- The waiter works lunchtime on Thursday


#### Ways to add DF columns to the chart (add new dimensions to the chart)

- hue
- col - creates subplots
- style


#### Using scatterplot or lineplot instead of relplot

In [None]:
# scatterplot - returns an axes-level object
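
The difference between the two kinds of function shows up in what they return. In this sketch, on a small made-up frame (`toy`, not the real tips data), `sns.scatterplot` draws onto a matplotlib Axes that we create ourselves and returns it, while `sns.relplot` creates its own figure and returns a `FacetGrid`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.5, 5.0],
})

# Axes-level: draws on the Axes we pass in and returns that same Axes.
fig, ax = plt.subplots()
returned = sns.scatterplot(data=toy, x="total_bill", y="tip", ax=ax)
ax.set_title("Axes-level scatterplot")

# Figure-level: manages its own figure and returns a FacetGrid.
grid = sns.relplot(data=toy, x="total_bill", y="tip")

print(type(returned).__name__, type(grid).__name__)
```

Axes-level functions are the convenient choice when the plot has to sit inside a layout built with `plt.subplots()`.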


### Distributions: displot

In [None]:
# displot for total_bill (hist, kde, rug)

# sns.displot(data = tips, x = 'total_bill')

In [None]:
# histogram with hue and stacking and palette


# sns.displot(data = tips, x = 'total_bill')

In [None]:
# use 'col' argument with 'sex' to create 'small multiples'

# sns.displot(data = tips, x = 'total_bill')

In [None]:
# histplot returns an axes-level object



In [None]:
# Set kind = 'kde'. A KDE (kernel density estimate) shows how densely observations are concentrated around each value of x.
# It is a smoothed version of the histogram: a non-parametric estimate of the probability density function (PDF).
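
To show what a non-parametric density estimate means, here is a hand-rolled Gaussian KDE in NumPy. The bill values and the bandwidth are made-up assumptions, and seaborn chooses its bandwidth differently, so treat this as a sketch of the idea only. Each observation contributes one small Gaussian bump, and the average of the bumps is the estimated density.

```python
import numpy as np

# Made-up bill amounts (illustration only).
bills = np.array([10.5, 12.0, 14.5, 15.0, 18.0, 21.0, 24.5, 30.0])

bandwidth = 3.0  # assumed smoothing width; larger values give a smoother curve

# Evaluate the density on a grid that extends well past the data.
grid_x = np.linspace(bills.min() - 4 * bandwidth, bills.max() + 4 * bandwidth, 400)

# Standardized distance from every grid point to every observation.
z = (grid_x[:, None] - bills[None, :]) / bandwidth

# Average of the Gaussian bumps centered on each observation.
density = np.exp(-0.5 * z**2).sum(axis=1) / (len(bills) * bandwidth * np.sqrt(2 * np.pi))

# A density should integrate to roughly 1 (simple Riemann sum).
area = (density * (grid_x[1] - grid_x[0])).sum()
print(round(area, 3))
```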



### Categorical Plots

#### Boxplots

In [None]:
# we can make a boxplot with the kind = 'box' argument. Returns a figure-level object

sns.catplot(data = tips, y = 'tip', kind = 'box')

In [None]:
# descriptive statistics for tip

tips.tip.describe()
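
The quartiles reported by `.describe()` are the same numbers a boxplot draws. The sketch below, on a made-up Series (not the real tip column), computes the interquartile range and the 1.5 x IQR fences that seaborn's default boxplot uses to decide which points are drawn as outliers.

```python
import pandas as pd

# Made-up tip amounts with one unusually large value (illustration only).
tip = pd.Series([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 10.0], name="tip")

stats = tip.describe()

# The box spans the 25th to 75th percentile; its height is the IQR.
iqr = stats["75%"] - stats["25%"]

# Points beyond 1.5 * IQR from the box are plotted individually.
lower_fence = stats["25%"] - 1.5 * iqr
upper_fence = stats["75%"] + 1.5 * iqr

outliers = tip[(tip < lower_fence) | (tip > upper_fence)]
print(outliers.tolist())
```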

In [None]:
# Create a figure with 4 boxplots



In [None]:
# swarmplot 



In [None]:
# violin plot


In [None]:
# bar plot of 'tip' by gender. Shows the mean for each category, with a CI



#### Pairplot

In [None]:
tips.dtypes

In [None]:
# pairplot for whole dataframe

sns.pairplot(tips)

In [None]:
# argument corner = True will not render duplicate plots

sns.pairplot(tips, corner = True)

In [None]:
# use hue argument to visualize relationships based on different categories



In [None]:
# use different plot type. 'reg' plot instead of 'scatter'



In [None]:
# we can limit the number of variables to plot using the vars argument



#### Jointplot

In [None]:
# joint plot total_bill vs tip

# sns.jointplot(data = tips, x = 'total_bill', y = 'tip')

In [None]:
# jointplot with regression line


#### Heatmap

In [None]:
#crosstab of time vs smoker
ctab = pd.crosstab(tips.time, tips.smoker)
ctab

In [None]:
#create a heatmap



Seaborn color palettes: https://seaborn.pydata.org/tutorial/color_palettes.html  
Check out this great post too: https://medium.com/@morganjonesartist/color-guide-to-seaborn-palettes-da849406d44f

- Sequential: e.g. different shades of the same color. Appropriate when data range from relatively low or uninteresting values to relatively high or interesting values (or vice versa)
- Diverging: highlights both low and high values relative to a meaningful midpoint
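
As a sketch of how the palette choice plays out, the example below draws two heatmaps from a small made-up frame (`toy`, not the real tips data): a table of counts with a sequential colormap, and a correlation matrix with a diverging colormap whose limits are fixed at -1 and 1 so that 0 falls in the middle. The specific colormaps (`Blues`, `coolwarm`) are just one reasonable choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows for illustration only; column names mirror the tips dataset.
toy = pd.DataFrame({
    "time": ["Lunch", "Lunch", "Dinner", "Dinner", "Dinner", "Lunch"],
    "smoker": ["Yes", "No", "No", "Yes", "No", "No"],
    "total_bill": [12.0, 18.5, 25.0, 31.0, 22.0, 15.5],
    "tip": [2.0, 2.5, 4.0, 3.0, 3.5, 2.0],
})

counts = pd.crosstab(toy["time"], toy["smoker"])  # counts run from low to high
corr = toy[["total_bill", "tip"]].corr()          # correlations range from -1 to 1

fig, (ax_counts, ax_corr) = plt.subplots(1, 2, figsize=(9, 4))

# Sequential palette for counts; annot writes each value into its cell.
sns.heatmap(counts, annot=True, fmt="d", cmap="Blues", ax=ax_counts)

# Diverging palette for correlations, symmetric around zero.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax_corr)
```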

In [None]:
# Heatmap using mpg dataset

mpg = data('mpg')
mpg.head()

In [None]:
# calculate correlation for numeric variables using pandas' .corr() method
# numeric_only = True skips the text columns (required in pandas 2.0 and later)

mpg.corr(numeric_only = True)

In [None]:
labels = ['Displacement', 'Model Year', 'Cylinders', 'City MPG', 'Highway MPG']

In [None]:
# heatmap for correlation table above

