# Seaborn

Seaborn is a Python data visualization library based on matplotlib.

In [1]:
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

What would you like to show?

- **Trend:** A trend is defined as a pattern of change.
    - Line plot (sns.lineplot),
        * show how a value changes over time or along another ordered index.
- **Relationship:** between variables in your data.
    - Bar plot (sns.barplot),
        * comparing quantities corresponding to different groups.
    - Heat map (sns.heatmap),
    - Scatter plot (sns.scatterplot),
        * show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.
    - Categorical scatter plot (sns.swarmplot),
        * Categorical scatter plots show the relationship between a continuous variable and a categorical variable.
    - Scatter plot with a regression line (sns.regplot),
        * including a regression line in the scatter plot makes it easier to see any linear relationship between two variables.
    - Scatter plot with multiple regression lines (sns.lmplot),
        * draws one regression line per group when the scatter plot contains multiple color-coded groups.
- **Distribution:**
    - Histogram (sns.histplot),
        * show the distribution of a single numerical variable.
    - Kernel density estimate plot (sns.kdeplot),
        * KDE plots (or 2D KDE plots) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).
    - Joint plot (sns.jointplot),
        * simultaneously displays a 2D KDE plot together with the KDE plot of each individual variable.

### Line plot

In [None]:
# plot whole data as a function of index
sns.lineplot(data=data, label="label")
# to plot a slice of data, use data["column_name"]
sns.lineplot(data=data["column_name"], label="label")

### Bar plot

In [None]:
sns.barplot(x=data.index, y=data['column_name'])
# x determines what to use on the horizontal axis
# y determines the height of each bar

### Heatmap

In [None]:
sns.heatmap(data=data, annot=True)
# annot=True writes the value of each cell on the chart.
# Leaving it out removes the numbers from the cells.
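To see the effect of `annot=True` on concrete numbers, here is a minimal sketch with a tiny made-up table (the row and column labels are hypothetical). The `fmt` argument, which is not used in the cell above, controls how the annotated numbers are formatted.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical 2x3 table of values
data = pd.DataFrame(
    [[1.2, 3.4, 2.1], [4.5, 0.8, 3.3]],
    index=["row_a", "row_b"],
    columns=["col_x", "col_y", "col_z"],
)

fig, ax = plt.subplots(figsize=(5, 3))
# annot=True writes each value into its cell; fmt=".1f" shows one decimal place
sns.heatmap(data=data, annot=True, fmt=".1f", ax=ax)
ax.set_title("Annotated heatmap")
```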

### Scatter plot

Scatter plots are usually used to highlight the relationship between two continuous variables. To create a simple scatter plot, specify two arguments:

x: the horizontal axis,

y: the vertical axis

In [None]:
sns.scatterplot(x=data['column_name_1'], y=data['column_name_2'])

We can add a **regression line**, the line that best fits the data, with **sns.regplot**.

In [None]:
sns.regplot(x=data['column_name_1'], y=data['column_name_2'])

However, scatter plots can be used to display the relationships between not only two, but also three variables! One way of doing this is by color-coding the points.

hue: the variable used to color the points

In [None]:
sns.scatterplot(x=data['column_name_1'], 
                y=data['column_name_2'], 
                hue=data['column_name_3'])

We can even add **one regression line per hue group** (two lines for two colors) with **sns.lmplot**.

The sns.lmplot command works slightly differently from the other commands.

Instead of passing x=data['column_name_1'] to select the 'column_name_1' column in data, we pass only the column name, x='column_name_1', and likewise for y and hue. The dataset itself is passed with data=data.

In [None]:
sns.lmplot(x="column_name_1", y="column_name_2", 
           hue="column_name_3", data=data)
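One practical consequence of this difference is that sns.lmplot creates its own figure and returns a `FacetGrid` rather than a matplotlib `Axes`. The sketch below illustrates this with made-up data: two hypothetical groups, "a" and "b", are generated with different slopes so that the two regression lines are clearly distinct.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two groups with different linear trends plus noise
rng = np.random.default_rng(1)
n = 60
group = np.repeat(["a", "b"], n // 2)
x = rng.uniform(0, 10, n)
slope = np.where(group == "a", 1.0, 2.5)
y = slope * x + rng.normal(0, 1, n)
data = pd.DataFrame(
    {"column_name_1": x, "column_name_2": y, "column_name_3": group}
)

# Column names are passed as strings; the DataFrame goes in data=
grid = sns.lmplot(
    x="column_name_1", y="column_name_2", hue="column_name_3", data=data
)

# The returned FacetGrid exposes the underlying figure and axes
grid.set_axis_labels("column_name_1", "column_name_2")
grid.ax.set_title("One regression line per group")
```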

### Categorical scatter plot

A scatter plot can also feature a categorical variable on one of the main axes; use **sns.swarmplot** for this.

In [None]:
sns.swarmplot(x=data['column_name_categorical'],
              y=data['column_name'])

### Histogram

A histogram shows the distribution of the values in data['column_name'].

In [None]:
sns.histplot(data['column_name'])

### Density plot
A kernel density estimate (KDE) plot is a smoothed version of a histogram, drawn with **sns.kdeplot**.

In [None]:
# fill=True colors the area under the curve (it replaces the deprecated shade=True)
sns.kdeplot(data=data['column_name'], fill=True)

### 2D KDE plot
A two-dimensional kernel density estimate plot can be drawn with **sns.jointplot** and kind="kde".

In [None]:
sns.jointplot(x=data['column_name_1'], y=data['column_name_2'], kind="kde")

### Color-coded plots

To create a color-coded histogram, add **hue**.

data: provides the name of the variable that we used to read in the data,

x: sets the name of column with the data we want to plot,

hue: sets the column we'll use to split the data into different histograms.


In [None]:
sns.histplot(data=data, x="column_name_1", hue="column_name_2")

For a KDE plot, we can likewise add **hue**, and set **fill=True** (the replacement for the deprecated shade=True) to color the area under each curve.

In [None]:
sns.kdeplot(data=data, x="column_name_1", hue="column_name_2", fill=True)

### Figure style


Seaborn has five built-in themes: "darkgrid", "whitegrid", "dark", "white", and "ticks". To switch themes, call **sns.set_style** with the theme name, for example **sns.set_style("dark")**.

