# Seaborn Visualization: Intermediate

Instructor: Chris Moffitt, Practical Business Python

* `matplotlib` provides the raw building blocks for Seaborn's visualizations
* Seaborn supports complex visualizations of data
* It is built on matplotlib and works best with pandas DataFrames
* Seaborn makes reasonable assumptions about colors and other visual elements, producing plots that look more pleasing than standard matplotlib plots
* Additionally, Seaborn can perform statistical estimation on the data as it plots (for example, KDE curves and regression fits)

#### Seaborn's `distplot`
* By default, draws a histogram overlaid with a Gaussian Kernel Density Estimate (KDE) curve
* The KDE curve is a smoothed version of the histogram's shape
    * `sns.distplot(df['fmr_2'])`
* **Customizing distribution plots:**
* To plot a simple histogram, disable the KDE and specify the number of bins
    * `sns.distplot(df['alcohol'], kde=False, bins=10)`
* **Rug plot**: draws a small tick on the axis for each observation ([doc here](https://en.wikipedia.org/wiki/Rug_plot))
    * A KDE curve and a rug plot can be combined
    * `sns.distplot(df['alcohol'], hist=False, rug=True)`
* The `distplot` function uses several underlying functions, including `kdeplot` and `rugplot`
* It is possible to further customize a plot by passing arguments to the underlying function
    * `sns.distplot(df['alcohol'], hist=False, rug=True, kde_kws={'shade':True})`
* `kws`: short for "keywords"; `kde_kws` is a dictionary of keyword arguments passed on to `kdeplot`

#### Regression plots
* Univariate analysis: looks at one variable
* **Regression Analysis** is bivariate (looks for relationships between two variables)
* **`regplot()`**
    * `regplot()` generates a scatter plot with a fitted regression line
    * Usage is similar to `distplot()`
    * Must define: `data`, `x`, `y`
        * Since we're using a pandas DataFrame, `x` and `y` refer to columns in the DataFrame
* **`lmplot()`**: builds on top of `regplot()`
    * while `regplot()` is "low level" (it draws on a single set of axes), `lmplot()` is "high level" (it manages the whole figure)
    * `lmplot()` is much more flexible
    * **Faceting:** plotting multiple graphs while changing a single variable
    * `lmplot()` faceting:
        * organize data by colors (`hue`)
        * organize data by columns (`col`) or rows (`row`)
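
To make the difference between the two functions concrete, here is a minimal sketch under stated assumptions: the DataFrame and its `income`, `rent`, and `region` columns are hypothetical, generated only for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: a noisy linear relationship observed in two regions
rng = np.random.default_rng(1)
n = 100
df = pd.DataFrame({
    "income": rng.uniform(20, 100, size=n),
    "region": rng.choice(["North", "South"], size=n),
})
df["rent"] = 0.3 * df["income"] + rng.normal(0, 3, size=n)

# regplot: one scatter plot with a fitted regression line on a single Axes
ax = sns.regplot(data=df, x="income", y="rent")

# lmplot: the same kind of fit, faceted into one column per region
# and colored by region; it returns a FacetGrid rather than an Axes
grid = sns.lmplot(data=df, x="income", y="rent", hue="region", col="region")
```

`regplot` returns a matplotlib `Axes`, while `lmplot` returns a `FacetGrid` that owns its figure, which is why only `lmplot` accepts `col` and `row`.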

#### Using Seaborn Styles
* Visualization's "aesthetics": layouts, labels, colors
* **`sns.set()`** applies Seaborn's default style to every plot drawn afterward, whether it is made with pandas, matplotlib, or Seaborn
* Seaborn has several default configurations that can be set with **`sns.set_style`**
    * These styles can override matplotlib and pandas styles as well
    * built-in styles (5): `white`, `whitegrid`, `dark`, `darkgrid`, `ticks`
* **In general, visualizations are more impactful when excess "chart junk" is removed**
    * Common use case: remove the lines along the axes, called `spines`, with **`despine`**
    * The default is to remove the top and right spines, but you can pass arguments specifying others
    * `sns.despine(left=True)`
* **`plt.clf()`** to clear a figure

#### Colors in Seaborn 
* Color is an extremely important component of creating effective visualizations
* Color blindness affects around 1 in 12 men (about 8%) but only around 1 in 200 women (about 0.5%).
* Using color palettes that are colorblind-friendly can be very important
* Seaborn has several functions for creating, viewing, and configuring color palettes
* Because Seaborn is built on top of Matplotlib, it is able to interpret and apply Matplotlib color codes
    * To use matplotlib color codes, use:
    * `sns.set(color_codes=True)`
    * `sns.distplot(df['Tuition'], color='g')`
* To assign a specific palette: `sns.set_palette()`
    * Cycle through the available palettes with:
    
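```
for p in sns.palettes.SEABORN_PALETTES:
    sns.set_palette(p)            # make this palette the active one
    sns.distplot(df['Tuition'])
    plt.show()                    # display the plot for this palette
    plt.clf()                     # clear the figure before the next one
```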
* Seaborn has 6 default palettes:
    * deep
    * muted
    * pastel
    * bright
    * dark
    * colorblind
* **Displaying palettes:**
    * `sns.palplot()` function displays a palette
    * `sns.color_palette()` returns the current palette

```
for p in sns.palettes.SEABORN_PALETTES:
    sns.set_palette(p)
    sns.palplot(sns.color_palette())
    plt.show()
```
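
For a version that runs without the course's DataFrame, the following sketch loops over the six default palette names only. The `default_names` list and `palette_sizes` dictionary are helper names introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

default_names = ["deep", "muted", "pastel", "bright", "dark", "colorblind"]

# Remap matplotlib's single-letter codes ('b', 'g', 'r', ...) to Seaborn colors
sns.set(color_codes=True)

palette_sizes = {}
for name in default_names:
    sns.set_palette(name)            # make this the active palette
    current = sns.color_palette()    # read the active palette back
    palette_sizes[name] = len(current)
    sns.palplot(current)             # draw the palette as a row of swatches
    plt.title(name)                  # call plt.show() here when interactive
```

After the loop, `colorblind` is the active palette because it was set last.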
* There are three main types of color palettes:
    * **Circular color palettes:** used for categorical data that is not ordered
        * Example: `Paired`
    * **Sequential color palettes:** useful for when the data has a consistent range from high to low
        * Example: `Blues`
    * **Diverging color palettes:** for when both the low and high values are interesting
        * Example: `BrBG`
    * To display any of these palettes with 12 colors: `sns.palplot(sns.color_palette("Paired", 12))`
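
As a small sketch of the three palette types, the block below samples one example of each at 12 colors. The `examples` and `palettes` dictionaries are names chosen for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import seaborn as sns

# One example palette per type, taken from the list above
examples = {
    "circular": "Paired",
    "sequential": "Blues",
    "diverging": "BrBG",
}

# Sample each palette at 12 colors; each result is a list of RGB tuples
palettes = {kind: sns.color_palette(name, 12) for kind, name in examples.items()}

for kind, colors in palettes.items():
    sns.palplot(colors)  # one row of 12 swatches per palette
```

In the `Blues` swatches the colors run from light to dark, which is what makes a sequential palette suitable for data that ranges from low to high.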