<img src="support_files/images/cropped-SummerWorkshop_Header.png">  

<div style="padding: 5px; padding-left: 10px;">
    
<center><h2>Table of Contents & Jumplinks</h2></center>

<h3>Effective Visualizations & Accessibility</h3>

* <a href='#foundation'>proper foundation</a>
 
<h3>Coding Package Introductions</h3>

* <a href='#seaborn'>Seaborn background</a>
* <a href='#imports'>package imports</a>
* <a href='#load_dataset'>load sample dataset</a>
* <a href='#matplotlib_basics'>matplotlib coding basics</a>    
    
<h3>Exploratory and Basic Data Visualizations</h3>

* <a href='#bar'>Bar plots</a>
* <a href='#box'>Box plots</a>
* <a href='#violin'>Violin plots</a>
* <a href='#scatter'>Scatter plots</a>
* <a href='#distribution'>Distribution plots</a>
* <a href='#timeseries'>Line/timeseries plots</a>  
* <a href='#multiplots'>multiple plots per figure</a>
* <a href='#heat'>Heatmaps & Correlation Heatmaps</a>

<a id='foundation'></a>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<center><h1>Foundations of Data Visualizations</h1></center>
    
<img src="support_files/images/data_viz/Gia_Gunn_This_is_a_place_for_legends_okay.gif" width='600'/>     
    
The following is true for any data visualization, and is especially important when creating accessible visualizations. A concise and coherent concept ensures users of all abilities can engage with your visualization. Put the numbers aside for a moment and ask yourself these questions:

**What story does the data tell?** 
* Make the relationships or features you want to highlight stand out!
* Provide context, but eliminate clutter and irrelevant data.
* For complex stories, break the story into smaller, simpler chunks.

**How should the reader benefit from the visualization?** Consider how your graph will help the user understand insights from the data. Like in the previous example, layer in context to pull those learnings to the surface.

**Can you understand it in 5 seconds?** If the visual adds more complexity than your written word would otherwise, think about whether you need it in the first place.


<h3>Visualization Evaluation Checklist</h3>

**Legibility**

* Is it immediately understandable?  If not, is it **understandable after a short period of study**?
* Does it **provide insight or understanding** better than some alternative visualization would?  Or does it require excessive cognitive effort?
* Does it provide insight or understanding that was not obtainable with the original representation (text, table, etc)?
* Is the design visually appealing/aesthetically pleasing?

**Context**
* Does the visualization reveal trends, patterns, gaps, and/or outliers? Can the viewer **make effective comparisons**?
* Does the visualization successfully **highlight important information, while providing context for that information**?
* What kind of visualization might have been better?
* Is it memorable?


**Integrity**
* Does it use visual components properly? That is, does it **properly represent the data using lines, color, position**, etc? 
* Does it **use labels and legends** appropriately?
* Does it transform nominal, ordinal, and quantitative information properly?
* Does it distort the information?  If it transforms it in some way, is this misleading or helpfully simplifying?
* Does it omit important information?    



<a id='seaborn'></a>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<center><h1>Introduction to Seaborn</h1></center>

    
<img src="support_files/images/data_viz/seaborn_logo.png" width='600'/> 
<p>
<b>Seaborn</b> is a library for making statistical graphics in Python. It builds on top of <code>matplotlib</code> and integrates closely with <code>pandas</code> data structures.
<p>    
Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them.
<p>
<h3>Pros:</h3>
<ul>
<li>Simple declarative API is easy to get started with
<li>quick and easy to update plots
<li>handles data aggregation by default
<li><a href=" https://seaborn.pydata.org/api.html">Well documented API</a> 
<li>Works with Pandas dataframes and tabular data, lists & arrays.
<li>Comes with many prepackaged Python distros (anaconda, WinPython, etc.).
<li>Easily saves plots to image (.png, .bmp, etc.) and vector (.svg, .pdf, etc.) formats.
<li>Has an excellent set of examples (with code) within the API and  <a href="https://seaborn.pydata.org/examples/index.html">seaborn gallery</a>
</ul>

<p>
<h3>Cons:</h3>
<ul>
<li>Opinionated 
<li>Not as flexible as matplotlib
<li> "hidden" statistical analysis: some statistics are done behind the scenes so you have less control or visibility into what is being done.
</ul>
</div>

<a id='imports'></a>

In [None]:
import os
import numpy as np
import pandas as pd


# main plotting packages
import seaborn as sns
import matplotlib.pyplot as plt

<a id='load_dataset'></a>

In [None]:
# load the data
csv_path = os.path.join('support_files', 'datasets', 'cars_dataset.csv')
cars = pd.read_csv(csv_path)

# view the data
cars

<a id='matplotlib_basics'></a>

<div style="padding: 5px; padding-left: 10px; background: #e6f2ff">
<h3> Matplotlib Code Basics </h3>
    
* create a figure: <code>fig, ax = plt.subplots()</code>
    * Adjust figure size: <code>fig, ax = plt.subplots(figsize = (horizontal, vertical))</code>
    * Multiple plots per figure: <code>fig, ax = plt.subplots(n_rows, n_columns)</code>
        * Note!: we will cover this more in-depth later!

* add title: <code>ax.set_title('plot title')</code>
* set x label: <code>ax.set_xlabel('x label')</code>
* set y label: <code>ax.set_ylabel('y label')</code>
    
NOTE: Because Seaborn is essentially a wrapper for matplotlib, we will use many matplotlib conventions and functions on almost all of our plots, regardless of whether they are seaborn or matplotlib plots.
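
The calls listed above can be combined into one short sketch. The sine-wave data here is hypothetical and only gives the axis something to draw; the figure size is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data, used only so there is something to draw
x = np.linspace(0, 10, 50)
y = np.sin(x)

# Create a figure 8 inches wide and 4 inches tall with a single axis
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y)

# Title and axis labels are set on the axis object, not on the figure
ax.set_title("plot title")
ax.set_xlabel("x label")
ax.set_ylabel("y label")
```

The same <code>ax.set_*</code> calls work on an axis that a seaborn function has drawn on.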

<a id='bar'></a>

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

<h3>Bar Plots</h3>

Seaborn has two options for barplots:

**[.countplot](https://seaborn.pydata.org/generated/seaborn.countplot.html):**
plots the frequency of categorical variables

* Basic countplot: <code>sns.countplot(data = df, x = 'column')</code>


**[.barplot](https://seaborn.pydata.org/generated/seaborn.barplot.html#seaborn.barplot):**
Represents an estimate of central tendency for a numeric variable with the height of each rectangle and provides some indication of the uncertainty around that estimate using error bars. Seaborn handles aggregation automatically, and uses mean as the default aggregator.
* the <code>estimator</code> argument can be changed to a different measure of central tendency, such as the median
* Basic barplot: <code>sns.barplot(data = df, x = 'column', y = 'column')</code>
    
**Helpful Parameters:**
* <code>x, y</code>: Variables to plot. Putting the categorical variable on x gives vertical bars; putting it on y gives horizontal bars.
* <code>order</code>: Order to plot the categorical levels in, otherwise the levels are inferred from the data objects.
* <code>color</code>: Color for all of the elements, or seed for a gradient palette.
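
As a minimal sketch of the <code>estimator</code>, <code>order</code> and <code>color</code> parameters, the block below plots the median instead of the mean. The <code>toy_cars</code> dataframe is a hypothetical stand-in for the workshop dataset, and the category order and color are arbitrary choices.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the workshop's cars dataframe
toy_cars = pd.DataFrame({
    "Origin": ["US", "US", "US", "Japan", "Japan", "Europe", "Europe"],
    "MPG": [15.0, 18.0, 30.0, 31.0, 35.0, 24.0, 28.0],
})

fig, ax = plt.subplots()
sns.barplot(data=toy_cars,
            x="Origin", y="MPG",
            estimator=np.median,               # median instead of the default mean
            order=["Japan", "Europe", "US"],   # explicit order of the categories
            color="steelblue",                 # one color for every bar
            ax=ax)
ax.set_title("Median MPG by origin")
```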

In [None]:
sns.countplot(x='Origin', data = cars)

In [None]:
sns.barplot(x='Origin', y='MPG', data = cars)

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**STUDENT EXERCISES:** 
Let's combine some pandas filtering and some plot aesthetics to create a barplot that looks like the "best" example. 
    
    
* filter the cars dataframe to just Japanese origin
* set up the axis so it plots horizontally
* make a countplot of "Make"
* Order the data by Make counts (hint, you can use <code>.value_counts().index</code>)

In [None]:
# SOLUTION

japan_cars = cars.loc[cars["Origin"]=="Japan"]

sns.countplot(data = japan_cars,
              y='Make', 
              color='b',
              order = japan_cars['Make'].value_counts().index,
              )

<a id='box'></a>

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

<h3>Box Plots</h3>

[.boxplot](https://seaborn.pydata.org/generated/seaborn.boxplot.html) will create a basic boxplot that automatically uses the column name as the label.

**Basic code:**
* single variable: <code>sns.boxplot(data = df, x = "column")</code>
* 2 variables:

```
sns.boxplot(data = df,
            x = "values_column",
            y = "category_column")
```
**Helpful Parameters:**
* <code>x, y</code>: Variables to plot. A numeric variable on x gives horizontal boxes; a numeric variable on y gives vertical boxes.
* <code>hue</code>: Further grouping/sub categories to plot
* <code>order</code>: Order to plot the categorical levels in, otherwise the levels are inferred from the data objects.

In [None]:
# Let's recreate the basic boxplot of horsepower with seaborn
fig, ax = plt.subplots()
ax = sns.boxplot(data = cars,
                 x = "Horsepower",
                 )
ax.set_title("All Cars")

In [None]:
# now let's add Origin as a categorical variable
fig, ax = plt.subplots()
ax = sns.boxplot(data = cars,
                 x = "Horsepower",
                 y = "Origin",
                 )


In [None]:
# we can add even more detail by delineating by number of cylinders
fig, ax = plt.subplots()
ax = sns.boxplot(data = cars,
                 x = "Horsepower",
                 y = "Origin",
                 hue = "Cylinders",
                 )

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**STUDENT EXERCISES:** 
Build on the momentum from the last exercise to recreate the "best" plot example. 
    
    
* Use the filtered dataframe from the last exercise (Filtered to Japan as the Origin)
* set up the axis so it plots horizontally
* Boxplots of "MPG" by "Make"
* Order the data by median MPG per make
    * Hint: Use the same methodology to aggregate & sort values from the Pandas tutorial, using <code>.groupby()</code>, <code>.agg()</code> and <code>.sort_values()</code>, then use the <code>.index</code> from that sorted df as the order in the boxplot
    

In [None]:
# SOLUTION:

# sorting the values
make_agg = japan_cars[['Make','MPG']].groupby('Make').agg(['median'])
make_agg.sort_values(by=[('MPG', 'median')], ascending = False, inplace=True)

# making the plot
fig, ax = plt.subplots()
ax = sns.boxplot(data = japan_cars,
                 x = "MPG",
                 y = "Make",
                 order = make_agg.index,
                 )


<a id='violin'></a>

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

<h3>Violin Plots</h3>

**Description:** **<code>[.violinplot()](https://seaborn.pydata.org/generated/seaborn.violinplot.html#seaborn.violinplot)</code>** will create a basic violin plot that automatically uses the column name as the label. It shares many of its parameters with boxplot. 

**Basic code**
* single variable: <code>sns.violinplot(data = df, x = "column")</code>
* 2 variables:
 ```
sns.violinplot(data = df,
                 x = "values_column",
                 y = "category_column")
```

**Helpful Parameters:**
* <code>x, y</code>: Axis to plot on
    * categorical column/variable on x is a vertical plot
    * categorical column/variable on y is a horizontal plot
* <code>scale</code> (optional):  ['area', 'count', 'width']. The method used to scale the width of each violin. In seaborn 0.13 and later this parameter is named <code>density_norm</code>.
    * area: each violin will have the same area
    * count: the width of the violins will be scaled by the number of observations in that bin. 
    * width: each violin will have the same width.
* <code>bw</code> (optional): ['scott', 'silverman', float] Either the name of a reference rule or the scale factor to use when computing the kernel bandwidth. The actual kernel size will be determined by multiplying the scale factor by the standard deviation of the data within each bin. In seaborn 0.13 and later, use <code>bw_method</code> and <code>bw_adjust</code> instead.
* <code>hue</code> (optional): Further grouping/sub categories to plot
* <code>order</code> (optional): Order to plot the categorical levels in, otherwise the levels are inferred from the data objects.

In [None]:
### Create a basic violin plot showing the distribution of miles per gallon by origin
### scale it by the count

sns.violinplot(data = cars,
               x = 'Origin', 
               y = 'MPG',
               scale = 'count',
               )

<a id='scatter'></a>

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

<h3>Scatter Plots</h3>

**Description:** **<code>[.scatterplot()]( https://seaborn.pydata.org/generated/seaborn.scatterplot.html#seaborn.scatterplot)</code>** draws a scatter plot with the possibility of several semantic groupings.

**Basic code**
```
 sns.scatterplot(data = df,
                  x = numeric column,
                  y = numeric column)
```

**Helpful Parameters:**
* <code>hue</code>: numeric, ordinal or categorical variable/column that will be plotted as different colors
* <code>size</code>: numeric, ordinal or categorical variable/column that will produce points with different sizes. 
* <code>style</code>: categorical variable/column that will produce points with different markers.
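
The sketch below encodes two extra variables at once: <code>hue</code> for a numeric column and <code>style</code> for a categorical one. The <code>toy_cars</code> dataframe and its values are hypothetical, made up only to keep the example self-contained.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in for the workshop's cars dataframe
toy_cars = pd.DataFrame({
    "Horsepower": [70, 95, 130, 150, 88, 110],
    "MPG": [33.0, 27.0, 18.0, 15.0, 30.0, 22.0],
    "Origin": ["Japan", "Europe", "US", "US", "Japan", "Europe"],
    "Cylinders": [4, 4, 8, 8, 4, 6],
})

fig, ax = plt.subplots()
sns.scatterplot(data=toy_cars,
                x="Horsepower", y="MPG",
                hue="Cylinders",    # color encodes the number of cylinders
                style="Origin",     # marker shape encodes the origin
                ax=ax)
```

Each added encoding gives the reader one more thing to decode, so it is worth checking the result against the 5-second question from the foundations section.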

In [None]:
# Scatter plots work well with row level data
sns.scatterplot(data = cars,
                x = 'Horsepower', 
                y = 'Weight',
                hue = 'Origin'
               )

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">

**STUDENT EXERCISES:** 
 
Recreate the following plot:  
<img src="support_files/images/data_viz/viz_student_exercise_mulivariate.png" width="500"/> 
    
Hint: take a look at the [seaborn scatterplot documentation and examples](https://seaborn.pydata.org/generated/seaborn.scatterplot.html#seaborn.scatterplot)
    
**Answer the following questions:** 
* How many variables are encoded?
* Is the encoding effective?
* Are there ways that it could be improved?

In [None]:
# Solution

fig, ax = plt.subplots(figsize=(8,6))

ax = sns.scatterplot(data=cars,
                     x="Horsepower", y="MPG", 
                     hue="Origin", size="Weight",
                     alpha=.5, palette="muted",
                     sizes=(40, 400))

<a id='distribution'></a>

<div style="padding: 5px; padding-left: 10px; background: #e6f2ff">
<h2> Distribution Plots</h2>

    
**Histograms**

A histogram is a classic visualization tool that represents the distribution of a numeric variable. There are two basic types of histograms: frequency and density.  
    
Frequency histograms display one or more variables by counting the number of observations that fall within discrete bins.
    
Density histograms give a rough sense of the density of the underlying distribution of the data, and are often used for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1.
    
<img src="support_files/images/data_viz/viz_histogram_generic.png" width="500"/> 
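
The difference between the two histogram types can be checked numerically with <code>np.histogram</code>. This is a small sketch on hypothetical, randomly generated weights; the bin count of 20 is an arbitrary choice.

```python
import numpy as np

# Hypothetical car weights drawn from a normal distribution
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=3000, scale=500, size=1000)

# Frequency histogram: each bin holds a count of observations
counts, bin_edges = np.histogram(values, bins=20)

# Density histogram: bin heights are rescaled so the total area is 1
density, _ = np.histogram(values, bins=20, density=True)
bin_widths = np.diff(bin_edges)

total_count = counts.sum()                  # number of observations
total_area = (density * bin_widths).sum()   # area under the density histogram
print(total_count, round(total_area, 6))
```

The bar heights differ between the two versions, but the shape of the histogram is the same.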

    
**Kernel Density Estimates**

A [kernel density estimate](https://en.wikipedia.org/wiki/Kernel_density_estimation) (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions.
 
Relative to a histogram, KDE can produce a plot that is less cluttered and more interpretable, especially when drawing multiple distributions. But it has the potential to introduce distortions if the underlying distribution is bounded or not smooth. Like a histogram, the quality of the representation also depends on the selection of good smoothing parameters.
 
<img src="support_files/images/data_viz/viz_hist_vs_KDE.png" width="500"/> 
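
To make the role of the smoothing parameter concrete, here is a hand-rolled Gaussian KDE in plain numpy. It is an illustrative sketch only, not how seaborn computes its curves: the helper <code>gaussian_kde_curve</code> and both bandwidth values are hypothetical choices.

```python
import numpy as np

# Hypothetical sample of 200 observations
rng = np.random.default_rng(seed=1)
samples = rng.normal(loc=0.0, scale=1.0, size=200)

def gaussian_kde_curve(data, grid, bandwidth):
    """Average one Gaussian bump per observation, evaluated on the grid."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

grid = np.linspace(-6, 6, 601)
dx = grid[1] - grid[0]

narrow = gaussian_kde_curve(samples, grid, bandwidth=0.1)  # wiggly curve
wide = gaussian_kde_curve(samples, grid, bandwidth=0.8)    # smooth curve

# Both curves are densities, so the area under each is approximately 1
area_narrow = (narrow * dx).sum()
area_wide = (wide * dx).sum()
```

A small bandwidth follows every cluster of points, while a large one smooths over real structure, which is why the choice of smoothing parameter matters.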
    

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

<h3>Histograms: Seaborn</h3>

Seaborn has several ways of creating histogram-like visualizations.

**[.histplot]( https://seaborn.pydata.org/generated/seaborn.histplot.html#seaborn.histplot):** the most straightforward histogram plot function
* basic: <code>sns.histplot(data=df, x="column")</code>
* density: <code>sns.histplot(data=df, x="column", stat="density")</code> 
* basic with kde: <code>sns.histplot(data=df, x="column", kde=True)</code>
    
**[.kdeplot](https://seaborn.pydata.org/generated/seaborn.kdeplot.html#seaborn.kdeplot):** specializes in kernel density estimate plots
* basic: <code>sns.kdeplot(data=df, x="column")</code>
* multiple categories: <code>sns.kdeplot(data=df, x="column", hue="size")</code>
    * <code>bw_method</code> (optional): [string, scalar, or callable]  Method for determining the smoothing bandwidth to use; passed to scipy.stats.gaussian_kde.
    * <code>bw_adjust</code> (optional): number; factor that multiplicatively scales the value chosen using <code>bw_method</code>. Increasing it will make the curve smoother. See seaborn documentation.

**[.displot](https://seaborn.pydata.org/generated/seaborn.displot.html#seaborn.displot):** the most flexible of the distribution plot functions
* basic: <code>sns.displot(data=df, x="column", kind="hist")</code>
* multiple categories: 
    <code>sns.displot(data=df, x="column", hue = "category_column", kind="hist")</code>
* the <code>kind</code> argument can be set to "hist", "kde" or "ecdf"

In [None]:
# let's create a basic histogram of car weights
sns.histplot(data = cars, 
             x = "Weight")

In [None]:
# now let's create a kde plot of weights categorized by origin
sns.displot(data = cars, 
             x = "Weight", 
             hue = "Origin", 
             kind = "kde")

<a id='timeseries'></a>

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

<h3>Lineplots</h3>

**Description:** [.lineplot]( https://seaborn.pydata.org/generated/seaborn.lineplot.html#seaborn.lineplot) draws a line plot with the possibility of several semantic groupings. It handles aggregation automatically and has built-in functionality for drawing multiple lines based on category. 

**Basic code**
* single variable:
```
sns.lineplot(data = df,
               x =  numeric column,
               y = numeric column)
```

**Helpful Parameters:**
* <code>hue</code>: Grouping variable that will produce lines with different colors
* <code>size</code>: Grouping variable that will produce lines with different widths. 
* <code>style</code>: Grouping variable that will produce lines with different dashes and/or markers.
* <code>legend</code>:How to draw the legend. 
    * “auto”: choose between brief or full representation based on number of levels.
    * “brief”: numeric hue and size variables will be represented with a sample of evenly spaced values.
    * “full”: every group will get an entry in the legend.
    * False: no legend data is added and no legend is drawn.
* <code>ci</code>: [int or “sd” or None] Size of the confidence interval to draw when aggregating with an estimator. Seaborn 0.12 deprecated this parameter in favor of <code>errorbar</code>.
    * “sd”: draw the standard deviation of the data. 
    * None: skips bootstrapping.
    
**Note:**
* When there are multiple y-values for each x-value, seaborn will aggregate and compute the mean y-value, along with a 95% confidence interval around the mean.

In [None]:
# Let's make the same plot but with seaborn 
fig, ax = plt.subplots()
ax = sns.lineplot(data = cars,
             x="Year", 
             y="MPG",
             hue = "Origin",
            )

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

<h3>Pair Grid</h3>

**Description:** [.PairGrid](https://seaborn.pydata.org/generated/seaborn.PairGrid.html) provides a grid for plotting pairwise relationships in a dataset
    
In its most basic form, the pair grid simply creates a blank grid of subplots, with each row and each column corresponding to a numeric variable in the dataset. At that point you must map a type of plot onto the grid, for example a scatter plot. This will fill all the subplots with scatter plots of each pairwise relationship. You still have access to scatterplot arguments like <code>hue</code> and <code>size</code>. However, there are also options to divide the pair grid into sections and plot things like histograms and other density estimators. 
    
<img src="support_files/images/data_viz/pairgrid.png" width="500"/>     

**Basic code**

```
g = sns.PairGrid(df)
g.map(sns.scatterplot)
```
    
**Mapping Options**
<img src="support_files/images/data_viz/pairgrid_sections.png" width="500"/>    
* <code>.map(seaborn plot type call)</code>: will plot a specific plot type on every subplot in the grid
* <code>.map_diag(seaborn plot type call)</code>: will plot a specific plot type on the diagonal
* <code>.map_offdiag(seaborn plot type call)</code>: will plot a specific plot type on the non-diagonal
* <code>.map_lower(seaborn plot type call)</code>: will plot a specific plot type below the diagonal
* <code>.map_upper(seaborn plot type call)</code>: will plot a specific plot type above the diagonal


**Helpful Parameters:**
* <code>hue</code>: Grouping variable that will produce plot elements with different colors
* <code>size</code>, <code>style</code>: These are not parameters of <code>PairGrid</code> itself. Pass them as keyword arguments to a mapping call, for example <code>g.map_offdiag(sns.scatterplot, size=df["column"])</code>
* Legend: draw it with <code>g.add_legend()</code> after mapping the plots

  

In [None]:
# let's make a pair grid with histograms on the diagonal and scatter plots off the diagonal. Let's also color it by cylinders

g = sns.PairGrid(data = cars, 
                 hue="Cylinders")
g.map_diag(sns.histplot)
g.map_offdiag(sns.scatterplot)
g.add_legend(title="", adjust_subtitles=True)

<a id='basic_heatmaps'></a>

<div style="padding: 5px; padding-left: 10px; background: #e6f2ff">
<h2>Heatmaps & Correlation heatmaps</h2>

**Basic heatmap**
A heatmap is a type of matrix plot and graphical representation of data that uses a system of color-coding to represent different values. 
    
<img src="support_files/images/data_viz/viz_heatmap.png" width="300"/> 

    
**Correlation Heatmap**
A correlation heatmap specifically represents a correlation matrix, displaying the correlation between different variables. The value of correlation can take any value from -1 to 1. Correlation between two random variables or bivariate data does not necessarily imply a causal relationship. 
<img src="support_files/images/data_viz/viz_heatmap_correlation.png" width="500"/>    
To create the correlation matrix for the visualizations, we will be using the pandas function
 [.corr](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html)   
      

**Jumplinks:**
* <a href='#basic_heatmaps'>heatmaps with matplotlib and seaborn</a>
* <a href='#correlation_matrix'>correlation dataframe</a>    
* <a href='#correlation_heatmap'>correlation heatmaps with matplotlib and seaborn</a> 

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">
<h3>Basic Heatmaps</h3>

**With Matplotlib:**

If you have a table of values, making a very basic heatmap is incredibly easy! 
* <code>plt.imshow(df)</code><img src="support_files/images/data_viz/viz_heatmap_matplotlib.png" width="200"/>


Creating plots like this may be fine when you just want a very quick by-eye assessment of the data. However, you might notice from the image above that it is not very interpretable and is missing important information such as labels, annotated values, and a color scale bar.
The matplotlib code example provided in the cells below goes over how to add all the bells and whistles.
   
    
**With Seaborn:**

Heatmaps with seaborn are much more straightforward than with matplotlib, and use the [.heatmap()](https://seaborn.pydata.org/generated/seaborn.heatmap.html#seaborn.heatmap) function. 

There are several optional parameters that can be used to easily adjust the colormap, color bar, and other aesthetics. Please see the documentation for more information. 
    
<code>sns.heatmap(data = df)</code>

NOTE: We will create a dataframe with synthetic data to work with for the basic heatmaps. This will use the techniques covered in the intro to pandas jupyter notebook
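
For the matplotlib route, the following sketch adds the pieces that a bare <code>plt.imshow(df)</code> is missing: tick labels, annotated values and a color bar. The 4 x 4 table of random values, the colormap and the text color are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical 4 x 4 table of values between 0 and 1
rng = np.random.default_rng(seed=3)
table = pd.DataFrame(rng.random((4, 4)),
                     columns=["col_1", "col_2", "col_3", "col_4"])

fig, ax = plt.subplots()
image = ax.imshow(table.to_numpy(), cmap="viridis", vmin=0, vmax=1)

# Label the ticks with the column names and the row index
ax.set_xticks(range(table.shape[1]))
ax.set_xticklabels(table.columns)
ax.set_yticks(range(table.shape[0]))
ax.set_yticklabels(table.index)

# Write each value in the center of its cell
for row in range(table.shape[0]):
    for col in range(table.shape[1]):
        ax.text(col, row, f"{table.iat[row, col]:.2f}",
                ha="center", va="center", color="white")

# The color bar maps colors back to values
fig.colorbar(image, ax=ax, label="value")
ax.set_title("Basic Heatmap with Matplotlib")
```

Seaborn's <code>heatmap</code> covers the same ground with the <code>annot</code> argument and a color bar drawn by default.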

In [None]:
# first we will create a dataframe with fake data to use for our heatmap
uniform_data = np.random.rand(6, 6)
heatmap_df = pd.DataFrame(uniform_data, columns=['col_1', 'col_2', 'col_3',
                                                 'col_4', 'col_5', 'col_6'])

In [None]:
# basic heatmap with SEABORN

fig, ax = plt.subplots()
ax = sns.heatmap(data = heatmap_df,
                 annot=True,          # annotate each cell with its value
                 linewidths=.1,       # matrix line widths
                 linecolor ='grey',   # matrix line color
                 square=True)

ax.set_title("Basic Heatmap with Seaborn")
ax.set_xlabel('Columns')
ax.set_ylabel('Rows')

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

**Correlation matrix**

To create the correlation matrix for the visualizations, we will be using the pandas function
    **<code>[.corr()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html)</code>**  

This function computes pairwise correlation of columns, excluding NA/null values.

<code>correlation_df = df.corr()</code>
   
Optional Parameters:
* <code>method</code> is the type of correlation to run. Options are:
    * 'pearson': default,  standard correlation coefficient
    * 'kendall': Kendall Tau correlation coefficient
    * 'spearman': Spearman rank correlation
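
The choice of <code>method</code> matters when a relationship is monotonic but not linear. In this small sketch the two-column dataframe is hypothetical: one column is simply the cube of the other.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second column is a cubic function of the first
x = np.arange(1, 11, dtype=float)
toy_df = pd.DataFrame({"x": x, "x_cubed": x ** 3})

pearson_df = toy_df.corr(method="pearson")     # measures linear association
spearman_df = toy_df.corr(method="spearman")   # measures rank (monotonic) association

pearson_value = pearson_df.loc["x", "x_cubed"]
spearman_value = spearman_df.loc["x", "x_cubed"]
print(round(pearson_value, 3), round(spearman_value, 3))
```

The ranks of the two columns agree perfectly, so the Spearman coefficient is 1, while the Pearson coefficient is lower because the relationship is curved.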
    

<a id='correlation_matrix'></a>

In [None]:
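# Let's make a very basic correlation matrix/dataframe
# numeric_only=True (pandas 1.5 or later) skips text columns such as Origin and Make,
# which would otherwise raise an error in pandas 2.0 and later
correlation_df = cars.corr(numeric_only=True)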
correlation_df

<a id='correlation_heatmap'></a>

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

**Correlation heatmaps**

When creating correlation plots, incorporating just a few elements will make them much more interpretable: 
* colorbar should span -1 to 1
* Diverging colormap:  with a neutral color in the middle to represent 0, some options are:
 <img src="support_files/images/data_viz/viz_diverging_colormaps.png" width="200"/> 
    

We only need to make a few tweaks to the [.heatmap()](https://seaborn.pydata.org/generated/seaborn.heatmap.html#seaborn.heatmap) code we used above to make a correlation-based heatmap.

In [None]:
# Correlation heatmap with Seaborn:
# set the figure size when creating the figure and axis
fig, ax = plt.subplots(figsize=(6,6))
ax = sns.heatmap(data = correlation_df,
                 cmap = 'BrBG',          # brown-teal diverging color map
                 vmin = -1, vmax = 1,    # set the colorbar range (-1 - 1 for correlation)
                 center = 0,             # set the center of the color bar
                 annot = True,           # annotate with correlation values
                 linewidths =.1,         # matrix line widths
                 linecolor ='white',     # matrix line color
                 square = True)
ax.set_title("Correlation Heatmap with Seaborn")

<a id='multiplots'></a>

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

<h2> Multi Plot Figures</h2>
 
**Figures with Subplots**

A given figure can have more than one axis. The <code>plt.subplots()</code> command, which we used above, generates a single axis by default, but we can specify the number of axes that we want.
    
<img src="support_files/images/data_viz/viz_subplots.png" width="500"/> 

    
**Complex Figures with Gridspec**

[Gridspec](http://matplotlib.org/users/gridspec.html) is useful when you have uneven subplots. It can get tricky for more complex plots, so first try to use <code>plt.subplots()</code> (like in the previous examples) if possible.  
 
<img src="support_files/images/data_viz/viz_gridspec.png" width="500"/> 


<a id='subplots_matplotlib'></a>

In [None]:
# Because Seaborn is built on Matplotlib we can also put multiple
# seaborn plots in the same figure.
# We just need to add the ax = axes[int] parameter

# Let's make a scatter plot of MPG by weight and then the distributions
# for MPG and Weight

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12,4))

sns.scatterplot(x = "MPG",
                y = 'Weight',
                data = cars,
                ax = axes[0])
axes[0].set_title("MPG by Weight")


sns.histplot(x = "Weight", 
             data = cars,
             ax = axes[1])
axes[1].set_title("Weight distribution")

sns.histplot(x = "MPG", 
             data = cars,
             ax = axes[2])
axes[2].set_title("MPG distribution")

<a id='gridspec_matplotlib'></a>

<div style="padding: 5px; padding-left: 10px; background:#e6e6e6">

<h3>gridspec: matplotlib</h3>

**[.subplots]( https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)**: creates a figure and a set of subplots. This utility wrapper makes it convenient to create common layouts of subplots, including the enclosing figure object, in a single call.

**General code example:** 

```
fig = plt.figure()
gs = fig.add_gridspec(nrows=2, ncols=2)
ax0 = fig.add_subplot(gs[0, 0]) # Top left corner
ax1 = fig.add_subplot(gs[0, 1]) # Top right corner
ax2 = fig.add_subplot(gs[1, :]) # Bottom, span entire width
    
ax0.plot()
ax1.plot()
ax2.plot()
```
    
**helpful parameters:**
* <code> sharex/sharey = bool </code> forces plots to display the same range along the x-axis  or y-axis depending upon which parameter you call. Default = False
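
Here is a minimal sketch of <code>sharey</code>, using <code>plt.subplots()</code> and two hypothetical lines with very different ranges.

```python
import matplotlib.pyplot as plt

# sharey=True forces both subplots onto the same y-axis range
fig, axes = plt.subplots(nrows=1, ncols=2, sharey=True, figsize=(8, 3))

# Hypothetical data with very different y ranges
axes[0].plot([0, 1, 2], [0, 10, 20])
axes[1].plot([0, 1, 2], [0, 1, 2])

axes[0].set_title("large range")
axes[1].set_title("small range")
```

Sharing an axis makes magnitudes directly comparable across panels, at the cost of flattening the panel with the smaller range.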

In [None]:
# Gridspec axes work the same way as axes from plt.subplots():
# pass each one to seaborn with the ax parameter.

# Let's plot US cars: a scatter plot of MPG by weight, the Weight
# distribution, and a line plot of MPG by year across the bottom

us_cars = cars.loc[cars["Origin"]=="US"]

fig = plt.figure()
gs= fig.add_gridspec(nrows=2, ncols=2)
ax0 = fig.add_subplot(gs[0, 0]) # Top left corner
ax1 = fig.add_subplot(gs[0, 1]) # Top right corner
ax2 = fig.add_subplot(gs[1, :]) # Bottom, span entire width

sns.scatterplot(x = "MPG",
                y = 'Weight',
                data = us_cars,
                ax = ax0)
ax0.set_title("MPG by Weight")


sns.histplot(x = "Weight", 
             data = us_cars,
             ax = ax1)
ax1.set_title("Weight distribution")


sns.lineplot(x="Year", 
             y="MPG",
             data = us_cars,
             ax = ax2)

<a id='table_lens'></a>

<a id='heat'></a>

<a id='heatmap_seaborn'></a>