<b><font size=20, color='#A020F0'>Seaborn</font></b>

Hannah Zanowski<br>
11/3/25<br>

#### <span style="color:green">Learning Goals</span>
By the end of this notebook you will
1. Understand under what circumstances seaborn is a useful plotting tool
2. Practice making various types of plots in seaborn

#### Resources
[Seaborn Website](https://seaborn.pydata.org/index.html)<br>
[Seaborn Tutorial](https://seaborn.pydata.org/tutorial.html)<br>
[Seaborn API reference](https://seaborn.pydata.org/api.html)<br>

#### Acknowledgements
Much of today's lecture is adapted/borrowed from the [Seaborn Tutorial](https://seaborn.pydata.org/tutorial.html#user-guide-and-tutorial)

# A little about seaborn

Seaborn is a data visualization library for making statistical plots (histograms, box-and-whisker, scatterplots, etc). It is particularly useful for data exploration. Seaborn builds off of matplotlib (and plays nice with pandas!), but in such a way that its focus is more on _understanding_ data rather than figuring out how to plot it. In short: it's fairly easy to make nice looking plots in seaborn without much effort! 
><font color='blue'><b>Note:</b></font> If you want to be able to customize seaborn plots at very fine granularity, you still need to know matplotlib ;)

Let's begin by importing seaborn (and a few of our other favorites):

In [None]:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

---

## 1. Seaborn plotting functions
Seaborn is largely organized around three plot 'types': [relational](https://seaborn.pydata.org/tutorial/relational.html), [distributional](https://seaborn.pydata.org/tutorial/distributions.html), and [categorical](https://seaborn.pydata.org/tutorial/categorical.html), as in the image below.
<img src='https://seaborn.pydata.org/_images/function_overview_8_0.png'></img>
<br><font size=1>Image Credit: [Seaborn](https://seaborn.pydata.org/tutorial/function_overview.html)</font>

Both the general categories (relational, distributional, categorical) and the individual plot types within each category (e.g., scatterplot and lineplot for the relational category) are seaborn plotting functions. The category-level functions (`relplot`, `displot`, `catplot`) are figure-level functions, while the individual plot types are axes-level functions. You can achieve similar results with either, but there are some important distinctions, which we'll get to in Sections 3 and 4. Also, not everything that seaborn can do fits into these three broad categories!

To see some of the neat things that seaborn can do, check out the plotting [Gallery](https://seaborn.pydata.org/examples/index.html)

---

## 2. Data Formats
Seaborn generally handles any data in the form of pandas and numpy objects as well as some of the built-in datatypes like lists and dictionaries. It does not interface with xarray objects. A few of the older seaborn functions handle fewer of the accepted forms, so watch out!

### Long-form vs wide-form data
An important part of using seaborn is making sure that your data are in a form that it can make sense of for plotting. Most data that seaborn can read in fit into two categories: [long-form and wide-form](https://seaborn.pydata.org/tutorial/data_structure.html#long-form-vs-wide-form-data). In general, long-form data are organized such that the columns are variables and the rows are individual observations. Wide-form data are organized such that the table itself represents the variable, with both rows and columns representing the value of that variable at some time or other metric, etc. To visualize this, we'll use seaborn's example from one of its built-in datasets below:

#### Long-form

In [None]:
flights = sns.load_dataset("flights")
flights.head()

#### Wide-form

In [None]:
flights_wide=flights.pivot(index='year',columns='month',values='passengers')
flights_wide.head()

<b> So why does this matter? <font color='red'>Because how you feed information to seaborn's plotting commands will change a little bit depending on the underlying organization of your data</font></b>. That means **YOU** have to figure out what form your data are in and **YOU** have to think about the commands you need to give seaborn to make the plots that you want (or you need to figure out how to reshape your data into the form that you want). You can read more about the pros and cons of each form (and seaborn's treatment of them) [here](https://seaborn.pydata.org/tutorial/data_structure.html#long-form-vs-wide-form-data).

## 3. Axes-level functions
Axes-level functions (scatterplot, histplot, barplot, etc) behave similarly to matplotlib and can be thought of as replacements for their matplotlib counterparts. They act on a specific axis and are self-contained, in that they only modify the axis they are assigned to.

Let's read in some of the built-in data from seaborn to go through some examples:

In [None]:
penguins = sns.load_dataset("penguins") #load the dataset about penguins
penguins #print the data so we know what it looks like

Say we wanted to make a scatterplot of penguin bill length vs flipper length with the points colored by penguin species. If I wanted to do that in matplotlib, I'd have to do something like the following:

In [None]:
#Separate the three species in the dataset
adelie=penguins[penguins['species']=='Adelie']
gentoo=penguins[penguins['species']=='Gentoo']
chinstrap=penguins[penguins['species']=='Chinstrap']

fig,ax=plt.subplots(figsize=(6,4))
s1=ax.scatter(adelie.bill_length_mm,adelie.flipper_length_mm, marker='.',s=20,c='darkmagenta',label='Adelie')
s2=ax.scatter(gentoo.bill_length_mm,gentoo.flipper_length_mm, marker='.',s=20,c='darkgoldenrod',label='Gentoo')
s3=ax.scatter(chinstrap.bill_length_mm,chinstrap.flipper_length_mm, marker='.',s=20,c='teal',label='Chinstrap')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Flipper Length (mm)')
plt.legend(loc='best',frameon=False)
plt.title('Penguin Flipper Length vs. Bill Length');

That's fine, but it took me some time to set this plot up correctly: I had to think about how to map the different penguin species to specific colors, which matplotlib's [scatter](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html) function doesn't make immediately easy.

Here's how we can do something pretty similar in seaborn using the [seaborn scatterplot function](https://seaborn.pydata.org/generated/seaborn.scatterplot.html#seaborn.scatterplot):

In [None]:
sns.scatterplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',hue='species',palette=['darkmagenta','teal','darkgoldenrod'])

You could take that even further by scaling the point sizes by a numeric variable such as penguin body mass, or by setting marker styles based on a categorical variable such as sex (but be careful doing this as it makes the plot harder to interpret):

In [None]:
sns.scatterplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',hue='species', size='body_mass_g',
                palette=['darkmagenta','teal','darkgoldenrod'],style='sex',markers=['.','D'])
#Use seaborn directly to move the legend off the plot:
sns.move_legend(plt.gca(), "upper left", frameon=False, bbox_to_anchor=(1, 1))

If you want to have matplotlib-level fine control over the plot, just set your seaborn plot up within a matplotlib axis instance like you normally would if you were just using matplotlib:

In [None]:
fig,ax=plt.subplots(figsize=(6,4))
s1=sns.scatterplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',hue='species', size='body_mass_g',
                palette=['darkmagenta','teal','darkgoldenrod'],style='sex',markers=['.','D'],ax=ax) #ax=ax draws on the axis created above
plt.xlabel('Bill Length (mm)')
plt.ylabel('Flipper Length (mm)')
#Matplotlib legend (you can still use the seaborn move legend command too)
plt.legend(loc='upper right',frameon=False,bbox_to_anchor=(1.3, 1)) #use bbox_to_anchor to move the legend off the plot
plt.title('Penguin Flipper Length vs. Bill Length');

### Additional axes-level plotting functions not in the main categories
Although there are many types of plots seaborn can make, not all of them are listed in the above categories. Here are a few that you might find useful:

#### Regression plots
You can add linear regression lines to a scatterplot with [regplot](https://seaborn.pydata.org/generated/seaborn.regplot.html#seaborn.regplot). Regplot first draws a scatterplot and then fits a line to the data and adds the 95% confidence interval in shading.

><b><font color='red'>CAUTION:</font></b> Just because seaborn can apply a linear regression to your data doesn't mean it _should_. At the end of the day, **you** are responsible for understanding under what circumstances various statistical methods are appropriate for your data.

In [None]:
fig,ax=plt.subplots(figsize=(6,4))
s1=sns.regplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',color='teal')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Flipper Length (mm)')
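
One thing to keep in mind is that `regplot` draws the fit but does not hand back the fitted coefficients. If you need the actual slope and intercept, you have to compute them yourself. The cell below is one possible sketch using `np.polyfit` on synthetic data; the data, the "true" slope of 2, and the variable names are all made up for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data: a straight line (slope 2, intercept 100) plus random noise
rng = np.random.default_rng(0)
x = np.linspace(30, 60, 50)
y = 2.0 * x + 100 + rng.normal(scale=3.0, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

# Ordinary least-squares fit of a degree-1 polynomial (a straight line)
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")

# Draw the seaborn regression plot and annotate it with the numbers from numpy
fig, ax = plt.subplots(figsize=(6, 4))
sns.regplot(data=df, x="x", y="y", color="teal", ax=ax)
ax.set_title(f"y = {slope:.2f} x + {intercept:.2f}")
```

With the penguins data you would need to drop missing values first (for example with `penguins.dropna(subset=[...])`), since `np.polyfit` does not skip NaNs.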

#### Adding error bars
You can add [various types of error bars](https://seaborn.pydata.org/tutorial/error_bars.html) to your plots if you have those estimates. Below we'll add a $\pm$1 standard deviation error bar to a bar plot of penguin species vs body mass.

In [None]:
sns.barplot(penguins, x='species', y='body_mass_g', hue='sex',palette=['darkmagenta','darkgoldenrod'],errorbar='sd',capsize=0.2)
plt.gca().set_xlabel('Species')
plt.gca().set_ylabel('Body Mass (g)')

#### Heat maps
These effectively require wide-form data (or at the very least a 2D dataset). In the example below, I've made a heatmap using the wide-form 'flights' dataset from Section 2.

In [None]:
sns.heatmap(flights_wide, cbar_kws={'label':'Passengers'})

---

## 4. Figure-level functions
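Figure-level functions (`relplot`, `displot`, `catplot`) create and manage an entire figure rather than drawing on a single axis. We can recreate the scatterplot example from Section 3 using seaborn's figure-level functions, in this case by using [relplot](https://seaborn.pydata.org/generated/seaborn.relplot.html#seaborn-relplot). The only thing we need to change is that we need to tell relplot what **kind** of relplot we want (`kind='scatter'` is the default, so it can be omitted for scatterplots)!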

In [None]:
s1=sns.relplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',kind='scatter',hue='species', size='body_mass_g',
                palette=['darkmagenta','teal','darkgoldenrod'],style='sex',markers=['.','D'])
s1.set_axis_labels('Bill Length (mm)','Flipper Length (mm)') #can only get away with this because of the underlying FacetGrid

What did this really buy us though? One nice thing is that when using the figure-level function, the legend is automatically placed outside the plot axes, without us having to do so ourselves.

<b>The reason that the figure-level plotting functions behave this way is because they effectively 'own' the figure itself, so they can alter the space outside of the plotting axes, but this means you cannot use a figure-level function to draw a plot on an existing axis--they do not work that way.</b> 

><b><font color='red'>Note:</font></b> This also means you cannot plot multiple different types of plots on the same figure using figure-level functions, but you can instead do that using the axes-level plotting functions and regular ol' matplotlib axes. 

Let's see what happens when we try applying the figure-level function to the matplotlib axis I've set up below:

In [None]:
#Try plotting on a matplotlib axis instance
fig,ax=plt.subplots(figsize=(6,4))
sns.relplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',hue='species', size='body_mass_g',
                palette=['darkmagenta','teal','darkgoldenrod'],style='sex',markers=['.','D'],ax=ax)

What *IS* this figure-level plotting type then?

In [None]:
type(s1)

The main figure-level functions (`relplot`, `displot`, and `catplot`) return a [FacetGrid](https://seaborn.pydata.org/generated/seaborn.FacetGrid.html#seaborn-facetgrid) instance, which makes it easy to map data to multiple subplot axes in a straightforward way (you just have to remember that it does not behave like a matplotlib axis):

In [None]:
s1=sns.relplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',hue='species', size='body_mass_g',
                palette=['darkmagenta','teal','darkgoldenrod'],col='sex') #or use row='sex' to make it 2x1 grid
s1.set_axis_labels("Bill length (mm)", "Flipper length (mm)") #this is only able to be applied to a FacetGrid instance

Another example but now with penguin species as the columns and sex as the rows:

In [None]:
s1=sns.relplot(data=penguins, x='bill_length_mm',y='flipper_length_mm', color='teal',size='body_mass_g',row='sex',col='species') #rows are sex, columns are species
s1.set_axis_labels("Bill length (mm)", "Flipper length (mm)"); #this is only able to be applied to a FacetGrid instance
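
Even though a FacetGrid is not a matplotlib axis, it still holds ordinary matplotlib axes that you can reach if you need finer control over individual panels. Here is a small sketch with synthetic data (the column names and values are made up) that loops over the grid's `axes` array to add a reference line to every panel and then retitles the panels with `set_titles`:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data with two hypothetical categorical variables
rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({
    "x": rng.normal(45, 5, n),
    "y": rng.normal(200, 10, n),
    "species": np.tile(["A", "B", "C"], n // 3),
    "sex": np.repeat(["Female", "Male"], n // 2),
})

# Figure-level call: one panel per (sex, species) combination
g = sns.relplot(data=df, x="x", y="y", row="sex", col="species", height=2.5)
print(type(g), g.axes.shape)

# g.axes is a 2D array of matplotlib axes (rows x columns)
for ax in g.axes.flat:
    ax.axvline(df["x"].mean(), color="gray", linestyle="--")

# Use the row and column category names in each panel title
g.set_titles("{row_name} | {col_name}")
```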

### Jointplots and Pairplots
Seaborn has two types of figure-level plots that don't fit in the three main categories of plot types: the [jointplot](https://seaborn.pydata.org/generated/seaborn.jointplot.html#seaborn.jointplot) and the [pairplot](https://seaborn.pydata.org/generated/seaborn.pairplot.html#seaborn.pairplot),
both of which allow you to combine multiple views on your data. The difference is that instead of using a FacetGrid to combine multiple types of figures, they have their own respective object types, the [JointGrid](https://seaborn.pydata.org/generated/seaborn.JointGrid.html#seaborn.JointGrid) and [PairGrid](https://seaborn.pydata.org/generated/seaborn.PairGrid.html#seaborn.PairGrid), respectively.

<b>Jointplots</b> plot the joint distribution of your data with the individual distributions on the margins of the main plot:

In [None]:
s1=sns.jointplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',kind='scatter',hue='species',
                palette=['darkmagenta','teal','darkgoldenrod'])
s1.set_axis_labels("Bill length (mm)", "Flipper length (mm)");

<b>Pairplots</b> are similar to jointplots in that you can explore distributions, but they allow you to look at every possible pairwise comparison at the same time. You can change the types of plots in the pairplot with the `kind` kwarg. Plots on the diagonal are just the distribution of the data represented by each column.

In [None]:
s1=sns.pairplot(data=penguins, hue="species",palette=['darkmagenta','teal','darkgoldenrod'],kind='scatter')

### Regression plots again
[lmplot](https://seaborn.pydata.org/generated/seaborn.lmplot.html#seaborn.lmplot) is the figure-level version of regplot that we saw earlier. `lmplot` can even take a hue argument, so in the example below it does a linear regression for each species! 

><b><font color='blue'>Note:</font></b> Both `lmplot` and `regplot` can also fit higher order polynomials to data by setting the `order` kwarg.

In [None]:
s1=sns.lmplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',hue='species',
                palette=['darkmagenta','teal','darkgoldenrod'])
s1.set_axis_labels('Bill Length (mm)','Flipper Length (mm)') #can only get away with this because of the underlying FacetGrid
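
To illustrate the `order` kwarg mentioned in the note above, here is a small sketch comparing a straight-line fit with a second-order polynomial fit. The data are synthetic (a made-up quadratic relationship plus noise), so this only shows how the kwarg is used, not whether a polynomial is appropriate for any real dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic data that follow a quadratic curve plus random noise
rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 40)
y = 1.5 * x**2 - x + rng.normal(scale=1.0, size=x.size)
df = pd.DataFrame({"x": x, "y": y})

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Default: straight-line fit
sns.regplot(data=df, x="x", y="y", color="teal", ax=axes[0])
axes[0].set_title("order=1 (default)")

# Second-order polynomial fit
sns.regplot(data=df, x="x", y="y", order=2, color="darkmagenta", ax=axes[1])
axes[1].set_title("order=2")
```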

---

## 5. Setting up themes, contexts, etc for making nice plots
Seaborn prides itself on making nice plots with little effort. To that end, it has several built-in themes, contexts, and other ways to [control figure aesthetics](https://seaborn.pydata.org/tutorial/aesthetics.html) and [color](https://seaborn.pydata.org/tutorial/color_palettes.html#choosing-color-palettes) that you can use to make all of your plots in a given notebook. You can also create your own! 

<b><font color='darkmagenta'>Seaborn themes</font></b>: These are loadable figure presets that allow you to make all of your plots with certain default parameters. Seaborn currently has [five preset themes](https://seaborn.pydata.org/tutorial/aesthetics.html#seaborn-figure-styles).<br>
<b><font color='darkmagenta'>Seaborn contexts</font></b>: These are more or less the same thing but with certain defaults changed so that you can make the same set of plots for different use cases, such as a slideshow vs. a paper. Seaborn currently has [four preset contexts](https://seaborn.pydata.org/tutorial/aesthetics.html#scaling-plot-elements).

You can set a theme with the `set_style` command:
><b>Note:</b> You can also set things to plot in seaborn's default theme by calling `sns.set_theme()`

In [None]:
sns.set_style('dark') #options are darkgrid, whitegrid, dark, white, or ticks

In [None]:
fig,ax=plt.subplots(figsize=(6,4))
s1=sns.regplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',color='teal')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Flipper Length (mm)')

You can also make adjustments to the default styles on the go:

In [None]:
sns.set_style('dark', {'axes.facecolor': 'k'})
fig,ax=plt.subplots(figsize=(6,4))
s1=sns.regplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',color='teal')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Flipper Length (mm)')

To print all the current presets use the `axes_style` command:

In [None]:
sns.axes_style()

You can set a context with the `set_context` command:

In [None]:
sns.set_context('talk') #options are paper, poster, notebook, talk
fig,ax=plt.subplots(figsize=(6,4))
s1=sns.regplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',color='teal')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Flipper Length (mm)')

To see the presets for the current context, use the `plotting_context` command:

In [None]:
sns.plotting_context()

You can make changes to the context in the same way that you do for the style:

In [None]:
sns.set_context('talk', rc={'axes.labelsize':14}) #just adjust the axes label sizes as an example
fig,ax=plt.subplots(figsize=(6,4))
s1=sns.regplot(data=penguins, x='bill_length_mm',y='flipper_length_mm',color='teal')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Flipper Length (mm)')

---