# Seaborn - data visualization 

*Dr Andrew Meade and Dr Jo Baker*

Material modified from the original course by

*Dr Chas Nelson and Mikolaj Kundegorski*

*Part of https://github.com/ChasNelson1990/python-zero-to-hero-beginners-course*

## Objectives

* Know about the plotting functions provided by Seaborn (`seaborn`)
* Know how to plot a scatterplot (with a regression model) with Seaborn
* Know how to plot boxplots with Seaborn
* Know how to plot histograms with Seaborn


## Seaborn 

Seaborn (`seaborn`) is a high-level data visualization library. Being high-level means it does much of the work for you when visualising data, compared with lower-level libraries such as Matplotlib, on which it is built.

Below is a gallery of basic examples showing how Seaborn can be used to visualize data. The examples use the iris data set, which was also used in the pandas workbook. Seaborn is designed to work with pandas data frames.

The examples can be used as templates for plotting different data or extended for more complex plots. 




## Import the library and create the dataframe

The code below imports three libraries: pandas, seaborn and Matplotlib (`matplotlib`). Matplotlib underpins seaborn and allows plots to be saved.

The iris data is loaded into a dataframe called `iris` and the first 5 rows are displayed.

The last line sets the style of the plots seaborn produces. 

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

iris = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")
display(iris.head())

sns.set_theme()
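
Called with no arguments, `sns.set_theme()` applies seaborn's default look. As a minimal sketch, assuming you would prefer a white background with grid lines and a colour-blind-friendly palette, the `style` and `palette` arguments can be set explicitly. The particular values below are illustrative choices, not part of the original course.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Illustrative choices (assumptions): white background with grid lines,
# and a palette designed to be readable with colour-vision deficiency
sns.set_theme(style="whitegrid", palette="colorblind")

# set_theme works by updating Matplotlib's global settings (rcParams),
# so it affects every plot made afterwards in the same session
grid_enabled = plt.rcParams["axes.grid"]
print(grid_enabled)
```

Because the theme is global, it is usually set once near the top of a notebook, as in the cell above.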

## Simple scatter XY plot

Below is a simple scatter (XY) plot relating sepal_length (x) to sepal_width (y). It includes the fitted regression line and a shaded band showing the 95% confidence interval around it.

The plot is saved as a png file called seaborn_simple_xy_plot.png


In [None]:
# 'height' controls the figure height in inches
# 'truncate' prevents the regression extending beyond the data
sns.lmplot(x="sepal_length", y="sepal_width", data=iris, height=5, truncate=True)
plt.savefig("seaborn_simple_xy_plot.png")
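
`plt.savefig` saves whichever figure Matplotlib currently considers active. As a hedged alternative sketch, `sns.lmplot` returns a `FacetGrid` object, and keeping a reference to it lets you save that specific figure. The data values and the output file name below are made up for illustration; only the column names match the iris data.

```python
import pandas as pd
import seaborn as sns

# Illustrative values only (not the real iris data), using the same column names
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.0, 6.4, 5.0, 6.7],
    "sepal_width": [3.5, 3.0, 3.3, 2.7, 3.2, 3.2, 3.6, 3.1],
})

# lmplot returns a FacetGrid; keep a reference so we can work with this exact figure
grid = sns.lmplot(x="sepal_length", y="sepal_width", data=sample, height=5, truncate=True)

# 'dpi' is passed through to Matplotlib and sets the resolution of the saved image
# (the file name is a hypothetical example)
grid.savefig("seaborn_simple_xy_plot_sketch.png", dpi=150)
```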

## Scatter XY plot by species

Below is a scatter (XY) plot broken down by species, relating sepal_length (x) to sepal_width (y). Each species gets its own colour, regression line and shaded 95% confidence interval.

The plot is saved as a png file called seaborn_xy_plot_by_species.png


In [None]:
# Create a plot of sepal_length vs sepal_width where colour (hue) is controlled by the species
#
# 'height' controls the figure height in inches
# 'truncate' prevents the regression extending beyond the data
sns.lmplot(x="sepal_length", y="sepal_width", data=iris, hue="species", height=5, truncate=True)
plt.savefig("seaborn_xy_plot_by_species.png")


## Separate XY plot by species

Below is a scatter (XY) plot with a separate panel per species, relating sepal_length (x) to sepal_width (y). Each panel includes the regression line and a shaded 95% confidence interval.

The plot is saved as a png file called seaborn_xy_plot_by_separate_species.png


In [None]:
# Create a plot of sepal_length vs sepal_width with one panel per species ('col'),
# where colour ('hue') is also controlled by the species
#
# 'height' controls the height of each panel in inches
# 'truncate' prevents the regression extending beyond the data
sns.lmplot(x="sepal_length", y="sepal_width", data=iris, hue="species", col="species", height=5, truncate=True)

plt.savefig("seaborn_xy_plot_by_separate_species.png")


## Boxplots

Scatter and line plots are all part of Seaborn's relational plot tools. But sometimes we have categorical data (such as species) and might want to use box plots to explore this data.

The `melt` function creates a long dataframe (more rows, fewer columns) from a wide dataframe (fewer rows, more columns).
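
To make the wide-to-long idea concrete, here is a small sketch on a two-row table. The values are made up for illustration; only the column names follow the iris data.

```python
import pandas as pd

# Illustrative wide table: one row per flower, one column per measurement (made-up values)
wide = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "sepal_length": [5.1, 6.3],
    "sepal_width": [3.5, 3.3],
})

# Keep 'species' as an identifier and stack the two measurement columns:
# the old column names go into 'measure', the old cell values into 'measurement'
long = wide.melt(id_vars="species", var_name="measure", value_name="measurement")

print(wide.shape)  # (2, 3)
print(long.shape)  # (4, 3)
print(long)
```

Each of the two rows contributes one new row per measurement column, so the result has four rows.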

The plot is saved as a png file called seaborn_boxplot.png


<div style="border: 3px solid #d95f02; border-radius: 5px; padding: 10pt"><strong>Exercise 12.1:</strong> Read the cell below. This cell aims to create boxplots using <code>seaborn</code> of each of the four measurements for each species (each species should be a different colour). Create a new Markdown cell below and write down, in plain English, what each line is doing. What does <code>.melt()</code> do and why is it needed?
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/rLcfMBMMNKM'>here</a> for a walkthrough.</div>

In [None]:
# 'Melt' the data
iris_melted = iris.melt(id_vars="species", value_vars=["sepal_length", "sepal_width", "petal_length", "petal_width"], var_name="measure", value_name="measurement")

# Plot the melted data
sns.catplot(x="species", y="measurement", col="measure", data=iris_melted, hue="species", kind="box", height=5, aspect=0.5)

# Save the plot
plt.savefig("seaborn_boxplot.png")

## Frequency histogram

Frequency histograms bin the data and plot the frequency (count) of each bin. They are useful for seeing how often ranges of values occur, identifying patterns, creating a quick visual summary and judging which statistical distribution might describe the data.

The plot is saved as a png file called seaborn_simple_frequency_histogram.png


In [None]:
sns.histplot(data=iris, x="sepal_width")

plt.savefig("seaborn_simple_frequency_histogram.png")

## Frequency histogram by species

As before, the plot can be broken down by species.

The plot is saved as a png file called seaborn_frequency_histogram_by_speices.png



In [None]:
sns.histplot(data=iris, x="sepal_width", hue="species")
plt.savefig("seaborn_frequency_histogram_by_speices.png")

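## 2D - Frequency histogram by species

Seaborn also supports two-dimensional frequency histograms: passing both `x` and `y` bins the data along both axes and shows the count in each cell as a colour.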

The plot is saved as a png file called seaborn_2D_frequency_histogram_by_speices.png



In [None]:
sns.histplot(data=iris, x="sepal_width", y="sepal_length")
plt.savefig("seaborn_2D_frequency_histogram_by_speices.png")

## Scatter XY plot and Frequency histogram

Scatter plots and frequency histograms can be combined in a joint plot, which shows the scatter plot in the centre and a histogram of each variable along the margins.


The plot is saved as a png file called seaborn_joint_plot.png



In [None]:
sns.jointplot(data=iris, x="sepal_length", y="sepal_width")
plt.savefig("seaborn_joint_plot.png")

## Key Points

* `seaborn` makes plotting lots of data very quick and easy
* The code above can be used as a template for more advanced plots.
* Knowing how to plot exactly what you want will come with time, practice and a bit of on-line searching!

## Any Bugs/Issues/Comments?

If you've found a bug or have any comments about this notebook, please fill out this on-line form: https://forms.gle/tp2veeF8e7fbQMvY6.

We will try to act on any feedback as soon as possible.