# Master one third of Seaborn: Statistical plotting with relplot()
## If you can do it in Seaborn, do it in Seaborn
<img src='images/pexels-burak.jpg'></img>

### Introduction
> The goal of this article is that you come away with a strong knowledge of any type of statistical plotting of quantitative variables using Seaborn's `relplot()` function.

When I started learning Data Visualization, I was first introduced to Matplotlib. It is a library so vast and deep that you can visualize almost anything data-related. That same vastness means a single plot can be created in many different ways. While this flexibility is ideal for experienced scientists, as a beginner I found it a nightmare to tell the different methods apart. I even considered switching to the no-code interface of Tableau, which I am deeply ashamed to admit as a programmer. I wanted something that was easy to use and, at the same time, enabled me to create those cool plots others were making (in code).

I learned about Seaborn while I was doing a Nanodegree at Udacity and finally found my pick. That's why my golden rule for Data Visualization is "If you can do it in Seaborn, do it in Seaborn". It offers many advantages over its counterpart, Matplotlib.

Firstly, it is very easy to use. You can create complex plots with just a few lines of code and still make them look pretty with built-in styles. Secondly, it works amazingly well with Pandas DataFrames, which is just what you need as a Data Scientist. Last but not least, it is built on top of Matplotlib itself. This means that you get to enjoy most of the flexibility Matplotlib offers while keeping the code to a minimum.

And yes, I really mean what I say in the headline. Seaborn divides most of its plotting API into three categories: plotting statistical relationships, visualizing distributions of data, and plotting categorical data. It provides three high-level functions which encompass most of its features, and one of them is `relplot()`.

`relplot()` can visualize statistical relationships between quantitative variables. In this article, we will cover almost all features of this function, including how to create subplots, and much more.

### Setup

In [1]:
# Load necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Render sharper (retina) figures and avoid blurry images
%config InlineBackend.figure_format = 'retina'
# Use the 'notebook' plotting context (Seaborn's default scaling)
sns.set_context('notebook')

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Enable multiple cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

We import Seaborn as `sns`. You might be wondering why it is not aliased as `sb` like any normal person would do. Well, get this: the alias comes from a fictional character in the TV show The West Wing, Samuel Norman Seaborn. The initialism is a joke.

For the sample data, I will be using one of the built-in datasets of Seaborn and one I downloaded from Kaggle. You can get it using [this](https://storage.googleapis.com/kaggle-data-sets/29/2150/compressed/GlobalLandTemperaturesByCountry.csv.zip?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=gcp-kaggle-com%40kaggle-161607.iam.gserviceaccount.com%2F20200923%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20200923T061620Z&X-Goog-Expires=259199&X-Goog-SignedHeaders=host&X-Goog-Signature=28a5676d8694bc5ffd463c3eefd517c34e402114318ee13c9bdc159c3f45f7c1c176613370afb655d55891a175944db77266e1f66022dfda347416f579b87806f12e22c20e8e43c3756b13425d58ce7f1c093bfcc3e691ca76633c7215d0bed991e0691eb29a2022c2e5b0c1d83a0b4e5201bc497fd62055e0942394da5c03ba7c0043f7120daba1dd52814cbdd54baff01265b76187fdfab93d0682c9991e4191a2df76f44e5699ab973241e080089517b800110281b8a38692e47b5d66a752625b1776d2ea7f2107c2cc77c90e6a74e21282b5d991b5eeca144298628bc5e4c56bb34f234c3c3ffe14fbd68bfd12038bf657bf3883535e37fe657684cf6346) link.

In [2]:
# Load sample data
cars = sns.load_dataset('mpg')
global_temperatures = pd.read_csv('data/global_temperatures_by_coutnry.csv',
                                  index_col=[0],
                                  parse_dates=['dt'])

The first dataset is about cars and contains data about their engines, models, etc. The second dataset gives average temperatures by country from 1743 to 2013.

### Basic Exploration

In [3]:
cars.head()
cars.info()
cars.describe()

,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year,origin,name
0,18.0,8,307.0,130.0,3504,12.0,70,usa,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693,11.5,70,usa,buick skylark 320
2,18.0,8,318.0,150.0,3436,11.0,70,usa,plymouth satellite
3,16.0,8,304.0,150.0,3433,12.0,70,usa,amc rebel sst
4,17.0,8,302.0,140.0,3449,10.5,70,usa,ford torino


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64  
 2   displacement  398 non-null    float64
 3   horsepower    392 non-null    float64
 4   weight        398 non-null    int64  
 5   acceleration  398 non-null    float64
 6   model_year    398 non-null    int64  
 7   origin        398 non-null    object 
 8   name          398 non-null    object 
dtypes: float64(4), int64(3), object(2)
memory usage: 28.1+ KB


,mpg,cylinders,displacement,horsepower,weight,acceleration,model_year
count,398.0,398.0,398.0,392.0,398.0,398.0,398.0
mean,23.514573,5.454774,193.425879,104.469388,2970.424623,15.56809,76.01005
std,7.815984,1.701004,104.269838,38.49116,846.841774,2.757689,3.697627
min,9.0,3.0,68.0,46.0,1613.0,8.0,70.0
25%,17.5,4.0,104.25,75.0,2223.75,13.825,73.0
50%,23.0,4.0,148.5,93.5,2803.5,15.5,76.0
75%,29.0,8.0,262.0,126.0,3608.0,17.175,79.0
max,46.6,8.0,455.0,230.0,5140.0,24.8,82.0


In [4]:
global_temperatures.head()
global_temperatures.info()

dt,AverageTemperature,AverageTemperatureUncertainty,Country
1743-11-01,4.384,2.294,Åland
1743-12-01,,,Åland
1744-01-01,,,Åland
1744-02-01,,,Åland
1744-03-01,,,Åland


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 577462 entries, 1743-11-01 to 2013-09-01
Data columns (total 3 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   AverageTemperature             544811 non-null  float64
 1   AverageTemperatureUncertainty  545550 non-null  float64
 2   Country                        577462 non-null  object 
dtypes: float64(2), object(1)
memory usage: 17.6+ MB


There are some null values in both datasets. Since we are not doing any serious analysis, we can safely drop them.

In [5]:
cars.dropna(inplace=True)
global_temperatures.dropna(inplace=True)
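
Before dropping rows, it can be worth checking how much data is actually lost. The sketch below uses a small made-up frame (the name `demo` and its values are placeholders) rather than the real datasets. It shows `isna().sum()` for counting missing values per column and compares the row count before and after `dropna()`.

```python
import numpy as np
import pandas as pd

# Placeholder frame standing in for a dataset with a few missing values
demo = pd.DataFrame({
    "mpg": [18.0, 15.0, np.nan, 16.0],
    "horsepower": [130.0, np.nan, 150.0, 150.0],
    "origin": ["usa", "usa", "japan", "europe"],
})

# Count missing values per column
missing_per_column = demo.isna().sum()
print(missing_per_column)

# dropna() removes every row that has at least one missing value
cleaned = demo.dropna()
print(f"rows before: {len(demo)}, rows after: {len(cleaned)}")
```

For the `cars` data, the `info()` output above shows that only `horsepower` has missing entries (392 of 398 non-null), so dropping costs just 6 rows.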

> __Pro Tip__: Make your dataset as tidy as possible for Seaborn to perform well. Ensure that each row is an observation and each column is a single variable.
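
To make the tip concrete, here is a small sketch of turning a "wide" table into tidy (long) form with `DataFrame.melt`. The frame, its column names, and the numbers are invented for illustration only.

```python
import pandas as pd

# Wide layout: one column per year, so the 'year' variable is hidden in the column names
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2012": [10.5, 20.1],
    "2013": [10.9, 20.4],
})

# Tidy layout: each row is one observation (country, year, temperature)
tidy = wide.melt(id_vars="country", var_name="year", value_name="temperature")
tidy["year"] = tidy["year"].astype(int)
print(tidy)
```

In the tidy form, `year` and `temperature` are ordinary columns, so they can be passed by name as `x` and `y` to a Seaborn plotting function.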

### Scatter plots with `relplot`

Let's get started with the `cars` dataset. We