# Graphs with Python. 4
## Scatter Plots

Seaborn API - Scatter Plots

https://seaborn.pydata.org/generated/seaborn.scatterplot.html

Scatter plots are useful for understanding the relationship between two continuous variables.

A third, categorical variable can be added, either through color or by switching to a different type of plot.

Recall that bar charts showed the relationship between a numerical variable and a categorical one.


In [24]:
# Summary:
# - scatterplot (scatter plot)
# - regplot (linear regression line)
# - lmplot (scatterplot with categorical variable and linear regression lines)
# - swarmplot (categorical scatterplot)

In [None]:
# Example taken from the Kaggle data visualization tutorials
# https://www.kaggle.com/learn/data-visualization

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Dataset with insurance data
df_seguros = pd.read_csv("../datasets/insurance.csv")

In [None]:
df_seguros.head()

In [None]:
# scatter plot of charges (what they pay) vs. body mass index (bmi)
sns.scatterplot(x=df_seguros.bmi, y=df_seguros.charges)

In [None]:
# The plot suggests a positive correlation:
# a higher bmi tends to go with higher charges
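
The visual impression can be backed by a number. The sketch below computes the Pearson correlation coefficient by hand; the function name `pearson_r` and the toy values are assumptions made for illustration and are not taken from `insurance.csv`.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values, NOT real rows from the dataset
bmi = [22.0, 25.5, 28.1, 31.4, 35.0]
charges = [3000.0, 4200.0, 3900.0, 6100.0, 7000.0]

r = pearson_r(bmi, charges)
print(round(r, 3))  # a value near +1 means a strong positive linear relation
```

With the DataFrame loaded earlier, `df_seguros['bmi'].corr(df_seguros['charges'])` should give the same kind of coefficient directly, since pandas uses Pearson correlation by default.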

In [None]:
# A linear regression line can be added to check the trend

In [None]:
sns.regplot(x=df_seguros.bmi, y=df_seguros.charges)

In [None]:
# The regression line shows a slight upward trend as the bmi increases.
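
To make the idea of a regression line concrete, here is a minimal sketch of a least-squares straight-line fit, which is, as far as I know, what `regplot` draws with its default settings. The helper `fit_line` and the toy values are hypothetical and only illustrate the calculation.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross deviations / sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy values, NOT real rows from the dataset
bmi = [20.0, 25.0, 30.0, 35.0, 40.0]
charges = [9000.0, 12500.0, 11000.0, 15500.0, 14000.0]

slope, intercept = fit_line(bmi, charges)
print(f"charges = {slope:.1f} * bmi + {intercept:.1f}")
```

A positive slope corresponds to the upward trend seen in the plot; its size says how much charges change, on average, per unit of bmi.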

In [None]:
# Add a third variable using color
sns.scatterplot(x=df_seguros['bmi'], y=df_seguros['charges'], hue=df_seguros['smoker'])

In [None]:
# Two groups are identified quite clearly
# Among non-smokers, charges show little relation to bmi
# Among smokers, the relationship looks much more pronounced

In [None]:
sns.lmplot(x="bmi", y="charges", hue="smoker", data=df_seguros)

In [None]:
# The regression shows different relationships for the different groups.
# Costs for non-smokers hardly increase as their bmi increases
# Costs for smokers skyrocket as their bmi rises

In [None]:
# Categorical scatter plot
# The previous plot focused on the relationship between two numerical variables and added a categorical one.
# The swarmplot, in contrast, focuses on the differences between the categories.

In [None]:
sns.swarmplot(data=df_seguros, x='charges')

In [None]:
# It works like a distribution plot, but every observation is visible as an individual point

In [None]:
# The figure size and the point size can be adjusted to make the plot easier to read.

In [None]:
plt.figure(figsize=(10, 5))
sns.swarmplot(data=df_seguros, x='charges', size=3)


In [None]:
plt.figure(figsize=(10, 5))
sns.swarmplot(data=df_seguros, x='charges', y='smoker', size=3)

In [None]:
# After adding the category 'smoker' we can see two different distributions
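
The difference between the two distributions can be summarized with one number per category. This is a minimal sketch using the median; the rows are hypothetical and only stand in for the real `smoker` and `charges` columns.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (smoker, charges), NOT real rows from the dataset
rows = [
    ("no", 2100.0),
    ("no", 4500.0),
    ("no", 8200.0),
    ("yes", 17500.0),
    ("yes", 33000.0),
    ("yes", 41000.0),
]

# Collect the charges of each category
by_group = defaultdict(list)
for smoker, charges in rows:
    by_group[smoker].append(charges)

# One summary value per category
medians = {group: median(values) for group, values in by_group.items()}
print(medians)
```

With the real data, `df_seguros.groupby('smoker')['charges'].median()` should produce the equivalent summary in a single line.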

In [None]:
plt.figure(figsize=(10, 5))
sns.swarmplot(data=df_seguros, x='charges', y='smoker', hue='sex', size=3)

In [None]:
# At first glance, sex doesn't seem to make much difference to charges

In [None]:
plt.figure(figsize=(10, 5))
sns.swarmplot(data=df_seguros, x='charges', y='sex', hue='smoker', size=3)

In [None]:
# This plot leads to the same conclusion, but shows it more clearly