<center> <img src="https://yildirimcaglar.github.io/ds3000/ds3000.png"> </center>

<center> <h2>Scatter Plots</h2></center>

## Outline
1. <a href='#1'>Scatter Plots</a>
2. <a href='#2'>Plotting Two Groups in One Scatter Plot</a>
3. <a href='#3'>Scatter Plot Properties</a>
4. <a href='#4'>Bubble Plots</a>

<a id="1"></a>

## 1. Scatter Plots
* Commonly used to show the joint distribution of two variables as a cloud of points, where each point represents one observation in the dataset
* Makes it easier to discern the relationship between the two variables (for example, whether higher values of one tend to go with higher values of the other)
* Both axes should represent numeric variables
* Use the **sns.scatterplot()** function to draw scatter plots
* https://seaborn.pydata.org/generated/seaborn.scatterplot.html
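As a minimal, self-contained sketch (the DataFrame below is small made-up data, not the course's `ave_grades.csv`), `sns.scatterplot()` draws one point per row and returns the matplotlib `Axes` it drew on. The plotted coordinates can be read back from that `Axes` to confirm that each observation became exactly one point.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: each row is one observation
toy = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1],
})

fig, ax = plt.subplots()
# scatterplot returns the Axes object it drew on
ax = sns.scatterplot(x="x", y="y", data=toy, ax=ax)

# The points are stored in a matplotlib PathCollection;
# get_offsets() returns one (x, y) pair per plotted point
points = ax.collections[0].get_offsets()
print(len(points))  # 5
```

Seaborn also labels the axes with the column names (`x` and `y` here), which the notebook cells overwrite with `set_xlabel()` and `set_ylabel()`.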

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
%matplotlib inline

In [None]:
import pandas as pd
df = pd.read_csv("res/ave_grades.csv")

In [None]:
df

In [None]:
df.head()

In [None]:
graph = sns.scatterplot(x="Potion_Ave", y="Charm_Ave", data=df)

In [None]:
import seaborn as sns

# create and display the scatter plot
graph = sns.scatterplot(x="Potion_Ave", y="Charm_Ave", data=df)

# specify the title
title = "The Relationship between Potion and Charm Grades"

# set the title of the plot
graph.set_title(title, size=16)

# add labels to the axes
graph.set_xlabel("Average Potion Grades", size=16)
graph.set_ylabel("Average Charm Grades", size=16)
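Every plotting cell in this notebook repeats the same three calls to set the title and axis labels. One way to avoid that repetition is a small helper; `label_plot` below is a hypothetical name introduced only for illustration (it is not part of seaborn or matplotlib), and it simply wraps the `set_title`, `set_xlabel`, and `set_ylabel` calls used above. The grades are made-up stand-ins for `res/ave_grades.csv`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def label_plot(ax, title, xlabel, ylabel, size=16):
    """Hypothetical helper: set the title and both axis labels of an Axes in one call."""
    ax.set_title(title, size=size)
    ax.set_xlabel(xlabel, size=size)
    ax.set_ylabel(ylabel, size=size)
    return ax


# Made-up grades standing in for res/ave_grades.csv
grades = pd.DataFrame({"Potion_Ave": [72, 85, 90, 64], "Charm_Ave": [70, 88, 93, 60]})

fig, ax = plt.subplots()
graph = sns.scatterplot(x="Potion_Ave", y="Charm_Ave", data=grades, ax=ax)
label_plot(graph, "The Relationship between Potion and Charm Grades",
           "Average Potion Grades", "Average Charm Grades")
```

Returning the `Axes` from the helper keeps it chainable, so further customization can continue on the same object.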

<a id="2"></a>

## 2. Plotting Two Groups in One Scatter Plot
* The data file should have at least three columns:
    * One for the values on the x-axis
    * One for the values on the y-axis
    * One for the category of each point (the grouping column)

<br>

* Specify the category column, or variable, using the **hue** keyword argument
    * **hue** names the grouping variable; points from each group are drawn in a different color.
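A minimal sketch of this data layout, using a made-up long-format DataFrame (the house names and scores are illustrative, not taken from `ave_grades.csv`): each row carries its x value, its y value, and its group label, and `hue` maps each group to one palette color. For text labels, seaborn assigns the palette colors to groups in their order of appearance unless `hue_order` is given.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical long-format data: one row per student, with a grouping column
students = pd.DataFrame({
    "Potion_Ave": [70, 82, 65, 90, 77, 88],
    "Charm_Ave": [72, 80, 60, 95, 74, 85],
    "House": ["Gryffindor", "Slytherin", "Gryffindor",
              "Ravenclaw", "Slytherin", "Ravenclaw"],
})

fig, ax = plt.subplots()
graph = sns.scatterplot(x="Potion_Ave", y="Charm_Ave", hue="House", data=students,
                        palette=["darkred", "darkgreen", "darkblue"], ax=ax)

# Each point takes the color of its group, so the number of distinct
# face colors matches the number of groups in the "House" column
colors = graph.collections[0].get_facecolors()
print(len(np.unique(colors, axis=0)), students["House"].nunique())
```

Supplying exactly one palette color per group keeps the mapping unambiguous.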

In [None]:
df.head()

In [None]:
import seaborn as sns

# create and display the scatter plot, coloring points by house
graph = sns.scatterplot(x="Potion_Ave", y="Charm_Ave", hue="House",
                        data=df, palette=["darkred", "gold", "darkblue", "darkgreen"])

# specify the title
title = "The Relationship between Potion and Charm Grades"

# set the title of the plot
graph.set_title(title, size=16)

# add labels to the axes
graph.set_xlabel("Average Potion Grades", size=16)
graph.set_ylabel("Average Charm Grades", size=16)

# move the legend to the lower right corner
plt.legend(loc="lower right")

<a id="3"></a>

## 3. Scatter Plot Properties

### 3.1. Changing Markers
* Use the **marker** keyword to change the marker to a different symbol
* The default marker is a filled circle (**"o"**)
* Full list of markers:
    * https://matplotlib.org/3.1.1/api/markers_api.html
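To compare a few marker symbols side by side, the sketch below draws the same made-up points once per marker code. The codes `"o"`, `"s"`, `"^"`, and `"*"` (circle, square, triangle up, star) come from the matplotlib marker list linked above; the data and the marker selection are illustrative only.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data
toy = pd.DataFrame({"x": [1, 2, 3, 4], "y": [3, 1, 4, 2]})
markers = {"o": "circle", "s": "square", "^": "triangle up", "*": "star"}

fig, axes = plt.subplots(1, len(markers), figsize=(12, 3))
for ax, (code, name) in zip(axes, markers.items()):
    # Same data in every panel; only the marker keyword changes
    sns.scatterplot(x="x", y="y", data=toy, marker=code, s=100, ax=ax)
    ax.set_title(f'marker="{code}" ({name})')
fig.tight_layout()
```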



In [None]:
import seaborn as sns

# create and display the scatter plot with star-shaped markers
graph = sns.scatterplot(x="Potion_Ave", y="Charm_Ave", data=df, marker="*", color="darkgreen")

# specify the title
title = "The Relationship between Potion and Charm Grades"

# set the title of the plot
graph.set_title(title, size=16)

# add labels to the axes
graph.set_xlabel("Average Potion Grades", size=16)
graph.set_ylabel("Average Charm Grades", size=16)


### 3.2. Changing Marker Size
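* Use the **s** keyword to specify the size of markers (the marker area, in points squared)
* Use the **alpha** keyword (from 0 = fully transparent to 1 = opaque) to make markers semi-transparent so that overlapping points remain visible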
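Because seaborn passes `s` on to matplotlib, which treats it as an area in points squared, a marker's width grows only with the square root of `s`: `s=100` gives a marker roughly 10 points across, and quadrupling it to 400 roughly doubles the width. The sketch below, on made-up overlapping points, also reads the `s` and `alpha` values back from the drawn collection.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with two clusters of overlapping points
toy = pd.DataFrame({"x": [1, 1.1, 1.2, 3, 3.05], "y": [2, 2.1, 1.9, 4, 4.02]})

fig, ax = plt.subplots()
graph = sns.scatterplot(x="x", y="y", data=toy, marker="o", s=100,
                        color="darkgreen", alpha=0.5, ax=ax)

points = graph.collections[0]
print(points.get_sizes())  # marker area in points**2, shared by every marker
print(points.get_alpha())  # semi-transparent: overlaps appear darker
print(math.sqrt(100), math.sqrt(400))  # approximate widths in points: 10.0 20.0
```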

In [None]:
import seaborn as sns

# create and display the scatter plot with larger, semi-transparent markers
graph = sns.scatterplot(x="Potion_Ave", y="Charm_Ave", data=df,
                        marker="o", s=100, color="darkgreen", alpha=0.5)

# specify the title
title = "The Relationship between Potion and Charm Grades"

# set the title of the plot
graph.set_title(title, size=16)

# add labels to the axes
graph.set_xlabel("Average Potion Grades", size=16)
graph.set_ylabel("Average Charm Grades", size=16)

<a id="4"></a>

## 4. Bubble Plots
* A variation of a scatter plot in which the data points are drawn as bubbles of varying size
* The size of each bubble encodes a third variable of the data
* Use the **size** keyword to specify the column that determines the size of the bubbles
* Use **sizes** to set the minimum and maximum bubble size as a tuple, e.g. **sizes=(100, 2000)**
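A minimal sketch of how `size` and `sizes` interact, on a made-up table (the numbers are illustrative only, not from `life_income.csv`). With a numeric `size` column, seaborn by default rescales the values linearly, so the smallest value gets the first number in `sizes`, the largest gets the second, and values in between are interpolated; the sketch recomputes that rescaling by hand and compares it with the marker areas seaborn actually drew.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: NOT real income, life-expectancy, or population figures
toy = pd.DataFrame({
    "Income": [1_000, 5_000, 20_000, 45_000],
    "LifeExpectancy": [58, 66, 74, 81],
    "Population": [2, 10, 50, 200],  # made-up values
})

fig, ax = plt.subplots()
graph = sns.scatterplot(x="Income", y="LifeExpectancy", size="Population", data=toy,
                        sizes=(100, 2000), legend=False, alpha=0.5, ax=ax)

# Marker areas that seaborn assigned, one per row
drawn = graph.collections[0].get_sizes()

# Linear rescaling of Population onto the (100, 2000) range, computed by hand
pop = toy["Population"].to_numpy(dtype=float)
expected = 100 + (pop - pop.min()) / (pop.max() - pop.min()) * (2000 - 100)
print(drawn.min(), drawn.max())
print(np.allclose(np.sort(drawn), np.sort(expected)))
```

Because the mapped values are areas, a country with ten times the population does not get a bubble ten times as wide.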

In [None]:
world_data = pd.read_csv("res/life_income.csv")

In [None]:
world_data.head()

In [None]:
import seaborn as sns

# create and display the scatter plot
graph = sns.scatterplot(x="Income", y="LifeExpectancy", data=world_data, s=100, alpha=0.5)

# specify the title
title = "Relationship between Income (GDP per capita) and Life Expectancy"

# set the title of the plot
graph.set_title(title, size=16)

# add labels to the axes
graph.set_xlabel("Income", size=16)
graph.set_ylabel("Life Expectancy", size=16)

In [None]:
import seaborn as sns

# create and display the bubble plot: marker size encodes Population
graph = sns.scatterplot(x="Income", y="LifeExpectancy", size="Population", data=world_data,
                        sizes=(100, 2000), legend=False, color="darkred", alpha=0.5)

# specify the title
title = "Relationship between Income (GDP per capita) and Life Expectancy"

# set the title of the plot
graph.set_title(title, size=16)

# add labels to the axes
graph.set_xlabel("Income", size=16)
graph.set_ylabel("Life Expectancy", size=16)