# Scatter Plots
- A scatter plot shows the relationship between two variables in a data set. Each observation is drawn as a point on a two-dimensional Cartesian plane. The independent variable (or attribute) is plotted on the X-axis, while the dependent variable is plotted on the Y-axis.
- A scatter plot is also called a scatter graph, scatter diagram, scatter chart, scattergram, or XY graph. It graphs pairs of numerical data, with one variable on each axis, to show their relationship. So when should you use a scatter plot?

**Scatter plots are useful in any of the following situations.**
- When we have paired numerical data
- When there are multiple values of the dependent variable for a single value of the independent variable
- When we want to determine whether two variables are related, for example when identifying potential root causes of a problem or checking whether two outcomes that appear to be related share the same cause.

**We can often judge the relationship between two variables just by looking at a scatter plot.**


![interpreting-a-scatter-plot.jpg](attachment:interpreting-a-scatter-plot.jpg)
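
To get a feel for what different relationships look like on a scatter plot, here is a minimal sketch using synthetic data. Nothing here comes from a real data set: the seed, the slopes, the noise level, and the panel names are all arbitrary choices made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data for illustration only; seed, slopes, and noise level are arbitrary.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
noise = rng.normal(0, 2, size=x.size)

patterns = {
    'Positive relationship': 2 * x + noise,    # y tends to rise as x rises
    'Negative relationship': -2 * x + noise,   # y tends to fall as x rises
    'No clear relationship': rng.normal(0, 2, size=x.size),  # y ignores x
}

# One panel per pattern, side by side
fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
for ax, (name, y) in zip(axes, patterns.items()):
    ax.scatter(x, y, s=10)
    ax.set_title(name)
    ax.set_xlabel('x')
    ax.set_ylabel('y')
fig.tight_layout()
```

Points that climb from lower left to upper right suggest a positive relationship, points that fall suggest a negative one, and a shapeless cloud suggests little or no linear relationship.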

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
# plt.rcParams['figure.figsize'] = (7,5)
plt.rcParams['figure.dpi'] = 100

In [None]:
x = np.linspace(1,100,200)
y = np.random.random_sample(200)

plt.scatter(x,y)

In [None]:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x,y)
plt.xlabel('Age of the car')
plt.ylabel('Speed of the car')
plt.title('COMPARISON BETWEEN CAR AGE AND SPEED OF THE CAR')

- The observations in the example above come from 13 cars passing by.
- The X-axis shows how old the car is.
- The Y-axis shows the speed of the car when it passes.
- Are there any relationships between the observations?
- It seems that the newer the car, the faster it drives, but that could be a coincidence. After all, we only registered 13 cars.
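
One way to put a number on the visual impression is Pearson's correlation coefficient. This is a sketch of one possible check, not something the plot itself requires: the choice of Pearson's r and the variable names `age` and `speed` are assumptions introduced here. With only 13 observations, the result should be read as a rough indication rather than proof.

```python
import numpy as np

# Same 13 observations as in the scatter plot above
age = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
speed = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

# np.corrcoef returns a 2x2 correlation matrix;
# the off-diagonal entry is Pearson's r between the two arrays.
r = np.corrcoef(age, speed)[0, 1]
print(f"Pearson r = {r:.2f}")
```

A value close to -1 means speed tends to fall as age rises, a value close to +1 means the opposite, and a value near 0 means there is no clear linear relationship.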

# Compare Plots
- In the example above, there seems to be a relationship between speed and age, but what if we plot the observations from another day as well? Will the scatter plot tell us something else?

In [None]:
#day one, the age and speed of 13 cars:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)

#day two, the age and speed of 15 cars:
x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y)
plt.xlabel('Age of the car')
plt.ylabel('Speed of the car')
plt.title('COMPARISON BETWEEN CAR AGE AND SPEED OF THE CAR')

**YOU CAN CHANGE THE COLOR OF POINTS**

In [None]:
#day one, the age and speed of 13 cars:
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y, color='c')

#day two, the age and speed of 15 cars:
x = np.array([2,2,8,1,15,8,12,9,7,3,11,4,7,14,12])
y = np.array([100,105,84,105,90,99,90,95,94,100,79,112,91,80,85])
plt.scatter(x, y,color = 'b')
plt.xlabel('Age of the car')
plt.ylabel('Speed of the car')
plt.title('COMPARISON BETWEEN CAR AGE AND SPEED OF THE CAR')
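
Different colors only help if the reader knows which color belongs to which day. A minimal sketch of one way to do that uses the `label` argument of `scatter` together with `legend()`. The label texts and variable names below are illustrative choices, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Day one: age and speed of 13 cars
day_one_age = np.array([5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6])
day_one_speed = np.array([99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86])

# Day two: age and speed of 15 cars
day_two_age = np.array([2, 2, 8, 1, 15, 8, 12, 9, 7, 3, 11, 4, 7, 14, 12])
day_two_speed = np.array([100, 105, 84, 105, 90, 99, 90, 95, 94, 100, 79, 112, 91, 80, 85])

fig, ax = plt.subplots()
# label= gives each series a name that the legend picks up
ax.scatter(day_one_age, day_one_speed, color='c', label='Day one (13 cars)')
ax.scatter(day_two_age, day_two_speed, color='b', label='Day two (15 cars)')
ax.set_xlabel('Age of the car')
ax.set_ylabel('Speed of the car')
legend = ax.legend()
```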

**YOU CAN ALSO SET A DIFFERENT COLOR FOR EACH DATA POINT**

In [None]:
a = [1,2,3,4,5,6,7,8,9]
b = [100,200,300,400,500,600,700,800,900]
colors = np.array(["red","green","blue","yellow","pink","black","orange","purple","cyan"])
plt.scatter(a,b, c= colors)
plt.xlabel('Age of the car')
plt.ylabel('Speed of the car')
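
Instead of naming a color for every point, `c` can also take an array of numbers that matplotlib maps onto a colormap. The sketch below is one possible variation: the `values` array and the choice of the `viridis` colormap are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900])

# Hypothetical per-point values to encode as color (here simply 0, 10, ..., 80)
values = np.arange(0, 90, 10)

fig, ax = plt.subplots()
# Each number in `values` is mapped to a color from the 'viridis' colormap
sc = ax.scatter(a, b, c=values, cmap='viridis')
# The colorbar shows which color corresponds to which value
fig.colorbar(sc, ax=ax, label='value')
```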

In [None]:
import seaborn as sns

In [None]:
df = pd.read_csv('Automobile_data.csv', nrows=100)
# Create the figure before plotting; calling plt.figure() afterwards opens a new, empty figure
plt.figure(figsize=(20,10))
sns.scatterplot(data=df, x='engine-size', y='width', hue='body-style')