![Data Dunkers Banner](https://github.com/Data-Dunkers/lessons/blob/main/images/top-banner.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fdata-dunkers%2Flessons&branch=main&subPath=graphing-scatter-plots.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a><a href="https://colab.research.google.com/github/data-dunkers/lessons/blob/main/graphing-scatter-plots.ipynb" target="_parent"><img src="https://raw.githubusercontent.com/Data-Dunkers/lessons/main/images/open-in-colab-button.svg?sanitize=true" width="123" height="24" alt="Open in Colab"/></a>

# Creating Line Graphs and Scatter Plots

The corresponding Activity Notebook for this Lesson Notebook can be found [here](https://github.com/Data-Dunkers/activities/blob/main/graphing.ipynb).

## Objectives

Students will be able to:

- Analyze relationships between variables using scatter plots. *(Example: Create a scatter plot to examine the relationship between field goals attempted and field goals made by Pascal Siakam.)*
- Customize and enhance visual data analysis. *(Example: Customize a scatter plot by adding titles, labels, and adjusting point sizes for better clarity.)*
- Explore trends and patterns in data using visual tools like trendlines. *(Example: Add a trendline to a scatter plot to analyze how Pascal Siakam's field goal efficiency changes with more attempts.)*

## Getting and Filtering Data

We'll continue with the same Pascal Siakam data we have been using, importing it and then keeping only the seasons up to and including 2022-23.

In [None]:
import pandas as pd
url = 'https://raw.githubusercontent.com/Data-Dunkers/data/main/NBA/Pascal_Siakam.csv'
df = pd.read_csv(url)
df = df[df['SEASON_ID'] <= '2022-23']
df
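The filter above compares `SEASON_ID` values as text rather than as numbers. Because every season label has the same `YYYY-YY` shape, alphabetical order matches chronological order. A minimal sketch of this, using a small made-up table (the `seasons` and `kept` names are just for illustration):

```python
import pandas as pd

# Hypothetical season labels for illustration, not the real Siakam data
seasons = pd.DataFrame({'SEASON_ID': ['2016-17', '2021-22', '2022-23', '2023-24']})

# String comparison: keep rows whose label sorts at or before '2022-23'
kept = seasons[seasons['SEASON_ID'] <= '2022-23']
print(kept['SEASON_ID'].tolist())
```

Only the `'2023-24'` row is dropped, since it is the only label that sorts after `'2022-23'`.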

## Line Graphs

Previously we used `px.bar()` to create bar graphs with Plotly Express. In the same way, we can create [line graphs](https://plotly.com/python/line-charts/) with `px.line()`.

In [None]:
import plotly.express as px
px.line(df, 
        x='SEASON_ID', 
        y='FGM', 
        title='Siakam Field Goals by Season')

## Scatter Plots

[Scatter plots](https://plotly.com/python/line-and-scatter/) are similar to line graphs but without the connecting lines, so each row of data appears as a separate point.

In [None]:
px.scatter(df, 
           x='FGA', 
           y='FGM', 
           title='Siakam Field Goals Made versus Field Goal Attempts')

## Displaying More Data

We can **color**-code the points by another column in the data set. Let's keep the numeric columns on the axes and color the points by season.

In [None]:
px.scatter(df, 
           x='FGA', 
           y='FGM', 
           title='Siakam Field Goals Made versus Field Goal Attempts', 
           color='SEASON_ID')

We can also change the **size** of the data points to be proportional to one of the data columns.

In [None]:
px.scatter(df, 
           x='FGA', 
           y='FGM', 
           title='Siakam Field Goals Made versus Field Goal Attempts', 
           color='SEASON_ID', 
           size='AST')

## Trendlines

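To help us draw conclusions, we can add a line of best fit, which we call a trendline. We often use the [ordinary least squares](https://en.wikipedia.org/wiki/Ordinary_least_squares) (OLS) method to calculate the parameters of the trendline, meaning its slope and intercept. Plotly Express relies on the `statsmodels` package for OLS trendlines, so that package needs to be installed.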

In [None]:
px.scatter(df, 
           x='FGA', 
           y='FGM', 
           title='Siakam Field Goals Made versus Field Goal Attempts', 
           trendline='ols')
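To get a feel for what the trendline represents, here is a minimal sketch that fits a straight line by least squares with NumPy's `polyfit()`. This is a stand-in for illustration and not necessarily how Plotly computes its trendline internally; the `fga` and `fgm` arrays are made-up numbers.

```python
import numpy as np

# Made-up attempts and makes, chosen so the arithmetic is easy to follow
fga = np.array([100, 200, 300, 400])
fgm = np.array([50, 90, 150, 190])

# A degree-1 polynomial fit is a least-squares straight line
slope, intercept = np.polyfit(fga, fgm, 1)
print(f'FGM is roughly {slope:.2f} * FGA + {intercept:.2f}')
```

For these made-up numbers the slope works out to 0.48, which we can read as about 0.48 more field goals made for each additional attempt.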

## Exercise

Create a scatter plot with assists per game (`'AST'`) on the x-axis, points per game (`'PTS'`) on the y-axis, and `color='AGE'`. Include a trendline.

In [None]:
# Enter your program here!


---
Back to [Lessons](https://github.com/Data-Dunkers/lessons/blob/main/lessons.ipynb)