# Lesson: Interpreting Line Graphs & Trendlines

Line graphs are one of the most powerful tools in data science for showing **trends over time**. In this lesson, we'll introduce a new concept: the **1st Order Fit** (also known as a **Trendline**).

By adding a trendline, we can see the general direction of a player's career, making it much easier to spot **outliers** (seasons that are unusually high or low) and **clusters** (periods of steady performance).

In [None]:
import pandas as pd
import plotly.express as px
# Note: the trendline="ols" option of px.scatter requires the 'statsmodels' package to be installed

## 1. Loading the Data

We'll use career scoring trends for WNBA legends **Diana Taurasi** and **DeWanna Bonner**.

In [None]:
url = "https://raw.githubusercontent.com/Data-Dunkers/data/main/WNBA/player/wnba_player_stats_all.csv"
df = pd.read_csv(url)
df.head()

## 2. Filtering and Cleaning

We'll filter for our two players and clean up the year data.

In [None]:
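players = ["Diana Taurasi", "DeWanna Bonner"]
# Keep only the rows for our two players; .copy() avoids a SettingWithCopyWarning below
df_filtered = df[df['Name'].isin(players)].copy()
# Drop rows with a missing year, then store the year as a whole number
df_filtered = df_filtered.dropna(subset=['Year'])
df_filtered['Year'] = df_filtered['Year'].astype(int)
# Sort so each player's seasons appear in chronological order
df_filtered = df_filtered.sort_values(['Name', 'Year'])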

## 3. Creating the Graph with a Trendline

We'll use `px.scatter` with the `trendline="ols"` parameter. OLS stands for "Ordinary Least Squares," which is a mathematical way to find the straight line that best fits the data points.

In [None]:
fig = px.scatter(df_filtered, x="Year", y="PTS", color="Name",
                 title="WNBA Career Scoring: Actual Data vs. Trendlines",
                 labels={"PTS": "Points Per Game (PPG)", "Year": "Season Year"},
                 trendline="ols")

fig.show()
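
To make "best fits" concrete, here is a minimal sketch of how a least-squares slope and intercept can be computed by hand for one player. The helper name `fit_line` and the numbers are made up for illustration (they are not real WNBA statistics), and Plotly relies on `statsmodels` internally rather than on this exact code.

```python
# Toy numbers for illustration only; these are NOT real WNBA statistics.
seasons = [2010, 2011, 2012, 2013, 2014]
ppg = [12.0, 14.5, 13.0, 16.0, 17.5]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies on its own
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The best-fit line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = fit_line(seasons, ppg)
print(f"slope: {slope:.2f} points per season")
```

A positive slope corresponds to a trendline that rises from left to right; a negative slope corresponds to one that falls.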

## 4. What is a 1st Order Fit?

The straight solid lines you see on the graph are **1st Order Fits**, one for each player. Each line smooths out the season-to-season "ups and downs" to show the overall career trajectory.
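
In symbols, and as a sketch only (the letters $m$, $b$, $x_i$, and $y_i$ are notation chosen here, not names from the lesson's code), "1st order" means the highest power of $x$ in the formula is 1, which is what makes the line straight:

$$\hat{y} = m x + b$$

Ordinary least squares picks the slope $m$ and intercept $b$ that make the total squared vertical distance between the data points and the line as small as possible:

$$\min_{m,\,b} \sum_{i} \bigl(y_i - (m x_i + b)\bigr)^2$$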

### Why use a Trendline?
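1. **Spotting Trends**: If the trendline slopes upward, the player's scoring is generally increasing over time; if it slopes downward, scoring is generally declining. If it's flat, there is no overall upward or downward change. A flat line alone does not prove consistency: the points also need to sit close to the line.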
2. **Finding Outliers**: Look for data points that are very far above or below the trendline. These are "outlier" seasons: perhaps a year where the player had a spectacular injury-free run, or a year where they were playing through pain.
3. **Identifying Clusters**: Do you see a group of 3-4 points that are all sitting right on or near the line? That's a "cluster" of consistent performance.
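
The "far from the line" and "near the line" ideas in the list above can be measured with residuals: the actual value minus the value the trendline predicts. The sketch below is one possible way to do this, using made-up numbers (not real WNBA statistics) and a hypothetical helper named `fit_line`.

```python
# Toy numbers for illustration only; these are NOT real WNBA statistics.
seasons = [1, 2, 3, 4, 5, 6]
ppg = [10.0, 11.0, 12.0, 20.0, 14.0, 15.0]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

slope, intercept = fit_line(seasons, ppg)

# Residual = actual value minus the value the trendline predicts for that season
residuals = [y - (slope * x + intercept) for x, y in zip(seasons, ppg)]

# The outlier candidate is the season whose point sits farthest from the line
outlier_index = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
outlier_season = seasons[outlier_index]
print("Season farthest from the trendline:", outlier_season)
```

Seasons with small residuals sit close to the line, so a run of neighbouring seasons with small residuals is what a "cluster" looks like in numbers.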

## Reflection Questions

1. **Look at the trendlines for both players. Whose career shows a more positive (upward) trend over time?**
2. **Identify one specific season point that is an "outlier" (farthest from its trendline). Which player and which year?**
3. **If a player has a very flat trendline, what does that tell you about their consistency as a scorer?**