# Data Correlations

In this notebook we will look for [correlations](https://en.wikipedia.org/wiki/Correlation) in NBA player statistics from the 2024-2025 season. We'll import the data and then use Plotly Express to display a coloured representation of the correlation [matrix](https://en.wikipedia.org/wiki/Matrix_(mathematics)).

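```python
import pandas as pd
import plotly.express as px
# Load the 2024-2025 NBA player statistics from the Data-Dunkers repository
df = pd.read_csv('https://raw.githubusercontent.com/Data-Dunkers/data/refs/heads/main/NBA/player/nba_player_stats_2024-2025.csv')
# Correlate every pair of numeric columns and draw the result as a heatmap with two-decimal labels
fig = px.imshow(df.corr(numeric_only=True), title='Correlation Matrix of 2024-2025 NBA Player Stats', height=800, text_auto='.2f')
fig.show()
```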
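By default, pandas' `DataFrame.corr` computes the Pearson correlation coefficient: the covariance of two columns divided by the product of their standard deviations, which always lands between -1 and 1. The sketch below checks this on a small made-up table (the numbers are illustrative only, not real NBA stats) by computing r for one pair by hand with a hypothetical helper, `pearson_r`, and comparing it with the value pandas puts in the matrix.

```python
import numpy as np
import pandas as pd

# Made-up per-game numbers for five hypothetical players (not real data)
toy = pd.DataFrame({
    'MIN': [12.0, 20.5, 28.0, 33.5, 36.0],
    'PTS': [4.1, 8.3, 13.0, 19.8, 24.2],
})

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations.

    The 1/(n-1) factors in the covariance and the standard deviations cancel,
    so plain sums of deviations are enough here.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

by_hand = pearson_r(toy['MIN'], toy['PTS'])
from_pandas = toy.corr().loc['MIN', 'PTS']  # Pearson is the default method
print(round(by_hand, 4), round(from_pandas, 4))
```

The two numbers should agree up to floating-point rounding; passing `method='spearman'` or `method='kendall'` to `corr` would switch to a rank-based correlation instead.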

Hover your mouse over a square to see the two variable names and their correlation coefficient, which ranges from -1 to 1.
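Hovering works well for a handful of squares. To scan the whole matrix at once, one option is to flatten it into a list of variable pairs ranked by the strength of their correlation. The helper below, `strongest_pairs`, is a hypothetical name (not part of pandas or Plotly); it lists each unordered pair once and skips the diagonal. It is shown on a made-up table, and in the notebook you would pass `df.corr(numeric_only=True)` instead.

```python
from itertools import combinations

import pandas as pd

def strongest_pairs(corr, n=5):
    """Return the n variable pairs with the largest absolute correlation.

    Each unordered pair appears once, and self-pairs on the diagonal are skipped.
    """
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs = pd.DataFrame(rows, columns=['var1', 'var2', 'r'])
    # Rank by absolute value so strong negative correlations rank alongside strong positive ones
    pairs = pairs.sort_values('r', key=lambda s: s.abs(), ascending=False)
    return pairs.head(n).reset_index(drop=True)

# Made-up numbers for illustration only (not real NBA stats)
toy = pd.DataFrame({
    'MIN': [12.0, 20.5, 28.0, 33.5, 36.0],
    'PTS': [4.1, 8.3, 13.0, 19.8, 24.2],
    'FT%': [0.81, 0.66, 0.79, 0.74, 0.70],
})
top = strongest_pairs(toy.corr(), n=3)
print(top)
```

Keeping the sign in the `r` column while sorting on its absolute value makes it easy to spot both strongly positive and strongly negative pairs in one list.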

## Questions

1. Why is there a diagonal line with 1.00 (100%) correlations?
2. Which variables are most strongly correlated with minutes played (`MIN`)? Why do you think that is?
3. What are some other variable pairs that are strongly correlated? Why might they be correlated?
4. What are some variable pairs that are negatively correlated? Why might that be?
5. Is there anything surprising about the correlations in the data?

## Column Reference

- **GP** — Games played
- **MIN** — Minutes per game
- **PTS** — Points per game

- **FGM / FGA / FG%** — Field goals made and attempted per game, and field goal percentage
- **3PM / 3PA / 3P%** — Three-pointers made and attempted per game, and three-point percentage
- **FTM / FTA / FT%** — Free throws made and attempted per game, and free throw percentage

- **REB** — Rebounds per game
- **AST** — Assists per game
- **STL** — Steals per game
- **BLK** — Blocks per game
- **TO** — Turnovers per game
- **DD2** — [double-doubles](https://en.wikipedia.org/wiki/Double-double): number of games in which the player reached double digits (10 or more) in *two* of PTS, REB, AST, STL, BLK
- **TD3** — [triple-doubles](https://en.wikipedia.org/wiki/Double-double#Triple-double): number of games in which the player reached double digits (10 or more) in *three* of PTS, REB, AST, STL, BLK
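The double-double and triple-double definitions come down to counting how many of the five categories reach 10 in a single game. Here is a minimal sketch of that rule, assuming each game's counting stats are given as plain numbers. The function names and the game lines are made up for illustration. One further assumption: a triple-double game is counted toward both totals here, since it also meets the two-category rule, and it is not certain that the dataset's DD2 column does the same.

```python
# The five categories that count toward double-doubles and triple-doubles
CATEGORIES = ('PTS', 'REB', 'AST', 'STL', 'BLK')

def count_double_digit_stats(game):
    """Number of the five categories in which a single game reached 10 or more."""
    return sum(1 for stat in CATEGORIES if game.get(stat, 0) >= 10)

def is_double_double(game):
    return count_double_digit_stats(game) >= 2

def is_triple_double(game):
    return count_double_digit_stats(game) >= 3

# Hypothetical single-game box-score lines (not real games)
games = [
    {'PTS': 25, 'REB': 12, 'AST': 4, 'STL': 1, 'BLK': 0},   # double-double
    {'PTS': 18, 'REB': 11, 'AST': 10, 'STL': 2, 'BLK': 1},  # triple-double, also meets the double-double rule
    {'PTS': 31, 'REB': 6, 'AST': 7, 'STL': 3, 'BLK': 2},    # neither
]
dd2 = sum(is_double_double(g) for g in games)
td3 = sum(is_triple_double(g) for g in games)
print(dd2, td3)
```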