# A Comprehensive Analysis of NBA Player Salaries


## Introduction

This project analyzes the factors that determine NBA player salaries during the 2024-2025 season, investigating how statistics, age, and position influence compensation. We pay particular attention to the extreme salary disparity created by superstar players, who earn far more than the typical player. To understand which performance metrics teams value most, we compare traditional box score statistics against advanced all-in-one metrics. We fit several models: Ordinary Least Squares (OLS) regression as a baseline, Ridge and LASSO regression to handle correlated predictors, and a Random Forest to capture non-linear patterns. Combining regularized linear models with an ensemble method gives a fuller picture of salary determinants in professional basketball. Ultimately, this research aims to identify what drives NBA salary decisions and whether teams reward traditional statistics or advanced analytics more heavily when compensating players.
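
Why Ridge helps with correlated predictors can be illustrated with a small sketch. It uses NumPy and synthetic data only: the near-collinear "points" and "field goals" pair, the log-salary target, and the helper `fit_linear` are all our assumptions, not the project's data or code. With two almost perfectly correlated predictors, OLS coefficients can swing to large, offsetting values, while the ridge penalty `alpha` shrinks them toward a stable split.

```python
import numpy as np

def fit_linear(X, y, alpha=0.0):
    """Closed-form ridge regression on centered data; alpha=0 reduces to OLS.

    Hypothetical helper for illustration. The intercept is not penalized.
    """
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    p = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) beta = Xc^T yc
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    intercept = y_mean - X_mean @ beta
    return intercept, beta

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-ins (assumption): points per game is almost a linear
# function of field goals per game, as box-score stats often are.
fg = rng.uniform(1, 10, size=n)
pts = 2.2 * fg + rng.normal(0, 0.1, size=n)
X = np.column_stack([pts, fg])
# Synthetic log-salary target that depends on scoring plus noise (assumption)
log_salary = 14 + 0.1 * pts + rng.normal(0, 0.3, size=n)

for alpha in [0.0, 10.0]:
    b0, beta = fit_linear(X, log_salary, alpha)
    print(f"alpha={alpha:>5}: coefficients={np.round(beta, 3)}")
```

LASSO and the Random Forest are not shown here; scikit-learn's `Ridge`, `Lasso`, and `RandomForestRegressor` are common off-the-shelf choices, although this document does not state which library the project uses.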

## Data Description

Below is a snapshot of our data. Each row corresponds to one player and records that player's game statistics and salary. The features combine traditional box score statistics (Games Played, Field Goals, etc.) with advanced all-in-one metrics (PER, VORP, etc.). All data were scraped from Basketball-Reference, a third-party site that hosts NBA statistics across many seasons.
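
Column names such as `Rk_x` and `MP_x` in the snapshot are consistent with pandas' default merge suffixes (`_x`, `_y`), which suggests, though the document does not say so, that the dataset was built by merging tables that share columns like `Rk` and `MP`. The sketch below shows that mechanism on tiny hypothetical tables; the table names, player names, and all values are placeholders, not real statistics.

```python
import pandas as pd

# Hypothetical miniature stand-ins for two scraped stat tables and a salary table.
# Every value here is a placeholder, not real data.
per_game = pd.DataFrame({"Rk": [1, 2], "Player": ["Player A", "Player B"],
                         "MP": [30.0, 15.0], "FG": [6.0, 2.0]})
advanced = pd.DataFrame({"Rk": [1, 2], "Player": ["Player A", "Player B"],
                         "MP": [2400, 1100], "VORP": [2.5, 0.1]})
salaries = pd.DataFrame({"Player": ["Player A", "Player B"],
                         "Salary": [30_000_000, 2_000_000]})

# Non-key columns present in both tables (Rk, MP) receive the default suffixes
stats = per_game.merge(advanced, on="Player", how="inner")
# Attach salaries so each row holds one player's statistics and salary
merged = stats.merge(salaries, on="Player", how="left")
print(merged.columns.tolist())
```

Merging on `Player` alone is a simplification: players who changed teams mid-season appear on several rows in Basketball-Reference tables, so a real merge likely needs additional handling.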


We also engineer binary features indicating whether a player received various NBA awards (e.g., All-NBA and All-Defensive team selections) and whether a player is in their "Contract Year". A "Contract Year" is the final year of a player's contract; a player in that year is expected to be motivated to play their best to show the franchise that they deserve a new contract.
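
One way such indicators could be derived is sketched below. It assumes a hypothetical `Awards` text column holding comma-separated codes (e.g. `NBA1` for All-NBA First Team, `DEF2` for All-Defensive Second Team) and a hypothetical `contract_end` column naming the last season of each contract; neither column nor the helper `add_binary_features` comes from the project. The output column names mirror those visible in the snapshot (`FirstTeam`, `DefTeam1`, `2023-24_contract_year`, etc.).

```python
import pandas as pd

# Hypothetical input columns (assumptions): "Awards" with comma-separated codes,
# "contract_end" with the final season of each player's contract.
raw = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Awards": ["AS,NBA1,DEF1", "DEF2", None],
    "contract_end": ["2023-24", "2026-27", "2023-24"],
})

# Assumed mapping from award code to the binary column name used in the dataset
AWARD_COLUMNS = {"NBA1": "FirstTeam", "NBA2": "SecondTeam", "NBA3": "ThirdTeam",
                 "DEF1": "DefTeam1", "DEF2": "DefTeam2"}

def add_binary_features(df, season="2023-24"):
    """Hypothetical helper: add award indicators and a contract-year flag."""
    out = df.copy()
    # Turn "AS,NBA1,DEF1" into a set of codes; players with no awards get an empty code
    codes = out["Awards"].fillna("").str.split(",").apply(set)
    for code, col in AWARD_COLUMNS.items():
        out[col] = codes.apply(lambda s: int(code in s))
    # 1 if the contract ends after `season`, i.e. the player is in a contract year
    out[f"{season}_contract_year"] = (out["contract_end"] == season).astype(int)
    return out

features = add_binary_features(raw)
print(features.drop(columns=["Awards", "contract_end"]))
```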

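```python
import os

import pandas as pd

# Load the player-level dataset (statistics, award flags, and salaries)
data_path_2024 = os.path.join(os.getcwd(), "data", "final_2024_player.csv")
df = pd.read_csv(data_path_2024)

# Preview the first five rows
df.head()
```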

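| Unnamed: 0 | Rk_x | Player | Age | Team | Pos | G | GS | MP_x | FG | FGA | ... | FirstTeam | SecondTeam | ThirdTeam | DefTeam1 | DefTeam2 | Salary | Guaranteed | 2023-24_contract_year | Next_Year_Salary | Next_Year_Guaranteed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 363.0 | A.J. Green | 24.0 | MIL | SG | 56.0 | 0.0 | 11.0 | 1.5 | 3.5 | ... | 0 | 0 | 0 | 0 | 0 | 1901769 | 1901769 | 0 | 2120693 | 2120693 |
| 1 | 476.0 | AJ Griffin | 20.0 | ATL | SF | 20.0 | 0.0 | 8.6 | 0.9 | 3.1 | ... | 0 | 0 | 0 | 0 | 0 | 3712920 | 7602840 | 0 | 250000 | 250000 |
| 2 | 109.0 | Aaron Gordon | 28.0 | DEN | PF | 73.0 | 73.0 | 31.5 | 5.5 | 9.8 | ... | 0 | 0 | 0 | 0 | 0 | 22266182 | 46107637 | 0 | 22841455 | 112197227 |
| 3 | 278.0 | Aaron Holiday | 27.0 | HOU | PG | 78.0 | 1.0 | 16.3 | 2.4 | 5.3 | ... | 0 | 0 | 0 | 0 | 0 | 2019706 | 0 | 1 | 4668000 | 4668000 |
| 4 | 136.0 | Aaron Nesmith | 24.0 | IND | SF | 72.0 | 47.0 | 27.7 | 4.4 | 8.8 | ... | 0 | 0 | 0 | 0 | 0 | 5634257 | 38634257 | 0 | 11000000 | 33000000 |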


## Exploratory Data Analysis

## Model Building and Evaluation

## Results and Conclusion

## Author Contributions
- **Brian Fernando:** Brian scraped the data, worked on feature engineering, and set up the YAML file. Brian also helped with the main.ipynb notebook.
- **Sharona Yang:** Sharona worked on the modeling section of the analysis. Sharona also helped assemble the PDF files for the MyST site and worked on the main.ipynb notebook.
- **Aarush Maddela:** Aarush worked on the EDA section of the project, exploring the data to identify trends and insights. Aarush also generated the plots and figures.
- **Nixon Tan:** Nixon worked on the Makefile and ran the tests. Nixon also helped create the MyST site and helped with the ai-documentation and with merging PRs.
