NBA Player Salary Prediction

This data science project analyzes NBA player performance data and predicts player salaries using data analysis and machine learning models. The analysis identifies the key metrics that influence salaries and provides actionable insights through data visualization, database queries, and modeling.

Datasets

The data was sourced from Kaggle and can be downloaded from here: NBA Salary Dataset. The two datasets used are 'NBA Player Stats.csv' and 'NBA Salaries.csv'.

Key Steps implemented in this project

Data Preprocessing and Cleaning

Merged the player statistics and salary datasets to include player performance statistics, positions, teams, and salaries. This involved joining the datasets on common attributes.
Handled missing values and outliers across key features such as points, assists, rebounds, and salary. This involved cleaning the data, handling null values, and removing outliers using the z-score method.
Converted numerical columns to the appropriate data types for mathematical operations.
Standardized features and scaled data where necessary.
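
The merge and z-score outlier steps above could be sketched roughly as follows. The column names (`Player`, `Season`, `PTS`, `Salary`), the synthetic values, the inner join, and the threshold of 3 standard deviations are all assumptions made for illustration; the notebook may use different keys and cutoffs.

```python
import numpy as np
import pandas as pd

def merge_stats_and_salaries(stats, salaries, keys=("Player", "Season")):
    """Inner-join the two tables on the shared key columns (assumed keys)."""
    return stats.merge(salaries, on=list(keys), how="inner")

def remove_outliers_zscore(df, columns, threshold=3.0):
    """Drop rows whose absolute z-score exceeds the threshold in any listed column."""
    values = df[columns].astype(float)
    z = (values - values.mean()) / values.std(ddof=0)
    keep = (z.abs() <= threshold).all(axis=1)
    return df.loc[keep].reset_index(drop=True)

# Synthetic data: 200 ordinary rows plus one extreme salary (player "P200").
rng = np.random.default_rng(0)
n = 200
players = [f"P{i}" for i in range(n + 1)]
stats = pd.DataFrame({
    "Player": players,
    "Season": 2022,
    "PTS": rng.normal(800, 150, n + 1).round(),
})
salaries = pd.DataFrame({
    "Player": players,
    "Season": 2022,
    "Salary": np.append(rng.normal(5e6, 1e6, n), 1e8),
})

merged = merge_stats_and_salaries(stats, salaries).dropna(subset=["PTS", "Salary"])
cleaned = remove_outliers_zscore(merged, ["PTS", "Salary"])
print(f"rows before: {len(merged)}, rows after: {len(cleaned)}")
```

Note that a single extreme value inflates the standard deviation, so the z-score method can miss milder outliers sitting next to a very large one.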

Feature Engineering

We derived new variables from existing ones to better capture underlying patterns in the data.

Weighted Efficiency (WEFF): Combines points, assists, rebounds, steals, blocks, and turnovers, normalized by games played.
Points Per Game (PPG): Points scored divided by games played.
Assists Per Game (APG): Assists divided by games played.
Rebounds Per Game (RPG): Total rebounds divided by games played.
Steals Per Game (SPG): Steals divided by games played.
Blocks Per Game (BPG): Blocks divided by games played.
Turnovers Per Game (TPG): Turnovers divided by games played.
Usage Rate: Estimate of a player's involvement in offensive plays, based on field goals attempted, free throws attempted, and turnovers.
Shooting Efficiency: Average of field goal percentage and effective field goal percentage.
Offensive Contribution: Weighted sum of points, assists, and offensive rebounds.
Defensive Contribution: Sum of defensive rebounds, steals, and blocks.
Experience: Estimated years of professional activity, assuming players start their careers at age 19.
Games Started Percentage (GS%): Games started as a percentage of games played.
Impact Score: Weighted Efficiency (WEFF) per minute played.
Minutes Played Per Game (MPG): Total minutes played divided by games played.
Efficiency Tiers: Players categorized into low, moderate, and high efficiency tiers.
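
Several of the features listed above follow directly from their definitions, so a minimal sketch is possible. The column abbreviations (`G`, `GS`, `MP`, `PTS`, `AST`, `TRB`, `DRB`, `STL`, `BLK`, `TOV`, `FG%`, `eFG%`, `Age`) are assumed names, and clipping experience at zero is an added assumption. WEFF, Usage Rate, and Offensive Contribution are left out because their weights are not stated here.

```python
import pandas as pd

# (season total column, per-game feature name); column names are assumptions.
PER_GAME = [("PTS", "PPG"), ("AST", "APG"), ("TRB", "RPG"), ("STL", "SPG"),
            ("BLK", "BPG"), ("TOV", "TPG"), ("MP", "MPG")]

def add_engineered_features(df):
    """Return a copy of df with per-game and derived features added."""
    out = df.copy()
    games = out["G"]
    for total, per_game in PER_GAME:
        out[per_game] = out[total] / games
    # Games started as a percentage of games played.
    out["GS%"] = 100 * out["GS"] / games
    # Experience assumes a career start at age 19; never negative (assumption).
    out["Experience"] = (out["Age"] - 19).clip(lower=0)
    # Average of field goal percentage and effective field goal percentage.
    out["ShootingEfficiency"] = (out["FG%"] + out["eFG%"]) / 2
    # Sum of defensive rebounds, steals, and blocks.
    out["DefensiveContribution"] = out["DRB"] + out["STL"] + out["BLK"]
    return out

# One made-up season line for illustration.
season = pd.DataFrame([{
    "G": 50, "GS": 40, "MP": 1500, "PTS": 1000, "AST": 250, "TRB": 400,
    "DRB": 300, "STL": 50, "BLK": 25, "TOV": 100,
    "FG%": 0.50, "eFG%": 0.55, "Age": 25,
}])
features = add_engineered_features(season)
print(features[["PPG", "APG", "RPG", "MPG", "GS%", "Experience"]])
```

Rows with zero games played would produce infinite or missing values in the per-game columns, so they would need to be filtered out beforehand.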

Exploratory Data Analysis (EDA)

Visualization:
- Plotted the correlation of each metric with salary to identify the most important metrics.
- Plotted each numeric metric against salary to find the best features for predicting salary.
- Some plots we made include:
  - Salary trends by position, efficiency, and season, using bar plots, scatter plots, and line plots.
  - Additional salary-related visualizations to better see trends for predicting overall salary from player stats and performance.
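
The numbers behind a correlation plot like the ones described above could be computed along these lines. This is only a sketch on made-up values with assumed column names; the resulting series could then be passed to a bar plot or heatmap.

```python
import pandas as pd

# Toy data with hypothetical column names; values are made up for illustration.
df = pd.DataFrame({
    "PPG": [5.0, 10.0, 15.0, 20.0, 25.0],
    "TPG": [2.0, 1.0, 3.0, 2.0, 2.5],
    "Age": [22, 31, 25, 28, 24],
    "Salary": [1e6, 2e6, 3e6, 4e6, 5e6],
})

# Pearson correlation of every column with Salary, strongest (by magnitude) first.
salary_corr = (
    df.corr()["Salary"]
    .drop("Salary")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(salary_corr)
```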

SQL Integration

Created a relational SQLite database for querying player statistics.
Built a schema and then inserted our dataset into the local database.
Executed advanced SQL queries that:
- Analyzed salary trends in relation to NBA player stats, performance metrics, and efficiency.
- Explored year-on-year salary growth and distribution across seasons.
- Examined salary variations across different age groups and career stages.
- Investigated the impact of specific contributions (offensive/defensive) and efficiency on player earnings.
By incorporating database management, we were able to easily query and find salary-related trends, which helped us build the overall machine learning model.
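
As an illustration of the year-on-year salary growth query mentioned above, here is a small sketch using Python's built-in `sqlite3` module. The table name `player_salaries`, its columns, and the rows are hypothetical; the project's actual schema is not shown here.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE player_salaries (
        player TEXT NOT NULL,
        season INTEGER NOT NULL,
        salary REAL NOT NULL
    )
""")
rows = [
    ("A", 2020, 1_000_000), ("B", 2020, 3_000_000),
    ("A", 2021, 2_000_000), ("B", 2021, 4_000_000),
    ("A", 2022, 3_000_000), ("B", 2022, 6_000_000),
]
conn.executemany("INSERT INTO player_salaries VALUES (?, ?, ?)", rows)

# Average salary per season, joined to the previous season to get growth.
query = """
    WITH season_avg AS (
        SELECT season, AVG(salary) AS avg_salary
        FROM player_salaries
        GROUP BY season
    )
    SELECT cur.season,
           cur.avg_salary,
           ROUND(100.0 * (cur.avg_salary - prev.avg_salary) / prev.avg_salary, 1)
               AS growth_pct
    FROM season_avg AS cur
    LEFT JOIN season_avg AS prev ON prev.season = cur.season - 1
    ORDER BY cur.season
"""
growth = conn.execute(query).fetchall()
for season, avg_salary, growth_pct in growth:
    print(season, avg_salary, growth_pct)
conn.close()
```

The first season has no predecessor, so its growth value comes back as `None` because of the left join.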

Machine Learning Models

Linear Regression:
- Established a baseline model for salary prediction. This model did not perform well.
- Evaluated using Mean Squared Error (MSE), Mean Absolute Error (MAE), and R² score.
Decision Tree Regressor:
- Improved prediction accuracy by capturing non-linear relationships through a tree of decision rules.
Neural Network:
- Uses a network of nodes (with one hidden layer using a greedy optimization approach).
Random Forest Regressor:
- Enhanced the model by reducing overfitting and improving robustness. This works by combining the predictions of multiple decision trees. This turned out to be the best model with a relatively high R² value.
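
A comparison of this kind can be sketched with scikit-learn as shown below. The data is synthetic, the three feature columns are hypothetical stand-ins for player stats, and the hyperparameters are arbitrary, so the scores say nothing about the real dataset. The neural network is omitted to keep the sketch short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for (PPG, APG, RPG) and a non-linear salary signal.
rng = np.random.default_rng(42)
X = rng.uniform(0, 30, size=(500, 3))
y = 1e5 * X[:, 0] ** 1.5 + 2e5 * X[:, 1] + 1e5 * X[:, 2] + rng.normal(0, 2e5, 500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=6, random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "MSE": mean_squared_error(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: MAE={scores['MAE']:.0f}, R2={scores['R2']:.3f}")
```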

Advanced Statistical Concepts

PRESS, Cp, Bootstrapping, K-fold cross-validation:
- Helped evaluate the model's performance and assess its robustness.

Local Dashboard

Integrated a user-interactive dashboard using ipywidgets to dynamically input player statistics and predict salaries.
Added a slider and text boxes for value inputs.
Used a trained Random Forest Regressor model (the best-performing model) to predict salaries dynamically.
Displayed predicted salaries after a button click to create a user-friendly dashboard.

Interactive Web Dashboard

Developed a web-based interactive dashboard using Dash for dynamic player salary predictions.
Integrated sliders and dropdowns for real-time input of player stats.
Displayed predicted salaries using the best-performing Random Forest Regressor model.
Enhanced user accessibility with visual styling, animations, and responsiveness for mobile devices.
This is deployed using Heroku. Here is the link:

Flow of the Project

Setup and Environment:
- Clone this repository and open the Jupyter Notebook file nba_salary_prediction_project.ipynb in a Python notebook platform like Jupyter Notebook or CodeBench.
Data Import:
- The notebook imports multiple .csv files containing player statistics/info and salaries. These were downloaded from the Kaggle data link.
- Reads the data using pandas and merges the datasets into a consolidated DataFrame.
Data Cleaning and Processing:
- Handles missing values and outliers.
- Converts numerical columns to appropriate data types.
- Performs feature engineering to calculate metrics like Weighted Efficiency (WEFF).
Visualization:
- Creates various plots and charts (e.g., bar plots, scatter plots, heatmaps) to understand trends and relationships in the data.
- Analyzes salary trends by season, player position, and efficiency tiers.
SQL Queries:
- Transfers the data into an SQLite database.
- Executes queries to analyze salary trends, identify top players, evaluate team performance, etc.
Machine Learning:
- Trains models (Linear Regression, Decision Tree, Neural Network, Random Forest) to predict salaries based on player performance metrics.
- Evaluates models using metrics like Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R² score.
- Creates robust models that can help predict NBA player salaries in future seasons given current stats and performance.
Model Evaluation:
- Applies statistical concepts such as PRESS and Mallows's Cp to choose the best model and assess its performance.
- Performs bootstrapping to evaluate the final model's performance on multiple resampled training sets to get 95% confidence intervals for Mean Squared Error (MSE) and R² scores.
- Performs k-fold cross-validation by splitting the dataset into k subsets (folds). The model is trained on k-1 folds and tested on the remaining fold, repeating the process k times so that each fold serves as the test set once. Testing on different subsets of the data provides a more robust estimate of performance.
Local Dashboard:
- Implements a user-interactive dashboard using ipywidgets to allow users to input player stats dynamically.
- Predicts salaries using the best performing model (Random Forest Regressor) and displays results interactively.
Web Dashboard:
- Deploys an interactive Dash-based web application for real-time salary predictions.
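
The k-fold cross-validation and bootstrapping steps listed under Model Evaluation can be sketched as follows. The data is synthetic, and the choices of k = 5, 30 bootstrap resamples, 50 trees, and a percentile interval are assumptions made to keep the example fast, not the notebook's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.utils import resample

# Synthetic stand-in for player features and salaries.
rng = np.random.default_rng(0)
X = rng.uniform(0, 30, size=(300, 3))
y = 2e5 * X[:, 0] + 1e5 * X[:, 1] + rng.normal(0, 5e5, 300)

# k-fold cross-validation: each fold serves as the test set exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0)
cv_r2 = cross_val_score(model, X, y, cv=cv, scoring="r2")
print("R2 per fold:", np.round(cv_r2, 3))

# Bootstrap: refit on resampled training sets, score on a fixed test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
boot_r2 = []
for i in range(30):
    X_b, y_b = resample(X_train, y_train, random_state=i)  # with replacement
    fitted = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_b, y_b)
    boot_r2.append(r2_score(y_test, fitted.predict(X_test)))

# 95% percentile confidence interval for R2.
low, high = np.percentile(boot_r2, [2.5, 97.5])
print(f"95% CI for R2: [{low:.3f}, {high:.3f}]")
```

In practice the number of bootstrap resamples would usually be much larger (hundreds or thousands) for a stable interval.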

Requirements

The project uses the following libraries:

pandas
numpy
matplotlib
seaborn
scikit-learn
sqlite3 (part of the Python standard library)
statsmodels
tensorflow
ipywidgets
dash

Steps to Run the Project

Clone this repository:

```
git clone https://github.com/vedp2003/NBAMachineLearningProject.git
```

Navigate to the project folder:
```
cd NBAMachineLearningProject
```
Download the dataset CSV files and save them in the same directory as the Jupyter notebook.
Open the Jupyter notebook nba_salary_prediction_project.ipynb and run it cell by cell to execute the analysis.
Additional steps and installations may be needed to run the interactive dashboard in the notebook. See the steps below.

Setup for Local Dashboard

To enable the local dashboard in Jupyter Notebook, follow these steps:

Verify Node.js and npm: Run the following commands to check if Node.js and npm are installed and their versions:
```
node -v
npm -v
```

Upgrade Node.js if necessary: If your Node.js version is below 20.0.0, you may need to upgrade it using the following commands:
```
wget https://nodejs.org/dist/v20.8.0/node-v20.8.0-linux-x64.tar.xz
tar -xf node-v20.8.0-linux-x64.tar.xz
mv node-v20.8.0-linux-x64 /path/to/your/directory/nodejs  # Replace /path/to/your/directory with the directory you want
export PATH=/path/to/your/directory/nodejs/bin:$PATH  # Replace with the same directory as above
```

Make sure the updated Node.js path is active: Run the following Python code (for example, in a notebook cell) so that the updated Node.js path is picked up:
```
import os
os.environ['PATH'] = "/path/to/your/directory/nodejs/bin:" + os.environ['PATH']  # Replace /path/to/your/directory with the directory
```

Install required Python packages: Install the necessary Python packages for widgets functionality:

```
pip install --user --upgrade ipywidgets
pip install --user jupyterlab_widgets
pip install --upgrade jupyterlab
pip install --upgrade jupyterlab_widgets
```

Install JupyterLab extensions: Install the required JupyterLab extensions for enabling widgets:

```
jupyter labextension install @jupyter-widgets/jupyterlab-manager         # Run this as long as there are no permission constraints
jupyter labextension install @jupyter-widgets/jupyterlab-manager --app-dir=$(jupyter --data-dir)/lab    # Alternatively, run this to install the extensions in your home directory
```

Rebuild JupyterLab and Restart the Kernel: After installing the extensions, rebuild JupyterLab to integrate the changes. Restart the kernel to ensure all updates take effect. It can be rebuilt by running: !jupyter lab build

NOTE

The commands listed above can be executed in the terminal. However, these commands can also be run directly within Jupyter Notebook cells by adding a ! in front of each command. For example:
```
!pip install --user ipywidgets
!jupyter labextension install @jupyter-widgets/jupyterlab-manager
!node -v
```

Setup for Interactive Web Dashboard

To enable the web dashboard, follow these steps:

Run the dashboard: Run the following command to start the web dashboard:
```
python nba_salary_dashboard.py
```
