## Predicting NBA players' scoring average in the Regular Season and Playoffs using Machine Learning
In this notebook, I work through an example machine learning project whose goal is to predict each player's scoring average (points per game).

Problem definition
How well can we predict each player's scoring average from the characteristics and stat categories in the Kaggle dataset?

Data
The data is adapted from Basketball Reference and is available on Kaggle: https://www.kaggle.com/datasets/vivovinco/nba-player-stats

There are 2 datasets:

* 2021-2022 NBA Player Stats Regular Season CSV
* 2021-2022 NBA Player Stats Playoff CSV


Evaluation
The evaluation metric for this project is the RMSLE (root mean squared log error) between the actual and predicted points per game (PTS).

For more on the evaluation of this project check: https://www.kaggle.com/datasets/vivovinco/nba-player-stats

Note: The goal for most regression evaluation metrics is to minimize the error. For this project, that means building a machine learning model that minimizes RMSLE.
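
The metric above is only named, so here is a minimal sketch of how it could be computed with NumPy. It assumes the usual definition of RMSLE (the square root of the mean squared difference between `log(1 + predicted)` and `log(1 + actual)`) and non-negative inputs. The helper name `rmsle` is my own and does not come from the dataset page.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared log error between actual and predicted values.

    Assumes both inputs are non-negative (points per game cannot be negative).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) = log(1 + x), so a value of 0.0 points per game stays finite
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    return float(np.sqrt(np.mean(log_diff ** 2)))
```

A perfect prediction gives `rmsle(y, y) == 0.0`, and the score grows as predictions drift from the actual values in relative terms, so being off by 2 points matters more for a 4-point scorer than for a 25-point scorer.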

Features
* Rk : Rank
* Player : Player's name
* Pos : Position
* Age : Player's age
* Tm : Team
* G : Games played
* GS : Games started
* MP : Minutes played per game
* FG : Field goals per game
* FGA : Field goal attempts per game
* FG% : Field goal percentage
* 3P : 3-point field goals per game
* 3PA : 3-point field goal attempts per game
* 3P% : 3-point field goal percentage
* 2P : 2-point field goals per game
* 2PA : 2-point field goal attempts per game
* 2P% : 2-point field goal percentage
* eFG% : Effective field goal percentage
* FT : Free throws per game
* FTA : Free throw attempts per game
* FT% : Free throw percentage
* ORB : Offensive rebounds per game
* DRB : Defensive rebounds per game
* TRB : Total rebounds per game
* AST : Assists per game
* STL : Steals per game
* BLK : Blocks per game
* TOV : Turnovers per game
* PF : Personal fouls per game
* PTS : Points per game
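
Several of these columns are tied together by basketball arithmetic: points come from 2-point makes, 3-point makes and free throws, and eFG% gives extra weight to 3-pointers. The sketch below spells out both relationships. The function names and the stat line are hypothetical, chosen only to illustrate the formulas, and the published per-game values are rounded, so the identities would hold only approximately on the real data.

```python
def points_per_game(two_p, three_p, ft):
    """PTS from made shots: 2 points per 2P, 3 per 3P, 1 per FT."""
    return 2 * two_p + 3 * three_p + ft


def effective_fg_pct(fg, three_p, fga):
    """eFG% = (FG + 0.5 * 3P) / FGA, crediting the extra point of a 3-pointer."""
    return (fg + 0.5 * three_p) / fga


# Hypothetical per-game stat line, not taken from the dataset
two_p, three_p, ft, fga = 5.0, 2.0, 3.0, 14.0
fg = two_p + three_p  # FG counts both 2-point and 3-point makes

pts = points_per_game(two_p, three_p, ft)
efg = effective_fg_pct(fg, three_p, fga)
```

Because `2P`, `3P` and `FT` together nearly determine `PTS`, it may be worth keeping this relationship in mind when choosing which columns to feed a model.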


## Importing the data and preparing it for modeling

In [2]:
pip install pandas

Collecting pandas
  Downloading pandas-2.0.2-cp310-cp310-macosx_11_0_arm64.whl (10.8 MB)
Collecting tzdata>=2022.1
  Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Collecting numpy>=1.21.0
  Downloading numpy-1.24.3-cp310-cp310-macosx_11_0_arm64.whl (13.9 MB)
Installing collected packages: tzdata, numpy, pandas
Successfully installed numpy-1.24.3 pandas-2.0.2 tzdata-2023.3
Note: you may need to restart the kernel to use updated packages.


In [None]:
# Import data analysis tools 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [7]:
# Read the regular-season stats, specifying the file encoding explicitly
df = pd.read_csv("2021_2022_NBA_PlayerStats_Regular.csv", encoding='latin1')

In [8]:
df

Unnamed: 0,Rk;Player;Pos;Age;Tm;G;GS;MP;FG;FGA;FG%;3P;3PA;3P%;2P;2PA;2P%;eFG%;FT;FTA;FT%;ORB;DRB;TRB;AST;STL;BLK;TOV;PF;PTS
0,1;Precious Achiuwa;C;22;TOR;73;28;23.6;3.6;8.3...
1,2;Steven Adams;C;28;MEM;76;75;26.3;2.8;5.1;0.5...
2,3;Bam Adebayo;C;24;MIA;56;56;32.6;7.3;13;0.557...
3,4;Santi Aldama;PF;21;MEM;32;0;11.3;1.7;4.1;0.4...
4,5;LaMarcus Aldridge;C;36;BRK;47;12;22.3;5.4;9....
...,...
807,601;Thaddeus Young;PF;33;TOR;26;0;18.3;2.6;5.5...
808,602;Trae Young;PG;23;ATL;76;76;34.9;9.4;20.3;0...
809,603;Omer Yurtseven;C;23;MIA;56;12;12.6;2.3;4.4...
810,604;Cody Zeller;C;29;POR;27;0;13.1;1.9;3.3;0.5...


In [9]:
# Read the playoff stats, specifying the file encoding explicitly
df2 = pd.read_csv("2021_2022_NBA_PlayerStats_Playoffs.csv", encoding='latin1')

In [10]:
df2

Unnamed: 0,Rk;Player;Pos;Age;Tm;G;GS;MP;FG;FGA;FG%;3P;3PA;3P%;2P;2PA;2P%;eFG%;FT;FTA;FT%;ORB;DRB;TRB;AST;STL;BLK;TOV;PF;PTS
0,1;Precious Achiuwa;C;22;TOR;6;1;27.8;4.2;8.7;0...
1,2;Steven Adams;C;28;MEM;7;5;16.3;1.3;3;0.429;0...
2,3;Bam Adebayo;C;24;MIA;18;18;34.1;5.8;9.7;0.59...
3,4;Nickeil Alexander-Walker;SG;23;UTA;1;0;5;2;2...
4,5;Grayson Allen;SG;26;MIL;12;5;25.4;3.1;6.8;0....
...,...
212,213;Ziaire Williams;SF;20;MEM;10;1;16.8;2.3;5....
213,214;Delon Wright;SG;29;ATL;5;0;27.4;3;5.8;0.51...
214,215;Thaddeus Young;PF;33;TOR;6;0;14.5;1.5;3;0....
215,216;Trae Young;PG;23;ATL;5;5;37.2;4.4;13.8;0.3...


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 812 entries, 0 to 811
Data columns (total 1 columns):
 #   Column                                                                                                            Non-Null Count  Dtype 
---  ------                                                                                                            --------------  ----- 
 0   Rk;Player;Pos;Age;Tm;G;GS;MP;FG;FGA;FG%;3P;3PA;3P%;2P;2PA;2P%;eFG%;FT;FTA;FT%;ORB;DRB;TRB;AST;STL;BLK;TOV;PF;PTS  812 non-null    object
dtypes: object(1)
memory usage: 6.5+ KB
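
The `df.info()` output above reports a single column whose name is the entire header line. That suggests the file is semicolon-separated, while `pd.read_csv` splits on commas by default. The sketch below, which assumes that is the cause, compares the two behaviours on a small in-memory sample. The sample reuses the first eight column names and the first two rows visible in the output above; `sample`, `one_col` and `parsed` are illustrative names.

```python
import io

import pandas as pd

# Header and first two rows as displayed above, cut to the first eight columns
sample = (
    "Rk;Player;Pos;Age;Tm;G;GS;MP\n"
    "1;Precious Achiuwa;C;22;TOR;73;28;23.6\n"
    "2;Steven Adams;C;28;MEM;76;75;26.3\n"
)

# Default comma separator: each line lands in one wide string column
one_col = pd.read_csv(io.StringIO(sample))

# Semicolon separator: one column per stat, with numeric dtypes inferred
parsed = pd.read_csv(io.StringIO(sample), sep=";")

print(one_col.shape)
print(parsed.shape)
print(parsed.dtypes)
```

If the assumption holds, the same argument applied to the real files would look like `pd.read_csv("2021_2022_NBA_PlayerStats_Regular.csv", encoding='latin1', sep=";")`, and likewise for the playoffs file.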
