# NBA Regular Season 2019/2020 Ranking Prediction

This project uses data science techniques (statistics, probability, and machine learning) to predict the regular-season ranking of NBA teams. The ranking is global: all teams are ranked together, without separating the Western and Eastern conferences.

We treat this as a multi-class classification problem, i.e. the target variable is the rank of each team.
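To make the framing concrete, here is a minimal sketch of how a team table can be split into a feature matrix and a rank target. The three rows reuse values from the 2018/2019 season displayed later in this notebook; the choice of `scoring_average` and `plus_minus` as features is only an assumption for illustration, not the final feature set.

```python
import pandas as pd

# Toy frame: values taken from three 2018/2019 rows shown later in this notebook.
toy = pd.DataFrame({
    "team": ["Milwaukee Bucks", "Toronto Raptors", "Golden State Warriors"],
    "rank": [1, 2, 3],
    "scoring_average": [118.1, 114.4, 117.7],
    "plus_minus": [8.9, 6.1, 6.5],
})

# Target: the rank of each team, treated as a class label.
y = toy["rank"]

# Features: the remaining numeric statistics (illustrative subset, an assumption).
X = toy.drop(columns=["team", "rank"])

# Each distinct rank is its own class.
n_classes = y.nunique()
print(X.shape, n_classes)
```

With one class per rank, a full season yields as many classes as there are teams, which is worth keeping in mind when choosing a model and an evaluation metric.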

The team statistics were scraped from https://stats.nba.com and cover 1996 to 2019. The scraping scripts are available in my GitHub repository: https://github.com/cyriltso/nba_prediction.

This notebook contains the full analysis, visualization, and modeling process.

### 1. Importing libraries

In [1]:
### Ignore warning messages

import warnings
warnings.filterwarnings("ignore")

### Inline plotting (also pulls numpy and matplotlib names, such as rcParams, into the namespace)

%pylab inline

### Data Manipulation & Visualization

import pandas as pd
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import seaborn as sns
from math import *

### Graphics Settings (applied after the imports, since sns must be defined first)

sns.set(style = 'whitegrid', palette = 'pastel', font_scale = 1.5)
rcParams['figure.figsize'] = 20, 10

Populating the interactive namespace from numpy and matplotlib


### 2. Importing data

The scraped data are stored in a CSV file, so we use pandas to load them into a DataFrame.

All the features in the DataFrame are explained in the glossary on the official NBA website: https://stats.nba.com/help/glossary/.

In [2]:
### Loading the dataset

df = pd.read_csv('team_stats.csv')
df.head()

| | season | team | rank | game_played | wins | losses | wins_ratio | minutes_played | scoring_average | field_goals_made | ... | defensive_rebounds | total_rebounds | assists | turnovers | steals | blocks | blocks_attempts | personal_fouls | personal_fouls_drawn | plus_minus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018/2019 | Milwaukee Bucks | 1 | 82.0 | 60.0 | 22.0 | 0.732 | 48.2 | 118.1 | 43.4 | ... | 40.4 | 49.7 | 26.0 | 13.9 | 7.5 | 5.9 | 4.8 | 19.6 | 20.2 | 8.9 |
| 1 | 2018/2019 | Toronto Raptors | 2 | 82.0 | 58.0 | 24.0 | 0.707 | 48.5 | 114.4 | 42.2 | ... | 35.6 | 45.2 | 25.4 | 14.0 | 8.3 | 5.3 | 4.5 | 21.0 | 20.5 | 6.1 |
| 2 | 2018/2019 | Golden State Warriors | 3 | 82.0 | 57.0 | 25.0 | 0.695 | 48.3 | 117.7 | 44.0 | ... | 36.5 | 46.2 | 29.4 | 14.3 | 7.6 | 6.4 | 3.6 | 21.4 | 19.5 | 6.5 |
| 3 | 2018/2019 | Denver Nuggets | 4 | 82.0 | 54.0 | 28.0 | 0.659 | 48.1 | 110.7 | 41.9 | ... | 34.5 | 46.4 | 27.4 | 13.4 | 7.7 | 4.4 | 5.0 | 20.0 | 20.4 | 4.0 |
| 4 | 2018/2019 | Houston Rockets | 5 | 82.0 | 53.0 | 29.0 | 0.646 | 48.4 | 113.9 | 39.2 | ... | 31.9 | 42.1 | 21.2 | 13.3 | 8.5 | 4.9 | 4.5 | 22.0 | 20.0 | 4.8 |
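The rows displayed above allow a quick consistency check of the scraped columns. The sketch below rebuilds those five rows by hand (so it does not depend on `team_stats.csv`) and checks that wins plus losses equals games played, and that `wins_ratio` matches wins divided by games played up to three-decimal rounding. The tolerance of 0.0005 is an assumption based on the three decimals shown.

```python
import pandas as pd

# The five 2018/2019 rows displayed above, restricted to the win/loss columns.
rows = pd.DataFrame({
    "team": ["Milwaukee Bucks", "Toronto Raptors", "Golden State Warriors",
             "Denver Nuggets", "Houston Rockets"],
    "game_played": [82.0, 82.0, 82.0, 82.0, 82.0],
    "wins": [60.0, 58.0, 57.0, 54.0, 53.0],
    "losses": [22.0, 24.0, 25.0, 28.0, 29.0],
    "wins_ratio": [0.732, 0.707, 0.695, 0.659, 0.646],
})

# Check 1: every game is either a win or a loss.
games_ok = (rows["wins"] + rows["losses"] == rows["game_played"]).all()

# Check 2: wins_ratio agrees with wins / game_played within rounding error.
recomputed = rows["wins"] / rows["game_played"]
ratio_ok = (recomputed - rows["wins_ratio"]).abs().le(5e-4).all()

print(games_ok, ratio_ok)
```

The same two checks could be run on the full `df` to catch scraping errors before any modeling.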


### 3. Exploratory Data Analysis

In [3]:
### Dimension of the data

df.shape

(682, 29)
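The row count can be cross-checked with simple arithmetic. The sketch below assumes that the data start with the 1996/1997 season and end with 2018/2019, and that the league had 29 teams through 2003/2004 and 30 teams from 2004/2005 onward; these are assumptions about the dataset's coverage, not something the output above states.

```python
# Assumed coverage: seasons starting in 1996 through 2018 (1996/1997 to 2018/2019).
first_start, last_start = 1996, 2018

# Assumption: 2004/2005 is the first season with 30 teams; earlier seasons have 29.
expansion_start = 2004

seasons_with_29 = expansion_start - first_start      # 1996/1997 .. 2003/2004
seasons_with_30 = last_start - expansion_start + 1   # 2004/2005 .. 2018/2019

expected_rows = 29 * seasons_with_29 + 30 * seasons_with_30
print(seasons_with_29, seasons_with_30, expected_rows)
```

Under these assumptions the expected total is 682, which agrees with the number of rows reported by `df.shape`.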

In [4]:
### Information related to the data

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 682 entries, 0 to 681
Data columns (total 29 columns):
season                     682 non-null object
team                       682 non-null object
rank                       682 non-null int64
game_played                682 non-null float64
wins                       682 non-null float64
losses                     682 non-null float64
wins_ratio                 682 non-null float64
minutes_played             682 non-null float64
scoring_average            682 non-null float64
field_goals_made           682 non-null float64
field_goals_attempts       682 non-null float64
field_goals_percentage     682 non-null float64
three_points_made          682 non-null float64
three_points_attempts      682 non-null float64
three_points_percentage    682 non-null float64
free_throws_made           682 non-null float64
free_throws_attempts       682 non-null float64
free_throws_percentage     682 non-null float64
offensive_rebounds         682 non-

In [7]:
### Descriptive Statistics

to_drop_stats = [
    'rank', 'game_played', 'wins',
    'losses', 'wins_ratio', 'minutes_played',
    'field_goals_made', 'field_goals_attempts', 'three_points_made',
    'three_points_attempts', 'free_throws_made', 'free_throws_attempts',
    'total_rebounds', 'blocks_attempts'
]

df_stats = df.drop(to_drop_stats, axis=1)

df_stats.describe()

| | scoring_average | field_goals_percentage | three_points_percentage | free_throws_percentage | offensive_rebounds | defensive_rebounds | assists | turnovers | steals | blocks | personal_fouls | personal_fouls_drawn | plus_minus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 682.0 | 682.0 | 682.0 | 682.0 | 682.0 | 682.0 | 682.0 | 682.0 | 682.0 | 682.0 | 682.0 | 682.0 | 682.0 |
| mean | 98.929472 | 45.185484 | 35.352493 | 75.439736 | 11.400587 | 30.915103 | 21.835337 | 14.639296 | 7.697947 | 4.924194 | 21.264956 | 12.885337 | 0.00132 |
| std | 5.946911 | 1.62226 | 2.097646 | 2.996663 | 1.463406 | 2.125288 | 2.00958 | 1.206983 | 0.912 | 0.837459 | 1.773741 | 9.986079 | 4.583567 |
| min | 81.9 | 40.1 | 26.4 | 66.0 | 7.6 | 24.9 | 15.6 | 11.2 | 5.5 | 2.4 | 16.6 | 0.0 | -13.9 |
| 25% | 94.9 | 44.2 | 34.1 | 73.7 | 10.4 | 29.5 | 20.5 | 13.9 | 7.1 | 4.3 | 20.0 | 0.1 | -3.075 |
| 50% | 98.2 | 45.1 | 35.3 | 75.6 | 11.4 | 30.7 | 21.6 | 14.6 | 7.6 | 4.9 | 21.2 | 19.4 | 0.2 |
| 75% | 102.5 | 46.2 | 36.7 | 77.5 | 12.4 | 32.3 | 23.1 | 15.3 | 8.3 | 5.5 | 22.5 | 20.9 | 3.4 |
| max | 118.1 | 50.4 | 42.8 | 82.9 | 17.2 | 40.4 | 30.4 | 19.0 | 12.0 | 8.2 | 27.1 | 25.7 | 11.6 |
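One column stands out in this summary: `personal_fouls_drawn` has a 25% quantile of 0.1 and a median of 19.4, so at least a quarter of the rows hold zero or near-zero values. A possible explanation is that the statistic was not recorded in some seasons, but that is only a hypothesis. The sketch below shows one way to locate such rows by season; the 1996/1997 values are hypothetical placeholders, while the 2018/2019 values come from the rows displayed earlier.

```python
import pandas as pd

# Toy data: the 1996/1997 values are hypothetical, the 2018/2019 values
# are the ones shown for the Bucks and the Raptors earlier in this notebook.
toy = pd.DataFrame({
    "season": ["1996/1997", "1996/1997", "2018/2019", "2018/2019"],
    "personal_fouls_drawn": [0.0, 0.1, 20.2, 20.5],
})

# Flag values at or below the 25% quantile reported by describe() (0.1).
near_zero = toy["personal_fouls_drawn"] <= 0.1

# Share of flagged rows per season: 1.0 means the whole season looks unrecorded.
share_by_season = near_zero.groupby(toy["season"]).mean()
print(share_by_season)
```

Running the same grouping on `df` would show whether the near-zero values cluster in specific seasons, which would inform whether to drop the column or restrict it to the seasons where it is populated.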
