# Future predictions

In this notebook we train and calibrate our best-performing model using all available data to predict whether or not a given player will play in the NBA _at any point_ during the 2024-2025 season.

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from HelperFunctions import *  # project helpers, including ImputeAndScale
```

## Loading the full dataset

```python
df = pd.read_csv("full_data.csv")
```

We now impute missing values and scale the features. All of these transformations are performed "within season": the data for a given season is transformed using information from ***only*** that season and no other.

Because all of a season's data is available by the time we make predictions from it, this introduces ***no data leakage***.

```python
df = ImputeAndScale(df)
```
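`ImputeAndScale` lives in `HelperFunctions`, which is not shown here. To illustrate what "within season" means, the sketch below assumes median imputation followed by z-score standardization, both computed per `SEASON_START` group. The function name `impute_and_scale_within_season` and these choices of imputer and scaler are assumptions for illustration, not necessarily what `ImputeAndScale` does.

```python
import numpy as np
import pandas as pd

def impute_and_scale_within_season(frame, cols, season_col="SEASON_START"):
    """Hypothetical sketch: median-impute, then z-score, using each season's own statistics."""
    out = frame.copy()
    # Fill each missing value with the median of its own season only.
    out[cols] = out[cols].fillna(out.groupby(season_col)[cols].transform("median"))
    # Standardize with the mean and (sample) standard deviation of that same season.
    grouped = out.groupby(season_col)[cols]
    out[cols] = (out[cols] - grouped.transform("mean")) / grouped.transform("std")
    return out

# Toy data: two seasons whose raw minutes differ by a factor of ten.
toy = pd.DataFrame({
    "SEASON_START": [2022, 2022, 2022, 2023, 2023, 2023],
    "MIN": [10.0, np.nan, 30.0, 100.0, 200.0, 300.0],
})
toy_scaled = impute_and_scale_within_season(toy, ["MIN"])
print(toy_scaled)
```

Because every statistic comes from a season's own rows, both toy seasons end up on the same standardized scale even though their raw values differ by a factor of ten, and no season's values influence another's.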

```python
# Use every numeric column as a feature, except the identifiers and the target
features = df.select_dtypes(include='number').columns.drop(['PLAYER_ID', 'SEASON_START', 'IN_LEAGUE_NEXT'])
```

```python
# Hold out the 2023-2024 season as the test set; its rows are the ones we predict for 2024-2025
df_train = df.loc[df.SEASON_START < 2023]
df_test  = df.loc[df.SEASON_START == 2023]
```

```python
# Split the training data into a set for fitting the model and a set for calibrating it,
# stratified on the target so both keep the same class proportions
from sklearn.model_selection import train_test_split

df_tt, df_cal = train_test_split(df_train, test_size=0.2, shuffle=True, random_state=815, stratify=df_train.IN_LEAGUE_NEXT)
```
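Passing `stratify=df_train.IN_LEAGUE_NEXT` makes the calibration set keep the same class proportions as the training data, which matters when one outcome is much rarer than the other. The sketch below is a hypothetical helper, `stratified_split_indices`, written in NumPy to show the idea (sample the same fraction from each class separately); it is not scikit-learn's implementation.

```python
import numpy as np

def stratified_split_indices(y, test_size=0.2, seed=0):
    """Hypothetical sketch of a stratified split: take test_size of each class separately."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    test_idx = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)  # rows belonging to this class
        rng.shuffle(idx)
        n_test = int(round(test_size * len(idx)))
        test_idx.extend(idx[:n_test])
    test_mask = np.zeros(len(y), dtype=bool)
    test_mask[test_idx] = True
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Imbalanced toy labels: 90 positives (1) and 10 negatives (0).
y_toy = np.array([1] * 90 + [0] * 10)
train_idx, cal_idx = stratified_split_indices(y_toy, test_size=0.2)
print(y_toy[train_idx].mean(), y_toy[cal_idx].mean())  # both proportions are 0.9
```

Without stratification, a random 20-row calibration set could easily contain only one or two (or even zero) negatives, which would make the calibration fit unreliable.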

## Training the model

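```python
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier

# SMOTE oversamples the minority class (during fit only), then XGBoost is trained on the result
model = Pipeline([('smote', SMOTE(random_state=23)),
                  ('xgb', XGBClassifier(n_estimators=350, learning_rate=0.005, random_state=206))])

model.fit(df_tt[features], df_tt.IN_LEAGUE_NEXT)

# Calibrate the already-fitted pipeline on the held-out calibration set.
# In scikit-learn >= 1.6, cv="prefit" is deprecated; wrap the fitted model in
# sklearn.frozen.FrozenEstimator instead.
final_model_cal = CalibratedClassifierCV(model, cv="prefit")
final_model_cal.fit(df_cal[features], df_cal.IN_LEAGUE_NEXT)
```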
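Because SMOTE sits inside an `imblearn` pipeline, resampling happens only when the pipeline is fit; at prediction time the sampler is skipped. SMOTE creates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority-class neighbors. The NumPy sketch below, `smote_sketch`, is a simplified and hypothetical illustration of that interpolation step, not `imblearn`'s implementation; `k=5` mirrors the default of SMOTE's `k_neighbors`.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical, simplified SMOTE: interpolate between minority samples and their minority neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest minority neighbors
    base = rng.integers(0, len(X_min), size=n_new)  # random minority sample for each new row
    nn = neighbors[base, rng.integers(0, k, size=n_new)]  # one of its neighbors, chosen at random
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])

# Toy minority class lying on the line x = y.
X_min = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
new_rows = smote_sketch(X_min, n_new=50, k=2)
print(new_rows[:5])
```

Every synthetic row lies on a segment between two real minority rows, which is why all the toy rows satisfy x = y and stay inside the original range.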

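```python
df_2023 = df_test.copy()

# PREDICTIONS: hard labels from the uncalibrated pipeline
# PROB: calibrated probability that IN_LEAGUE_NEXT == 1
df_2023["PREDICTIONS"] = model.predict(df_test[features])
df_2023["PROB"]        = final_model_cal.predict_proba(df_test[features])[:, 1]
```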
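`CalibratedClassifierCV` defaults to `method="sigmoid"`, i.e. Platt scaling: a one-dimensional logistic regression that maps the fitted model's scores on the calibration set to probabilities. Since `df_cal` is never passed through SMOTE, the sigmoid is fit on the original class balance. The sketch below (`fit_platt` and `apply_platt` are hypothetical names) is a simplified version using plain gradient descent; it omits the target smoothing in Platt's original formulation and is not scikit-learn's code. The toy scores are assumed to be overconfident: the true probability is `sigmoid(0.5 * score - 0.3)`, not `sigmoid(score)`.

```python
import numpy as np

def apply_platt(scores, a, b):
    """Map raw scores to probabilities with p = 1 / (1 + exp(-(a * score + b)))."""
    return 1.0 / (1.0 + np.exp(-(a * np.asarray(scores, dtype=float) + b)))

def fit_platt(scores, y, n_iter=2000, lr=0.5):
    """Fit the slope a and intercept b by gradient descent on the mean log loss."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = 1.0, 0.0
    for _ in range(n_iter):
        p = apply_platt(s, a, b)
        # Gradients of the mean log loss with respect to a and b (updated simultaneously).
        a -= lr * np.mean((p - y) * s)
        b -= lr * np.mean(p - y)
    return a, b

def log_loss(p, y):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy calibration set with overconfident scores.
rng = np.random.default_rng(0)
scores = rng.uniform(-4, 4, size=5000)
true_p = 1 / (1 + np.exp(-(0.5 * scores - 0.3)))
y_toy = (rng.random(5000) < true_p).astype(int)

a, b = fit_platt(scores, y_toy)
raw_p = apply_platt(scores, 1.0, 0.0)  # uncalibrated "probabilities" sigmoid(score)
cal_p = apply_platt(scores, a, b)
print(f"a={a:.2f}, b={b:.2f}, log loss {log_loss(raw_p, y_toy):.3f} -> {log_loss(cal_p, y_toy):.3f}")
```

The fitted slope should land near 0.5, shrinking the overconfident scores toward 0.5 and lowering the log loss on the calibration data.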

```python
df_2023
```

|  | NAME | PLAYER_ID | SEASON_START | TEAMS_LIST | PLAYER_AGE | EXPERIENCE | POS | GP | GS | MIN | ... | WAIVED_POST | RELEASED_OFF | RELEASED_REG | RELEASED_POST | TRADED_OFF | TRADED_REG | TRADED_POST | IN_LEAGUE_NEXT | PREDICTIONS | PROB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7202 | LeBron James | 2544 | 2023 | ['LAL'] | 3.015457 | 3.878235 | PF | 0.973735 | 1.841158 | 1.730415 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.967420 |
| 8734 | Chris Paul | 101108 | 2023 | ['GSW'] | 3.015457 | 3.386416 | PG | 0.464222 | -0.130322 | 0.581897 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0.970181 |
| 9447 | Kyle Lowry | 200768 | 2023 | ['MIA', 'PHI'] | 2.780369 | 3.140506 | PG | 0.542609 | 1.245994 | 0.769579 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0.940913 |
| 9521 | P.J. Tucker | 200782 | 2023 | ['PHI', 'LAC'] | 3.015457 | 1.910958 | PF | -0.593998 | -0.427904 | -0.651610 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.561743 |
| 9676 | Kevin Durant | 201142 | 2023 | ['PHX'] | 2.075105 | 2.648687 | PF | 1.130509 | 1.989949 | 2.069187 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.975660 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15814 | Dexter Dennis | 1641926 | 2023 | ['DAL'] | -0.275776 | -1.039957 | SG | -1.652219 | -0.799881 | -1.189867 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.319392 |
| 15815 | Onuralp Bitim | 1641931 | 2023 | ['CHI'] | -0.275776 | -1.039957 | SG | -0.907545 | -0.762684 | -0.908935 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.361219 |
| 15816 | Maozinha Pereira | 1641970 | 2023 | ['MEM'] | -0.745953 | -1.039957 | SF | -1.534639 | -0.762684 | -1.081271 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.509416 |
| 15817 | Trey Jemison | 1641998 | 2023 | ['WAS', 'MEM'] | -0.510865 | -1.039957 | C | -0.829159 | -0.279113 | -0.547736 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.876852 |
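`PREDICTIONS` comes from the uncalibrated pipeline's `predict`, while `PROB` comes from the calibrated model, so the two columns need not agree. The check below copies the name, `PREDICTIONS`, and `PROB` values of the rows displayed above and compares `PREDICTIONS` with the label implied by thresholding `PROB` at 0.5. The 0.5 threshold is an assumption chosen to match how a binary classifier's `predict` is usually derived from its probabilities.

```python
import pandas as pd

# Values copied from the displayed rows of df_2023 (name and last two columns only).
shown = pd.DataFrame({
    "NAME": ["LeBron James", "Chris Paul", "Kyle Lowry", "P.J. Tucker", "Kevin Durant",
             "Dexter Dennis", "Onuralp Bitim", "Maozinha Pereira", "Trey Jemison"],
    "PREDICTIONS": [1, 1, 1, 0, 1, 0, 0, 0, 1],
    "PROB": [0.967420, 0.970181, 0.940913, 0.561743, 0.975660,
             0.319392, 0.361219, 0.509416, 0.876852],
})

# Label implied by the calibrated probability at a 0.5 threshold.
shown["PRED_FROM_PROB"] = (shown["PROB"] >= 0.5).astype(int)
disagree = shown.loc[shown["PREDICTIONS"] != shown["PRED_FROM_PROB"], "NAME"].tolist()
print(disagree)  # ['P.J. Tucker', 'Maozinha Pereira']
```

Both disagreements are players whose calibrated probability sits just above 0.5. Taking hard labels from `final_model_cal.predict` instead of `model.predict` would keep the two columns consistent.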
