# Project Overview

The goal of this project is to predict the winning team of an NBA game.

## Input Data

- Home team
- Away team

- Team standings since the start of the season
- Home team's record at home
- Home team's record away
- Team form over the last 5 / 10 / 15 games
- Home team's form over its last 5 home games
- Away team's form over its last 5 away games

- Team rankings in points, rebounds, assists, steals, and blocks
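
The "form over the last N games" features listed above can be derived from a per-team game log. The following is a minimal sketch, assuming a hypothetical long-format table with one row per team per game and a 0/1 `won` column (the table, its column names, and the `add_form` helper are illustrative and not taken from the dataset). `shift(1)` excludes the current game, so the feature only uses earlier results.

```python
import pandas as pd

# Hypothetical game log: one row per team per game.
games = pd.DataFrame({
    "team": ["BOS"] * 6 + ["MIA"] * 6,
    "date": list(pd.date_range("2012-11-01", periods=6)) * 2,
    "won":  [1, 0, 1, 1, 0, 1,   0, 0, 1, 1, 1, 1],
})

def add_form(df, window):
    """Add a column counting each team's wins over its previous `window` games."""
    df = df.sort_values(["team", "date"]).copy()
    df[f"form_{window}"] = (
        df.groupby("team")["won"]
          # shift(1) drops the current game; rolling() sums the games before it.
          .transform(lambda s: s.shift(1).rolling(window, min_periods=1).sum())
    )
    return df

form = add_form(games, window=5)
print(form)
```

The first game of each team has no history, so its form value is missing and has to be filled or dropped before training.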


# Import

In [64]:
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

In [None]:
# Check whether CUDA is available
cuda_available = torch.cuda.is_available()
print(f"CUDA is available: {cuda_available}")

# Reading the CSV files

In [None]:
file_1 = "data/2012-18_officialBoxScore.csv"
file_2 = "data/2012-18_playerBoxScore.csv"
file_3 = "data/2012-18_standings.csv"
file_4 = "data/2012-18_teamBoxScore.csv"

In [None]:
df_1 = pd.read_csv(file_1)
df_2 = pd.read_csv(file_2)
df_3 = pd.read_csv(file_3)
df_4 = pd.read_csv(file_4)

In [38]:
print(f"Shape before change {df_4.shape}")
colonne_4_a_conserver = ["gmDate", "gmTime", "teamAbbr", "teamConf", "teamDiv", "teamLoc", "teamRslt", "opptAbbr", "opptConf", "opptDiv", "opptLoc"]
# .copy() makes df_4_reduit an independent DataFrame, so adding a column below does not raise SettingWithCopyWarning
df_4_reduit = df_4[colonne_4_a_conserver].copy()

# Order-independent key for a game: both abbreviations sorted and joined, e.g. "CLE-WAS"
df_4_reduit['team_combined'] = df_4_reduit.apply(lambda row: '-'.join(sorted([row['teamAbbr'], row['opptAbbr']])), axis=1)

# Each game appears twice (once per team): drop the duplicates using 'gmDate', 'gmTime' and 'team_combined'
df_4_reduit = df_4_reduit.drop_duplicates(subset=['gmDate', 'gmTime', 'team_combined'])

# Rename the date column so that it matches the standings table
df_4_reduit = df_4_reduit.rename(columns={"gmDate": "Date"})

print(f"Shape after change {df_4_reduit.shape}")

Shape before change (14758, 123)
Shape after change (7379, 12)


In [39]:
df_4_reduit.head()

Unnamed: 0,Date,gmTime,teamAbbr,teamConf,teamDiv,teamLoc,teamRslt,opptAbbr,opptConf,opptDiv,opptLoc,team_combined
0,2012-10-30,19:00,WAS,East,Southeast,Away,Loss,CLE,East,Central,Home,CLE-WAS
2,2012-10-30,20:00,BOS,East,Atlantic,Away,Loss,MIA,East,Southeast,Home,BOS-MIA
4,2012-10-30,22:30,DAL,West,Southwest,Away,Win,LAL,West,Pacific,Home,DAL-LAL
6,2012-10-31,19:00,DEN,West,Northwest,Away,Loss,PHI,East,Atlantic,Home,DEN-PHI
8,2012-10-31,19:00,IND,East,Central,Away,Win,TOR,East,Atlantic,Home,IND-TOR
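
The deduplication step relies on the team box score listing every game twice, once from each team's perspective, which is consistent with the row count being halved. The sketch below reproduces the idea on toy data (the rows are illustrative). It also shows that `drop_duplicates` keeps the first row of each pair by default. In the output above that is the away team's row, so `teamRslt` describes the away team's result.

```python
import pandas as pd

# Toy data (illustrative): each game appears twice, once per team.
box = pd.DataFrame({
    "gmDate":   ["2012-10-30"] * 4,
    "gmTime":   ["19:00", "19:00", "20:00", "20:00"],
    "teamAbbr": ["WAS", "CLE", "BOS", "MIA"],
    "opptAbbr": ["CLE", "WAS", "MIA", "BOS"],
    "teamLoc":  ["Away", "Home", "Away", "Home"],
})

# Order-independent key: "CLE-WAS" whichever side the row describes.
box["team_combined"] = [
    "-".join(sorted(pair)) for pair in zip(box["teamAbbr"], box["opptAbbr"])
]

# keep="first" is the default: the first row of each pair survives.
one_per_game = box.drop_duplicates(subset=["gmDate", "gmTime", "team_combined"])

# Equivalent on this toy data, and explicit about which perspective is kept.
away_only = box[box["teamLoc"] == "Away"]

print(one_per_game)
```

Filtering on `teamLoc` makes the choice of perspective explicit instead of depending on row order.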


In [42]:
print(f"Shape before change {df_3.shape}")
colonne_3_a_conserver = ["stDate", "teamAbbr", "rank", "gameWon", "gameLost", "stkType", "stkTot", "homeWin", "homeLoss", "awayWin", "awayLoss", "lastFive", "lastTen"]
# .copy() makes df_3_reduit an independent DataFrame, so adding columns below does not raise SettingWithCopyWarning
df_3_reduit = df_3[colonne_3_a_conserver].copy()

print(df_3_reduit["stkType"].unique())

# Create the boolean columns 'winStk' and 'lossStk' from the streak type
df_3_reduit['winStk'] = df_3_reduit['stkType'] == 'win'
df_3_reduit['lossStk'] = df_3_reduit['stkType'] == 'loss'

# Rename the columns that will serve as merge keys
df_3_reduit = df_3_reduit.rename(columns={"stDate": "Date", "teamAbbr": "Abbr"})

print(f"Shape after change {df_3_reduit.shape}")

Shape before change (29520, 39)
['-' 'loss' 'win']
Shape after change (29520, 15)


In [43]:
df_3_reduit.head()

Unnamed: 0,Date,Abbr,rank,gameWon,gameLost,stkType,stkTot,homeWin,homeLoss,awayWin,awayLoss,lastFive,lastTen,winStk,lossStk
0,2012-10-30,ATL,3,0,0,-,0,0,0,0,0,0,0,False,False
1,2012-10-30,BKN,3,0,0,-,0,0,0,0,0,0,0,False,False
2,2012-10-30,BOS,14,0,1,loss,1,0,0,0,1,0,0,False,True
3,2012-10-30,CHA,3,0,0,-,0,0,0,0,0,0,0,False,False
4,2012-10-30,CHI,3,0,0,-,0,0,0,0,0,0,0,False,False
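
The two booleans store the direction of the streak separately from its length in `stkTot`. One possible alternative, shown here only as a sketch, folds both into a single signed feature. The column name `signedStk` is hypothetical, and the rows are toy values using the same `stkType` categories as printed above.

```python
import pandas as pd

# Toy standings rows with the three stkType values seen in the data.
standings = pd.DataFrame({
    "Abbr":    ["ATL", "BOS", "DAL"],
    "stkType": ["-", "loss", "win"],
    "stkTot":  [0, 1, 3],
})

# Map the streak type to a sign; "-" (no streak yet) maps to 0.
sign = standings["stkType"].map({"win": 1, "loss": -1, "-": 0})

# Hypothetical feature: positive for winning streaks, negative for losing ones.
standings["signedStk"] = sign * standings["stkTot"]

print(standings)
```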


In [46]:
# Define the prefix
prefix = 'team'

df_3_team = df_3_reduit.copy()

# Rename the columns by prepending the prefix
df_3_team.columns = [prefix + col for col in df_3_reduit.columns]

# Restore the name of the shared merge key
df_3_team = df_3_team.rename(columns={"teamDate": "Date"})

df_3_team.head()

Unnamed: 0,Date,teamAbbr,teamrank,teamgameWon,teamgameLost,teamstkType,teamstkTot,teamhomeWin,teamhomeLoss,teamawayWin,teamawayLoss,teamlastFive,teamlastTen,teamwinStk,teamlossStk
0,2012-10-30,ATL,3,0,0,-,0,0,0,0,0,0,0,False,False
1,2012-10-30,BKN,3,0,0,-,0,0,0,0,0,0,0,False,False
2,2012-10-30,BOS,14,0,1,loss,1,0,0,0,1,0,0,False,True
3,2012-10-30,CHA,3,0,0,-,0,0,0,0,0,0,0,False,False
4,2012-10-30,CHI,3,0,0,-,0,0,0,0,0,0,0,False,False
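
pandas also provides `DataFrame.add_prefix`, which performs the same renaming in a single call. The sketch below wraps both steps in a helper so the same code can serve the `team` and `oppt` sides. The helper name `prefixed_standings` and the toy frame are assumptions made for illustration.

```python
import pandas as pd

def prefixed_standings(standings, prefix):
    """Prefix every column, then restore the shared join key 'Date'."""
    return (
        standings.add_prefix(prefix)
                 .rename(columns={f"{prefix}Date": "Date"})
    )

# Toy frame with a subset of the columns of df_3_reduit.
toy = pd.DataFrame({"Date": ["2012-10-30"], "Abbr": ["BOS"], "rank": [14]})

team_side = prefixed_standings(toy, "team")
oppt_side = prefixed_standings(toy, "oppt")

print(list(team_side.columns))
print(list(oppt_side.columns))
```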


In [47]:
new_df = pd.merge(df_4_reduit, df_3_team, on=['Date', 'teamAbbr'], how='left')

new_df.head()

Unnamed: 0,Date,gmTime,teamAbbr,teamConf,teamDiv,teamLoc,teamRslt,opptAbbr,opptConf,opptDiv,...,teamstkType,teamstkTot,teamhomeWin,teamhomeLoss,teamawayWin,teamawayLoss,teamlastFive,teamlastTen,teamwinStk,teamlossStk
0,2012-10-30,19:00,WAS,East,Southeast,Away,Loss,CLE,East,Central,...,loss,1.0,0.0,0.0,0.0,1.0,0.0,0.0,False,True
1,2012-10-30,20:00,BOS,East,Atlantic,Away,Loss,MIA,East,Southeast,...,loss,1.0,0.0,0.0,0.0,1.0,0.0,0.0,False,True
2,2012-10-30,22:30,DAL,West,Southwest,Away,Win,LAL,West,Pacific,...,win,1.0,0.0,0.0,1.0,0.0,1.0,1.0,True,False
3,2012-10-31,19:00,DEN,West,Northwest,Away,Loss,PHI,East,Atlantic,...,loss,1.0,0.0,0.0,0.0,1.0,0.0,0.0,False,True
4,2012-10-31,19:00,IND,East,Central,Away,Win,TOR,East,Atlantic,...,win,1.0,0.0,0.0,1.0,0.0,1.0,1.0,True,False
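
A left merge keeps every game even when no standings row matches, and fills the standings columns with NaN. The integer columns showing up as floats (`1.0`, `0.0`) in the output above may indicate that this happened for some rows, so it can be worth checking. The sketch below uses the `validate` and `indicator` parameters of `pd.merge` on toy data (the rows are illustrative).

```python
import pandas as pd

games = pd.DataFrame({
    "Date":     ["2012-10-30", "2012-10-30", "2012-10-31"],
    "teamAbbr": ["WAS", "BOS", "DEN"],
})
# Toy standings: there is deliberately no row for DEN.
standings = pd.DataFrame({
    "Date":     ["2012-10-30", "2012-10-30"],
    "teamAbbr": ["WAS", "BOS"],
    "teamrank": [14, 14],
})

merged = pd.merge(
    games, standings,
    on=["Date", "teamAbbr"],
    how="left",
    validate="many_to_one",  # raises MergeError if standings has duplicate keys
    indicator=True,          # adds a "_merge" column: "both" or "left_only"
)

unmatched = (merged["_merge"] == "left_only").sum()
print(f"Games without a standings row: {unmatched}")
print(merged.dtypes)
```

On this toy data one game is unmatched, and `teamrank` becomes a float column because it has to hold a missing value.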


In [49]:
# Define the prefix
prefix = 'oppt'

df_3_oppt = df_3_reduit.copy()

# Rename the columns by prepending the prefix
df_3_oppt.columns = [prefix + col for col in df_3_reduit.columns]

# Restore the name of the shared merge key
df_3_oppt = df_3_oppt.rename(columns={"opptDate": "Date"})

df_3_oppt.head()

Unnamed: 0,Date,opptAbbr,opptrank,opptgameWon,opptgameLost,opptstkType,opptstkTot,oppthomeWin,oppthomeLoss,opptawayWin,opptawayLoss,opptlastFive,opptlastTen,opptwinStk,opptlossStk
0,2012-10-30,ATL,3,0,0,-,0,0,0,0,0,0,0,False,False
1,2012-10-30,BKN,3,0,0,-,0,0,0,0,0,0,0,False,False
2,2012-10-30,BOS,14,0,1,loss,1,0,0,0,1,0,0,False,True
3,2012-10-30,CHA,3,0,0,-,0,0,0,0,0,0,0,False,False
4,2012-10-30,CHI,3,0,0,-,0,0,0,0,0,0,0,False,False


In [50]:
new_df = pd.merge(new_df, df_3_oppt, on=['Date', 'opptAbbr'], how='left')

new_df.head()

Unnamed: 0,Date,gmTime,teamAbbr,teamConf,teamDiv,teamLoc,teamRslt,opptAbbr,opptConf,opptDiv,...,opptstkType,opptstkTot,oppthomeWin,oppthomeLoss,opptawayWin,opptawayLoss,opptlastFive,opptlastTen,opptwinStk,opptlossStk
0,2012-10-30,19:00,WAS,East,Southeast,Away,Loss,CLE,East,Central,...,win,1.0,1.0,0.0,0.0,0.0,1.0,1.0,True,False
1,2012-10-30,20:00,BOS,East,Atlantic,Away,Loss,MIA,East,Southeast,...,win,1.0,1.0,0.0,0.0,0.0,1.0,1.0,True,False
2,2012-10-30,22:30,DAL,West,Southwest,Away,Win,LAL,West,Pacific,...,loss,1.0,0.0,1.0,0.0,0.0,0.0,0.0,False,True
3,2012-10-31,19:00,DEN,West,Northwest,Away,Loss,PHI,East,Atlantic,...,win,1.0,1.0,0.0,0.0,0.0,1.0,1.0,True,False
4,2012-10-31,19:00,IND,East,Central,Away,Win,TOR,East,Atlantic,...,loss,1.0,0.0,1.0,0.0,0.0,0.0,0.0,False,True


In [51]:
new_df.columns

Index(['Date', 'gmTime', 'teamAbbr', 'teamConf', 'teamDiv', 'teamLoc',
       'teamRslt', 'opptAbbr', 'opptConf', 'opptDiv', 'opptLoc',
       'team_combined', 'teamrank', 'teamgameWon', 'teamgameLost',
       'teamstkType', 'teamstkTot', 'teamhomeWin', 'teamhomeLoss',
       'teamawayWin', 'teamawayLoss', 'teamlastFive', 'teamlastTen',
       'teamwinStk', 'teamlossStk', 'opptrank', 'opptgameWon', 'opptgameLost',
       'opptstkType', 'opptstkTot', 'oppthomeWin', 'oppthomeLoss',
       'opptawayWin', 'opptawayLoss', 'opptlastFive', 'opptlastTen',
       'opptwinStk', 'opptlossStk'],
      dtype='object')

In [79]:
colonne_a_conserver = ['teamRslt', 'teamrank', 'teamgameWon', 'teamgameLost',
       'teamstkTot', 'teamhomeWin', 'teamhomeLoss',
       'teamawayWin', 'teamawayLoss', 'teamlastFive', 'teamlastTen',
       'teamwinStk', 'teamlossStk', 'opptrank', 'opptgameWon', 'opptgameLost', 
       'opptstkTot', 'oppthomeWin', 'oppthomeLoss',
       'opptawayWin', 'opptawayLoss', 'opptlastFive', 'opptlastTen']
final_df = new_df[colonne_a_conserver]

final_df.head()

Unnamed: 0,teamRslt,teamrank,teamgameWon,teamgameLost,teamstkTot,teamhomeWin,teamhomeLoss,teamawayWin,teamawayLoss,teamlastFive,...,opptrank,opptgameWon,opptgameLost,opptstkTot,oppthomeWin,oppthomeLoss,opptawayWin,opptawayLoss,opptlastFive,opptlastTen
0,Loss,14.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,...,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0
1,Loss,14.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,...,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0
2,Win,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,...,15.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0
3,Loss,10.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,...,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0
4,Win,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,...,12.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0


In [80]:
# Encode the target as a boolean: True when the team in the row won.
# assign() returns a new DataFrame, which avoids SettingWithCopyWarning.
final_df = final_df.assign(result=final_df['teamRslt'] == "Win")

final_df = final_df.drop(columns=['teamRslt'])

final_df.head()


Unnamed: 0,teamrank,teamgameWon,teamgameLost,teamstkTot,teamhomeWin,teamhomeLoss,teamawayWin,teamawayLoss,teamlastFive,teamlastTen,...,opptgameWon,opptgameLost,opptstkTot,oppthomeWin,oppthomeLoss,opptawayWin,opptawayLoss,opptlastFive,opptlastTen,result
0,14.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,...,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,False
1,14.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,...,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,False
2,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,...,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,True
3,10.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,...,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,False
4,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,...,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,True


In [82]:
target = 'result'

# Separate the features from the target
X = final_df.drop(columns=[target])
y = final_df[target]

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
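
`train_test_split` shuffles the rows by default, so games from the same period end up in both sets. For a forecasting task, one alternative is to hold out the most recent games instead. This is a sketch of that idea, not what the notebook does. The helper `chronological_split` is hypothetical and assumes the `Date` column is still available at this stage.

```python
import pandas as pd

def chronological_split(df, date_col, test_size=0.2):
    """Return (train, test) where test holds the most recent rows."""
    ordered = df.sort_values(date_col, kind="stable")
    n_test = int(round(len(ordered) * test_size))
    split_at = len(ordered) - n_test
    return ordered.iloc[:split_at], ordered.iloc[split_at:]

# Toy data: ten games on consecutive days (illustrative values).
toy = pd.DataFrame({
    "Date":   pd.date_range("2012-10-30", periods=10),
    "result": [True, False] * 5,
})

train, test = chronological_split(toy, "Date", test_size=0.2)
print(len(train), len(test))
print(train["Date"].max() < test["Date"].min())
```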

In [83]:
# Initialize the Random Forest model
model = RandomForestClassifier(n_estimators=100)

# Train the model
model.fit(X_train, y_train)

In [84]:
# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print('Classification Report:')
print(report)

Accuracy: 0.9959349593495935
Classification Report:
              precision    recall  f1-score   support

       False       0.99      1.00      1.00       865
        True       1.00      0.99      1.00       611

    accuracy                           1.00      1476
   macro avg       1.00      1.00      1.00      1476
weighted avg       1.00      1.00      1.00      1476
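
An accuracy this close to 100% deserves scrutiny. In the merged rows shown earlier, the standings row for the game date already counts that game: Washington's row for its 2012-10-30 loss has `teamgameLost` = 1 and `teamlossStk` = True. The features therefore appear to encode the outcome being predicted. One possible remedy, sketched on toy data under the assumption that the standings contain one row per team per date, is to shift each team's standings by one row so that a game only sees the standings of the previous date.

```python
import pandas as pd

# Toy standings (illustrative values): one row per team per date,
# recorded after that day's games.
standings = pd.DataFrame({
    "Date":     ["2012-10-30", "2012-10-31", "2012-11-01"] * 2,
    "Abbr":     ["WAS"] * 3 + ["CLE"] * 3,
    "gameWon":  [0, 0, 1,   1, 1, 1],
    "gameLost": [1, 1, 1,   0, 1, 1],
})

feature_cols = ["gameWon", "gameLost"]

standings = standings.sort_values(["Abbr", "Date"]).reset_index(drop=True)

# Shift within each team so every date carries the previous date's standings.
pregame = standings.copy()
pregame[feature_cols] = standings.groupby("Abbr")[feature_cols].shift(1)

# A team's first date has no earlier standings; treat it as a 0-0 record.
pregame[feature_cols] = pregame[feature_cols].fillna(0).astype(int)

print(pregame)
```

With several seasons in the data, the shift would also need to be grouped by season, so that the first game of a season does not inherit the final standings of the previous one.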

