# NBA Season

### Data Initialization

We pull per-season NBA player stats from the Kaggle dataset at https://www.kaggle.com/datasets/justinas/nba-players-data/data

We then use a table that maps team abbreviations to team names, so that we can later bring in win rates from https://www.teamrankings.com/nba/stat/win-pct-all-games

In [601]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from itertools import combinations
%matplotlib inline

**Get the kaggle dataset**

In [602]:
# read a csv file into a df
playerData = pd.read_csv('nba.csv')

teamNames = pd.read_csv('unique_teams.csv')

playerData.head()

Unnamed: 0.1,Unnamed: 0,player_name,team_abbreviation,age,player_height,player_weight,college,country,draft_year,draft_round,...,pts,reb,ast,net_rating,oreb_pct,dreb_pct,usg_pct,ts_pct,ast_pct,season
0,0,Randy Livingston,HOU,22.0,193.04,94.800728,Louisiana State,USA,1996,2,...,3.9,1.5,2.4,0.3,0.042,0.071,0.169,0.487,0.248,1996-97
1,1,Gaylon Nickerson,WAS,28.0,190.5,86.18248,Northwestern Oklahoma,USA,1994,2,...,3.8,1.3,0.3,8.9,0.03,0.111,0.174,0.497,0.043,1996-97
2,2,George Lynch,VAN,26.0,203.2,103.418976,North Carolina,USA,1993,1,...,8.3,6.4,1.9,-8.2,0.106,0.185,0.175,0.512,0.125,1996-97
3,3,George McCloud,LAL,30.0,203.2,102.0582,Florida State,USA,1989,1,...,10.2,2.8,1.7,-2.7,0.027,0.111,0.206,0.527,0.125,1996-97
4,4,George Zidek,DEN,23.0,213.36,119.748288,UCLA,USA,1995,1,...,2.8,1.7,0.3,-14.1,0.102,0.169,0.195,0.5,0.064,1996-97


#### **Put abbreviations to City Names**

This is where we replace the abbreviations with the team names from `unique_teams.csv` (e.g. Houston, LA Lakers) so we can merge with the win rate dataset later

In [603]:
# merge playerData with teamNames, matching team_abbreviation to abbreviations
playerData = pd.merge(playerData, teamNames, left_on='team_abbreviation', right_on='abbreviations')

# drop the team_abbreviation column and abbreviations column
playerData = playerData.drop(columns=['team_abbreviation', 'abbreviations'])
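Because `pd.merge` defaults to an inner join, any row whose abbreviation has no entry in `unique_teams.csv` is silently dropped (George Lynch's `VAN` row, index 2 in the first preview, no longer appears after the merge). A minimal sketch of how one might list the unmatched abbreviations, using toy frames in place of the real files; the `players` and `teams` frames and their values are made up for illustration:

```python
import pandas as pd

# Toy stand-ins for playerData and teamNames (values are illustrative only)
players = pd.DataFrame({
    'player_name': ['A', 'B', 'C'],
    'team_abbreviation': ['HOU', 'VAN', 'DEN'],
})
teams = pd.DataFrame({
    'abbreviations': ['HOU', 'DEN'],
    'team': ['Houston', 'Denver'],
})

# A left join with indicator=True keeps every player row and records
# whether a matching team row was found
checked = players.merge(teams, how='left', left_on='team_abbreviation',
                        right_on='abbreviations', indicator=True)

# Abbreviations that an inner join would have dropped
unmatched = sorted(checked.loc[checked['_merge'] == 'left_only',
                               'team_abbreviation'].unique())
print(unmatched)
```

If the list is not empty, the missing abbreviations can be added to the mapping file before merging.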

In [604]:
playerData.head()

Unnamed: 0.1,Unnamed: 0,player_name,age,player_height,player_weight,college,country,draft_year,draft_round,draft_number,...,reb,ast,net_rating,oreb_pct,dreb_pct,usg_pct,ts_pct,ast_pct,season,team
0,0,Randy Livingston,22.0,193.04,94.800728,Louisiana State,USA,1996,2,42,...,1.5,2.4,0.3,0.042,0.071,0.169,0.487,0.248,1996-97,Houston
1,1,Gaylon Nickerson,28.0,190.5,86.18248,Northwestern Oklahoma,USA,1994,2,34,...,1.3,0.3,8.9,0.03,0.111,0.174,0.497,0.043,1996-97,Washington
2,3,George McCloud,30.0,203.2,102.0582,Florida State,USA,1989,1,7,...,2.8,1.7,-2.7,0.027,0.111,0.206,0.527,0.125,1996-97,LA Lakers
3,4,George Zidek,23.0,213.36,119.748288,UCLA,USA,1995,1,22,...,1.7,0.3,-14.1,0.102,0.169,0.195,0.5,0.064,1996-97,Denver
4,5,Gerald Wilkins,33.0,198.12,102.0582,Tennessee-Chattanooga,USA,1985,2,47,...,2.2,2.2,-5.8,0.031,0.064,0.203,0.503,0.143,1996-97,Orlando


#### **Inspect Data Types to Later deal with Categorical Types**

In [605]:
# drop the unnamed column
playerData.drop('Unnamed: 0', axis=1, inplace=True)

playerData.dtypes

player_name       object
age              float64
player_height    float64
player_weight    float64
college           object
country           object
draft_year        object
draft_round       object
draft_number      object
gp                 int64
pts              float64
reb              float64
ast              float64
net_rating       float64
oreb_pct         float64
dreb_pct         float64
usg_pct          float64
ts_pct           float64
ast_pct          float64
season            object
team              object
dtype: object

#### **Look for Null Values**

The only column with null values is college, and we drop that column later, so nothing needs to be filled in

In [606]:
# look for null values
playerData.isnull().sum()

player_name         0
age                 0
player_height       0
player_weight       0
college          1852
country             0
draft_year          0
draft_round         0
draft_number        0
gp                  0
pts                 0
reb                 0
ast                 0
net_rating          0
oreb_pct            0
dreb_pct            0
usg_pct             0
ts_pct              0
ast_pct             0
season              0
team                0
dtype: int64

## **Data Preprocessing**

We need to decide which columns to keep and how to handle the categorical (object) columns:

player_name           object

team                  object

college               object

country               object

draft_year            object

draft_round           object

draft_number          object

season                object

#### **Drop Some of Them**

In [607]:
playerData.drop(['college', 'draft_year', 'draft_round', 'draft_number', 'country'], axis=1, inplace=True)

In [608]:
playerData.head()

Unnamed: 0,player_name,age,player_height,player_weight,gp,pts,reb,ast,net_rating,oreb_pct,dreb_pct,usg_pct,ts_pct,ast_pct,season,team
0,Randy Livingston,22.0,193.04,94.800728,64,3.9,1.5,2.4,0.3,0.042,0.071,0.169,0.487,0.248,1996-97,Houston
1,Gaylon Nickerson,28.0,190.5,86.18248,4,3.8,1.3,0.3,8.9,0.03,0.111,0.174,0.497,0.043,1996-97,Washington
2,George McCloud,30.0,203.2,102.0582,64,10.2,2.8,1.7,-2.7,0.027,0.111,0.206,0.527,0.125,1996-97,LA Lakers
3,George Zidek,23.0,213.36,119.748288,52,2.8,1.7,0.3,-14.1,0.102,0.169,0.195,0.5,0.064,1996-97,Denver
4,Gerald Wilkins,33.0,198.12,102.0582,80,10.6,2.2,2.2,-5.8,0.031,0.064,0.203,0.503,0.143,1996-97,Orlando


#### **Encode the seasons to int values**

In [609]:
# encode each season as the year it ends in, e.g. '1996-97' -> 1997 (this overwrites the original labels)
playerData['season'] = pd.Categorical(playerData['season']).codes + 1997

# keep only seasons ending in 2004 or later, i.e. 2003-04 onward
playerData = playerData[playerData['season'] >= 2004]

playerData.reset_index(drop=True, inplace=True)
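The encoding above relies on the category codes being consecutive, which holds only if every season from 1996-97 onward appears in the data. A sketch of an alternative that reads the end year straight from the label, assuming every label has the form `YYYY-YY`; the helper name `season_end_year` is ours, not part of the notebook:

```python
import pandas as pd

def season_end_year(label: str) -> int:
    """Map a label such as '1996-97' to the calendar year the season ends in."""
    # The first four characters are the starting year; the season ends one year later
    return int(label[:4]) + 1

# Illustrative labels, including the century rollover
seasons = pd.Series(['1996-97', '1999-00', '2003-04', '2022-23'])
end_years = seasons.map(season_end_year)
print(end_years.tolist())
```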

#### **Add in Win Rates**

In [610]:
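def winRateFromYear(year):
    # scrape the win percentage table as of June 16 of the given year
    winRateDf = pd.read_html(f'https://www.teamrankings.com/nba/stat/win-pct-all-games?date={year}-06-16')[0]

    # this season's column is looked up under the previous calendar year, e.g. '2003' for year=2004
    winRateDf['Win PCT'] = winRateDf[f'{year - 1}']

    # tag every row with the season so it can be merged with playerData
    winRateDf['season'] = year

    winRateDf = winRateDf[['Team', 'Win PCT', 'season']]

    return winRateDf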

**Collect Win Rates for Every Season**

In [611]:
def getWinRates():
    # scrape one table per season ending in 2004 through 2023 and stack them
    yearlyFrames = [winRateFromYear(year) for year in range(2004, 2024)]

    return pd.concat(yearlyFrames, ignore_index=True)

# If you don't want to run the web scraping code, set run to False
run = False

if run:
    winRateDf = getWinRates()
    winRateDf.to_csv('winRate.csv', index=False)
else:
    winRateDf = pd.read_csv('winRate.csv')


In [612]:
winRateDf.head()

Unnamed: 0,Team,Win PCT,season
0,Indiana,0.725,2004
1,San Antonio,0.685,2004
2,Minnesota,0.68,2004
3,Detroit,0.667,2004
4,LA Lakers,0.664,2004


#### **Merge player data frame with the win rates**

In [613]:
updatedPlayerData = pd.merge(playerData, winRateDf, left_on=['team', 'season'], right_on=['Team', 'season'])

updatedPlayerData.drop(['Team'], axis=1, inplace=True)

updatedPlayerData.head()

Unnamed: 0,player_name,age,player_height,player_weight,gp,pts,reb,ast,net_rating,oreb_pct,dreb_pct,usg_pct,ts_pct,ast_pct,season,team,Win PCT
0,Brevin Knight,28.0,177.8,77.11064,56,4.7,2.0,3.6,-4.4,0.016,0.115,0.156,0.475,0.343,2004,Milwaukee,0.483
1,Jumaine Jones,25.0,203.2,98.883056,42,2.2,1.6,0.3,-5.0,0.071,0.133,0.15,0.438,0.065,2004,Boston,0.419
2,Zydrunas Ilgauskas,29.0,220.98,117.93392,81,15.3,8.1,1.3,-3.7,0.122,0.163,0.229,0.541,0.074,2004,Cleveland,0.427
3,Chris Whitney,32.0,182.88,79.3786,16,2.9,0.9,0.9,-16.0,0.016,0.077,0.128,0.498,0.146,2004,Washington,0.305
4,Chris Webber,31.0,208.28,111.13004,23,18.7,8.7,4.6,-0.7,0.065,0.198,0.289,0.456,0.227,2004,Sacramento,0.66


#### **Feature Engineer Average Minutes Played**
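The dataset has no minutes column, so we use games played times usage percentage (`gp * usg_pct`) as a stand-in for playing time. It lets us pick as starters the five players with the highest value on each team in each season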

In [614]:
updatedPlayerData['AVG Minutes Played'] = updatedPlayerData['gp'] * updatedPlayerData['usg_pct']

In [615]:
# keep the five players with the largest 'AVG Minutes Played' for each team and season
updatedPlayerData = updatedPlayerData.groupby(['team', 'season']).apply(lambda x: x.nlargest(5, 'AVG Minutes Played')).reset_index(drop=True)

updatedPlayerData.head()


Unnamed: 0,player_name,age,player_height,player_weight,gp,pts,reb,ast,net_rating,oreb_pct,dreb_pct,usg_pct,ts_pct,ast_pct,season,team,Win PCT,AVG Minutes Played
0,Stephen Jackson,26.0,203.2,98.883056,80,18.1,4.6,3.1,-3.8,0.038,0.102,0.249,0.521,0.155,2004,Atlanta,0.342,19.92
1,Jason Terry,26.0,187.96,81.64656,81,16.8,4.1,5.4,-4.0,0.019,0.106,0.231,0.519,0.261,2004,Atlanta,0.342,18.711
2,Bob Sura,31.0,195.58,90.7184,80,7.5,4.1,2.9,0.8,0.071,0.158,0.191,0.51,0.24,2004,Atlanta,0.342,15.28
3,Chris Crawford,29.0,205.74,106.59412,56,10.2,3.1,0.8,-5.2,0.054,0.117,0.214,0.544,0.068,2004,Atlanta,0.342,11.984
4,Jacque Vaughn,29.0,185.42,86.18248,71,3.8,1.6,2.7,-9.8,0.011,0.094,0.143,0.44,0.256,2004,Atlanta,0.342,10.153
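The same top-five selection can also be written without `groupby(...).apply`, a pattern that recent pandas releases may warn about. A sketch on toy data, keeping the top two per group so the result is easy to read; the frame `df` and its values are invented for illustration:

```python
import pandas as pd

# Toy data: two team-seasons; keep the top two players of each
df = pd.DataFrame({
    'team': ['Atlanta'] * 3 + ['Boston'] * 2,
    'season': [2004] * 5,
    'player_name': ['a', 'b', 'c', 'd', 'e'],
    'AVG Minutes Played': [10.0, 19.9, 15.2, 8.0, 12.0],
})

# Sort so the largest values come first inside each (team, season) group,
# then keep the first n rows of every group
n = 2
top_n = (df.sort_values(['team', 'season', 'AVG Minutes Played'],
                        ascending=[True, True, False])
           .groupby(['team', 'season'])
           .head(n)
           .reset_index(drop=True))
print(top_n)
```

With `n = 5` and `updatedPlayerData` in place of `df`, this should select the same rows as the cell above, though ties may be ordered differently.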


#### **Convert Player Stats into Starters' Team Stats**
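We combine the five starters' stats into one row per team and season, so the ANN can learn how a lineup's stats relate to its win rate and later help us find the optimal five starters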

In [616]:
teamData = updatedPlayerData.groupby(['season', 'team']).agg(
    ptsTotal=('pts', 'sum'),
    rebTotal=('reb', 'sum'),
    astTotal=('ast', 'sum'),
    MinutesPlayed=('AVG Minutes Played', 'sum'),
    averageAge=('age', 'mean'),
    averageHeight=('player_height', 'mean'),
    averageWeight=('player_weight', 'mean'),
    winRate=('Win PCT', 'first')
).reset_index()

teamData

Unnamed: 0,season,team,ptsTotal,rebTotal,astTotal,MinutesPlayed,averageAge,averageHeight,averageWeight,winRate
0,2004,Atlanta,56.4,17.5,14.9,76.048,28.2,195.580,92.804923,0.342
1,2004,Boston,58.8,21.7,13.2,87.015,24.0,200.152,99.608803,0.419
2,2004,Brooklyn,71.6,30.0,20.0,80.431,28.8,199.136,100.062395,0.581
3,2004,Chicago,62.5,24.9,16.1,77.090,27.8,199.644,100.788142,0.281
4,2004,Cleveland,67.7,31.5,11.6,82.232,26.0,207.772,110.313574,0.427
...,...,...,...,...,...,...,...,...,...,...
554,2023,Sacramento,87.8,26.9,21.8,85.421,26.4,200.152,95.072883,0.573
555,2023,San Antonio,64.1,19.4,17.0,67.671,23.8,194.056,93.439952,0.268
556,2023,Toronto,85.6,26.6,19.8,77.189,26.6,198.120,96.252222,0.494
557,2023,Utah,83.9,24.2,16.7,73.005,27.0,200.152,99.608803,0.451


#### **Win rate is the target value we are trying to predict**

In [617]:
winRate = teamData['winRate']

#### **Drop categorical and winRate**
The categorical columns can't go into the model, and winRate is our target, so all three are removed from the features

In [618]:
teamData.drop(['season','team','winRate'], axis=1, inplace=True)

teamData.head()

Unnamed: 0,ptsTotal,rebTotal,astTotal,MinutesPlayed,averageAge,averageHeight,averageWeight
0,56.4,17.5,14.9,76.048,28.2,195.58,92.804923
1,58.8,21.7,13.2,87.015,24.0,200.152,99.608803
2,71.6,30.0,20.0,80.431,28.8,199.136,100.062395
3,62.5,24.9,16.1,77.09,27.8,199.644,100.788142
4,67.7,31.5,11.6,82.232,26.0,207.772,110.313574


In [619]:
winRate.head()

0    0.342
1    0.419
2    0.581
3    0.281
4    0.427
Name: winRate, dtype: float64

## **Train the ANN Model**

#### **80% 20% Split**

In [620]:
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(teamData, winRate, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test_Scaled = scaler.transform(X_test)

#### **ANN with three hidden layers**
We started with two hidden layers and got better results after adding a third

Adding a fourth hidden layer performed worse

In [621]:
model = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),  # Input layer
    keras.layers.Dense(64, activation='relu'),       # Hidden layer
    keras.layers.Dense(32, activation='relu'),       # Hidden layer
    keras.layers.Dense(16, activation='relu'),       # Hidden layer
    keras.layers.Dense(1)                             # Output layer (for regression)
])

In [622]:
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])

## **Train Model**

#### **Epochs**
To avoid overfitting we added early stopping: if the validation loss does not improve for 10 consecutive epochs, training stops and the best weights are restored

In [623]:
from tensorflow.keras.callbacks import EarlyStopping

# Define the EarlyStopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

model.fit(X_train, y_train, epochs=100, batch_size=5, validation_split=0.2, verbose=1, callbacks=[early_stopping])

Epoch 1/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 0.1121 - mae: 0.2772 - val_loss: 0.0254 - val_mae: 0.1295
Epoch 2/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0256 - mae: 0.1218 - val_loss: 0.0210 - val_mae: 0.1150
Epoch 3/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0160 - mae: 0.0983 - val_loss: 0.0202 - val_mae: 0.1132
Epoch 4/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0135 - mae: 0.0921 - val_loss: 0.0184 - val_mae: 0.1061
Epoch 5/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0120 - mae: 0.0878 - val_loss: 0.0181 - val_mae: 0.1054
Epoch 6/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0.0103 - mae: 0.0826 - val_loss: 0.0188 - val_mae: 0.1052
Epoch 7/100
72/72 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 0

<keras.src.callbacks.history.History at 0x296fa8131c0>

#### **Test Model**
We also measure how often the predicted win rate lands within 5 games and within 10 games of the actual win rate

The thresholds are 5/82 and 10/82, since a regular season normally has 82 games

In [624]:
# Predict win rates for the held-out test set

predicted_win_rates = model.predict(X_test_Scaled)
actual_win_rates = y_test.values

# Values to keep track of how many predictions are within 5 and 10 games of the actual win rate
total = 0
within5 = 0
within10 = 0

# Display predictions alongside actual values
for pred, actual in zip(predicted_win_rates.flatten(), actual_win_rates):
    #print(f'Predicted: {pred:.3f}, Actual: {actual:.3f}')
    total += 1

    if abs(pred - actual) <= 5/82:
        within5 += 1
        within10 += 1
    elif abs(pred - actual) <= 10/82:
        within10 += 1

# print total, within5, within10
print(f'Total: {total}')
print(f'Within 5 Games: {within5/total * 100}%')
print(f'Within 10 Games: {within10/total * 100}%')

4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step
Total: 112
Within 5 Games: 34.82142857142857%
Within 10 Games: 69.64285714285714%
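The counting loop can also be expressed with NumPy array operations. A sketch on made-up predictions (the numbers below are not from this notebook's run), converting the error to games first so the thresholds read directly as 5 and 10:

```python
import numpy as np

# Illustrative predictions and targets (not taken from the notebook's run)
pred = np.array([0.50, 0.40, 0.70, 0.30])
actual = np.array([0.52, 0.50, 0.55, 0.30])

GAMES = 82  # games in a standard regular season

# Absolute error expressed in games rather than win percentage
err_games = np.abs(pred - actual) * GAMES

# The mean of a boolean array is the fraction of True values
within5 = np.mean(err_games <= 5) * 100
within10 = np.mean(err_games <= 10) * 100
print(f'Within 5 Games: {within5:.1f}%')
print(f'Within 10 Games: {within10:.1f}%')
```

With the notebook's arrays, `pred` would be `predicted_win_rates.flatten()` and `actual` would be `actual_win_rates`.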


In [625]:
# Calculate regression metrics
mae = mean_absolute_error(actual_win_rates, predicted_win_rates)
mse = mean_squared_error(actual_win_rates, predicted_win_rates)
r2 = r2_score(actual_win_rates, predicted_win_rates)

print(f'Mean Absolute Error: {mae:.2f}')
print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')

Mean Absolute Error: 0.10
Mean Squared Error: 0.01
R-squared: 0.38


#### **Finding the most optimal team**
Here we search for the five-player lineup that the model predicts would win the most

#### **Define weights of what we expect from a player**
30% weight on points, 30% on assists, and 40% on rebounds

In [626]:
# Define weights for each statistic
weights = {
    'pts': 0.3,  # Weight for points
    'ast': 0.3,  # Weight for assists
    'reb': 0.4   # Weight for rebounds
}

# Calculate combined score
updatedPlayerData['combined_score'] = (updatedPlayerData['pts'] * weights['pts'] +
                        updatedPlayerData['ast'] * weights['ast'] +
                        updatedPlayerData['reb'] * weights['reb'])

In [627]:
top_players = updatedPlayerData.sort_values(by='combined_score', ascending=False)

# only keep the first appearance of each player (their highest-scoring season), then take the top 10
top_players = top_players.drop_duplicates(subset='player_name').head(10)

In [628]:
top_players

Unnamed: 0,player_name,age,player_height,player_weight,gp,pts,reb,ast,net_rating,oreb_pct,dreb_pct,usg_pct,ts_pct,ast_pct,season,team,Win PCT,AVG Minutes Played,combined_score
785,Nikola Jokic,27.0,210.82,128.820128,74,27.1,13.8,7.9,8.4,0.09,0.313,0.309,0.661,0.388,2022,Denver,0.563,22.866,16.02
1575,Giannis Antetokounmpo,25.0,210.82,109.769264,63,29.5,13.6,5.6,15.4,0.068,0.307,0.363,0.613,0.328,2020,Milwaukee,0.815,22.869,15.97
1070,James Harden,29.0,195.58,99.79024,78,36.1,6.6,7.5,6.3,0.023,0.157,0.396,0.616,0.394,2019,Houston,0.634,30.888,15.72
690,Luka Doncic,24.0,200.66,104.32616,66,32.4,8.6,8.0,2.1,0.024,0.224,0.368,0.609,0.408,2023,Dallas,0.463,24.288,15.56
2090,Joel Embiid,29.0,213.36,127.00576,66,33.1,10.2,4.2,8.8,0.057,0.243,0.37,0.655,0.233,2023,Philadelphia,0.656,24.42,15.27
2781,Russell Westbrook,32.0,190.5,90.7184,65,22.2,11.5,11.7,-1.2,0.043,0.249,0.295,0.509,0.477,2021,Washington,0.456,19.175,14.77
565,LeBron James,33.0,203.2,113.398,82,27.5,8.6,9.1,1.6,0.033,0.201,0.31,0.621,0.432,2018,Cleveland,0.596,25.42,14.42
1767,DeMarcus Cousins,27.0,210.82,122.46984,48,25.2,12.9,5.4,1.9,0.065,0.292,0.318,0.583,0.241,2018,New Orleans,0.582,15.264,14.34
1595,Kevin Garnett,28.0,210.82,108.86208,82,24.2,13.9,5.0,10.4,0.092,0.298,0.294,0.547,0.233,2004,Minnesota,0.68,24.108,14.32
1645,Kevin Love,25.0,208.28,110.222856,77,26.1,12.5,4.4,4.4,0.086,0.298,0.284,0.591,0.205,2014,Minnesota,0.488,21.868,14.15


In [629]:
# Get all combinations of 5 players
combinations_of_5 = list(combinations(top_players.index, 5))

In [630]:
print(len(combinations_of_5))

252
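The count matches the binomial coefficient for choosing 5 players out of 10, which gives a quick sanity check on the list. A small sketch using placeholder labels in place of the player indexes:

```python
from itertools import combinations
from math import comb

# Number of ways to choose 5 items from 10, order ignored
expected = comb(10, 5)

# Same count obtained by enumerating the combinations of ten placeholder labels
enumerated = len(list(combinations(range(10), 5)))
print(expected, enumerated)
```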


In [631]:
createdTeams = []

for combo in combinations_of_5:
    team = pd.DataFrame()
    
    for player in combo:
        # print(player)
        # print(top_players.loc[player].to_frame().T)
        team = pd.concat([team, top_players.loc[player].to_frame().T])

    team['team'] = 1

    print(team)
    createdTeams.append(team.groupby(['team']).agg(
        ptsTotal=('pts', 'sum'),
        rebTotal=('reb', 'sum'),
        astTotal=('ast', 'sum'),
        # NOTE: the training data aggregated this column with 'sum'; 'mean' puts it on a different scale
        MinutesPlayed=('AVG Minutes Played', 'mean'),
        averageAge=('age', 'mean'),
        averageHeight=('player_height', 'mean'),
        averageWeight=('player_weight', 'mean'),
    ).reset_index())

                player_name   age player_height player_weight  gp   pts   reb  \
785            Nikola Jokic  27.0        210.82    128.820128  74  27.1  13.8   
1575  Giannis Antetokounmpo  25.0        210.82    109.769264  63  29.5  13.6   
1070           James Harden  29.0        195.58      99.79024  78  36.1   6.6   
690             Luka Doncic  24.0        200.66     104.32616  66  32.4   8.6   
2090            Joel Embiid  29.0        213.36     127.00576  66  33.1  10.2   

      ast net_rating oreb_pct dreb_pct usg_pct ts_pct ast_pct season  team  \
785   7.9        8.4     0.09    0.313   0.309  0.661   0.388   2022     1   
1575  5.6       15.4    0.068    0.307   0.363  0.613   0.328   2020     1   
1070  7.5        6.3    0.023    0.157   0.396  0.616   0.394   2019     1   
690   8.0        2.1    0.024    0.224   0.368  0.609   0.408   2023     1   
2090  4.2        8.8    0.057    0.243    0.37  0.655   0.233   2023     1   

     Win PCT AVG Minutes Played combined_sco

In [632]:
for i in range(len(createdTeams)):
    createdTeams[i].drop(['team'], axis=1, inplace=True)

In [633]:
teamWinPCTs = []
for team in createdTeams:
    teamWinPCTs.append(model.predict(scaler.transform(team)))

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 21ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 21ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 24ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23
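Calling `predict` once per team produces one progress line per lineup. A sketch of a batched alternative: stack the one-row frames into a single frame and predict once. To keep the example self-contained, `fake_predict` is a made-up stand-in for the trained network and the two toy frames have only two columns; with the notebook's objects the analogous call would presumably be `model.predict(scaler.transform(allTeams))`.

```python
import numpy as np
import pandas as pd

# Two toy one-row team frames shaped like the entries of createdTeams
teams = [
    pd.DataFrame({'ptsTotal': [100.0], 'rebTotal': [40.0]}),
    pd.DataFrame({'ptsTotal': [120.0], 'rebTotal': [50.0]}),
]

def fake_predict(features: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the trained model: returns one value per row."""
    return features.sum(axis=1, keepdims=True) / 1000.0

# Stack every team into one frame so the model is called a single time
allTeams = pd.concat(teams, ignore_index=True)
preds = fake_predict(allTeams.to_numpy())

# Flatten the (n_teams, 1) output to a 1-D array with one prediction per team
flat = preds.ravel()
print(flat)
```

The flattened array lines up with `combinations_of_5` by position, so `flat.argmax()` would give the index of the best lineup.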

In [634]:
final_teamWinPCTs = [pct_array[0,0] for pct_array in teamWinPCTs]

In [635]:
maxPercentage = max(final_teamWinPCTs)
maxIndex = final_teamWinPCTs.index(maxPercentage)

print(f"The maximum percentage is: {maxPercentage} at index: {maxIndex}")

The maximum percentage is: 1.5867466926574707 at index: 102


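This is the best team. They are projected to win 158.7% of their games, which is not a valid win rate. The likely cause is extrapolation: the lineup puts up 133.4 pts/g, 56.4 reb/g and 41.7 ast/g, well above the starter totals visible in the training table, and its `MinutesPlayed` is an average (23.17) where the training rows used a sum (roughly 68 to 87 in the rows shown).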
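Since the output layer `Dense(1)` has no activation, nothing keeps a prediction inside the 0 to 1 range. One simple guard, shown here as a sketch and not something the notebook does, is to clip predictions with `np.clip`; a sigmoid output layer would be another route. The `raw` values are illustrative:

```python
import numpy as np

# Raw outputs such as an unbounded regression head might produce
raw = np.array([1.5867, 0.42, -0.05])

# Force every value into the valid win-rate interval [0, 1]
clipped = np.clip(raw, 0.0, 1.0)
print(clipped)
```

Clipping only hides the symptom: every lineup predicted above 1 would tie at 1.0, so it does not help rank the all-star combinations.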

In [636]:
createdTeams[maxIndex]

Unnamed: 0,ptsTotal,rebTotal,astTotal,MinutesPlayed,averageAge,averageHeight,averageWeight
0,133.4,56.4,41.7,23.1714,28.8,203.2,109.224954


These are the index labels of the players in the **top_players** dataframe. You can find the stats of any of those players with **top_players.loc[ *index_num* ]**, and the last cell prints each player's name and season.

In [637]:
combinations_of_5[maxIndex]

(785, 690, 2781, 565, 1595)

In [642]:
# print the name and season of each player in the best combination
for player in combinations_of_5[maxIndex]:
    print(f"{top_players.loc[player, 'player_name']} Season: {top_players.loc[player, 'season']}")

Nikola Jokic Season: 2022
Luka Doncic Season: 2023
Russell Westbrook Season: 2021
LeBron James Season: 2018
Kevin Garnett Season: 2004
