# NBA Awards Predictor - Part 4

This notebook is part 4 of the NBA awards predictor. It covers **model deployment**: applying the models built in part 3 of this project to the 2022-2023 season. Doing so requires *data manipulation* to construct a 2022-2023 NBA season dataset from projection formulas and polynomial regression. That dataset is then used to predict *4* of the NBA's main player awards.

Below is a more detailed table of contents:

**Table of Contents**
1. Preparing Datasets 
    - MVP
    - DPOY
    - SMOY
    - MIP
2. Predicting the Awards 

NOTE: Rookie of the Year will not be predicted because I can't project statistics for rookies, who have no NBA history. I also cannot web-scrape NCAA data, so I cannot use college data to project NBA statistics.

# Importing Necessary Libraries and Datasets

In [20]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures #OneHotEncoder is used later to encode positions
from sklearn.linear_model import LinearRegression
import joblib

Projected 2023 data: (Elaborated below)

In [95]:
#data downloaded from https://www.basketball-reference.com/friv/projections.cgi
data = pd.read_csv("../data/projected_data.csv")
data.columns = data.iloc[0] #the first row holds the real column names
data.drop(0, inplace = True)
data.drop("-9999", axis = 1, inplace = True) #drops the column labelled "-9999"

In [52]:
#awards dataframes
mvp = pd.read_csv("../data/awards_dfs/MVP.csv")
dpoy = pd.read_csv("../data/awards_dfs/DPOY.csv")
smoy = pd.read_csv("../data/awards_dfs/SMOY.csv")
mip = pd.read_csv("../data/awards_dfs/MIP.csv")

The following are the models saved from part 3:

In [328]:
mvp_model = joblib.load("../models/mvp_model")
dpoy_model = joblib.load("../models/dpoy_model")
smoy_model = joblib.load("../models/smoy_model")
mip_model = joblib.load("../models/mip_model")

# 1. Preparing Datasets

In order to get NBA 2022-2023 season player data to put into the model, I'll be using the **Simple Projection System** developed by Basketball Reference to predict player data for future seasons. 

More details can be found here: https://www.basketball-reference.com/about/projections.html 

This projection system predicts *per 36 min.* player data, which is why I trained all the models on per 36 min. player data. The models also use advanced statistics, which the projection system does not cover. For these, I'll fit a simple single-variable polynomial regression (season year as the only input) for each player and use it to project the statistic.

First, I'll reduce the projected dataset to only players that I have previous data for:

In [96]:
data = data[data["Player"].isin(mvp["Player"].unique())].reset_index(drop = True).copy()

In [97]:
len(data)

446

## MVP

These are the features I need to retrieve:

In [34]:
mvp_features = np.array(mvp_model.get_booster().feature_names)
mvp_features

array(['Age', 'GS', 'FG', '3P', '3P%', 'FTA', 'FT%', 'ORB', 'AST', 'STL',
       'BLK', 'TOV', 'PTS', 'PER', 'OWS', 'BPM', 'VORP', 'WL', 'SS',
       'MPG', 'C', 'PF', 'PG', 'SF'], dtype='<U4')

In [45]:
print("Satisfied Columns: ", mvp_features[[item in data.columns for item in mvp_features]])
print("Unsatisfied Columns: ", mvp_features[[item not in data.columns for item in mvp_features]])

Satisfied Columns:  ['FG' '3P' '3P%' 'FTA' 'FT%' 'ORB' 'AST' 'STL' 'BLK' 'TOV' 'PTS' 'PF']
Unsatisfied Columns:  ['Age' 'GS' 'PER' 'OWS' 'BPM' 'VORP' 'WL' 'SS' 'MPG' 'C' 'PG' 'SF']


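`Age`, `SS`, `C`, `PG`, `SF` can be derived from the existing data. Note that `PF` in the projected data is personal fouls, whereas the model's `PF` is the power forward indicator, so it has to be derived along with the other positions. I'll create these values: 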

In [98]:
def find_age(player): #finds most recent age and year played to find 2023 age
    info = mvp[mvp["Player"] == player][-1:][["Age", "Year"]]
    add = 2023 - info["Year"].values[0]
    return info["Age"].values[0] + add

data["Age"] = data["Player"].apply(find_age)
data.head(5)

Unnamed: 0,Player,FG,FGA,3P,3PA,FT,FTA,ORB,TRB,AST,STL,BLK,TOV,PF,PTS,FG%,3P%,FT%,Age
0,Precious Achiuwa,5.8,12.4,1.1,2.9,1.9,3.2,3.1,9.9,1.8,0.8,0.9,1.8,3.3,14.5,0.467,0.367,0.595,23
1,Steven Adams,4.2,7.5,0.1,0.3,1.9,3.5,5.5,12.7,3.9,1.2,1.0,2.0,2.7,10.4,0.559,0.321,0.536,29
2,Bam Adebayo,7.7,13.7,0.1,0.4,4.8,6.2,2.6,10.6,4.6,1.4,1.0,2.8,2.9,20.3,0.563,0.279,0.765,25
3,LaMarcus Aldridge,8.0,15.6,0.9,2.7,2.8,3.3,2.0,7.9,2.0,0.6,1.5,1.6,2.7,19.7,0.511,0.346,0.85,37
4,Nickeil Alexander-Walker,6.4,16.3,2.6,7.9,2.0,2.6,0.9,5.0,3.9,1.3,0.6,2.3,2.7,17.5,0.396,0.33,0.747,24


In [119]:
data["PTS"] = data["PTS"].apply(float)
sorted_players = data.sort_values("PTS", ascending = False).reset_index().copy() #players sorted by points

def find_ss(player): #finds relative scoring standing
    return sorted_players[sorted_players["Player"] == player].index[0] + 1

data["SS"] = data["Player"].apply(find_ss)
data

Unnamed: 0,Player,FG,FGA,3P,3PA,FT,FTA,ORB,TRB,AST,STL,BLK,TOV,PF,PTS,FG%,3P%,FT%,Age,SS
0,Precious Achiuwa,5.8,12.4,1.1,2.9,1.9,3.2,3.1,9.9,1.8,0.8,0.9,1.8,3.3,14.5,.467,.367,.595,23,266
1,Steven Adams,4.2,7.5,0.1,0.3,1.9,3.5,5.5,12.7,3.9,1.2,1.0,2.0,2.7,10.4,.559,.321,.536,29,427
2,Bam Adebayo,7.7,13.7,0.1,0.4,4.8,6.2,2.6,10.6,4.6,1.4,1.0,2.8,2.9,20.3,.563,.279,.765,25,55
3,LaMarcus Aldridge,8.0,15.6,0.9,2.7,2.8,3.3,2.0,7.9,2.0,0.6,1.5,1.6,2.7,19.7,.511,.346,.850,37,70
4,Nickeil Alexander-Walker,6.4,16.3,2.6,7.9,2.0,2.6,0.9,5.0,3.9,1.3,0.6,2.3,2.7,17.5,.396,.330,.747,24,131
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
441,Thaddeus Young,6.8,13.1,0.8,2.3,1.3,2.3,3.2,8.5,4.8,1.9,0.8,2.5,3.3,15.6,.516,.336,.576,34,216
442,Trae Young,9.3,20.1,3.0,7.9,7.2,8.0,0.7,4.1,9.9,1.0,0.2,4.1,1.8,28.7,.461,.378,.896,24,4
443,Omer Yurtseven,6.6,12.7,0.4,1.4,2.1,3.2,3.9,13.6,2.8,0.9,1.0,2.0,4.0,15.7,.521,.279,.661,24,206
444,Cody Zeller,6.1,11.3,0.4,1.6,3.2,4.4,4.1,11.2,2.8,1.0,0.6,1.9,4.4,15.8,.537,.249,.731,30,201
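As a side note, the scoring standing can also be computed without a per-player lookup. The sketch below uses `Series.rank` on a toy frame; the player names and point values are invented, and ties may be ordered differently than in the sort-based approach above.

```python
import pandas as pd

# Invented projections; the real frame has one row per player
toy = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "PTS": [14.5, 28.7, 20.3, 20.3],
})

# Rank 1 = highest projected scorer; method="first" gives tied players distinct ranks
toy["SS"] = toy["PTS"].rank(ascending=False, method="first").astype(int)
print(toy)
```

Because the ranking is vectorized, it avoids filtering the sorted frame once per player.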


In [293]:
def find_pos(player): #finds position by looking at most recent year played
    return mvp[mvp["Player"] == player][-1:]["Pos"].values[0].split("-")[0]

data["Pos"] = data["Player"].apply(find_pos)
data[data["Player"].isin(["Stephen Curry", "Devin Booker", "Jimmy Butler", "LeBron James", "Joel Embiid"])]

Unnamed: 0,Player,FG,FGA,3P,3PA,FT,FTA,ORB,TRB,AST,...,SS,Pos,GS,MPG,PER,OWS,BPM,VORP,Tm,WL
49,Devin Booker,9.9,20.7,2.5,6.7,5.1,5.9,0.7,5.0,5.1,...,8,SG,66.285714,30.804198,21.3,4.3,4.1,3.6,PHO,0.65
73,Jimmy Butler,7.2,15.1,0.6,2.3,7.1,8.3,1.9,6.6,6.2,...,32,SF,28.284849,23.17464,23.6,6.2,6.3,4.0,MIA,0.6
100,Stephen Curry,9.2,20.5,4.7,12.3,4.8,5.3,0.6,5.6,6.2,...,6,PG,56.624528,34.157729,21.4,4.6,5.8,4.4,GSW,0.65
127,Joel Embiid,10.1,20.3,1.4,3.9,9.9,12.0,2.4,12.3,4.1,...,1,C,81.0,35.195186,31.2,7.9,9.2,6.5,PHI,0.65
213,LeBron James,10.3,20.3,2.6,7.3,4.2,5.7,1.0,7.8,6.8,...,9,PF,51.301342,34.895077,26.2,5.2,7.7,5.1,LAL,0.5


To address `GS` and `MPG`, I'll use sklearn's `LinearRegression` on polynomial features of the season year. For each player, the function fits polynomials of degree 1 to 3 and keeps the 2023 projection that is closest to the player's value from the previous season.

This system is flawed and will likely produce many inaccuracies for the 2023 season, especially for players with only one season of data (such as last season's rookies), whose projections will simply equal their previous values. However, I'll still proceed, as this is the most efficient way of getting the statistics the model needs.

In [502]:
def closest(values, target): #helper function to find the value closest to target
    values = np.asarray(values)
    ind = (np.abs(values - target)).argmin()
    return values[ind]

def find_statistic(statistic, player):
    #fits polynomials of degree 1-3 on the player's history (Year -> statistic)
    #and returns the 2023 projection closest to the player's most recent season
    plyr = mvp[mvp["Player"] == player]
    model = LinearRegression()
    
    options = []
    prev_season = plyr.iloc[-1][statistic]
    
    for deg in range(1, 4):
        poly = PolynomialFeatures(degree = deg)
        x_poly = poly.fit_transform(plyr["Year"].values.reshape(-1, 1))
        model.fit(x_poly, plyr[statistic].values.reshape(-1, 1))
        options.append(model.predict(poly.transform(np.array([[2023]])))[0][0])
    return closest(options, prev_season)
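To make the selection rule concrete, here is a toy version on invented numbers. It is an illustration, not the notebook's code: it swaps in `numpy.polyfit` for scikit-learn, the helper name `project_stat` and all sample values are hypothetical, years are shifted to start at zero for numerical stability, and the degree is capped by the number of seasons available (the function above always tries degrees 1 to 3).

```python
import numpy as np

def project_stat(years, values, target_year=2023, max_degree=3):
    """Fit polynomials of degree 1..max_degree and keep the projection
    closest to the most recent observed value (hypothetical helper)."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    if len(values) < 2:
        # Assumption: with a single season there is no trend to fit,
        # so the last observed value is carried forward.
        return float(values[-1])

    x = years - years[0]               # shift years so the fit is well conditioned
    x_target = target_year - years[0]
    candidates = []
    # Cap the degree so there are never more coefficients than data points
    for deg in range(1, min(max_degree, len(values) - 1) + 1):
        coeffs = np.polyfit(x, values, deg)
        candidates.append(float(np.polyval(coeffs, x_target)))

    candidates = np.asarray(candidates)
    return float(candidates[np.abs(candidates - values[-1]).argmin()])

# Invented minutes-per-game histories
linear_trend = project_stat([2018, 2019, 2020, 2021, 2022], [10, 12, 14, 16, 18])
single_season = project_stat([2022], [24.5])
print(linear_trend, single_season)
```

Picking the candidate closest to last season acts as a crude guard against higher-degree polynomials extrapolating wildly.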

In [426]:
for stat in ['GS', 'MPG']:
    data[stat] = data["Player"].apply(lambda x:find_statistic(stat, x))

#Clamping projected games started to the valid 0-82 range
data["GS"] = data["GS"].clip(lower = 0, upper = 82)

data

Unnamed: 0,Player,FG,FGA,3P,3PA,FT,FTA,ORB,TRB,AST,...,SS,Pos,GS,MPG,PER,OWS,BPM,VORP,Tm,WL
0,Precious Achiuwa,5.8,12.4,1.1,2.9,1.9,3.2,3.1,9.9,1.8,...,266,C,52.000000,35.178307,12.7,0.4,-2.6,-0.2,TOR,0.500
1,Steven Adams,4.2,7.5,0.1,0.3,1.9,3.5,5.5,12.7,3.9,...,427,C,79.805556,24.700626,17.6,3.8,2.0,2.0,MEM,0.700
2,Bam Adebayo,7.7,13.7,0.1,0.4,4.8,6.2,2.6,10.6,4.6,...,55,C,37.800000,29.703250,21.8,3.6,3.8,2.7,MIA,0.600
3,LaMarcus Aldridge,8.0,15.6,0.9,2.7,2.8,3.3,2.0,7.9,2.0,...,70,C,13.005494,19.897691,19.6,2.1,0.7,0.7,BRK,0.625
4,Nickeil Alexander-Walker,6.4,16.3,2.6,7.9,2.0,2.6,0.9,5.0,3.9,...,131,SG,24.996037,28.985918,10.5,-1.1,-2.9,-0.3,TOT,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
441,Thaddeus Young,6.8,13.1,0.8,2.3,1.3,2.3,3.2,8.5,4.8,...,216,PF,0.000000,15.251350,17.0,0.9,2.2,0.9,TOT,
442,Trae Young,9.3,20.1,3.0,7.9,7.2,8.0,0.7,4.1,9.9,...,4,PG,82.000000,36.309199,25.4,9.0,5.2,4.8,ATL,0.500
443,Omer Yurtseven,6.6,12.7,0.4,1.4,2.1,3.2,3.9,13.6,2.8,...,206,C,12.000000,12.607143,17.4,0.8,-1.0,0.2,MIA,0.600
444,Cody Zeller,6.1,11.3,0.4,1.6,3.2,4.4,4.1,11.2,2.8,...,201,C,20.124019,24.593080,18.2,2.1,-0.5,0.4,CHO,0.400


`PER`, `OWS`, `BPM`, `VORP`:

In [235]:
#projects the statistics for the feature using function above
data["PER"] = data["Player"].apply(lambda x:find_statistic("PER", x))
data["OWS"] = data["Player"].apply(lambda x:find_statistic("OWS", x))
data["BPM"] = data["Player"].apply(lambda x:find_statistic("BPM", x))
data["VORP"] = data["Player"].apply(lambda x:find_statistic("VORP", x))

data

Unnamed: 0,Player,FG,FGA,3P,3PA,FT,FTA,ORB,TRB,AST,...,FT%,Age,SS,Pos,GS,MPG,PER,OWS,BPM,VORP
0,Precious Achiuwa,5.8,12.4,1.1,2.9,1.9,3.2,3.1,9.9,1.8,...,.595,23,266,C,52.011872,35.184019,12.7,0.4,-2.6,-0.2
1,Steven Adams,4.2,7.5,0.1,0.3,1.9,3.5,5.5,12.7,3.9,...,.536,29,427,C,43.119048,19.485874,17.6,3.8,2.0,2.0
2,Bam Adebayo,7.7,13.7,0.1,0.4,4.8,6.2,2.6,10.6,4.6,...,.765,25,55,C,37.800000,29.703250,21.8,3.6,3.8,2.7
3,LaMarcus Aldridge,8.0,15.6,0.9,2.7,2.8,3.3,2.0,7.9,2.0,...,.850,37,70,C,13.005494,19.897691,19.6,2.1,0.7,0.7
4,Nickeil Alexander-Walker,6.4,16.3,2.6,7.9,2.0,2.6,0.9,5.0,3.9,...,.747,24,131,SG,25.000000,14.562094,10.5,-1.1,-2.9,-0.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
441,Thaddeus Young,6.8,13.1,0.8,2.3,1.3,2.3,3.2,8.5,4.8,...,.576,34,216,PF,0.000000,15.251350,17.0,0.9,2.2,0.9
442,Trae Young,9.3,20.1,3.0,7.9,7.2,8.0,0.7,4.1,9.9,...,.896,24,4,PG,82.000000,32.224798,25.4,9.0,5.2,4.8
443,Omer Yurtseven,6.6,12.7,0.4,1.4,2.1,3.2,3.9,13.6,2.8,...,.661,24,206,C,12.000000,12.607143,17.4,0.8,-1.0,0.2
444,Cody Zeller,6.1,11.3,0.4,1.6,3.2,4.4,4.1,11.2,2.8,...,.731,30,201,C,0.000000,9.047779,18.2,2.1,-0.5,0.4


`W/L`

To get the W/L ratio of these players, I'll assign each player the most recent team they played for in the MVP dataset, then update the team for significant free agency signings and trades that have *already* happened.

The W/L values themselves are my own predictions for each team. Players listed under `TOT` (multiple teams in their most recent season) get no value.

In [281]:
def find_tm(player):
    return mvp[mvp["Player"] == player].iloc[-1:]["Tm"].values[0]

data["Tm"] = data["Player"].apply(find_tm)

#significant player changes 
def change_team(player, team):
    #label-based assignment; does nothing if the player is not in the data
    data.loc[data["Player"] == player, "Tm"] = team

change_team("Donovan Mitchell", "CLE")
change_team("Rudy Gobert", "MIN")
change_team("Dejounte Murray", "ATL")
change_team("Jalen Brunson", "NYK")
change_team("Lauri Markkanen", "UTA")
change_team("Collin Sexton", "UTA")
change_team("Patrick Beverley", "LAL")
change_team("Malik Beasley", "UTA")
change_team("Malcolm Brogdon", "BOS")
change_team("Christian Wood", "DAL")
change_team("Boban Marjanovic", "HOU")
change_team("Kentavious Caldwell-Pope", "DEN")
change_team("Will Barton", "WAS")
change_team("Nerlens Noel", "DET")

wl_ratios = {'TOR': 0.5, 'MEM': 0.7, 'MIA': 0.6, 'BRK': 0.625, 'TOT': np.nan, 
             'MIL': 0.6, 'CLE': 0.6, 'NOP': 0.45, 'ATL': 0.5, 'LAL': 0.5, 
             'ORL': 0.2, 'CHI': 0.55, 'WAS': 0.3, 'PHO': 0.65, 'CHO': 0.4, 
             'SAC': 0.2, 'NYK': 0.3, 'DEN': 0.725, 'SAS': 0.15, 'LAC': 0.6, 
             'GSW': 0.65, 'OKC': 0.25, 'MIN': 0.7, 'DET': 0.2, 'IND': 0.2, 
             'UTA': 0.15, 'POR': 0.3, 'BOS': 0.675, 'DAL': 0.6, 'HOU': 0.25, 
             'PHI': 0.65}

def set_wl(player):
    return wl_ratios[data[data["Player"] == player]["Tm"].values[0]]

data["WL"] = data["Player"].apply(set_wl)
data

Unnamed: 0,Player,FG,FGA,3P,3PA,FT,FTA,ORB,TRB,AST,...,SS,Pos,GS,MPG,PER,OWS,BPM,VORP,Tm,WL
0,Precious Achiuwa,5.8,12.4,1.1,2.9,1.9,3.2,3.1,9.9,1.8,...,266,C,52.011872,35.184019,12.7,0.4,-2.6,-0.2,TOR,0.500
1,Steven Adams,4.2,7.5,0.1,0.3,1.9,3.5,5.5,12.7,3.9,...,427,C,43.119048,19.485874,17.6,3.8,2.0,2.0,MEM,0.700
2,Bam Adebayo,7.7,13.7,0.1,0.4,4.8,6.2,2.6,10.6,4.6,...,55,C,37.800000,29.703250,21.8,3.6,3.8,2.7,MIA,0.600
3,LaMarcus Aldridge,8.0,15.6,0.9,2.7,2.8,3.3,2.0,7.9,2.0,...,70,C,13.005494,19.897691,19.6,2.1,0.7,0.7,BRK,0.625
4,Nickeil Alexander-Walker,6.4,16.3,2.6,7.9,2.0,2.6,0.9,5.0,3.9,...,131,SG,25.000000,14.562094,10.5,-1.1,-2.9,-0.3,TOT,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
441,Thaddeus Young,6.8,13.1,0.8,2.3,1.3,2.3,3.2,8.5,4.8,...,216,PF,0.000000,15.251350,17.0,0.9,2.2,0.9,TOT,
442,Trae Young,9.3,20.1,3.0,7.9,7.2,8.0,0.7,4.1,9.9,...,4,PG,82.000000,32.224798,25.4,9.0,5.2,4.8,ATL,0.500
443,Omer Yurtseven,6.6,12.7,0.4,1.4,2.1,3.2,3.9,13.6,2.8,...,206,C,12.000000,12.607143,17.4,0.8,-1.0,0.2,MIA,0.600
444,Cody Zeller,6.1,11.3,0.4,1.6,3.2,4.4,4.1,11.2,2.8,...,201,C,0.000000,9.047779,18.2,2.1,-0.5,0.4,CHO,0.400
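The team-to-W/L lookup can also be written as a single `Series.map` call. The sketch below assumes a toy frame with invented player names; the three team values are taken from the dictionary above.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "Tm": ["MEM", "TOT", "ATL"],
})
# Subset of the projected values; TOT (multiple teams) has no projection
toy_wl = {"MEM": 0.7, "ATL": 0.5, "TOT": np.nan}

# Series.map looks up each team code; codes missing from the dict would also become NaN
toy["WL"] = toy["Tm"].map(toy_wl)
print(toy)
```

Rows with a missing `WL` are the ones that a later `dropna` would remove from the candidate pool.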


All of the columns are now present in the data! Next, I'll pre-process it into the format the model expects.

In [285]:
data.columns

Index(['Player', 'FG', 'FGA', '3P', '3PA', 'FT', 'FTA', 'ORB', 'TRB', 'AST',
       'STL', 'BLK', 'TOV', 'PF', 'PTS', 'FG%', '3P%', 'FT%', 'Age', 'SS',
       'Pos', 'GS', 'MPG', 'PER', 'OWS', 'BPM', 'VORP', 'Tm', 'WL'],
      dtype='object', name=0)

In [297]:
ohe = OneHotEncoder(sparse = False) #scikit-learn 1.2+ renames this argument to sparse_output
def process_data_mvp(df):
    final_df = df.copy()
    
    #encoding
    positions = ohe.fit_transform(final_df["Pos"].values.reshape(-1, 1))
    encoded_pos = pd.DataFrame(positions, columns = ohe.categories_[0])
    final_df.rename(columns = {"PF": "Fouls"}, inplace = True)
    final_df = final_df.join(encoded_pos)

    #dropping
    features_to_drop = ['Player', 'FGA', 'FG%', '3PA', 'FT', 'TRB', 'SG']
    final_df = final_df.drop(features_to_drop, axis = 1)
    
    return final_df[mvp_model.get_booster().feature_names]

In [432]:
mvp_candidates = data.copy()
mvp_candidates.dropna(inplace = True)
mvp_candidates.reset_index(drop = True, inplace = True)
mvp_players = mvp_candidates["Player"]

mvp_candidates = process_data_mvp(mvp_candidates)
obj_cols = mvp_candidates.select_dtypes(include = 'object').columns
mvp_candidates[obj_cols] = mvp_candidates[obj_cols].astype("float")
mvp_candidates

Unnamed: 0,Age,GS,FG,3P,3P%,FTA,FT%,ORB,AST,STL,...,OWS,BPM,VORP,WL,SS,MPG,C,PF,PG,SF
0,23,52.000000,5.8,1.1,0.367,3.2,0.595,3.1,1.8,0.8,...,0.500000,-1.100000,0.000000,0.500,266,35.178307,1.0,0.0,0.0,0.0
1,29,79.805556,4.2,0.1,0.321,3.5,0.536,5.5,3.9,1.2,...,4.705556,2.575000,2.305556,0.700,427,24.700626,1.0,0.0,0.0,0.0
2,25,37.800000,7.7,0.1,0.279,6.2,0.765,2.6,4.6,1.4,...,2.140000,2.800000,1.500000,0.600,55,29.703250,1.0,0.0,0.0,0.0
3,37,13.005494,8.0,0.9,0.346,3.3,0.850,2.0,2.0,0.6,...,2.075053,0.590121,0.499638,0.625,70,19.897691,1.0,0.0,0.0,0.0
4,27,82.000000,5.2,3.1,0.404,1.9,0.857,0.7,2.4,1.0,...,4.100000,1.000000,1.600000,0.600,236,29.516970,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
371,30,5.000000,3.9,1.2,0.372,2.1,0.816,1.3,5.0,2.1,...,2.240000,1.770000,1.570000,0.500,415,15.606247,0.0,0.0,0.0,0.0
372,24,82.000000,9.3,3.0,0.378,8.0,0.896,0.7,9.9,1.0,...,10.700000,3.600000,5.350000,0.500,4,36.309199,0.0,0.0,1.0,0.0
373,24,12.000000,6.6,0.4,0.279,3.2,0.661,3.9,2.8,0.9,...,0.800000,-1.000000,0.200000,0.600,206,12.607143,1.0,0.0,0.0,0.0
374,30,20.124019,6.1,0.4,0.249,4.4,0.731,4.1,2.8,1.0,...,3.039384,-0.038579,0.530318,0.400,201,24.593080,1.0,0.0,0.0,0.0


This will be the final dataframe for the MVP model!
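The model expects its columns in a fixed order, and a position may be absent from a small sample. As a rough sketch of how that alignment can be checked, the example below uses `pd.get_dummies` plus `reindex` instead of `OneHotEncoder`; the feature list and the two rows are invented stand-ins, not the real `mvp_model` features.

```python
import pandas as pd

# Hypothetical feature list standing in for mvp_model.get_booster().feature_names
expected_features = ["Age", "PTS", "C", "PF", "PG", "SF"]

toy = pd.DataFrame({
    "Age": [24, 29],
    "PTS": [28.7, 10.4],
    "Pos": ["PG", "C"],
})

# One-hot encode positions, then add any expected column the toy data lacks
dummies = pd.get_dummies(toy["Pos"]).astype(float)
features = toy.drop(columns="Pos").join(dummies)
features = features.reindex(columns=expected_features, fill_value=0.0)

# Sanity check: every expected feature is now present, in model order
missing = [c for c in expected_features if c not in features.columns]
print(features)
print("Missing:", missing)
```

`reindex` both orders the columns and fills absent position indicators with zeros, so the frame matches the expected layout even when a position does not appear.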

## DPOY

In [350]:
dpoy_features = np.array(dpoy_model.get_booster().feature_names)

In [352]:
print("Satisfied Columns: ", dpoy_features[[item in data.columns for item in dpoy_features]])
print("Unsatisfied Columns: ", dpoy_features[[item not in data.columns for item in dpoy_features]])

Satisfied Columns:  ['Age' 'GS' 'FGA' '3P%' 'FTA' 'FT%' 'BLK' 'TOV' 'VORP' 'PF' 'STL']
Unsatisfied Columns:  ['G' 'MP' '2PA' 'Fouls' 'USG%' 'DWS' 'WS/48' 'DBPM' 'DEF\xa0RTG' 'DREB'
 '%DREB' 'OPP\xa0PTS2ND\xa0CHANCE' 'SG' 'STL%']


`Fouls`, `2PA`, `DREB`, and `SG` can all be derived from columns that we already have. 

In [373]:
dpoy_candidates = data[['Age', 'GS', 'FGA', '3P%', 'FTA', 'FT%', 'BLK', 'TOV', 'VORP', 'PF', 'STL']].copy()

#fouls
dpoy_candidates.rename(columns = {"PF": "Fouls"}, inplace = True)

#2PA and DREB
dpoy_candidates["DREB"] = data["TRB"].apply(float) - data["ORB"].apply(float)
dpoy_candidates["2PA"] = data["FGA"].apply(float) - data["3PA"].apply(float)

#SG and PF
dpoy_candidates["SG"] = pd.get_dummies(data["Pos"])["SG"]
dpoy_candidates["PF"] = pd.get_dummies(data["Pos"])["PF"]

dpoy_candidates.head(5)

Unnamed: 0,Age,GS,FGA,3P%,FTA,FT%,BLK,TOV,VORP,PF,STL,DREB,2PA,SG
0,23,52.011872,12.4,0.367,3.2,0.595,0.9,1.8,-0.2,3.3,0.8,6.8,9.5,0
1,29,43.119048,7.5,0.321,3.5,0.536,1.0,2.0,2.0,2.7,1.2,7.2,7.2,0
2,25,37.8,13.7,0.279,6.2,0.765,1.0,2.8,2.7,2.9,1.4,8.0,13.3,0
3,37,13.005494,15.6,0.346,3.3,0.85,1.5,1.6,0.7,2.7,0.6,5.9,12.9,0
4,24,25.0,16.3,0.33,2.6,0.747,0.6,2.3,-0.3,2.7,1.3,4.1,8.4,1


`G`, `USG%`, `WS/48`, `DWS`, and `DBPM` can be found using the polynomial regression function defined in the MVP section.

In [503]:
for stat in ["G", 'USG%', 'DWS', 'WS/48', 'DBPM']:
    data[stat] = data["Player"].apply(lambda x:find_statistic(stat, x)) 
    dpoy_candidates[stat] = data[stat]

The *advanced defensive statistics* (`DEF RTG`, `%DREB`, `OPP PTS2ND CHANCE`, and `STL%`) require an additional cleaning step, but use the same polynomial regression methodology:

In [483]:
#missing percentages and award shares are treated as 0
fill_cols = ["3P%", "FT%", "Share"]
dpoy[fill_cols] = dpoy[fill_cols].fillna(0)
dpoy.dropna(axis = 0, inplace = True)
dpoy.isna().sum().sum()

0

In [505]:
def find_statistic_dpoy(statistic, player): #slightly adjusted due to different features
    if player not in dpoy["Player"].unique():
        return np.nan
    else:
        plyr = dpoy[dpoy["Player"] == player]
        model = LinearRegression()
    
        options = []
        prev_season = dpoy[dpoy["Player"] == player].iloc[-1:][statistic].values[0]
    
        for deg in range(1, 4):
            poly = PolynomialFeatures(degree = deg)
            x_poly = poly.fit_transform(plyr["Year"].values.reshape(-1, 1))
            model.fit(x_poly, plyr[statistic].values.reshape(-1, 1))
            options.append(model.predict(poly.transform(np.array([[2023]])))[0][0])
        return closest(options, prev_season)

for stat in ['DEF\xa0RTG', '%DREB', 'OPP\xa0PTS2ND\xa0CHANCE', 'STL%']:
    dpoy_candidates[stat] = data["Player"].apply(lambda x:find_statistic_dpoy(stat, x)) 
    
dpoy_candidates[["G", 'USG%', 'DWS', 'WS/48', 'DBPM', 'DEF\xa0RTG', '%DREB', 'OPP\xa0PTS2ND\xa0CHANCE', 'STL%']]

Unnamed: 0,G,USG%,DWS,WS/48,DBPM,DEF RTG,%DREB,OPP PTS2ND CHANCE,STL%
0,85.000000,17.500000,3.200000,0.055000,-0.700000,104.200000,29.500000,10.200000,7.400000
1,77.480527,14.572222,3.038889,0.173528,0.818419,109.687257,26.102778,6.940853,21.454762
2,55.400000,28.100000,4.080000,0.190200,2.580000,101.520000,32.740000,5.820000,27.180000
3,48.041209,20.324176,0.482418,0.128396,-0.859714,108.565732,25.898056,7.544421,11.865280
4,70.666667,24.333333,0.700000,0.029000,-0.500000,111.100000,13.200000,6.499703,20.633333
...,...,...,...,...,...,...,...,...,...
441,56.454945,17.236703,1.023170,0.133446,1.963956,106.331380,22.667619,3.630769,34.638368
442,84.668690,36.700000,1.050000,0.165250,-2.450000,114.750000,12.500000,9.450000,18.550000
443,56.000000,19.900000,1.400000,0.145000,0.400000,108.000000,42.600000,3.600000,16.200000
444,49.456710,19.010616,0.703767,0.167000,0.018602,115.992011,30.459932,6.552397,16.225403


Lastly, `MP` needs to be found. This can be done using `MPG` and `G`. 

In [508]:
#clamps projected games played to the valid 0-82 range
data["G"] = data["G"].clip(lower = 0, upper = 82)
dpoy_candidates["G"] = data["G"]

dpoy_candidates["MP"] = data["G"] * data["MPG"]
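As a small worked example of this step, the sketch below clamps some invented regression outputs to the 0-82 range and multiplies by minutes per game. The numbers are made up purely to show the arithmetic.

```python
import numpy as np

# Invented regression outputs: projected games can land outside 0-82
games = np.array([85.0, 77.5, -3.0])
minutes_per_game = np.array([35.2, 24.7, 12.0])

games_clipped = np.clip(games, 0, 82)            # a regular season has 82 games
total_minutes = games_clipped * minutes_per_game
print(games_clipped)
print(total_minutes)
```

Clamping before multiplying matters: an unclamped 85-game projection would inflate total minutes beyond what a season allows.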

Final cleaning:

In [522]:
#reorders dataframe 
dpoy_candidates = dpoy_candidates[dpoy_model.get_booster().feature_names]

#drops na
dpoy_candidates.dropna(inplace = True)

#saves player names 
dpoy_players = data["Player"][dpoy_candidates.index]

#converts all columns to floats 
obj_cols = dpoy_candidates.select_dtypes(include = 'object').columns
dpoy_candidates[obj_cols] = dpoy_candidates[obj_cols].astype("float")
dpoy_candidates

Unnamed: 0,Age,G,GS,MP,FGA,3P%,2PA,FTA,FT%,BLK,...,DBPM,VORP,DEF RTG,DREB,%DREB,OPP PTS2ND CHANCE,PF,SG,STL,STL%
0,23,82.000000,52.011872,2884.621154,12.4,0.367,9.5,3.2,0.595,0.9,...,-0.700000,-0.2,104.200000,6.8,29.500000,10.200000,0,0,0.8,7.400000
1,29,77.480527,43.119048,1913.817504,7.5,0.321,7.2,3.5,0.536,1.0,...,0.818419,2.0,109.687257,7.2,26.102778,6.940853,0,0,1.2,21.454762
2,25,55.400000,37.800000,1645.560046,13.7,0.279,13.3,6.2,0.765,1.0,...,2.580000,2.7,101.520000,8.0,32.740000,5.820000,0,0,1.4,27.180000
3,37,48.041209,13.005494,955.909121,15.6,0.346,12.9,3.3,0.850,1.5,...,-0.859714,0.7,108.565732,5.9,25.898056,7.544421,0,0,0.6,11.865280
4,24,70.666667,25.000000,2048.338177,16.3,0.330,8.4,2.6,0.747,0.6,...,-0.500000,-0.3,111.100000,4.1,13.200000,6.499703,0,1,1.3,20.633333
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
441,34,56.454945,0.000000,861.014111,13.1,0.336,10.8,2.3,0.576,0.8,...,1.963956,0.9,106.331380,5.3,22.667619,3.630769,1,0,1.9,34.638368
442,24,82.000000,82.000000,2977.354312,20.1,0.378,12.2,8.0,0.896,0.2,...,-2.450000,4.8,114.750000,3.4,12.500000,9.450000,0,0,1.0,18.550000
443,24,56.000000,12.000000,706.000000,12.7,0.279,11.3,3.2,0.661,1.0,...,0.400000,0.2,108.000000,9.7,42.600000,3.600000,0,0,0.9,16.200000
444,30,49.456710,0.000000,1216.292834,11.3,0.249,9.7,4.4,0.731,0.6,...,0.018602,0.4,115.992011,7.1,30.459932,6.552397,0,0,1.0,16.225403


This will be the final DPOY dataframe used for the model!

## SMOY

In [545]:
smoy_features = np.array(smoy_model.get_booster().feature_names)
smoy_features

array(['Age', 'G', 'GS', 'MP', 'FG%', '3P%', '2P%', 'FT', 'DRB', 'TRB',
       'AST', 'STL', 'TOV', 'Fouls', 'PTS', 'TS%', 'OWS', 'WS', 'DBPM',
       'BPM', 'VORP', 'SS', 'C', 'PG', 'SG'], dtype='<U5')

In [548]:
print("Satisfied Columns: ", smoy_features[[item in data.columns for item in smoy_features]])
print("Unsatisfied Columns: ", smoy_features[[item not in data.columns for item in smoy_features]])

Satisfied Columns:  ['Age' 'G' 'GS' 'FG%' '3P%' 'FT' 'TRB' 'AST' 'STL' 'TOV' 'PTS' 'OWS'
 'DBPM' 'BPM' 'VORP' 'SS']
Unsatisfied Columns:  ['MP' '2P%' 'DRB' 'Fouls' 'TS%' 'WS' 'C' 'PG' 'SG']


`MP`, `DRB`, `Fouls`, `C`, `PG`, and `SG` can all be found using the existing columns:

In [553]:
#Adding MP and DRB to data dataframe 
data["MP"] = data["G"].apply(float) * data["MPG"].apply(float)
data["DRB"] = data["TRB"].apply(float) - data["ORB"].apply(float)

smoy_candidates = data[['Age', 'G', 'GS', 'FG%', '3P%', 'FT', 'TRB', 'AST', 'STL',
                        'TOV', 'PTS', 'OWS', 'DBPM', 'BPM', 'VORP', 'SS', 'MP', 'DRB', 'PF']].copy()

#Fouls column
smoy_candidates.rename(columns = {"PF": "Fouls"}, inplace = True)

#Position columns
smoy_candidates["C"] = pd.get_dummies(data["Pos"])["C"]
smoy_candidates["PG"] = pd.get_dummies(data["Pos"])["PG"]
smoy_candidates["SG"] = pd.get_dummies(data["Pos"])["SG"]    

smoy_candidates

Unnamed: 0,Age,G,GS,FG%,3P%,FT,TRB,AST,STL,TOV,...,DBPM,BPM,VORP,SS,MP,DRB,Fouls,C,PG,SG
0,23,82.000000,52.000000,.467,.367,1.9,9.9,1.8,0.8,1.8,...,-0.700000,-1.100000,0.000000,266,2884.621154,6.8,3.3,1,0,0
1,29,77.480527,79.805556,.559,.321,1.9,12.7,3.9,1.2,2.0,...,0.818419,2.575000,2.305556,427,1913.817504,7.2,2.7,1,0,0
2,25,55.400000,37.800000,.563,.279,4.8,10.6,4.6,1.4,2.8,...,2.580000,2.800000,1.500000,55,1645.560046,8.0,2.9,1,0,0
3,37,48.041209,13.005494,.511,.346,2.8,7.9,2.0,0.6,1.6,...,-0.859714,0.590121,0.499638,70,955.909121,5.9,2.7,1,0,0
4,24,70.666667,24.996037,.396,.330,2.0,5.0,3.9,1.3,2.3,...,-0.500000,-1.233333,-0.066667,131,2048.338177,4.1,2.7,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
441,34,56.454945,0.000000,.516,.336,1.3,8.5,4.8,1.9,2.5,...,1.963956,2.241319,1.073846,216,861.014111,5.3,3.3,0,0,0
442,24,82.000000,82.000000,.461,.378,7.2,4.1,9.9,1.0,4.1,...,-2.450000,3.600000,5.350000,4,2977.354312,3.4,1.8,0,1,0
443,24,56.000000,12.000000,.521,.279,2.1,13.6,2.8,0.9,2.0,...,0.400000,-1.000000,0.200000,206,706.000000,9.7,4.0,1,0,0
444,30,49.456710,20.124019,.537,.249,3.2,11.2,2.8,1.0,1.9,...,0.018602,-0.038579,0.530318,201,1216.292834,7.1,4.4,1,0,0


`2P%`, `TS%`, and `WS` can be found using the polynomial regression function defined in the MVP section.

In [554]:
for stat in ["2P%", 'TS%', 'WS']:
    data[stat] = data["Player"].apply(lambda x:find_statistic(stat, x)) 
    smoy_candidates[stat] = data[stat]

Final cleaning:

In [559]:
#reduces to qualified SMOY candidates (fewer starts than half of games played)
smoy_candidates = smoy_candidates[smoy_candidates["GS"] < (smoy_candidates["G"] / 2)].copy()

#saves players 
smoy_players = data["Player"][smoy_candidates.index]

#converts all columns to floats 
obj_cols = smoy_candidates.select_dtypes(include = 'object').columns
smoy_candidates[obj_cols] = smoy_candidates[obj_cols].astype("float")
smoy_candidates

Unnamed: 0,Age,G,GS,FG%,3P%,FT,TRB,AST,STL,TOV,...,SS,MP,DRB,Fouls,C,PG,SG,2P%,TS%,WS
3,37,48.041209,13.005494,0.511,0.346,2.8,7.9,2.0,0.6,1.6,...,70,955.909121,5.9,2.7,1,0,0,0.581942,0.612157,2.176099
4,24,70.666667,24.996037,0.396,0.330,2.0,5.0,3.9,1.3,2.3,...,131,2048.338177,4.1,2.7,0,0,1,0.478333,0.492000,0.500000
7,24,54.000000,1.000000,0.456,0.309,1.7,4.8,6.2,2.8,1.8,...,248,834.000000,3.6,3.1,0,1,0,0.538000,0.520000,2.100000
8,29,37.071429,0.000000,0.404,0.278,1.9,5.8,3.7,1.0,1.3,...,340,238.829268,5.0,2.7,0,0,0,0.456000,0.506786,-0.607143
9,29,61.714286,0.000000,0.455,0.343,2.0,8.2,4.5,1.7,1.7,...,299,1057.206410,6.8,2.6,0,0,0,0.427143,0.480000,3.842857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
437,26,50.000000,0.000000,0.422,0.325,1.5,7.1,2.8,1.2,1.7,...,403,459.000000,5.5,3.2,0,0,0,0.571000,0.519000,0.800000
440,30,76.600000,5.000000,0.457,0.372,1.7,5.6,5.0,2.1,1.4,...,415,1195.438540,4.3,1.6,0,0,1,0.506100,0.570700,3.680000
441,34,56.454945,0.000000,0.516,0.336,1.3,8.5,4.8,1.9,2.5,...,216,861.014111,5.3,3.3,0,0,0,0.547143,0.544571,2.351078
443,24,56.000000,12.000000,0.521,0.279,2.1,13.6,2.8,0.9,2.0,...,206,706.000000,9.7,4.0,1,0,0,0.547000,0.546000,2.100000
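The eligibility rule used here (fewer starts than half of games played) can be illustrated on a toy frame. Names and numbers below are invented; the point is that boolean indexing returns a new frame, so the result has to be assigned to take effect.

```python
import pandas as pd

toy = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "G": [70.0, 56.0, 82.0],
    "GS": [25.0, 40.0, 0.0],
})

# Keep players projected to start fewer than half of their games
bench = toy[toy["GS"] < toy["G"] / 2].copy()
print(bench["Player"].tolist())
```

The filtered frame keeps the original index labels, which is what allows the matching player names to be looked up afterwards.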


## MIP

In [585]:
mip_features = np.array(mip_model.get_booster().feature_names)
mip_features

array(['Age', 'G', 'GS', 'MP', 'TOV', 'OWS', 'WS', 'VORP', 'GS - %',
       'MP - %', 'TS% - %', 'GS - Q', 'TRB - Q', 'AST - Q', 'STL - Q',
       'PTS - Q', 'USG% - Q', 'WS/48 - Q', 'BPM - Q', 'VORP - Q', 'C',
       'PF', 'PG', 'SG'], dtype='<U9')

In [586]:
print("Satisfied Columns: ", mip_features[[item in data.columns for item in mip_features]])
print("Unsatisfied Columns: ", mip_features[[item not in data.columns for item in mip_features]])

Satisfied Columns:  ['Age' 'G' 'GS' 'MP' 'TOV' 'OWS' 'WS' 'VORP' 'PF']
Unsatisfied Columns:  ['GS - %' 'MP - %' 'TS% - %' 'GS - Q' 'TRB - Q' 'AST - Q' 'STL - Q'
 'PTS - Q' 'USG% - Q' 'WS/48 - Q' 'BPM - Q' 'VORP - Q' 'C' 'PG' 'SG']


Let's first address `C`, `PG`, `SG`, and `PF`:

In [601]:
mip_candidates = data[['Age', 'G', 'GS', 'MP', 'TOV', 'OWS', 'WS', 'VORP']].copy()

mip_candidates["C"] = pd.get_dummies(data["Pos"])["C"]
mip_candidates["PG"] = pd.get_dummies(data["Pos"])["PG"]
mip_candidates["SG"] = pd.get_dummies(data["Pos"])["SG"] 
mip_candidates["PF"] = pd.get_dummies(data["Pos"])["PF"] 

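As expected, the remaining features are change variables: `- %` columns hold the relative change from the player's most recent season to the 2023 projection, and `- Q` columns hold the absolute (quantitative) change. Both can be calculated with the following two functions: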

In [628]:
def find_rel_change(statistic, player):
    current_stat = mvp[mvp["Player"] == player].iloc[-1][statistic]
    if current_stat == 0:
        return 0
    else:
        future_stat = data[data["Player"] == player][statistic].values[0]
        return (future_stat - current_stat) / current_stat

def find_qnt_change(statistic, player):
    current_stat = mvp[mvp["Player"] == player].iloc[-1][statistic]
    future_stat = float(data[data["Player"] == player][statistic].values[0])
    return future_stat - current_stat
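A quick worked example of the two change measures, using standalone versions of the formulas. The function names and the input numbers are invented for illustration.

```python
def rel_change(current, future):
    """Relative change; defined as 0 when the current value is 0."""
    if current == 0:
        return 0.0
    return (future - current) / current

def qnt_change(current, future):
    """Absolute (quantitative) change."""
    return future - current

# Invented example: 28 games started last season, 52 projected
print(rel_change(28, 52))   # (52 - 28) / 28
print(qnt_change(28, 52))   # 52 - 28
print(rel_change(0, 10))    # guarded against division by zero
```

The zero guard means a player going from 0 to any positive value shows no relative change, which understates improvement for those players.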

Applying the functions for `%` and `Q` data:

In [649]:
for stat in ['GS', 'MP', 'TS%']:
    mip_candidates[stat + " - %"] = data["Player"].apply(lambda x:find_rel_change(stat, x))

In [652]:
for stat in ['GS', 'TRB', 'AST', 'STL', 'PTS', 'USG%', 'WS/48', 'BPM', 'VORP']:
    mip_candidates[stat + " - Q"] = data["Player"].apply(lambda x:find_qnt_change(stat, x))

Final cleaning:

In [655]:
#saves players 
mip_players = data["Player"]

#converts all columns to floats 
obj_cols = mip_candidates.select_dtypes(include = 'object').columns
mip_candidates[obj_cols] = mip_candidates[obj_cols].astype("float")
mip_candidates

Unnamed: 0,Age,G,GS,MP,TOV,OWS,WS,VORP,C,PG,...,TS% - %,GS - Q,TRB - Q,AST - Q,STL - Q,PTS - Q,USG% - Q,WS/48 - Q,BPM - Q,VORP - Q
0,23,82.000000,52.000000,2884.621154,1.8,0.500000,3.700000,0.000000,1,0,...,-0.093439,24.000000,0.0,0.1,0.0,0.6,-1.000000,-0.015000,1.500000,0.200000
1,29,77.480527,79.805556,1913.817504,2.0,4.705556,7.725000,2.305556,1,0,...,-0.038939,4.805556,-1.0,-0.7,0.0,0.9,2.572222,0.010528,0.575000,0.305556
2,25,55.400000,37.800000,1645.560046,2.8,2.140000,9.500000,1.500000,1,0,...,0.033579,-18.200000,-0.5,0.9,-0.2,-0.8,3.100000,0.002200,-1.000000,-1.200000
3,37,48.041209,13.005494,955.909121,1.6,2.075053,2.176099,0.499638,1,0,...,0.013504,1.005494,-0.9,0.6,0.1,-1.1,-2.075824,-0.012604,-0.109879,-0.200362
4,24,70.666667,24.996037,2048.338177,2.3,-1.100000,0.500000,-0.066667,0,0,...,0.035789,3.996037,0.4,0.1,0.2,0.5,0.233333,0.026000,1.666667,0.233333
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
441,34,56.454945,0.000000,861.014111,2.5,1.111868,2.351078,1.073846,0,0,...,-0.006257,-1.000000,-0.4,0.4,-0.4,1.9,-0.163297,0.007446,0.041319,0.173846
442,24,82.000000,82.000000,2977.354312,4.1,10.700000,11.950000,5.350000,0,1,...,0.041459,6.000000,0.2,-0.1,0.0,-0.6,2.300000,-0.015750,-1.600000,0.550000
443,24,56.000000,12.000000,706.000000,2.0,0.800000,2.100000,0.200000,1,0,...,0.000000,0.000000,-1.4,0.3,0.0,0.5,0.000000,0.000000,0.000000,0.000000
444,30,49.456710,20.124019,1216.292834,1.9,3.039384,3.840411,0.530318,1,0,...,0.023074,-0.875981,-0.5,-0.3,0.0,-0.4,0.710616,0.011000,0.461421,0.130318


This will be the final MIP dataframe for the model!

# Predicting the Awards

**MVP**

In [460]:
mvp_predictions = mvp_model.predict(mvp_candidates)
mvp_table = pd.DataFrame({"Player": mvp_players, "Predicted Share": mvp_predictions})

#penalizes players for not playing previous seasons/not playing sufficient games in previous seasons
def adjust_share(player):
    last_year_played = mvp[mvp["Player"] == player]["Year"].values[-1]
    subtract = 0.1 * (2022 - last_year_played)
    mvp_table.loc[mvp_table["Player"] == player, "Predicted Share"] -= subtract

mvp_table["Player"].apply(adjust_share)

#Accounts for voter fatigue as Nikola Jokic is currently a 2x MVP
mvp_table.iloc[mvp_table[mvp_table["Player"] == "Nikola Jokić"].index[0], 1] -= 0.2

output = mvp_table.sort_values("Predicted Share", ascending = False).head(8)
if output.iloc[0]["Player"] == "Nikola Jokić": 
    output.iloc[1, 1] += 0.15
    output.iloc[2, 1] += 0.05
else:
    output.iloc[0, 1] += 0.15
    output.iloc[1, 1] += 0.05

output.sort_values("Predicted Share", ascending = False)

Unnamed: 0,Player,Predicted Share
109,Joel Embiid,0.861043
187,Nikola Jokić,0.771201
9,Giannis Antetokounmpo,0.619464
372,Trae Young,0.191336
87,Stephen Curry,0.150992
95,Luka Dončić,0.1452
366,Robert Williams,0.127036
180,LeBron James,0.114912
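
The voter-fatigue adjustment in the cell above can also be written as a function that leaves its input untouched. This is a minimal sketch on made-up data: the function name `apply_voter_fatigue`, its parameters and the toy players are assumptions for illustration, while the 0.2 penalty and the 0.15 / 0.05 bonuses are the values used in the cell.

```python
import pandas as pd

def apply_voter_fatigue(table, reigning, penalty=0.2, bonuses=(0.15, 0.05), top_n=8):
    """Penalize the reigning winner, then boost the leading candidates in the top `top_n`."""
    adjusted = table.copy()
    adjusted.loc[adjusted["Player"] == reigning, "Predicted Share"] -= penalty
    top = adjusted.sort_values("Predicted Share", ascending=False).head(top_n).copy()
    # Mirror the cell's branch: skip rank 1 only when the reigning winner still holds it
    start = 1 if top.iloc[0]["Player"] == reigning else 0
    share_col = top.columns.get_loc("Predicted Share")
    for offset, bonus in enumerate(bonuses):
        top.iloc[start + offset, share_col] += bonus
    return top.sort_values("Predicted Share", ascending=False)

# Made-up shares, for illustration only
toy = pd.DataFrame({
    "Player": ["Reigning MVP", "Challenger 1", "Challenger 2", "Challenger 3"],
    "Predicted Share": [0.95, 0.70, 0.55, 0.20],
})

ranked = apply_voter_fatigue(toy, reigning="Reigning MVP")
```

In this toy case the reigning winner still ranks first after the penalty, so the two bonuses go to ranks 2 and 3, and "Challenger 1" ends up on top. Returning a new dataframe makes it explicit which table carries the boosted shares.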


**DPOY**

In [540]:
dpoy_predictions = dpoy_model.predict(dpoy_candidates)
dpoy_table = pd.DataFrame({"Player": dpoy_players, "Predicted Share": dpoy_predictions})
dpoy_table.reset_index(drop = True, inplace = True)

#penalizes players who missed previous seasons or did not play enough games in them:
#0.05 is subtracted for each season between the player's last entry in the mvp dataframe and 2022
def adjust_share_dpoy(player):
    last_year_played = mvp[mvp["Player"] == player]["Year"].values[-1] #mvp df has all players
    subtract = 0.05 * (2022 - last_year_played)
    dpoy_table.loc[dpoy_table["Player"] == player, "Predicted Share"] -= subtract

dpoy_table["Player"].apply(adjust_share_dpoy)
dpoy_table.sort_values("Predicted Share", ascending = False).head(8)

Unnamed: 0,Player,Predicted Share
2,Bam Adebayo,0.442633
51,Mikal Bridges,0.390356
144,Rudy Gobert,0.364425
375,Matisse Thybulle,0.196575
302,Chuma Okeke,0.162126
112,Andre Drummond,0.144173
287,Mike Muscala,0.122435
403,Hassan Whiteside,0.117096
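
The inactivity penalty is the same for every award apart from the rate (0.1 for MVP, 0.05 otherwise), so it could be factored into one vectorized helper. The sketch below is a hypothetical alternative, not the code that produced the tables in this notebook. It assumes each player's most recent season is the maximum `Year` in the history dataframe, and the helper name and toy data are made up.

```python
import pandas as pd

def apply_inactivity_penalty(table, history, current_year=2022, per_year=0.05):
    """Return a copy of `table` with `per_year` subtracted for each season since a player's last appearance."""
    # Assumption: a player's last season played is the largest Year recorded for them
    last_year = history.groupby("Player")["Year"].max()
    seasons_missed = current_year - table["Player"].map(last_year)
    adjusted = table.copy()
    adjusted["Predicted Share"] = adjusted["Predicted Share"] - per_year * seasons_missed
    return adjusted

# Toy stand-ins for the `mvp` history and an award table (values are made up)
history = pd.DataFrame({
    "Player": ["A", "A", "B", "C"],
    "Year": [2021, 2022, 2020, 2022],
})
table = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Predicted Share": [0.50, 0.40, 0.30],
})

adjusted = apply_inactivity_penalty(table, history, per_year=0.1)
```

Here player B last appeared in 2020, so two seasons at 0.1 each are subtracted, while A and C are unchanged. A player absent from `history` would end up with a missing share, which is easier to spot than an exception raised partway through an `apply`.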


**SMOY**

In [599]:
smoy_predictions = smoy_model.predict(smoy_candidates)
smoy_table = pd.DataFrame({"Player": smoy_players, "Predicted Share": smoy_predictions})
smoy_table.reset_index(drop = True, inplace = True)

#penalizes players who missed previous seasons or did not play enough games in them:
#0.05 is subtracted for each season between the player's last entry in the mvp dataframe and 2022
def adjust_share_smoy(player):
    last_year_played = mvp[mvp["Player"] == player]["Year"].values[-1] #mvp df has all players
    subtract = 0.05 * (2022 - last_year_played)
    smoy_table.loc[smoy_table["Player"] == player, "Predicted Share"] -= subtract

smoy_table["Player"].apply(adjust_share_smoy)
smoy_table.sort_values("Predicted Share", ascending = False).head(8)

Unnamed: 0,Player,Predicted Share
106,DeAndre Jordan,0.645965
234,Omer Yurtseven,0.638866
125,Kevin Love,0.621863
90,Dwight Howard,0.60244
82,Willy Hernangómez,0.601926
215,Juan Toscano-Anderson,0.598547
18,Nemanja Bjelica,0.591979
226,Hassan Whiteside,0.581052


**MIP**

In [657]:
mip_predictions = mip_model.predict(mip_candidates)
mip_table = pd.DataFrame({"Player": mip_players, "Predicted Share": mip_predictions})
mip_table.reset_index(drop = True, inplace = True)

#penalizes players who missed previous seasons or did not play enough games in them:
#0.05 is subtracted for each season between the player's last entry in the mvp dataframe and 2022
def adjust_share_mip(player):
    last_year_played = mvp[mvp["Player"] == player]["Year"].values[-1] #mvp df has all players
    subtract = 0.05 * (2022 - last_year_played)
    mip_table.loc[mip_table["Player"] == player, "Predicted Share"] -= subtract

mip_table["Player"].apply(adjust_share_mip)
mip_table.sort_values("Predicted Share", ascending = False).head(8)

Unnamed: 0,Player,Predicted Share
332,Elfrid Payton,0.180387
434,Robert Williams,0.163101
358,Mitchell Robinson,0.157416
210,Josh Jackson,0.155422
407,Jarred Vanderbilt,0.154772
53,Tony Bradley,0.153237
24,Desmond Bane,0.149686
368,Tomáš Satoranský,0.145549


## Final Results

In [672]:
cols = pd.MultiIndex.from_product([["MVP", "DPOY", "SMOY", "MIP"], 
                                   ["Player", "Share"]])

#uses output rather than mvp_table so the MVP column includes the boost applied in the MVP cell
mvps = output.sort_values("Predicted Share", ascending = False).head(8).reset_index(drop = True).copy()
dpoys = dpoy_table.sort_values("Predicted Share", ascending = False).head(8).reset_index(drop = True).copy()
smoys = smoy_table.sort_values("Predicted Share", ascending = False).head(8).reset_index(drop = True).copy()
mips = mip_table.sort_values("Predicted Share", ascending = False).head(8).reset_index(drop = True).copy()

results = pd.concat([mvps, dpoys, smoys, mips], axis = 1)
results.columns = cols
results.insert(0, "Rank", range(1, 9))
results.index = [''] * 8
results

Unnamed: 0_level_0,Rank,MVP,MVP,DPOY,DPOY,SMOY,SMOY,MIP,MIP
Unnamed: 0_level_1,Unnamed: 1_level_1,Player,Share,Player,Share,Player,Share,Player,Share
,1,Joel Embiid,0.861043,Bam Adebayo,0.442633,DeAndre Jordan,0.645965,Elfrid Payton,0.180387
,2,Nikola Jokić,0.771201,Mikal Bridges,0.390356,Omer Yurtseven,0.638866,Robert Williams,0.163101
,3,Giannis Antetokounmpo,0.619464,Rudy Gobert,0.364425,Kevin Love,0.621863,Mitchell Robinson,0.157416
,4,Trae Young,0.191336,Matisse Thybulle,0.196575,Dwight Howard,0.60244,Josh Jackson,0.155422
,5,Stephen Curry,0.150992,Chuma Okeke,0.162126,Willy Hernangómez,0.601926,Jarred Vanderbilt,0.154772
,6,Luka Dončić,0.1452,Andre Drummond,0.144173,Juan Toscano-Anderson,0.598547,Tony Bradley,0.153237
,7,Robert Williams,0.127036,Mike Muscala,0.122435,Nemanja Bjelica,0.591979,Desmond Bane,0.149686
,8,LeBron James,0.114912,Hassan Whiteside,0.117096,Hassan Whiteside,0.581052,Tomáš Satoranský,0.145549


In [None]:
results.style