## NBA Team Optimization

This notebook is a personal project I made to explore linear optimization in Python, combining my interest in basketball with my knowledge of optimization. It works through a hypothetical scenario in which I try to build an NBA championship team at the lowest possible total salary, using the statistics of previous championship teams as the benchmark. The idea is largely inspired by the Moneyball approach to building sports teams. The optimization problem uses the 75th percentile of a set of team statistics as its constraints. As my knowledge of advanced basketball statistics is limited, I have used simple statistical measures to find the optimal combination of players.
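
For reference, the model built in the Optimization section can be written as a binary integer program. The symbols are notation introduced here for illustration, not taken from the code: $x_i$ is 1 if player $i$ is selected, $s_i$ is the player's salary, $a_{ij}$ a per-game counting statistic, $p_{ik}$ a shooting percentage, $g_{im}$ a 0/1 group indicator, $T$ the championship-team thresholds, and $c = (5, 5, 3, 2)$ the group minimums.

$$
\begin{aligned}
\min_{x}\quad & \sum_{i=1}^{n} s_i x_i \\
\text{s.t.}\quad & \sum_{i=1}^{n} a_{ij} x_i \ge T_j, && j \in \{\text{3PA, 2PA, REB, AST, STL, BLK, PTS}\} \\
& \frac{1}{15}\sum_{i=1}^{n} p_{ik} x_i \ge T_k, && k \in \{\text{3P\%, 2P\%, FT\%}\} \\
& \sum_{i=1}^{n} x_i = 15 \\
& \sum_{i=1}^{n} g_{im} x_i \ge c_m, && m \in \{\text{Guard, Forward, Center, All Star}\} \\
& x_i \in \{0, 1\}, && i = 1, \dots, n
\end{aligned}
$$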

### Data Management

In [1]:
# Importing Files and packages

import pandas as pd
import numpy as np

Player_stats = pd.read_csv("NBA Player Stats 2023 RS.csv")
Player_salaries = pd.read_excel("NBA Players Salary 2023.xlsx")
Team_stats = pd.read_excel("NBA Team Stats.xlsx")

In [2]:
# Merging datasets of salary and key statistics
Players = Player_stats.merge(Player_salaries, how="inner", on=["NAME", "TEAM"])

# Make 2point and 3point attempts a per game statistic
Players["2PApg"] = Players["2PA"]/Players["GP"]
Players["3PApg"] = Players["3PA"]/Players["GP"]

# Removing columns that won't be used for the optimization
Players = Players.drop(["RANK", "AGE", "GP", "USG%", "TO%", "FTA", "2PA", "3PA", "eFG%", "TS%", 
                        "TPG", "P+R", "P+A", "P+R+A", "VI", "ORtg", "DRtg"], axis=1)

# Changing Categorical Variable names
Players["POS"] = Players["POS"].replace({"C-F":"CF", "F-G":"FG", "G-F":"FG", "F-C":"CF"})
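
An inner merge on `NAME` and `TEAM` silently drops any player whose name or team is spelled differently in the two files. The sketch below shows one way to check for such rows with the `indicator` option of `merge`. The two "Sample Player" rows and their numbers are hypothetical and exist only to produce a mismatch; the other values come from the table in this notebook.

```python
import pandas as pd

# Toy versions of the two tables; the "Sample Player" rows are hypothetical.
stats = pd.DataFrame({
    "NAME": ["Joel Embiid", "Luka Doncic", "Sample Player"],
    "TEAM": ["Phi", "Dal", "Bkn"],
    "PPG": [33.1, 32.4, 10.0],
})
salaries = pd.DataFrame({
    "NAME": ["Joel Embiid", "Luka Doncic", "Sample Player Jr."],
    "TEAM": ["Phi", "Dal", "Bkn"],
    "Salary": [33616770, 37096500, 1000000],
})

# An outer merge with indicator=True adds a "_merge" column that records
# whether each row was found in the left table, the right table, or both.
check = stats.merge(salaries, how="outer", on=["NAME", "TEAM"], indicator=True)

# Rows that an inner merge would have dropped
unmatched = check[check["_merge"] != "both"]
matched_count = int((check["_merge"] == "both").sum())

print(unmatched[["NAME", "TEAM", "_merge"]])
print("Rows kept by an inner merge:", matched_count)
```

Running the same check on the real files before the inner merge would show how many players are lost to naming differences.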

In [3]:
Players

Unnamed: 0,NAME,TEAM,POS,MPG,FT%,2P%,3P%,PPG,RPG,APG,SPG,BPG,Salary,All Star,2PApg,3PApg
0,Joel Embiid,Phi,CF,34.6,0.857,0.587,0.330,33.1,10.2,4.2,1.0,1.7,33616770,1,17.090909,3.030303
1,Luka Doncic,Dal,FG,36.2,0.742,0.588,0.342,32.4,8.6,8.0,1.4,0.5,37096500,1,13.757576,8.196970
2,Damian Lillard,Por,G,36.3,0.914,0.574,0.371,32.2,4.8,7.3,0.9,0.3,42492492,1,9.379310,11.344828
3,Shai Gilgeous-Alexander,Okc,FG,35.5,0.905,0.533,0.345,31.4,4.8,5.5,1.6,1.0,30913750,1,17.838235,2.470588
4,Giannis Antetokounmpo,Mil,F,32.1,0.645,0.596,0.275,31.1,11.8,5.7,0.8,0.8,42492492,1,17.587302,2.714286
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
352,Wendell Moore Jr.,Min,G,5.3,0.800,0.615,0.118,1.4,0.6,0.6,0.3,0.2,2306520,0,0.896552,0.586207
353,Tyrese Martin,Atl,G,4.1,1.000,0.500,0.143,1.3,0.8,0.1,0.1,0.0,1017781,0,1.000000,0.437500
354,Vit Krejci,Atl,G,5.7,0.500,0.625,0.238,1.2,0.9,0.6,0.2,0.0,1563518,0,0.551724,0.724138
355,Joe Wieskamp,Tor,FG,5.5,0.000,0.000,0.250,1.0,0.4,0.3,0.0,0.0,2720989,0,0.222222,1.333333


In [4]:
# Encoding positions as binary indicator columns for the optimization.
# Hybrid positions count toward both groups: "FG" is a guard and a forward,
# "CF" is a center and a forward.
Players["Guard"] = Players["POS"].isin(["G", "FG"]).astype(int)
Players["Center"] = Players["POS"].isin(["CF", "C"]).astype(int)
Players["Forward"] = Players["POS"].isin(["F", "FG", "CF"]).astype(int)

Players = Players.drop(["POS"], axis=1)

Players["Player"] = 1

# IDs follow the same "x_<row number>" pattern as the decision variable names
Players["Player_ID"] = ["x_" + str(i + 1) for i in range(len(Players))]

In [5]:
# Renaming columns and removing columns in dataset for the championship teams
Players = Players.rename(columns = {"PPG":"PTS", "RPG":"REB", "APG":"AST", "SPG":"STL","BPG":"BLK","TEAM_y":"Team","2PApg":"2PA", "3PApg":"3PA"})
Team_stats = Team_stats.rename(columns = {"TRB":"REB"})
Teams = Team_stats.drop(["FGA", "FG%", "ORB", "DRB"], axis=1)
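
The constraints later read their thresholds from the last row of `Teams` with `.values[-1]`. How that row was produced is not shown in this notebook. As a minimal sketch, assuming the row holds the 75th percentile of each column, it could be computed and appended as follows. The numbers are hypothetical and only illustrate the mechanics.

```python
import pandas as pd

# Hypothetical per-game totals for five championship teams
champions = pd.DataFrame({
    "PTS": [100.0, 110.0, 120.0, 130.0, 140.0],
    "REB": [40.0, 42.0, 44.0, 46.0, 48.0],
})

# 75th percentile of every column (pandas interpolates linearly by default)
thresholds = champions.quantile(0.75)

# Append the thresholds as a final row so that .values[-1] picks them up
with_threshold = pd.concat(
    [champions, thresholds.to_frame().T], ignore_index=True
)

print(thresholds)
print(with_threshold["PTS"].values[-1])
```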

## Optimization

In [6]:
# Importing only the PuLP names used below
from pulp import LpMinimize, LpProblem, LpVariable

In [7]:
# Function for multiplying column statistic with decision variable
import operator
def dotproduct(vec1, vec2):
    return sum(map(operator.mul, vec1, vec2))
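
To see what `dotproduct` does, it can be tried on plain numbers first. In the sketch below the selection vector is a hypothetical 0/1 choice standing in for the decision variables, and the salaries are the first three from the player table above.

```python
import operator

def dotproduct(vec1, vec2):
    # Multiply the vectors element by element and add up the products
    return sum(map(operator.mul, vec1, vec2))

salaries = [33616770, 37096500, 42492492]  # first three salaries in the table
selection = [1, 0, 1]                      # hypothetical choice: players 1 and 3

total = dotproduct(salaries, selection)
print(total)  # salary of the selected players only
```

With PuLP variables in place of the 0/1 numbers, the same call builds a linear expression instead of a number. PuLP also provides `lpSum` for this purpose, which is an alternative to the helper used here.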

In [8]:
# Creating decision variables: one binary variable per player (1 = selected)
dec_var = []
for i in range(len(Players)):
    dec_var.append(LpVariable("x_" + str(i + 1), cat="Binary"))

# Creating the optimization problem
min_salary = LpProblem("min_salary", LpMinimize) 

# Defining Objective Function
min_salary += dotproduct(Players["Salary"], dec_var)

In [9]:
# Adding sum statistical constraints
constraint_var = ["3PA", "2PA", "REB", "AST", "STL", "BLK", "PTS"]
for col in constraint_var:
    min_salary += dotproduct(Players[col], dec_var) >= Teams[col].values[-1]

# Adding average statistical constraints
constraint_var = ["3P%", "2P%", "FT%"]
for col in constraint_var:
    min_salary += dotproduct(Players[col], dec_var)/15 >= Teams[col].values[-1]

# Adding total roster constraints
min_salary += dotproduct(Players["Player"], dec_var) == 15

# Adding positional and All-Star constraints (minimum number of players per group)
positions = ["Guard", "Forward", "Center", "All Star"]
pos_constraints = [5, 5, 3, 2]

for i in range(0,len(positions)):
    min_salary += dotproduct(Players[positions[i]], dec_var) >= pos_constraints[i]
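
One property of the average constraints is worth illustrating: dividing the sum of player percentages by 15 gives every player equal weight, regardless of how many shots he takes. The sketch below compares that simple mean with an attempts-weighted percentage for two players from the first results table in this notebook (attempts rounded to two decimals).

```python
# (2P%, 2-point attempts per game), taken from the first results table
players = {
    "Anthony Edwards": (0.513, 12.19),
    "Sam Merrill": (1.000, 0.80),
}

pcts = [pct for pct, _ in players.values()]
attempts = [att for _, att in players.values()]

# Simple mean: what the notebook's average constraint measures
simple_mean = sum(pcts) / len(pcts)

# Attempts-weighted percentage: closer to what the two players would shoot together
weighted = sum(p * a for p, a in zip(pcts, attempts)) / sum(attempts)

print("Simple mean of 2P%:  ", simple_mean)
print("Attempts-weighted 2P%:", weighted)
```

A low-volume shooter with a perfect percentage lifts the simple mean far more than the weighted one. If a weighted version were wanted, one possible formulation that stays linear is to require the sum of `attempts * (pct - threshold) * x` over all players to be at least 0.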

In [10]:
# Optimizing solution
res = min_salary.solve()
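
`solve()` returns a status code, and the variable values are only meaningful if the solver reports an optimal solution. The sketch below shows the check on a small stand-in problem with hypothetical numbers, assuming the CBC solver bundled with PuLP is available. `LpStatus`, `value`, `lpSum` and `PULP_CBC_CMD` are PuLP names; everything else is illustrative.

```python
import pulp

# Hypothetical toy data: four players, pick exactly two
salaries = [5, 3, 4, 2]
points = [20, 10, 15, 5]

x = [pulp.LpVariable("x_" + str(i + 1), cat="Binary") for i in range(4)]

prob = pulp.LpProblem("toy_min_salary", pulp.LpMinimize)
prob += pulp.lpSum(s * v for s, v in zip(salaries, x))        # objective
prob += pulp.lpSum(p * v for p, v in zip(points, x)) >= 30    # scoring threshold
prob += pulp.lpSum(x) == 2                                    # roster size

prob.solve(pulp.PULP_CBC_CMD(msg=False))

# Translate the numeric status into text such as "Optimal" or "Infeasible"
status = pulp.LpStatus[prob.status]

# Use a 0.5 cut-off instead of == 1 to be robust to values like 0.9999999
chosen = [v.name for v in x if v.varValue is not None and v.varValue > 0.5]
total = pulp.value(prob.objective)

print(status, chosen, total)
```

In this notebook the same check would be `LpStatus[min_salary.status]`, which requires `LpStatus` to be imported from `pulp`.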


In [11]:
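# Gathering row indexes for the selected players
# (compare against 0.5 rather than == 1, since a solver can return values such as 0.9999999)
indexes = []
for element in dec_var:
    if element.varValue is not None and element.varValue > 0.5:
        player_index = Players.index[Players["Player_ID"] == str(element)].tolist()
        indexes.append(player_index)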

In [12]:
# Creating dataframe from the rows of all the optimal players
optimal_team_rows = []
for x in indexes:
    for y in x:
        row = Players.iloc[y]
        optimal_team_rows.append(row) 

optimal_team = pd.DataFrame(optimal_team_rows, columns=Players.columns)
optimal_team

Unnamed: 0,NAME,TEAM,MPG,FT%,2P%,3P%,PTS,REB,AST,STL,BLK,Salary,All Star,2PA,3PA,Guard,Center,Forward,Player,Player_ID
19,Anthony Edwards,Min,36.0,0.756,0.513,0.368,24.6,5.8,4.4,1.6,0.7,10733400,1,12.189873,7.329114,1,0,0,1,x_20
36,Tyrese Haliburton,Ind,33.6,0.867,0.572,0.4,20.7,3.7,10.4,1.6,0.4,4215120,1,7.839286,7.178571,1,0,0,1,x_37
92,Kris Dunn,Uta,25.8,0.774,0.55,0.472,13.2,4.5,5.6,1.1,0.5,735819,0,8.181818,1.636364,1,0,0,1,x_93
145,Eugene Omoruyi,Det,21.9,0.723,0.526,0.293,9.7,3.5,1.0,0.8,0.2,555402,0,4.470588,3.411765,0,0,1,1,x_146
160,Bol Bol,Orl,21.5,0.759,0.633,0.265,9.1,5.8,1.0,0.4,1.2,2200000,0,5.214286,1.614286,0,1,1,1,x_161
169,Jaden Hardy,Dal,14.8,0.823,0.469,0.404,8.8,1.9,1.4,0.4,0.1,1017781,0,3.6875,3.25,1,0,0,1,x_170
219,Drew Eubanks,Por,20.3,0.664,0.655,0.389,6.6,5.4,1.3,0.5,1.3,1836090,0,4.051282,0.230769,0,1,1,1,x_220
220,Jordan Goodwin,Was,17.8,0.768,0.511,0.322,6.6,3.3,2.7,0.9,0.4,900000,0,3.790323,1.903226,1,0,0,1,x_221
271,Sam Merrill,Cle,11.7,1.0,1.0,0.278,5.0,1.8,1.0,0.8,0.0,1000000,0,0.8,3.6,1,0,0,1,x_272
276,Meyers Leonard,Mil,12.6,0.889,0.636,0.389,4.8,3.8,0.1,0.2,0.0,105522,0,1.222222,2.0,0,1,1,1,x_277


In [13]:
optimal_team_sum = optimal_team[["PTS", "REB", "AST", "STL", "BLK", "2PA", "3PA"]].sum()
print(optimal_team_sum)
print("Team Salary: ", optimal_team["Salary"].sum())

PTS    127.700000
REB     48.600000
AST     31.400000
STL      9.600000
BLK      6.100000
2PA     58.517646
3PA     39.149623
dtype: float64
Team Salary:  27879573


### Per-game variables to per-minute
Here the per-game variables are changed to per-minute values to see whether this affects the results. My belief is that this could produce a better combination of players.

In [14]:
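per_game_var = ["PTS", "REB", "AST", "STL", "BLK", "2PA", "3PA"]

# Note: plain assignment creates an alias, not a copy, so the divisions below
# also rescale Players and Teams in place. The later cells rely on this when they
# read from Players and Teams, and re-running this cell would divide a second time.
Players2 = Players

# Player statistics: per game -> per minute played
for col in Players2:
    if col in per_game_var:
        Players2[col] = Players2[col]/Players2["MPG"]

Teams2 = Teams

# Team statistics: per game -> per minute of a 48-minute regulation game
for col in Teams2:
    if col in per_game_var:
        Teams2[col] = Teams2[col]/48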
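
In the cell above, `Players2 = Players` binds a second name to the same DataFrame, so changing one changes the other. The sketch below demonstrates the difference between an alias and `.copy()` on a small hypothetical table. If the per-game values had to be preserved, a copy would be needed, and the later cells would then have to read from the copy.

```python
import pandas as pd

# Hypothetical two-player table
original = pd.DataFrame({"PTS": [24.0, 12.0], "MPG": [36.0, 24.0]})

# Alias: a second name for the same object
alias = original
alias["PTS"] = alias["PTS"] / alias["MPG"]
print(alias is original)            # the two names refer to one DataFrame
print(original["PTS"].tolist())     # original now holds per-minute values

# Copy: an independent DataFrame
fresh = pd.DataFrame({"PTS": [24.0, 12.0], "MPG": [36.0, 24.0]})
independent = fresh.copy()
independent["PTS"] = independent["PTS"] / independent["MPG"]
print(fresh["PTS"].tolist())        # per-game values are untouched
```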

In [15]:
# Creating decision variables: one binary variable per player (1 = selected)
dec_var = []
for i in range(len(Players)):
    dec_var.append(LpVariable("x_" + str(i + 1), cat="Binary"))

# Creating the optimization problem
min_salary = LpProblem("min_salary", LpMinimize)

# Defining Objective Function
min_salary += dotproduct(Players["Salary"], dec_var)

In [16]:
# Adding sum statistical constraints
constraint_var = ["3PA", "2PA", "REB", "AST", "STL", "BLK", "PTS"]
for col in constraint_var:
    min_salary += dotproduct(Players[col], dec_var) >= Teams[col].values[-1]

# Adding average statistical constraints
constraint_var = ["3P%", "2P%", "FT%"]
for col in constraint_var:
    min_salary += dotproduct(Players[col], dec_var)/15 >= Teams[col].values[-1]

# Adding total roster constraints
min_salary += dotproduct(Players["Player"], dec_var) == 15

# Adding positional and All-Star constraints (minimum number of players per group)
positions = ["Guard", "Forward", "Center", "All Star"]
pos_constraints = [5, 5, 3, 2]

for i in range(0,len(positions)):
    min_salary += dotproduct(Players[positions[i]], dec_var) >= pos_constraints[i]

In [17]:
# Optimizing solution
res = min_salary.solve()

In [18]:
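# Gathering row indexes for the selected players
# (compare against 0.5 rather than == 1, since a solver can return values such as 0.9999999)
indexes = []
for element in dec_var:
    if element.varValue is not None and element.varValue > 0.5:
        player_index = Players.index[Players["Player_ID"] == str(element)].tolist()
        indexes.append(player_index)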


In [19]:
# Creating dataframe from the rows of all the optimal players
optimal_team_rows = []
for x in indexes:
    for y in x:
        row = Players.iloc[y]
        optimal_team_rows.append(row) 

optimal_team = pd.DataFrame(optimal_team_rows, columns=Players.columns)
optimal_team

Unnamed: 0,NAME,TEAM,MPG,FT%,2P%,3P%,PTS,REB,AST,STL,BLK,Salary,All Star,2PA,3PA,Guard,Center,Forward,Player,Player_ID
19,Anthony Edwards,Min,36.0,0.756,0.513,0.368,0.683333,0.161111,0.122222,0.044444,0.019444,10733400,1,0.338608,0.203586,1,0,0,1,x_20
36,Tyrese Haliburton,Ind,33.6,0.867,0.572,0.4,0.616071,0.110119,0.309524,0.047619,0.011905,4215120,1,0.233312,0.213648,1,0,0,1,x_37
92,Kris Dunn,Uta,25.8,0.774,0.55,0.472,0.511628,0.174419,0.217054,0.042636,0.01938,735819,0,0.317125,0.063425,1,0,0,1,x_93
169,Jaden Hardy,Dal,14.8,0.823,0.469,0.404,0.594595,0.128378,0.094595,0.027027,0.006757,1017781,0,0.249155,0.219595,1,0,0,1,x_170
220,Jordan Goodwin,Was,17.8,0.768,0.511,0.322,0.370787,0.185393,0.151685,0.050562,0.022472,900000,0,0.212939,0.106923,1,0,0,1,x_221
225,Cody Zeller,Mia,14.4,0.686,0.649,0.0,0.451389,0.298611,0.048611,0.013889,0.020833,517060,0,0.263889,0.009259,0,1,1,1,x_226
271,Sam Merrill,Cle,11.7,1.0,1.0,0.278,0.42735,0.153846,0.08547,0.068376,0.0,1000000,0,0.068376,0.307692,1,0,0,1,x_272
276,Meyers Leonard,Mil,12.6,0.889,0.636,0.389,0.380952,0.301587,0.007937,0.015873,0.0,105522,0,0.097002,0.15873,0,1,1,1,x_277
294,Omer Yurtseven,Mia,9.2,0.833,0.65,0.429,0.478261,0.282609,0.021739,0.021739,0.021739,1752638,0,0.241546,0.084541,0,1,0,1,x_295
298,Admiral Schofield,Orl,12.2,0.913,0.646,0.324,0.344262,0.139344,0.065574,0.016393,0.008197,506508,0,0.106336,0.163934,0,0,1,1,x_299


In [22]:
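# Converting the per-minute statistics back to per-game values for the selected players
# (the assignment aligns on the row index, so only the selected rows are used)
for col in optimal_team:
    if col in per_game_var:
        optimal_team[col] = Players2[col]*Players2["MPG"]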

optimal_team_sum = optimal_team[["PTS", "REB", "AST", "STL", "BLK", "2PA", "3PA"]].sum()
print(optimal_team_sum)
print("Team Salary: ", optimal_team["Salary"].sum())

PTS    116.100000
REB     41.500000
AST     29.900000
STL      8.600000
BLK      3.900000
2PA     52.089427
3PA     35.661057
dtype: float64
Team Salary:  25915496


Ideas for improving the model:
- Improve the All-Star metric. It is rather superficial and only serves as a stand-in for a "proven" player or impactful star.
- Add some type of chemistry constraint to account for language barriers or cultural backgrounds, which should have an impact on team performance.
- Add playoff experience or league experience as a metric to better account for the difference between a playoff series and regular-season games. In combination, this could also improve the All-Star metric.
- Make salaries more realistic by accounting for overperforming newcomers and for newly traded players who have not yet fit in. This would be difficult and would change the premise somewhat, as we would be looking at "signing" the player at a wage based on their future wages.
- Use more advanced statistics, as they address many of the problems with pure point, assist, and rebound metrics. Also include shots taken and efficiency, since a team can only have a limited number of possessions in a game.