<h1>Fantasy Basketball Part Two </h1>

<h2> Introduction </h2>

If you're familiar with fantasy basketball and worked through our **first example**, you may have noticed that the lineup we solved for doesn't actually fit the criteria for many popular contests. That example was meant as a ground-floor introduction to mathematical optimization, so make sure to work through it first for background on the problem. In it, we also developed a predictive model to forecast each player's fantasy points; we'll reuse that forecast here without repeating the modeling process.

In this follow-up example, we will extend the optimization model to fulfill the requirements of a typical [DraftKings](https://www.draftkings.com/help/rules/nba) lineup. Specifically, we'll need to change the model to:
- Allow some players to be selected for up to two positions (e.g. LeBron James can fill the PF or SF position)
- Increase the roster from five to eight players
- Ensure the three new roster slots consist of one guard (SG or PG), one forward (SF or PF), and one utility player (any position)

A quick recap of what we saw in the prior example: When selecting a 5-player lineup, there were *a lot* of possibilities; staying under a salary cap wasn't all that easy; and intuitive decision rules could lead to suboptimal lineups. Together, these points give a glimpse of why mathematical optimization should be part of everyone's analytics toolkit.

**The repository for this project can be accessed by following this link:**


<h2> Objective and Prerequisites </h2>

As in the first example, the goal is to select the lineup of National Basketball Association (NBA) players that produces the highest total of fantasy points, which are calculated from each player's in-game stats. This time, we build upon the findings of the previous example.

As mentioned above, we will make a lineup that meets DraftKings contest rules. There were over 2 million possible lineups when selecting a 5-player team that contained one of each position. Can you figure out the number of possible lineups for this example? Here's a hint: It's a lot. Remember that there are 25 PGs, 23 SFs, 22 SGs, 19 PFs, and 9 Cs. The answer (approximately) will be given a bit later. 
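
For reference, the 5-player figure from the first example is just the product of the position pool sizes listed above. A quick sketch of that arithmetic (the dictionary name is our own, and each player is counted once, at their main position):

```python
import math

# Number of eligible players at each main position (from the text above)
pool_sizes = {"PG": 25, "SG": 22, "SF": 23, "PF": 19, "C": 9}

total_players = sum(pool_sizes.values())

# One player per position slot: multiply the pool sizes together
five_player_lineups = math.prod(pool_sizes.values())

print(total_players)        # 98
print(five_player_lineups)  # 2163150
```

The 8-player count is much harder to write down by hand, because the extra guard, forward, and utility slots overlap with the five position slots and some players are eligible at two positions.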

Also as mentioned, this example will use the same forecasting results as our fantasy basketball beginner's example, which used historical data from the 2016-2017 and 2017-2018 seasons to predict each player's fantasy points for games on 12/25/2017.

This example assumes you have experience using Python for data manipulation and requires the installation of the following packages:

- **pandas**: for data analysis and manipulation 
- **math**: part of the Python standard library (no installation needed), used here to detect missing values
- **gurobipy**: for utilizing Gurobi to formulate and solve the optimization problem

We'll also explore a few different ways to write summations and constraints in gurobipy, so you can find what works best for you. A quick note: output similar to <gurobi.Constr \*Awaiting Model Update\*> can be ignored; semicolons have been added at the end of some cells to suppress it.

<h2> Problem Statement and Solution Approach</h2>

By formulating a model that lets some players fill one of two positions, and by expanding the roster, we make the model considerably more complex.

Our final lineup needs to fill each of the following eight slots with a different player:
- Point Guard (PG)
- Shooting Guard (SG)
- Small Forward (SF)
- Power Forward (PF)
- Center (C)
- Guard (PG or SG)
- Forward (SF or PF)
- Utility (PG, SG, SF, PF, or C)

Our solution consists of two components: 1) a **fantasy points forecast** and 2) **lineup optimization**.

We'll start by loading a dataset of the eligible players that shows their eligible positions, salaries, and forecasted fantasy points. This information serves as the input to our optimization model, which guarantees that we satisfy the rules of the DraftKings contest while maximizing the total fantasy points of our team.

<h3> Fantasy Points Forecast </h3>

This section is short because we are reusing the output of the predictive model from the first part of this example. In this version, some players have both a *Main Position* and an *Alternative Position*, which lets them fill either position slot. For example, James Harden can fill the PG or SG position, while Russell Westbrook is only eligible as a PG.

We begin by loading the necessary libraries to solve our problem and importing the data from the predictive model in part one of the problem. 

In [2]:
import pandas as pd                                       #importing pandas
import math                                               #importing math
from gurobipy import Model, GRB, quicksum                 #importing Gurobi

In [3]:
player_predictions = pd.read_csv('results_target_advanced.csv')     #load processed dataset
player_predictions.sort_values(by='PredictedFantasyPoints',ascending=False).head(20)

|    | Player | MainPosition | AlternativePosition | Team | Opp | Salary | PredictedFantasyPoints | Points/Salary Ratio |
|---:|---|---|---|---|---|---:|---:|---:|
| 4 | Joel Embiid | C |  | PHI | NYK | 9500 | 51.313689 | 5.401441 |
| 0 | James Harden | PG | SG | HOU | OKC | 11100 | 48.809577 | 4.397259 |
| 1 | LeBron James | SF | PF | CLE | GSW | 11000 | 48.149718 | 4.377247 |
| 2 | Russell Westbrook | PG |  | OKC | HOU | 10900 | 44.007224 | 4.03736 |
| 3 | Kevin Durant | SF | PF | GSW | CLE | 10500 | 43.438575 | 4.137007 |
| 19 | Dario Saric | PF | C | PHI | NYK | 6200 | 40.505486 | 6.533143 |
| 5 | Ben Simmons | PG | SF | PHI | NYK | 9300 | 38.692817 | 4.160518 |
| 12 | Kyle Kuzma | SF | PF | LAL | MIN | 7300 | 38.201774 | 5.23312 |
| 8 | Jimmy Butler | SG |  | MIN | LAL | 8400 | 37.873164 | 4.50871 |
| 13 | Draymond Green | PF | C | GSW | CLE | 7200 | 37.018949 | 5.141521 |


Have you figured out the total number of possible lineups yet? It's around $3.6 \times 10^{11}$, which is a lot. 

This is where mathematical optimization and Gurobi are most useful: they can efficiently explore a huge decision space without enumerating every lineup, which makes them a much-needed tool for optimal decision-making.

<h3> Optimal DraftKings Lineup Selection </h3>

As we set up our optimization model, we first need to make some definitions. Some of them are the same as before, but because this example is more complicated, a few need more careful treatment.

**Sets and Indices**

$i$ is the index for the set of all players 

$j$ is the index for the set of basketball positions (PG,SG,SF,PF,C)

**Input Parameters**

$p_{i}$: the predicted fantasy points of player $i$ 

$s_{i}$: the salary of player $i$ 

$S$: our total available salary

In [4]:
players = player_predictions["Player"].tolist()                        # index i
positions = player_predictions["MainPosition"].unique().tolist()       # index j: the five positions
salaries = player_predictions["Salary"].tolist()                       # s_i
fantasypoints = player_predictions["PredictedFantasyPoints"].tolist()  # p_i
S = 50000                                                              # salary cap

# look up each player's salary and predicted points by name
salary_dict = {players[i]: salaries[i] for i in range(len(players))}
points_dict = {players[i]: fantasypoints[i] for i in range(len(players))}

m = Model()

Restricted license - for non-production use only - expires 2023-10-25


**Decision Variables**

Since some players can fill one of two positions, we need to map each player to their eligible positions and add a position index to our decision variable. Instead of the binary variable $y_i$ from the first example (a binary variable takes only the values 0 or 1), we now have $y_{i,j}$.

$y_{i,j}$: This variable is equal to 1 if player $i$ is selected at position $j$, and 0 otherwise. It is only defined for pairs $(i,j)$ where $j$ is one of player $i$'s eligible positions.

In [5]:
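# (player, position) pairs for each player's main and alternative positions
mainposition = list(zip(player_predictions.Player, player_predictions.MainPosition))
alternativeposition = list(zip(player_predictions.Player, player_predictions.AlternativePosition))
indices = mainposition + alternativeposition
# players without an alternative position have NaN there; drop those pairs
player_pos_map = [t for t in indices if not any(isinstance(n, float) and math.isnan(n) for n in t)]

# one binary variable y[i, j] per eligible (player, position) pair
y = m.addVars(player_pos_map, vtype=GRB.BINARY, name="y")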
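
To see what the filtering step produces, here is a minimal pure-Python sketch of the same idea on four players copied from the table above, with no pandas or Gurobi required. We assume, as pandas does when reading the CSV, that a missing alternative position arrives as a float NaN; the helper *is_missing* is hypothetical:

```python
import math

# (Player, MainPosition, AlternativePosition) rows copied from the table above;
# a missing alternative position is represented by NaN, as pandas does
rows = [
    ("Joel Embiid", "C", math.nan),
    ("James Harden", "PG", "SG"),
    ("LeBron James", "SF", "PF"),
    ("Russell Westbrook", "PG", math.nan),
]

main_pairs = [(player, main) for player, main, _ in rows]
alt_pairs = [(player, alt) for player, _, alt in rows]

def is_missing(value):
    """True for the float NaN that marks a missing alternative position."""
    return isinstance(value, float) and math.isnan(value)

# Keep only (player, position) pairs whose position is not missing
player_pos_map = [pair for pair in main_pairs + alt_pairs if not is_missing(pair[1])]
print(player_pos_map)
```

Four players yield six eligible pairs, so six variables instead of the $4 \cdot 5 = 20$ a full player-by-position grid would need.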

**Objective Function**

The objective is to maximize the total fantasy points of our lineup, as before, but now expressed with the new, position-indexed decision variable.

\begin{align}
 Max \hspace{0.2cm} Z = \sum_{i,j} p_{i} \cdot y_{i,j}
\end{align}

In [6]:
m.setObjective(quicksum([points_dict[i]*y[i,j] for i,j in player_pos_map]), GRB.MAXIMIZE)
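
Because each $y_{i,j}$ is binary, $Z$ is just the sum of $p_i$ over the selected (player, position) pairs. The sketch below evaluates the objective for a hypothetical partial assignment in plain Python; the helper *objective_value* and the chosen players are illustrative only, and the points are copied from the table above:

```python
# Predicted fantasy points copied from the table above
points = {
    "Joel Embiid": 51.313689,
    "James Harden": 48.809577,
    "LeBron James": 48.149718,
}

def objective_value(selected_pairs, points):
    """Z = sum of p_i * y_ij, where y_ij = 1 exactly for the selected pairs."""
    return sum(points[player] for player, _ in selected_pairs)

# Hypothetical partial assignment: Harden at SG (his alternative position)
candidate = [("Joel Embiid", "C"), ("James Harden", "SG"), ("LeBron James", "SF")]
print(round(objective_value(candidate, points), 2))  # 148.27
```

Harden contributes the same points whether he is assigned to PG or SG, since $p_i$ depends only on the player; the position index matters only for the constraints.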

**Constraints**

Our model still requires each of the primary basketball positions (PG, SG, SF, PF, C) to be filled. Last time we required exactly one of each, so these constraints were equalities (which also implied our roster size of five players). In this version of our model, we need *at least one* of each position, since a lineup can contain, for example, two power forwards (one in the PF slot and one in the forward slot).

For each position $j$:
\begin{align}
\sum_{i} y_{i,j} \geq 1
\end{align}

Here we have a couple of options for adding these constraints. The first, and most intuitive, is to loop through the set of positions and call *addConstr* for each one. The second uses a slightly different function, *addConstrs*, which takes the loop directly as a generator expression in its argument. **Running both would add every position constraint to the model twice, which isn't good practice, so take a look at each approach and make sure only one of them is active when you run the cell.**

In [7]:
# Option 1: loop over the positions and add one constraint per position
# (commented out so the constraints are not added twice; swap the comments to try it)
# for j in positions:
#     m.addConstr(quicksum([y[i,j] for i, pos in player_pos_map if pos==j])>=1, name = "pos" + j)

# Option 2: a more compact way of adding the same set of constraints with addConstrs
m.addConstrs((quicksum([y[i,j] for i, pos in player_pos_map if pos==j])>=1 for j in positions), name = "pos");
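
Outside of Gurobi, the same "at least one of each position" rule can be checked for any candidate assignment by counting positions. A plain-Python sketch (the helper *covers_every_position* and both lineups are hypothetical; eligibilities come from the table above):

```python
from collections import Counter

POSITIONS = ["PG", "SG", "SF", "PF", "C"]

def covers_every_position(selected_pairs):
    """Check sum_i y_ij >= 1 for every position j."""
    counts = Counter(position for _, position in selected_pairs)
    return all(counts[j] >= 1 for j in POSITIONS)

# Hypothetical lineups using eligibilities from the table above
five_slots = [("Russell Westbrook", "PG"), ("James Harden", "SG"),
              ("LeBron James", "SF"), ("Draymond Green", "PF"), ("Joel Embiid", "C")]
# Saric is eligible at C, but here he is assigned to PF, so no center is selected
no_center = five_slots[:4] + [("Dario Saric", "PF")]

print(covers_every_position(five_slots))  # True
print(covers_every_position(no_center))   # False
```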

Now let's work with the additional slots of our new roster: the guard, forward, and utility slots. It's worth mentioning that the same problem can often be formulated in different ways in mathematical optimization. Some formulations are better than others, and writing efficient models becomes an important skill as you tackle larger and more complex problems.

Let's first consider the additional guard slot, which can be filled by a PG or an SG. The position constraints above already guarantee at least one PG and at least one SG (so at least two guards). To get one additional guard, we add a constraint requiring the total number of guards to be *at least* three:

\begin{align}
\sum_{i} \sum_{j \in \{\text{PG}, \text{SG}\}} y_{i,j} \geq 3
\end{align}

The same needs to be done for forwards (SF or PF):
\begin{align}
\sum_{i} \sum_{j \in \{\text{SF}, \text{PF}\}} y_{i,j} \geq 3
\end{align}

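It is important that we use inequalities here because the utility slot, which we'll address in a moment, can also be filled by a guard or a forward, so a valid lineup may contain four guards or four forwards.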

In [8]:
m.addConstr(quicksum([y[i,j] for i, j in player_pos_map if (j=='PG' or j=='SG')])>=3)
m.addConstr(quicksum([y[i,j] for i, j in player_pos_map if (j=='SF' or j=='PF')])>=3);
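
To see why these counts use $\geq$, consider the position pattern of a hypothetical eight-player lineup whose utility slot goes to a third point guard. The sketch below is plain Python; the helper *count_in* and the placeholder player names are ours:

```python
GUARDS = {"PG", "SG"}
FORWARDS = {"SF", "PF"}

def count_in(selected_pairs, group):
    """Number of selected (player, position) pairs whose position is in group."""
    return sum(1 for _, position in selected_pairs if position in group)

# Hypothetical pattern: PG, SG, SF, PF, C slots, then guard (SG),
# forward (PF), and a utility pick who is another PG
positions_used = ["PG", "SG", "SF", "PF", "C", "SG", "PF", "PG"]
pairs = [(f"player{k}", pos) for k, pos in enumerate(positions_used)]

guards = count_in(pairs, GUARDS)
forwards = count_in(pairs, FORWARDS)
print(guards, forwards)                # 4 3
print(guards >= 3 and forwards >= 3)   # True
```

This lineup has four guards, so a constraint requiring exactly three guards would wrongly reject it.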

Now that our decision variable has a position index, we need to ensure each player is assigned to *at most* one position (either their main or alternative position, but not both). To do this, we sum over the positions $j$ for each player and limit that sum to one.

For each player $i$,
\begin{align}
\sum_{j} y_{i,j} \leq 1
\end{align}

Here we'll use another way to sum over one of the variable's indices: the *.sum* method of the *tupledict* returned by *addVars*, with "\*" in place of the index we want to sum over. This is useful because not every player is eligible for every position; writing y.sum(i, "\*") adds up only the $y_{i,j}$ that exist for player $i$, whichever positions those are.

In [9]:
m.addConstrs((y.sum(i, "*") <= 1 for i in players), name="max_one");
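
The idea behind the "\*" wildcard can be illustrated with an ordinary dictionary. The helper below is a simplified stand-in written for this example, not gurobipy's actual implementation: it sums the values whose keys match a pattern, where "\*" matches any entry. The 0/1 values are hypothetical:

```python
def wildcard_sum(values, pattern):
    """Sum values[key] over keys matching pattern, where "*" matches anything.

    A plain-dict illustration of the idea behind y.sum(i, "*");
    it is not gurobipy's implementation.
    """
    return sum(
        value
        for key, value in values.items()
        if all(p == "*" or p == k for p, k in zip(pattern, key))
    )

# Hypothetical 0/1 values for a few (player, position) keys
y_values = {
    ("James Harden", "PG"): 1,
    ("James Harden", "SG"): 0,
    ("LeBron James", "SF"): 0,
    ("LeBron James", "PF"): 1,
    ("Russell Westbrook", "PG"): 0,
}

print(wildcard_sum(y_values, ("James Harden", "*")))  # 1
print(wildcard_sum(y_values, ("*", "PG")))            # 1

# Every player satisfies the "at most one position" constraint
players = {player for player, _ in y_values}
print(all(wildcard_sum(y_values, (p, "*")) <= 1 for p in players))  # True
```

Westbrook has only a PG key, so his sum runs over a single entry; the wildcard never touches position combinations that were not created.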

A good exercise would be to rewrite other summations using this syntax and to try writing the above constraint using *quicksum*.

So far we have addressed seven of our eight lineup slots. Since the utility slot can be filled by any position, all we need is to require that exactly eight players be selected, by setting the sum over all players and positions to eight.

\begin{align}
\sum_{i,j} y_{i,j} = 8
\end{align}

In [10]:
m.addConstr(quicksum([y[i,j] for i,j in player_pos_map]) == 8, name="full_lineup");
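
One way to convince yourself that the utility slot needs no constraint of its own: the minimums above (three guards, three forwards, one center) account for seven players, so with exactly eight selected, one group gets one extra player. A short enumeration of the feasible (guards, forwards, centers) counts in plain Python:

```python
# Enumerate the (guards, forwards, centers) counts of an 8-player lineup that
# satisfy the constraints above: at least 3 guards, at least 3 forwards,
# at least 1 center, and exactly 8 players in total
feasible_counts = [
    (g, f, 8 - g - f)
    for g in range(3, 9)
    for f in range(3, 9)
    if 8 - g - f >= 1
]
print(feasible_counts)  # [(3, 3, 2), (3, 4, 1), (4, 3, 1)]
```

Each feasible pattern exceeds the 3/3/1 minimum by exactly one player, and that extra player is the utility pick: a center, a forward, or a guard.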

Finally, we need to stay within the salary cap $S$, which in a typical DraftKings contest is $\$50,000$:

\begin{align}
\sum_{i,j} s_{i} \cdot y_{i,j} \leq S
\end{align}

In [11]:
cap = m.addConstr(quicksum(salary_dict[i]*y[i,j] for i,j in player_pos_map) <= S, name="salary")

This time we stored the constraint in the variable *cap* so that we can easily query its attributes after the model is solved. Storing it doesn't change anything else; the constraint is added to the model just as before. Time to find the optimal lineup.

In [12]:
m.optimize()  # optimize our model

Gurobi Optimizer version 9.5.1 build v9.5.1rc2 (mac64[rosetta2])
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 112 rows, 167 columns and 975 nonzeros
Model fingerprint: 0x9c23fd93
Variable types: 0 continuous, 167 integer (167 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+04]
  Objective range  [7e+00, 5e+01]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 5e+04]
Found heuristic solution: objective 206.2597683
Presolve removed 34 rows and 0 columns
Presolve time: 0.00s
Presolved: 78 rows, 167 columns, 779 nonzeros
Variable types: 0 continuous, 167 integer (167 binary)
Found heuristic solution: objective 268.7822348

Root relaxation: objective 2.864073e+02, 71 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0  286.40727    0    2  268.78223  286.40727 

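Notice the output above says we have 167 binary variables, far fewer than the $98 \cdot 5 = 490$ variables we'd have if we mapped every player to every basketball position. That denser formulation would also need extra constraints to rule out ineligible player/position combinations. While this wouldn't make much difference in this small example, formulating efficient models is a valuable skill in mathematical optimization. Similarly, the 112 rows in the log are the 107 constraints of our formulation (5 position constraints, the guard and forward constraints, 98 at-most-one constraints, the lineup-size constraint, and the salary constraint) plus 5 duplicates, because this log was produced with both options of the position-constraint cell active.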
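
The variable counts quoted above are simple arithmetic. A sketch, where the 167 comes from the Gurobi log and the difference from 98 is our inference about how many players list an alternative position (each player contributes one main-position pair, plus one more pair if they have an alternative):

```python
n_players = 98       # 25 PG + 22 SG + 23 SF + 19 PF + 9 C
n_positions = 5      # PG, SG, SF, PF, C
n_model_vars = 167   # binary variables reported in the Gurobi log

# Dense formulation: one variable for every (player, position) pair
dense_vars = n_players * n_positions

# Sparse formulation: one main-position pair per player, plus one pair
# for each player with an alternative position
players_with_alternative = n_model_vars - n_players

print(dense_vars, players_with_alternative)  # 490 69
```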

Let's display our optimal lineup. 

In [13]:
# collect the (player, position) keys whose binary variable equals 1 in the optimal solution
player_selections = [key for key, var in y.items() if var.X > 0.5]

# attach each selected player's salary and predicted points
lineup = pd.DataFrame(player_selections, columns=['Player', 'Assigned Position'])
lineup['Salary'] = lineup['Player'].map(salary_dict)
lineup['Predicted Points'] = lineup['Player'].map(points_dict)

lineup.sort_values(by=['Assigned Position'])

|    | Player | Assigned Position | Salary | Predicted Points |
|---:|---|---|---:|---:|
| 0 | Joel Embiid | C | 9500 | 51.313689 |
| 6 | Jordan Bell | C | 4900 | 33.083296 |
| 2 | Draymond Green | PF | 7200 | 37.018949 |
| 3 | Dario Saric | PF | 6200 | 40.505486 |
| 4 | Jeff Teague | PG | 6000 | 31.460451 |
| 5 | Jarrett Jack | PG | 4600 | 27.780012 |
| 1 | Kyle Kuzma | SF | 7300 | 38.201774 |
| 7 | Josh Hart | SG | 3700 | 24.262564 |


In [14]:
print('Total fantasy score: ', round(m.objVal,2))
print('Remaining salary: ', cap.Slack)

Total fantasy score:  283.63
Remaining salary:  600.0


The last print statement uses the constraint we stored earlier. The *Slack* attribute of a constraint reports the gap between its right-hand side and its left-hand side at the current solution. Here, it's the difference between the salary cap and the total salary of the optimal lineup, which shows that $\$600$ of the cap is unused.
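
As a sanity check, both printed numbers can be recomputed by hand from the lineup table above. A plain-Python sketch, with the rows copied from that output:

```python
# Optimal lineup copied from the output table above: (player, salary, points)
lineup = [
    ("Joel Embiid", 9500, 51.313689),
    ("Jordan Bell", 4900, 33.083296),
    ("Draymond Green", 7200, 37.018949),
    ("Dario Saric", 6200, 40.505486),
    ("Jeff Teague", 6000, 31.460451),
    ("Jarrett Jack", 4600, 27.780012),
    ("Kyle Kuzma", 7300, 38.201774),
    ("Josh Hart", 3700, 24.262564),
]
S = 50000  # salary cap

salary_used = sum(salary for _, salary, _ in lineup)
total_points = sum(points for _, _, points in lineup)

# For a <= constraint, the slack is the right-hand side minus the left-hand side
slack = S - salary_used
print(salary_used, slack)      # 49400 600
print(round(total_points, 2))  # 283.63
```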

<h2> Conclusion </h2>

In this notebook, we finished a problem that started from a raw dataset of NBA player box score data and ended with an optimal fantasy basketball lineup, assuming our predictive model holds.

Specifically in this part, we:
- Expanded the initial model to reflect the true complexity of creating a DraftKings fantasy basketball lineup
- Explored two ways to add a set of constraints to a model (*addConstr* in a loop and *addConstrs*)
- Used two ways to write summations (*quicksum* and the *.sum* method with the "\*" wildcard)
- Used model and constraint attributes (*objVal* and *Slack*) to get more information about the optimal solution

Overall, this two-part example showed that, even with the best predictive model, making optimal decisions is still a complicated exercise. Along with machine learning techniques, mathematical optimization is an essential part of a well-developed analytics toolbox.