# Sportsnet Fantasy

This notebook differs a little from the previous one: we are no longer working in a draft scenario. Anyone can pick any player they want, but each player has a point-value cost associated with them. Formally, we have the problem

$$
\begin{aligned}
\text{maximize} & \;\; \mathbf{r}^T \mathbf{x} -\gamma \mathbf{x}^T \mathbf{Q} \mathbf{x} \\
\text{subject to} & \;\; \sum_i C_{ii} x_i \leq 30 \\
& \;\; \sum_{\text{east goalies}} x_i = \sum_{\text{west goalies}} x_i = 1 \\
& \;\; \sum_{\text{east forwards}} x_i = \sum_{\text{west forwards}} x_i = 3 \\
& \;\; \sum_{\text{east defence}} x_i = \sum_{\text{west defence}} x_i = 2 \\
& \;\; x_i \in \{0, 1\}
\end{aligned}
$$
where $\mathbf{x}$ is a binary vector of players, $\mathbf{r}$ is our returns vector, $\mathbf{Q}$ is the covariance matrix, $\gamma$ is the risk avoidance parameter, and $\mathbf{C}$ is a diagonal matrix of the cost associated with each player. In this sense, the problem we are solving is to maximize returns and minimize risk (to our tolerance), all while not exceeding our cost cap. Let's see how that looks. First, let's import some data we've already prepared. Note that most of the helper functions are located in `../scripts/sportsnet_files.py`.

In [39]:
import numpy as np
import pandas as pd
import cvxpy as cp
import importlib
import sys
import json
sys.path.insert(1, '../')
import scripts.hockey_bots as hockey
import scripts.sportsnet_files as sp
# need to keep reloading for development work because 
# I apparently like Jupyter too much 
importlib.reload(sp)

<module 'scripts.sportsnet_files' from '../scripts/sportsnet_files.py'>

Our first step is to load the index of each player in each conference, and load our already prepared player data.  

In [40]:
east_list, west_list = sp.generateConferenceLists()

player_data = pd.read_csv("../data/processed/sportsnetpoints.csv")

with open('../data/processed/names2019.json') as f:
    d = json.load(f)
    names = pd.Series(d, name='name').reset_index()
    
names.columns = ['player_id', 'name']

player_data['player_id'] = player_data['PlayerId']
player_data = pd.merge(names, player_data, how='inner').dropna()
player_data['div'] = player_data.apply(sp.conference, args=(east_list, west_list), axis=1)

values = pd.read_csv("../data/processed/player_values.csv")


Because we're going to be calculating the covariance, it is important that we sort our player data by game number. When each result happened matters in the covariance calculation, so we need to sort our data accordingly before we do much else.

In [41]:
player_data = pd.merge(values[['name', 'PV']], player_data, on = 'name', how='left')
player_data = player_data.dropna(subset=['div'])
player_data = player_data.sort_values(by=['name', 'gamenum'])
scores = player_data[['PlayerId', 'points',]].groupby('PlayerId').agg(lambda x: list(x)).reset_index()
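
To see why the sort has to come before the `groupby`, here is a minimal sketch on made-up data. The `toy` frame and its values are purely illustrative and are not part of the real dataset. pandas keeps the row order within each group, so sorting by game number first means each player's list comes out in chronological order.

```python
import pandas as pd

# Toy long-format data: one row per player per game, deliberately out of order.
toy = pd.DataFrame({
    "PlayerId": ["ID2", "ID1", "ID1", "ID2", "ID1"],
    "gamenum":  [1, 2, 0, 0, 1],
    "points":   [3.0, 0.0, 1.0, 2.0, 4.0],
})

# Sort first so that each player's games are in order.
toy = toy.sort_values(by=["PlayerId", "gamenum"])

# groupby preserves the row order inside each group,
# so every list of points is now ordered by game number.
toy_scores = toy.groupby("PlayerId")["points"].agg(list).reset_index()
print(toy_scores)
```

Without the sort, the list for `ID1` would follow the original row order (games 2, 0, 1), and the columns of the points matrix would no longer line up game by game.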

Now, we convert all the player points into a matrix from which we can calculate the covariance of each "asset" or player. Below is a big matrix of all points that the players have gotten, and using this we will calculate a returns vector and covariance matrix. 

In [59]:
all_points = pd.DataFrame(scores.points.tolist()).T
all_points.columns = scores.PlayerId
all_points = all_points.fillna(0)
# min-max scale using the global minimum and maximum so every entry lies in [0, 1]
all_points = (all_points - all_points.min().min())/(all_points.max().max() - all_points.min().min())
idx = list(all_points.mean().sort_values(ascending=False).index)
player_data['primaryPosition'] = player_data['position']
all_points.head()

PlayerId,ID8465009,ID8466139,ID8468508,ID8468674,ID8468685,ID8469454,ID8469455,ID8469459,ID8469608,ID8470187,...,ID8480800,ID8480830,ID8480849,ID8480871,ID8480873,ID8480925,ID8480945,ID8481523,ID8481554,ID8481624
0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.166667,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.166667,0.0,...,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.166667,0.0,0.5,0.0,0.0,0.166667,0.0,0.0,0.5,0.166667,...,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.166667,0.0,0.166667,0.0,0.166667,0.166667,0.0,0.0,...,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.5,0.0,0.0,0.5,0.166667,0.333333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
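
As a rough illustration of how a returns vector and covariance matrix can be read off a games-by-players matrix like the one above, here is a small NumPy sketch. The `histories` dictionary is hypothetical. Using the per-player mean as the return and the sample covariance as the risk is an assumption on my part; the helper module may compute them differently.

```python
import numpy as np

# Hypothetical per-player point histories of unequal length.
histories = {"ID1": [1.0, 4.0, 0.0], "ID2": [2.0, 3.0], "ID3": [0.0, 0.0, 6.0]}

n_games = max(len(h) for h in histories.values())

# Rows are games, columns are players; missing games are padded with 0.
points = np.zeros((n_games, len(histories)))
for col, h in enumerate(histories.values()):
    points[:len(h), col] = h

# Scale by the global minimum and maximum so every entry lies in [0, 1].
scaled = (points - points.min()) / (points.max() - points.min())

r = scaled.mean(axis=0)            # one expected return per player
Q = np.cov(scaled, rowvar=False)   # players are columns, so rowvar=False
print(r)
print(Q)
```

Padding with zeros treats a missed game as a zero-point game, which pulls down the mean of players with short histories.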


Now comes the tedious task of getting the indexes of each player so we can recover who is who later.

In [43]:
pointies = list(all_points.mean().index)

defence = hockey.position_indexes(pointies, all_points, player_data, idx, "D")
center = hockey.position_indexes(pointies, all_points, player_data, idx, "C")
goalie = hockey.position_indexes(pointies, all_points, player_data, idx, "G")
right_wingers = hockey.position_indexes(pointies, all_points, player_data, idx, "R")
left_wingers = hockey.position_indexes(pointies, all_points, player_data, idx, "L")

forward = center + right_wingers + left_wingers

east = sp.conferenceIndex('E', player_data)
west = sp.conferenceIndex('W', player_data)

e_defence = [i for i in defence if i in east]
w_defence = [i for i in defence if i in west]

w_goalie = [i for i in goalie if i in west]
e_goalie = [i for i in goalie if i in east]

w_forward = [i for i in forward if i in west]
e_forward = [i for i in forward if i in east]


len(defence), len(e_defence) + len(w_defence)

(133, 133)
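
The length check above confirms that every defenceman ended up in exactly one conference. The same idea can be sketched on hypothetical index lists (the numbers below are made up and only stand in for the real column indexes):

```python
# Hypothetical column indexes standing in for the lists built above.
defence_idx = [0, 3, 5, 8]
east_idx = {0, 1, 2, 3, 4}   # sets make the membership test cheap
west_idx = {5, 6, 7, 8, 9}

e_def = [i for i in defence_idx if i in east_idx]
w_def = [i for i in defence_idx if i in west_idx]

# Every defenceman should land in exactly one conference:
# nothing is lost and nothing is counted twice.
assert sorted(e_def + w_def) == sorted(defence_idx)
assert not set(e_def) & set(w_def)
print(e_def, w_def)
```

If the two lengths ever disagree, some player is missing a conference label or appears in both lists, and the position constraints in the optimization would be built on the wrong index sets.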

Finally, perform the optimization.

In [64]:
team = sp.sportnet_optim(all_points,
                         [],
                         [],
                         0.4,
                         player_data,
                         e_defence,
                         w_defence,
                         e_goalie,
                         w_goalie,
                         e_forward,
                         w_forward,
                         team_size=12,
                         ed=2,
                         wd=2,
                         eg=1,
                         wg=1,
                         ef=3,
                         wf=3)

optimal
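
`sp.sportnet_optim` lives in the helper module and is not shown here. To make the problem it solves concrete, here is a toy brute-force version of the same objective and constraints. This is not the helper's implementation: `best_team` is a hypothetical function, all numbers are made up, and exhaustive search only works because the example has six players.

```python
from itertools import combinations, product

import numpy as np


def best_team(r, Q, cost, groups, gamma, budget):
    """Brute-force the best team taking exactly k players from each group.

    groups is a list of (player_indexes, k) pairs, one per position/conference.
    Returns (x, value), or (None, -inf) if no team fits under the budget.
    """
    best_x, best_val = None, -np.inf
    per_group = [list(combinations(idx, k)) for idx, k in groups]
    for picks in product(*per_group):
        chosen = [i for pick in picks for i in pick]
        x = np.zeros(len(r))
        x[chosen] = 1.0
        if cost @ x > budget:                # cost cap
            continue
        val = r @ x - gamma * (x @ Q @ x)    # returns minus risk penalty
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Toy data: players 0-2 are "east forwards", players 3-5 are "west forwards".
r = np.array([0.5, 0.4, 0.2, 0.5, 0.3, 0.2])
Q = np.diag([0.30, 0.10, 0.05, 0.30, 0.10, 0.05])
cost = np.array([4, 3, 1, 4, 2, 1])
groups = [([0, 1, 2], 2), ([3, 4, 5], 2)]

x, val = best_team(r, Q, cost, groups, gamma=0.4, budget=9)
print(x, val, cost @ x)
```

With a budget of 9, the team of players 0, 1, 3 and 4 has the highest objective value but costs 13, so the search settles on players 1, 2, 3 and 5 instead. That is the same trade-off the cost cap forces in the real problem.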


And now let's look at our team.

In [65]:
sp.displayTeam(player_data, team[0], all_points)

Unnamed: 0.1,name,PV,player_id,Unnamed: 0,PlayerId,gamenum,team,points,position,div,primaryPosition
27327,Claude Giroux,2,ID8473512,38624.0,ID8473512,0.0,4.0,1.0,C,E,C
63894,Connor Hellebuyck,3,ID8476945,20407.0,ID8476945,0.0,52.0,1.0,G,W,G
44776,Erik Gustafsson,1,ID8476979,34119.0,ID8476979,0.0,16.0,1.0,D,W,D
52137,John Carlson,3,ID8474590,38882.0,ID8474590,0.0,15.0,2.0,D,E,D
46590,John Klingberg,1,ID8475906,28990.0,ID8475906,0.0,25.0,1.0,D,W,D
19050,Jonathan Toews,2,ID8473604,34120.0,ID8473604,0.0,16.0,2.0,C,W,C
61946,Keith Yandle,2,ID8471735,43271.0,ID8471735,0.0,13.0,0.0,D,E,D
3995,Leon Draisaitl,4,ID8477934,5789.0,ID8477934,0.0,22.0,2.0,C,W,C
31621,Nikita Kucherov,4,ID8476453,43260.0,ID8476453,0.0,14.0,0.0,R,E,R
24416,Sean Couturier,2,ID8476461,38636.0,ID8476461,0.0,4.0,0.0,C,E,C


Because some of our players count for double points, our best bet is to run another optimization to choose which players we should invest in most heavily as our "double points" people.

In [62]:
rams = sp.ram_selection(team[0], all_points, player_data, 10)
sp.displayTeam(player_data, rams, all_points)

[6.96085490e-02 1.03440558e-01 1.07333227e-22 1.68175545e-01
 1.01414117e-01 4.55633152e-02 5.75815191e-02 6.31759438e-23
 1.23225231e-01 1.21832268e-01 2.09158898e-01 4.55269448e-23]


Unnamed: 0.1,name,PV,player_id,Unnamed: 0,PlayerId,gamenum,team,points,position,div,primaryPosition
21667,Aleksander Barkov,3,ID8477493,43288.0,ID8477493,0.0,13.0,0.0,C,E,C
3995,Leon Draisaitl,4,ID8477934,5789.0,ID8477934,0.0,22.0,2.0,C,W,C
31621,Nikita Kucherov,4,ID8476453,43260.0,ID8476453,0.0,14.0,0.0,R,E,R
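
The printed vector has 12 entries that sum to roughly 1, and three players are displayed. One plausible reading is that the entries are continuous portfolio weights over the team and that the players with the largest weights get the double points. That is an assumption, since `sp.ram_selection` is not shown here. Under that assumption, the selection step would look something like this:

```python
import numpy as np

# Weights copied from the output printed above.
weights = np.array([
    6.96085490e-02, 1.03440558e-01, 1.07333227e-22, 1.68175545e-01,
    1.01414117e-01, 4.55633152e-02, 5.75815191e-02, 6.31759438e-23,
    1.23225231e-01, 1.21832268e-01, 2.09158898e-01, 4.55269448e-23,
])

k = 3  # three players are displayed above

# Positions of the k largest weights, largest first.
top_k = np.argsort(weights)[::-1][:k]
print(top_k, weights.sum())
```

Entries on the order of `1e-22` are numerically zero, meaning those players get no weight at all.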

Finally, let's total the points scored by the players on our team.

In [66]:
player_data[player_data.player_id.isin(list(all_points.iloc[:,team[0]]))].points.sum()

1553.0