### CS/ECE/ISyE 524 — Introduction to Optimization — Spring 2018
# Fantasy Baseball Roster Optimizer
## Brian Broeking (broeking@wisc.edu)

# Introduction 

As fantasy sports grew in popularity over the course of the 2000's, players became dissatisfied in the season long format. Injuries to key players on a team could take your team out of contention early in the season. Players demanded a more flexible fantasy format and daily fantasy sports were born. Draftkings is one of the major providers of daily fantasy sports in the United States.

Everyday, Draftkings runs online competitions where between 2 to 50,000+ people compete to build the best fantasy baseball lineup for that day. The best lineups are formed by rostering players who perform well in live Major League Baseball games during that day and in turn score points for the lineup.

<img src="ReportResources/contest.png" />

[Draftkings](https://www.draftkings.com/lobby) provides many different tournament formats for players to choose, but for this project we will focus on the GPP tournaments also known as Guarentee Prize Pool tournaments.  These tournaments have top heavy prizing where players placing in the top 2-5 percent receive a vast majority of the reward compared to the rest of the entrants. 

In the project, we sought to discover methods of creating optimal lineups based on projected fantasy points generated by [SaberSim,](https://www.fangraphs.com/dailyprojections.aspx?pos=all&stats=bat&type=sabersim&team=0&lg=all&players=0) a fantasy baseball statistics organization. Using those projected points as a baseline, we combine the projections with Draftkings lineup builder csv to lineups like the one pictured below.

<img src="ReportResources/prizing.png" />

<img src="ReportResources/entry.png" />

# Mathematical Model

# Solutions
## Splitting input CSV file we generate in Java

In [3]:
using JuMP, Cbc, NamedArrays

raw = readcsv("output.csv");

# Reading the join csv we generated in Java
# Player Names
names = raw[:,1];
#Player Salaries
salaries = raw[:,2];
# Positions the player plays
position = raw[:, 3];
# Projected points from SaberSim
projectedPoints = raw[:,4];
# Average points from their last 5 outings
averagePoints = raw[:,5];
# Team the player plays for
playerTeam = raw[:,6];
#Team the player is playing against
opponentTeam = raw[:,7];
# Number of unique players
n = length(names)

## Determine player position eligibilities

In [4]:
# Array that holds the index of each player able to play the position that day
# Positional Array
pitchers = Int64[]
catchers = Int64[]
first = Int64[]
second = Int64[]
third = Int64[]
shortStop = Int64[]
fielders = Int64[]

# Adds player to a positional array 
for i in 1:n
 if contains(position[i], "P")
  append!(pitchers, i)
 end
 if contains(position[i], "C")
  append!(catchers, i)
 end
 if contains(position[i], "1B")
  append!(first, i)
 end
 if contains(position[i], "2B")
  append!(second, i)
 end
 if contains(position[i], "3B")
  append!(third, i)
 end
 if contains(position[i], "SS")
  append!(shortStop, i)
 end
 if contains(position[i], "OF")
  append!(fielders, i)
 end
end

## Associate player index with the team they play for

In [5]:
# NL East
phillies = Int64[]
atlanta = Int64[]
miami = Int64[]
washington = Int64[]
mets = Int64[]

# AL East
yankees = Int64[]
baltimore = Int64[]
tampa = Int64[]
boston = Int64[]
toronto = Int64[]

# NL Central
milwaukee = Int64[]
pittsburg = Int64[]
reds = Int64[]
cardinals = Int64[]
chicago = Int64[]

# AL Central
whitesox = Int64[]
royals = Int64[]
tigers = Int64[]
cleveland = Int64[]
twins = Int64[]

# AL West
seattle = Int64[]
angels = Int64[]
oakland = Int64[]
houston = Int64[]
texas = Int64[]

# NL West
dodgers = Int64[]
arizona = Int64[]
padres = Int64[]
giants = Int64[]
rockies = Int64[]

0-element Array{Int64,1}

In [6]:
# NL East
for i in 1:n
 if contains(playerTeam[i], " PHI")
  append!(phillies, i)
 end
 if contains(playerTeam[i], " ATL")
  append!(atlanta, i)
 end
 if contains(playerTeam[i], " MIA")
  append!(miami, i)
 end
 if contains(playerTeam[i], " WAS")
  append!(washington, i)
 end
 if contains(playerTeam[i], " NYM")
  append!(mets, i)
 end
    
# AL East
 if contains(playerTeam[i], " NYY")
  append!(yankees, i)
 end
 if contains(playerTeam[i], " BAL")
  append!(baltimore, i)
 end
 if contains(playerTeam[i], " TB")
  append!(tampa, i)
 end
 if contains(playerTeam[i], " BOS")
  append!(boston, i)
 end
 if contains(playerTeam[i], " TOR")
  append!(toronto, i)
 end
    
 # NL Central
 if contains(playerTeam[i], " MIL")
  append!(milwaukee, i)
 end
 if contains(playerTeam[i], " PIT")
  append!(pittsburg, i)
 end
 if contains(playerTeam[i], " CIN")
  append!(reds, i)
 end
 if contains(playerTeam[i], " STL")
  append!(cardinals, i)
 end
 if contains(playerTeam[i], " CHC")
  append!(chicago, i)
 end
    
# AL Central
  if contains(playerTeam[i], " CWS")
  append!(whitesox, i)
 end
 if contains(playerTeam[i], " KCR")
  append!(royals, i)
 end
 if contains(playerTeam[i], " CLE")
  append!(cleveland, i)
 end
 if contains(playerTeam[i], " DET")
  append!(detroit, i)
 end
 if contains(playerTeam[i], " MIN")
  append!(twins, i)
 end

 if contains(playerTeam[i], " SEA")
  append!(seattle, i)
 end
 if contains(playerTeam[i], " LAA")
  append!(angels, i)
 end
 if contains(playerTeam[i], " OAK")
  append!(oakland, i)
 end
 if contains(playerTeam[i], " HOU")
  append!(houston, i)
 end
 if contains(playerTeam[i], " TEX")
  append!(texas, i)
 end
    
 if contains(playerTeam[i], " LAD")
  append!(dodgers, i)
 end
 if contains(playerTeam[i], " ARI")
  append!(arizona, i)
 end
 if contains(playerTeam[i], " SD")
  append!(padres, i)
 end
 if contains(playerTeam[i], " SFG")
  append!(giants, i)
 end
 if contains(playerTeam[i], " COL")
  append!(rockies, i)
 end

end

## Associate player with the team they are opposing

In [7]:
# NL East
oppphillies = Int64[]
oppatlanta = Int64[]
oppmiami = Int64[]
oppwashington = Int64[]
oppmets = Int64[]

# AL East
oppyankees = Int64[]
oppbaltimore = Int64[]
opptampa = Int64[]
oppboston = Int64[]
opptoronto = Int64[]

# NL Central
oppmilwaukee = Int64[]
opppittsburg = Int64[]
oppreds = Int64[]
oppcardinals = Int64[]
oppchicago = Int64[]

# AL Central
oppwhitesox = Int64[]
opproyals = Int64[]
opptigers = Int64[]
oppcleveland = Int64[]
opptwins = Int64[]

# AL West
oppseattle = Int64[]
oppangels = Int64[]
oppoakland = Int64[]
opphouston = Int64[]
opptexas = Int64[]

# NL West
oppdodgers = Int64[]
opparizona = Int64[]
opppadres = Int64[]
oppgiants = Int64[]
opprockies = Int64[]

0-element Array{Int64,1}

In [8]:
# NL East
for i in 1:n
 if contains(opponentTeam[i], " PHI")
  append!(oppphillies, i)
 end
 if contains(opponentTeam[i], " ATL")
  append!(oppatlanta, i)
 end
 if contains(opponentTeam[i], " MIA")
  append!(oppmiami, i)
 end
 if contains(opponentTeam[i], " WAS")
  append!(oppwashington, i)
 end
 if contains(opponentTeam[i], " NYM")
  append!(oppmets, i)
 end
    
# AL East
 if contains(opponentTeam[i], " NYY")
  append!(oppyankees, i)
 end
 if contains(opponentTeam[i], " BAL")
  append!(oppbaltimore, i)
 end
 if contains(opponentTeam[i], " TB")
  append!(opptampa, i)
 end
 if contains(opponentTeam[i], " BOS")
  append!(oppboston, i)
 end
 if contains(opponentTeam[i], " TOR")
  append!(opptoronto, i)
 end
    
 # NL Central
 if contains(opponentTeam[i], " MIL")
  append!(oppmilwaukee, i)
 end
 if contains(opponentTeam[i], " PIT")
  append!(opppittsburg, i)
 end
 if contains(opponentTeam[i], " CIN")
  append!(oppreds, i)
 end
 if contains(opponentTeam[i], " STL")
  append!(oppcardinals, i)
 end
 if contains(opponentTeam[i], " CHC")
  append!(oppchicago, i)
 end
    
# AL Central
  if contains(opponentTeam[i], " CWS")
  append!(oppwhitesox, i)
 end
 if contains(opponentTeam[i], " KCR")
  append!(opproyals, i)
 end
 if contains(opponentTeam[i], " CLE")
  append!(oppcleveland, i)
 end
 if contains(opponentTeam[i], " DET")
  append!(oppdtigers, i)
 end
 if contains(opponentTeam[i], " MIN")
  append!(opptwins, i)
 end

 if contains(opponentTeam[i], " SEA")
  append!(oppseattle, i)
 end
 if contains(opponentTeam[i], " LAA")
  append!(oppangels, i)
 end
 if contains(opponentTeam[i], " OAK")
  append!(oppoakland, i)
 end
 if contains(opponentTeam[i], " HOU")
  append!(opphouston, i)
 end
 if contains(opponentTeam[i], " TEX")
  append!(opptexas, i)
 end
    
 if contains(opponentTeam[i], " LAD")
  append!(oppdodgers, i)
 end
 if contains(opponentTeam[i], " ARI")
  append!(opparizona, i)
 end
 if contains(opponentTeam[i], " SD")
  append!(opppadres, i)
 end
 if contains(opponentTeam[i], " SFG")
  append!(oppgiants, i)
 end
 if contains(opponentTeam[i], " COL")
  append!(opprockies, i)
 end
end

## Key constraints for the model

### Salary Constraint (Constraint 1)

All Draftkings lineups will have to have less than 50,000 in player salary

### Positional Constraint (Constraint 2)

Each Draftkings lineups is required to have 2 pitchers, 1 catcher, 1 first baseman, 1 second baseman, 1 shortstop,
1 third baseman, and 3 outfielders. This results in 10 total players.

### No Batter vs Pitcher Constraint (Constraint 3)

Here we generate a named array that maps team abbreviations to an array of opponents. We use this in the model to check that if a pitcher is being played that no batters are played against him. This is implemented because there is negative correlation between the success of a batter and the success of an opposing pitcher.

For example:
When a batter get a hit off a pitcher, he is awarded 3 points for the single. Pitchers are penalized every time they allow a hit and get -0.60 points. Therefore in the quest to maximize potential scoring for a lineup, it never make sense to play a batter vs. a opposing pitcher

In [21]:
# total salary
salary_cap = 50000

#number of players at each position
no_pitchers = 2
no_catchers = 1
no_first = 1
no_second = 1
no_third = 1
no_ss = 1
no_of = 3
total_players = 10

# No batter vs pitcher constraint
team_abbv = [" PHI", " ATL", " MIA", " WAS", " NYM",
" NYY", " BAL", " TB", " BOS", " TOR", 
" MIL", " PIT", " CIN", " STL", " CHC", 
" CWS", " KCR", " CLE"," DET"," MIN", 
" SEA", " LAA"," OAK"," HOU"," TEX",
" LAD", " ARI", " SD", " SFG", " COL"]

# In the model, we will look up a pitcher's team. With this information, we will get the index of 
# all of the players who will be excluded because they would be opposing our pitcher
noplaylist = Dict(zip(team_abbv, [oppphillies, oppatlanta, oppmiami, oppwashington, oppmets,
            oppyankees, oppbaltimore, opptampa, oppboston, opptoronto,
            oppmilwaukee, opppittsburg, oppreds, oppcardinals, oppchicago,
            oppwhitesox, opproyals, oppcleveland, opptigers, opptwins,
            oppseattle, oppangels, oppoakland, opphouston, opptexas,
            oppdodgers, opparizona, opppadres, oppgiants, opprockies]))


0-element Array{Array{Int64,N} where N,1}

### Stacking Constraint (Constraint 4)
Stacking lineups has been proven to be the most effective and efficient way to make your Draftkings lineups better. [(Psychology of Stacking)](https://rotogrinders.com/articles/mlb-dfs-psychology-of-stacking-731014) In general, this is because each team has 27 outs per game on the offensive side. The less outs a player makes, the more chances other teammates are given over the course of the game. As a product of the additional at-bats, we are granted more opportunities to generate points. 

In [114]:
function generate_lineup(noplaylist, lineups, phillies, atlanta, miami, washington, mets,
            yankees, baltimore, tampa, boston, toronto,
            milwaukee, pittsburg, reds, cardinals, chicago,
            whitesox, royals, cleveland, tigers, twins,
            seattle, angels, oakland, houston, texas,
            dodgers, arizona, padres, giants, rockies)
    
    m = Model(solver = CbcSolver())
    
    salary_cap = 50000
    
    no_pitchers = 2
    no_catchers = 1
    no_first = 1
    no_second = 1
    no_third = 1
    no_ss = 1
    no_of = 3
    total_players = 10
    num_overlap = 7

    #binary variable for whether a player is chosen or not
    @variable(m, x[1:n], Bin)

    # Constraint 1
    # total salary must be under 50K
    @constraint(m, sum(x .* salaries) <= salary_cap)

    # Constraint 2
    # constraints must fit line up, eg. two pitchers, one catcher, one 1B etc 
    @constraint(m, sum(x[i] for i in pitchers) == no_pitchers)
    @constraint(m, sum(x[i] for i in catchers) == no_catchers)
    @constraint(m, sum(x[i] for i in first) == no_first)
    @constraint(m, sum(x[i] for i in second) == no_second)
    @constraint(m, sum(x[i] for i in third) == no_third)
    @constraint(m, sum(x[i] for i in shortStop) == no_ss)
    @constraint(m, sum(x[i] for i in fielders) == no_of)
    @constraint(m, sum(x[i] for i in 1:n) == total_players)

    # Constraint 3
    # no batter vs pitcher constraint
    for i = 1:length(pitchers)
        for j in noplaylist[playerTeam[i]]
          @constraint(m, (x[i] + x[j]) <= 1)
        end
    end

    # Constraint 4
    # In this example we want between 4 and 5 players from the Los Angeles Angels
    # and between 3 and 4 players from the Oakland A's
    # We will discuss 
    @constraint(m, 4 <= sum(x[i] for i in angels) <= 5)
    @constraint(m, 3 <= sum(x[i] for i in oakland) <= 4)

    @constraint(m, constr[i=1:size(lineups)[1]], sum( x[(lineups[i][j])] for j=1:10 ) <= num_overlap)

    
    #objective to maximize projected points
    @objective(m, Max, sum(x .* projectedPoints))

    solve(m)

    x = getvalue(x)

    lineup = Int64[]
    
    for i in 1:length(x)
     if x[i] == 1
      println(names[i] , " " , position[i], " ", salaries[i])
      append!(lineup, i)
     end
    end
    
    push!(lineups, lineup)
    println(getobjectivevalue(m))
end

generate_lineup (generic function with 3 methods)

#### Infeasible lineup added to lineups array to allow for consistant overlap checking

In [115]:
lineups = Array{Int64}[]
first = Int64[]
push!(first, 1), push!(first, 2)
push!(first, 3), push!(first, 4), push!(first, 5), push!(first, 6)
push!(first, 7), push!(first, 8), push!(first, 9), push!(first, 10)
push!(lineups, first)

1-element Array{Array{Int64,N} where N,1}:
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

## Generates 10 lineups given the following variables

In [116]:
for i in 1:10
    generate_lineup(noplaylist, lineups, phillies, atlanta, miami, washington, mets,
            yankees, baltimore, tampa, boston, toronto,
            milwaukee, pittsburg, reds, cardinals, chicago,
            whitesox, royals, cleveland, tigers, twins,
            seattle, angels, oakland, houston, texas,
            dodgers, arizona, padres, giants, rockies)
end

Justin Verlander  P 13200
Khris Davis  OF 4700
Justin Upton  OF 4300
J.T. Realmuto  C 4200
Ross Stripling  P 4000
Marcus Semien  SS 3800
Matt Olson  1B 3800
Jabari Blash  OF 3500
Ian Kinsler  2B 3300
Luis Valbuena  1B/3B 3100
129.92
Justin Verlander  P 13200
Joey Votto  1B 4800
Khris Davis  OF 4700
Justin Upton  OF 4300
Ross Stripling  P 4000
Marcus Semien  SS 3800
Ian Kinsler  2B 3300
Luis Valbuena  1B/3B 3100
Matt Joyce  OF 3100
Rene Rivera  C 2700
129.0
Justin Verlander  P 13200
Justin Upton  OF 4300
J.T. Realmuto  C 4200
Ross Stripling  P 4000
Andrelton Simmons  SS 3900
Matt Olson  1B 3800
Mark Canha  1B/OF 3500
Ian Kinsler  2B 3300
Luis Valbuena  1B/3B 3100
Matt Joyce  OF 3100
128.87
Justin Verlander  P 13200
Joey Votto  1B 4800
Khris Davis  OF 4700
Justin Upton  OF 4300
Ross Stripling  P 4000
Marcus Semien  SS 3800
Matt Olson  1B 3800
Jabari Blash  OF 3500
Zack Cozart  2B/3B 3200
Rene Rivera  C 2700
128.09
Justin Verlander  P 13200
Joey Votto  1B 4800
Justin Upton  OF 4300
Ross S

In [117]:
lineups

11-element Array{Array{Int64,N} where N,1}:
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]        
 [1, 15, 21, 24, 33, 39, 40, 51, 60, 71]
 [1, 13, 15, 21, 33, 39, 60, 71, 73, 83]
 [1, 21, 24, 33, 36, 40, 52, 60, 71, 73]
 [1, 13, 15, 21, 33, 39, 40, 51, 67, 83]
 [1, 13, 21, 33, 39, 51, 52, 60, 71, 74]
 [3, 13, 21, 24, 33, 39, 51, 60, 71, 73]
 [3, 13, 15, 21, 24, 33, 36, 52, 60, 71]
 [1, 13, 15, 21, 33, 36, 38, 52, 60, 83]
 [1, 15, 20, 21, 24, 36, 38, 40, 51, 60]
 [1, 13, 21, 33, 36, 40, 52, 67, 73, 83]

# Results and discussions

# Conclusions
One of the most exciting improvements we can make is developing an additional model to improve the selection of teams to stack. Currently, we are simply using basic Vegas lines, previous information and personal preference to select the teams that will be stacked. There are several datasets we could explore to improve our selection process.
### Draftkings Contest Past Tournament Dataset
After each Draftkings competition is completed, players have the opportunity to download data about the lineups played that night by each competitor with their place and scoring in the competition included. There are two potentially interesting options we can explore with this dataset. First, we can pair this data with other data to backtest our model and determine if our process is successfully generating potentially profitable lineups. Second, there is a small minority of players that take home the majority of Draftkings winnings. We can use the contest data to determine the successful players and explore their lineups to discover their strategies. 
### Donbest.com Dataset
Donbest.com collects the moneyline odds and over/under totals for each MLB game played from various online sportsbooks. With this data, we can explore the correlations between moneylines, totals and placing highly in tournaments. Most Draftkings players will look at totals and moneylines to form opinions about the slate of games that night. Games with higher totals tend to create scenarios where those players are over-valued and over-owned. From here we can explore the tradeoffs between Vegas totals, player ownership and placing highly in Draftkings contests.
### Statcast Dataset
In partnership with Amazon Web Services, every Major League Baseball stadium has been equipped with radar technology that tracks every play. These tools collect data such as exit velocity off the bat and the spin rate of a pitchers fastball. Statistics like these have been revolutionizing the way baseball is played and discovering the predictive value of some of these statistics and incorporating into a model seem like the next logical way to get an advantage over the Draftkings competition.