## Creating the Optimal Baseball Lineup
### Cameron Cross
### ISYE 524

The long-accepted approach to building a batting order is still followed by most teams below the professional level. The leadoff batter is the fastest player on the team; the fourth hitter, also known as the clean-up hitter, is the most powerful hitter on the team; and the last batter is the worst hitter on the team. The entire lineup is as follows:

**-------------------------------------------------------------------------------------------------**

**1:** The fastest player on the team <br>

**2:** The player with the best bat control. Likely not the best player, but will make consistent contact to move the leadoff baserunner over <br>

**3:** The best overall hitter on the team, boasting the highest batting average and likely having a lot of RBIs <br>

**4:** This is the most powerful hitter on the team. This player's slugging ability is more important than their ability to get on base or make consistent contact <br>

**5:** A good hitter with strong RBI numbers <br>

**6:** An average hitter with reasonable RBI numbers <br>

**7:** An average hitter and often the weakest baserunner on the team <br>

**8:** This is the backup for 2nd in the order. An average hitter with the ability to make contact consistently <br>

**9:** The worst hitter on the team <br>

**-------------------------------------------------------------------------------------------------**

For the majority of baseball’s history, every level of baseball used this philosophy to construct “optimized” batting lineups. In recent years, however, the approach to making lineups has changed completely. Advanced statistics and sabermetrics now optimize lineups in a vastly different way. These lineups use OBP (on-base percentage), wRC+ (weighted runs created plus), SLG% (slugging percentage), and ISO (isolated power), among other statistics. Sabermetrics has been pivotal in identifying important lineup positions that were previously thought to matter little (e.g., 2nd in the lineup). The new guidelines for lineup creation are as follows:

**-------------------------------------------------------------------------------------------------**

**1:** The highest OBP on team. Speed is a plus, but OBP matters much more (5th best overall hitter) <br>

**2:** The highest wRC+ on team with a very high OBP (Best overall hitter) <br>

**3:** High wRC+, but less important than 5th in the lineup (4th best overall hitter) <br>

**4:** Very high SLG%, HR, and ISO (2nd best overall hitter) <br>

**5:** High wRC+ and high SLG% (3rd best overall hitter) <br>

**6:** High OBP and strong SB numbers (6th best overall hitter) <br>

**7:** Average overall hitter (7th best overall hitter) <br>

**8:** Average overall hitter (8th best overall hitter) <br>

**9:** Worst hitter on the team (Worst overall hitter) <br>

**-------------------------------------------------------------------------------------------------**

I will use these new guidelines to find the optimized lineup for the entire MLB and for my favorite team, the Detroit Tigers, for the 2019 season. The 2019 player data sets come from www.baseball-reference.com. I used RStudio to add two additional columns (wOBA and wRC+) to the csv files and to filter out players with fewer than 200 plate appearances in 2019. This ensures that each player has enough data in the season to avoid inflated statistics. wOBA is the weighted on-base average, calculated using

$$wOBA = \frac{0.690 \cdot uBB + 0.719 \cdot HBP + 0.850 \cdot 1B + 1.217 \cdot 2B + 1.529 \cdot 3B + 1.940 \cdot HR}{AB + BB - IBB + SF + HBP}$$

The coefficients are weights calculated by baseball statisticians for the 2019 MLB season so that wOBA stays on a consistent scale from year to year.
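As a minimal sketch of the calculation above (the original preprocessing was done in RStudio and is not shown, so the function name `woba`, the field names, and the sample stat lines are illustrative assumptions), the formula and the 200-plate-appearance filter could look like this in Julia:

```julia
# Weights copied from the wOBA formula above; uBB = BB - IBB (unintentional walks).
function woba(; AB, BB, IBB, HBP, SF, H, X2B, X3B, HR)
    singles = H - X2B - X3B - HR          # 1B are hits that are not extra-base hits
    uBB = BB - IBB
    numerator = 0.690 * uBB + 0.719 * HBP + 0.850 * singles +
                1.217 * X2B + 1.529 * X3B + 1.940 * HR
    denominator = AB + BB - IBB + SF + HBP
    return numerator / denominator
end

# Hypothetical stat lines, for illustration only (not taken from the data set).
players = [
    (name = "Player A", PA = 600, AB = 520, BB = 60, IBB = 5, HBP = 5, SF = 5,
     H = 150, X2B = 30, X3B = 3, HR = 25),
    (name = "Player B", PA = 150, AB = 130, BB = 15, IBB = 0, HBP = 2, SF = 1,
     H = 40, X2B = 8, X3B = 0, HR = 6),
]

# Keep only players with at least 200 plate appearances, then compute wOBA.
qualified = filter(p -> p.PA >= 200, players)
for p in qualified
    w = woba(AB = p.AB, BB = p.BB, IBB = p.IBB, HBP = p.HBP, SF = p.SF,
             H = p.H, X2B = p.X2B, X3B = p.X3B, HR = p.HR)
    println(p.name, ": wOBA = ", round(w, digits = 3))
end
```

Only Player A survives the filter here, because Player B has fewer than 200 plate appearances.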

wRC+, or weighted runs created plus, is the best statistic to measure a player's total offensive value. This statistic is heavily used in my creation of the optimized lineups.

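$$wRC+ = \frac{\left(\frac{wOBA - \text{League } wOBA}{wOBA \text{ Scale}} + \text{League } R/PA\right) + \left(\text{League } R/PA - \text{Park Factor} \times \text{League } R/PA\right)}{\text{AL or NL } wRC/PA \text{ excluding pitchers}} \times 100$$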

Unlike wRC, wRC+ uses a park factor to control for park effects. This is important because MLB fields are not standardized, so where a player plays can affect how well they perform. The park factor removes the advantages of some parks and the disadvantages of others from the equation.

The values for the wOBA scale, league R/PA, park factor, wRC/PA, and the wOBA weights were all found on https://library.fangraphs.com.
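The wRC+ formula can be sketched as a small Julia function. The function and argument names are assumptions made for this illustration, and the league constants in the example call are placeholders rather than the actual 2019 values:

```julia
# wRC+ following the formula above. Every league constant is passed in explicitly;
# the numbers used in the example call are placeholders, not the 2019 values.
function wrc_plus(woba; lg_woba, woba_scale, lg_r_pa, park_factor, lg_wrc_pa)
    runs_above_avg_per_pa = (woba - lg_woba) / woba_scale
    park_adjustment = lg_r_pa - park_factor * lg_r_pa
    return (runs_above_avg_per_pa + lg_r_pa + park_adjustment) / lg_wrc_pa * 100
end

# A league-average hitter in a neutral park (park_factor = 1.0) lands on 100
# whenever league wRC/PA equals league R/PA.
avg_hitter = wrc_plus(0.320; lg_woba = 0.320, woba_scale = 1.2, lg_r_pa = 0.12,
                      park_factor = 1.0, lg_wrc_pa = 0.12)
println(avg_hitter)
```

A hitter whose wOBA is above the league wOBA scores above 100 under the same constants, which matches the reading of wRC+ as "percent of league average".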

In order to find the optimal lineup, I will first find the best player at each position. To do this, I will assume that wRC+ is the best measurement of a hitter's value, using a mathematical model to pick the players with the highest wRC+ at each position (C, 1B, 2B, 3B, SS, LF, CF, RF, DH). After this, I will apply the sabermetric guidelines for creating a lineup to order the best positional players in an optimal way. I will also assume that defensive skill is not valued and only offensive output is important. This avoids complicated situations such as a defensive superstar with average hitting statistics playing over an above-average hitter. Additionally, I will assume that pitchers will not hit, thus allowing a designated hitter, DH, to take their place in the lineup.

Lastly, I will assume that positions outside the standard eight positions will fit into one of those positions. Utility players (can play many positions) will be included in the sets for second basemen, shortstops, left fielders, center fielders, and right fielders. Middle infielders (play SS or 2B) will be included in the sets for second basemen and shortstops. Corner infielders (play 1B or 3B) will be included in the sets for first basemen and third basemen. Outfielders, or OF, (play LF, CF, RF) will be included in the sets for left fielders, center fielders, and right fielders. 
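The eligibility assumptions above can be written down as a lookup table. This is only a sketch; the names `eligible` and `codes_for` are hypothetical helpers that do not appear in the notebook:

```julia
# Listed position code => lineup slots the player is allowed to fill,
# following the assumptions stated above.
eligible = Dict(
    "C"  => ["C"],
    "1B" => ["1B"],
    "2B" => ["2B"],
    "3B" => ["3B"],
    "SS" => ["SS"],
    "LF" => ["LF"],
    "CF" => ["CF"],
    "RF" => ["RF"],
    "DH" => ["DH"],
    "UT" => ["2B", "SS", "LF", "CF", "RF"],  # utility players
    "MI" => ["2B", "SS"],                    # middle infielders
    "CI" => ["1B", "3B"],                    # corner infielders
    "OF" => ["LF", "CF", "RF"],              # generic outfielders
)

# All listed codes whose players may be considered for a given slot.
codes_for(slot) = sort([code for (code, slots) in eligible if slot in slots])

println(codes_for("SS"))
println(codes_for("LF"))
```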

### Mathematical Model

First, there are several sets to be used in the model.

**P** is the set of all MLB players <br>

**C** is the set of all catchers <br>

**F** is the set of all first basemen <br>

**S** is the set of all second basemen <br>

**T** is the set of all third basemen <br>

**X** is the set of all short stops <br>

**L** is the set of all left fielders <br>

**M** is the set of all center fielders <br>

**R** is the set of all right fielders <br>

**D** is the set of all designated hitters <br>

---------------------------------------------------------

The model uses one parameter and one decision variable.

$$w_i$$ is the wRC+ for player i

$$\beta_i$$ is the decision variable,

$$\beta_i$$ is defined as 1 if player i is selected and 0 otherwise. Let $$\beta = (\beta_1, ... , \beta_n)$$, where n is the number of players in **P**.

The optimal selection of eight positional players and one designated hitter is found by choosing $$\beta$$ such that the total wRC+ of the team is maximized:

$$\max \sum\limits_{i \in P} w_i \beta_i$$

Subject to the constraints:

Catcher requirement (exactly one catcher in the lineup):
$$\sum\limits_{i \in C} \beta_i = 1$$
First baseman requirement (exactly one first baseman in the lineup):
$$\sum\limits_{i \in F} \beta_i = 1$$
Second baseman requirement (exactly one second baseman in the lineup):
$$\sum\limits_{i \in S} \beta_i = 1$$
Third baseman requirement (exactly one third baseman in the lineup):
$$\sum\limits_{i \in T} \beta_i = 1$$
Short stop requirement (exactly one short stop in the lineup):
$$\sum\limits_{i \in X} \beta_i = 1$$
Left field requirement (exactly one left fielder in the lineup):
$$\sum\limits_{i \in L} \beta_i = 1$$
Center field requirement (exactly one center fielder in the lineup):
$$\sum\limits_{i \in M} \beta_i = 1$$
Right field requirement (exactly one right fielder in the lineup):
$$\sum\limits_{i \in R} \beta_i = 1$$
Designated hitter requirement (exactly one designated hitter in the lineup):
$$\sum\limits_{i \in D} \beta_i = 1$$
Lineup limit (the lineup consists of nine players):
$$\sum\limits_{i \in P} \beta_i = 9$$
Binary choice (the decision variable is either 1 or 0 for each player i):
$$\beta_i \in \{0,1\}$$

The constraints are important to ensure that only one player from each position is selected. Without these constraints, the team could be filled with players all playing out of position.

This problem is a pure integer programming problem, because the decision variable represents the selection of a player and you cannot select a fraction of a player. All of the constraints and the objective function are linear, so this is an integer linear program. It is not a mixed integer programming (MIP) problem, because in a MIP only some of the variables are constrained to be integers.
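To make the model concrete, here is a brute-force sketch on a toy roster with three positions instead of nine. The roster, its wRC+ values, and the function names are made up for illustration. Enumerating every 0/1 vector is only practical for a handful of players, which is why the notebook hands the real problem to a solver instead.

```julia
# Toy roster: (name, position, wRC+). All values are made up for illustration.
roster = [
    ("A", "C", 95.0),   ("B", "C", 110.0),
    ("D", "1B", 130.0), ("E", "1B", 120.0),
    ("G", "DH", 105.0), ("H", "DH", 140.0),
]
positions = ["C", "1B", "DH"]

# A selection is feasible when every position set contains exactly one chosen player
# and the lineup size equals the number of positions.
function feasible(beta)
    one_per_position = all(
        count(i -> beta[i] == 1 && roster[i][2] == pos, eachindex(roster)) == 1
        for pos in positions)
    return one_per_position && sum(beta) == length(positions)
end

# Enumerate every 0/1 vector beta and keep the feasible one with the largest total wRC+.
function best_selection()
    n = length(roster)
    best_beta, best_value = Int[], -Inf
    for mask in 0:(2^n - 1)
        beta = [(mask >> (i - 1)) & 1 for i in 1:n]
        feasible(beta) || continue
        value = sum(roster[i][3] * beta[i] for i in 1:n)
        if value > best_value
            best_beta, best_value = beta, value
        end
    end
    return best_beta, best_value
end

beta, total = best_selection()
chosen = [roster[i][1] for i in eachindex(roster) if beta[i] == 1]
println(chosen, " total wRC+ = ", total)
```

Because each toy player belongs to exactly one position set, the optimum is simply the highest wRC+ at each position.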

In [224]:
using DataFrames, CSV

# Recent CSV.jl versions take the output type as the second argument to CSV.read.
MLB = CSV.read("mlb.csv", DataFrame; header=1, delim=',') # MLB data set
DET = CSV.read("detroit_tigers.csv", DataFrame; header=1, delim=',') # Detroit Tigers data set

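| | Rk | Pos | Name | Age | G | PA | AB | R | H | X2B |
|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | String | String | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 1 | C | Grayson Greiner | 26 | 58 | 224 | 208 | 18 | 42 | 5 |
| 2 | 2 | 1B | Brandon Dixon | 27 | 117 | 420 | 391 | 41 | 97 | 20 |
| 3 | 3 | 2B | Gordon Beckham | 32 | 83 | 240 | 223 | 29 | 48 | 13 |
| 4 | 4 | SS | Jordy Mercer | 32 | 74 | 271 | 256 | 24 | 69 | 16 |
| 5 | 5 | 3B | Dawel Lugo | 24 | 77 | 288 | 273 | 28 | 67 | 11 |
| 6 | 6 | LF | Christin Stewart* | 25 | 104 | 416 | 369 | 32 | 86 | 25 |
| 7 | 7 | CF | JaCoby Jones | 27 | 88 | 333 | 298 | 39 | 70 | 19 |
| 8 | 8 | RF | Nicholas Castellanos | 27 | 100 | 439 | 403 | 57 | 110 | 37 |
| 9 | 9 | DH | Miguel Cabrera | 36 | 136 | 549 | 493 | 41 | 139 | 21 |
| 10 | 10 | UT | Niko Goodrum# | 27 | 112 | 472 | 423 | 61 | 105 | 27 |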


### MLB All-Star Lineup Optimization

I am going to find the optimal player at each position from the entire MLB player list, meaning these players are the best of the best for 2019.

In [225]:
# position encoding for MLB
# (MLB.Pos replaces the deprecated MLB[:Pos] column lookup)
first_base = MLB[MLB.Pos .== "1B", :]
second_base = MLB[MLB.Pos .== "2B", :]
third_base = MLB[MLB.Pos .== "3B", :]
short_stop = MLB[MLB.Pos .== "SS", :]
left_field = MLB[MLB.Pos .== "LF", :]
center_field = MLB[MLB.Pos .== "CF", :]
right_field = MLB[MLB.Pos .== "RF", :]
catcher = MLB[MLB.Pos .== "C", :]
outfield = MLB[MLB.Pos .== "OF", :]
utility = MLB[MLB.Pos .== "UT", :]
corner_infield = MLB[MLB.Pos .== "CI", :]
designated_hitter = MLB[MLB.Pos .== "DH", :]
middle_infield = MLB[MLB.Pos .== "MI", :]

MLB_first_base = vcat(first_base, corner_infield) # all players that can play 1B
MLB_second_base = vcat(second_base, utility, middle_infield) # all players that can play 2B
MLB_third_base = vcat(third_base, corner_infield) # all players that can play 3B
MLB_short_stop = vcat(short_stop, utility, middle_infield) # all players that can play SS
MLB_left_field = vcat(left_field, outfield, utility) # all players that can play LF
MLB_center_field = vcat(center_field, outfield, utility) # all players that can play CF
MLB_right_field = vcat(right_field, outfield, utility) # all players that can play RF
MLB_catcher = catcher # all players that can play C
MLB_designated_hitter = designated_hitter # all players that can play DH



| | Rk | Pos | Name | Age | G | PA | AB | R | H | X2B |
|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | String | String | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 34 | DH | Renato Nunez | 25 | 151 | 599 | 541 | 72 | 132 | 24 |
| 2 | 45 | DH | J.D. Martinez | 31 | 146 | 657 | 575 | 98 | 175 | 33 |
| 3 | 66 | DH | Yonder Alonso* | 32 | 67 | 251 | 219 | 23 | 39 | 6 |
| 4 | 113 | DH | Miguel Cabrera | 36 | 136 | 549 | 493 | 41 | 139 | 21 |
| 5 | 128 | DH | Yordan Alvarez | 22 | 87 | 369 | 313 | 58 | 98 | 26 |
| 6 | 139 | DH | Jorge Soler | 27 | 162 | 679 | 589 | 95 | 156 | 33 |
| 7 | 150 | DH | Shohei Ohtani* | 24 | 106 | 425 | 384 | 51 | 110 | 20 |
| 8 | 198 | DH | Nelson Cruz | 38 | 120 | 521 | 454 | 81 | 141 | 26 |
| 9 | 236 | DH | Khris Davis | 31 | 133 | 533 | 481 | 61 | 106 | 11 |
| 10 | 280 | DH | Daniel Vogelbach* | 26 | 144 | 558 | 462 | 73 | 96 | 17 |


In [226]:
# finding the optimal player at each position for MLB
using JuMP, Gurobi

m = Model(Gurobi.Optimizer)

@variable(m, x[1:size(MLB,1)], Bin) # decision variable

@constraint(m, sum(x[i] for i in MLB_first_base[:,1]) == 1) # first basemen constraint
@constraint(m, sum(x[i] for i in MLB_second_base[:,1]) == 1) # second basemen constraint
@constraint(m, sum(x[i] for i in MLB_third_base[:,1]) == 1) # third basemen constraint
@constraint(m, sum(x[i] for i in MLB_short_stop[:,1]) == 1) # short stop constraint
@constraint(m, sum(x[i] for i in MLB_left_field[:,1]) == 1) # left field constraint
@constraint(m, sum(x[i] for i in MLB_center_field[:,1]) == 1) # center field constraint
@constraint(m, sum(x[i] for i in MLB_right_field[:,1]) == 1) # right field constraint
@constraint(m, sum(x[i] for i in MLB_catcher[:,1]) == 1) # catcher constraint
@constraint(m, sum(x[i] for i in MLB_designated_hitter[:,1]) == 1) # designated hitter constraint
@constraint(m, sum(x) == 9) #  maximum number of players in lineup

@objective(m, Max, sum(x[i] * MLB[i,30] for i = 1:size(MLB,1))) # maximize total wRC+ for team

optimize!(m)

# print row numbers for the best players in MLB data set
for i = 1:size(MLB,1)
    if value(x[i]) > 0.5 # solver values are floats, so compare against 0.5 instead of == 1
        println(i)
    end
end

Academic license - for non-commercial use only
Academic license - for non-commercial use only
Gurobi Optimizer version 9.0.2 build v9.0.2rc0 (mac64)
Optimize a model with 10 rows, 356 columns and 829 nonzeros
Model fingerprint: 0x31d8d818
Variable types: 0 continuous, 356 integer (356 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [5e+00, 2e+02]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 9e+00]
Found heuristic solution: objective 973.6938561
Presolve removed 7 rows and 346 columns
Presolve time: 0.00s
Presolved: 3 rows, 10 columns, 14 nonzeros
Found heuristic solution: objective 1617.8146138
Variable types: 0 continuous, 10 integer (10 binary)

Root relaxation: cutoff, 1 iterations, 0.00 seconds

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0     cutoff    0      1617.81461 1617.81461  0.00%     -    0s

Explored 0 

In [227]:
opt = [7,16,128,185,249,253,261,294,350] # optimal rows

# print player position, player name, player wRC+
for i in 1:9
    println(MLB[opt[i],2])
    println(MLB[opt[i],3])
    println(MLB[opt[i],30])
    println("---")
end

CF
Ketel Marte#
183.231965606865
---
2B
Ozzie Albies#
158.30424275376
---
DH
Yordan Alvarez
173.767660804244
---
RF
Christian Yelich*
209.640639783374
---
1B
Josh Bell#
193.928616208406
---
LF
Bryan Reynolds#
175.757912511145
---
SS
Fernando Tatis Jr.
190.423443656451
---
C
Stephen Vogt*
151.493825636438
---
3B
Anthony Rendon
181.266306851699
---
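The row indices in `opt` are typed in by hand from the solver printout. A small sketch of collecting them programmatically is shown below. The vector `solution` is a hypothetical stand-in; with JuMP it would come from `value.(x)` after `optimize!`.

```julia
# Stand-in for the solver output: one entry per player, close to 0 or 1.
# These numbers are hypothetical; with JuMP they would come from value.(x).
solution = [0.0, 1.0, 0.0, 0.9999999, 1.0e-9, 1.0]

# Compare against 0.5 rather than testing == 1, since solvers return floating-point values.
selected_rows = findall(v -> v > 0.5, solution)
println(selected_rows)
```

Using the collected indices directly avoids transcription mistakes when the data set or the constraints change.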


According to our calculations, the optimal positional players for the MLB are as follows:

**Catcher:** *Stephen Vogt* (wRC+: 151.49)<br>

**First base:** *Josh Bell* (wRC+: 193.93) <br>

**Second base:** *Ozzie Albies* (wRC+: 158.30)<br>

**Third base:** *Anthony Rendon* (wRC+: 181.27)<br>

**Short stop:** *Fernando Tatis Jr.* (wRC+: 190.42) <br>

**Left field:** *Bryan Reynolds* (wRC+: 175.76)<br>

**Center Field:** *Ketel Marte* (wRC+: 183.23)<br>

**Right Field:** *Christian Yelich* (wRC+: 209.64)<br>

**Designated Hitter:** *Yordan Alvarez* (wRC+: 173.77) <br>

These are the best players at each position by wRC+ value. Having found the best players in the MLB at each position, we can now find the optimal lineup using these players.

In order to create the optimal lineup with the optimal players, we are going to use a sabermetric approach to creating a lineup (as described above).

#### 2nd in lineup
First, we look at the second spot in the lineup, the place where the player with the highest wRC+ should go. Looking at our wRC+ values for the optimal players, it is clear that Christian Yelich leads the nine hitters with his 209.64 wRC+.

**1:** <br>

**2:** Christian Yelich <br>

**3:** <br>

**4:** <br>

**5:**  <br>

**6:** <br>

**7:** <br>

**8:** <br>

**9:**  <br>

#### 4th in lineup ("Cleanup")

This player must be the most powerful player on the team, sporting a high SLG%, HR, and ISO. ISO is an isolated power sabermetric statistic that is equal to (SLG% - AVG). To determine this player, we will find the player with the highest value of $$(SLG + HR/AB + ISO)$$. I created this equation so that HR is on a scale comparable to SLG% and ISO, by looking at the number of HR per at bat rather than the raw HR total. The three statistics are then summed with equal coefficients.
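The cleanup score can be sketched as a one-line Julia function. The name `power_score` is hypothetical. The stat line in the example (SLG .671, 44 HR, 489 AB, AVG .329) is an assumption that reproduces the value reported for Christian Yelich in this notebook, about 1.103.

```julia
# Cleanup score from the text: SLG + HR/AB + ISO, where ISO = SLG - AVG.
power_score(slg, hr, ab, avg) = slg + hr / ab + (slg - avg)

# Assumed stat line: SLG .671, 44 HR, 489 AB, AVG .329.
yelich_score = power_score(0.671, 44, 489, 0.329)
println(yelich_score)
```

Since ISO already contains SLG, the score is algebraically the same as 2·SLG − AVG + HR/AB, so slugging carries most of the weight in practice.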

In [228]:
for i in 1:9
    println(MLB[opt[i],3])
    println(MLB[opt[i],20] + (MLB[opt[i],12] / MLB[opt[i],7]) + (MLB[opt[i],20] - MLB[opt[i],18]))
    println("---")
end    

Ketel Marte#
0.9112390158172232
---
Ozzie Albies#
0.7424999999999999
---
Yordan Alvarez
1.083261980830671
---
Christian Yelich*
1.1029795501022495
---
Josh Bell#
0.9312087286527513
---
Bryan Reynolds#
0.7245865580448065
---
Fernando Tatis Jr.
0.9288682634730538
---
Stephen Vogt*
0.7562156862745097
---
Anthony Rendon
0.9393853211009173
---


Once again, Christian Yelich leads the pack, but he is already in the lineup so we will look to the second highest value. This comes from Yordan Alvarez with his 1.0833.

**1:** <br>

**2:** Christian Yelich <br>

**3:** <br>

**4:** Yordan Alvarez<br>

**5:**  <br>

**6:** <br>

**7:** <br>

**8:** <br>

**9:**  <br>

#### 5th in lineup

This player is the third best hitter on the team. They boast both a high wRC+ and SLG%. To find this player, we will use the equation $$(wRC+) + (SLG * 229.89)$$ to give roughly equal weight to the two factors. The weight comes from the fact that the average wRC+ is 100, so scaling the 2019 average SLG% of .435 up to 100 gives 100 / .435 ≈ 229.89.
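As a quick check of the scaling argument, the sketch below derives the weight from the league-average SLG of .435 stated above and applies it. The names `slg_scale` and `fifth_spot_score` are hypothetical:

```julia
# Scale factor that maps the league-average SLG (.435) onto the wRC+ average of 100.
slg_scale = round(100 / 0.435, digits = 2)

# 5th-spot score: wRC+ plus SLG expressed on the wRC+ scale.
fifth_spot_score(wrc_plus, slg; scale = slg_scale) = wrc_plus + slg * scale

println(slg_scale)
println(fifth_spot_score(100.0, 0.435))
```

A hitter who is exactly league average in both statistics scores roughly 200, with each statistic contributing about half.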

In [229]:
# print player name and value for equation
for i in 1:9
    println(MLB[opt[i],3])
    println(MLB[opt[i],20] * 229.89 + MLB[opt[i],30])
    println("---")
end    

Ketel Marte#
319.326845606865
---
Ozzie Albies#
273.24924275376
---
Yordan Alvarez
324.34561080424396
---
Christian Yelich*
363.896829783374
---
Josh Bell#
324.736026208406
---
Bryan Reynolds#
291.392582511145
---
Fernando Tatis Jr.
326.058543656451
---
Stephen Vogt*
264.13992563643797
---
Anthony Rendon
318.740526851699
---


It is pretty obvious that Christian Yelich had himself a tremendous year, leading once again. Fernando Tatis Jr. comes in second, though, with his 326.06.

**1:** <br>

**2:** Christian Yelich <br>

**3:** <br>

**4:** Yordan Alvarez<br>

**5:** Fernando Tatis Jr. <br>

**6:** <br>

**7:** <br>

**8:** <br>

**9:**  <br>

#### 3rd in lineup
This is the next best hitter on the team. They must have a high wRC+, meaning they must be a well-rounded hitter. To determine the player in this spot, we will find the highest wRC+ among the players not yet in the lineup.

In [230]:
for i in 1:9
    println(MLB[opt[i],3])
    println(MLB[opt[i],30])
    println("---")
end

Ketel Marte#
183.231965606865
---
Ozzie Albies#
158.30424275376
---
Yordan Alvarez
173.767660804244
---
Christian Yelich*
209.640639783374
---
Josh Bell#
193.928616208406
---
Bryan Reynolds#
175.757912511145
---
Fernando Tatis Jr.
190.423443656451
---
Stephen Vogt*
151.493825636438
---
Anthony Rendon
181.266306851699
---


The highest wRC+ of the remaining players goes to Josh Bell, sporting a 193.93 wRC+ in 2019.

**1:** <br>

**2:** Christian Yelich <br>

**3:** Josh Bell <br>

**4:** Yordan Alvarez<br>

**5:** Fernando Tatis Jr. <br>

**6:** <br>

**7:** <br>

**8:** <br>

**9:**  <br>

#### 1st in lineup ("Leadoff")

This player possesses the highest OBP on the team. The leadoff hitter must also have some speed. To account for the speed factor, we will use the equation $$OBP + SB / ((SB + CS) * 10)$$ to give some weight to base stealing ability while focusing on OBP.
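A sketch of this score as a Julia function follows. The name `speed_adjusted_obp`, the `divisor` keyword, and the sample inputs are assumptions for illustration. The guard for players with no steal attempts is also an assumption, since the raw formula divides by zero in that case.

```julia
# OBP plus a scaled stolen-base success rate, as in the formula above.
# `divisor` controls how much weight speed receives (10 for the leadoff spot).
function speed_adjusted_obp(obp, sb, cs; divisor = 10)
    attempts = sb + cs
    # Assumption: a player with no steal attempts gets no speed bonus,
    # instead of the NaN that 0/0 would produce.
    attempts == 0 && return obp
    return obp + sb / (attempts * divisor)
end

# Hypothetical inputs for illustration.
println(speed_adjusted_obp(0.400, 10, 2))
println(speed_adjusted_obp(0.350, 0, 0))
```

A smaller `divisor` gives stolen-base success more influence relative to OBP.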

In [231]:
for i in 1:9
    println(MLB[opt[i],3])
    println(MLB[opt[i],19] + MLB[opt[i],14] / ((MLB[opt[i],14] + MLB[opt[i],15]) * 10))
    println("---")
end

Ketel Marte#
0.4723333333333333
---
Ozzie Albies#
0.43094736842105263
---
Yordan Alvarez
NaN
---
Christian Yelich*
0.52275
---
Josh Bell#
0.367
---
Bryan Reynolds#
0.437
---
Fernando Tatis Jr.
0.45172727272727276
---
Stephen Vogt*
0.389
---
Anthony Rendon
0.4953333333333333
---


It is clear that Anthony Rendon leads the remaining players with 0.495.

**1:** Anthony Rendon <br>

**2:** Christian Yelich <br>

**3:** Josh Bell <br>

**4:** Yordan Alvarez<br>

**5:** Fernando Tatis Jr. <br>

**6:** <br>

**7:** <br>

**8:** <br>

**9:**  <br>

#### 6th in lineup

This batting order position has a similar emphasis on OBP and SB, except that SB matters more here than at leadoff. To account for this, we will tweak the equation to give more weight to SB% (SB / (SB + CS)), leading to: $$OBP + SB/((SB + CS) * 5)$$

In [232]:
for i in 1:9
    println(MLB[opt[i],3])
    println(MLB[opt[i],19] + MLB[opt[i],14] / ((MLB[opt[i],14] + MLB[opt[i],15]) * 5))
    println("---")
end

Ketel Marte#
0.5556666666666666
---
Ozzie Albies#
0.5098947368421052
---
Yordan Alvarez
NaN
---
Christian Yelich*
0.6165
---
Josh Bell#
0.367
---
Bryan Reynolds#
0.497
---
Fernando Tatis Jr.
0.5244545454545455
---
Stephen Vogt*
0.46399999999999997
---
Anthony Rendon
0.5786666666666667
---


Note that some players have a value of NaN because they had no stolen-base attempts in 2019 (no SB and no CS), making their SB% (0/0). Of the remaining players, Ketel Marte has the highest value with 0.556.
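Picking the best remaining player while skipping NaN values can be sketched as follows. The scores are the 6th-spot values printed above, rounded to four decimals. The variable names are hypothetical.

```julia
# 6th-spot scores as printed above (rounded); NaN marks a player with no steal attempts.
scores = [
    ("Ketel Marte", 0.5557), ("Ozzie Albies", 0.5099), ("Yordan Alvarez", NaN),
    ("Christian Yelich", 0.6165), ("Josh Bell", 0.367), ("Bryan Reynolds", 0.497),
    ("Fernando Tatis Jr.", 0.5245), ("Stephen Vogt", 0.464), ("Anthony Rendon", 0.5787),
]

# Players already assigned to spots 1 through 5.
placed = Set(["Anthony Rendon", "Christian Yelich", "Josh Bell",
              "Yordan Alvarez", "Fernando Tatis Jr."])

# Drop players already in the lineup and any NaN scores, then take the maximum.
candidates = filter(s -> !(s[1] in placed) && !isnan(s[2]), scores)
best = candidates[argmax([s[2] for s in candidates])]
println(best)
```

Filtering NaN explicitly matters because NaN propagates through numeric reductions, so a single NaN entry can otherwise end up reported as the "maximum".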

**1:** Anthony Rendon <br>

**2:** Christian Yelich <br>

**3:** Josh Bell <br>

**4:** Yordan Alvarez<br>

**5:** Fernando Tatis Jr. <br>

**6:** Ketel Marte <br>

**7:** <br>

**8:** <br>

**9:**  <br>

#### Remaining places in the lineup

For the remainder of the lineup, sabermetrics has not found any specific statistics to be important for these spots. These players should just be listed in descending order of "batter skill". To do this, we will list the remaining players in descending order of wRC+, the best statistic to determine a hitter's skill level.
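Ordering the last three spots is a plain descending sort. The sketch below uses the wRC+ values reported earlier for the three players still unplaced, rounded to two decimals:

```julia
# wRC+ of the three players still unplaced, as reported earlier (rounded).
remaining = [("Ozzie Albies", 158.30), ("Bryan Reynolds", 175.76), ("Stephen Vogt", 151.49)]

# Sort by wRC+ from highest to lowest and assign spots 7, 8, 9 in that order.
ordered = sort(remaining; by = last, rev = true)
for (spot, (name, value)) in zip(7:9, ordered)
    println(spot, ": ", name, " (", value, ")")
end
```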

In [233]:
for i in 1:9
    println(MLB[opt[i],3])
    println(MLB[opt[i],30])
    println("---")
end    

Ketel Marte#
183.231965606865
---
Ozzie Albies#
158.30424275376
---
Yordan Alvarez
173.767660804244
---
Christian Yelich*
209.640639783374
---
Josh Bell#
193.928616208406
---
Bryan Reynolds#
175.757912511145
---
Fernando Tatis Jr.
190.423443656451
---
Stephen Vogt*
151.493825636438
---
Anthony Rendon
181.266306851699
---


It can be seen that Bryan Reynolds has the highest wRC+ of the remaining players with 175.76, followed by Ozzie Albies with 158.30, and finally Stephen Vogt with a wRC+ of 151.49.

**1:** Anthony Rendon <br>

**2:** Christian Yelich <br>

**3:** Josh Bell <br>

**4:** Yordan Alvarez<br>

**5:** Fernando Tatis Jr. <br>

**6:** Ketel Marte <br>

**7:** Bryan Reynolds <br>

**8:** Ozzie Albies <br>

**9:** Stephen Vogt <br>

This is the optimal lineup for the 2019 MLB season, found by using a mathematical model to find the best player at each position, then using sabermetrics to construct an optimal lineup with these players.

### Detroit Tigers Lineup Optimization
Now let's dive into my favorite MLB team, the Detroit Tigers, to create an optimal lineup that would hopefully have prevented their 114-loss season.

In [234]:
# position encoding for Detroit Tigers
# (DET.Pos replaces the deprecated DET[:Pos] column lookup)
first_base = DET[DET.Pos .== "1B", :]
second_base = DET[DET.Pos .== "2B", :]
third_base = DET[DET.Pos .== "3B", :]
short_stop = DET[DET.Pos .== "SS", :]
left_field = DET[DET.Pos .== "LF", :]
center_field = DET[DET.Pos .== "CF", :]
right_field = DET[DET.Pos .== "RF", :]
catcher = DET[DET.Pos .== "C", :]
outfield = DET[DET.Pos .== "OF", :]
utility = DET[DET.Pos .== "UT", :]
corner_infield = DET[DET.Pos .== "CI", :]
designated_hitter = DET[DET.Pos .== "DH", :]
middle_infield = DET[DET.Pos .== "MI", :]

DET_first_base = vcat(first_base, corner_infield) # all players that can play 1B
DET_second_base = vcat(second_base, utility, middle_infield) # all players that can play 2B
DET_third_base = vcat(third_base, corner_infield) # all players that can play 3B
DET_short_stop = vcat(short_stop, utility, middle_infield) # all players that can play SS
DET_left_field = vcat(left_field, outfield, utility) # all players that can play LF
DET_center_field = vcat(center_field, outfield, utility) # all players that can play CF
DET_right_field = vcat(right_field, outfield, utility) # all players that can play RF
DET_catcher = catcher # all players that can play C
DET_designated_hitter = designated_hitter # all players that can play DH



| | Rk | Pos | Name | Age | G | PA | AB | R | H | X2B |
|---|---|---|---|---|---|---|---|---|---|---|
| | Int64 | String | String | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 |
| 1 | 9 | DH | Miguel Cabrera | 36 | 136 | 549 | 493 | 41 | 139 | 21 |


In [235]:
# finding the optimal player at each position for the Detroit Tigers
using JuMP, Gurobi

m = Model(Gurobi.Optimizer)

@variable(m, x[1:size(DET,1)], Bin)

@constraint(m, sum(x[i] for i in DET_first_base[:,1]) == 1) # first basemen constraint
@constraint(m, sum(x[i] for i in DET_second_base[:,1]) == 1) # second basemen constraint
@constraint(m, sum(x[i] for i in DET_third_base[:,1]) == 1) # third basemen constraint
@constraint(m, sum(x[i] for i in DET_short_stop[:,1]) == 1) # short stop constraint
@constraint(m, sum(x[i] for i in DET_left_field[:,1]) == 1) # left field constraint
@constraint(m, sum(x[i] for i in DET_center_field[:,1]) == 1) # center field constraint
@constraint(m, sum(x[i] for i in DET_right_field[:,1]) == 1) # right field constraint
@constraint(m, sum(x[i] for i in DET_catcher[:,1]) == 1) # catcher constraint
@constraint(m, sum(x[i] for i in DET_designated_hitter[:,1]) == 1) # designated hitter constraint
@constraint(m, sum(x) == 9) #  maximum number of players in lineup

@objective(m, Max, sum(x[i] * DET[i,30] for i = 1:size(DET,1))) # Maximize total wRC+ for team

optimize!(m)

# print row numbers for best players on Tigers
for i = 1:size(DET,1)
    if value(x[i]) > 0.5 # solver values are floats, so compare against 0.5 instead of == 1
        println(i)
    end
end

Academic license - for non-commercial use only
Academic license - for non-commercial use only
Gurobi Optimizer version 9.0.2 build v9.0.2rc0 (mac64)
Optimize a model with 10 rows, 15 columns and 46 nonzeros
Model fingerprint: 0x1e3ea317
Variable types: 0 continuous, 15 integer (15 binary)
Coefficient statistics:
  Matrix range     [1e+00, 1e+00]
  Objective range  [4e+01, 1e+02]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 9e+00]
Found heuristic solution: objective 718.0169853
Presolve removed 10 rows and 15 columns
Presolve time: 0.00s
Presolve: All rows and columns removed

Explored 0 nodes (0 simplex iterations) in 0.00 seconds
Thread count was 1 (of 16 available processors)

Solution count 1: 718.017 

Optimal solution found (tolerance 1.00e-04)
Best objective 7.180169853224e+02, best bound 7.180169853224e+02, gap 0.0000%
1
2
3
4
5
6
7
8
9


In [236]:
opt = [1,2,3,4,5,6,7,8,9] # optimal rows

# print player position, player name, player wRC+
for i = 1:9
    println(DET[opt[i],2])
    println(DET[opt[i],3])
    println(DET[opt[i],30])
    println("---")
end

C
Grayson Greiner
41.970053765686
---
1B
Brandon Dixon
83.3725356653553
---
2B
Gordon Beckham
64.1955661032347
---
SS
Jordy Mercer
92.3744910893711
---
3B
Dawel Lugo
63.7580277836933
---
LF
Christin Stewart*
80.4747399884352
---
CF
JaCoby Jones
92.1142383770609
---
RF
Nicholas Castellanos
103.459398665266
---
DH
Miguel Cabrera
96.2979338842693
---


According to our calculations, the optimal positional players for the Detroit Tigers are as follows:

**Catcher:** *Grayson Greiner* (wRC+: 41.97)<br>

**First base:** *Brandon Dixon* (wRC+: 83.37) <br>

**Second base:** *Gordon Beckham* (wRC+: 64.20)<br>

**Third base:** *Dawel Lugo* (wRC+: 63.76)<br>

**Short stop:** *Jordy Mercer* (wRC+: 92.37) <br>

**Left field:** *Christin Stewart* (wRC+: 80.47)<br>

**Center Field:** *JaCoby Jones* (wRC+: 92.11)<br>

**Right Field:** *Nicholas Castellanos* (wRC+: 103.46)<br>

**Designated Hitter:** *Miguel Cabrera* (wRC+: 96.30) <br>

These players show the best player at each position by wRC+ value for the 2019 season. We can use the players we found to create the optimal lineup for the Detroit Tigers.

To create the optimal lineup, we will use the same equations and logic that were used to create the "All-Star" lineup.
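Since the same steps are repeated for each team, the whole ordering procedure can be sketched as one function. Everything here is an illustrative assumption rather than the notebook's own code: the field names, the helper names, the nine made-up players, and the choice to fall back to plain OBP for players with no steal attempts.

```julia
# Each player is a NamedTuple; the field names are assumptions made for this sketch.
# wrc = wRC+; slg, avg, obp are rate stats; hr, ab, sb, cs are counting stats.
power(p) = p.slg + p.hr / p.ab + (p.slg - p.avg)   # cleanup score
fifth(p) = p.wrc + p.slg * 229.89                  # 5th-spot score
function speed(p, divisor)                         # leadoff / 6th-spot score
    attempts = p.sb + p.cs
    # Assumption: no steal attempts means no speed bonus (avoids NaN from 0/0).
    attempts == 0 ? p.obp : p.obp + p.sb / (attempts * divisor)
end

# Fill the spots in the order used above: 2, 4, 5, 3, 1, 6, then 7-9 by wRC+.
function build_lineup(players)
    length(players) == 9 || error("expected exactly nine players")
    remaining = collect(players)
    lineup = Vector{String}(undef, 9)
    rules = [
        (2, p -> p.wrc),
        (4, power),
        (5, fifth),
        (3, p -> p.wrc),
        (1, p -> speed(p, 10)),
        (6, p -> speed(p, 5)),
    ]
    for (spot, score) in rules
        idx = argmax(map(score, remaining))   # best unplaced player for this spot
        lineup[spot] = remaining[idx].name
        deleteat!(remaining, idx)
    end
    sort!(remaining; by = p -> p.wrc, rev = true)
    for (spot, p) in zip(7:9, remaining)
        lineup[spot] = p.name
    end
    return lineup
end

# Nine hypothetical players, not taken from the data set.
players = [
    (name = "P1", wrc = 150.0, slg = 0.550, avg = 0.300, obp = 0.400, hr = 30, ab = 500, sb = 5, cs = 2),
    (name = "P2", wrc = 140.0, slg = 0.600, avg = 0.270, obp = 0.360, hr = 40, ab = 500, sb = 0, cs = 0),
    (name = "P3", wrc = 135.0, slg = 0.520, avg = 0.290, obp = 0.380, hr = 25, ab = 500, sb = 3, cs = 1),
    (name = "P4", wrc = 130.0, slg = 0.480, avg = 0.285, obp = 0.370, hr = 20, ab = 500, sb = 2, cs = 2),
    (name = "P5", wrc = 120.0, slg = 0.430, avg = 0.280, obp = 0.390, hr = 10, ab = 500, sb = 20, cs = 5),
    (name = "P6", wrc = 110.0, slg = 0.420, avg = 0.265, obp = 0.350, hr = 12, ab = 500, sb = 30, cs = 5),
    (name = "P7", wrc = 100.0, slg = 0.400, avg = 0.260, obp = 0.330, hr = 10, ab = 500, sb = 1, cs = 1),
    (name = "P8", wrc = 90.0, slg = 0.380, avg = 0.250, obp = 0.310, hr = 8, ab = 500, sb = 0, cs = 1),
    (name = "P9", wrc = 80.0, slg = 0.350, avg = 0.240, obp = 0.300, hr = 5, ab = 500, sb = 0, cs = 0),
]

lineup = build_lineup(players)
for (spot, name) in enumerate(lineup)
    println(spot, ": ", name)
end
```

Removing each player from `remaining` as soon as they are placed is what enforces the "of the remaining players" rule used at every step.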

#### 2nd in lineup
It is clear that Nicholas Castellanos leads the Tigers with a wRC+ of 103.46.

**1:** <br>

**2:** Nicholas Castellanos <br>

**3:** <br>

**4:** <br>

**5:**  <br>

**6:** <br>

**7:** <br>

**8:** <br>

**9:**  <br>

#### 4th in lineup ("Cleanup")

In [237]:
for i in 1:9
    println(DET[opt[i],3])
    println(DET[opt[i],20] + (DET[opt[i],12] / DET[opt[i],7]) + (DET[opt[i],20] - DET[opt[i],18]))
    println("---")
end    

Grayson Greiner
0.43803846153846154
---
Brandon Dixon
0.6603631713554987
---
Gordon Beckham
0.5559058295964125
---
Jordy Mercer
0.64115625
---
Dawel Lugo
0.538978021978022
---
Christin Stewart*
0.57010027100271
---
JaCoby Jones
0.6619127516778524
---
Nicholas Castellanos
0.6782952853598014
---
Miguel Cabrera
0.5383407707910751
---


JaCoby Jones has the highest value of the remaining players with 0.662.

**1:** <br>

**2:** Nicholas Castellanos <br>

**3:** <br>

**4:** JaCoby Jones <br>

**5:** <br>

**6:** <br>

**7:** <br>

**8:** <br>

**9:**  <br>

#### 5th in lineup

In [238]:
for i in 1:9
    println(DET[opt[i],3])
    # note: as run, the wRC+ term is read from MLB rows 1-9 rather than from DET
    println(DET[opt[i],20] * 229.89 + MLB[opt[i],30])
    println("---")
end

Grayson Greiner
217.944444641379
---
Brandon Dixon
241.881625883615
---
Gordon Beckham
231.04924704813897
---
Jordy Mercer
215.260405869515
---
Dawel Lugo
222.981923222849
---
Christin Stewart*
223.458019409309
---
JaCoby Jones
282.084665606865
---
Nicholas Castellanos
215.679982974711
---
Miguel Cabrera
177.6286314069825
---


Brandon Dixon has the highest value of the remaining players with 241.882.

**1:** <br>

**2:** Nicholas Castellanos <br>

**3:** <br>

**4:** JaCoby Jones <br>

**5:** Brandon Dixon <br>

**6:** <br>

**7:** <br>

**8:** <br>

**9:**  <br>

#### 3rd in lineup

In [239]:
for i in 1:9
    println(DET[opt[i],3])
    println(DET[opt[i],30])
    println("---")
end

Grayson Greiner
41.970053765686
---
Brandon Dixon
83.3725356653553
---
Gordon Beckham
64.1955661032347
---
Jordy Mercer
92.3744910893711
---
Dawel Lugo
63.7580277836933
---
Christin Stewart*
80.4747399884352
---
JaCoby Jones
92.1142383770609
---
Nicholas Castellanos
103.459398665266
---
Miguel Cabrera
96.2979338842693
---


Of the remaining players, Miguel Cabrera has the highest wRC+ with 96.30.

**1:** <br>

**2:** Nicholas Castellanos <br>

**3:** Miguel Cabrera <br>

**4:** JaCoby Jones <br>

**5:** Brandon Dixon <br>

**6:** <br>

**7:** <br>

**8:** <br>

**9:** <br>

#### 1st in lineup ("Leadoff")

In [240]:
for i in 1:9
    println(DET[opt[i],3])
    println(DET[opt[i],19] + DET[opt[i],14] / ((DET[opt[i],14] + DET[opt[i],15]) * 10))
    println("---")
end

Grayson Greiner
NaN
---
Brandon Dixon
0.3733333333333333
---
Gordon Beckham
0.34600000000000003
---
Jordy Mercer
NaN
---
Dawel Lugo
NaN
---
Christin Stewart*
0.305
---
JaCoby Jones
0.3877777777777778
---
Nicholas Castellanos
0.39466666666666667
---
Miguel Cabrera
NaN
---


Gordon Beckham has the highest value of the remaining players with 0.346

**1:** Gordon Beckham <br>

**2:** Nicholas Castellanos <br>

**3:** Miguel Cabrera <br>

**4:** JaCoby Jones <br>

**5:** Brandon Dixon <br>

**6:** <br>

**7:** <br>

**8:** <br>

**9:** <br>

#### 6th in lineup

In [241]:
for i in 1:9
    println(DET[opt[i],3])
    println(DET[opt[i],19] + (DET[opt[i],14] / ((DET[opt[i],14] + DET[opt[i],15]) * 5)))
    println("---")
end

Grayson Greiner
NaN
---
Brandon Dixon
0.45666666666666667
---
Gordon Beckham
0.42100000000000004
---
Jordy Mercer
NaN
---
Dawel Lugo
NaN
---
Christin Stewart*
0.305
---
JaCoby Jones
0.46555555555555556
---
Nicholas Castellanos
0.4613333333333334
---
Miguel Cabrera
NaN
---


None of the remaining players has any stolen bases, and the NaN values come from players with no stolen-base attempts at all (0/0), so we will just look at OBP.

In [242]:
for i in 1:9
    println(DET[opt[i],3])
    println(DET[opt[i],19])
    println("---")
end

Grayson Greiner
0.251
---
Brandon Dixon
0.29
---
Gordon Beckham
0.271
---
Jordy Mercer
0.31
---
Dawel Lugo
0.271
---
Christin Stewart*
0.305
---
JaCoby Jones
0.31
---
Nicholas Castellanos
0.328
---
Miguel Cabrera
0.346
---


Jordy Mercer leads the remaining players with an OBP of 0.310

**1:** Gordon Beckham <br>

**2:** Nicholas Castellanos <br>

**3:** Miguel Cabrera <br>

**4:** JaCoby Jones <br>

**5:** Brandon Dixon <br>

**6:** Jordy Mercer <br>

**7:** <br>

**8:** <br>

**9:** <br>
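The NaN values in the leadoff and 6th-slot cells come from dividing zero stolen bases by zero attempts. One possible way to avoid them, sketched here under the assumption that a player with no attempts should simply contribute nothing beyond OBP, is to guard the division. The names `sb_success_rate` and `speed_score` are hypothetical, and the counts of 3 stolen bases and 1 caught stealing are illustrative values chosen because they give the 0.75 success rate implied by Gordon Beckham's printed scores, not figures taken from the data.

```julia
# Stolen-base success rate. Returns 0.0 when a player has no attempts,
# instead of the NaN that 0 / 0 would produce.
function sb_success_rate(sb, cs)
    attempts = sb + cs
    return attempts == 0 ? 0.0 : sb / attempts
end

# OBP plus a scaled stolen-base success rate. `divisor` is 10 in the
# leadoff formula and 5 in the 6th-slot formula.
speed_score(obp, sb, cs; divisor = 10) = obp + sb_success_rate(sb, cs) / divisor

println(speed_score(0.310, 0, 0))               # no attempts: falls back to OBP
println(speed_score(0.271, 3, 1))               # illustrative counts, leadoff weighting
println(speed_score(0.271, 3, 1; divisor = 5))  # same counts, 6th-slot weighting
```

With this guard, players without attempts are ranked by OBP alone, which matches the fallback used for the 6th slot.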

#### Remaining places in lineup

In [243]:
for i in 1:9
    println(DET[opt[i],3])
    println(DET[opt[i],30])
    println("---")
end

Grayson Greiner
41.970053765686
---
Brandon Dixon
83.3725356653553
---
Gordon Beckham
64.1955661032347
---
Jordy Mercer
92.3744910893711
---
Dawel Lugo
63.7580277836933
---
Christin Stewart*
80.4747399884352
---
JaCoby Jones
92.1142383770609
---
Nicholas Castellanos
103.459398665266
---
Miguel Cabrera
96.2979338842693
---


Christin Stewart leads with a wRC+ of 80.47, followed by Dawel Lugo with a wRC+ of 63.76, and finally Grayson Greiner with 41.97

**1:** Gordon Beckham <br>

**2:** Nicholas Castellanos <br>

**3:** Miguel Cabrera <br>

**4:** JaCoby Jones <br>

**5:** Brandon Dixon <br>

**6:** Jordy Mercer <br>

**7:** Christin Stewart <br>

**8:** Dawel Lugo <br>

**9:** Grayson Greiner <br>

This is the optimal lineup for the 2019 Detroit Tigers.
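Filling the last three slots amounts to sorting the unplaced players by wRC+ in descending order. A minimal sketch, using the printed wRC+ values rounded to two decimals and hypothetical variable names:

```julia
# Players still unplaced after slots 1 through 6, with their printed wRC+.
remaining = [
    ("Grayson Greiner", 41.97),
    ("Christin Stewart", 80.47),
    ("Dawel Lugo", 63.76),
]

# Highest wRC+ first, then assign slots 7, 8 and 9 in that order.
ordered = sort(remaining; by = last, rev = true)

for (slot, (name, wrc)) in zip(7:9, ordered)
    println("$slot: $name ($wrc)")
end
```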

### Limitations

The first obvious limitation of my model is the assumption that wRC+ is the only deciding factor in selecting a player. While this is the best statistic for determining offensive run creation value, it does not account for defense. Being a good hitter is important, but so is being able to defend. By taking a strictly offensive approach, I run the risk of my lineup being unsuccessful in the field and committing more errors, which lead to unearned runs.

Additionally, the equations I used to weigh the various statistics when deciding where to put players in the lineup were assumed to be correct without extensive testing. They only approximate the relative value of each batting statistic, and I would need to run through many years of data to obtain more exact weights. Using proven weights instead might therefore change where players are placed in the lineups.

Lastly, my model is unable to differentiate between AL and NL teams on its own. This means that it will select a designated hitter even if the game is played at an NL stadium (pitchers must hit in the NL). This would affect the lineups by forcing me to replace one of the players with a pitcher.

### Conclusion

In conclusion, I was able to find the best positional players and designated hitter for the entire league and for my favorite team, the Detroit Tigers. This was accomplished with integer programming, using binary variables to pick the best player at each position in order to maximize the total wRC+ of the created team.

With this data, I used sabermetric analysis of the optimal baseball lineup to construct the best lineups for both the entire league and the Tigers. These lineups are unconventional compared to the general model for lineup creation that I was used to during my playing career.

These sabermetrically optimized lineups are starting to be used more often in MLB, as a wave of younger general managers is leaning on statistics to help win championships.

It would be interesting to apply this concept of lineup optimization to other sports, in particular using advanced statistics and an optimization model to determine the optimal game plan for an NBA team. This would mean finding the correct players to start, as well as factoring in load management throughout the game. It would allow NBA teams to find the optimal number of minutes to play each of their players in order to maximize offensive production.