In [1]:
# Importing packages for the analysis

import pandas as pd
import numpy as np
import pybaseball
import warnings

from RE24_function import compute_RE24
from RP24_function import compute_RP24

In [2]:
warnings.filterwarnings('ignore')

## Getting RE24 Tables for 2021-2023

Once we have an RE24 table, we can use it for a variety of baseball questions. Today we'll look at how run expectancy varies from 2021 through 2023. To do so, we need a run expectancy table for each of those seasons, which we can compute with the function we wrote earlier.

In [3]:
RE24_2021 = compute_RE24(startyear = 2021, endyear = 2021)
RE24_2021

This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|████████████████████████████████████████████████████████████████████████████████| 246/246 [08:01<00:00,  1.96s/it]


| Bases | 0 outs | 1 out | 2 outs |
| --- | --- | --- | --- |
| 0 | 0.404421 | 0.211963 | 0.083436 |
| 1 | 1.207668 | 0.833625 | 0.327084 |
| 10 | 0.991827 | 0.578817 | 0.265401 |
| 11 | 1.838926 | 1.25283 | 0.490937 |
| 100 | 0.742529 | 0.428722 | 0.185288 |
| 101 | 1.572052 | 0.979462 | 0.40495 |
| 110 | 1.2755 | 0.748178 | 0.369455 |
| 111 | 2.16701 | 1.404847 | 0.704897 |


In [4]:
RE24_2022 = compute_RE24(startyear = 2022, endyear = 2022)
RE24_2022

This is a large query, it may take a moment to complete
Skipping offseason dates
Skipping offseason dates


100%|████████████████████████████████████████████████████████████████████████████████| 246/246 [08:29<00:00,  2.07s/it]


| Bases | 0 outs | 1 out | 2 outs |
| --- | --- | --- | --- |
| 0 | 0.382866 | 0.20544 | 0.079162 |
| 1 | 1.184669 | 0.851852 | 0.325581 |
| 10 | 0.937226 | 0.574321 | 0.266112 |
| 11 | 1.809302 | 1.266542 | 0.462011 |
| 100 | 0.714511 | 0.420314 | 0.173134 |
| 101 | 1.598145 | 1.029294 | 0.437441 |
| 110 | 1.20829 | 0.765323 | 0.36339 |
| 111 | 2.075515 | 1.336466 | 0.649342 |


In [5]:
RE24_2023 = compute_RE24(startyear = 2023, endyear = 2023)
RE24_2023

| Bases | 0 outs | 1 out | 2 outs |
| --- | --- | --- | --- |
| 0 | 0.513162 | 0.272968 | 0.104097 |
| 1 | 1.464444 | 0.972603 | 0.357815 |
| 10 | 1.131379 | 0.705946 | 0.308621 |
| 11 | 1.916788 | 1.364883 | 0.526152 |
| 100 | 0.900556 | 0.537914 | 0.239489 |
| 101 | 1.825994 | 1.177682 | 0.526352 |
| 110 | 1.458616 | 0.945819 | 0.471745 |
| 111 | 2.251381 | 1.589162 | 0.771806 |


With these tables computed, it helps to put them somewhere we can compare them without scrolling. I will do this separately in Google Sheets or Microsoft Excel, but you could also screenshot the tables and paste them into a document to see them all at once. My Google Sheets link is [here.](https://docs.google.com/spreadsheets/d/1iBCPRJkaXIHGTM6mMwqN7oMvgkPAJditbuzejypxw2A/edit?usp=sharing)
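If you would rather stay in the notebook, one option is to stack the tables side by side with pandas. The sketch below is only an illustration: it uses two rows from the 2022 and 2023 tables shown above, and the string labels `"100"` and `"010"` are an assumption about how the bases index is keyed (the real output of `compute_RE24` may use a different index type).

```python
import pandas as pd

# Illustrative excerpts: runner on first ("100") and runner on second ("010").
# The string labels are an assumption; adapt them to the actual index.
bases = ["100", "010"]
re_2022 = pd.DataFrame(
    {0: [0.714511, 0.937226], 1: [0.420314, 0.574321], 2: [0.173134, 0.266112]},
    index=bases,
)
re_2023 = pd.DataFrame(
    {0: [0.900556, 1.131379], 1: [0.537914, 0.705946], 2: [0.239489, 0.308621]},
    index=bases,
)

# Outer column level = season, inner column level = outs.
combined = pd.concat({2022: re_2022, 2023: re_2023}, axis=1)
combined.columns.names = ["Season", "Outs"]
print(combined.round(3))
```

With the full tables, the same `pd.concat` call would give one wide table with a block of three columns per season.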

## Answering Simple Baseball Questions with RE24

A simple question is whether it is worth sacrificing an out to bunt a runner from first base to second. Teams would only consider this with 0 or 1 out, since with 2 outs the sacrifice itself ends the inning. We calculate the change in run expectancy by subtracting the run expectancy of the original bases-and-outs combination from that of the new combination. These values are easy to locate on the Google Sheet, where each pair is highlighted in a matching color for each year:

* Bunting a runner from first base with 0 out is highlighted in green.

* Bunting a runner from first base with 1 out is highlighted in yellow.

Let's see if there are any significant differences by year and by outs:

| Year & Outs | Run Expectancy with Runner on 1st | Run Expectancy with Runner on 2nd, 1 more out | Run Expectancy Net Change |
| --- | --- | --- | --- |
| **2021, 0 out** | 0.910188 | 0.682045 | -0.228143 |
| **2021, 1 out** | 0.528809 | 0.320924 | -0.207885 |
| **2022, 0 out** | 0.867163 | 0.669258 | -0.197905 |
| **2022, 1 out** | 0.510544 | 0.308992 | -0.201552 |
| **2023, 0 out** | 0.900556 | 0.705946 | -0.194610 |
| **2023, 1 out** | 0.537914 | 0.308621 | -0.229293 |

The sacrifice bunt was not worth it in any of the three seasons. The change in run expectancy is negative in every scenario (roughly 0.19 to 0.23 runs), which means the average number of runs scored in the rest of the inning goes down when a team trades an out for moving the runner from first base to second. According to the run expectancy table, a team is better off taking its chances with the runner on first. The cost is similar whether the bunt comes with 0 outs or with 1 out.
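The net change column can be reproduced with a couple of lookups. This is a minimal sketch using the 2023 values quoted in the table above; the dictionary layout and the function name `bunt_net_change` are illustrative choices, not part of the original functions.

```python
# 2023 run expectancy values quoted in the table above.
re_runner_on_first = {0: 0.900556, 1: 0.537914}   # keyed by outs before the bunt
re_runner_on_second = {1: 0.705946, 2: 0.308621}  # keyed by outs after the bunt


def bunt_net_change(outs_before):
    """Run expectancy after a successful sacrifice minus run expectancy before it."""
    return re_runner_on_second[outs_before + 1] - re_runner_on_first[outs_before]


for outs in (0, 1):
    print(f"2023, {outs} out: {bunt_net_change(outs):+.6f}")
```

A negative result means the team expects to score fewer runs after the bunt than before it.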

With 2 outs there is nothing to look up: a successful sacrifice records the third out, so the run expectancy afterwards is zero and no runner scores.

Let's see if this changes if we look at scenarios with runners on 1st and 2nd in the same years and with 0 or 1 out. Similarly, each pair is highlighted as follows:

* Bunting runners from first and second with 0 out is highlighted in red.

* Bunting runners from first and second with 1 out is highlighted in blue.

| Year & Outs | Run Expectancy with Runner on 1st and 2nd | Run Expectancy with Runner on 2nd and 3rd, 1 more out | Run Expectancy Net Change |
| --- | --- | --- | --- |
| **2021, 0 out** | 1.523455 | 1.390080 | -0.133375 |
| **2021, 1 out** | 0.912859 | 0.582442 | -0.330417 |
| **2022, 0 out** | 1.437622 | 1.397410 | -0.040212 |
| **2022, 1 out** | 0.906098 | 0.547535 | -0.358563 |
| **2023, 0 out** | 1.458616 | 1.364883 | -0.093733 |
| **2023, 1 out** | 0.945819 | 0.526152 | -0.419667 |

The sacrifice bunt is still not worth it in any of the three seasons. The change in run expectancy is negative in every scenario, which means the average number of runs scored in the rest of the inning goes down when a team trades an out for moving runners from first and second to second and third. According to the run expectancy table, a team is better off taking its chances with the runners staying at first and second. Here the cost does depend heavily on the number of outs: with 0 outs, run expectancy drops by roughly 0.04 to 0.13 runs, but with 1 out it drops by roughly 0.33 to 0.42 runs. Bunting runners over from first and second is therefore a much larger mistake with 1 out than with 0 outs.
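To summarize how much the number of outs matters, we can average the net changes from the table above by outs. This is a small sketch; the column names are illustrative and the values are copied from the table.

```python
import pandas as pd

# Net changes copied from the first-and-second table above.
changes = pd.DataFrame({
    "year": [2021, 2021, 2022, 2022, 2023, 2023],
    "outs": [0, 1, 0, 1, 0, 1],
    "net_change": [-0.133375, -0.330417, -0.040212, -0.358563, -0.093733, -0.419667],
})

# Average run expectancy change across the three seasons, by outs before the bunt.
mean_change_by_outs = changes.groupby("outs")["net_change"].mean()
print(mean_change_by_outs.round(3))
```

Averaged over the three seasons, the drop is about 0.09 runs with 0 outs and about 0.37 runs with 1 out.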

## Asking more complex questions with Run Expectancy

### Steal Attempts of Second Base

Run expectancy can also tell us whether it is worth attempting a steal. In every table, run expectancy rises when a runner advances from first base to second with no change in outs (highlighted in purple on the sheet) and falls when the runner is caught (highlighted in orange on the sheet). So the sign of the change is not the interesting question here: a successful steal always helps and a failed one always hurts. What we want is the minimum success rate a runner needs in order to add positive value, in expected runs, by attempting the steal.

I'll refer to some formulas provided by Russell A. Carleton's *The Shift* where he tackles this problem with a different year's dataset:

We let $p$ be the probability that the runner successfully steals second base from first base. The expected value of an attempt is the success rate times the run expectancy with a runner on second base ($p \times E(R)_{010, O}$) plus the failure rate times the run expectancy with the bases empty and one more out ($(1-p) \times E(R)_{000, O+1}$). We compare this with the run expectancy of the runner staying at first base ($E(R)_{100, O}$). Setting the two equal gives the break-even success rate, above which attempting the steal adds expected runs:

$$E(R)_{100, O} = p \times E(R)_{010, O} + (1-p) \times E(R)_{000, O+1} $$
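Because the equation is linear in $p$, it can also be rearranged by hand. Collecting the terms in $p$ (and assuming the denominator is positive, which holds whenever a runner on second is worth more than empty bases with an extra out):

$$E(R)_{100, O} - E(R)_{000, O+1} = p \times \left(E(R)_{010, O} - E(R)_{000, O+1}\right)$$

$$p = \frac{E(R)_{100, O} - E(R)_{000, O+1}}{E(R)_{010, O} - E(R)_{000, O+1}}$$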

Starting with a runner on first base and 0 outs, we can substitute the values from each RE24 table and solve for $p$:

$$\text{2021 RE24 Table: } 0.910188 = p \times 1.112581 + (1-p) \times 0.263721$$

$$\text{2022 RE24 Table: } 0.867163 = p \times 1.065626 + (1-p) \times 0.252459$$

$$\text{2023 RE24 Table: } 0.900556 = p \times 1.131379 + (1-p) \times 0.272968$$

If algebra is enjoyable for you, feel free to solve by hand, but Python has a solver that we can use to simplify the process:

In [6]:
from sympy import Eq, Symbol, solve

# x stands in for p, the break-even success rate in the formula above
x = Symbol('x')

#2021 equation solved
solve(Eq(0.910188, (1.112581*x) + ((1-x)*0.263721)))

[0.761570812619278]

According to our run expectancy matrix from the 2021 season, our runner at first would have to succeed just over 3 times for every 4 attempts in order to bring positive run expectancy from their steal.

Let's also apply the solver to the other seasons:

In [7]:
#2022 equation solved
solve(Eq(0.867163, (1.065626*x) + ((1-x)*0.252459)))

[0.755938202115924]

The bar is just slightly lower in 2022 compared to 2021 when stealing second base with 0 outs.

In [8]:
#2023 equation solved
solve(Eq(0.900556, (1.131379*x) + ((1-x)*0.272968)))

[0.731104331142075]

And the bar lowers yet again for 2023 in the same bases and outs combination.

We can also check whether things change with 1 out:

$$\text{2021 RE24 Table: } 0.528809 = p \times 0.682045 + (1-p) \times 0.101194$$

$$\text{2022 RE24 Table: } 0.510544 = p \times 0.669258 + (1-p) \times 0.096001$$

$$\text{2023 RE24 Table: } 0.537914 = p \times 0.705946 + (1-p) \times 0.104097$$

In [9]:
#2021 equation solved
solve(Eq(0.528809, (0.682045*x) + ((1-x)*0.101194)))

[0.736187077236675]

In [10]:
#2022 equation solved
solve(Eq(0.510544, (0.669258*x) + ((1-x)*0.096001)))

[0.723136394322267]

In [11]:
#2023 equation solved
solve(Eq(0.537914, (0.705946*x) + ((1-x)*0.104097)))

[0.720807046285696]

We can see similar trends with how the minimum success rate changes year to year, but overall with 1 out, a runner can be just slightly less successful and bring positive value to their team compared to with 0 out.
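Since the break-even equation is linear, the solver results above can be cross-checked with plain arithmetic. The sketch below uses the same values as the equations; the function name `break_even_rate` and the dictionary layout are illustrative, not part of the original notebook.

```python
def break_even_rate(re_stay, re_success, re_caught):
    """Success rate at which attempting the steal matches staying put."""
    return (re_stay - re_caught) / (re_success - re_caught)


# (year, outs) -> (runner stays on first, runner reaches second, runner caught)
scenarios = {
    (2021, 0): (0.910188, 1.112581, 0.263721),
    (2022, 0): (0.867163, 1.065626, 0.252459),
    (2023, 0): (0.900556, 1.131379, 0.272968),
    (2021, 1): (0.528809, 0.682045, 0.101194),
    (2022, 1): (0.510544, 0.669258, 0.096001),
    (2023, 1): (0.537914, 0.705946, 0.104097),
}

for (year, outs), values in scenarios.items():
    print(f"{year}, {outs} out: {break_even_rate(*values):.4f}")
```

The printed rates should agree with the `solve` outputs to the displayed precision.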

For steal attempts of second base with 2 outs, we substitute 0 for the run expectancy on failure, because the inning is over if the runner is caught stealing. For each of our three run expectancy tables, we need to find the success rate such that:

$$E(R)_{100, 2} = p \times E(R)_{010, 2}$$

In [12]:
#2021 equation solved
solve(Eq(0.226648, 0.320924*x))

[0.706235744288367]

In [13]:
#2022 equation solved
solve(Eq(0.207913, 0.308992*x))

[0.672875025890638]

In [14]:
#2023 equation solved
solve(Eq(0.239489, 0.308621*x))

[0.775997096762696]

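With 2 outs, the bar for adding positive value through a steal of second base drops in 2021 (to about 70.6%) and in 2022 (to about 67.3%) compared with 0 or 1 out. In those seasons, run expectancy with a runner on first base and 2 outs is already low, so being caught costs relatively little and the runner can add marginal value with a lower success rate. The 2023 table is the exception: the break-even rate rises to about 77.6%, because the gain from reaching second base with 2 outs (0.308621 versus 0.239489) is small relative to what is lost when the runner is caught.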
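The three cases (0, 1, and 2 outs) can be handled by one lookup function. The sketch below uses three rows of the 2023 table shown earlier; the string labels `"000"`, `"100"`, and `"010"` and the function name `steal_break_even` are assumptions made for illustration.

```python
import pandas as pd

# Three rows of the 2023 RE24 table: bases empty, runner on first, runner on second.
# The string index labels are an assumption; columns are outs.
re_2023 = pd.DataFrame(
    {
        0: [0.513162, 0.900556, 1.131379],
        1: [0.272968, 0.537914, 0.705946],
        2: [0.104097, 0.239489, 0.308621],
    },
    index=["000", "100", "010"],
)


def steal_break_even(table, outs):
    """Break-even success rate for stealing second with the given number of outs."""
    stay = table.loc["100", outs]
    success = table.loc["010", outs]
    # Being caught with 2 outs ends the inning, so nothing more is expected to score.
    caught = table.loc["000", outs + 1] if outs < 2 else 0.0
    return (stay - caught) / (success - caught)


rates = {outs: steal_break_even(re_2023, outs) for outs in (0, 1, 2)}
for outs, rate in rates.items():
    print(f"2023, {outs} out: {rate:.4f}")
```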

## Adding a new dimension with Run Probability

There's plenty more we could do with run expectancy, which we can save for another time. But we know that when an "old-school manager" calls for a bunt sign, they're not trying to score as many runs as possible in the inning, which is the assumption we can often (though not always) make during a game. They're probably calling for a sacrifice to advance the runner into scoring position, where they'll have a better chance to score.

Or will they?

Let's see if that's the case by checking the Run Probability table.

### Sacrifice bunts

We make a couple of key changes to the RE24 function to turn it into the Run Probability function. Both are built on the same framework and principles. The difference is that instead of taking the average number of runs scored, we take the share of instances of each bases-outs situation in which at least one run scored.
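To make the difference concrete, here is a toy sketch of the two aggregations. It is not the code inside `compute_RE24` or `compute_RP24`; the data frame, its column names, and its values are all hypothetical, with one row per plate appearance and the runs scored from that point to the end of the inning.

```python
import pandas as pd

# Hypothetical toy data: one row per plate appearance.
events = pd.DataFrame({
    "bases": ["100", "100", "100", "100", "010", "010"],
    "outs": [0, 0, 0, 0, 1, 1],
    "runs_rest_of_inning": [0, 2, 0, 1, 0, 1],
})

# Flag the instances in which at least one run scored.
events["scored"] = events["runs_rest_of_inning"] > 0

summary = events.groupby(["bases", "outs"]).agg(
    run_expectancy=("runs_rest_of_inning", "mean"),  # average runs scored
    run_probability=("scored", "mean"),              # share of instances with 1+ runs
)
print(summary)
```

In the toy data, a runner on first with 0 outs yields a run expectancy of 0.75 but a run probability of 0.5, because the two-run inning raises the average without adding another scoring instance.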

In [15]:
RP24_2021 = compute_RP24(startyear = 2021, endyear = 2021)
RP24_2021

| Bases | 0 outs | 1 out | 2 outs |
| --- | --- | --- | --- |
| 0 | 0.27246 | 0.160051 | 0.070034 |
| 1 | 0.817982 | 0.639822 | 0.258769 |
| 10 | 0.573141 | 0.394918 | 0.212705 |
| 11 | 0.850299 | 0.666888 | 0.258929 |
| 100 | 0.413526 | 0.262575 | 0.124451 |
| 101 | 0.813757 | 0.60707 | 0.25572 |
| 110 | 0.609078 | 0.402913 | 0.221486 |
| 111 | 0.839378 | 0.657413 | 0.332249 |


The run probability table shows the probability of scoring in decimal form: numbers closer to 0 mean teams scored at least one run in fewer of those situations, while numbers closer to 1 mean they scored in more of them. Multiply by 100 to get percentages, which are easier to communicate.

In [16]:
RP24_2022 = compute_RP24(startyear = 2022, endyear = 2022)
RP24_2022

| Bases | 0 outs | 1 out | 2 outs |
| --- | --- | --- | --- |
| 0 | 0.258793 | 0.152229 | 0.066916 |
| 1 | 0.81407 | 0.654047 | 0.267521 |
| 10 | 0.567189 | 0.388803 | 0.206855 |
| 11 | 0.827258 | 0.66283 | 0.246503 |
| 100 | 0.410224 | 0.261176 | 0.118197 |
| 101 | 0.832623 | 0.63856 | 0.269485 |
| 110 | 0.598619 | 0.403459 | 0.218069 |
| 111 | 0.825633 | 0.647952 | 0.322447 |


In [17]:
RP24_2023 = compute_RP24(startyear = 2023, endyear = 2023)
RP24_2023

| Bases | 0 outs | 1 out | 2 outs |
| --- | --- | --- | --- |
| 0 | 0.277172 | 0.164397 | 0.070469 |
| 1 | 0.835886 | 0.628002 | 0.24878 |
| 10 | 0.582916 | 0.411733 | 0.210458 |
| 11 | 0.803468 | 0.659987 | 0.241131 |
| 100 | 0.41758 | 0.273208 | 0.132125 |
| 101 | 0.84 | 0.631206 | 0.283135 |
| 110 | 0.594504 | 0.417143 | 0.235949 |
| 111 | 0.806971 | 0.652847 | 0.327715 |


The most common sacrifice bunt attempt is with a runner on first base and no outs. We can find out if the chances of scoring increase after a bunt (which we assume is successful) and advancing the runner from first base to second by looking at the corresponding cells in the Run Probability table.

As with the Run Expectancy table, the cells highlighted in green are the ones involved in bunting a runner from first base with 0 outs to second base.

| Year | Run Probability with Runner on 1st, 0 out | Run Probability with Runner on 2nd, 1 out | Run Probability Net Change |
| --- | --- | --- | --- |
| **2021, 0 out** | 0.413526 | 0.394918 | -0.018608 |
| **2022, 0 out** | 0.410224 | 0.388803 | -0.021421 |
| **2023, 0 out** | 0.417580 | 0.411733 | -0.005847 |

From these results, we see that a sacrifice bunt lowers not only run expectancy but also the chances of scoring even one run, though these changes are fairly small (about 0.6 to 2.1 percentage points).

One interesting scenario is bunting with runners on both first and second base and 0 outs, which we can see below (and highlighted in yellow on the Google Sheet):

| Year | Run Probability with Runner on 1st and 2nd, 0 out | Run Probability with Runner on 2nd and 3rd, 1 out | Run Probability Net Change |
| --- | --- | --- | --- |
| **2021, 0 out** | 0.609078 | 0.666888 | +0.05781 |
| **2022, 0 out** | 0.598619 | 0.662830 | +0.064211 |
| **2023, 0 out** | 0.594504 | 0.659987 | +0.065483 |

Here the run probability actually increases by about 6 percentage points, from roughly 60% to roughly 66%. In each of these three years, sacrificing an out to advance two runners has put teams in a better position to score at least one run.

With run expectancy, however, we saw that the bunt was not worth it because of the negative run expectancy change. So what should a team do with runners on first and second and 0 outs? Should it trust the run expectancy, which is based on the average number of runs scored in such situations, or the run probability, which is based on the percentage of instances in which teams scored at all?

The answer is that it depends on the score and the point in the game. Run expectancy rests on the general assumption that teams are trying to score as many runs as possible in an inning. When that is the goal, teams would likely choose not to bunt, because they score fewer runs on average when they do. When a team needs just one run to tie or win the game, it might strongly consider the bunt, because the probability of scoring increases if the batter gives up an out and advances the runners to second and third.
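That trade-off can be written as a deliberately crude rule of thumb. The sketch below uses the 2023 changes for runners on first and second with 0 outs from the tables above; the function `prefer_bunt` and its `need_one_run` flag are hypothetical simplifications that ignore every other situational factor.

```python
def prefer_bunt(re_change, rp_change, need_one_run):
    """Pick the metric that matches the team's goal and check its sign.

    need_one_run=True  -> maximize the chance of scoring at least once.
    need_one_run=False -> maximize the average number of runs scored.
    """
    return rp_change > 0 if need_one_run else re_change > 0


# 2023, runners on first and second, 0 outs (values from the tables above).
re_change = -0.093733  # run expectancy net change
rp_change = 0.065483   # run probability net change

print("Playing for one run:", prefer_bunt(re_change, rp_change, need_one_run=True))
print("Playing for a big inning:", prefer_bunt(re_change, rp_change, need_one_run=False))
```

The two calls disagree, which is exactly the conflict between the two tables described above.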

Of course, a team will strongly consider other factors, such as the lefty-righty matchup, the quality of the batting and pitching, and many other situational factors that may be of importance. But looking at these two analytical suggestions for this decision alone, we find a way to incorporate this data into the game.

This is a good example in which teams must make tough decisions and the data conflict with one another. I believe it is generally a good guideline to use the data as a suggestion and adapt the decisions to the situation using experience and baseball situational awareness.