Satchel

An open source model for projecting MLB season outcomes

Description

Satchel is a simple MLB projection model meant to provide a basic forecast with relatively little effort. Much like the Marcel The Monkey Forecasting System, it is not the most advanced projection method and its projections shouldn't be treated as the word of God. I like to describe it as a minimum-effort projection. Satchel essentially boils each game in a given season down to a weighted coin flip based on the amount of talent on each team's 40-man roster, and then flips that coin thousands of times.

Each team's talent is calculated by summing the total WAR on its 40-man roster, as given by FanGraphs' Depth Charts projections, then dividing that number by the mean (or median, if you so choose) total for the league. By default, Satchel uses the Bradley-Terry model to calculate probabilities. With this model, the probability that team A beats team B is:

P(A Wins) = exp(T_A) ÷ [exp(T_A) + exp(T_B)]

T_i is the talent level of team i.
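
As a rough illustration of the two steps above, here is a minimal sketch that normalizes team WAR totals by the league mean or median and then applies the Bradley-Terry formula. The function names, team abbreviations, and WAR totals are made up for the example and are not taken from Satchel's source code.

```python
import math
from statistics import mean, median


def team_talent(total_war, measure="mean"):
    """Divide each team's total projected WAR by the league mean or median."""
    values = list(total_war.values())
    center = mean(values) if measure == "mean" else median(values)
    return {team: war / center for team, war in total_war.items()}


def bradley_terry(t_a, t_b):
    """P(A wins) = exp(T_A) / (exp(T_A) + exp(T_B))."""
    return math.exp(t_a) / (math.exp(t_a) + math.exp(t_b))


# Illustrative WAR totals for three hypothetical teams, not real projections.
war = {"AAA": 45.0, "BBB": 30.0, "CCC": 15.0}
talent = team_talent(war)  # league mean is 30, so BBB's talent is 1.0
p_aaa_beats_ccc = bradley_terry(talent["AAA"], talent["CCC"])
print(round(p_aaa_beats_ccc, 3))
```

Two teams with equal talent get a probability of exactly 0.5, and only the difference in talent matters, not its absolute level.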

Satchel also supports Elo-style probability calculations where the probability team A beats team B is instead given by:

P(A Wins) = 1 ÷ (1 + 10^{[T_B - T_A] / 400})
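
For comparison, here is a minimal sketch of an Elo-style win probability, written in the conventional form in which a higher talent value for team A raises its chance of winning. The function name, the `scale` parameter, and the example ratings are assumptions for illustration; how Satchel scales talent before applying this formula is not shown here.

```python
def elo_probability(t_a, t_b, scale=400.0):
    """Conventional Elo expected score for team A against team B."""
    return 1.0 / (1.0 + 10.0 ** ((t_b - t_a) / scale))


# Illustrative ratings only.
even_matchup = elo_probability(1500.0, 1500.0)
favorite = elo_probability(1900.0, 1500.0)
print(even_matchup, round(favorite, 3))
```

A gap of 400 points corresponds to 10-to-1 odds, so the favorite in the second call wins about 91% of the time.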

Once each team's talent has been calculated, Satchel simulates the season 10,000 times (the user can change that number if they wish), adding random noise to each team's talent each time. The results are then averaged to produce the final projected win/loss totals and the probability that each team wins a wild card spot, its division, the World Series, etc.
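
The simulation loop described above can be sketched roughly as follows. This is a simplified illustration, not Satchel's implementation: the Gaussian noise, the `noise_sd` value, the function name, and the toy three-team schedule are all assumptions.

```python
import math
import random
from collections import defaultdict


def simulate_seasons(talent, schedule, n=1000, noise_sd=0.1, seed=None):
    """Return each team's average win total over n simulated seasons."""
    rng = random.Random(seed)
    total_wins = defaultdict(int)
    for _ in range(n):
        # Perturb every team's talent once per simulated season.
        noisy = {team: t + rng.gauss(0.0, noise_sd) for team, t in talent.items()}
        for team_a, team_b in schedule:
            # Bradley-Terry probability that team_a wins this game.
            e_a, e_b = math.exp(noisy[team_a]), math.exp(noisy[team_b])
            winner = team_a if rng.random() < e_a / (e_a + e_b) else team_b
            total_wins[winner] += 1
    return {team: total_wins[team] / n for team in talent}


# Toy inputs: three hypothetical teams that each play the others 20 times.
talent = {"AAA": 1.5, "BBB": 1.0, "CCC": 0.5}
schedule = [("AAA", "BBB"), ("AAA", "CCC"), ("BBB", "CCC")] * 20
avg_wins = simulate_seasons(talent, schedule, n=500, seed=42)
print(avg_wins)
```

Because every game produces exactly one winner, the average win totals always add up to the number of games on the schedule. Passing the same seed reproduces the same result.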

How to use Satchel

Using Satchel is straightforward. Create an instance of the model class, tell it to run, then wait. Running the model will return a new class object, SatchelResults, which contains the results of the simulation and a few fun methods for analyzing those results. To get started, you only need three lines of code:

from satchel.model import Satchel

model = Satchel()
results = model.simulate()

The Satchel class supports a number of optional arguments to customize your model.

talent_measure: str: "mean" or "median". Each team's total WAR is divided by the league's mean or median total to determine its talent value
transactions: dict: Dictionary containing any transactions to include in the simulation. The format of the dictionary should be: {player_fangraphs_id: {"team": new_team, "date": effective_date}}. The player_fangraphs_id variable is the ID FanGraphs has assigned to that player. The new_team variable is the three-letter abbreviation for the team the player is being traded to. The effective_date variable should be formatted YYYY-MM-DD
noise: bool: If true, random noise will be added to each team's talent measure during the simulation
seed: int or float: Seed used for random draws. Defaults to None
steamer_p_wt: float: Weight placed on Steamer pitcher projections
zips_p_wt: float: Weight placed on ZiPS pitcher projections
steamer_b_wt: float: Weight placed on Steamer batter projections
zips_b_wt: float: Weight placed on ZiPS batter projections
schedule: Path, str: Path to a CSV with the season schedule
pitcher_proj: Path, str: Path to a CSV with pitcher WAR projections suitable for Satchel
batter_proj: Path, str: Path to a CSV with batter WAR projections suitable for Satchel
use_current_results: bool: If true, Satchel will simulate the rest of the season starting from today's date and add those results to each team's current record. This includes using both the teams' records and the players' stats on the season in the talent calculations. If false, Satchel will simulate the full season using the provided schedule and pre-season projections
war_method: str: Method used to calculate each player's final WAR. If "only_projections", a player's final WAR will be their WAR to date plus their projected WAR multiplied by the fraction of the season remaining. If "current_pace", it will be their WAR to date plus their projected WAR multiplied by the fraction of the season remaining and by their relative production rate. The relative production rate is their WAR to date divided by the WAR they were projected to have by now, i.e., their projection multiplied by the fraction of the season already played
cache: bool: If true, the newly generated schedule will be cached
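
To make the two war_method options concrete, here is a small worked sketch of the arithmetic as the description above reads. The function name and its arguments are hypothetical, and the fallback used when no pace can be computed is an assumption of this example, not documented Satchel behavior.

```python
def projected_final_war(war_to_date, projected_war, season_played,
                        war_method="only_projections"):
    """Estimate a player's final WAR from their WAR to date and projection."""
    remaining = 1.0 - season_played
    rest_of_season = projected_war * remaining
    if war_method == "only_projections":
        return war_to_date + rest_of_season
    if war_method == "current_pace":
        # WAR the projection implied the player would have by now.
        expected_to_date = projected_war * season_played
        if expected_to_date == 0:
            # Assumption: with no pace to measure, fall back to the projection.
            return war_to_date + rest_of_season
        relative_rate = war_to_date / expected_to_date
        return war_to_date + rest_of_season * relative_rate
    raise ValueError(f"unknown war_method: {war_method!r}")


# A player projected for 4.0 WAR who has 2.0 WAR with 40% of the season played.
by_projection = projected_final_war(2.0, 4.0, 0.4)
by_pace = projected_final_war(2.0, 4.0, 0.4, war_method="current_pace")
print(round(by_projection, 2), round(by_pace, 2))
```

In this example the player is running 25% ahead of projection (2.0 WAR against an expected 1.6), so "current_pace" scales the remaining 2.4 projected WAR up to 3.0 and finishes at 5.0, while "only_projections" finishes at 4.4.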

After running the model, the SatchelResults class has a number of methods and attributes for summarizing the results.

Attributes

al_central: A DataFrame summarizing the results of the AL Central teams.
al_east: A DataFrame summarizing the results of the AL East teams.
al_west: A DataFrame summarizing the results of the AL West teams.
base_talent: A DataFrame containing each team's base talent, i.e., their talent without any noise added.
div_counter: Counts the number of times each team won their division in the simulations
full_seasons: A list of DataFrames, each representing a simulated season.
league_counter: Counts the number of times each team won their league in the simulations.
merged_schedule: If the simulations are run mid-season and current results are merged in, this will be a DataFrame with the season-to-date results and remaining schedule merged together.
n: The number of times the simulations were run.
nl_central: A DataFrame summarizing the results of the NL Central teams.
nl_east: A DataFrame summarizing the results of the NL East teams.
nl_west: A DataFrame summarizing the results of the NL West teams.
noise: The noise added to each team's talent in each simulation.
playoff_counter: Counts the number of times each team made the playoffs in the simulations.
playoff_matchups: A DataFrame containing all the playoff matchups in each simulation.
results_df: A DataFrame containing the full results of each simulation.
schedule: A DataFrame with the schedule used in the simulations.
season_summary: A DataFrame summarizing the results.
seed: The random seed used in the simulation.
trades: A dictionary with all the trades used in the simulations.
wc_counter: Counts the number of times each team won a wild card spot in the simulations.
ws_counter: Counts the number of times each team won the World Series in the simulations.
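
The counter attributes pair naturally with n: dividing a count by the number of simulations gives an estimated probability. Here is a minimal sketch, assuming the counters behave like a mapping from team to count (for example a collections.Counter). The helper name and the numbers are invented for illustration and do not come from a real Satchel run.

```python
from collections import Counter


def counter_to_probabilities(counter, n):
    """Convert per-team counts over n simulations into probabilities."""
    return {team: count / n for team, count in counter.items()}


# Stand-ins for results.div_counter and results.n, with made-up values.
div_counter = Counter({"AAA": 6200, "BBB": 3100, "CCC": 700})
n = 10_000
div_prob = counter_to_probabilities(div_counter, n)
print(div_prob)
```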

Methods

boxplot(): Creates a boxplot that shows the distribution of wins for each team.
season_to_date(): Creates a table that shows season results to date and the projected remainder of the season.
season_percentile(team, wins): Returns the percentile at which the given number of wins falls in the specified team's distribution of simulated wins.
results_dist_chart(team, cmap): Creates a bar chart showing the distribution of results for the specified team.
results_grid(): Creates a grid of bar charts showing the distribution of results for each team.
results_scatter(team, offset, y): Creates a scatter plot that shows the distribution of results for the specified team.
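
To show the idea behind a method like season_percentile, here is a minimal sketch of how a win total can be placed within a list of simulated win totals. It uses the "share of seasons with at most this many wins" convention, which is an assumption. Satchel's own method may define the percentile differently, and the function name and win totals below are made up.

```python
def wins_percentile(simulated_wins, wins):
    """Percent of simulated seasons in which the team won at most `wins` games."""
    at_or_below = sum(1 for w in simulated_wins if w <= wins)
    return 100.0 * at_or_below / len(simulated_wins)


# Ten made-up simulated win totals for one hypothetical team.
simulated = [78, 81, 85, 85, 88, 90, 92, 95, 97, 101]
pct = wins_percentile(simulated, 90)
print(pct)  # 6 of the 10 seasons have 90 wins or fewer
```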

General remarks

Despite its methodological simplicity, Satchel does a pretty good job. When I compared it to FanGraphs' projections in 2021, it held its own. And it can be used for fun experiments like trying to get the Angels to the postseason (a very hard task). Is this whole thing a bit overdone for how simple the model is? Possibly. And I don't care.

andersonfrailey/satchel

Folders and files

Latest commit

History

Repository files navigation

Satchel

Description

How to use Satchel

General remarks

About

Resources

Stars

Watchers

Forks

Languages