# 2024 Season Simulation

This program simulates the 2024 regular season and playoffs based on fWAR projections. Of note, it considers teams' rosters as they are currently constructed, excluding free agents in its base simulation. It then creates a tool by which players can be added to rosters, allowing users to observe the changes in simulated results. This can be a useful tool in understanding the effects of various acquisitions and losses. Rather than observing the effects on a team individually, it allows the changes to manifest in the context of divisions and playoff structures. 

Authors: 
Ben Scartz, Pat Kavanagh

In [1]:
import pandas as pd
import numpy as np
import random as ran

### Data Collection

In [2]:
# Read Hitter Projections
hitter_proj = pd.read_excel('roster_resource.xlsx', sheet_name='Hitter Projections')
# Select and rename columns for Hitters
hitter_war = hitter_proj[['Name', 'WAR']].rename(columns={'Name': 'player_name'})

# Read Pitcher Projections
pitcher_proj = pd.read_excel('roster_resource.xlsx', sheet_name='Pitcher Projections')
# Select and rename columns for Pitchers
pitcher_war = pitcher_proj[['Name', 'WAR']].rename(columns={'Name': 'player_name'})

# Read Hitter Projections
hitter_proj = pd.read_excel('roster_resource.xlsx', sheet_name='Hitter Projections')
# Select and rename columns for Hitters
hitter_war = hitter_proj[['Name', 'WAR']].rename(columns={'Name': 'player_name'})

# Read team roles
team_roles = pd.read_csv('team_roles.csv').drop(columns=['Unnamed: 0'])



In [3]:
hitter_war.head()

Unnamed: 0,player_name,WAR
0,Ronald Acuña Jr.,7.4
1,Juan Soto,6.4
2,Aaron Judge,6.2
3,Mookie Betts,6.0
4,Adley Rutschman,5.9


In [4]:
team_roles.head()

Unnamed: 0,team,role,player_name
0,ARI,LF,Corbin Carroll
1,ARI,2B,Ketel Marte
2,ARI,3B,Eugenio Suárez
3,ARI,1B,Christian Walker
4,ARI,CF,Alek Thomas


In [11]:
team_role_war = pd.merge(team_roles, hitter_war, on='player_name', how='left')
team_role_war = pd.merge(team_role_war, pitcher_war, on='player_name', how='left')
team_role_war['WAR'] = np.where(pd.isna(team_role_war['WAR_x']), team_role_war['WAR_y'], team_role_war['WAR_x'])
team_role_war = team_role_war.drop(columns=['WAR_x', 'WAR_y'])
war_by_team = team_role_war.groupby('team')['WAR'].sum().reset_index()
# League Structure
league_structure = pd.read_csv('league_structure.csv').drop(columns=['Unnamed: 0'])

# Merge
war_by_team = pd.merge(war_by_team, league_structure, on='team', how='left')

war_by_team

Unnamed: 0,team,WAR,league,division
0,ARI,32.2,NL,West
1,ATL,50.0,NL,East
2,BAL,36.7,AL,East
3,BOS,31.8,AL,East
4,CHC,30.1,NL,Central
5,CHW,18.4,AL,Central
6,CIN,29.8,NL,Central
7,CLE,33.2,AL,Central
8,COL,14.0,NL,West
9,DET,28.2,AL,Central


* At this point in time (December 1, 2023), the simulation will bias toward teams with established rosters rather than those who plan to add through free agency.

### Regular Season simulation function

In [7]:
def calculate_win_counts(war_by_team):
    teams = war_by_team['team'].tolist()

    # Shuffle the teams randomly
    ran.shuffle(teams)

    # Split the shuffled teams into three lists of two
    team_lists = [teams[i:i+2] for i in range(0, len(teams), 2)]

    # Dictionary to store win counts
    win_counts = {team: 0 for team in teams}

    # Loop 162 times
    for _ in range(162):
        for team_list in team_lists:
            team1 = team_list[0]
            team2 = team_list[1]

            team1war = war_by_team[war_by_team['team'] == team1]['WAR'].sum()
            team2war = war_by_team[war_by_team['team'] == team2]['WAR'].sum()

            weights = [(team1war / (team1war + team2war)), (team2war / (team2war + team1war))]

            winner = ran.choices([team1, team2], weights=weights)[0]

            # Increment win count for the winner
            win_counts[winner] += 1

    # Create a DataFrame from the win counts
    win_df = pd.DataFrame(list(win_counts.items()), columns=['team', 'wins'])

    win_df = win_df.sort_values('wins', ascending=False).reset_index(drop=True)

    # League Structure
    league_structure = pd.read_csv('league_structure.csv').drop(columns=['Unnamed: 0'])

    # Merge
    win_df = pd.merge(win_df, league_structure, on='team', how='left')

    return win_df



In [46]:
# Example simulation

calculate_win_counts(war_by_team)

Unnamed: 0,team,wins,league,division
0,HOU,116,AL,West
1,TOR,108,AL,East
2,ATL,106,NL,East
3,BOS,104,AL,East
4,ARI,103,NL,West
5,LAD,102,NL,West
6,TBR,99,AL,East
7,CIN,96,NL,Central
8,SDP,89,NL,West
9,MIN,87,AL,Central


### Playoff Simulation Functions

In [47]:
# Randomly simulate a series of a given length, with random selection weighted by regular season wins.
def series_weighted(Team1, Team1regwins, Team2, Team2regwins, seriesLength):
    Team1wins = 0
    Team2wins = 0
    weights = [(Team1regwins / (Team1regwins + Team2regwins)), (Team2regwins / (Team2regwins + Team1regwins)) ]
    needed = seriesLength // 2 + 1 # wins needed to exit loop and end series

    while Team1wins < needed and Team2wins < needed:
        winner = ran.choices([Team1, Team2], weights = weights, k=1) # select k elements from list
        if winner[0] == Team1:
            Team1wins += 1
            #print(f'{Team1} wins game {Team1wins + Team2wins}')
        else:
            Team2wins += 1
            #print(f'{Team2} wins game {Team2wins + Team1wins}')

    if Team1wins == needed:
        #print(f'{Team1} win the series {Team1wins} games to {Team2wins}.')
        seriesWinner = Team1
    else:
        #print(f'{Team2} win the series {Team2wins} games to {Team1wins}.')
        seriesWinner = Team2

    return seriesWinner

In [48]:
# Using current MLB playoff structure, take the 12 playoff teams and orchestrate all rounds and World Series
def playoff_weighted(NLteam1, NLteam1regwins,
                     NLteam2, NLteam2regwins,
                     NLteam3, NLteam3regwins,
                     NLteam4, NLteam4regwins,
                     NLteam5, NLteam5regwins,
                     NLteam6, NLteam6regwins,
                     ALteam1, ALteam1regwins,
                     ALteam2, ALteam2regwins,
                     ALteam3, ALteam3regwins,
                     ALteam4, ALteam4regwins,
                     ALteam5, ALteam5regwins,
                     ALteam6, ALteam6regwins):
    # Dictionary stores win totals for weights with corresponding team
    winsDict = {NLteam1: NLteam1regwins,
                NLteam2: NLteam2regwins,
                NLteam3: NLteam3regwins,
                NLteam4: NLteam4regwins,
                NLteam5: NLteam5regwins,
                NLteam6: NLteam6regwins,
                ALteam1: ALteam1regwins,
                ALteam2: ALteam2regwins,
                ALteam3: ALteam3regwins,
                ALteam4: ALteam4regwins,
                ALteam5: ALteam5regwins,
                ALteam6: ALteam6regwins}
    
    # Wild Cards
    NLWCwinner1 = series_weighted(NLteam6, NLteam6regwins, NLteam3, NLteam3regwins, 3)
    NLWCwinner2 = series_weighted(NLteam5, NLteam5regwins, NLteam4, NLteam4regwins, 3)
    ALWCwinner1 = series_weighted(ALteam6, ALteam6regwins, ALteam3, ALteam3regwins, 3)
    ALWCwinner2 = series_weighted(ALteam5, ALteam5regwins, ALteam4, ALteam4regwins, 3)

    # Division Series
    NLDSwinner1 = series_weighted(NLteam2, NLteam2regwins, NLWCwinner1, winsDict[NLWCwinner1], 5)
    NLDSwinner2 = series_weighted(NLteam1, NLteam1regwins, NLWCwinner2, winsDict[NLWCwinner2], 5)
    ALDSwinner1 = series_weighted(ALteam2, ALteam2regwins, ALWCwinner1, winsDict[ALWCwinner1], 5)
    ALDSwinner2 = series_weighted(ALteam1, ALteam1regwins, ALWCwinner2, winsDict[ALWCwinner2], 5)

    # League Championship Series
    NLCSwinner = series_weighted(NLDSwinner1, winsDict[NLDSwinner1], NLDSwinner2, winsDict[NLDSwinner2], 7)
    ALCSwinner = series_weighted(ALDSwinner1, winsDict[ALDSwinner1], ALDSwinner2, winsDict[ALDSwinner2], 7)

    # World Series
    WSwinner = series_weighted(NLCSwinner, winsDict[NLCSwinner], ALCSwinner, winsDict[ALCSwinner], 7)

    return WSwinner

### Regular season + playoff sim

In [49]:
def season_sim(war_by_team): #war_by_team is any table of team war totals

    #######################  Calculate regular season win totals for each team
    league_wins = calculate_win_counts(war_by_team)

    #######################  Determine division winners
    
    # National League
    NLEastwins = league_wins[(league_wins['league'] == 'NL') & (league_wins['division'] == 'East')]
    NLEastWinner = NLEastwins.loc[NLEastwins['wins'].idxmax(), 'team']

    NLCentralwins = league_wins[(league_wins['league'] == 'NL') & (league_wins['division'] == 'Central')]
    NLCentralWinner = NLCentralwins.loc[NLCentralwins['wins'].idxmax(), 'team']

    NLWestwins = league_wins[(league_wins['league'] == 'NL') & (league_wins['division'] == 'West')]
    NLWestWinner = NLWestwins.loc[NLWestwins['wins'].idxmax(), 'team']

    # American League
    ALEastwins = league_wins[(league_wins['league'] == 'AL') & (league_wins['division'] == 'East')]
    ALEastWinner = ALEastwins.loc[ALEastwins['wins'].idxmax(), 'team']

    ALCentralwins = league_wins[(league_wins['league'] == 'AL') & (league_wins['division'] == 'Central')]
    ALCentralWinner = ALCentralwins.loc[ALCentralwins['wins'].idxmax(), 'team']

    ALWestwins = league_wins[(league_wins['league'] == 'AL') & (league_wins['division'] == 'West')]
    ALWestWinner = ALWestwins.loc[ALWestwins['wins'].idxmax(), 'team']

    ########################  Determine Wild Cards

    NLdivision_winners = [NLEastWinner, NLCentralWinner, NLWestWinner]
    # Filter to National League
    national = league_wins[league_wins['league'] == 'NL'] 
    # Remove division winners
    NLwild_cards1 = national[~national['team'].isin(NLdivision_winners)]
    # Most wins among remaining teams
    NLwild_card1 = NLwild_cards1.loc[NLwild_cards1['wins'].idxmax(), 'team']
    # Remove first wild card, most wins remaining is second wild card
    NLwild_cards2 = NLwild_cards1[~NLwild_cards1['team'].isin([NLwild_card1])]
    NLwild_card2 = NLwild_cards2.loc[NLwild_cards2['wins'].idxmax(), 'team']
    # Remove second wild card, most wins remaining is third wild card
    NLwild_cards3 = NLwild_cards2[~NLwild_cards2['team'].isin([NLwild_card2])]
    NLwild_card3 = NLwild_cards3.loc[NLwild_cards3['wins'].idxmax(), 'team']

    # Repeat for American League
    ALdivision_winners = [ALEastWinner, ALCentralWinner, ALWestWinner]
    american = league_wins[league_wins['league'] == 'AL']

    ALwild_cards1 = american[~american['team'].isin(ALdivision_winners)]
    ALwild_card1 = ALwild_cards1.loc[ALwild_cards1['wins'].idxmax(), 'team']

    ALwild_cards2 = ALwild_cards1[~ALwild_cards1['team'].isin([ALwild_card1])]
    ALwild_card2 = ALwild_cards2.loc[ALwild_cards2['wins'].idxmax(), 'team']

    ALwild_cards3 = ALwild_cards2[~ALwild_cards2['team'].isin([ALwild_card2])]
    ALwild_card3 = ALwild_cards3.loc[ALwild_cards3['wins'].idxmax(), 'team']


    ########################  Seed teams by most wins, regardless of division winner / wild card

    # List of playoff teams
    playoff_teams_national = [NLEastWinner, NLCentralWinner, NLWestWinner, NLwild_card1, NLwild_card2, NLwild_card3]
    # Get win totals of playoff teams
    NLplayoff_teams = league_wins[league_wins['team'].isin(playoff_teams_national)]
    # Order by wins and use indeces. (this is a more efficient approach than what is used above)
    NLplayoff_teams = NLplayoff_teams.sort_values(by='wins', ascending=False)
    # Select the first row as the #1 seed. Record corresponding wins
    NLteam1 = NLplayoff_teams.loc[NLplayoff_teams.index[0], 'team']
    NLteam1regwins = NLplayoff_teams.loc[NLplayoff_teams.index[0], 'wins']

    NLteam2 = NLplayoff_teams.loc[NLplayoff_teams.index[1], 'team']
    NLteam2regwins = NLplayoff_teams.loc[NLplayoff_teams.index[1], 'wins']

    NLteam3 = NLplayoff_teams.loc[NLplayoff_teams.index[2], 'team']
    NLteam3regwins = NLplayoff_teams.loc[NLplayoff_teams.index[2], 'wins']

    NLteam4 = NLplayoff_teams.loc[NLplayoff_teams.index[3], 'team']
    NLteam4regwins = NLplayoff_teams.loc[NLplayoff_teams.index[3], 'wins']

    NLteam5 = NLplayoff_teams.loc[NLplayoff_teams.index[4], 'team']
    NLteam5regwins = NLplayoff_teams.loc[NLplayoff_teams.index[5], 'wins']

    NLteam6 = NLplayoff_teams.loc[NLplayoff_teams.index[5], 'team']
    NLteam6regwins = NLplayoff_teams.loc[NLplayoff_teams.index[5], 'wins']

    # Repeat for American League
    playoff_teams_american = [ALEastWinner, ALCentralWinner, ALWestWinner, ALwild_card1, ALwild_card2, ALwild_card3]
    ALplayoff_teams = league_wins[league_wins['team'].isin(playoff_teams_american)]

    ALplayoff_teams = ALplayoff_teams.sort_values(by='wins', ascending=False)

    ALteam1 = ALplayoff_teams.loc[ALplayoff_teams.index[0], 'team']
    ALteam1regwins = ALplayoff_teams.loc[ALplayoff_teams.index[0], 'wins']

    ALteam2 = ALplayoff_teams.loc[ALplayoff_teams.index[1], 'team']
    ALteam2regwins = ALplayoff_teams.loc[ALplayoff_teams.index[1], 'wins']

    ALteam3 = ALplayoff_teams.loc[ALplayoff_teams.index[2], 'team']
    ALteam3regwins = ALplayoff_teams.loc[ALplayoff_teams.index[2], 'wins']

    ALteam4 = ALplayoff_teams.loc[ALplayoff_teams.index[3], 'team']
    ALteam4regwins = ALplayoff_teams.loc[ALplayoff_teams.index[3], 'wins']

    ALteam5 = ALplayoff_teams.loc[ALplayoff_teams.index[4], 'team']
    ALteam5regwins = ALplayoff_teams.loc[ALplayoff_teams.index[5], 'wins']

    ALteam6 = ALplayoff_teams.loc[ALplayoff_teams.index[5], 'team']
    ALteam6regwins = ALplayoff_teams.loc[ALplayoff_teams.index[5], 'wins']

    ############################ Apply playoff function to constructed playoff field

    WSChamp = playoff_weighted(NLteam1, NLteam1regwins,
                        NLteam2, NLteam2regwins,
                        NLteam3, NLteam3regwins,
                        NLteam4, NLteam4regwins,
                        NLteam5, NLteam5regwins,
                        NLteam6, NLteam6regwins,
                        ALteam1, ALteam1regwins,
                        ALteam2, ALteam2regwins,
                        ALteam3, ALteam3regwins,
                        ALteam4, ALteam4regwins,
                        ALteam5, ALteam5regwins,
                        ALteam6, ALteam6regwins)
    
    # result list stores not only World Series winner, but also all playoff teams for the given simulation
    result = [WSChamp, NLteam1, NLteam2, NLteam3, NLteam4, NLteam5, NLteam6, 
            ALteam1, ALteam2, ALteam3, ALteam4, ALteam5, ALteam6]

    return result



In [51]:
# Example simulation

season_sim(war_by_team)

['HOU',
 'PHI',
 'ATL',
 'NYM',
 'ARI',
 'CIN',
 'LAD',
 'BAL',
 'TOR',
 'CLE',
 'DET',
 'SEA',
 'HOU']

In [76]:
# [0] is the World Series winner
CHAMP = season_sim(war_by_team = war_by_team)[0]

CHAMP

'ATL'

### Iterations

In [52]:
def season_sim_iterated(iterations, war_by_team):

    # Start with empty counts of World Series wins and playoff appearances
    champ_counts = {}
    playoff_counts = {}
    # i keeps track of iterations in loop
    i = 1
    # iterations comes from user
    iterations = iterations

    for i in range(iterations):

        sim = season_sim(war_by_team = war_by_team)
        champ = sim[0] # World Series winner 
        # Add one to World Seriese winner row in table
        if champ in champ_counts:
            champ_counts[champ] += 1
        else:
            # Add a new row if first win of simulation
            champ_counts[champ] = 1

        playoff = sim[1:12] # All playoff teams (including WS winner) are credited in the same way 

        for j in playoff: 
            if j in playoff_counts:
                playoff_counts[j] += 1
            else: 
                playoff_counts[j] = 1
            
            if j not in champ_counts:
                # Creates a new row for WS array if no WS champ wins so that all arrays have the same length
                champ_counts[j] = 0
                
    # Combine arrays into data frame
    dataFrame = pd.DataFrame({'Team': list(champ_counts.keys()), 'World Series': list(champ_counts.values()),
                              'Playoffs': list(playoff_counts.values())})
    # Calculate percentages and round
    dataFrame['WS Percentage'] = round(dataFrame['World Series'] / iterations * 100,1)
    dataFrame['Playoff Percentage'] = round(dataFrame['Playoffs'] / iterations * 100,1)
    dataFrame = dataFrame.sort_values(by = 'World Series', ascending = False)

    return(dataFrame)

In [79]:
sim_table = season_sim_iterated(10000, war_by_team)

sim_table

Unnamed: 0,Team,World Series,Playoffs,WS Percentage,Playoff Percentage
5,ATL,1068,3662,10.7,36.6
10,HOU,677,6736,6.8,67.4
0,TBR,603,6944,6.0,69.4
1,PHI,571,3728,5.7,37.3
13,STL,563,3691,5.6,36.9
4,LAD,528,9299,5.3,93.0
15,TOR,512,5319,5.1,53.2
22,NYY,482,4779,4.8,47.8
7,MIN,475,4676,4.8,46.8
12,SDP,461,6934,4.6,69.3


## Program to Edit Rosters

Ability to add free agents, trade players, etc.

In [53]:
# Fuction to place a player in a specified role
def change_player(player_name):

    # This retains the original team_roles file
    team_roles1 = team_roles.copy()

    # User tells program to what team and position to add given player
    which_team = input(f'To which team would you like to add {player_name}?')
    which_role = input(f'What position?')

    # Find the appropriate column
    result = team_roles1[(team_roles1['team'] == which_team) & (team_roles1['role'] == which_role)].index

    new_player_name = player_name
    # Change the name to the input name in the specified row
    team_roles1.loc[result, 'player_name'] = new_player_name

    # Recalculate team WAR totals with the new player included
    team_role_war1 = pd.merge(team_roles1, hitter_war, on='player_name', how='left')
    team_role_war1 = pd.merge(team_role_war1, pitcher_war, on='player_name', how='left')
    team_role_war1['WAR'] = np.where(pd.isna(team_role_war1['WAR_x']), team_role_war1['WAR_y'], team_role_war1['WAR_x'])
    team_role_war1 = team_role_war1.drop(columns=['WAR_x', 'WAR_y'])
    war_by_team1 = team_role_war1.groupby('team')['WAR'].sum().reset_index()

    # Create new ratings
    mean_war1 = war_by_team1['WAR'].mean()
    war_by_team1['percentage'] = war_by_team1['WAR'] / mean_war1 * 0.5
    # Add league structure
    league_structure = pd.read_csv('league_structure.csv').drop(columns=['Unnamed: 0'])
    war_by_team1 = pd.merge(war_by_team1, league_structure, on='team', how='left')

    # Return both war_by_team (for simulations) and team_role_war (for if the user wants to make more changes)
    return([team_role_war1, war_by_team1])


### Apply change_player()

Assign Shohei Ohtani to the Los Angeles Dodgers

In [55]:
war_by_team1 = change_player('Shohei Ohtani')[1]

# Input: To what team would you like to add Shohei Ohtani? 
# Response: LAD

# Input: What position?
# Response: DH

In [64]:
# Simulate with new WAR table

# For efficiency, I will only simulate 100 iterations

season_sim_iterated(100, war_by_team1)

Unnamed: 0,Team,World Series,Playoffs,WS Percentage,Playoff Percentage
1,ATL,16,61,16.0,61.0
2,PHI,7,68,7.0,68.0
4,LAD,7,78,7.0,78.0
9,HOU,7,71,7.0,71.0
6,MIN,6,46,6.0,46.0
20,BOS,5,33,5.0,33.0
7,TEX,5,35,5.0,35.0
8,TBR,5,59,5.0,59.0
11,STL,5,20,5.0,20.0
16,TOR,5,32,5.0,32.0


* When compared to the original sim, signing Shohei Ohtani increases the Dodgers' chance of winning the World Series by 2.0%.

Assign Matt Chapman to the Toronto Blue Jays

In [67]:
war_by_team2 = change_player('Matt Chapman')[1]

# Input: To what team would you like to add Matt Chapman? 
# Response: TOR

# Input: What position?
# Response: 3B

season_sim_iterated(100, war_by_team2)

Unnamed: 0,Team,World Series,Playoffs,WS Percentage,Playoff Percentage
0,ATL,13,90,13.0,90.0
7,TBR,9,61,9.0,61.0
5,PHI,7,70,7.0,70.0
6,HOU,7,68,7.0,68.0
4,STL,6,70,6.0,70.0
16,TOR,5,49,5.0,49.0
24,TEX,5,36,5.0,36.0
19,CHC,5,18,5.0,18.0
17,DET,5,37,5.0,37.0
21,CLE,4,31,4.0,31.0


* Compared to the original sim, re-signing Matt Chapman would not significantly change the Blue Jays' chances of winning the World Series.