# Major League Baseball Strength of Schedule

The 2025 MLB season began on March 18th in Tokyo, Japan, when the Chicago Cubs played the defending World Series champions, the Los Angeles Dodgers. Starting the season against the defending champs is a tall task, and unfortunately for the Cubs, their schedule was expected to stay tough well into April and May. The Cubs' opponents were a common talking point during Cubs broadcasts, which often stated that the Cubs had the hardest strength of schedule (SOS) to open the 2025 season. Every time the broadcast brought up SOS, I would ask myself a few questions:
1. How is strength of schedule measured?
2. How much harder is the Cubs' schedule?
3. How does the 2025 season compare to previous seasons?

These questions led me to the following measures and research questions:
1. Opponents' previous-season win percentage
2. Opponents' previous-season run differential
3. Who had the hardest 2025 opening schedule? Who had the easiest?
4. The distribution of opponents' win percentage in previous years
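
The first two measures can be written down as a working definition. This is a sketch of how the notebook computes them, not an official MLB formula, and every symbol here is an assumption introduced for illustration: $o_{t,s}(g)$ is the opponent of team $t$ in game $g$ of season $s$, $W_{o,s-1}$ and $D_{o,s-1}$ are that opponent's win percentage and run differential in the previous season, and $N$ is the number of opening games considered.

```latex
\mathrm{SOS}^{W}_{t,s}(N) = \frac{1}{N} \sum_{g=1}^{N} W_{o_{t,s}(g),\, s-1}
\qquad
\mathrm{SOS}^{D}_{t,s}(N) = \frac{1}{N} \sum_{g=1}^{N} D_{o_{t,s}(g),\, s-1}
```

An opponent faced in a three-game series counts three times, so both measures are weighted by games played rather than by distinct opponents.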

In [47]:
import pandas as pd
import numpy as np

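# Load the game-level results and drop the columns this analysis does not use
df_game_data = (
    pd.read_csv("./clean_data/game_results.csv")
    .drop(columns=['H/A', 'win_loss', 'game_time', 'day_night', 'attendance', 'game_length'])
)
# Load the season-level results; the one blank team label belongs to the Angels
df_season_data = pd.read_csv("./clean_data/season_results.csv").fillna({'teams': 'LAA'})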

## Description of Data

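`df_game_data` holds one row per team per game. After the columns above are dropped, it keeps `date`, `team`, `opponent`, `runs_scored`, `runs_allowed`, `game_number`, and `season`. `df_season_data` holds one row per team per season and includes the `teams`, `season`, and `win_%` columns used in the merges below.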

In [48]:
print(f'Dimensions of game data: {df_game_data.shape}')
print(f'Dimensions of season data: {df_season_data.shape}')

Dimensions of game data: (46740, 7)
Dimensions of season data: (330, 7)


One team label in `df_season_data` was blank. It belongs to the Los Angeles Angels and was left empty because the team changed its name, so it is filled with `LAA` when the data is loaded.
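
As a minimal sketch of why the dict form of `fillna` is a safe way to do this, the example below uses a made-up three-row table (the values are invented, not taken from `season_results.csv`). Passing a dict fills only the named column, so a missing number elsewhere is not overwritten with a team code.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the season table; all values are invented.
season = pd.DataFrame({
    "teams": ["CHC", np.nan, "LAD"],
    "season": [2024, 2024, 2024],
    "win_%": [0.5, np.nan, 0.6],
})

# Count the blanks per column before deciding how to fill them.
blanks_before = season.isna().sum()
print(blanks_before)

# The dict limits the fill to the 'teams' column; 'win_%' keeps its NaN.
filled = season.fillna({"teams": "LAA"})
print(filled)
```

Checking `isna().sum()` first confirms that the blank is where it is expected to be before a team code is hard-coded into the fill.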

In [49]:
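# Sum runs scored, runs allowed, and run differential for each team-season
df_season_run_differential = (
    df_game_data
    .assign(run_differential=df_game_data['runs_scored'] - df_game_data['runs_allowed'])
    # restrict the sum to the numeric run columns so date and opponent strings are not summed
    .groupby(['team', 'season'])[['runs_scored', 'runs_allowed', 'run_differential']]
    .sum()
    .reset_index()
    # relabel 'OAK' as 'ATH' in the team column before merging with the season data
    .replace({'team': 'OAK'}, 'ATH')
)

# Add runs scored, runs allowed, and run differential to the season data
df_season_data = pd.merge(
    df_season_data,
    df_season_run_differential,
    left_on=['teams', 'season'],
    right_on=['team', 'season'],
).drop(columns=['team'])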
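
A merge like this silently drops team-seasons whose codes do not line up, so it can be worth checking the keys. The sketch below is one way to do that on invented data. The frames and numbers are hypothetical, and the `validate` and `indicator` arguments are additions that the notebook itself does not use.

```python
import pandas as pd

# Hypothetical run totals keyed by the game data's team codes (invented values).
runs = pd.DataFrame({
    "team": ["CHC", "OAK"],
    "season": [2024, 2024],
    "run_differential": [10, -20],
})
# Hypothetical season table that uses the 'ATH' code instead of 'OAK'.
season = pd.DataFrame({
    "teams": ["CHC", "ATH"],
    "season": [2024, 2024],
    "win_%": [0.5, 0.4],
})

# The dict form of replace is scoped to one column: only 'OAK' in 'team' changes.
runs = runs.replace({"team": "OAK"}, "ATH")

# validate raises if a team-season key is duplicated on either side;
# indicator adds a '_merge' column showing where each row found a match.
checked = pd.merge(
    season,
    runs,
    how="left",
    left_on=["teams", "season"],
    right_on=["team", "season"],
    validate="one_to_one",
    indicator=True,
)

# Rows that did not match on both sides point to mismatched team codes.
unmatched = checked.loc[checked["_merge"] != "both", ["teams", "season"]]
print(unmatched)
```

An empty `unmatched` frame means every team-season in the season table found its run totals.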


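## Merge description

Each game is matched to its opponent's record from the previous season. The game data gets a `previous_season` column (`season - 1`), and the join uses `opponent` and `previous_season` on the game side against `teams` and `season` on the season side. Games whose opponent has no previous-season row are dropped.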

In [56]:
# Join each game to its opponent's record from the previous season.
# The season is offset by one so that a 2016 game matches the opponent's 2015 row.
df_game_data = df_game_data.assign(previous_season=lambda x: x.season - 1)

df_merge_data = (
    pd.merge(
        df_game_data,
        df_season_data.loc[:, ['teams', 'season', 'win_%', 'run_differential']],
        how='left',
        left_on=['opponent', 'previous_season'],
        right_on=['teams', 'season'],
    )
    # games with no matching previous-season record have no 'teams' value; drop them
    .dropna(subset=['teams'])
    .drop(columns=['season_y', 'previous_season', 'teams'])
    .rename(columns={'win_%': 'opponent_win%', 'season_x': 'season', 'run_differential': 'opponent_run_differential'})
)
#6214 records => good

df_merge_data.head()

| | date | team | opponent | runs_scored | runs_allowed | game_number | season | opponent_win% | opponent_run_differential |
|---|---|---|---|---|---|---|---|---|---|
| 4858 | 2016-04-04 | ARI | COL | 5 | 10 | 1 | 2016 | 0.42 | -107.0 |
| 4859 | 2016-04-05 | ARI | COL | 11 | 6 | 2 | 2016 | 0.42 | -107.0 |
| 4860 | 2016-04-06 | ARI | COL | 3 | 4 | 3 | 2016 | 0.42 | -107.0 |
| 4861 | 2016-04-07 | ARI | CHC | 6 | 14 | 4 | 2016 | 0.599 | 81.0 |
| 4862 | 2016-04-08 | ARI | CHC | 3 | 2 | 5 | 2016 | 0.599 | 81.0 |
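
The first rows start in 2016 because of the season offset. The sketch below reproduces that behavior on a tiny invented example. The team codes `AAA`, `BBB`, and `CCC` and all values are hypothetical, and the `suffixes` argument is used only to keep the toy column names readable.

```python
import pandas as pd

# Hypothetical games: one in the earliest season, two in the following season.
games = pd.DataFrame({
    "team": ["AAA", "AAA", "AAA"],
    "opponent": ["BBB", "BBB", "CCC"],
    "season": [2015, 2016, 2016],
})
# Hypothetical season records, available for 2015 only.
seasons = pd.DataFrame({
    "teams": ["BBB", "CCC"],
    "season": [2015, 2015],
    "win_%": [0.6, 0.4],
})

# Offset the season so each game looks up the opponent's previous year.
games = games.assign(previous_season=lambda x: x["season"] - 1)

merged = pd.merge(
    games,
    seasons,
    how="left",
    left_on=["opponent", "previous_season"],
    right_on=["teams", "season"],
    suffixes=("", "_prior"),
)

# The 2015 game looks for a 2014 record that does not exist, so it has no match.
kept = merged.dropna(subset=["teams"])
print(kept[["team", "opponent", "season", "win_%"]])
```

The earliest season in the data can therefore never appear in the merged table, because it has no previous season to look up.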


In [70]:
# Strength of schedule over each team's first 40 games of a season
(df_merge_data
 .loc[df_merge_data['game_number'] <= 40]
 .groupby(['team', 'season'])
 .agg(
     runs_scored=('runs_scored', 'sum'),
     runs_allowed=('runs_allowed', 'sum'),
     opponent_win=('opponent_win%', 'mean'),
     opponent_run_differential=('opponent_run_differential', 'mean'),
 )
 .round(4)
 .reset_index()
 .sort_values(by='opponent_win', ascending=False)
)

| | team | season | runs_scored | runs_allowed | opponent_win | opponent_run_differential |
|---|---|---|---|---|---|---|
| 189 | NYY | 2024 | 178 | 124 | 0.5497 | 64.8333 |
| 98 | DET | 2023 | 143 | 183 | 0.5482 | 56.4500 |
| 286 | TOR | 2022 | 132 | 143 | 0.5468 | 56.0811 |
| 2 | ARI | 2018 | 164 | 136 | 0.5454 | 63.3000 |
| 156 | MIL | 2021 | 143 | 160 | 0.5453 | 21.6500 |
| ... | ... | ... | ... | ... | ... | ... |
| 184 | NYY | 2019 | 200 | 164 | 0.4599 | -61.4500 |
| 40 | BOS | 2025 | 192 | 174 | 0.4594 | -62.7000 |
| 34 | BOS | 2019 | 179 | 165 | 0.4506 | -88.2121 |
| 165 | MIN | 2020 | 175 | 140 | 0.4480 | -94.8750 |
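
The table above ranks every team-season together. To get at research questions 3 and 4, one possible next step is to filter to a single season and to summarize the spread per season. The sketch below shows the idea on an invented stand-in for the aggregated table. The frame `sos`, its team codes, and its values are hypothetical and are not results from this analysis.

```python
import pandas as pd

# Hypothetical stand-in for the aggregated table; all values are invented.
sos = pd.DataFrame({
    "team": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "season": [2024, 2024, 2024, 2025, 2025, 2025],
    "opponent_win": [0.51, 0.49, 0.50, 0.53, 0.47, 0.50],
})

# Question 3: hardest and easiest opening schedule within one season.
season_2025 = sos.loc[sos["season"] == 2025].set_index("team")
hardest = season_2025["opponent_win"].idxmax()
easiest = season_2025["opponent_win"].idxmin()
print(f"Hardest: {hardest}, easiest: {easiest}")

# Question 4: distribution of opponents' win percentage for each season.
spread = sos.groupby("season")["opponent_win"].describe()
print(spread[["min", "mean", "max"]])
```

Comparing the per-season `min` and `max` gives a rough sense of whether one season's extremes are unusual relative to the others.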
