# MLB Manager Predictive Analysis

### By David Montoto

## Abstract
This project utilizes Python to develop and implement a machine learning solution aimed at uncovering information within historical Major League Baseball (MLB) data. The primary objectives are the following: first, to create a predictive model that determines the likelihood of a manager's success based on various historical performance metrics; second, to analyze the impact of top players on overall team success. Using a dataset spanning from 1870 to 2016, we apply a range of machine learning techniques to build models for predicting managerial success and conducting regression analysis to explore the influence of key player performance. The results provide valuable insights into the factors driving team performance, offering practical implications for team management and strategic decision-making in MLB. Through detailed data preprocessing, exploratory data analysis, and rigorous model evaluation, this project demonstrates the effective use of machine learning in sports analytics.

## Goal
The goal of this assignment is to leverage Python to develop and implement a comprehensive machine learning project that involves building predictive models and conducting detailed data analysis. Specifically, the project aims to:

1. Predict Managerial Success: Create a predictive model to determine the likelihood of a manager's success based on historical MLB data, using various machine learning techniques.

#### Data Cleaning and Preprocessing


In [265]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error

##### Load and Examine Data 

In [185]:
df = pd.read_csv('baseballdata.csv')

# Inspect first few rows
print(df.head())

# Inspect dataset info
print(df.info())

   Unnamed: 0  Rk  Year                    Tm       Lg    G   W   L  Ties  \
0           1   1  2016  Arizona Diamondbacks  NL West  162  69  93     0   
1           2   2  2015  Arizona Diamondbacks  NL West  162  79  83     0   
2           3   3  2014  Arizona Diamondbacks  NL West  162  64  98     0   
3           4   4  2013  Arizona Diamondbacks  NL West  162  81  81     0   
4           5   5  2012  Arizona Diamondbacks  NL West  162  81  81     0   

    W.L.  ...    R   RA Attendance BatAge  PAge  X.Bat X.P  \
0  0.426  ...  752  890  2,036,216   26.7  26.4     50  29   
1  0.488  ...  720  713  2,080,145   26.6  27.1     50  27   
2  0.395  ...  615  742  2,073,730   27.6  28.0     52  25   
3  0.500  ...  685  695  2,134,895   28.1  27.6     44  23   
4  0.500  ...  734  688  2,177,617   28.3  27.4     48  23   

            Top.Player                               Managers  \
0       J.Segura (5.7)                         C.Hale (69-93)   
1  P.Goldschmidt (8.8)            

### Handle Missing Values

In [187]:
# Check for missing values
print(df.isnull().sum())

Unnamed: 0       0
Rk               0
Year             0
Tm               0
Lg               0
G                0
W                0
L                0
Ties             0
W.L.             0
pythW.L.         0
Finish           0
GB               0
Playoffs      2163
R                0
RA               0
Attendance      74
BatAge           0
PAge             0
X.Bat            0
X.P              0
Top.Player       0
Managers         0
current          0
dtype: int64


### Change NULL in Playoffs Column to 'Did not make it'

We will use the Playoffs column in the predictive analysis, ranking managers by their postseason success or lack thereof.

In [189]:
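# Impute missing values in the 'Playoffs' column with 'Did not make it'
# (assigning the result back avoids the chained inplace call, which newer pandas versions warn about)
df['Playoffs'] = df['Playoffs'].fillna('Did not make it')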

We will not use the Attendance column, so we drop it.

In [191]:
# Drop unnecessary column
df = df.drop(columns=['Attendance'])

### Helper Functions for cleaning and preparing data 

1. extract_wins_losses: This function extracts the wins and losses belonging to each manager. Although each row (season) has total Wins and Losses columns, some seasons had more than one manager, each with their own record for that season.
2. process_manager_row: This function takes a row with one or more managers and creates a separate row for each manager, with that manager's own record and playoff result. Only the last manager listed is credited with the season's playoff result. It calls calculate_playoff_score, which is defined in a later cell and must be run first.

In [203]:
def extract_wins_losses(record):
    # Find the index of the parentheses
    start = record.find('(')
    end = record.find(')')
    
    # Extract the content within parentheses
    if start != -1 and end != -1:
        record_content = record[start + 1:end]
        wins, losses = map(int, record_content.split('-'))
        return wins, losses
    return 0, 0

def process_manager_row(row):
    # Split managers by ',' or ' and '
    managers = [manager.strip() for part in row['Managers'].split(',') for manager in part.split(' and ')]
    
    # Extract manager records
    manager_records = [extract_wins_losses(manager) for manager in managers]
    
    rows = []
    for i, manager in enumerate(managers):
        manager_name = manager.split(' (')[0]
        wins, losses = manager_records[i]
        win_loss_record = f"{wins}-{losses}"
        
        if i == len(managers) - 1:  # Last manager gets the actual playoff result
            playoff_result = row['Playoffs']
        else:  # Other managers get 'Did not make it'
            playoff_result = 'Did not make it'
        
        # Append row for each manager
        rows.append({
            'Manager_Name': manager_name,
            'Wins': wins,
            'Losses': losses,
            'Win_Loss_Record': win_loss_record,
            'Playoff_Result': playoff_result,
            'Playoff_Score': calculate_playoff_score(playoff_result),
            'Year': row['Year'],  # Include other columns as needed
            'Team': row['current'],
        })
    
    return pd.DataFrame(rows)
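
To see what these helpers do to a season with more than one manager, here is a small standalone sketch. It swaps in a regular expression for the `str.find` approach above, so it is an alternative rather than the notebook's own method. The helper name `split_managers` is hypothetical, and the sample string is modeled on the 2014 Arizona season; the exact separator used in `baseballdata.csv` is an assumption.

```python
import re

# Hypothetical helper: pull (name, wins, losses) triples out of a Managers string.
# Assumes each entry looks like "K.Gibson (63-96)".
MANAGER_PATTERN = re.compile(r"([^,()]+?)\s*\((\d+)-(\d+)\)")

def split_managers(field):
    entries = []
    for name, wins, losses in MANAGER_PATTERN.findall(field):
        # Remove a leading "and " left over from "A (1-2) and B (3-4)"
        name = re.sub(r"^\s*and\s+", "", name).strip()
        entries.append((name, int(wins), int(losses)))
    return entries

sample = "K.Gibson (63-96) and A.Trammell (1-2)"
parsed = split_managers(sample)
print(parsed)
```

Each tuple corresponds to one output row of `process_manager_row`, before the playoff result is attached.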


### Call helper functions on whole df while ignoring index

In [205]:
df = pd.concat(df.apply(process_manager_row, axis=1).tolist(), ignore_index=True)
print(df.head())

  Manager_Name  Wins  Losses Win_Loss_Record   Playoff_Result  Playoff_Score  \
0       C.Hale    69      93           69-93  Did not make it              0   
1       C.Hale    79      83           79-83  Did not make it              0   
2     K.Gibson    63      96           63-96  Did not make it              0   
3   A.Trammell     1       2             1-2  Did not make it              0   
4     K.Gibson    81      81           81-81  Did not make it              0   

   Year                  Team  
0  2016  Arizona Diamondbacks  
1  2015  Arizona Diamondbacks  
2  2014  Arizona Diamondbacks  
3  2014  Arizona Diamondbacks  
4  2013  Arizona Diamondbacks  


In [207]:
total_unique_managers = df['Manager_Name'].nunique()
print(f"Total number of unique managers: {total_unique_managers}")

Total number of unique managers: 575


Checking to ensure that functions worked properly and Manager_Names are correct

In [209]:
# Get unique manager names
unique_managers = df['Manager_Name'].unique()

# Create a new DataFrame with unique managers
manager_df = pd.DataFrame(unique_managers, columns=['Manager_Name'])

# Reset index if needed
manager_df.reset_index(drop=True, inplace=True)

# Print the new DataFrame
print(manager_df)

     Manager_Name
0          C.Hale
1        K.Gibson
2      A.Trammell
3         A.Hinch
4        B.Melvin
..            ...
570  R.Hartsfield
571    M.Williams
572    T.Runnells
573     J.Fanning
574       K.Kuehl

[575 rows x 1 columns]


### 1. Calculate first of 4 Metrics for predicting managerial success: Individual Win/Loss Records

In [211]:
# Calculate Aggregated Win-Loss Data for Each Manager
manager_stats = df.groupby('Manager_Name').agg({
    'Wins': 'mean',
    'Losses': 'mean'
}).reset_index()

manager_stats['Win_Percentage'] = manager_stats['Wins'] / (manager_stats['Wins'] + manager_stats['Losses'])

# Rename columns in manager_stats to avoid overlap
manager_stats.rename(columns={'Wins': 'Avg_Wins', 'Losses': 'Avg_Losses'}, inplace=True)

# Add Aggregated Data to manager_df
manager_df = pd.merge(manager_df, manager_stats[['Manager_Name', 'Avg_Wins', 'Avg_Losses', 'Win_Percentage']], on='Manager_Name', how='left')

# Print the updated manager_df
print(manager_df.head(20))

   Manager_Name   Avg_Wins  Avg_Losses  Win_Percentage
0        C.Hale  74.000000   88.000000        0.456790
1      K.Gibson  70.600000   75.000000        0.484890
2    A.Trammell  46.750000   75.500000        0.382413
3       A.Hinch  64.750000   69.250000        0.483209
4      B.Melvin  73.461538   73.461538        0.500000
5      B.Brenly  75.750000   65.500000        0.536283
6    A.Pedrique  22.000000   61.000000        0.265060
7   B.Showalter  79.388889   73.055556        0.520773
8    F.Gonzalez  71.000000   69.200000        0.506419
9     B.Snitker  59.000000   65.000000        0.475806
10        B.Cox  86.344828   69.000000        0.555827
11      R.Nixon  46.200000   69.400000        0.399654
12     C.Tanner  71.157895   72.684211        0.494694
13       E.Haas  50.000000   71.000000        0.413223
14       B.Wine  16.000000   25.000000        0.390244
15      J.Torre  80.206897   68.862069        0.538052
16    D.Bristol  59.727273   69.454545        0.462350
17     T.T

Joe McCarthy is often argued to be the best MLB manager of all time because of his career win percentage, the highest on record. The following check confirms that the win/loss records calculated from this dataset match those in ESPN's database.

In [213]:
# Find the row for Joe McCarthy
joe_mccarthy_stats = manager_df[manager_df['Manager_Name'] == 'J.McCarthy']

# Print Joe McCarthy's statistics
print(joe_mccarthy_stats)

    Manager_Name   Avg_Wins  Avg_Losses  Win_Percentage
117   J.McCarthy  88.541667   55.541667        0.614517
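
One detail of the Win_Percentage calculation is worth spelling out. Dividing average wins by average wins plus average losses gives the same number as pooling all of a manager's games, and it is generally not the same as averaging the per-season percentages. The sketch below checks this with K.Gibson's five season records as they appear in the expanded DataFrame; it uses plain Python rather than pandas for brevity.

```python
# K.Gibson's season records (wins, losses) from the expanded DataFrame
seasons = [(63, 96), (81, 81), (81, 81), (94, 68), (34, 49)]
n = len(seasons)

avg_wins = sum(w for w, _ in seasons) / n
avg_losses = sum(l for _, l in seasons) / n

# What the notebook computes: ratio of the two averages
ratio_of_means = avg_wins / (avg_wins + avg_losses)

# Pooled percentage: total wins over total decisions
pooled = sum(w for w, _ in seasons) / sum(w + l for w, l in seasons)

# A different quantity: the average of per-season percentages
mean_of_ratios = sum(w / (w + l) for w, l in seasons) / n

print(round(ratio_of_means, 6), round(pooled, 6), round(mean_of_ratios, 6))
```

The first two values agree with the 0.484890 printed for K.Gibson, while the third differs because short partial seasons count as much as full ones.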


### Identify all possible values in Playoff_Result before calculating second metric for managerial success 

In [217]:
unique_playoff_responses = df['Playoff_Result'].unique()

# Print the number of unique responses and the responses themselves
print(f"Number of unique playoff responses: {len(unique_playoff_responses)}")
print("Unique playoff responses:")
for response in unique_playoff_responses:
    print(response)

Number of unique playoff responses: 41
Unique playoff responses:
Did not make it
Lost LDS (3-2)
Lost NLCS (4-0)
Lost LDS (3-0)
Won WS (4-3)
Lost LDS (3-1)
Lost NLWC (1-0)
Lost NLCS (4-1)
Lost WS (4-0)
Lost NLCS (4-2)
Lost WS (4-2)
Won WS (4-2)
Lost WS (4-3)
Lost NLCS (3-0)
Won WS (4-0)
Won Series (5-0-1)
Lost ALWC (1-0)
Lost ALCS (4-0)
Lost ALCS (4-2)
Lost ALCS (4-1)
Won WS (4-1)
Lost ALCS (3-1)
Lost ALCS (3-2)
Lost WS (4-1)
Lost ALCS (4-3)
Won WS (5-3)
Lost NLCS (4-3)
Lost NLCS (3-2)
Won WS (4-0-1)
Tied in WS (3-3-1)
Lost WS (5-3)
Won WS (5-2)
Lost WS (4-0-1)
Lost ALCS (3-0)
Lost NLCS (3-1)
Lost WS (5-2)
Lost WS (6-3)
Won WS (6-3)
Won WS (6-4)
Lost WS (6-4)
Lost WS (10-5)


### Helper function for scoring postseason success

1. calculate_playoff_score: Analyzes how far each manager got in the playoffs every year, returning a score for their performance for each year

In [219]:
def calculate_playoff_score(playoff_result):
    if 'Lost ALWC' in playoff_result or 'Lost NLWC' in playoff_result:
        return 1
    elif 'Lost LDS' in playoff_result:
        return 2
    elif 'Lost ALCS' in playoff_result or 'Lost NLCS' in playoff_result:
        return 3
    elif 'Lost WS' in playoff_result or 'Tied in WS' in playoff_result:
        return 4
    elif 'Won WS' in playoff_result:
        return 5
    else: 
        return 0

test = calculate_playoff_score('Lost WS (6-4)')
print(test)

4
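
The same scoring can be written as an ordered lookup table, which makes it easy to spot result strings that match none of the rules. The sketch below is an equivalent restatement, not the notebook's code; `SCORE_RULES` and `score_playoff_result` are hypothetical names. Among the 41 labels listed above, 'Won Series (5-0-1)' matches no rule and so scores 0; the source does not say whether that is intended.

```python
# Ordered (substring, score) rules mirroring the if/elif chain above
SCORE_RULES = [
    ("Lost ALWC", 1), ("Lost NLWC", 1),
    ("Lost LDS", 2),
    ("Lost ALCS", 3), ("Lost NLCS", 3),
    ("Lost WS", 4), ("Tied in WS", 4),
    ("Won WS", 5),
]

def score_playoff_result(result):
    """Return the score of the first matching rule, or 0 if none match."""
    for needle, score in SCORE_RULES:
        if needle in result:
            return score
    return 0

labels = [
    "Did not make it", "Lost LDS (3-2)", "Won WS (4-3)",
    "Won Series (5-0-1)", "Tied in WS (3-3-1)",
]
scores = {label: score_playoff_result(label) for label in labels}

# Labels that describe a playoff appearance but still score 0
unmatched = [l for l in labels if scores[l] == 0 and l != "Did not make it"]
print(scores)
print(unmatched)
```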


In [223]:
# Apply the function to the Playoffs column
df['Playoff_Score'] = df['Playoff_Result'].apply(calculate_playoff_score)
print(df[['Playoff_Result', 'Playoff_Score']].head(10)) 

    Playoff_Result  Playoff_Score
0  Did not make it              0
1  Did not make it              0
2  Did not make it              0
3  Did not make it              0
4  Did not make it              0
5  Did not make it              0
6   Lost LDS (3-2)              2
7  Did not make it              0
8  Did not make it              0
9  Did not make it              0


### 2. Calculate second metric used for predicting managerial success: Total_Playoff_Score for each manager

In [225]:
# Aggregate the Playoff Scores for Each Manager
playoff_scores = df.groupby('Manager_Name')['Playoff_Score'].sum().reset_index()

# Rename the column for clarity
playoff_scores.rename(columns={'Playoff_Score': 'Total_Playoff_Score'}, inplace=True)

# Merge the aggregated playoff scores with manager_df
manager_df = pd.merge(manager_df, playoff_scores, on='Manager_Name', how='left')

# Print the updated manager_df with playoff scores
print(manager_df.head(20))

   Manager_Name   Avg_Wins  Avg_Losses  Win_Percentage  Total_Playoff_Score
0        C.Hale  74.000000   88.000000        0.456790                    0
1      K.Gibson  70.600000   75.000000        0.484890                    2
2    A.Trammell  46.750000   75.500000        0.382413                    0
3       A.Hinch  64.750000   69.250000        0.483209                    2
4      B.Melvin  73.461538   73.461538        0.500000                    8
5      B.Brenly  75.750000   65.500000        0.536283                    7
6    A.Pedrique  22.000000   61.000000        0.265060                    0
7   B.Showalter  79.388889   73.055556        0.520773                   10
8    F.Gonzalez  71.000000   69.200000        0.506419                    3
9     B.Snitker  59.000000   65.000000        0.475806                    0
10        B.Cox  86.344828   69.000000        0.555827                   48
11      R.Nixon  46.200000   69.400000        0.399654                    0
12     C.Tan

Checking who has the highest playoff score. Joe Torre is one of MLB's greatest managers and sits well within the top ten in various manager rankings.

In [227]:
# Find the manager with the highest playoff score
highest_playoff_score = manager_df['Total_Playoff_Score'].max()
highest_scoring_manager = manager_df[manager_df['Total_Playoff_Score'] == highest_playoff_score]

print(f"Highest Playoff Score: {highest_playoff_score}")
print("Manager(s) with the Highest Playoff Score:")
print(highest_scoring_manager[['Manager_Name', 'Total_Playoff_Score']])

Highest Playoff Score: 50
Manager(s) with the Highest Playoff Score:
   Manager_Name  Total_Playoff_Score
15      J.Torre                   50


### 3. Calculate third metric for predicting managerial success: Total number of World Series won by each Manager

In [229]:
# Initialize a column for Total Championships Won in manager_df
manager_df['Total_Championships_Won'] = 0

# Count one championship for every season whose result contains 'Won WS'.
# The score in parentheses, e.g. 'Won WS (4-2)', is games won-lost in the series, not a title count.
ws_titles = (
    df[df['Playoff_Result'].str.contains('Won WS', regex=False)]
    .groupby('Manager_Name')
    .size()
)

# Map the counts onto manager_df; managers without a title keep 0
manager_df['Total_Championships_Won'] = (
    manager_df['Manager_Name'].map(ws_titles).fillna(0).astype(int)
)

# Print the updated manager_df with Total Championships Won
print(manager_df[['Manager_Name', 'Total_Championships_Won']].head(20))

highest_championships = manager_df['Total_Championships_Won'].max()
highest_championship_manager = manager_df[manager_df['Total_Championships_Won'] == highest_championships]

print(f"Highest Total Championships Won: {highest_championships}")
print("Manager(s) with the Highest Total Championships Won:")
print(highest_championship_manager[['Manager_Name', 'Total_Championships_Won']])

   Manager_Name  Total_Championships_Won
0        C.Hale                        0
1      K.Gibson                        0
2    A.Trammell                        0
3       A.Hinch                        0
4      B.Melvin                        0
5      B.Brenly                        1
6    A.Pedrique                        0
7   B.Showalter                        0
8    F.Gonzalez                        0
9     B.Snitker                        0
10        B.Cox                        1
11      R.Nixon                        0
12     C.Tanner                        1
13       E.Haas                        0
14       B.Wine                        0
15      J.Torre                        4
16    D.Bristol                        0
17     T.Turner                        0
18     V.Benson                        0
19       C.King                        0
Highest Total Championships Won: 7
Manager(s) with the Highest Total Championships Won:
    Manager_Name  Total_Championships_Won
35     C.

Both Joe McCarthy and Casey Stengel are on ESPN's Top Ten MLB Managers of all time, each with 7 World Series titles. This agrees with the maximum of 7 found above and suggests the third metric was computed correctly.

In [239]:
print(df.head(20))

   Manager_Name  Wins  Losses Win_Loss_Record   Playoff_Result  Playoff_Score  \
0        C.Hale    69      93           69-93  Did not make it              0   
1        C.Hale    79      83           79-83  Did not make it              0   
2      K.Gibson    63      96           63-96  Did not make it              0   
3    A.Trammell     1       2             1-2  Did not make it              0   
4      K.Gibson    81      81           81-81  Did not make it              0   
5      K.Gibson    81      81           81-81  Did not make it              0   
6      K.Gibson    94      68           94-68   Lost LDS (3-2)              2   
7       A.Hinch    31      48           31-48  Did not make it              0   
8      K.Gibson    34      49           34-49  Did not make it              0   
9      B.Melvin    12      17           12-17  Did not make it              0   
10      A.Hinch    58      75           58-75  Did not make it              0   
11     B.Melvin    82      8

### 4. Calculate fourth and final metric for predicting managerial success: Manager Average in direct comparison to Team's historical average

In [243]:
# Calculate win percentages
df['Win_Percentage'] = df['Wins'] / (df['Wins'] + df['Losses'])

# Calculate the average win percentage for each team
team_averages = df.groupby('Team')['Win_Percentage'].mean().reset_index()
team_averages.columns = ['Team', 'Historical_Average_Win_Percentage']

# Merge the historical averages back into the main DataFrame
df = pd.merge(df, team_averages, on='Team', how='left')

# Calculate the relative performance metric
df['Relative_Performance'] = df['Win_Percentage'] - df['Historical_Average_Win_Percentage']

# Print the updated DataFrame, showing each manager with their own respective Relative_Performance 
print(df.head(20))

   Manager_Name  Wins  Losses Win_Loss_Record   Playoff_Result  Playoff_Score  \
0        C.Hale    69      93           69-93  Did not make it              0   
1        C.Hale    79      83           79-83  Did not make it              0   
2      K.Gibson    63      96           63-96  Did not make it              0   
3    A.Trammell     1       2             1-2  Did not make it              0   
4      K.Gibson    81      81           81-81  Did not make it              0   
5      K.Gibson    81      81           81-81  Did not make it              0   
6      K.Gibson    94      68           94-68   Lost LDS (3-2)              2   
7       A.Hinch    31      48           31-48  Did not make it              0   
8      K.Gibson    34      49           34-49  Did not make it              0   
9      B.Melvin    12      17           12-17  Did not make it              0   
10      A.Hinch    58      75           58-75  Did not make it              0   
11     B.Melvin    82      8

The average Relative_Performance for each manager is calculated and then merged into manager_df.

In [245]:
# Calculate the average relative performance for each manager
manager_relative_performance = df.groupby('Manager_Name')['Relative_Performance'].mean().reset_index()
manager_relative_performance.columns = ['Manager_Name', 'Average_Relative_Performance']

# Merge the average relative performance back into the manager_df
manager_df = pd.merge(manager_df, manager_relative_performance, on='Manager_Name', how='left')

# Print the updated manager_df
print(manager_df.head(20))

   Manager_Name   Avg_Wins  Avg_Losses  Win_Percentage  Total_Playoff_Score  \
0        C.Hale  74.000000   88.000000        0.456790                    0   
1      K.Gibson  70.600000   75.000000        0.484890                    2   
2    A.Trammell  46.750000   75.500000        0.382413                    0   
3       A.Hinch  64.750000   69.250000        0.483209                    2   
4      B.Melvin  73.461538   73.461538        0.500000                    8   
5      B.Brenly  75.750000   65.500000        0.536283                    7   
6    A.Pedrique  22.000000   61.000000        0.265060                    0   
7   B.Showalter  79.388889   73.055556        0.520773                   10   
8    F.Gonzalez  71.000000   69.200000        0.506419                    3   
9     B.Snitker  59.000000   65.000000        0.475806                    0   
10        B.Cox  86.344828   69.000000        0.555827                   48   
11      R.Nixon  46.200000   69.400000        0.3996
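
A toy example may make the fourth metric easier to follow. The managers, teams, and records below are invented for illustration only. The sketch uses `groupby(...).transform('mean')` in place of the separate groupby and merge, which is a stylistic substitution that yields the same per-row team average.

```python
import pandas as pd

# Invented data: managers A and B both managed team X, manager C is team Y's only manager
toy = pd.DataFrame({
    "Manager_Name": ["A", "A", "B", "C"],
    "Team": ["X", "X", "X", "Y"],
    "Wins": [90, 80, 60, 70],
    "Losses": [60, 70, 90, 80],
})
toy["Win_Percentage"] = toy["Wins"] / (toy["Wins"] + toy["Losses"])

# Broadcast each team's mean win percentage back onto its rows
toy["Historical_Average_Win_Percentage"] = (
    toy.groupby("Team")["Win_Percentage"].transform("mean")
)

# Positive values mean the season beat the team's historical average
toy["Relative_Performance"] = (
    toy["Win_Percentage"] - toy["Historical_Average_Win_Percentage"]
)

# One value per manager, as merged into manager_df above
avg_rel = toy.groupby("Manager_Name")["Relative_Performance"].mean()
print(avg_rel)
```

Two properties follow from the definition: a team's only manager always scores exactly 0, and each manager's own seasons are part of the average they are compared against.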

### Normalize Metrics for better Predictive Model using scaler

In [251]:
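# Express 'Average_Relative_Performance' in percentage points before normalizing.
# Min-max scaling is unaffected by a constant factor, so this step does not change the normalized values.
manager_df['Scaled_Relative_Performance'] = manager_df['Average_Relative_Performance'] * 100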

# Create a MinMaxScaler object
scaler = MinMaxScaler()

columns_to_normalize = ['Win_Percentage', 'Total_Playoff_Score', 'Total_Championships_Won', 'Scaled_Relative_Performance']

# Apply the scaler to the selected columns 
manager_df[columns_to_normalize] = scaler.fit_transform(manager_df[columns_to_normalize])

print(manager_df.head(20))

   Manager_Name   Avg_Wins  Avg_Losses  Win_Percentage  Total_Playoff_Score  \
0        C.Hale  74.000000   88.000000        0.456790                 0.00   
1      K.Gibson  70.600000   75.000000        0.484890                 0.04   
2    A.Trammell  46.750000   75.500000        0.382413                 0.00   
3       A.Hinch  64.750000   69.250000        0.483209                 0.04   
4      B.Melvin  73.461538   73.461538        0.500000                 0.16   
5      B.Brenly  75.750000   65.500000        0.536283                 0.14   
6    A.Pedrique  22.000000   61.000000        0.265060                 0.00   
7   B.Showalter  79.388889   73.055556        0.520773                 0.20   
8    F.Gonzalez  71.000000   69.200000        0.506419                 0.06   
9     B.Snitker  59.000000   65.000000        0.475806                 0.00   
10        B.Cox  86.344828   69.000000        0.555827                 0.96   
11      R.Nixon  46.200000   69.400000        0.3996
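
With its default settings, MinMaxScaler maps each column to the range 0 to 1 using (x - min) / (max - min). The plain-Python sketch below reproduces that formula (the helper `min_max` is hypothetical) and checks it against two values printed above: with a minimum of 0 and a maximum of 50, a Total_Playoff_Score of 2 becomes 0.04 and 48 becomes 0.96. It also shows that multiplying a column by 100 first leaves the result unchanged; the relative-performance values used for that check are made up.

```python
def min_max(values):
    """Rescale values linearly so the smallest is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# A few Total_Playoff_Score values from the notebook, including the min (0) and max (50)
playoff_scores = [0, 2, 8, 48, 50]
scaled_scores = min_max(playoff_scores)
print(scaled_scores)

# Made-up relative-performance values: scaling by 100 first makes no difference
raw = [-0.12, 0.0, 0.03, 0.08]
scaled_raw = min_max(raw)
scaled_x100 = min_max([v * 100 for v in raw])
same = all(abs(a - b) < 1e-9 for a, b in zip(scaled_raw, scaled_x100))
print(same)
```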

### Create weights for Composite Score to be used in Model

In [273]:
# Define weights for each metric
weights = {
    'Win_Percentage': 0.3,
    'Total_Playoff_Score': 0.2,
    'Total_Championships_Won': 0.3,
    'Average_Relative_Performance': 0.2
}

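# Calculate the composite score
# Note: the last term uses the raw 'Average_Relative_Performance' column,
# not the min-max-normalized 'Scaled_Relative_Performance' created above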
manager_df['Composite_Score'] = (
    weights['Win_Percentage'] * manager_df['Win_Percentage'] +
    weights['Total_Playoff_Score'] * manager_df['Total_Playoff_Score'] +
    weights['Total_Championships_Won'] * manager_df['Total_Championships_Won'] +
    weights['Average_Relative_Performance'] * manager_df['Average_Relative_Performance']
)

print(manager_df[['Manager_Name','Composite_Score']])

     Manager_Name  Composite_Score
0          C.Hale         0.134932
1        K.Gibson         0.155449
2      A.Trammell         0.088713
3         A.Hinch         0.150955
4        B.Melvin         0.186905
..            ...              ...
570  R.Hartsfield         0.071597
571    M.Williams         0.187421
572    T.Runnells         0.131581
573     J.Fanning         0.180099
574       K.Kuehl         0.071156

[575 rows x 2 columns]
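
The composite score is a plain weighted sum, so it can be checked by hand for a single manager. The sketch below uses C.Hale's printed values. His relative-performance value is not printed in full anywhere in the notebook, so it is back-solved from the printed composite score and should be read as an inference, not a reported number. The function name `composite_score` is hypothetical.

```python
# Weights copied from the notebook cell above
weights = {
    "Win_Percentage": 0.3,
    "Total_Playoff_Score": 0.2,
    "Total_Championships_Won": 0.3,
    "Average_Relative_Performance": 0.2,
}
total_weight = sum(weights.values())

def composite_score(metrics):
    """Weighted sum of the four metrics for one manager."""
    return sum(weights[name] * metrics[name] for name in weights)

# Printed for C.Hale: Win_Percentage 0.456790, playoff score 0, titles 0,
# Composite_Score 0.134932. Solve for the one term that is not printed.
known_part = weights["Win_Percentage"] * 0.456790
implied_relative_performance = (
    (0.134932 - known_part) / weights["Average_Relative_Performance"]
)

c_hale = {
    "Win_Percentage": 0.456790,
    "Total_Playoff_Score": 0.0,
    "Total_Championships_Won": 0.0,
    "Average_Relative_Performance": implied_relative_performance,
}
print(round(total_weight, 6))
print(round(implied_relative_performance, 4))
print(round(composite_score(c_hale), 6))
```

The implied value is slightly negative. A min-max-normalized column cannot be negative, so this is consistent with the composite using the unscaled Average_Relative_Performance, which gives that term much less influence than its 0.2 weight suggests.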


### Begin working with Predicting Model on Training and Test Data

In [275]:
# Prepare features and target
features = ['Win_Percentage', 'Total_Playoff_Score', 'Total_Championships_Won', 'Average_Relative_Performance']
target = 'Composite_Score'

# Split data into training and testing sets
X = manager_df[features]
y = manager_df[target]

# Use a portion of the data for training and a portion for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')

# Predict for all managers (this includes the managers the model was trained on)
manager_df['Predicted_Score'] = model.predict(X)

# Rank managers based on predicted score
manager_df['Rank'] = manager_df['Predicted_Score'].rank(ascending=False)

# Get the top 20 managers based on predictions
top_20_managers = manager_df.sort_values(by='Rank').head(20)
print(top_20_managers[['Manager_Name', 'Predicted_Score', 'Rank']])

Mean Squared Error: 2.2564515174311576e-05
    Manager_Name  Predicted_Score  Rank
117   J.McCarthy         0.617717   1.0
35     C.Stengel         0.596033   2.0
15       J.Torre         0.522616   3.0
192   T.La Russa         0.473969   4.0
352     W.Alston         0.471648   5.0
429       C.Mack         0.466592   6.0
493     J.McGraw         0.449836   7.0
10         B.Cox         0.408414   8.0
441      A.Cohen         0.404638   9.0
557     D.Wilber         0.402114  10.0
18      V.Benson         0.401310  11.0
466    B.Burwell         0.401174  12.0
414    M.Huggins         0.397595  13.0
295  D.Tracewski         0.397232  14.0
278       B.Falk         0.396627  16.0
273      J.White         0.396627  16.0
271     M.Harder         0.396627  16.0
227   S.Anderson         0.395929  18.0
354  C.Sukeforth         0.395237  19.0
484      B.Bochy         0.385954  20.0
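
One possible reading of the very small mean squared error is that Composite_Score is itself a fixed weighted sum of the four feature columns, so the model is approximating a function that is already known exactly. The sketch below illustrates the point on synthetic data, using ordinary least squares from NumPy instead of the notebook's random forest; the random feature values are stand-ins, and only the weights come from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the four metrics (200 hypothetical managers)
X = rng.random((200, 4))

# The notebook's composite weights, in feature order
w = np.array([0.3, 0.2, 0.3, 0.2])

# Target built exactly the way Composite_Score is: a weighted sum of the features
y = X @ w

# Ordinary least squares recovers the weights from the data alone
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = float(np.max(np.abs(X @ coef - y)))
print(np.round(coef, 6))
print(residual)
```

Under this reading, the predicted ranking is essentially the composite-score ranking, and the error mainly measures how closely the forest's piecewise-constant predictions track a linear function.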


## Compare Results of Model with Online MLB Rankings:

<div style="display: flex;">

<div style="flex: 1; padding: 10px;">
<h3>Predictive Model Rankings</h3>
<ol>
  <li>Joe McCarthy</li>
  <li>Casey Stengel</li>
  <li>Joe Torre</li>
  <li>Tony La Russa</li>
  <li>Walter Alston</li>
  <li>Connie Mack</li>
  <li>John McGraw</li>
  <li>Bobby Cox</li>
  <li>Andy Cohen</li>
  <li>Del Wilber</li>
</ol>
</div>

<div style="flex: 1; padding: 10px;">
<h3>Online MLB Manager Rankings</h3>
<ol>
  <li>Joe McCarthy</li>
  <li>John McGraw</li>
  <li>Walter Alston</li>
  <li>Sparky Anderson</li>
  <li>Tony La Russa</li>
  <li>Joe Torre</li>
  <li>Miller Huggins</li>
  <li>Casey Stengel</li>
  <li>Connie Mack</li>
  <li>Bobby Cox</li>
</ol>
</div>

</div>

Both rankings highlight the exceptional careers of Joe McCarthy, Joe Torre, Tony La Russa, Walter Alston, Connie Mack, John McGraw, Casey Stengel, and Bobby Cox, indicating a consensus on these managers' successes. However, the predictive model places Andy Cohen and Del Wilber in the top 10, while the online rankings include Sparky Anderson and Miller Huggins instead. This divergence could be due to different metrics and evaluation methods used by the predictive model, which may emphasize certain performance indicators over others.
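
The overlap between the two lists can be verified with a few lines of Python. The lists are copied from the comparison above; `rank_shift` is a hypothetical name for the difference in list position, where a negative value means the model ranks the manager higher than the online list does.

```python
model_top10 = [
    "Joe McCarthy", "Casey Stengel", "Joe Torre", "Tony La Russa", "Walter Alston",
    "Connie Mack", "John McGraw", "Bobby Cox", "Andy Cohen", "Del Wilber",
]
online_top10 = [
    "Joe McCarthy", "John McGraw", "Walter Alston", "Sparky Anderson", "Tony La Russa",
    "Joe Torre", "Miller Huggins", "Casey Stengel", "Connie Mack", "Bobby Cox",
]

# Managers present in both lists, and those unique to each
shared = [m for m in model_top10 if m in online_top10]
only_model = [m for m in model_top10 if m not in online_top10]
only_online = [m for m in online_top10 if m not in model_top10]

# Position in the model list minus position in the online list
rank_shift = {m: model_top10.index(m) - online_top10.index(m) for m in shared}

print(len(shared), only_model, only_online)
print(rank_shift)
```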

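One primary difference between the online MLB manager rankings and the predictive model's rankings is the inclusion of Average Relative Performance. This metric rewards managers who outperformed their team's historical average win percentage, so a manager who significantly improved a club can rank above one with a better raw record on a historically strong team. Because neither Win_Percentage nor Average_Relative_Performance is weighted by the number of games managed, managers with very short tenures can also score highly, which may account for some of the less familiar names in the model's top 20.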