# Hall Of Fame Predictor

The goal of this Jupyter Notebook is to gain more experience using machine learning models to predict which players are likely to be inducted into the Baseball Hall of Fame. The model will focus on players who played after 1960.

To begin, we will take the `HallOfFame` file from Sean Lahman's Baseball Databank, which provides the target column, `inducted`. We will drop all non-players and every ballot entry from before 1960 (`yearID` is the year of the vote).

To be eligible for the Hall of Fame, a player must have played in at least 10 Major League seasons and have been retired for at least 5 seasons, so we will remove all ineligible players.
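
The eligibility filter can be sketched as below, assuming a table shaped like Lahman's `Appearances` file (one row per player, team and season, with `playerID` and `yearID` columns). The helper `eligible_player_ids`, its parameters and the toy data are hypothetical, and the rule is simplified: any season with at least one appearance counts toward the ten, and being retired for five seasons is read as the last season falling at least six years before the ballot year.

```python
import pandas as pd


def eligible_player_ids(appearances: pd.DataFrame, ballot_year: int,
                        min_seasons: int = 10, years_retired: int = 5) -> set:
    """Return the playerIDs that pass a simplified Hall of Fame eligibility check.

    Hypothetical helper: expects 'playerID' and 'yearID' columns, one row per
    player/team/season as in Lahman's Appearances table.
    """
    careers = appearances.groupby('playerID')['yearID'].agg(
        seasons='nunique',  # distinct seasons, not rows
        last_season='max',  # most recent season played
    )
    # E.g. a player whose last season was 2018 sat out 2019-2023 and is eligible in 2024.
    retired = careers['last_season'] <= ballot_year - years_retired - 1
    long_career = careers['seasons'] >= min_seasons
    return set(careers[retired & long_career].index)


# Hypothetical toy data: aaa01 played 2000-2011, bbb01 2015-2024, ccc01 2005-2008.
toy = pd.DataFrame({
    'playerID': ['aaa01'] * 12 + ['bbb01'] * 10 + ['ccc01'] * 4,
    'yearID': list(range(2000, 2012)) + list(range(2015, 2025)) + list(range(2005, 2009)),
})
print(eligible_player_ids(toy, ballot_year=2024))  # {'aaa01'}
```

Counting distinct `yearID` values rather than rows matters here, because a player traded mid-season has one row per team in that year.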

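When thinking about this problem, it is important to note that there have been thousands of players, with only 273 inducted into the Hall of Fame. This leads to a heavy class imbalance: a model that always predicted "not inducted" would already be correct for the vast majority of players.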
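
One common way to account for this imbalance is to weight each class inversely to its frequency, using the "balanced" heuristic n_samples / (n_classes * count) that scikit-learn applies for `class_weight='balanced'`. The sketch below computes those weights by hand; `balanced_class_weights` is a hypothetical helper and the labels are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes (here, inducted players) get proportionally larger weights,
    so a weighted loss does not simply favour predicting 'not inducted'.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 1 inducted player out of 10 candidates.
weights = balanced_class_weights([0] * 9 + [1])
print(weights)  # {0: 0.5555555555555556, 1: 5.0}
```

With these weights the nine negatives and the single positive each contribute a total weight of 5, so both classes count equally in a weighted loss.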

### Data Preprocessing

In [18]:
import pandas as pd
import numpy as np

df_target = pd.read_csv('./Data/baseballdatabank-2023.1/contrib/HallOfFame.csv')

"""Setting up target data"""
# Drop ballots held before 1960 ('yearID' < 1960) and rows whose 'category' is not 'Player',
# then map 'inducted' values to 1 for 'Y' and 0 for 'N'
droppedAges = df_target[df_target['yearID'] < 1960].index
df_target.drop(droppedAges, inplace=True)

droppedCategories = df_target[df_target['category'] != 'Player'].index
df_target.drop(droppedCategories, inplace=True)

df_target['inducted'] = df_target['inducted'].map({'Y': 1, 'N': 0})

# Keep only 'playerID' and the 'inducted' target
df_target.drop(columns=['yearID', 'votedBy', 'ballots', 'needed', 'votes', 'category', 'needed_note'], inplace=True)

# Display the resulting target table
df_target


       playerID  inducted
1748  roushed01         0
1749   ricesa01         0
1750  rixeyep01         0
1751  grimebu01         0
1752  bottoji01         0
...         ...       ...
4316  crawfca02         0
4318  hodgegi01         1
4319   kaatji01         1
4320  minosmi01         1
4322  olivato01         1

[2516 rows x 2 columns]
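
The `HallOfFame` table has one row per candidate per ballot, so a player who appeared on several ballots shows up several times in `df_target` (the 2,516 rows above are ballot entries, and some players repeat). A minimal sketch for collapsing this to one row per player, assuming a player counts as inducted if any of his rows has `inducted == 1`; the toy `ballots` frame is hypothetical.

```python
import pandas as pd

# Hypothetical rows shaped like df_target: one row per ballot, so players repeat.
ballots = pd.DataFrame({
    'playerID': ['aaa01', 'aaa01', 'aaa01', 'bbb01', 'bbb01', 'ccc01'],
    'inducted': [0, 0, 1, 0, 0, 0],
})

# A player is labelled 1 if any of his ballot rows is 1, otherwise 0.
players = ballots.groupby('playerID', as_index=False)['inducted'].max()
print(players)
```

Here `aaa01`, elected on his third ballot, ends up with a single row labelled 1. On the real data the equivalent line would be `df_target.groupby('playerID', as_index=False)['inducted'].max()`.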


In [None]:
"""
Possible sets to include

AwardsPlayers
- count of Gold Gloves
- Rookie of the Year
- MVP
- Pitcher of the Year
- Cy Young
- World Series MVP
- TSN All-Star
- Rolaids Relief Man Award
- Babe Ruth Award
- Roberto Clemente Award
- ALCS/NLCS MVP
- TSN Player of the Year
- Silver Slugger
- TSN Fireman of the Year


AllstarFull
Appearances
BattingPost
Fielding
Pitching
PitchingPost


"""