# Predicting Batting statistics

## Goal
The main goal of this notebook is to understand machine learning.
It uses baseball statistics to try to predict a batter's offensive performance for the coming year based on past years' performance.

### Things to achieve
- Calculate wOBA
- Group records by player and year.
- Find outliers (injuries, team changes, etc.)
- Project wOBA using a player's past wOBA.
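
For the grouping step, a minimal sketch follows. It assumes the CSV uses the Lahman database layout, where a player who changed teams mid-season has one row per team (`stint`); the player ID and all counts below are made up. Counting stats are summed per `playerID` and `yearID` so that each season becomes a single row.

```python
import pandas as pd

# Made-up rows in a Lahman-style layout: the same (hypothetical) player has
# two stints in 2001 because he changed teams during that season.
rows = pd.DataFrame({
    "playerID": ["playerx01", "playerx01", "playerx01"],
    "yearID": [2001, 2001, 2002],
    "stint": [1, 2, 1],
    "AB": [200, 150, 400],
    "H": [50, 40, 110],
})

# Sum counting stats so each (player, season) pair becomes a single row.
season_totals = (
    rows.groupby(["playerID", "yearID"], as_index=False)[["AB", "H"]]
    .sum()
)
print(season_totals)
```

Rate stats such as wOBA should then be recomputed from the summed counts rather than averaged across stints, so that a stint with more plate appearances carries more weight.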

## Load Data from CSV File

```python
import pandas as pd

raw_data = pd.read_csv('data/Batting.csv')
```

## Baseball statistic to quantify offensive production - wOBA

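wOBA (weighted On-Base Average) is a sabermetric statistic that measures a hitter's overall offensive contribution per plate appearance. Instead of treating every way of reaching base equally, it weights each outcome (walk, single, double, and so on) by its approximate run value.
General formula: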
$$
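\text{wOBA} = \frac{\text{wBB} \times (\text{BB} - \text{IBB}) + \text{wHBP} \times \text{HBP} + \text{w1B} \times \text{1B} + \text{w2B} \times \text{2B} + \text{w3B} \times \text{3B} + \text{wHR} \times \text{HR}}{\text{AB} + \text{BB} - \text{IBB} + \text{SF} + \text{HBP}}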
$$
Where:
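- $\text{wBB}, \text{wHBP}, \text{w1B}, \text{w2B}, \text{w3B}, \text{wHR}$ are weights that change slightly every year.
- $\text{BB}$ = bases on balls (walks); only unintentional walks ($\text{BB} - \text{IBB}$) are weighted in the numerator
- $\text{IBB}$ = intentional walks
- $\text{HBP}$ = hit by pitch
- $\text{AB}$ = at bats
- $\text{SF}$ = sacrifice flies
- $\text{1B}$ = singles
- $\text{2B}$ = doubles
- $\text{3B}$ = triples
- $\text{HR}$ = home runs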
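
As a worked check of the formula, the function below computes wOBA for a single hypothetical stat line using the static prototype weights this notebook uses. The function name `woba` and the stat line are illustrative only.

```python
# Static prototype weights (the same values used in this notebook).
W_BB, W_HBP, W_1B, W_2B, W_3B, W_HR = 0.69, 0.72, 0.88, 1.247, 1.578, 2.031


def woba(ab, h, doubles, triples, hr, bb, ibb, hbp, sf):
    """wOBA for one stat line; only unintentional walks (BB - IBB) are weighted."""
    singles = h - doubles - triples - hr
    numerator = (W_BB * (bb - ibb) + W_HBP * hbp + W_1B * singles
                 + W_2B * doubles + W_3B * triples + W_HR * hr)
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator


# Hypothetical season: 500 AB, 150 H (30 2B, 3 3B, 25 HR), 60 BB (5 IBB), 5 HBP, 5 SF
print(round(woba(500, 150, 30, 3, 25, 60, 5, 5, 5), 3))  # → 0.381
```

Here the numerator is 215.429 and the denominator is 565.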


### Calculate Singles

Singles are not a column in the dataset.
To derive them, we take total hits (`H`) and subtract doubles (`2B`), triples (`3B`), and home runs (`HR`).

```python
work_df = raw_data.copy()

# Singles = hits that were not doubles, triples, or home runs
work_df['1B'] = work_df['H'] - work_df['2B'] - work_df['3B'] - work_df['HR']
work_df['1B'] = work_df['1B'].fillna(0).astype(int)
```
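
Because every hit is exactly one of a single, double, triple, or home run, the derived column can be sanity-checked: the four parts must add back up to `H`, and no row should have negative singles. The sketch below runs that check on a small made-up frame that reuses the dataset's column names; the same two assertions could be run against `work_df`.

```python
import pandas as pd

# Made-up rows using the same column names as Batting.csv.
df = pd.DataFrame({
    "H": [150, 0, 12],
    "2B": [30, 0, 2],
    "3B": [3, 0, 0],
    "HR": [25, 0, 1],
})

df["1B"] = df["H"] - df["2B"] - df["3B"] - df["HR"]

# The parts must add back up to total hits, and singles can never be negative.
assert (df["1B"] + df["2B"] + df["3B"] + df["HR"] == df["H"]).all()
assert (df["1B"] >= 0).all()
print(df["1B"].tolist())  # → [92, 0, 9]
```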

## wOBA weights
During the prototyping phase, the same static weights are used for every season.

To improve accuracy, yearly weights can be calculated from the dataset.
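
If season-specific weights become available, one way to apply them is to store them in a table keyed by season and join that table onto the batting rows. In the sketch below, `SEASON_WEIGHTS`, the `yearID` key, and every weight value are hypothetical placeholders (not real league weights), and only two of the six weights are shown to keep it short.

```python
import pandas as pd

# Hypothetical per-season weights: placeholder numbers, NOT real league values.
SEASON_WEIGHTS = {
    2001: {"wBB": 0.69, "w1B": 0.88},
    2002: {"wBB": 0.70, "w1B": 0.89},
}

# Turn the dict into a table with one row per season and a yearID column.
weights_df = (
    pd.DataFrame.from_dict(SEASON_WEIGHTS, orient="index")
    .rename_axis("yearID")
    .reset_index()
)

# Made-up batting rows.
stats = pd.DataFrame({"yearID": [2001, 2002, 2002],
                      "BB": [50, 60, 10],
                      "1B": [100, 90, 20]})

# Left join keeps every batting row and attaches that season's weights.
merged = stats.merge(weights_df, on="yearID", how="left")

# Partial numerator (two of the six wOBA terms), using each row's own season weights.
merged["partial_num"] = merged["wBB"] * merged["BB"] + merged["w1B"] * merged["1B"]
print(merged)
```

A season missing from the weights table would show up as `NaN` weights after the left join, which makes gaps easy to spot.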

```python
wBB = 0.69
wHBP = 0.72
w1B = 0.88
w2B = 1.247
w3B = 1.578
wHR = 2.031
```

## Calculate wOBA Numerator & Denominator

```python
# Fill missing counts with 0 (HBP, SF, and IBB can be blank for some seasons)
work_df['HBP'] = work_df['HBP'].fillna(0)
work_df['SF'] = work_df['SF'].fillna(0)
work_df['IBB'] = work_df['IBB'].fillna(0)

# Numerator: weighted sum of offensive outcomes (unintentional walks only)
work_df['wOBA_num'] = (
    wBB * (work_df['BB'] - work_df['IBB']) +
    wHBP * work_df['HBP'] +
    w1B * work_df['1B'] +
    w2B * work_df['2B'] +
    w3B * work_df['3B'] +
    wHR * work_df['HR']
)

# Denominator
work_df['wOBA_deno'] = work_df['AB'] + work_df['BB'] - work_df['IBB'] + work_df['HBP'] + work_df['SF']
```
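
pandas propagates `NaN` through arithmetic, so a single missing count makes that row's denominator, and therefore its wOBA, `NaN`. The sketch below shows this on made-up rows where the second row has no recorded `IBB`, as can happen for older seasons in Lahman-style data (an assumption about this file). Filling the gap with 0 is an approximation that treats unrecorded intentional walks as none.

```python
import numpy as np
import pandas as pd

# Made-up rows: the second mimics a season where IBB was not recorded.
df = pd.DataFrame({"AB": [500, 480], "BB": [60, 70], "IBB": [5, np.nan],
                   "HBP": [5, 3], "SF": [5, 4]})

# A single NaN term makes the whole sum NaN for that row.
raw_deno = df["AB"] + df["BB"] - df["IBB"] + df["HBP"] + df["SF"]
print(raw_deno.isna().tolist())  # → [False, True]

# Treating an unrecorded IBB count as 0 keeps the row usable (an approximation).
filled = df.fillna({"IBB": 0})
deno = filled["AB"] + filled["BB"] - filled["IBB"] + filled["HBP"] + filled["SF"]
print(deno.tolist())  # → [565.0, 557.0]
```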

### Calculate wOBA

```python
# A zero denominator gives 0 / 0, which pandas returns as NaN rather than raising
work_df['wOBA'] = (work_df['wOBA_num'] / work_df['wOBA_deno']).round(3)
```
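
A player with no at bats, unintentional walks, hit-by-pitches, or sacrifice flies has a zero denominator, and pandas returns `NaN` for 0 / 0 instead of raising an error. The sketch below, on made-up numbers, shows that behavior and one way to keep only rows with enough playing time. The `MIN_DENO` threshold of 100 is an arbitrary assumption, not a standard cutoff.

```python
import pandas as pd

# Made-up rows: the second player has a zero wOBA denominator.
df = pd.DataFrame({"wOBA_num": [215.429, 0.0], "wOBA_deno": [565, 0]})

# 0 / 0 yields NaN in pandas, and rounding leaves NaN unchanged.
df["wOBA"] = (df["wOBA_num"] / df["wOBA_deno"]).round(3)
print(df["wOBA"].tolist())  # → [0.381, nan]

# One option: keep only rows above a minimum denominator (hypothetical threshold).
MIN_DENO = 100
qualified = df[df["wOBA_deno"] >= MIN_DENO]
print(len(qualified))  # → 1
```

A minimum-playing-time filter like this also removes tiny samples whose wOBA swings wildly, which matters when past wOBA is used for projection.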