# 08 Assemble Training Data

- This notebook aggregates all of the processed data into a single training set.
- The training stub serves as the base dataframe, and the other prepped data is merged onto it.
- Data merged onto the stub:
    - Static fighter stats (including information from the fighter pages)
    - Historical fight averages

## Imports

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

## Functions

In [2]:
def check_nulls(df):
    """Return the null count for each column that contains at least one null."""
    return df.loc[:,df.isnull().sum()!=0].isnull().sum()
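
As a quick illustration of what `check_nulls` reports, here is a minimal sketch on a toy frame. The frame and its `f1_reach` column are hypothetical and are not taken from the real training data. The function body is an equivalent restatement of the one above.

```python
import numpy as np
import pandas as pd

def check_nulls(df):
    """Return the null count for each column that contains at least one null."""
    null_counts = df.isnull().sum()
    return null_counts[null_counts != 0]

# Hypothetical toy frame, not the real training data
toy = pd.DataFrame({
    "fightid": [1, 2, 3],                   # no nulls, so it is omitted from the result
    "f1_outcome": ["Win", np.nan, "Loss"],  # one null
    "f1_reach": [70.0, np.nan, np.nan],     # two nulls (hypothetical column)
})

null_summary = check_nulls(toy)
print(null_summary)
```

Columns without nulls are left out entirely, so an empty result means the frame is complete.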

## Pull in Training Stub

In [3]:
train = pd.read_csv('../../02_Data/02_Processed_Data/train_stub.csv', index_col=0)

## Get Static Fighter Stats

In [4]:
fighter_static_stats = pd.read_csv('../../02_Data/02_Processed_Data/fighter_static_stats.csv', index_col=0)

## Get Historical Fight Averages

In [5]:
hist_avg = pd.read_csv('../../02_Data/02_Processed_Data/historical_avgs.csv', index_col=0)

## Append static fighter stats to stub

In [6]:
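# Prep dataframe for merge for fighter 1
f1_to_merge = fighter_static_stats.copy()
# Prefix every column except eventid with 'f1_'
# (the positional assignment assumes eventid is the first column)
f1_to_merge.columns = ['eventid']+['f1_'+ col for col in f1_to_merge.columns if 'eventid' not in col]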

# merge
train = train.merge(f1_to_merge, how='left', on=['eventid','f1_fighterid'])

In [7]:
# Prep dataframe for merge for fighter 2
f2_to_merge = fighter_static_stats.copy()
f2_to_merge.columns = ['eventid']+['f2_'+ col for col in f2_to_merge.columns if 'eventid' not in col]

# merge
train = train.merge(f2_to_merge, how='left', on=['eventid','f2_fighterid'])
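
The two cells above rename columns by position, which depends on `eventid` being the first column. A minimal sketch of an order-independent alternative follows. It assumes the static stats carry a `fighterid` column, and the helper `prefix_for_merge`, the toy frames, and the `height` column are hypothetical. `validate="many_to_one"` asks pandas to raise if the stats table has duplicate keys, which would otherwise silently duplicate stub rows.

```python
import pandas as pd

def prefix_for_merge(stats, prefix, keep=("eventid",)):
    """Prefix every column except the shared keys, regardless of column order."""
    return stats.rename(columns=lambda c: c if c in keep else f"{prefix}{c}")

# Hypothetical toy data, not the real files
stub = pd.DataFrame({
    "eventid": [10, 10],
    "fightid": [100, 101],
    "f1_fighterid": [1, 3],
    "f2_fighterid": [2, 4],
})
static = pd.DataFrame({
    "fighterid": [1, 2, 3, 4],      # note: eventid is deliberately not first
    "eventid": [10, 10, 10, 10],
    "height": [70, 72, 68, 74],     # hypothetical stat column
})

merged = stub
for prefix in ("f1_", "f2_"):
    merged = merged.merge(
        prefix_for_merge(static, prefix),
        how="left",
        on=["eventid", f"{prefix}fighterid"],
        validate="many_to_one",     # raises MergeError on duplicate right-hand keys
    )

print(merged)
```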

## Append historical fight averages to stub

In [8]:
# Prep dataframe for merge for fighter 1
hist_avg_f1 = hist_avg.copy()

# First prefix all fighter 1 data columns with 'f1_'
hist_avg_f1.columns = ['eventid','fightid','f1_fighterid','date'] + \
                    ['f1_' + col for col in hist_avg_f1.columns if col not in \
                    ['eventid','fightid','f1_fighterid','date']]

# Merge fighter 1 expanding means
# (inner merge: stub rows with no matching historical averages are dropped)
train = train.merge(hist_avg_f1, how='inner', on=['eventid','fightid','f1_fighterid'])

# Prep dataframe for merge for fighter 2
hist_avg_f2 = hist_avg.drop(columns=['date']).copy()
hist_avg_f2.columns = ['eventid','fightid','f2_fighterid'] + \
                    ['f2_' + col for col in hist_avg_f2.columns if col not in \
                    ['eventid','fightid','f1_fighterid']]

# Merge for fighter 2
train = train.merge(hist_avg_f2, how='inner', on=['eventid','fightid','f2_fighterid'])
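
Because both historical-average merges use `how='inner'`, any stub row without a match disappears without warning. One way to audit this, sketched below on hypothetical toy data, is to run a left merge with `indicator=True` first and inspect the rows flagged `left_only`. The `f1_sig_strikes_avg` column is an invented name for illustration.

```python
import pandas as pd

# Hypothetical toy data, not the real files
stub = pd.DataFrame({
    "eventid": [10, 10, 11],
    "fightid": [100, 101, 102],
    "f1_fighterid": [1, 3, 5],
})
hist_f1 = pd.DataFrame({
    "eventid": [10, 10],
    "fightid": [100, 101],
    "f1_fighterid": [1, 3],
    "f1_sig_strikes_avg": [31.5, 18.0],   # hypothetical stat column
})

keys = ["eventid", "fightid", "f1_fighterid"]

# A left merge with indicator=True adds a _merge column marking each row's origin
audit = stub.merge(hist_f1, how="left", on=keys, indicator=True)
unmatched = audit.loc[audit["_merge"] == "left_only", keys]

# The inner merge keeps only the matched rows
inner = stub.merge(hist_f1, how="inner", on=keys)

print(f"{len(stub) - len(inner)} stub row(s) would be dropped by the inner merge")
print(unmatched)
```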

In [9]:
check_nulls(train)

f1_outcome    34
dtype: int64

There are 34 rows without an outcome. They are all draws.

In [10]:
train.loc[train.f1_outcome.isnull(),'f1_outcome'] = 'Draw'

In [11]:
train.f1_outcome.value_counts()

Win     1590
Loss    1590
Draw      34
Name: f1_outcome, dtype: int64
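
The fill step can also be written with `fillna`. The sketch below uses a hypothetical toy column rather than the real training data, and counts nulls before the fill with `value_counts(dropna=False)` so the number of relabelled rows is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy outcomes, not the real training data
toy = pd.DataFrame({"f1_outcome": ["Win", "Loss", np.nan, "Win", "Loss"]})

# dropna=False keeps the NaN bucket in the tally
before = toy["f1_outcome"].value_counts(dropna=False)

# Label the missing outcomes as draws
toy["f1_outcome"] = toy["f1_outcome"].fillna("Draw")
counts = toy["f1_outcome"].value_counts()

print(before)
print(counts)
```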

## Export the training data set

In [12]:
train.to_csv('../../02_Data/02_Processed_Data/train.csv')
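
Since `to_csv` writes the index as an unnamed first column by default, a later notebook needs `index_col=0` to read the file back cleanly, as the load cells above do. A minimal round-trip sketch, using an in-memory buffer and hypothetical toy data in place of the real file:

```python
import io
import pandas as pd

# Hypothetical toy frame standing in for the training set
toy = pd.DataFrame({"fightid": [100, 101], "f1_outcome": ["Win", "Draw"]})

# Write to an in-memory buffer instead of disk
buffer = io.StringIO()
toy.to_csv(buffer)

# Reading with index_col=0 restores the original columns
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

# Reading without it leaves the saved index behind as an extra column
buffer.seek(0)
plain = pd.read_csv(buffer)

print(list(restored.columns))
print(list(plain.columns))
```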