# Data Modeling - [Your Project Name Here]

## Local Code Imports - Do not delete

In [3]:
# DO NOT REMOVE THESE
%load_ext autoreload
%autoreload 2

In [4]:
# DO NOT REMOVE This
%reload_ext autoreload

In [5]:
## DO NOT REMOVE
## import local src module -
## src in this project will contain all your local code
## clean_data.py, model.py, visualize.py, custom.py
from src import make_data as mk
from src import visualize as viz
from src import model as mdl
from src import pandas_operators as po

def test_src():
    mk.test_make_data()
    viz.test_viz()
    mdl.test_model()
    po.test_pandas()
    
    return 1

In [6]:
test_src()

In make_data
In Visualize
In Model
In pandas ops


1

## Code Imports

In [158]:
# For Dataframes and arrays
import numpy as np
import pandas as pd
# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Train:Test split
from sklearn.model_selection import train_test_split

# Scaling
from sklearn.preprocessing import StandardScaler

# Modelling
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Neural Network
import tensorflow as tf
import keras
from keras.layers import Dense, Dropout, Activation, LeakyReLU
from keras.models import Sequential
from keras.optimizers import SGD

# Set random seeds
np.random.seed(123)
tf.set_random_seed(123)

# Project Overview

### Try out different models, as no model is always best
For this project we are classifying.  There are many choices of classification algorithm, each with its own strengths and weaknesses.  There is no single classifier that always works best across all scenarios so we will compare a handful of different learning algorithms to select the best model for our particular problem.  

We have decided not to use the perceptron algorithm because our data set is not perfectly linearly separable, and so the algorithm will never converge.  

### Why use KNN
KNN is a instance-based learning type of nonparamteric model.  It memorizes the training dataset and adapts immediately as we collect new training data.  

The downside of KNN is that the computational complexity for classifying new samples grows linearly with the number of samples in the training dataset.  i.e. with every fight that occurs, and updates the model, the model becomes slower and slower to run.  Our dataset is relatively very small so we are able to use this model

In [None]:
knn = KNeighborsClassifier(n_neighbors=5, 
                           p=2, 
                           metric='minkowski')
knn.fit(X_train_std, y_train)

Below we represent what KNN is doing, using the age of fighters and the win percentage leading up to the fight.  

In [None]:
# Plot the division of above
X_age_winpct = data['age', 'win_pct']  # choose 2 x columns to show

plot_decision_regions(X_age_winpct, y_combined, classifier=knn, test_idx=range(105, 150))
# Change the test_idx to whatever subset you want to show

plt.xlabel('Age of Fighter (years)')
plt.ylabel('Win Percentage of fighter before the fight')
plt.legend(loc='upper left')
plt.tight_layout()

Choosing the right number of neighbors (k) is critical to avoid over and underfitting our model.  
We are using a Minkowski distance, which requires our distances to be standardized.  
We do not have to regularize our data but we should use feature selection and dimensionality reduction techniques to avoid the "curse of dimensionality" (which would cause our model to overfit)

### Why we use a sigmoid function
Our goal is to predict the probability that a certain sample belongs to a particular class, given its features.  In other words the probability that a certain matchup of two fighters will result in a win for fighter 1.  

For this reason, we will use the logistic sigmoid function (abbreviated to sigmoid function) as our activation function.   The sigmoid function takes in any real number value and outputs a value between 0 and 1, which will represent the probability that we are after.  

### Using OvR Logistic Regression for multi-class classification
Logistic regression models only really work for binary classification tasks, limiting us to only predicting win or loss for a fighter in a fight.  There is however one-vs-rest (also known as one-vs-all) logistic regression (OvR) which supports multi-class classification.  Scikit-learn enables us to use OvR logistic regression so we can essentially predict a win, loss or draw (3 classes) for each fighter within a fight.

In [None]:
# Initiate the Logistic Regression Model
lr = LogisticRegression(C=100.0, random_state=1)
    # We set the C argument as 100 here.  a lower value will cause an increase in the regularization strength

# Fit the Logistic Regression Model    
lr.fit(X_train_std, y_train)

# Show the predicted class for each observation
lr.predict(X_test_std[:3, :])

### Regularization to help with overfitting and underfitting our model
Overfitting is one issue we face when using machine learning models.  This is where our model performs very well with the training data that we provide, but is unable to perform well on the test data.  Overfitting can be caused by a number of factors including having too many parameters.  This would then lead to a model that is too complex for the underlying data.  
There exists a trade off between overfitting and underfitting our model.  If a model is overfitting, the model is said to have high variance, and if the model is underfitting it is said to have high bias.  This bias-variance tradeoff can be dealt with by regularization.  Regularization will reduce the complexity of the model by accounting for high correlation between features (collinearity) and filtering out noise from our data. 

Regularization works by penalizing any extreme weights that we have.  One of the most common methods of regularization is L2 regularization (also known as Ridge Regression).  With the code example above, the "C" argument allows us to regularize the weights.

### Backpropogation
A benefit of using logitic regression is that the resulting logistic cost function is convex (U-shaped).  This makes it very easy to find the global cost minimum.  When we incorporate a logistic activation function into a multi-layer neural network however this U shape becomes more uneven, resulting in several local minima.  These local minima can "trap" our optimization algorithm, i.e. prevent our model from reaching the global minimum.  To help improve our model we can take advantage of backpropagation.  This will help us to reach a more satisfactory local minimum that yields powerful enough results (high accuracy in the case of this project).    


# Data Modeling

In [31]:
# Show the contents of the processed data folder
!ls ../data/processed/

bouts_cleaned    combined         fighters_cleaned


In [133]:
data = pd.read_csv('../data/processed/combined')

In [134]:
data.head(3)

Unnamed: 0,date,location,fighter1,fighter2,winner_is_fighter1,title_fight,method_DEC,method_DQ,method_KO/TKO,method_SUB,...,fighter2_dob,fighter2_age_today,fighter2_slpm,fighter2_str_acc,fighter2_sapm,fighter2_str_def,fighter2_td_avg,fighter2_td_acc,fighter2_td_def,fighter2_sub_avg
0,2018-11-17,Argentina,Santiago Ponzinibbio,Neil Magny,1.0,0.0,0.0,0.0,1.0,0.0,...,1987-08-03,32.0,3.86,46.0,2.22,56.0,2.62,46.0,60.0,0.3
1,2018-11-17,Argentina,Ricardo Lamas,Darren Elkins,1.0,0.0,0.0,0.0,1.0,0.0,...,1984-05-16,35.0,3.36,37.0,2.83,53.0,2.68,35.0,57.0,1.3
2,2018-11-17,Argentina,Khalil Rountree Jr.,Johnny Walker,0.0,0.0,0.0,0.0,1.0,0.0,...,1992-03-30,27.0,5.37,70.0,3.36,25.0,0.89,100.0,100.0,2.6


## Use shortened data until 227 values fixed and 226 

In [135]:
data = data[:-226]

In [136]:
data.isna().sum()[:10]

date                  226
location              226
fighter1              226
fighter2              226
winner_is_fighter1    226
title_fight           226
method_DEC            226
method_DQ             226
method_KO/TKO         226
method_SUB            226
dtype: int64

In [137]:
data.isnull().values.any()

True

In [138]:
data = data[~data.winner_is_fighter1.isna()]

## Train:Test Split

In [139]:
data_list = data.columns.tolist()
data_list

['date',
 'location',
 'fighter1',
 'fighter2',
 'winner_is_fighter1',
 'title_fight',
 'method_DEC',
 'method_DQ',
 'method_KO/TKO',
 'method_SUB',
 'fighter1_win',
 'fighter1_lose',
 'fighter1_draw',
 'fighter1_total_bouts',
 'fighter1_win_rate',
 'fighter1_height_inches',
 'fighter1_reach',
 'fighter1_stance',
 'fighter1_dob',
 'fighter1_age_today',
 'fighter1_slpm',
 'fighter1_str_acc',
 'fighter1_sapm',
 'fighter1_str_def',
 'fighter1_td_avg',
 'fighter1_td_acc',
 'fighter1_td_def',
 'fighter1_sub_avg',
 'fighter2_win',
 'fighter2_lose',
 'fighter2_draw',
 'fighter2_total_bouts',
 'fighter2_win_rate',
 'fighter2_height_inches',
 'fighter2_reach',
 'fighter2_stance',
 'fighter2_dob',
 'fighter2_age_today',
 'fighter2_slpm',
 'fighter2_str_acc',
 'fighter2_sapm',
 'fighter2_str_def',
 'fighter2_td_avg',
 'fighter2_td_acc',
 'fighter2_td_def',
 'fighter2_sub_avg']

In [140]:
data.head(3)

Unnamed: 0,date,location,fighter1,fighter2,winner_is_fighter1,title_fight,method_DEC,method_DQ,method_KO/TKO,method_SUB,...,fighter2_dob,fighter2_age_today,fighter2_slpm,fighter2_str_acc,fighter2_sapm,fighter2_str_def,fighter2_td_avg,fighter2_td_acc,fighter2_td_def,fighter2_sub_avg
0,2018-11-17,Argentina,Santiago Ponzinibbio,Neil Magny,1.0,0.0,0.0,0.0,1.0,0.0,...,1987-08-03,32.0,3.86,46.0,2.22,56.0,2.62,46.0,60.0,0.3
1,2018-11-17,Argentina,Ricardo Lamas,Darren Elkins,1.0,0.0,0.0,0.0,1.0,0.0,...,1984-05-16,35.0,3.36,37.0,2.83,53.0,2.68,35.0,57.0,1.3
2,2018-11-17,Argentina,Khalil Rountree Jr.,Johnny Walker,0.0,0.0,0.0,0.0,1.0,0.0,...,1992-03-30,27.0,5.37,70.0,3.36,25.0,0.89,100.0,100.0,2.6


In [141]:
categories = ['title_fight', 'method_DEC', 'method_DQ', 'method_KO/TKO', 'method_SUB',
 'fighter1_win', 'fighter1_lose', 'fighter1_draw', 'fighter1_total_bouts', 'fighter1_win_rate', 'fighter1_height_inches',
 'fighter1_reach', 'fighter1_age_today', 
 'fighter1_slpm', 'fighter1_str_acc', 'fighter1_sapm', 'fighter1_str_def',
 'fighter1_td_avg', 'fighter1_td_acc', 'fighter1_td_def','fighter1_sub_avg',
 'fighter2_win', 'fighter2_lose', 'fighter2_draw', 'fighter2_total_bouts', 'fighter2_win_rate', 'fighter2_height_inches',
 'fighter2_reach', 'fighter2_age_today',
 'fighter2_slpm', 'fighter2_str_acc', 'fighter2_sapm', 'fighter2_str_def',
 'fighter2_td_avg', 'fighter2_td_acc', 'fighter2_td_def', 'fighter2_sub_avg']

X = data[categories]
y = data['winner_is_fighter1']

In [142]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
# we use a test_size of 0.3, i.e. 30% of the data has been held out, and we will use 70% of the data to train our model
# we set the random_state so that our results are reproducable
# we stratify so that we maintain the proportion of class labels, i.e. the same proportion of red wins and blue wins

print('X_Train Rows: {}, Columns: {}'.format(X_train.shape[0], X_train.shape[1]))
print('X_Test Rows: {}, Columns: {}'.format(X_test.shape[0], X_test.shape[1]))

X_Train Rows: 3040, Columns: 37
X_Test Rows: 1304, Columns: 37


### Scaling

Many of the machine learning and optimization algorithms that we will be using require feature scaling in order to optimize performance.  We will standardize the features using StandardScaler from scikit-learn's preprocessing module.

In [143]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3040 entries, 2477 to 2270
Data columns (total 37 columns):
title_fight               3040 non-null float64
method_DEC                3040 non-null float64
method_DQ                 3040 non-null float64
method_KO/TKO             3040 non-null float64
method_SUB                3040 non-null float64
fighter1_win              3040 non-null float64
fighter1_lose             3040 non-null float64
fighter1_draw             3040 non-null float64
fighter1_total_bouts      3040 non-null float64
fighter1_win_rate         3040 non-null float64
fighter1_height_inches    3040 non-null float64
fighter1_reach            3040 non-null float64
fighter1_age_today        3023 non-null float64
fighter1_slpm             3040 non-null float64
fighter1_str_acc          3040 non-null float64
fighter1_sapm             3040 non-null float64
fighter1_str_def          3040 non-null float64
fighter1_td_avg           3040 non-null float64
fighter1_td_acc           

In [144]:
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)

We will use the same scaling parameters to standardize the test set, so that the values in the training and test dataset are comparable to each other.

In [147]:
X_test_std = sc.transform(X_test)

### Mean Centering and Normalization

In [None]:
mean_vals = np.mean(X_train, axis=0)
std_val = np.std(X_train)
    # Centered
X_train_centered = (X_train - mean_vals) / std_val
X_test_centered  = (X_test  - mean_vals) / std_val
    # No longer nead the X_train or X_test
del X_train, X_test
print('X_train shape: {} \n y_train shape: {}'.format((X_train_centered.shape, y_train.shape))
print('X_testshape: {} \n y_test shape: {}'.format((X_test_centered.shape, y_test.shape))

## One-Hot Encoding for Class Label

In [None]:
y_train_onehot = keras.utils.to_categorical(y_train)
print('First 3 labels: ', y_train[:3])
print('\nFirst 3 labels (one-hot): \n', y_train_onehot[:3])

## Implementing Neural Network with tanh

In [153]:
model_tanh = Sequential()

In [165]:
# Input Layer
model_tanh.add(Dense(64, input_dim=X_train_std.shape[1], kernel_initializer = 'glorot_uniform', 
                bias_initializer = 'zeros', activation = 'tanh'))

# Hidden Layer 1
model_tanh.add(Dense(64, input_dim = 64, kernel_initializer = 'glorot_uniform',
                bias_initializer = 'zeros', activation = 'tanh'))
# model_tanh.add(Drouput(0.5, seed=123))

# Hidden Layer 2
model_tanh.add(Dense(units = X_train.shape[1], input_dim = 64, kernel_initializer = 'glorot_uniform', 
                bias_initializer = 'zeros', activation = 'softmax'))
# model_tanh.add(Drouput(0.5, seed=123))

# Output Layer
sgd_optimizer = SGD(lr=0.001, decay=1e-7, momentum=0.9) #learn rate, weight decay constant, momentum learning

There is a tradeoff when we are finding an appropriate learning rate.  
- Too large and the alorithm may overshoot the global cost minimum.  
- Too small and the algorithm requires far more epochs until convergence, which results in unecessary computational energy

In [168]:
model_tanh.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

model_tanh.fit(X_train, y_train, epochs=20, batch_size=128)
score = model_tanh.evaluate(x_test, y_test, batch_size=128)

ValueError: Error when checking target: expected dense_15 to have shape (37,) but got array with shape (1,)

## Implementing Neural Network with leaky ReLU

In [169]:
model_lrelu = Sequential()

In [174]:
# Input Layer
model_lrelu.add(Dense(64, input_dim=X_train_std.shape[1], kernel_initializer = 'glorot_uniform', 
                bias_initializer = 'zeros'))
model_lrelu.add(LeakyReLU(alpha=0.01))

# Hidden Layer 1
model_lrelu.add(Dense(64, input_dim = 64, kernel_initializer = 'glorot_uniform', bias_initializer = 'zeros'))
model_lrelu.add(LeakyReLU(alpha=0.01))
# model_lrelu.add(Drouput(0.5, seed=123))

# Hidden Layer 2
# model_lrelu.add(Dense(units = y_train.shape[1], input_dim = 64, kernel_initializer = 'glorot_uniform', 
#                 bias_initializer = 'zeros', activation = 'softmax'))
# model_lrelu.add(LeakyReLU(alpha=0.01))
# model_lrelu.add(Drouput(0.5, seed=123))

# Output Layer
sgd_optimizer = SGD(lr=0.001, decay=1e-7, momentum=0.9) #learn rate, weight decay constant, momentum learning

In [177]:
model_lrelu.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

model_lrelu.fit(X_train, y_train, epochs=20, batch_size=128)
score = model_lrelu.evaluate(X_test, y_test, batch_size=128)

ValueError: Error when checking target: expected leaky_re_lu_8 to have shape (64,) but got array with shape (1,)

## Fitting the Model

In [182]:
history = model_tanh.fit(X_train_std, y_train,
                    batch_size=64, epochs=50,
                    verbose=1,
                    validation_split=0.1)  # verbose lets us follow optimization of cost function, validation split helps monitor if model is overfitting in training


ValueError: Error when checking target: expected dense_15 to have shape (37,) but got array with shape (1,)

In [183]:
history = model_lrelu.fit(X_train_std, y_train,
                    batch_size=64, epochs=50,
                    verbose=1,
                    validation_split=0.1)  # verbose lets us follow optimization of cost function, validation split helps monitor if model is overfitting in training


ValueError: Error when checking target: expected leaky_re_lu_8 to have shape (64,) but got array with shape (1,)

## Predicting Class Labels

In [185]:
y_train_pred = model_tanh.predict_classes(X_train_std, verbose=0)
print('First 3 predictions: ', y_train_pred[:3])

First 3 predictions:  [12 24 23]


In [186]:
y_train_pred = model_lrelu.predict_classes(X_train_std, verbose=0)
print('First 3 predictions: ', y_train_pred[:3])

First 3 predictions:  [34 34 34]


## Model Accuracy

### Tanh Accuracy

In [188]:
y_train_pred = model_tanh.predict_classes(X_train_std, verbose=0)
print('First 3 predictions: ', y_train_pred[:3])

# Training Accuracy
correct_preds = np.sum(y_train == y_train_pred, axis=0)
train_acc = correct_preds / y_train.shape[0]
print('Training Accuracy: {}%'.format(train_acc * 100))

First 3 predictions:  [12 24 23]
Training Accuracy: 2.3355263157894735%


In [190]:
# Testing Accuracy
y_test_pred = model_tanh.predict_classes(X_test_std, verbose=0)
correct_preds = np.sum(y_test == y_test_pred, axis=0)
test_acc = correct_preds / y_test.shape[0]
print('Test Accuracy: {}%'.format(test_acc * 100))

Test Accuracy: 2.147239263803681%


### Leaky RELU Accuracy

In [192]:
y_train_pred = model_lrelu.predict_classes(X_train_std, verbose=0)
print('First 3 predictions: ', y_train_pred[:3])

# Training Accuracy
correct_preds = np.sum(y_train == y_train_pred, axis=0)
train_acc = correct_preds / y_train.shape[0]
print('Training Accuracy: {}%'.format(train_acc * 100))

First 3 predictions:  [34 34 34]
Training Accuracy: 0.5263157894736842%


In [191]:
# Testing Accuracy
y_test_pred = model_lrelu.predict_classes(X_test_std, verbose=0)
correct_preds = np.sum(y_test == y_test_pred, axis=0)
test_acc = correct_preds / y_test.shape[0]
print('Test Accuracy: {}%'.format(test_acc * 100))

Test Accuracy: 0.5368098159509203%
