## Playoff Prediction Machine Learning

Training a model to predict whether an MLB team will make the playoffs, based on its end-of-season team-level statistics.

In [1]:
# Dependencies

import pandas as pd
import numpy as np

from collections import Counter

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from keras.utils import to_categorical

from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from imblearn.keras import BalancedBatchGenerator
from imblearn.under_sampling import RandomUnderSampler, NearMiss

Using TensorFlow backend.


In [2]:
# Read in csv to pandas

team_df = pd.read_csv('../assets/data/mlb_stats.csv')
print(team_df.head())
print(len(team_df))
print(team_df.columns)

  franchid                name     city state                        curname  \
0      ANA  Los Angeles Angels  Anaheim    CA  Los Angeles Angels of Anaheim   
1      ANA  Los Angeles Angels  Anaheim    CA  Los Angeles Angels of Anaheim   
2      ANA  Los Angeles Angels  Anaheim    CA  Los Angeles Angels of Anaheim   
3      ANA  Los Angeles Angels  Anaheim    CA  Los Angeles Angels of Anaheim   
4      ANA   California Angels  Anaheim    CA  Los Angeles Angels of Anaheim   

  lgid  Year    G   w   l  ...  sho  sv ipouts    ha  hra  bba  soa    e   dp  \
0   AL  1961  162  70  91  ...    5  34   4314  1391  180  713  973  192  154   
1   AL  1962  162  86  76  ...   15  47   4398  1412  118  616  858  175  153   
2   AL  1963  161  70  91  ...   13  31   4365  1317  120  578  889  163  155   
3   AL  1964  162  82  80  ...   28  41   4350  1273  100  530  965  138  168   
4   AL  1965  162  75  87  ...   14  33   4323  1259   91  563  847  123  149   

      fp  
0  0.969  
1  0.972  

In [3]:
# Drop unnecessary/unwanted columns
# TODO: decide whether to keep or drop wins ('w'). Perhaps run the model twice, once with wins and once without

dropped_columns = ['franchid', 'name', 'city', 'state', 'curname', 'lgid', 'Year', 'G', 'l', 'divwin', 'wcwin', 'lgwin', 'wswin']
df = team_df.drop(dropped_columns, axis=1)
df.head()
print(df.columns)

Index(['w', 'postseason', 'r', 'ab', 'h', '2B', '3B', 'hr', 'bb', 'so', 'sb',
       'cs', 'hbp', 'sf', 'ra', 'er', 'era', 'cg', 'sho', 'sv', 'ipouts', 'ha',
       'hra', 'bba', 'soa', 'e', 'dp', 'fp'],
      dtype='object')


## Data Cleaning

In [4]:
# See surface level correlation between postseason and other statistics
import seaborn
import matplotlib.pyplot as plt


# Correlation matrix of every column in df, including 'postseason'
corrmat = df.corr()
plt.figure(figsize=(20, 20))
# Plot heat map
g = seaborn.heatmap(corrmat, annot=True, cmap="RdYlGn")

In [5]:
# Find null values

print(df.isnull().sum(axis=0).tolist())

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 200, 838, 838, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


The null values are in caught stealing (200 missing), hit by pitch (838 missing), and sacrifice flies (838 missing). We can drop caught stealing altogether, as it is not an especially important metric. However, we need to keep hbp and sf in order to create the OBP and OPS metrics later, so we have to fill their null values. We will impute the missing data with the median of each column. This does not account for changes in playing style across eras, but the model does not account for era either (Year is dropped), so this should be acceptable.

In [6]:
# Delete caught stealing from dataframe

df.drop('cs', axis=1, inplace=True)

In [7]:
# Fill null values in hbp and sf

df['hbp'] = df['hbp'].fillna(df['hbp'].median())
df['sf'] = df['sf'].fillna(df['sf'].median())

In [8]:
# Check to see how balanced target data is

Counter(df['postseason'])

Counter({0: 1788, 1: 404})
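The counts above show a clear imbalance toward non-playoff seasons. As a minimal sketch (not part of the original workflow), the snippet below computes the positive rate and the common "balanced" class-weight heuristic, `n_samples / (n_classes * count)`. Using these weights, for example through the `class_weight` argument of Keras `fit`, is an assumption offered as one possible alternative to resampling, not something this notebook does.

```python
from collections import Counter

# Class counts copied from the Counter output above.
counts = Counter({0: 1788, 1: 404})

n_samples = sum(counts.values())
n_classes = len(counts)

# Share of team-seasons that reached the postseason.
positive_rate = counts[1] / n_samples

# "Balanced" heuristic: rarer classes receive proportionally larger weights.
class_weight = {
    label: n_samples / (n_classes * count) for label, count in counts.items()
}

print(f"Positive rate: {positive_rate:.3f}")
print(f"Class weights: {class_weight}")
```

With weights like these, a misclassified playoff team would contribute roughly four times as much to the loss as a misclassified non-playoff team.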

## Model Training (Base Stats)

The first time, we will train the model using the numerical columns we were given (with null values imputed) and no further feature engineering.

In [9]:
base_X = df.drop("postseason", axis = 1)
base_Y = df["postseason"]
print(base_X.shape, base_Y.shape)

(2192, 26) (2192,)


In [10]:
base_X_train, base_X_test, base_Y_train, base_Y_test = train_test_split(
    base_X, base_Y, random_state=1, stratify=base_Y)
base_X_scaler = StandardScaler().fit(base_X_train)
base_X_train_scaled = base_X_scaler.transform(base_X_train)
base_X_test_scaled = base_X_scaler.transform(base_X_test)
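
The cell above fits the scaler on the training split only and then applies the same transformation to both splits. The sketch below reproduces that idea for a single feature in plain Python, as an illustration rather than the notebook's actual computation; the win totals are hypothetical values chosen for the example.

```python
from statistics import fmean, pstdev

# Hypothetical win totals, for illustration only.
train_wins = [70, 86, 70, 82]
test_wins = [75, 95]

# Statistics come from the training split only, so no test information leaks in.
mean = fmean(train_wins)
std = pstdev(train_wins)  # population standard deviation (ddof=0), as StandardScaler uses


def standardize(values, mean, std):
    """Apply z = (x - mean) / std with precomputed statistics."""
    return [(v - mean) / std for v in values]


train_scaled = standardize(train_wins, mean, std)
test_scaled = standardize(test_wins, mean, std)

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and unit variance by construction; the test values generally do not, which is expected.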

In [11]:
# Step 1: Label-encode data set
base_label_encoder = LabelEncoder()
base_label_encoder.fit(base_Y_train)
base_encoded_Y_train = base_label_encoder.transform(base_Y_train)
base_encoded_Y_test = base_label_encoder.transform(base_Y_test)

# Step 2: Convert encoded labels to one-hot-encoding
base_Y_train_categorical = to_categorical(base_encoded_Y_train)
base_Y_test_categorical = to_categorical(base_encoded_Y_test)

In [12]:
# Create model and add layers
base_model = Sequential()
base_model.add(Dense(units=100, activation='relu', input_dim=26))
base_model.add(Dense(units=100, activation='relu'))
base_model.add(Dense(units=2, activation='softmax'))

In [13]:
# # Compile and fit the model
# base_model.compile(optimizer='adam',
#               loss='categorical_crossentropy',
#               metrics=['accuracy'])

# base_training_generator = BalancedBatchGenerator(
#     base_X_train_scaled, base_Y_train_categorical, 
#     sampler=NearMiss(), batch_size=10, random_state=42)

# callback_history = base_model.fit_generator(
#     generator=base_training_generator,
#     epochs=500,
#     shuffle=True,
#     verbose=2,
#     callbacks=[EarlyStopping(monitor='accuracy', patience=75, verbose=2)]
# )

In [14]:
# Compile and fit the model
base_model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
base_model.fit(
    base_X_train_scaled,
    base_Y_train_categorical,
    epochs=1000,
    shuffle=True,
    verbose=2,
    callbacks=[EarlyStopping(monitor='accuracy', patience=75, verbose=2)]
)

Epoch 1/1000
 - 0s - loss: 0.3394 - accuracy: 0.8607
Epoch 2/1000
 - 0s - loss: 0.2343 - accuracy: 0.9015
Epoch 3/1000
 - 0s - loss: 0.2031 - accuracy: 0.9191
Epoch 4/1000
 - 0s - loss: 0.1879 - accuracy: 0.9191
Epoch 5/1000
 - 0s - loss: 0.1724 - accuracy: 0.9264
Epoch 6/1000
 - 0s - loss: 0.1651 - accuracy: 0.9355
Epoch 7/1000
 - 0s - loss: 0.1533 - accuracy: 0.9373
Epoch 8/1000
 - 0s - loss: 0.1492 - accuracy: 0.9404
Epoch 9/1000
 - 0s - loss: 0.1403 - accuracy: 0.9440
Epoch 10/1000
 - 0s - loss: 0.1399 - accuracy: 0.9440
Epoch 11/1000
 - 0s - loss: 0.1258 - accuracy: 0.9507
Epoch 12/1000
 - 0s - loss: 0.1211 - accuracy: 0.9489
Epoch 13/1000
 - 0s - loss: 0.1176 - accuracy: 0.9538
Epoch 14/1000
 - 0s - loss: 0.1088 - accuracy: 0.9623
Epoch 15/1000
 - 0s - loss: 0.1016 - accuracy: 0.9641
Epoch 16/1000
 - 0s - loss: 0.1008 - accuracy: 0.9586
Epoch 17/1000
 - 0s - loss: 0.0921 - accuracy: 0.9659
Epoch 18/1000
 - 0s - loss: 0.0917 - accuracy: 0.9647
Epoch 19/1000
 - 0s - loss: 0.0835 - 

<keras.callbacks.callbacks.History at 0x22898926668>

In [15]:
# Evaluate the model using test data split

base_model_loss, base_model_accuracy = base_model.evaluate(
    base_X_test_scaled, base_Y_test_categorical, verbose=2)
print(
    f"Base Normal Neural Network - Loss: {base_model_loss}, Accuracy: {base_model_accuracy}")

Base Normal Neural Network - Loss: 0.7448995698542491, Accuracy: 0.9032846689224243


In [16]:
base_encoded_predictions = base_model.predict_classes(base_X_test_scaled[:25])
base_prediction_labels = base_label_encoder.inverse_transform(base_encoded_predictions)

In [17]:
print(f"Base Predicted classes: {list(base_prediction_labels)}")
print(f"Base Actual Labels:     {list(base_Y_test[:25])}")

Base Predicted classes: [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Base Actual Labels:     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
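
Because the classes are imbalanced, accuracy alone can hide how the model treats playoff teams. As a rough sketch, the snippet below tallies a confusion matrix from the 25 predicted and actual labels printed above. It covers only this small sample, so the resulting precision and recall are illustrative and not an evaluation of the full test split.

```python
from collections import Counter

# Labels copied from the printed output above (first 25 test rows).
predicted = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]

# Count each (actual, predicted) pair.
pairs = Counter(zip(actual, predicted))
tn, fp = pairs[(0, 0)], pairs[(0, 1)]
fn, tp = pairs[(1, 0)], pairs[(1, 1)]

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```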


In [18]:
base_df_table = pd.DataFrame(base_prediction_labels, columns=['Predicted'])
base_df_table.insert(loc=1, column='Actual', value= list(base_Y_test[:25]))

In [19]:
base_df_table

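|   | Predicted | Actual |
|---|-----------|--------|
| 0 | 0 | 0 |
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 0 | 0 |
| 4 | 1 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 1 | 0 |
| 8 | 0 | 0 |
| 9 | 1 | 0 |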


## Model Training (Advanced Stats)

This time we will create columns for more advanced statistics and run the model again using those.  We will also drop many other columns in an effort to avoid overfitting.

In [20]:
# Create columns for advanced statistics (OBP, OPS, WHIP) as well as run differential and batting average
# For plate appearances, we don't have sacrifice hits, so sacrifice flies alone will have to suffice

# Run differential (season total, not per game)
df['rundif'] = (df['r'] - df['ra'])

# Batting Average
df['ave'] = df['h'] / df['ab']

# On Base Percent (OBP) = (hits + walks + hbp)/plate appearances
plate_app = (df['ab'] + df['bb'] + df['sf'] +df['hbp'])
df['obp'] = (df['h'] + df['bb'] + df['hbp']) / plate_app

# Slugging Percent
singles = ((df['h'] - df['2B']) - df['3B']) - df['hr']
df['slug_percent'] = ((df['hr']*4) + (df['3B']*3) + (df['2B']*2) + singles) / df['ab']

# On Base plus Slugging (OPS)
df['ops'] = df['obp'] + df['slug_percent']

# Walks plus Hits per Inning Pitched (whip)
df['whip'] = (df['bba'] + df['ha']) / (df['ipouts']/3)


# Next: drop the non-advanced columns, then reuse the base-model code with every 'base_' prefix replaced by 'adv_'
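
To make the formulas in the cell above concrete, here is a small sketch that applies the same arithmetic to one team-season in plain Python. The `advanced_stats` helper and the season totals are hypothetical and exist only for this example; the column names match the dataframe.

```python
def advanced_stats(s):
    """Compute the derived statistics from a dict of season totals."""
    singles = s["h"] - s["2B"] - s["3B"] - s["hr"]
    total_bases = singles + 2 * s["2B"] + 3 * s["3B"] + 4 * s["hr"]

    # OBP denominator: at-bats + walks + sacrifice flies + hit-by-pitch.
    plate_app = s["ab"] + s["bb"] + s["sf"] + s["hbp"]
    obp = (s["h"] + s["bb"] + s["hbp"]) / plate_app
    slug_percent = total_bases / s["ab"]

    return {
        "rundif": s["r"] - s["ra"],
        "ave": s["h"] / s["ab"],
        "obp": obp,
        "slug_percent": slug_percent,
        "ops": obp + slug_percent,
        # ipouts counts outs recorded, so innings pitched = ipouts / 3.
        "whip": (s["bba"] + s["ha"]) / (s["ipouts"] / 3),
    }


# Hypothetical season totals, for illustration only.
season = {
    "r": 750, "ra": 700, "ab": 5500, "h": 1400, "2B": 280, "3B": 30,
    "hr": 170, "bb": 520, "hbp": 50, "sf": 45,
    "bba": 500, "ha": 1380, "ipouts": 4350,
}

stats = advanced_stats(season)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```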

In [21]:
df.columns

Index(['w', 'postseason', 'r', 'ab', 'h', '2B', '3B', 'hr', 'bb', 'so', 'sb',
       'hbp', 'sf', 'ra', 'er', 'era', 'cg', 'sho', 'sv', 'ipouts', 'ha',
       'hra', 'bba', 'soa', 'e', 'dp', 'fp', 'rundif', 'ave', 'obp',
       'slug_percent', 'ops', 'whip'],
      dtype='object')

In [22]:
adv_df = df.drop(['w', 'r', 'ab', 'h', '2B', '3B', 'hr', 
                  'bb', 'so', 'sb', 'hbp', 'sf', 'ra', 'er', 'era',
                  'cg', 'sho', 'sv', 'ipouts', 'ha', 'hra', 'bba', 'soa', 'e', 'dp', 'fp',
                  'obp', 'slug_percent', 'rundif'
                 ], axis=1)

In [23]:
# Uncomment the line below (and comment out the cell above) to use the same stats as the base model plus the newly created advanced stats
# In that case, also change the adv_model input_dim from 3 to 32

# adv_df = df

In [24]:
adv_df.head()

|   | postseason | ave | ops | whip |
|---|------------|-----|-----|------|
| 0 | 0 | 0.245391 | 0.729948 | 1.463143 |
| 1 | 0 | 0.250409 | 0.706669 | 1.383356 |
| 2 | 0 | 0.250272 | 0.662835 | 1.302405 |
| 3 | 0 | 0.241887 | 0.649576 | 1.243448 |
| 4 | 0 | 0.238887 | 0.641036 | 1.2644 |


In [25]:
adv_X = adv_df.drop("postseason", axis = 1)
adv_Y = adv_df["postseason"]
print(adv_X.shape, adv_Y.shape)

(2192, 3) (2192,)


In [26]:
adv_X_train, adv_X_test, adv_Y_train, adv_Y_test = train_test_split(
    adv_X, adv_Y, random_state=1, stratify=adv_Y)
adv_X_scaler = StandardScaler().fit(adv_X_train)
adv_X_train_scaled = adv_X_scaler.transform(adv_X_train)
adv_X_test_scaled = adv_X_scaler.transform(adv_X_test)

In [27]:
# Step 1: Label-encode data set
adv_label_encoder = LabelEncoder()
adv_label_encoder.fit(adv_Y_train)
adv_encoded_Y_train = adv_label_encoder.transform(adv_Y_train)
adv_encoded_Y_test = adv_label_encoder.transform(adv_Y_test)

# Step 2: Convert encoded labels to one-hot-encoding
adv_Y_train_categorical = to_categorical(adv_encoded_Y_train)
adv_Y_test_categorical = to_categorical(adv_encoded_Y_test)

In [28]:
# Create model and add layers
adv_model = Sequential()
adv_model.add(Dense(units=100, activation='relu', input_dim=3))
adv_model.add(Dense(units=100, activation='relu'))
adv_model.add(Dense(units=2, activation='softmax'))

In [29]:
# Compile and fit the model
adv_model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
adv_model.fit(
    adv_X_train_scaled,
    adv_Y_train_categorical,
    epochs=1000,
    shuffle=True,
    verbose=2,
    callbacks=[EarlyStopping(monitor='accuracy', patience=75, verbose=2)]
)

Epoch 1/1000
 - 0s - loss: 0.4125 - accuracy: 0.8327
Epoch 2/1000
 - 0s - loss: 0.3062 - accuracy: 0.8498
Epoch 3/1000
 - 0s - loss: 0.2918 - accuracy: 0.8619
Epoch 4/1000
 - 0s - loss: 0.2859 - accuracy: 0.8577
Epoch 5/1000
 - 0s - loss: 0.2827 - accuracy: 0.8644
Epoch 6/1000
 - 0s - loss: 0.2850 - accuracy: 0.8656
Epoch 7/1000
 - 0s - loss: 0.2805 - accuracy: 0.8656
Epoch 8/1000
 - 0s - loss: 0.2807 - accuracy: 0.8674
Epoch 9/1000
 - 0s - loss: 0.2814 - accuracy: 0.8686
Epoch 10/1000
 - 0s - loss: 0.2812 - accuracy: 0.8704
Epoch 11/1000
 - 0s - loss: 0.2790 - accuracy: 0.8662
Epoch 12/1000
 - 0s - loss: 0.2795 - accuracy: 0.8723
Epoch 13/1000
 - 0s - loss: 0.2787 - accuracy: 0.8698
Epoch 14/1000
 - 0s - loss: 0.2763 - accuracy: 0.8662
Epoch 15/1000
 - 0s - loss: 0.2758 - accuracy: 0.8717
Epoch 16/1000
 - 0s - loss: 0.2754 - accuracy: 0.8747
Epoch 17/1000
 - 0s - loss: 0.2752 - accuracy: 0.8680
Epoch 18/1000
 - 0s - loss: 0.2772 - accuracy: 0.8668
Epoch 19/1000
 - 0s - loss: 0.2765 - 

Epoch 152/1000
 - 0s - loss: 0.2421 - accuracy: 0.8863
Epoch 153/1000
 - 0s - loss: 0.2416 - accuracy: 0.8856
Epoch 154/1000
 - 0s - loss: 0.2405 - accuracy: 0.8887
Epoch 155/1000
 - 0s - loss: 0.2386 - accuracy: 0.8850
Epoch 156/1000
 - 0s - loss: 0.2377 - accuracy: 0.8929
Epoch 157/1000
 - 0s - loss: 0.2442 - accuracy: 0.8887
Epoch 158/1000
 - 0s - loss: 0.2385 - accuracy: 0.8875
Epoch 159/1000
 - 0s - loss: 0.2393 - accuracy: 0.8911
Epoch 160/1000
 - 0s - loss: 0.2404 - accuracy: 0.8911
Epoch 161/1000
 - 0s - loss: 0.2383 - accuracy: 0.8844
Epoch 162/1000
 - 0s - loss: 0.2422 - accuracy: 0.8893
Epoch 163/1000
 - 0s - loss: 0.2419 - accuracy: 0.8869
Epoch 164/1000
 - 0s - loss: 0.2377 - accuracy: 0.8893
Epoch 165/1000
 - 0s - loss: 0.2376 - accuracy: 0.8893
Epoch 166/1000
 - 0s - loss: 0.2370 - accuracy: 0.8893
Epoch 167/1000
 - 0s - loss: 0.2401 - accuracy: 0.8832
Epoch 168/1000
 - 0s - loss: 0.2411 - accuracy: 0.8850
Epoch 169/1000
 - 0s - loss: 0.2363 - accuracy: 0.8893
Epoch 170/

Epoch 301/1000
 - 0s - loss: 0.2081 - accuracy: 0.9039
Epoch 302/1000
 - 0s - loss: 0.2140 - accuracy: 0.9009
Epoch 303/1000
 - 0s - loss: 0.2118 - accuracy: 0.8978
Epoch 304/1000
 - 0s - loss: 0.2072 - accuracy: 0.9057
Epoch 305/1000
 - 0s - loss: 0.2091 - accuracy: 0.9069
Epoch 306/1000
 - 0s - loss: 0.2122 - accuracy: 0.9021
Epoch 307/1000
 - 0s - loss: 0.2075 - accuracy: 0.9057
Epoch 308/1000
 - 0s - loss: 0.2066 - accuracy: 0.9027
Epoch 309/1000
 - 0s - loss: 0.2114 - accuracy: 0.9009
Epoch 310/1000
 - 0s - loss: 0.2094 - accuracy: 0.9015
Epoch 311/1000
 - 0s - loss: 0.2073 - accuracy: 0.8996
Epoch 312/1000
 - 0s - loss: 0.2046 - accuracy: 0.9039
Epoch 313/1000
 - 0s - loss: 0.2102 - accuracy: 0.9009
Epoch 314/1000
 - 0s - loss: 0.2040 - accuracy: 0.9100
Epoch 315/1000
 - 0s - loss: 0.2073 - accuracy: 0.9015
Epoch 316/1000
 - 0s - loss: 0.2052 - accuracy: 0.8990
Epoch 317/1000
 - 0s - loss: 0.2032 - accuracy: 0.9033
Epoch 318/1000
 - 0s - loss: 0.2099 - accuracy: 0.9009
Epoch 319/

Epoch 450/1000
 - 0s - loss: 0.1837 - accuracy: 0.9106
Epoch 451/1000
 - 0s - loss: 0.1818 - accuracy: 0.9185
Epoch 452/1000
 - 0s - loss: 0.1800 - accuracy: 0.9155
Epoch 453/1000
 - 0s - loss: 0.1824 - accuracy: 0.9161
Epoch 454/1000
 - 0s - loss: 0.1804 - accuracy: 0.9173
Epoch 455/1000
 - 0s - loss: 0.1805 - accuracy: 0.9142
Epoch 456/1000
 - 0s - loss: 0.1808 - accuracy: 0.9185
Epoch 457/1000
 - 0s - loss: 0.1809 - accuracy: 0.9136
Epoch 458/1000
 - 0s - loss: 0.1813 - accuracy: 0.9173
Epoch 459/1000
 - 0s - loss: 0.1918 - accuracy: 0.9191
Epoch 460/1000
 - 0s - loss: 0.1832 - accuracy: 0.9221
Epoch 461/1000
 - 0s - loss: 0.1834 - accuracy: 0.9185
Epoch 462/1000
 - 0s - loss: 0.1831 - accuracy: 0.9155
Epoch 463/1000
 - 0s - loss: 0.1815 - accuracy: 0.9136
Epoch 464/1000
 - 0s - loss: 0.1822 - accuracy: 0.9148
Epoch 465/1000
 - 0s - loss: 0.1771 - accuracy: 0.9148
Epoch 466/1000
 - 0s - loss: 0.1772 - accuracy: 0.9197
Epoch 467/1000
 - 0s - loss: 0.1776 - accuracy: 0.9155
Epoch 468/

Epoch 599/1000
 - 0s - loss: 0.1613 - accuracy: 0.9246
Epoch 600/1000
 - 0s - loss: 0.1608 - accuracy: 0.9367
Epoch 601/1000
 - 0s - loss: 0.1579 - accuracy: 0.9264
Epoch 602/1000
 - 0s - loss: 0.1592 - accuracy: 0.9288
Epoch 603/1000
 - 0s - loss: 0.1573 - accuracy: 0.9264
Epoch 604/1000
 - 0s - loss: 0.1555 - accuracy: 0.9319
Epoch 605/1000
 - 0s - loss: 0.1565 - accuracy: 0.9343
Epoch 606/1000
 - 0s - loss: 0.1566 - accuracy: 0.9282
Epoch 607/1000
 - 0s - loss: 0.1539 - accuracy: 0.9386
Epoch 608/1000
 - 0s - loss: 0.1566 - accuracy: 0.9252
Epoch 609/1000
 - 0s - loss: 0.1510 - accuracy: 0.9343
Epoch 610/1000
 - 0s - loss: 0.1583 - accuracy: 0.9240
Epoch 611/1000
 - 0s - loss: 0.1601 - accuracy: 0.9215
Epoch 612/1000
 - 0s - loss: 0.1538 - accuracy: 0.9294
Epoch 613/1000
 - 0s - loss: 0.1609 - accuracy: 0.9276
Epoch 614/1000
 - 0s - loss: 0.1562 - accuracy: 0.9313
Epoch 615/1000
 - 0s - loss: 0.1602 - accuracy: 0.9264
Epoch 616/1000
 - 0s - loss: 0.1692 - accuracy: 0.9221
Epoch 617/

Epoch 748/1000
 - 0s - loss: 0.1461 - accuracy: 0.9337
Epoch 749/1000
 - 0s - loss: 0.1403 - accuracy: 0.9367
Epoch 750/1000
 - 0s - loss: 0.1358 - accuracy: 0.9386
Epoch 751/1000
 - 0s - loss: 0.1396 - accuracy: 0.9416
Epoch 752/1000
 - 0s - loss: 0.1404 - accuracy: 0.9373
Epoch 753/1000
 - 0s - loss: 0.1422 - accuracy: 0.9355
Epoch 754/1000
 - 0s - loss: 0.1426 - accuracy: 0.9380
Epoch 755/1000
 - 0s - loss: 0.1439 - accuracy: 0.9325
Epoch 756/1000
 - 0s - loss: 0.1563 - accuracy: 0.9294
Epoch 757/1000
 - 0s - loss: 0.1467 - accuracy: 0.9331
Epoch 758/1000
 - 0s - loss: 0.1332 - accuracy: 0.9398
Epoch 759/1000
 - 0s - loss: 0.1314 - accuracy: 0.9453
Epoch 760/1000
 - 0s - loss: 0.1346 - accuracy: 0.9434
Epoch 761/1000
 - 0s - loss: 0.1339 - accuracy: 0.9404
Epoch 762/1000
 - 0s - loss: 0.1312 - accuracy: 0.9453
Epoch 763/1000
 - 0s - loss: 0.1337 - accuracy: 0.9386
Epoch 764/1000
 - 0s - loss: 0.1346 - accuracy: 0.9392
Epoch 765/1000
 - 0s - loss: 0.1355 - accuracy: 0.9446
Epoch 766/

Epoch 897/1000
 - 0s - loss: 0.1177 - accuracy: 0.9519
Epoch 898/1000
 - 0s - loss: 0.1208 - accuracy: 0.9489
Epoch 899/1000
 - 0s - loss: 0.1187 - accuracy: 0.9483
Epoch 900/1000
 - 0s - loss: 0.1283 - accuracy: 0.9422
Epoch 901/1000
 - 0s - loss: 0.1268 - accuracy: 0.9446
Epoch 902/1000
 - 0s - loss: 0.1180 - accuracy: 0.9544
Epoch 903/1000
 - 0s - loss: 0.1199 - accuracy: 0.9446
Epoch 904/1000
 - 0s - loss: 0.1152 - accuracy: 0.9526
Epoch 905/1000
 - 0s - loss: 0.1189 - accuracy: 0.9556
Epoch 906/1000
 - 0s - loss: 0.1199 - accuracy: 0.9477
Epoch 907/1000
 - 0s - loss: 0.1213 - accuracy: 0.9428
Epoch 908/1000
 - 0s - loss: 0.1190 - accuracy: 0.9513
Epoch 909/1000
 - 0s - loss: 0.1206 - accuracy: 0.9495
Epoch 910/1000
 - 0s - loss: 0.1231 - accuracy: 0.9483
Epoch 911/1000
 - 0s - loss: 0.1219 - accuracy: 0.9544
Epoch 912/1000
 - 0s - loss: 0.1222 - accuracy: 0.9501
Epoch 913/1000
 - 0s - loss: 0.1206 - accuracy: 0.9446
Epoch 914/1000
 - 0s - loss: 0.1112 - accuracy: 0.9580
Epoch 915/

<keras.callbacks.callbacks.History at 0x2289942ccf8>

In [30]:
# Evaluate the model using test data split

adv_model_loss, adv_model_accuracy = adv_model.evaluate(
    adv_X_test_scaled, adv_Y_test_categorical, verbose=2)
print(
    f"Advanced Stats Normal Neural Network - Loss: {adv_model_loss}, Accuracy: {adv_model_accuracy}")

Advanced Stats Normal Neural Network - Loss: 0.7642889931852365, Accuracy: 0.8266423344612122


In [31]:
adv_encoded_predictions = adv_model.predict_classes(adv_X_test_scaled[:25])
adv_prediction_labels = adv_label_encoder.inverse_transform(adv_encoded_predictions)

In [32]:
print(f"Advanced Stats Predicted classes: {list(adv_prediction_labels)}")
print(f"Advanced Stats Actual Labels:     {list(adv_Y_test[:25])}")

Advanced Stats Predicted classes: [0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
Advanced Stats Actual Labels:     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]


In [33]:
adv_df_table = pd.DataFrame(adv_prediction_labels, columns=['Predicted'])
adv_df_table.insert(loc=1, column='Actual', value= list(adv_Y_test[:25]))

In [34]:
adv_df_table

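|   | Predicted | Actual |
|---|-----------|--------|
| 0 | 0 | 0 |
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 0 | 0 |
| 4 | 1 | 0 |
| 5 | 0 | 0 |
| 6 | 1 | 0 |
| 7 | 0 | 0 |
| 8 | 0 | 0 |
| 9 | 0 | 0 |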


## Comparison

In [35]:
print(f'Base Stats Normal Neural Network - Loss {base_model_loss}, Accuracy: {base_model_accuracy}')
print(f"Advanced Stats Normal Neural Network - Loss: {adv_model_loss}, Accuracy: {adv_model_accuracy}")

Base Stats Normal Neural Network - Loss 0.7448995698542491, Accuracy: 0.9032846689224243
Advanced Stats Normal Neural Network - Loss: 0.7642889931852365, Accuracy: 0.8266423344612122
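
One way to put these two accuracies in context is to compare them with a trivial baseline that always predicts "no postseason". The sketch below does that using the class counts and test accuracies printed earlier in the notebook. It assumes the stratified test split has roughly the same class ratio as the full dataset, so the baseline is an approximation.

```python
# Class counts and test accuracies copied from the notebook outputs above.
counts = {0: 1788, 1: 404}
test_accuracy = {
    "base": 0.9032846689224243,
    "advanced": 0.8266423344612122,
}

# Accuracy of always predicting the majority class (no postseason).
baseline = max(counts.values()) / sum(counts.values())

# Margin of each model over the majority-class baseline.
margins = {name: acc - baseline for name, acc in test_accuracy.items()}

print(f"Majority-class baseline: {baseline:.4f}")
for name, margin in margins.items():
    print(f"{name}: {margin:+.4f} over baseline")
```

Read this way, the base-stats model clears the baseline by a meaningful margin, while the advanced-stats model with only three features is barely above it.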


In [36]:
team_df['G'].mean()

158.06569343065692