# Post-Analysis Deep Learning of the Injury Datasets

This will process the imported cleaned data from SQL 

---

# Dependencies

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import tensorflow as tf

pd.set_option('mode.chained_assignment', None)
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
random_state = 42

In [2]:
## Connect to the Database
import sqlalchemy as db
from sqlalchemy.orm import Session
from sqlalchemy import create_engine
import psycopg2

# Import the Data

The data has been merged and mostly cleaned in SQL and previous ETL steps. This will be extracted from the NFL_Injuries server

### Make a Connection to the SQL Server

In [3]:
# Make connection to the database
from config import db_password
db_string = f"postgresql://postgres:{db_password}@127.0.0.1:5433/NFL_Injuries"
engine = db.create_engine(db_string)
conn = engine.connect()
metadata = db.MetaData()

del db_password

# Read in the injuries table:
table = db.Table('ml_injuries', metadata, autoload=True, autoload_with=engine)
query = db.select(table)
Results = conn.execute(query).fetchall()

# Create the new dataframe and set the keys
ml = pd.DataFrame(Results)
ml.columns = Results[0].keys()

# Close the connection and remove the unneccesary files
conn.close()

del Results, metadata, conn
ml.drop(columns=['PlayerGame', 'RosterPosition_Num'], inplace=True)

ml.head()


Unnamed: 0,PlayKey,time,x,y,s,PlayerGamePlay,SyntheticField,Outdoor,Position_Num,PlayCode,DaysPlayed,InjuryType,InjuryDuration,SevereInjury,IsInjured,Twist
0,26624-1-13,0.0,46.31,21.97,0.09,13,1,1,0,0.0,64,0.0,0.0,0.0,0,79.49
1,26624-1-13,0.1,46.31,21.98,0.15,13,1,1,0,0.0,64,0.0,0.0,0.0,0,67.96
2,26624-1-13,0.2,46.33,21.97,0.21,13,1,1,0,0.0,64,0.0,0.0,0.0,0,58.52
3,26624-1-13,0.3,46.34,21.98,0.26,13,1,1,0,0.0,64,0.0,0.0,0.0,0,36.34
4,26624-1-13,0.4,46.33,22.01,0.29,13,1,1,0,0.0,64,0.0,0.0,0.0,0,25.57


### Find the End Moment of each play

Since the tracking data is randomly sampled when creating the Train-Test datasets, we are separating all instances except for the final moment of each play. This will provide us with a single value per PlayKey as well as the duration of each play.

In [4]:
end_play = ml.sort_values(by=['PlayKey', 'time'], ascending=True)
end_play.drop_duplicates(subset=['PlayKey'], keep='last', inplace=True)
end_play.dropna(inplace=True)
end_play.set_index('PlayKey', inplace=True)
end_play.head()


Unnamed: 0_level_0,time,x,y,s,PlayerGamePlay,SyntheticField,Outdoor,Position_Num,PlayCode,DaysPlayed,InjuryType,InjuryDuration,SevereInjury,IsInjured,Twist
PlayKey,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
26624-1-13,25.5,45.2,21.74,0.23,13,1,1,0,0.0,64,0.0,0.0,0.0,0,94.41
26624-10-48,36.6,75.53,32.04,0.27,48,0,1,0,1.0,137,0.0,0.0,0.0,0,24.67
26624-11-1,25.6,26.22,25.57,1.25,1,1,1,0,0.0,144,0.0,0.0,0.0,0,105.54
26624-11-5,18.3,75.64,28.84,3.74,5,1,1,0,0.0,144,0.0,0.0,0.0,0,95.28
26624-12-10,24.3,58.66,31.11,2.37,10,1,1,0,0.0,151,0.0,0.0,0.0,0,41.25


---
# Deep Learning
## The Tests

We want to test the following conditions: 
1. Can the model predict whether an injury occurred? 
2. Can the model predict the type of injury?
3. Can the model predict whether an injury is severe? 
4. Can the model predict the duration of the injury?

Considerations for each of these tests:
Remove PlayKey from all analyses

1. Can the model predict whether an injury occurred? 
    - y = IsInjured; 
    - Remove: InjuryType, InjuryDuration, SevereInjury, since these are all 100% correlated with injuries
    <br>
2. Can the model predict the type of injury?
    - y = InjuryType
    - Remove: IsInjured, InjuryDuration, SevereInjury
    - The injury duration is more likely due to the injury type, and not the other way around, so remove injury duration and severity
    <br>
3. Can the model predict whether an injury is severe? 
    - y = SevereInjury
    - Remove: InjuryDuration, IsInjured
    <br>
4. Can the model predict the duration of the injury?
    - y = InjuryDuration
    - Remove: SevereInjury, IsInjured


In [5]:
# Create an output table
columns = ['Test', 'Model', 'Nodes', 'Epochs', 'Accuracy', 'Loss', 'Precision', 'Recall']
nn_table = pd.DataFrame(columns=columns)
model = 'Neural Network'

## 1. Can the model predict whether an injury occurred? 

In [6]:
# Test 1, Can the model Predict the occurrence of an Injury
X = end_play.drop(columns=['IsInjured', 'SevereInjury', 'InjuryDuration', 'InjuryType'])
y = end_play.IsInjured

# Because the True case only represents 1% of the data, the training split is stratifying on y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=random_state)


In [7]:
# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)

In [8]:
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

nn.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 256)               3072      
                                                                 
 dense_1 (Dense)             (None, 128)               32896     
                                                                 
 dense_2 (Dense)             (None, 1)                 129       
                                                                 
Total params: 36,097
Trainable params: 36,097
Non-trainable params: 0
_________________________________________________________________


In [9]:
epochs = 100

# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

In [10]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

61/61 - 0s - loss: 0.1499 - accuracy: 0.9871 - precision: 0.0000e+00 - recall: 0.0000e+00 - 235ms/epoch - 4ms/step


In [11]:
test = "Is Injured"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = nn_table.append(row)
nn_table


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",100,0.9871,0.1499,0.0,0.0


---
## 2. Can the model predict whether an injury is severe? 


In [12]:
y = end_play.SevereInjury
X = end_play.drop(
    columns=['IsInjured', 'SevereInjury', 'InjuryDuration'])
# This does contain injury type - to predict whether the injury will be severe

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=random_state, stratify=y)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)


# Establish the NN Model
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

nn.summary()


Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 256)               3328      
                                                                 
 dense_4 (Dense)             (None, 128)               32896     
                                                                 
 dense_5 (Dense)             (None, 1)                 129       
                                                                 
Total params: 36,353
Trainable params: 36,353
Non-trainable params: 0
_________________________________________________________________


In [13]:
# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)


Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

In [14]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

test = "Severe Injury"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = nn_table.append(row)
nn_table


61/61 - 0s - loss: 0.0746 - accuracy: 0.9948 - precision_1: 0.3333 - recall_1: 0.2500 - 208ms/epoch - 3ms/step


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",100,0.9871,0.1499,0.0,0.0
0,Severe Injury,Neural Network,"[256, 128]",100,0.9948,0.0746,0.3333,0.25


---
## 3. Can the model predict the type of injury?


In [15]:
# Format this to do encoding
X_cat = end_play.copy(deep=True)
X_cat.drop(columns=['IsInjured', 'SevereInjury', 'InjuryDuration'], inplace=True)

# Change the Injury Types back the Categorical
injury = {3.0: 'Knee', 2.0: 'Ankle', 1.0: 'Foot', 0.0: 'NoInjury'}
X_cat['BodyPart'] = X_cat.InjuryType.map(injury)
X_cat.drop(columns='InjuryType', inplace=True)

X_cat.head()

# Grab all categorical variables and create a list for encoding
cat = X_cat.dtypes[X_cat.dtypes == 'object'].index.tolist()

# Create a OneHotEncoder Instance
# Create the instance
enc = OneHotEncoder(sparse=False)

# Fit and transform the OneHot to the columns necessary
encode_df = pd.DataFrame(enc.fit_transform(X_cat[cat]), index=X_cat.index)


# Add the original variable names to the df
encode_df.columns = enc.get_feature_names_out(cat)

# Merge the OneHot features and drop the variables
X_encoded = X_cat.merge(encode_df, left_index=True, right_index=True)
X_encoded.drop(columns=cat, inplace=True)

X_encoded.head()


Unnamed: 0_level_0,time,x,y,s,PlayerGamePlay,SyntheticField,Outdoor,Position_Num,PlayCode,DaysPlayed,Twist,BodyPart_Ankle,BodyPart_Foot,BodyPart_Knee,BodyPart_NoInjury
PlayKey,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
26624-1-13,25.5,45.2,21.74,0.23,13,1,1,0,0.0,64,94.41,0.0,0.0,0.0,1.0
26624-10-48,36.6,75.53,32.04,0.27,48,0,1,0,1.0,137,24.67,0.0,0.0,0.0,1.0
26624-11-1,25.6,26.22,25.57,1.25,1,1,1,0,0.0,144,105.54,0.0,0.0,0.0,1.0
26624-11-5,18.3,75.64,28.84,3.74,5,1,1,0,0.0,144,95.28,0.0,0.0,0.0,1.0
26624-12-10,24.3,58.66,31.11,2.37,10,1,1,0,0.0,151,41.25,0.0,0.0,0.0,1.0


In [16]:
y = X_encoded.loc[:, 'BodyPart_Ankle':]
X_enc = X_encoded.drop(
    columns=['BodyPart_Ankle', 'BodyPart_Foot', 'BodyPart_Knee', 'BodyPart_NoInjury'])

# Because the True case only represents 1% of the data, the training split is stratifying on y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=random_state, stratify=y)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)


# Establish the NN Model
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=4, activation='sigmoid'))

nn.summary()


Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_6 (Dense)             (None, 256)               3328      
                                                                 
 dense_7 (Dense)             (None, 128)               32896     
                                                                 
 dense_8 (Dense)             (None, 4)                 516       
                                                                 
Total params: 36,740
Trainable params: 36,740
Non-trainable params: 0
_________________________________________________________________


In [17]:
# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

In [18]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

test = "Body Part"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = nn_table.append(row)
nn_table


61/61 - 0s - loss: 0.0056 - accuracy: 0.9974 - precision_2: 0.9979 - recall_2: 0.9974 - 216ms/epoch - 4ms/step


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",100,0.9871,0.1499,0.0,0.0
0,Severe Injury,Neural Network,"[256, 128]",100,0.9948,0.0746,0.3333,0.25
0,Body Part,Neural Network,"[256, 128]",100,0.9974,0.0056,0.9979,0.9974


---
## 4. Can the model predict the duration of the injury?

In [19]:
# Format this to do encoding
X_cat = end_play.copy(deep=True)
X_cat.drop(columns=['IsInjured', 'SevereInjury', 'InjuryType'], inplace=True)

# Change the Injury Types back the Categorical
duration = {0.0: 'NoInjury', 1.0: 'Under_1_Week', 7.0: 'Under_4_Weeks', 28.0: 'Under_6_Weeks', 42.0: 'Over_6_Weeks'}
X_cat['Durations'] = X_cat.InjuryDuration.map(duration)
X_cat.drop(columns='InjuryDuration', inplace=True)

# Grab all categorical variables and create a list for encoding
cat = X_cat.dtypes[X_cat.dtypes == 'object'].index.tolist()

# Create a OneHotEncoder Instance
# Create the instance
enc = OneHotEncoder(sparse=False)

# Fit and transform the OneHot to the columns necessary
encode_df = pd.DataFrame(enc.fit_transform(X_cat[cat]), index=X_cat.index)

# Add the original variable names to the df
encode_df.columns = enc.get_feature_names_out(cat)

# Merge the OneHot features and drop the variables
X_encoded = X_cat.merge(encode_df, left_index=True, right_index=True)
X_encoded.drop(columns=cat, inplace=True)

X_encoded.head()


Unnamed: 0_level_0,time,x,y,s,PlayerGamePlay,SyntheticField,Outdoor,Position_Num,PlayCode,DaysPlayed,Twist,Durations_NoInjury,Durations_Over_6_Weeks,Durations_Under_1_Week,Durations_Under_4_Weeks,Durations_Under_6_Weeks
PlayKey,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1
26624-1-13,25.5,45.2,21.74,0.23,13,1,1,0,0.0,64,94.41,1.0,0.0,0.0,0.0,0.0
26624-10-48,36.6,75.53,32.04,0.27,48,0,1,0,1.0,137,24.67,1.0,0.0,0.0,0.0,0.0
26624-11-1,25.6,26.22,25.57,1.25,1,1,1,0,0.0,144,105.54,1.0,0.0,0.0,0.0,0.0
26624-11-5,18.3,75.64,28.84,3.74,5,1,1,0,0.0,144,95.28,1.0,0.0,0.0,0.0,0.0
26624-12-10,24.3,58.66,31.11,2.37,10,1,1,0,0.0,151,41.25,1.0,0.0,0.0,0.0,0.0


In [20]:
y = X_encoded.loc[:, 'Durations_NoInjury':]
X_enc = X_encoded.drop(
    columns=['Durations_NoInjury', 'Durations_Over_6_Weeks', 'Durations_Under_1_Week', 'Durations_Under_4_Weeks', 'Durations_Under_6_Weeks'])

# Because the True case only represents 1% of the data, the training split is stratifying on y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=random_state, stratify=y)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)


# Establish the NN Model
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=5, activation='sigmoid'))

nn.summary()


Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_9 (Dense)             (None, 256)               3328      
                                                                 
 dense_10 (Dense)            (None, 128)               32896     
                                                                 
 dense_11 (Dense)            (None, 5)                 645       
                                                                 
Total params: 36,869
Trainable params: 36,869
Non-trainable params: 0
_________________________________________________________________


In [21]:
# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

In [22]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

# Add results to table
test = "Injury Duration"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = nn_table.append(row)
nn_table

61/61 - 0s - loss: 0.0376 - accuracy: 0.9928 - precision_3: 0.9943 - recall_3: 0.9928 - 210ms/epoch - 3ms/step


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",100,0.9871,0.1499,0.0,0.0
0,Severe Injury,Neural Network,"[256, 128]",100,0.9948,0.0746,0.3333,0.25
0,Body Part,Neural Network,"[256, 128]",100,0.9974,0.0056,0.9979,0.9974
0,Injury Duration,Neural Network,"[256, 128]",100,0.9928,0.0376,0.9943,0.9928


In [23]:
# Export the results table to the repo

# nn_table.to_csv("NeuralNetwork_Results.csv")

In [24]:
# # Make connection to the database
# from config import db_password
# db_string = f"postgresql://postgres:{db_password}@127.0.0.1:5433/NFL_Injuries"
# engine = db.create_engine(db_string)
# del db_string, db_password

# Write table to database
# nn_table.to_sql(name='Neural_Network_Outputs', con=engine, index=False)

---

# Summary 

For an injury analysis such as this, it is more important that our model achieve a high precision, rather than a high accuracy or recall. The accuracy only tells us how many true positives have been classified, however, the data are extremely imbalanced, which is a known problem with the accuracy measure. While it looks great to have 4 models acheive nearly 99% accuracy, this isn't indicative of a good predictive model for our purpose. Even if all of the injuries were classified as Non-Injuries, the model would be predicting above 95% if the Non-Injury is evaluated as the True Positive Measure, and nearly 0% if we consider the Injuries as the true positive. Meanwhile, the Precision gives us the count of the True Positives with respect to the True Positives and False Negatives. 

In the case that the True Positive is the injury we are evaluating, the False Negative would represent a player who is injured, but was classified as Not Injured. Our precision varied, with a very low precision in predicting whether an injury occurred without considering any details about injury type. An explanation for this is that plays meeting the critera of a high-risk play, potentially prone to injury did not result in an injury at that time, but the activity could not be differentiated from similar circumnstances that did lead to injury. 

From a medical-analytical perspective, this gives us insights as to what parameters can lead to injurious plays based on the utilized features of the players along with the other features analyzed. This improves upon our previous model that had erroneously applied the tracking data. 


## Future Analysis

This model uses the end-time from each play, since the Train-Test Split randomly splits the data. In this case, we were able to split by index, separating the positive and negative cases fairly. However, I would like to further split the testing-training set indices prior to the tracking merge so that we can utilize the full path of the player leading to the injury for further insight. 