# Post-Analysis Deep Learning of the Injury Datasets

This notebook processes the cleaned data imported from SQL and trains neural-network models on it.

---

# Dependencies

In [64]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import tensorflow as tf

pd.set_option('mode.chained_assignment', None)
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
random_state = 42

In [65]:
## Connect to the Database
import sqlalchemy as db
from sqlalchemy.orm import Session
from sqlalchemy import create_engine
import psycopg2

# Import the Messy Data

Import the playlist and injuries lists, then clean and merge the data using the functions defined in NFL_Injury_Cleaning_Functions. Because these data will be processed with Random Forests and Neural Network models, we will need to convert all categorical data to numerical data.  

### Make a Connection to the SQL Server

1. Connect to the NFL_Turf Database
2. Retrieve the data from the 'injuries' table
3. Retrieve the data from the 'playlist' table


In [66]:
# Make connection to the database
from config import db_password
db_string = f"postgresql://postgres:{db_password}@127.0.0.1:5433/NFL_Injuries"
engine = db.create_engine(db_string)
conn = engine.connect()
metadata = db.MetaData()

del db_password

# Read in the injuries table (autoload_with alone reflects the table schema):
table = db.Table('ml_injuries', metadata, autoload_with=engine)
query = db.select(table)
result = conn.execute(query)
Results = result.fetchall()

# Create the new dataframe and set the column names from the result keys
ml = pd.DataFrame(Results, columns=list(result.keys()))

# Close the connection and remove the unnecessary objects
conn.close()

del Results, metadata, conn
ml.drop(columns=['PlayerGame', 'RosterPosition_Num'], inplace=True)

ml.head()


Unnamed: 0,PlayKey,time,x,y,s,PlayerGamePlay,SyntheticField,Outdoor,Position_Num,PlayCode,DaysPlayed,InjuryType,InjuryDuration,SevereInjury,IsInjured,Twist
0,26624-1-13,0.0,46.31,21.97,0.09,13,1,1,0,0.0,64,0.0,0.0,0.0,0,79.49
1,26624-1-13,0.1,46.31,21.98,0.15,13,1,1,0,0.0,64,0.0,0.0,0.0,0,67.96
2,26624-1-13,0.2,46.33,21.97,0.21,13,1,1,0,0.0,64,0.0,0.0,0.0,0,58.52
3,26624-1-13,0.3,46.34,21.98,0.26,13,1,1,0,0.0,64,0.0,0.0,0.0,0,36.34
4,26624-1-13,0.4,46.33,22.01,0.29,13,1,1,0,0.0,64,0.0,0.0,0.0,0,25.57


### Find the End Moment of each play

Because the tracking data would otherwise be randomly sampled when creating the Train-Test datasets, we keep only the final moment of each play and set aside all other instances. This gives us a single row per PlayKey, and the time value of that row is the duration of the play.

In [67]:
end_play = ml.sort_values(by=['PlayKey', 'time'], ascending=True)
end_play.drop_duplicates(subset=['PlayKey'], keep='last', inplace=True)
end_play.dropna(inplace=True)
end_play.set_index('PlayKey', inplace=True)
end_play.head()


PlayKey,time,x,y,s,PlayerGamePlay,SyntheticField,Outdoor,Position_Num,PlayCode,DaysPlayed,InjuryType,InjuryDuration,SevereInjury,IsInjured,Twist
26624-1-13,25.5,45.2,21.74,0.23,13,1,1,0,0.0,64,0.0,0.0,0.0,0,94.41
26624-10-48,36.6,75.53,32.04,0.27,48,0,1,0,1.0,137,0.0,0.0,0.0,0,24.67
26624-11-1,25.6,26.22,25.57,1.25,1,1,1,0,0.0,144,0.0,0.0,0.0,0,105.54
26624-11-5,18.3,75.64,28.84,3.74,5,1,1,0,0.0,144,0.0,0.0,0.0,0,95.28
26624-12-10,24.3,58.66,31.11,2.37,10,1,1,0,0.0,151,0.0,0.0,0.0,0,41.25
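
To make the end-of-play selection concrete, here is a minimal sketch on a toy frame. Only the column names PlayKey, time, and s come from the data above; the values and the names `toy` and `toy_end` are made up for illustration. Sorting by PlayKey and time and keeping the last duplicate leaves one row per play.

```python
import pandas as pd

# Toy tracking data: two plays, rows deliberately out of order (illustrative values).
toy = pd.DataFrame({
    "PlayKey": ["A-1-1", "A-1-1", "A-1-1", "B-2-3", "B-2-3"],
    "time":    [0.0, 0.1, 0.2, 0.1, 0.0],
    "s":       [0.09, 0.15, 0.21, 1.30, 1.10],
})

# Sort within each play by time, then keep only the last (latest) row per play.
toy_end = (
    toy.sort_values(by=["PlayKey", "time"])
       .drop_duplicates(subset=["PlayKey"], keep="last")
       .set_index("PlayKey")
)
print(toy_end)
```

Each surviving row carries the largest time value of its play, which is why the time column can be read as the play duration.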


---
# Deep Learning
## The Tests

We want to test the following conditions: 
1. Can the model predict whether an injury occurred? 
2. Can the model predict the type of injury?
3. Can the model predict whether an injury is severe? 
4. Can the model predict the duration of the injury?

Considerations for each of these tests (PlayKey is removed from all analyses):

1. Can the model predict whether an injury occurred? 
    - y = IsInjured; 
    - Remove: InjuryType, InjuryDuration, SevereInjury, since these are all 100% correlated with injuries
    <br>
2. Can the model predict the type of injury?
    - y = InjuryType
    - Remove: IsInjured, InjuryDuration, SevereInjury
    - The injury duration is more likely due to the injury type, and not the other way around, so remove injury duration and severity
    <br>
3. Can the model predict whether an injury is severe? 
    - y = SevereInjury
    - Remove: InjuryDuration, IsInjured
    <br>
4. Can the model predict the duration of the injury?
    - y = InjuryDuration
    - Remove: SevereInjury, IsInjured
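
The four setups above can be written down once as a lookup table. This is a sketch only: the names `TESTS` and `make_xy` are hypothetical helpers, not part of the notebook, and the toy frame stands in for the real data with PlayKey assumed to be the index already.

```python
import pandas as pd

# Target column and the label columns removed for each test, as listed above.
TESTS = {
    "Is Injured":      {"y": "IsInjured",      "drop": ["InjuryType", "InjuryDuration", "SevereInjury"]},
    "Injury Type":     {"y": "InjuryType",     "drop": ["IsInjured", "InjuryDuration", "SevereInjury"]},
    "Severe Injury":   {"y": "SevereInjury",   "drop": ["InjuryDuration", "IsInjured"]},
    "Injury Duration": {"y": "InjuryDuration", "drop": ["SevereInjury", "IsInjured"]},
}

def make_xy(df, test):
    """Return (X, y) for one test: y is the target, X drops the target and the listed labels."""
    spec = TESTS[test]
    y = df[spec["y"]]
    X = df.drop(columns=[spec["y"], *spec["drop"]])
    return X, y

# Toy frame with two feature columns and the four label columns (made-up values).
toy = pd.DataFrame({
    "x": [46.3, 75.5], "s": [0.23, 0.27],
    "InjuryType": [0.0, 3.0], "InjuryDuration": [0.0, 42.0],
    "SevereInjury": [0.0, 1.0], "IsInjured": [0, 1],
})
X, y = make_xy(toy, "Is Injured")
print(X.columns.tolist(), y.name)
```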


In [68]:
# Create an output table
columns = ['Test', 'Model', 'Nodes', 'Epochs', 'Accuracy', 'Loss', 'Precision', 'Recall']
nn_table = pd.DataFrame(columns=columns)
model = 'Neural Network'

In [17]:
ml_merged.head(2)

Unnamed: 0,PlayKey,x,y,s,Twist,RosterPosition,Temperature,PlayerGamePlay,Position,Outdoor,Precipitation,DaysPlayed,PlayCode,InjuryType,InjuryDuration,SevereInjury,IsInjured
0,26624-1-45,21.32,29.14,0.88,23.24,0,63,45,0,1,0.0,64,0.0,0.0,0.0,0.0,0
1,26624-1-45,21.31,29.21,0.91,15.59,0,63,45,0,1,0.0,64,0.0,0.0,0.0,0.0,0


In [18]:
X = ml_merged.copy(deep=True)
X.drop(columns=['PlayKey', 'InjuryType', 'SevereInjury', 'Position', 'InjuryDuration', 'IsInjured'], inplace=True)

In [19]:
X.head()

Unnamed: 0,x,y,s,Twist,RosterPosition,Temperature,PlayerGamePlay,Outdoor,Precipitation,DaysPlayed,PlayCode
0,21.32,29.14,0.88,23.24,0,63,45,1,0.0,64,0.0
1,21.31,29.21,0.91,15.59,0,63,45,1,0.0,64,0.0
2,21.3,29.29,0.93,7.61,0,63,45,1,0.0,64,0.0
3,21.28,29.38,0.93,0.42,0,63,45,1,0.0,64,0.0
4,21.26,29.45,0.89,6.2,0,63,45,1,0.0,64,0.0


Using IsInjured as the label, there are no categorical columns that need to be encoded.

In [69]:
# Test 1, Can the model Predict the occurrence of an Injury
X = end_play.drop(columns=['IsInjured', 'SevereInjury', 'InjuryDuration', 'InjuryType'])
y = end_play.IsInjured

# Because the True case only represents 1% of the data, the training split is stratifying on y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=random_state, stratify=y)


In [70]:
# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)
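
The scaler is fit on the training split only, so the test split is transformed with the training mean and standard deviation. The NumPy sketch below reproduces the arithmetic that StandardScaler performs with its default settings; the arrays are toy numbers chosen for illustration.

```python
import numpy as np

train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
test = np.array([[3.0, 70.0]])

# Statistics come from the training data only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuses the training statistics

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

The scaled training columns have mean 0 and standard deviation 1, while the test row is free to fall outside that range.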

In [71]:
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

nn.summary()

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_28 (Dense)            (None, 256)               3072      
                                                                 
 dense_29 (Dense)            (None, 128)               32896     
                                                                 
 dense_30 (Dense)            (None, 1)                 129       
                                                                 
Total params: 36,097
Trainable params: 36,097
Non-trainable params: 0
_________________________________________________________________


In [72]:
epochs = 100

# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)

Epoch 1/100
...
Epoch 78

In [73]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

61/61 - 0s - loss: 0.1592 - accuracy: 0.9856 - precision_15: 0.0909 - recall_15: 0.0526 - 220ms/epoch - 4ms/step


In [74]:
test = "Is Injured"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = pd.concat([nn_table, row])
nn_table


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",100,0.9856,0.1592,0.0909,0.0526
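
The same bookkeeping is repeated after every model, so it could be collected into a small helper. This is a sketch under assumptions: `result_row` and `COLUMNS` are hypothetical names, and `results` is assumed to be ordered loss, accuracy, precision, recall, matching the metrics passed to compile above. The numbers in the usage line are the ones printed by the evaluation above.

```python
import pandas as pd

COLUMNS = ["Test", "Model", "Nodes", "Epochs", "Accuracy", "Loss", "Precision", "Recall"]

def result_row(test, model, nodes, epochs, results):
    """Build one results-table row from an evaluate()-style list of metrics."""
    # Assumed order: [loss, accuracy, precision, recall]
    loss, accuracy, precision, recall = (round(float(v), 4) for v in results[:4])
    return {"Test": test, "Model": model, "Nodes": nodes, "Epochs": epochs,
            "Accuracy": accuracy, "Loss": loss, "Precision": precision, "Recall": recall}

# Collect rows in a list and build the table once at the end.
rows = [result_row("Is Injured", "Neural Network", [256, 128], 100,
                   [0.1592, 0.9856, 0.0909, 0.0526])]
table = pd.DataFrame(rows, columns=COLUMNS)
print(table)
```

Building the frame from a list of rows also gives each test its own index value instead of a repeated 0.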


---
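# Severe Injury Prediction

In [75]:
# Test 3: the results row for this model is labeled "Severe Injury", so the target is SevereInjury
y = end_play.SevereInjury
X = end_play.drop(columns=['IsInjured', 'SevereInjury',
                  'InjuryDuration', 'InjuryType'])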

# Because the True case only represents 1% of the data, the training split is stratifying on y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=random_state, stratify=y)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)


# Establish the NN Model
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

nn.summary()


Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_31 (Dense)            (None, 256)               3072      
                                                                 
 dense_32 (Dense)            (None, 128)               32896     
                                                                 
 dense_33 (Dense)            (None, 1)                 129       
                                                                 
Total params: 36,097
Trainable params: 36,097
Non-trainable params: 0
_________________________________________________________________


In [76]:
# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)

Epoch 1/100
...
Epoch 78

In [77]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

test = "Severe Injury"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = pd.concat([nn_table, row])
nn_table


61/61 - 0s - loss: 0.1640 - accuracy: 0.9959 - precision_16: 0.0000e+00 - recall_16: 0.0000e+00 - 206ms/epoch - 3ms/step


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",100,0.9856,0.1592,0.0909,0.0526
0,Severe Injury,Neural Network,"[256, 128]",100,0.9959,0.164,0.0,0.0


The ability to predict whether a player will be injured remains one of the lowest accuracies and precisions of any of our models; however, the specific types of injuries have much higher accuracies and precisions. This is likely explained by differences in the conditions that lead to the different injuries. If those conditions were all overlapping, the IsInjured condition should be easier to predict. But if the conditions leading to the different types of injuries are unique, then there is a loss in predictability on just the yes/no question of whether these conditions will lead to an injury. Thus, the model for IsInjured would likely be improved by using the specific body parts and taking the union of those results as an injury predictor.
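
One way to read the suggestion above is to flag a play as injured whenever any body-part model flags it. The sketch below assumes each model outputs one probability per play and uses a hypothetical 0.5 threshold; the arrays are made-up values, not model output.

```python
import numpy as np

# Hypothetical per-play probabilities from three separate body-part models.
foot_prob = np.array([0.02, 0.91, 0.10, 0.03])
ankle_prob = np.array([0.01, 0.05, 0.77, 0.04])
knee_prob = np.array([0.03, 0.02, 0.08, 0.40])

threshold = 0.5  # assumed decision threshold

# A play counts as injured if at least one model exceeds the threshold.
flags = np.stack([foot_prob, ankle_prob, knee_prob]) >= threshold
is_injured_pred = flags.any(axis=0).astype(int)
print(is_injured_pred)  # [0 1 1 0]
```

A union like this can only raise recall relative to each individual model; its false positives accumulate as well, so precision would need to be checked on held-out data.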

---
## Injury Type Prediction - General Model

- There are several Injury Type Models:
    - General Model Classifies into 4 categories
    - Foot Injury
    - Ankle Injury
    - Knee Injury

In [29]:
# Format this to do encoding
X_cat = end_play.copy(deep=True)
X_cat.drop(columns=['IsInjured', 'SevereInjury', 'InjuryDuration'], inplace=True)

# Change the Injury Types back to categorical
injury = {3.0: 'Knee', 2.0: 'Ankle', 1.0: 'Foot', 0.0: 'NoInjury'}
X_cat['BodyPart'] = X_cat.InjuryType.map(injury)
X_cat.drop(columns='InjuryType', inplace=True)

X_cat.head()

# Grab all categorical variables and create a list for encoding
cat = X_cat.dtypes[X_cat.dtypes == 'object'].index.tolist()

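# Create a OneHotEncoder instance (its default sparse output is converted
# to a dense array below, which works across scikit-learn versions)
enc = OneHotEncoder()

# Fit and transform the OneHot to the columns necessary, keeping X_cat's index
# so the merge below lines up row by row
encode_df = pd.DataFrame(enc.fit_transform(X_cat[cat]).toarray(), index=X_cat.index)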

# Add the original variable names to the df
encode_df.columns = enc.get_feature_names_out(cat)

# Merge the OneHot features and drop the variables
X_encoded = X_cat.merge(encode_df, left_index=True, right_index=True)
X_encoded.drop(columns=cat, inplace=True)

X_encoded.head()

Unnamed: 0,x,y,s,Twist,RosterPosition,Temperature,PlayerGamePlay,Outdoor,Precipitation,DaysPlayed,PlayCode,BodyPart_Ankle,BodyPart_Foot,BodyPart_Knee,BodyPart_NoInjury
0,21.32,29.14,0.88,23.24,0,63,45,1,0.0,64,0.0,0.0,0.0,0.0,1.0
1,21.31,29.21,0.91,15.59,0,63,45,1,0.0,64,0.0,0.0,0.0,0.0,1.0
2,21.3,29.29,0.93,7.61,0,63,45,1,0.0,64,0.0,0.0,0.0,0.0,1.0
3,21.28,29.38,0.93,0.42,0,63,45,1,0.0,64,0.0,0.0,0.0,0.0,1.0
4,21.26,29.45,0.89,6.2,0,63,45,1,0.0,64,0.0,0.0,0.0,0.0,1.0
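
For reference, the same one-hot layout can be reproduced on a toy column with pandas.get_dummies. This is shown as an alternative that illustrates the encoding, not as the method used above; the feature values are made up, and only the injury mapping comes from the cell above.

```python
import pandas as pd

injury = {3.0: "Knee", 2.0: "Ankle", 1.0: "Foot", 0.0: "NoInjury"}

# Toy frame with one feature and the numeric injury code (illustrative values).
toy = pd.DataFrame({"s": [0.2, 1.4, 3.1, 0.7],
                    "InjuryType": [0.0, 3.0, 1.0, 2.0]})
toy["BodyPart"] = toy["InjuryType"].map(injury)
toy = toy.drop(columns="InjuryType")

# One column per category, named BodyPart_<category>, in alphabetical order.
encoded = pd.get_dummies(toy, columns=["BodyPart"], dtype=float)
print(encoded.columns.tolist())
```

Exactly one of the BodyPart columns is 1.0 in each row, which is what lets the last four columns serve as the 4-class label.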


In [30]:
y = X_encoded.loc[:, 'BodyPart_Ankle':]
X_enc = X_encoded.drop(
    columns=['BodyPart_Ankle', 'BodyPart_Foot', 'BodyPart_Knee', 'BodyPart_NoInjury'])

# Because the True case only represents 1% of the data, the training split is stratifying on y
X_train, X_test, y_train, y_test = train_test_split(
    X_enc, y, random_state=random_state, stratify=y)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)


# Establish the NN Model
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=4, activation='sigmoid'))

nn.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_5 (Dense)             (None, 256)               3072      
                                                                 
 dense_6 (Dense)             (None, 128)               32896     
                                                                 
 dense_7 (Dense)             (None, 4)                 516       
                                                                 
Total params: 36,484
Trainable params: 36,484
Non-trainable params: 0
_________________________________________________________________


In [31]:
# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)

Epoch 1/2
Epoch 2/2


In [32]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

# Add results to table
test = "Injury Type 4-Classes"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = pd.concat([nn_table, row])
nn_table

17338/17338 - 10s - loss: 0.0020 - accuracy: 0.9988 - precision_2: 0.9986 - recall_2: 0.9987 - 10s/epoch - 604us/step


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",2,0.9952,0.014,0.9706,0.5412
0,Severe Injury,Neural Network,"[256, 128]",2,0.9995,0.0013,0.9174,0.9537
0,Injury Type 4-Classes,Neural Network,"[256, 128]",2,0.9988,0.002,0.9986,0.9987


### Breaking this down to the different injury types
The Precision and Recall metrics reported by Keras here are single aggregate values, so we do not get the per-injury Precision and Recall that we were able to get with the Random Forests algorithm. We therefore broke this up into 3 additional analyses.
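
Per-class precision and recall can also be computed directly from a multi-class model's predictions. The sketch below is an assumption-laden alternative, not what the notebook does: `per_class_precision_recall` is a hypothetical helper, the arrays are toy data, and in practice `y_prob` would come from something like `nn.predict(X_test_scaled)` with the class taken as the argmax of each row.

```python
import numpy as np

def per_class_precision_recall(y_true_onehot, y_prob):
    """Return {class_index: (precision, recall)} using argmax predictions."""
    y_true = np.argmax(y_true_onehot, axis=1)
    y_pred = np.argmax(y_prob, axis=1)
    scores = {}
    for c in range(y_true_onehot.shape[1]):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        scores[c] = (float(precision), float(recall))
    return scores

# Column order assumed to be Ankle, Foot, Knee, NoInjury (toy labels and predictions).
y_true_onehot = np.eye(4)[[3, 3, 3, 0, 1, 2]]
y_prob = np.eye(4)[[3, 3, 0, 0, 1, 3]]
scores = per_class_precision_recall(y_true_onehot, y_prob)
print(scores)
```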
 
--- 
#### Foot Injury Prediction 

In [33]:
# Foot is encoded by the value 7.0, ankle is 42.0, and knee is 48.0
y = ml_merged.InjuryType.apply(lambda row: 1 if row == 7.0 else 0) # To evaluate Foot Injuries

# Because the True case only represents 1% of the data, the training split is stratifying on y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=random_state, stratify=y)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)


# Establish the NN Model
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

nn.summary()


Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_8 (Dense)             (None, 256)               3072      
                                                                 
 dense_9 (Dense)             (None, 128)               32896     
                                                                 
 dense_10 (Dense)            (None, 1)                 129       
                                                                 
Total params: 36,097
Trainable params: 36,097
Non-trainable params: 0
_________________________________________________________________


In [34]:
# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)

Epoch 1/2
Epoch 2/2


In [35]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

# Add results to table
test = "Foot Injury"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = pd.concat([nn_table, row])
nn_table


17338/17338 - 11s - loss: 6.6102e-05 - accuracy: 1.0000 - precision_3: 0.9953 - recall_3: 0.9930 - 11s/epoch - 657us/step


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",2,0.9952,0.014,0.9706,0.5412
0,Severe Injury,Neural Network,"[256, 128]",2,0.9995,0.0013,0.9174,0.9537
0,Injury Type 4-Classes,Neural Network,"[256, 128]",2,0.9988,0.002,0.9986,0.9987
0,Foot Injury,Neural Network,"[256, 128]",2,1.0,0.0001,0.9953,0.993


---
#### Ankle Injury Prediction

In [36]:
# Foot is encoded by the value 7.0, ankle is 42.0, and knee is 48.0
y = ml_merged.InjuryType.apply(
    lambda row: 1 if row == 42.0 else 0)  # To evaluate Ankle Injuries

# Because the True case only represents 1% of the data, the training split is stratifying on y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=random_state, stratify=y)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)


# Establish the NN Model
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

nn.summary()


Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_11 (Dense)            (None, 256)               3072      
                                                                 
 dense_12 (Dense)            (None, 128)               32896     
                                                                 
 dense_13 (Dense)            (None, 1)                 129       
                                                                 
Total params: 36,097
Trainable params: 36,097
Non-trainable params: 0
_________________________________________________________________


In [37]:
# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)

Epoch 1/2
Epoch 2/2


In [38]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

# Add results to table
test = "Ankle Injury"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = pd.concat([nn_table, row])
nn_table

17338/17338 - 11s - loss: 0.0016 - accuracy: 0.9995 - precision_4: 0.9559 - recall_4: 0.9247 - 11s/epoch - 643us/step


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",2,0.9952,0.014,0.9706,0.5412
0,Severe Injury,Neural Network,"[256, 128]",2,0.9995,0.0013,0.9174,0.9537
0,Injury Type 4-Classes,Neural Network,"[256, 128]",2,0.9988,0.002,0.9986,0.9987
0,Foot Injury,Neural Network,"[256, 128]",2,1.0,0.0001,0.9953,0.993
0,Ankle Injury,Neural Network,"[256, 128]",2,0.9995,0.0016,0.9559,0.9247


---
#### Knee Injury Prediction

In [39]:
# Foot is encoded by the value 7.0, ankle is 42.0, and knee is 48.0
y = ml_merged.InjuryType.apply(
    lambda row: 1 if row == 48.0 else 0)  # To evaluate Knee Injuries

# Because the True case only represents 1% of the data, the training split is stratifying on y
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=random_state, stratify=y)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)


# Establish the NN Model
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

nn.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_14 (Dense)            (None, 256)               3072      
                                                                 
 dense_15 (Dense)            (None, 128)               32896     
                                                                 
 dense_16 (Dense)            (None, 1)                 129       
                                                                 
Total params: 36,097
Trainable params: 36,097
Non-trainable params: 0
_________________________________________________________________


In [40]:
# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)

Epoch 1/2
Epoch 2/2


In [41]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

# Add results to table
test = "Knee Injury"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = pd.concat([nn_table, row])
nn_table

17338/17338 - 11s - loss: 0.0012 - accuracy: 0.9996 - precision_5: 0.9959 - recall_5: 0.9134 - 11s/epoch - 643us/step


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",2,0.9952,0.014,0.9706,0.5412
0,Severe Injury,Neural Network,"[256, 128]",2,0.9995,0.0013,0.9174,0.9537
0,Injury Type 4-Classes,Neural Network,"[256, 128]",2,0.9988,0.002,0.9986,0.9987
0,Foot Injury,Neural Network,"[256, 128]",2,1.0,0.0001,0.9953,0.993
0,Ankle Injury,Neural Network,"[256, 128]",2,0.9995,0.0016,0.9559,0.9247
0,Knee Injury,Neural Network,"[256, 128]",2,0.9996,0.0012,0.9959,0.9134


---
## Injury Duration Predictor - 5-Way Classifier

In [42]:
# Format this to do encoding
X_cat = ml_merged.copy(deep=True)
X_cat.drop(columns=['PlayKey', 'IsInjured', 'SevereInjury',
           'Position', 'InjuryType'], inplace=True)

# Change the injury durations back to categorical
duration = {0.0: 'NoInjury', 1.0: 'Under_1_Week', 7.0: 'Under_4_Weeks', 28.0: 'Under_6_Weeks', 42.0: 'Over_6_Weeks'}
X_cat['Durations'] = X_cat.InjuryDuration.map(duration)
X_cat.drop(columns='InjuryDuration', inplace=True)

# Grab all categorical variables and create a list for encoding
cat = X_cat.dtypes[X_cat.dtypes == 'object'].index.tolist()

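# Create a OneHotEncoder instance (its default sparse output is converted
# to a dense array below, which works across scikit-learn versions)
enc = OneHotEncoder()

# Fit and transform the OneHot to the columns necessary, keeping X_cat's index
# so the merge below lines up row by row
encode_df = pd.DataFrame(enc.fit_transform(X_cat[cat]).toarray(), index=X_cat.index)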

# Add the original variable names to the df
encode_df.columns = enc.get_feature_names_out(cat)

# Merge the OneHot features and drop the variables
X_encoded = X_cat.merge(encode_df, left_index=True, right_index=True)
X_encoded.drop(columns=cat, inplace=True)

X_encoded.head()


Unnamed: 0,x,y,s,Twist,RosterPosition,Temperature,PlayerGamePlay,Outdoor,Precipitation,DaysPlayed,PlayCode,Durations_NoInjury,Durations_Over_6_Weeks,Durations_Under_1_Week,Durations_Under_4_Weeks,Durations_Under_6_Weeks
0,21.32,29.14,0.88,23.24,0,63,45,1,0.0,64,0.0,1.0,0.0,0.0,0.0,0.0
1,21.31,29.21,0.91,15.59,0,63,45,1,0.0,64,0.0,1.0,0.0,0.0,0.0,0.0
2,21.3,29.29,0.93,7.61,0,63,45,1,0.0,64,0.0,1.0,0.0,0.0,0.0,0.0
3,21.28,29.38,0.93,0.42,0,63,45,1,0.0,64,0.0,1.0,0.0,0.0,0.0,0.0
4,21.26,29.45,0.89,6.2,0,63,45,1,0.0,64,0.0,1.0,0.0,0.0,0.0,0.0


In [43]:
y = X_encoded.loc[:, 'Durations_NoInjury':]
X_enc = X_encoded.drop(
    columns=['Durations_NoInjury', 'Durations_Over_6_Weeks', 'Durations_Under_1_Week', 'Durations_Under_4_Weeks', 'Durations_Under_6_Weeks'])

# Because the True case only represents 1% of the data, the training split is stratifying on y
X_train, X_test, y_train, y_test = train_test_split(
    X_enc, y, random_state=random_state, stratify=y)

# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit(X_train)
X_train_scaled = X_scaled.transform(X_train)
X_test_scaled = X_scaled.transform(X_test)


# Establish the NN Model
number_input_features = len(X_train_scaled[0])
hidden_layer1 = 256
hidden_layer2 = 128

nn = tf.keras.models.Sequential()

# Layers
nn.add(tf.keras.layers.Dense(units=hidden_layer1,
       input_dim=number_input_features, activation='relu'))
nn.add(tf.keras.layers.Dense(units=hidden_layer2, activation='relu'))
nn.add(tf.keras.layers.Dense(units=5, activation='sigmoid'))

nn.summary()


Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_17 (Dense)            (None, 256)               3072      
                                                                 
 dense_18 (Dense)            (None, 128)               32896     
                                                                 
 dense_19 (Dense)            (None, 5)                 645       
                                                                 
Total params: 36,613
Trainable params: 36,613
Non-trainable params: 0
_________________________________________________________________


In [44]:
# Compile the model
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=[
           'accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# Train the model
fit_model = nn.fit(X_train_scaled, y_train, epochs=epochs)

Epoch 1/2
Epoch 2/2


In [45]:
# Evaluate the model using the test data
results = nn.evaluate(X_test_scaled, y_test, verbose=2)

# Add results to table
test = "Injury Duration 5-Classes"
loss = round(results[0], 4)
accuracy = round(results[1], 4)
precision = round(results[2], 4)
recall = round(results[3], 4)
nodes = [hidden_layer1, hidden_layer2]

row = pd.DataFrame(
    [[test, model, nodes, epochs, accuracy, loss, precision, recall]], columns=columns)
nn_table = pd.concat([nn_table, row])
nn_table

17338/17338 - 11s - loss: 0.0014 - accuracy: 0.9989 - precision_6: 0.9988 - recall_6: 0.9989 - 11s/epoch - 612us/step


Unnamed: 0,Test,Model,Nodes,Epochs,Accuracy,Loss,Precision,Recall
0,Is Injured,Neural Network,"[256, 128]",2,0.9952,0.014,0.9706,0.5412
0,Severe Injury,Neural Network,"[256, 128]",2,0.9995,0.0013,0.9174,0.9537
0,Injury Type 4-Classes,Neural Network,"[256, 128]",2,0.9988,0.002,0.9986,0.9987
0,Foot Injury,Neural Network,"[256, 128]",2,1.0,0.0001,0.9953,0.993
0,Ankle Injury,Neural Network,"[256, 128]",2,0.9995,0.0016,0.9559,0.9247
0,Knee Injury,Neural Network,"[256, 128]",2,0.9996,0.0012,0.9959,0.9134
0,Injury Duration 5-Classes,Neural Network,"[256, 128]",2,0.9989,0.0014,0.9988,0.9989


In [46]:
# Export the results table to the repo

nn_table.to_csv("NeuralNetwork_Results.csv")

In [48]:
# Make connection to the database
from config import db_password
db_string = f"postgresql://postgres:{db_password}@127.0.0.1:5433/NFL_Injuries"
engine = db.create_engine(db_string)
del db_string, db_password

# Write table to database
# nn_table.to_sql(name='Neural_Network_Outputs', con=engine, index=False)

---

# Summary 

For an injury analysis such as this, it is more important that our model achieve a high precision, rather than a high accuracy or recall. Accuracy is the fraction of all plays classified correctly, counting both true positives and true negatives, and the data are extremely imbalanced, which is a known problem with the accuracy measure. Even if every play were classified as a Non-Injury, the model would still score about 99% accuracy, because injuries make up only about 1% of the data, while detecting none of the injuries. Meanwhile, precision is the count of the True Positives relative to the True Positives plus False Positives, and recall is the count of the True Positives relative to the True Positives plus False Negatives.

When the True Positive is the injury we are evaluating, a False Negative represents a player who was injured but was classified as Not Injured. In most of our analyses, the precision was extremely high, though the recall tended to lag. A possible explanation is that some plays meeting the criteria of a high-risk play, potentially prone to injury, did not result in an injury at that time, and the plays that did lead to injury could not be differentiated from those similar circumstances.
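
The effect of class imbalance on these measures can be checked with a few lines of arithmetic. This is a sketch using the standard definitions and made-up confusion counts, where about 1% of plays are injuries; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# 1,000 plays with 10 injuries, every play predicted "not injured".
print(classification_metrics(tp=0, fp=0, fn=10, tn=990))  # (0.99, 0.0, 0.0)

# A model that finds 6 of the 10 injuries with a single false alarm.
acc, prec, rec = classification_metrics(tp=6, fp=1, fn=4, tn=989)
print(acc, prec, rec)
```

Accuracy barely moves between the two cases, while precision and recall separate them clearly.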

From a medical-analytical perspective, this gives us insights as to what parameters can lead to injurious plays based on the locations of the players along with the other features analyzed. 


## Future Analysis

We would like to use the Random Forests analysis of the features to try to remove some of them, further isolating the most critical features leading to these lower-body injuries.