## NFL First and Future - Impact Detection
-------------------
# Impact Occurrence Classifier | Train

In this notebook, we train a model to predict the probability that an impact of any kind occurs in a given video frame, which reduces the task to a simple binary classification problem. The output can be used in several ways: as a meta-feature for the final model, or to adjust the final prediction probabilities (e.g. by multiplying the two probabilities together).
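
The two uses mentioned above can be sketched as follows. This is a minimal illustration with made-up numbers; `frame_prob`, `box_prob` and `box_features` are hypothetical names and do not come from the competition data.

```python
import numpy as np

# Hypothetical output of this notebook's classifier:
# probability that ANY impact occurs in each of three frames
frame_prob = np.array([0.90, 0.10, 0.50])

# Hypothetical output of a separate final model:
# probability that one specific helmet box in each frame is an impact
box_prob = np.array([0.80, 0.80, 0.40])

# Option 1: combine the two predictions by multiplying them together
combined_prob = frame_prob * box_prob

# Option 2: append the frame-level probability as an extra (meta) feature
box_features = np.array([[0.2, 1.0],
                         [0.5, 0.0],
                         [0.1, 1.0]])
features_with_meta = np.column_stack([box_features, frame_prob])

print(combined_prob)
print(features_with_meta.shape)
```

Multiplying down-weights box-level predictions in frames where an impact is unlikely overall, whereas the meta-feature route lets the final model learn how much weight to give the frame-level signal.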

 - What data do we have on what kind of play it is? Certain plays are more likely to have collisions. 
 - Cross-validation. When splitting, we should consider game, team, play, AND camera angle.
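
One possible way to respect those groupings when splitting is scikit-learn's `GroupKFold`, which keeps all rows sharing a group key in the same fold. The sketch below groups by game, so every play and both camera views of a game stay together. The column names (`gameKey`, `playID`, `view`, `impact`) and the toy values are assumptions for illustration, and this swaps in `GroupKFold` for the `StratifiedKFold` imported later in the notebook.

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

# Toy frame-level table; column names and values are assumptions
frames = pd.DataFrame({
    'gameKey': [1, 1, 1, 1, 2, 2, 2, 2],
    'playID':  [10, 10, 11, 11, 20, 20, 21, 21],
    'view':    ['Endzone', 'Sideline'] * 4,
    'impact':  [0, 1, 0, 0, 1, 0, 0, 1],
})

# Group by game so that no game is split across train and validation
groups = frames['gameKey']

gkf = GroupKFold(n_splits=2)
folds = list(gkf.split(frames, frames['impact'], groups))

for fold, (train_idx, valid_idx) in enumerate(folds):
    train_games = set(groups.iloc[train_idx])
    valid_games = set(groups.iloc[valid_idx])
    print(f'Fold {fold}: train games {train_games}, valid games {valid_games}')
```

`GroupKFold` does not balance the class ratio across folds. If that matters as well, `StratifiedGroupKFold` (scikit-learn 1.0 and later) is worth considering.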



#### CLEANING STEPS:
 1. `impactType` - shoulder values
 2. Bin player positions and add as a feature
 
 
## 1.00 Import Packages

In [11]:
# General packages
import pandas as pd
import numpy as np
import os
import gc
import random
from tqdm import tqdm, tqdm_notebook

import time
import warnings
warnings.filterwarnings('ignore')

# Data vis packages
import matplotlib.pyplot as plt
%matplotlib inline

# Data prep
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.decomposition import PCA

# Modelling packages
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as k
# Key layers
from tensorflow.keras.models import Model, Sequential, load_model
from tensorflow.keras.layers import Input, Add, Dense, Flatten
# Activation layers
from tensorflow.keras.layers import ReLU, LeakyReLU, ELU, ThresholdedReLU
# Dropout layers
from tensorflow.keras.layers import Dropout, AlphaDropout, GaussianDropout
# Normalisation layers
from tensorflow.keras.layers import BatchNormalization
# Embedding layers
from tensorflow.keras.layers import Embedding, Concatenate, Reshape
# Callbacks
from tensorflow.keras.callbacks import Callback, EarlyStopping, LearningRateScheduler, ModelCheckpoint
# Optimisers
from tensorflow.keras.optimizers import SGD, RMSprop, Adam, Adadelta, Adagrad, Adamax, Nadam, Ftrl
# Model cross validation and evaluation
from sklearn.model_selection import StratifiedKFold
from sklearn import metrics

# For Bayesian hyperparameter searching
from skopt import gbrt_minimize, gp_minimize
from skopt.utils import use_named_args
from skopt.space import Real, Categorical, Integer

In [2]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

strategy = tf.distribute.get_strategy()
REPLICAS = strategy.num_replicas_in_sync
print(f'REPLICAS: {REPLICAS}')

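# GPU memory options: allocate memory on demand rather than all at once
# (takes effect only if passed to a tf.compat.v1 session config)
gpu_options = tf.compat.v1.GPUOptions(allow_growth=True)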

Num GPUs Available:  1
REPLICAS: 1


## 2.00 Read in Data

In [10]:
# Directory and file paths
input_dir               = '../input'
train_video_labels_path = os.path.join(input_dir, 'train_labels.csv')
train_track_data_path   = os.path.join(input_dir, 'train_player_tracking.csv')
sample_submission_path  = os.path.join(input_dir, 'sample_submission.csv')

# Read in data
train_video_labels = pd.read_csv(train_video_labels_path)
train_track_data   = pd.read_csv(train_track_data_path)
sample_submission  = pd.read_csv(sample_submission_path)

del train_video_labels_path, train_track_data_path, sample_submission_path

print(f'train_video_labels shape: \t{train_video_labels.shape}')
print(f'train_track_data shape: \t{train_track_data.shape}')
print(f'sample_submission shape: \t{sample_submission.shape}')

train_video_labels shape: 	(983885, 14)
train_track_data shape: 	(333811, 12)
sample_submission shape: 	(56230, 9)


In [14]:
# Define key parameters
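SEED = 14
# Seed every random number generator used in this notebook
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)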

SCALER_METHOD = StandardScaler()

FEATURE_SELECTOR = RandomForestClassifier(random_state=SEED)

NUM_COMPONENTS = 200
PCA_METHOD = PCA(n_components=NUM_COMPONENTS, random_state=SEED)

EPOCHS = 100
BATCH_SIZE = 64
KFOLDS = 10
PATIENCE = 10

USE_EMBEDDING = True
MODEL_TO_USE = 'nn'
model_name_save = MODEL_TO_USE + '_impact_occurrence_classifier_seed' + str(SEED)

# Create the weights directory (and the parent 'models' directory) if it does not already exist
os.makedirs(f'models/{model_name_save}', exist_ok=True)

print(f'Model name: {model_name_save}')

Model name: nn_impact_occurrence_classifier_seed14
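
The notebook does not yet show how the scaler, feature selector and PCA defined above are combined. One plausible arrangement, offered only as a sketch, is to chain them in a scikit-learn `Pipeline`. The data here is synthetic, and the smaller `n_components`, `n_estimators` and `max_features` values are assumptions chosen so that the example runs quickly. Note that PCA requires `n_components` to be no larger than the number of features left after selection, so `NUM_COMPONENTS = 200` only works if at least 200 features survive.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.decomposition import PCA

SEED = 14
rng = np.random.RandomState(SEED)

# Synthetic stand-in data: 200 rows, 20 features,
# with a binary label driven by the first two features
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

prep = Pipeline([
    # 1. Standardise each feature to zero mean and unit variance
    ('scale', StandardScaler()),
    # 2. Keep the 5 features with the highest random forest importance
    ('select', SelectFromModel(
        RandomForestClassifier(n_estimators=50, random_state=SEED),
        threshold=-np.inf, max_features=5)),
    # 3. Project the selected features onto 3 principal components
    ('pca', PCA(n_components=3, random_state=SEED)),
])

X_prepped = prep.fit_transform(X, y)
print(X_prepped.shape)
```

Fitting the pipeline inside each cross-validation fold, rather than once on all the data, avoids leaking validation information into the scaler, selector and PCA.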


## 3.00 Data Preparation

In [None]:
def merge_datasets(video_labels, track_data):
    """
    Merges video label features to the player tracking data.
    
    Parameters
    ----------     
    video_labels : pd.DataFrame 
        pd.DataFrame of video label data
    track_data : pd.DataFrame
        pd.DataFrame of player tracking data
    
    Returns
    -------
    pd.DataFrame
        Player tracking data with the impact label features merged in
    """
    # TODO: the merge logic has not been written yet, so merged_track_data is undefined
    return merged_track_data
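
The body of `merge_datasets` is left empty in the source, so the intended join is unknown. As a rough sketch of one option, the example below summarises the labels to one row per play and left-joins that summary onto the tracking data. The join keys (`gameKey`, `playID`), the other column names, the `num_impacts` feature and the toy values are all assumptions; aligning video frames with tracking timestamps would need additional logic that is not attempted here.

```python
import pandas as pd

# Toy stand-ins for the two tables; column names and values are assumptions
video_labels = pd.DataFrame({
    'gameKey': [1, 1, 1, 2],
    'playID':  [10, 10, 11, 20],
    'frame':   [1, 2, 1, 1],
    'impact':  [0, 1, 0, 0],
})
track_data = pd.DataFrame({
    'gameKey': [1, 1, 2],
    'playID':  [10, 11, 20],
    'player':  ['H1', 'H2', 'V3'],
    'x':       [10.0, 20.0, 30.0],
})

# Summarise the labels to one row per play: total number of labelled impacts
play_impacts = (
    video_labels
    .groupby(['gameKey', 'playID'], as_index=False)['impact']
    .sum()
    .rename(columns={'impact': 'num_impacts'})
)

# Left join keeps every tracking row, even for plays without any labels
merged = track_data.merge(play_impacts, on=['gameKey', 'playID'], how='left')
print(merged)
```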
    