# Data Preparation 

When creating an AI model curating the data to train the model is most important consideration. Using meaningful variables while taking in to account different facets of the game can create a powerful model. 

## Player-Specific Variables
### General Player Information
    - Player ID: Unique identifier for each player
    - Position: Position played (QB, WR, RB, TE)
    - Team: Player's team
    - Height, Weight, Age: Physical attributes
    
### Historical Performance Metrics
    - Games Played: Number of games played
    - Games Started: Number of games started
    - Snaps Played: Number of snaps played
    - Passing Yards (QB): Total passing yards
    - Passing Touchdowns (QB): Total passing touchdowns
    - Interceptions (QB): Number of interceptions thrown
    - Rushing Yards (RB/QB/WR/TE): Total rushing yards
    - Rushing Touchdowns (RB/QB/WR/TE): Total rushing touchdowns
    - Receptions (WR/TE/RB): Number of receptions
    - Receiving Yards (WR/TE/RB): Total receiving yards
    - Receiving Touchdowns (WR/TE/RB): Total receiving touchdowns
    - Targets (WR/TE/RB): Number of targets
    - Fumbles: Number of fumbles

### Team-Specific Variables
    - Team Offensive Metrics: Overall offensive performance of the team (total yards, points scored, etc.)
    - Team Defensive Metrics: Overall defensive performance of the team (yards allowed, points allowed, etc.)
    - Offensive Line Strength: Metrics like sacks allowed, pressures allowed, etc.
    - Defensive Line Strength: Metrics like sacks made, pressures applied, etc.
    - Injuries: Current injury report for the team

### Opponent-Specific Variables
    - Opponent Team Defensive Metrics: Performance metrics of the opposing team's defense (yards allowed, points allowed, turnovers forced, etc.)
    - Opponent Defensive Schemes: Information about the defensive schemes commonly used by the opponent (e.g., 4-3 defense, zone coverage tendencies, etc.)
    - Matchup History: Historical performance of the player/team against the upcoming opponent
    - Opponent Injuries: Current injury report for the opposing team

### Coaching and Strategy Variables
    - Head Coach and Coordinators: Information about the head coach, offensive coordinator, and defensive coordinator
    - Play Calling Tendencies: Historical data on play calling tendencies (run vs. pass ratio, play action usage, etc.)
    - Scheme Preferences: Offensive and defensive scheme preferences (e.g., West Coast offense, Cover 2 defense)
    - Game Plan Adjustments: Any reported adjustments or strategies for the upcoming game

### Contextual and Situational Variables
    - Game Location: Home or away game
    - Weather Conditions: Weather forecast for the game day (temperature, precipitation, wind speed, etc.)
    - Game Type: Regular season, playoff, or preseason game
    - Time of Season: Early, mid, or late season
    - Time of Game: Day game, night game, or prime-time game
    - Field Surface: Type of field surface (grass, turf, etc.)
    - Vegas Odds: Betting lines, point spreads, and over/under totals

In [2]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Ensure GPU is available
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

# Example dataset
player_stats_data = {
    'Player ID': [1, 2],
    'Name': ['Player A', 'Player B'],
    'Team': ['XYZ', 'XYZ'],
    'Position': ['QB', 'WR'],
    'Games Played': [10, 10],
    'Passing Yards': [2500, 0],
    'Passing TDs': [20, 0],
    'Interceptions': [5, 0],
    'Rushing Yards': [200, 100],
    'Receptions': [0, 50],
    'Receiving Yards': [0, 600],
    'Receiving TDs': [0, 6],
    'Targets': [0, 75],
    'Fumbles': [2, 1]
}

team_stats_data = {
    'Team': ['XYZ'],
    'Offensive Yards': [4000],
    'Points Scored': [300],
    'Defensive Yards Allowed': [3500],
    'Points Allowed': [250],
    'Sacks Allowed': [15],
    'Sacks Made': [20],
    'Current Injuries': ['Player A (Out), Player B (Q)']
}

opponent_stats_data = {
    'Opponent': ['ABC'],
    'Defensive Yards Allowed': [3200],
    'Points Allowed': [200],
    'Turnovers Forced': [15],
    'Sacks Made': [18],
    'Defensive Scheme': ['4-3, Zone Coverage'],
    'Current Injuries': ['Player C (Out), Player D (Q)']
}

contextual_data = {
    'Game ID': [101],
    'Home Team': ['XYZ'],
    'Away Team': ['ABC'],
    'Location': ['Home'],
    'Weather': ['Clear'],
    'Game Type': ['Regular'],
    'Time of Season': ['Mid'],
    'Time of Game': ['Night'],
    'Field Surface': ['Turf'],
    'Vegas Odds': ['XYZ -3.5']
}

# Convert dictionaries to DataFrames
player_stats_df = pd.DataFrame(player_stats_data)
team_stats_df = pd.DataFrame(team_stats_data)
opponent_stats_df = pd.DataFrame(opponent_stats_data)
contextual_data_df = pd.DataFrame(contextual_data)

# Merge data into a single DataFrame
merged_data = pd.concat([player_stats_df, team_stats_df, opponent_stats_df, contextual_data_df], axis=1)

# Select features and target
features = merged_data[['Games Played', 'Passing Yards', 'Passing TDs', 'Interceptions', 'Rushing Yards', 
                        'Receptions', 'Receiving Yards', 'Receiving TDs', 'Targets', 'Fumbles',
                        'Offensive Yards', 'Points Scored', 'Defensive Yards Allowed', 'Points Allowed',
                        'Sacks Allowed', 'Sacks Made', 'Defensive Yards Allowed', 'Points Allowed', 
                        'Turnovers Forced', 'Sacks Made']]

target = merged_data[['Passing Yards', 'Passing TDs', 'Rushing Yards', 'Receptions', 'Receiving Yards', 'Receiving TDs']]

# Normalize the data
features = (features - features.mean()) / features.std()

# Convert to NumPy arrays
features = np.array(features)
target = np.array(target)

# Split data into training and testing sets
train_features = features[:int(0.8 * len(features))]
test_features = features[int(0.8 * len(features)):]

train_target = target[:int(0.8 * len(target))]
test_target = target[int(0.8 * len(target)):]

# Define the model
model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=[train_features.shape[1]]),
    layers.Dense(64, activation='relu'),
    layers.Dense(train_target.shape[1])
])

# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae', 'mse'])

# Train the model
history = model.fit(train_features, train_target, epochs=100, validation_split=0.2, verbose=1)

# Evaluate the model
loss, mae, mse = model.evaluate(test_features, test_target, verbose=2)
print("Testing set Mean Abs Error: {:5.2f}".format(mae))

# Predict future games
predictions = model.predict(test_features)
print("Predictions: ", predictions)

# Continuously train with new data (dummy new data here for demonstration)
new_features = test_features
new_target = test_target

# Re-train the model with new data
model.fit(new_features, new_target, epochs=50, validation_split=0.2, verbose=1)

2024-07-31 15:35:57.897478: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-07-31 15:35:57.903844: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2024-07-31 15:35:57.919158: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-31 15:35:57.953496: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-31 15:35:57.965481: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-31 15:35:57.987112: I tensorflow/core/platform/cpu_feature_gu

Num GPUs Available:  0


I0000 00:00:1722461762.084624 1238821 cuda_executor.cc:1001] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2024-07-31 15:36:02.087309: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2343] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


ValueError: Training data contains 1 samples, which is not sufficient to split it into a validation and training set as specified by `validation_split=0.2`. Either provide more data, or a different value for the `validation_split` argument.