# NBA AI - Deep Learning Base Model - PyTorch

### 1. Neural Networks Structure
- A neural network comprises various layers: the input layer (for receiving data), hidden layers (where computation and feature extraction occur), and the output layer (producing the final prediction). Each layer contains neurons that perform weighted sums of their inputs followed by an activation function.
- **PyTorch Layers**: In PyTorch, `torch.nn.Module` is the base class for all neural network modules. Use `nn.Linear` for fully connected layers, suitable for tabular data. `nn.Conv2d` is used for convolutional layers, which are effective for image data due to their ability to capture spatial hierarchies. `nn.LSTM` or `nn.GRU` layers are used for sequential data like time series or text, capable of capturing temporal dynamics.

### 2. Activation Functions
- Activation functions introduce non-linear properties to the network. This non-linearity is crucial as it allows the network to learn complex patterns. ReLU (Rectified Linear Unit) is widely used in hidden layers due to its computational efficiency and ability to mitigate the vanishing gradient problem. However, other functions like sigmoid (squashes outputs between 0 and 1) and tanh (outputs between -1 and 1) are also important, particularly in output layers for binary classification or when normalized output is required.
- **PyTorch Implementation**: PyTorch provides these activation functions in `torch.nn.functional`. For example, `F.relu` for applying ReLU, `F.sigmoid` for sigmoid, and `F.tanh` for the tanh function. They are typically used within the `forward` method of a `torch.nn.Module` class.

### 3. Loss Functions
- The choice of the loss function is crucial and should align with the nature of the problem. For regression tasks, Mean Squared Error (MSE) is commonly used as it penalizes larger errors more severely. For classification tasks, Cross-Entropy Loss is standard as it measures the difference between two probability distributions - the actual labels and the predicted probabilities.
- **PyTorch Implementation**: PyTorch's `torch.nn` module contains various loss functions. `nn.MSELoss()` is used for regression tasks, and `nn.CrossEntropyLoss()` is used for multi-class classification tasks. CrossEntropyLoss in PyTorch combines a SoftMax activation with the negative log-likelihood loss in one single class.

### 4. Optimizers
- Optimizers are algorithms used for changing the attributes of the neural network such as weights and learning rate to reduce the losses. Optimizers aim to minimize (or maximize) the loss (or objective) function. While the SGD optimizer is straightforward and effective for many problems, Adam is popular due to its adaptive learning rate capabilities, which can lead to faster convergence.
- **PyTorch Implementation**: Optimizers in PyTorch are available under `torch.optim`. For instance, `torch.optim.SGD` for stochastic gradient descent and `torch.optim.Adam` for the Adam optimizer. They require the parameters to optimize (typically obtained using `model.parameters()`) and a learning rate.

### 5. Backpropagation and Gradient Descent
- Backpropagation is a mechanism used to update the weights of the network efficiently. It calculates the gradient (partial derivatives) of the loss function with respect to each weight in the network by the chain rule, enabling efficient computation of gradients. Gradient Descent, on the other hand, is an optimization algorithm used to minimize the loss function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient.
- **PyTorch Mechanism**: In PyTorch, backpropagation is implemented through automatic differentiation provided by the `torch.autograd` module. Use `loss.backward()` to compute the gradient of the loss with respect to each weight and `optimizer.step()` to perform a single optimization step.

### 6. Overfitting and Underfitting
- Overfitting occurs when a model learns the training data too well, including its noise and outliers, resulting in poor performance on new data. Underfitting occurs when a model is too simple to capture the underlying pattern in the data, leading to poor performance both on the training and new data. Both issues are critical in the development of robust models.
- **PyTorch Tools**: PyTorch offers various tools to combat overfitting. For instance, `nn.Dropout` is a layer that randomly zeroes some of the elements of the input tensor with probability `p` during training, which helps prevent overfitting by providing a form of regularization.

### 7. Regularization Techniques
- Regularization techniques are used to prevent overfitting, which is a common problem in deep learning models. These techniques add information or constraints to the loss function or the network itself to reduce its complexity. Common techniques include L1 and L2 regularization, which add penalties to the loss function based on the size of the weights, and dropout, which randomly drops units from the neural network during training to prevent the network from becoming too dependent on certain pathways.
- **PyTorch Implementation**: In PyTorch, L1/L2 regularization is often included in the optimizer's weight decay parameter. Dropout is implemented as a layer (`nn.Dropout`) and can be added to the network architecture in `torch.nn`.

### 8. Batch Size and Epochs
- The batch size and the number of epochs are hyperparameters that have significant effects on the training process and model performance. The batch size determines how many examples you look at before making a weight update. Smaller batch sizes provide a regularizing effect and lower generalization error. The number of epochs determines how many times the entire dataset is passed forward and backward through the neural network.
- **PyTorch Usage**: In PyTorch, the batch size is set when creating a DataLoader object (`torch.utils.data.DataLoader`), which also handles the shuffling and organization of the data. The number of epochs is controlled manually in the training loop.

### 9. Learning Rate
- The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing a learning rate is critical as it makes the model converge too slowly or diverge when too large.
- **PyTorch Implementation**: In PyTorch, the learning rate is set when defining an optimizer, e.g., `torch.optim.SGD(model.parameters(), lr=0.01)`. PyTorch also provides learning rate schedulers (e.g., `torch.optim.lr_scheduler`), which adjust the learning rate during training, typically reducing it according to a pre-defined schedule or in response to model performance.

### 10. Data Preprocessing
- Data preprocessing involves transforming raw data into an understandable format. In deep learning, it often includes normalization (scaling input data to a standard range), and in the case of images, augmentation techniques such as rotations, scaling, and flipping can be used to artificially expand the dataset.
- **PyTorch Tools**: For image data, PyTorch offers the `torchvision.transforms` module, which provides common image transformations. For other data types, custom transformations can be applied to the dataset before passing it to a DataLoader.


## Table of Contents

* [Data Setup](#data-setup)
* [MLP Regression](#mlp-regression)
* [MLP Classification](#mlp-classification)

### Imports and Global Settings

In [1]:
import datetime
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    r2_score,
    mean_absolute_error,
    accuracy_score,
    precision_score,
)

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

# Pandas Settings
pd.set_option("display.max_columns", 1000)
pd.set_option("display.max_rows", 1000)
pd.options.display.max_info_columns = 200
pd.options.display.precision = 5

### Load Data

In [2]:
df_2021_2022 = pd.read_csv("../data/nba_ai/cleaned_data_2021-2022.csv")
df_2022_2023 = pd.read_csv("../data/nba_ai/cleaned_data_2022-2023.csv")

<a name="data-setup"></a>

## Data Preparation

### Train Test Split

In [3]:
def prepare_datasets(train_df, cls_target, reg_target, test_df=None, test_size=0.3):
    """
    Prepares datasets for training and testing for both classification and regression targets,
    ensuring time-sensitive splitting based on a 'date' column.

    Parameters:
    train_df (DataFrame): The training dataframe.
    cls_target (str): The name of the classification target column.
    reg_target (str): The name of the regression target column.
    test_df (DataFrame, optional): An optional testing dataframe. If not provided, a portion of the training data is used.
    test_size (float, optional): The proportion of the dataset to include in the test split (if test_df is not provided).

    Returns:
    tuple: A tuple containing six dataframes:
           (X_train, X_test, y_train_cls, y_test_cls, y_train_reg, y_test_reg).
    """

    # Sort the dataframe based on the 'date' column
    train_df = train_df.sort_values(by="date")

    # If a test dataframe is not provided, split the training dataframe
    if test_df is None:
        X_train, X_test, y_train, y_test = train_test_split(
            train_df.drop([cls_target, reg_target], axis=1),
            train_df[[cls_target, reg_target]],
            test_size=test_size,
            shuffle=False,  # Important to maintain time order
        )
    else:
        # If a test dataframe is provided, ensure it is also sorted by date
        test_df = test_df.sort_values(by="date")

        # Use provided test dataframe and separate features and targets
        X_train = train_df.drop([cls_target, reg_target], axis=1)
        y_train = train_df[[cls_target, reg_target]]
        X_test = test_df.drop([cls_target, reg_target], axis=1)
        y_test = test_df[[cls_target, reg_target]]

    # Separate classification and regression targets
    y_train_cls = y_train[[cls_target]]
    y_train_reg = y_train[[reg_target]]
    y_test_cls = y_test[[cls_target]]
    y_test_reg = y_test[[reg_target]]

    return X_train, X_test, y_train_cls, y_test_cls, y_train_reg, y_test_reg

In [4]:
X_train, X_test, y_train_cls, y_test_cls, y_train_reg, y_test_reg = prepare_datasets(
    df_2021_2022, "CLS_TARGET", "REG_TARGET", test_df=df_2022_2023
)

### Features

In [5]:
betting_feature_set = [
    "home_opening_spread",
    "opening_total",
    "home_moneyline",
    "road_moneyline",
]

base_feature_set = [
    "day_of_season",
    "home_team_rest",
    "road_team_rest",
    "home_win_pct",
    "road_win_pct",
    "home_win_pct_l2w",
    "road_win_pct_l2w",
    "home_avg_pts",
    "road_avg_pts",
    "home_avg_pts_l2w",
    "road_avg_pts_l2w",
    "home_avg_oeff",
    "road_avg_oeff",
    "home_avg_oeff_l2w",
    "road_avg_oeff_l2w",
    "home_avg_deff",
    "road_avg_deff",
    "home_avg_deff_l2w",
    "road_avg_deff_l2w",
    "home_avg_eFG%",
    "road_avg_eFG%",
    "home_avg_eFG%_l2w",
    "road_avg_eFG%_l2w",
    "home_avg_TOV%",
    "road_avg_TOV%",
    "home_avg_TOV%_l2w",
    "road_avg_TOV%_l2w",
    "home_avg_ORB%",
    "road_avg_ORB%",
    "home_avg_ORB%_l2w",
    "road_avg_ORB%_l2w",
    "home_avg_FT%",
    "road_avg_FT%",
    "home_avg_FT%_l2w",
    "road_avg_FT%_l2w",
    "home_avg_pts_allowed",
    "road_avg_pts_allowed",
    "home_avg_pts_allowed_l2w",
    "road_avg_pts_allowed_l2w",
]

lineup_vectors = ["home_lineup_vector", "road_lineup_vector"]

In [None]:
features = base_feature_set

In [None]:
def flatten_vector_columns(df, vector_columns):
    """
    Flatten vector columns into separate feature columns.

    This function takes a DataFrame and a list of column names that store vector data as strings
    (typically after being read from a CSV file), and returns a new DataFrame where the vectors
    have been flattened into separate feature columns.

    Parameters:
    df (pandas.DataFrame): The input DataFrame.
    vector_columns (list): A list of column names in df that store vector data as strings.

    Returns:
    pandas.DataFrame: The DataFrame with vector columns flattened.
    """
    for column in vector_columns:
        if column not in df.columns:
            continue
        # Convert the string representation of the vector into a numpy array
        df[column] = df[column].apply(
            lambda x: np.array(x.strip("[]").replace("\n", " ").split(), dtype=float)
        )

        # Flatten the numpy array into separate columns
        vector_df = pd.DataFrame(df[column].tolist(), index=df.index)
        vector_df.columns = [f"{column}_{i}" for i in range(vector_df.shape[1])]

        # Drop the original vector column and concatenate the new DataFrame
        df = df.drop(column, axis=1)
        df = pd.concat([df, vector_df], axis=1)

    return df

In [None]:
X_train = X_train[features]
X_test = X_test[features]

In [None]:
# Flatten lineup vectors
X_train = flatten_vector_columns(X_train, lineup_vectors)
X_test = flatten_vector_columns(X_test, lineup_vectors)

### Combined Data

In [None]:
combined_train_df = pd.concat([X_train, y_train_cls, y_train_reg], axis=1)
combined_test_df = pd.concat([X_test, y_test_cls, y_test_reg], axis=1)

In [None]:
combined_train_df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1323 entries, 0 to 1322
Data columns (total 41 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   day_of_season             1323 non-null   int64  
 1   home_team_rest            1323 non-null   int64  
 2   road_team_rest            1323 non-null   int64  
 3   home_win_pct              1323 non-null   float64
 4   road_win_pct              1323 non-null   float64
 5   home_win_pct_l2w          1323 non-null   float64
 6   road_win_pct_l2w          1323 non-null   float64
 7   home_avg_pts              1323 non-null   float64
 8   road_avg_pts              1323 non-null   float64
 9   home_avg_pts_l2w          1323 non-null   float64
 10  road_avg_pts_l2w          1323 non-null   float64
 11  home_avg_oeff             1323 non-null   float64
 12  road_avg_oeff             1323 non-null   float64
 13  home_avg_oeff_l2w         1323 non-null   float64
 14  road_avg

In [None]:
combined_test_df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1320 entries, 0 to 1319
Data columns (total 41 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   day_of_season             1320 non-null   int64  
 1   home_team_rest            1320 non-null   int64  
 2   road_team_rest            1320 non-null   int64  
 3   home_win_pct              1320 non-null   float64
 4   road_win_pct              1320 non-null   float64
 5   home_win_pct_l2w          1320 non-null   float64
 6   road_win_pct_l2w          1320 non-null   float64
 7   home_avg_pts              1320 non-null   float64
 8   road_avg_pts              1320 non-null   float64
 9   home_avg_pts_l2w          1320 non-null   float64
 10  road_avg_pts_l2w          1320 non-null   float64
 11  home_avg_oeff             1320 non-null   float64
 12  road_avg_oeff             1320 non-null   float64
 13  home_avg_oeff_l2w         1320 non-null   float64
 14  road_avg

<a name="mlp-regression"></a>

## Multi-Layer Perceptron (MLP) - Regression

### Data Conversion

In [7]:
# Convert Pandas DataFrames to PyTorch Tensors
X_train_tensor = torch.tensor(X_train.values, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train_reg.values, dtype=torch.float32)

# Create a TensorDataset - this wraps tensors into a dataset
train_data = TensorDataset(X_train_tensor, y_train_tensor)

# DataLoader for batching and shuffling
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)

# Note: Shuffle is set to True for training data. For time-series data, consider the impact of shuffling.

### Model Definition


In [8]:
# Define a simple regression neural network
class RegressionModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RegressionModel, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.layer2(x)
        return x

In [9]:
# Instantiate the model
reg_mlp_model = RegressionModel(
    input_size=X_train.shape[1], hidden_size=5, output_size=1
)

### Model Training


In [10]:
# Define loss function and optimizer
loss_function = nn.MSELoss()
optimizer = optim.Adam(reg_mlp_model.parameters(), lr=0.001)

# Training loop
num_epochs = 100  # Set the number of epochs
for epoch in range(num_epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()  # Zero the gradient buffers
        outputs = reg_mlp_model(inputs)
        loss = loss_function(outputs, targets)
        loss.backward()  # Backpropagation
        optimizer.step()  # Update weights

    # Optional: Print the loss every few epochs
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item()}")

Epoch 0, Loss: 74.28823852539062
Epoch 10, Loss: 96.0931396484375
Epoch 20, Loss: 137.65530395507812
Epoch 30, Loss: 195.98065185546875
Epoch 40, Loss: 92.8561019897461
Epoch 50, Loss: 105.47020721435547
Epoch 60, Loss: 335.4338684082031
Epoch 70, Loss: 134.19989013671875
Epoch 80, Loss: 315.7101135253906
Epoch 90, Loss: 275.2518005371094


### Model Evaluation and Prediction

In [11]:
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)

# Disable gradient computation for evaluation and prediction
with torch.no_grad():
    train_predictions_reg = reg_mlp_model(X_train_tensor)
    test_predictions_reg = reg_mlp_model(X_test_tensor)

# Convert predictions to a NumPy array or Pandas Series for further evaluation
train_predictions_reg_np = train_predictions_reg.numpy()
test_predictions_reg_np = test_predictions_reg.numpy()

In [12]:
train_mae = mean_absolute_error(train_predictions_reg_np, y_train_reg)
train_r2 = r2_score(train_predictions_reg_np, y_train_reg)

test_mae = mean_absolute_error(test_predictions_reg_np, y_test_reg)
test_r2 = r2_score(test_predictions_reg_np, y_test_reg)

In [13]:
print(f"Train MAE: {train_mae:.2f}")
print(f"Train R2: {train_r2:.2f}")
print(f"Test MAE: {test_mae:.2f}")
print(f"Test R2: {test_r2:.2f}")

Train MAE: 11.81
Train R2: -30.65
Test MAE: 10.73
Test R2: -38.40


### Model Saving and Loading

In [14]:
problem_type = "Regression"
base_model = "MLP"
train_performance = round(train_mae, 2)
test_performance = round(test_mae, 2)

model_id = f"{problem_type}_{base_model}_{train_performance}_{test_performance}_{datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}"

model_id

'Regression_MLP_11.81_10.73_2024-01-03_15-50-13'

In [15]:
# Save the model state
torch.save(reg_mlp_model.state_dict(), f"../models/{model_id}.pth")

In [16]:
# To load the model, first initialize the model structure, then load the state
# reg_mlp_model = RegressionModel(input_size=10, hidden_size=5, output_size=1)
# reg_mlp_model.load_state_dict(torch.load(f"../models/{model_id}.pth"))
# reg_mlp_model.eval()  # Set the model to evaluation mode

<a name="mlp-classification"></a>

## Multi-Layer Perceptron (MLP) - Classification

### Data Conversion

In [17]:
# Convert Pandas DataFrames to PyTorch Tensors
X_train_tensor = torch.tensor(X_train.values, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train_cls.values, dtype=torch.float32)

# Create a TensorDataset - this wraps tensors into a dataset
train_data = TensorDataset(X_train_tensor, y_train_tensor)

# DataLoader for batching and shuffling
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)

# Note: Shuffle is set to True for training data. For time-series data, consider the impact of shuffling.

### Model Definition


In [18]:
class ClassificationModel(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(ClassificationModel, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(
            hidden_size, 1
        )  # Output size is 1 for binary classification

    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = torch.sigmoid(
            self.layer2(x)
        )  # Sigmoid activation for binary classification
        return x

In [19]:
# Instantiate the model
cls_mlp_model = ClassificationModel(input_size=X_train.shape[1], hidden_size=5)

### Model Training


In [20]:
# Define loss function and optimizer for binary classification
loss_function = nn.BCELoss()
optimizer = optim.Adam(cls_mlp_model.parameters(), lr=0.001)

# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        outputs = cls_mlp_model(inputs)
        loss = loss_function(outputs, targets)
        loss.backward()
        optimizer.step()

    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item()}")

Epoch 0, Loss: 0.7315127849578857
Epoch 10, Loss: 0.6890249848365784
Epoch 20, Loss: 0.690775990486145
Epoch 30, Loss: 0.6824485063552856
Epoch 40, Loss: 0.704181969165802
Epoch 50, Loss: 0.6825932860374451
Epoch 60, Loss: 0.7145680785179138
Epoch 70, Loss: 0.6818594336509705
Epoch 80, Loss: 0.6659519672393799
Epoch 90, Loss: 0.6734258532524109


### Model Evaluation and Prediction

In [21]:
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)

# Disable gradient computation for evaluation and prediction
with torch.no_grad():
    train_predictions_cls = cls_mlp_model(X_train_tensor)
    test_predictions_cls = cls_mlp_model(X_test_tensor)

In [22]:
# Assuming your model outputs probabilities for class 1
threshold = 0.5

# Convert probabilities to class labels based on the threshold
train_predictions_cls_np = train_predictions_cls.numpy() > threshold
test_predictions_cls_np = test_predictions_cls.numpy() > threshold

In [23]:
train_accuracy = accuracy_score(train_predictions_cls_np, y_train_cls)
train_precision = precision_score(train_predictions_cls_np, y_train_cls)

test_accuracy = accuracy_score(test_predictions_cls_np, y_test_cls)
test_precision = precision_score(test_predictions_cls_np, y_test_cls)

In [24]:
print(f"Train Accuracy: {train_accuracy:.2f}")
print(f"Train Precision: {train_precision:.2f}")
print(f"Test Accuracy: {test_accuracy:.2f}")
print(f"Test Precision: {test_precision:.2f}")

Train Accuracy: 0.52
Train Precision: 0.00
Test Accuracy: 0.48
Test Precision: 0.00


### Model Saving and Loading

In [25]:
problem_type = "Classification"
base_model = "MLP"
train_performance = round(train_mae, 2)
test_performance = round(test_mae, 2)

model_id = f"{problem_type}_{base_model}_{train_performance}_{test_performance}_{datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}"

model_id

'Classification_MLP_11.81_10.73_2024-01-03_15-50-20'

In [26]:
# Save the model state
torch.save(cls_mlp_model.state_dict(), f"../models/{model_id}.pth")

In [27]:
# To load the model, first initialize the model structure, then load the state
# cls_mlp_model = ClassificationModel(input_size=10, hidden_size=5, output_size=1)
# cls_mlp_model.load_state_dict(torch.load(f"../models/{model_id}.pth"))
# cls_mlp_model.eval()  # Set the model to evaluation mode