# NBA Games Predictor
### Author: Noah Chinitz (noahchinitz@gwu.edu)
### GitHub: NoahChinitzGWU

*A neural network that predicts whether the home team won or lost a past NBA game. Created using Keras.*

---

**TensorBoard Block:** Load the TensorBoard notebook extension. After running `model.fit()`, launch TensorBoard with `%tensorboard --logdir logs` to view the previous runs' accuracy and loss.

---

In [None]:
%load_ext tensorboard

---

**Data Engineering Block:** Load `games.csv` into a pandas DataFrame and prepare it for training.

**Current implementation:** Remove the Win/Loss column (`WL_HOME`) and every non-numerical column (e.g. team abbreviations) from the feature set, and create a label column that mirrors the Win/Loss column. Next, build the x (features) and y (labels) arrays and split them into training and testing sets. Lastly, standardize the features to zero mean and unit variance.
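The filtering step can be tried on a toy DataFrame first. This is a minimal sketch: apart from `GAME_ID`, `WL_HOME` and the two `TEAM_ABBREVIATION_*` columns, the column name `FEATURE_A` and all values are hypothetical. It also shows that `isin([0.0, 1.0])` only drops games whose label is missing; a NaN in a feature column survives the filter and needs separate handling (here, `dropna`).

```python
import numpy as np
import pandas as pd

# Toy stand-in for games.csv; FEATURE_A and all values are hypothetical
df = pd.DataFrame({
    "GAME_ID": [101, 102, 103, 104],
    "TEAM_ABBREVIATION_HOME": ["BOS", "LAL", "MIA", "NYK"],
    "TEAM_ABBREVIATION_AWAY": ["NYK", "GSW", "CHI", "BOS"],
    "FEATURE_A": [1.5, 2.0, np.nan, 0.5],
    "WL_HOME": [1.0, 0.0, 1.0, np.nan],  # game 104 has no recorded result
})

# Features: every column except the team abbreviations and the label
excluded = {"TEAM_ABBREVIATION_HOME", "TEAM_ABBREVIATION_AWAY", "WL_HOME"}
fields = [c for c in df.columns if c not in excluded]

# Keep rows whose label is exactly 0.0 or 1.0; NaN labels fail isin()
known = df[df["WL_HOME"].isin([0.0, 1.0])]
x = known[fields].to_numpy(dtype=float)                      # (samples, features)
y = known["WL_HOME"].to_numpy().astype(int).reshape(-1, 1)  # (samples, 1)
print(x.shape, y.ravel().tolist())  # (3, 2) [1, 0, 1]

# Feature NaNs are untouched by the label filter (game 103 still has one)
print(np.isnan(x).any())  # True
clean = known.dropna(subset=fields)
print(len(clean))  # 2
```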

---

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from datetime import datetime
from keras.models import Sequential
from keras.layers import Dense, Dropout
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Read in games.csv
df = pd.read_csv('games.csv')

# Feature fields: drop the non-numerical team abbreviations and the WL_HOME label itself
fields = list(df.columns)
fields.remove('TEAM_ABBREVIATION_AWAY')
fields.remove('TEAM_ABBREVIATION_HOME')
fields.remove('WL_HOME')

# Create label column (mirrors WL_HOME)
df['label'] = df['WL_HOME']

# Keep only games with a recorded result (WL_HOME is 0.0 or 1.0); rows with a NaN label are dropped
known = df[df['WL_HOME'].isin([0.0, 1.0])]

# Feature matrix x[samples, features]; column 0 holds GAME_ID, used later to link results back
x = known[fields].to_numpy(dtype=float)
# Binary integer labels as a column vector, y[samples, 1]
y = known['label'].to_numpy().astype(int).reshape(-1, 1)

# Take 70% of the data for training and 30% for validation (shuffled, reproducible split)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42, shuffle=True)

# Pull out the GAME_IDs (column 0) so results can be linked back to team names;
# shuffling changes each game's row index, so the IDs must travel with their rows
train_ids = X_train[:, 0]
test_ids = X_test[:, 0]

# Remove GAME_IDs from the features (an identifier, not useful for training)
X_train = X_train[:, 1:]
X_test = X_test[:, 1:]

# Standardize each feature to zero mean and unit variance (values are NOT bounded to [-1, 1]).
# Fit the scaler on the training set only, so no test-set statistics leak into training.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

---
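Two details of the data cell can be checked in isolation. `train_test_split` shuffles every array passed to it with the same permutation, so game IDs could also be passed as a separate array instead of riding along as column 0 of the features. And `StandardScaler` computes z = (x - mean) / std per column, so standardized values are not confined to [-1, 1]. This is a sketch on made-up data; `game_ids`, `features` and `labels` are hypothetical stand-ins for the real arrays.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins: 10 games, 2 numeric features, binary labels
game_ids = np.arange(1000, 1010)
features = np.column_stack([np.arange(10.0), np.arange(10.0) ** 2])
labels = (np.arange(10) % 2).reshape(-1, 1)

# One shared shuffle is applied to all three arrays, so rows stay aligned
X_tr, X_te, y_tr, y_te, id_tr, id_te = train_test_split(
    features, labels, game_ids, test_size=0.3, random_state=42, shuffle=True
)
print(len(id_tr), len(id_te))  # 7 3
print(np.array_equal(features[id_tr - 1000], X_tr))  # True: each ID still matches its row

# Fit scaling statistics on the training rows only, then reuse them for the test rows
scaler = StandardScaler().fit(X_tr)
Z_tr, Z_te = scaler.transform(X_tr), scaler.transform(X_te)
manual = (X_tr - X_tr.mean(axis=0)) / X_tr.std(axis=0)  # population std (ddof=0)
print(np.allclose(Z_tr, manual))  # True

# Standardization is not a [-1, 1] rescaling: an outlier lands well outside it
outlier = np.array([[0.0], [0.0], [0.0], [0.0], [10.0]])
print(StandardScaler().fit_transform(outlier).ravel().tolist())  # [-0.5, -0.5, -0.5, -0.5, 2.0]
```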

**Model Block:** Build and compile the model.

**Current implementation:** Input -> Dense(12, ReLU) -> Dense(8, ReLU) -> Dropout(0.2) -> Dense(1, sigmoid) output. Uses the Adam optimizer with binary cross-entropy loss.
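The size of this network can be worked out by hand: each Dense layer holds n_in × n_out weights plus n_out biases, and Dropout adds no trainable parameters. A minimal sketch, assuming the 19 input features given by `input_dim=19` in the model cell; `model.summary()` on the built model should report the same per-layer counts.

```python
def dense_params(n_in: int, n_out: int) -> int:
    # One weight per input-output pair plus one bias per output unit
    return n_in * n_out + n_out

# Layer widths from the model cell: 19 inputs -> 12 -> 8 -> 1.
# Dropout has no trainable parameters, so it does not appear here.
widths = [19, 12, 8, 1]
per_layer = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
print(per_layer, sum(per_layer))  # [240, 104, 9] 353
```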

---

In [None]:
# Create the sequential fully-connected model
model = Sequential()
model.add(Dense(12, input_dim=19, activation='relu'))  # input_dim must equal the number of feature columns
model.add(Dense(8, activation='relu'))
model.add(Dropout(0.2))  # Randomly zero 20% of activations during training to reduce overfitting
model.add(Dense(1, activation='sigmoid'))  # Sigmoid maps the output to a probability in (0, 1) that the home team won

# Compile the model using Adam and Binary Cross Entropy Loss
model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=0.001),
    loss=tf.losses.BinaryCrossentropy(),
    metrics=['accuracy'],
)

---
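The output layer and the loss in the model cell are chosen together: the sigmoid squashes the final unit into a probability p in (0, 1), and binary cross-entropy averages -[y·log(p) + (1 - y)·log(1 - p)] over the samples. Below is a NumPy sketch of that formula, not Keras's own implementation, which handles numerical edge cases in its own way; the `eps` clipping value here is an assumption of this sketch.

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued logit to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-7):
    # Mean over samples of -[y*log(p) + (1 - y)*log(1 - p)];
    # clipping keeps log() finite when p is exactly 0 or 1
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

y = np.array([1.0, 0.0])
# A logit of 0 gives p = 0.5: a coin flip costs ln 2 per sample
print(binary_cross_entropy(y, sigmoid(np.array([0.0, 0.0]))))  # ~0.6931 (ln 2)
# Confident, correct predictions cost much less
print(binary_cross_entropy(y, sigmoid(np.array([4.0, -4.0]))))
```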

**Model Fit Block:** Train the model on the engineered data and log its metrics for TensorBoard.

**Current implementation:** Create a timestamped log directory for the run, then fit the model.
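Because the fit cell uses `batch_size=4`, each epoch performs many small weight updates. For NumPy inputs Keras walks the training set in batches and also processes a final partial batch, so the step count shown for each epoch is ceil(n_train / batch_size). A small sketch; the training-set size below is hypothetical (the real value is `len(X_train)`).

```python
import math

def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    # The final partial batch is still a step, hence ceil rather than floor
    return math.ceil(n_samples / batch_size)

# Hypothetical training-set size; substitute len(X_train)
n_train = 10_001
print(steps_per_epoch(n_train, 4))   # 2501
print(steps_per_epoch(n_train, 64))  # 157
```

Larger batches mean fewer, less noisy updates per epoch; `batch_size=4` trades speed for more frequent updates.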

---

In [4]:
# Create a timestamped log directory so each run is logged separately for TensorBoard
log_dir = "logs/fit/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tb_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

# Fit the model, remembering to add callbacks for TensorBoard
model.fit(
    X_train,
    y_train,
    validation_data=(X_test, y_test),
    epochs=10,
    batch_size=4,
    callbacks=[tb_callback]
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f9b194a2850>
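After training, `model.predict(X_test)` returns one sigmoid probability per game, in the same row order as `test_ids`. Thresholding at 0.5 turns these into win/loss predictions; for a single sigmoid output trained with binary cross-entropy, this is what Keras's `'accuracy'` metric measures. The sketch below uses made-up probabilities in place of real model output, and the confusion table follows the layout of `sklearn.metrics.confusion_matrix` (rows are actual classes, columns are predicted classes).

```python
import numpy as np

# Hypothetical sigmoid outputs standing in for model.predict(X_test)
probs = np.array([[0.91], [0.12], [0.55], [0.49], [0.80]])
y_true = np.array([[1], [0], [0], [1], [1]])

# A probability above 0.5 is read as class 1
y_pred = (probs > 0.5).astype(int)

accuracy = float((y_pred == y_true).mean())
# confusion[t][p] counts games with actual class t and predicted class p
confusion = np.array(
    [[np.sum((y_true == t) & (y_pred == p)) for p in (0, 1)] for t in (0, 1)]
)
print(accuracy)   # 0.6
print(confusion)  # [[1 1]
                  #  [1 2]]
```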

---

**References:**  
*[Building our first neural network in keras](https://towardsdatascience.com/building-our-first-neural-network-in-keras-bdc8abbc17f5)*  
*[Model training APIs](https://keras.io/api/models/model_training_apis/)*  
*[Probabilistic losses](https://keras.io/api/losses/probabilistic_losses/)*  
*[Training and evaluation with the built-in methods](https://www.tensorflow.org/guide/keras/train_and_evaluate#other_input_formats_supported)*  
*[sklearn.model_selection.train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)*  
*[sklearn.metrics.confusion_matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)*  
*[Markdown Cheat Sheet](https://www.markdownguide.org/cheat-sheet/)*  
*[Getting started with TensorBoard](https://www.tensorflow.org/tensorboard/get_started)*  
*[sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)*  

---