## Comparing a Neural Network and Logistic Regression Approach to Predicting Game Outcomes

The idea below is simple: use two variables, (1) the home team's adjusted net margin for the season and (2) the away team's adjusted net margin, to predict the probability that the home team will win.
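
As a rough illustration of that setup, here is a minimal sketch of a logistic regression on the two margins. The data are simulated, and the column names (`home_margin`, `away_margin`, `home_win`) and coefficients are my own assumptions, not values from the project data.

```r
# Simulated stand-in for the real game data (all values are illustrative)
set.seed(1)
n <- 2000
home_margin <- rnorm(n, mean = 0, sd = 4)  # home team's adjusted net margin
away_margin <- rnorm(n, mean = 0, sd = 4)  # away team's adjusted net margin

# Assumed data process: a home edge plus the difference in margins
true_logit <- 0.4 + 0.15 * (home_margin - away_margin)
home_win   <- rbinom(n, size = 1, prob = plogis(true_logit))
games      <- data.frame(home_win, home_margin, away_margin)

# Logistic regression that is linear in the two margins
fit <- glm(home_win ~ home_margin + away_margin, data = games, family = binomial)

# Predicted probability that a +5 home team beats a -2 away team
new_game <- data.frame(home_margin = 5, away_margin = -2)
p_home   <- predict(fit, newdata = new_game, type = "response")
print(coef(fit))
print(p_home)
```

The fitted coefficient on the home margin should come out positive and the one on the away margin negative, which is the intuition the model relies on.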

The main variables are each team's average margin of victory over the season, adjusted for strength of opponents. This is not as straightforward as it sounds: early in the season there is little or no game data (e.g., Game 1), so for those early games I used an empirical Bayes approach to create a best-guess prior for each team's performance, and slowly update it as the current season accumulates data. By Game 20, only the current season's data is used (a sort of smooth updating process).
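
The smooth hand-off from prior to current-season data can be sketched as a weighted average. This is only an illustration: `blend_margin` is a hypothetical helper, and the linear weight that reaches 1 at 20 games is my assumption about the shape of the update, not the exact empirical Bayes formula used in the project.

```r
# Hypothetical helper: blend a prior estimate with the current-season estimate.
# The linear weight and 20-game cutoff are assumptions for illustration only.
blend_margin <- function(prior_margin, season_margin, games_played,
                         full_weight_at = 20) {
  w <- pmin(games_played / full_weight_at, 1)  # weight on current-season data
  # With no games played, the season estimate is undefined: fall back to the prior
  season_margin <- ifelse(games_played == 0, prior_margin, season_margin)
  (1 - w) * prior_margin + w * season_margin
}

blend_margin(prior_margin = 3, season_margin = NA, games_played = 0)   # prior only
blend_margin(prior_margin = 3, season_margin = -1, games_played = 10)  # equal mix
blend_margin(prior_margin = 3, season_margin = -1, games_played = 25)  # season only
```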

Then, I manually constructed a neural network with one layer that tries to best classify victories, and compared its performance to a simple logistic regression. ```NBAData_NeuralNet.R``` is thorough and well-commented, so hopefully every step is clear: I go through each step of the calculation (as custom functions), from instantiating parameters, to forward propagation, gradient computation, and backpropagation to update the weights, iterating until there is some convergence or a maximum number of iterations is reached. Then I run code to compare the in-sample and out-of-sample performance of each model.
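
For readers who want the shape of those steps without opening the script, here is a compact sketch on simulated data. It is not the code in `NBAData_NeuralNet.R`: the function names, the tanh hidden activation, the learning rate, and the stopping tolerance are all assumptions of mine.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Step 1: instantiate parameters (small random weights, zero biases)
init_params <- function(n_in, n_hidden) {
  list(W1 = matrix(rnorm(n_in * n_hidden, sd = 0.1), n_in, n_hidden),
       b1 = rep(0, n_hidden),
       W2 = matrix(rnorm(n_hidden, sd = 0.1), n_hidden, 1),
       b2 = 0)
}

# Step 2: forward propagation (X is n x n_in)
forward <- function(X, p) {
  A1 <- tanh(sweep(X %*% p$W1, 2, p$b1, "+"))  # hidden layer, n x n_hidden
  A2 <- sigmoid(A1 %*% p$W2 + p$b2)            # predicted probability, n x 1
  list(A1 = A1, A2 = A2)
}

# Binary cross-entropy loss, clipped to avoid log(0)
bce_loss <- function(y, p_hat, eps = 1e-12) {
  p_hat <- pmin(pmax(p_hat, eps), 1 - eps)
  -mean(y * log(p_hat) + (1 - y) * log(1 - p_hat))
}

# Step 3: backpropagation (gradients of the mean loss)
backward <- function(X, y, p, cache) {
  dZ2 <- (cache$A2 - y) / nrow(X)                # n x 1
  dZ1 <- (dZ2 %*% t(p$W2)) * (1 - cache$A1^2)    # n x n_hidden (tanh derivative)
  list(W1 = t(X) %*% dZ1, b1 = colSums(dZ1),
       W2 = t(cache$A1) %*% dZ2, b2 = sum(dZ2))
}

# Step 4: iterate until the loss stops changing or max_iter is reached
train_nn <- function(X, y, n_hidden = 4, lr = 0.5, max_iter = 1000, tol = 1e-8) {
  p <- init_params(ncol(X), n_hidden)
  loss_old <- Inf
  for (i in seq_len(max_iter)) {
    cache <- forward(X, p)
    loss  <- bce_loss(y, cache$A2)
    if (abs(loss_old - loss) < tol) break
    g <- backward(X, y, p, cache)
    for (nm in names(p)) p[[nm]] <- p[[nm]] - lr * g[[nm]]  # gradient step
    loss_old <- loss
  }
  list(params = p, loss = loss, iterations = i)
}

# Simulated example: two features, outcome driven by their difference
set.seed(42)
X <- matrix(rnorm(400), ncol = 2)
y <- rbinom(200, 1, sigmoid(X[, 1] - X[, 2]))
fit <- train_nn(X, y)
fit$loss
```

This sketch uses full-batch gradient descent, so every iteration touches all rows of `X`.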

I run the entire process below. Hopefully the outputs are clear.

In [1]:
#INPUT: folder where git repository lives on your computer
setwd("/Users/To/GitHub/NBADataProject/R/")
suppressWarnings({source("NBAData_NeuralNet.R")}) #gets data from public dropbox link with my data

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.6     ✔ purrr   0.3.4
✔ tibble  3.1.7     ✔ dplyr   1.0.9
✔ tidyr   1.2.0     ✔ stringr 1.4.0
✔ readr   2.1.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



STEP 1: Training Neur Net Model 

Iteration 100  | loss:  0.6106747  | converge:  3.768604e-06 
Iteration 200  | loss:  0.6104923  | converge:  8.706614e-07 
Iteration 300  | loss:  0.6116609  | converge:  0.0001735078 
Iteration 400  | loss:  0.611166  | converge:  7.155492e-06 
Iteration 500  | loss:  0.6110971  | converge:  7.774001e-06 
Iteration 600  | loss:  0.6110845  | converge:  7.71363e-06 
Iteration 700  | loss:  0.6110771  | converge:  6.994936e-06 
Iteration 800  | loss:  0.6110714  | converge:  6.237015e-06 
Iteration 900  | loss:  0.611067  | converge:  5.494937e-06 
Iteration 1000  | loss:  0.6110639  | converge:  4.75953e-06 
Iteration finished at  1000  | loss:  0.6110639  no converge 

STEP 2a: Confusion Matrix for NN (out of sample test) 

      y_predclass
y_test    0    1
     0   81  795
     1   31 1271



STEP 2b: Confusion Matrix for Logistic Reg (out of sample test) 

      logistic_predclass
y_test    0    1
     0  403  473
     1  260 1042


STEP 3: Calcul
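
The two confusion matrices printed above can be summarized with a few lines of R. This sketch simply re-enters the printed counts by hand; the helper names are mine, and I am assuming class 1 means a home win.

```r
# Counts copied from the Step 2a and 2b output (rows = actual, columns = predicted)
nn_cm <- matrix(c(81, 795, 31, 1271), nrow = 2, byrow = TRUE,
                dimnames = list(y_test = c("0", "1"), pred = c("0", "1")))
logit_cm <- matrix(c(403, 473, 260, 1042), nrow = 2, byrow = TRUE,
                   dimnames = list(y_test = c("0", "1"), pred = c("0", "1")))

# Share of test games classified correctly
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

# Share of actual class-0 games (assumed home losses) that were predicted as 0
specificity <- function(cm) cm["0", "0"] / sum(cm["0", ])

round(c(nn = accuracy(nn_cm), logistic = accuracy(logit_cm)), 3)
round(c(nn = specificity(nn_cm), logistic = specificity(logit_cm)), 3)
```

On these counts the network classifies about 62% of the 2,178 test games correctly against about 66% for the logistic regression, and it predicts class 1 for 2,066 of them, so it catches very few of the class-0 games.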

### Discussion
This shows something I like to call the "kind-of law of small/medium samples": when you have a strong intuition for the data and for which features will be important (in my view, a logistic regression linear in the average net margins is a pretty close approximation to the "true" data process), and the sample is not very big (~20,000 rows), simple regressions tend to perform really well, while more advanced techniques can often run into problems. And, of course, you get the interpretability and efficiency of a simple regression without the opaqueness of more advanced models.

Of course, it is a matter of execution. In my example, I intentionally created a simple neural network (1 layer with 4 neurons) to show how quickly and easily these sorts of models over-fit the training data: they perform much better in-sample, but much worse out-of-sample. In this case I could literally replicate the logistic regression in some form inside the neural network (in other words, just minimize the same loss function using gradient descent), but have chosen not to. You have to be very careful about the number of weight parameters and the activation functions used, as well as the loss function (in this case regularization would go a long way).
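
To make the regularization remark concrete, one common option is an L2 (ridge) penalty on the weights. The sketch below is an assumption about how that could be bolted on, not something the script does: `l2_penalty`, `penalized_grad`, and the value of `lambda` are all hypothetical.

```r
# Hypothetical helpers; `lambda` is an illustrative tuning parameter
l2_penalty <- function(weights, lambda) {
  (lambda / 2) * sum(unlist(weights)^2)  # squared weights summed over all layers
}

# The penalty adds lambda * W to each weight gradient, pulling weights toward zero
penalized_grad <- function(grad_W, W, lambda) grad_W + lambda * W

# Toy weight matrices standing in for the two layers of a small network
W1 <- matrix(c(0.5, -1, 2, 0.1), nrow = 2)
W2 <- matrix(c(1, -1), nrow = 2)

base_loss  <- 0.61  # placeholder for an unpenalized cross-entropy value
total_loss <- base_loss + l2_penalty(list(W1, W2), lambda = 0.01)
total_loss
```

Biases are usually left out of the penalty, which is why only the weight matrices are passed in here.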

I could also have implemented a more efficient stochastic gradient descent, which would simply use random sub-samples of the data at each iteration rather than the full sample at once. I have chosen not to do so because it would be a trivial addition to the code.
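
For a sense of how small that change is, here is a sketch of mini-batch updates applied to a plain logistic model on simulated data. The `make_batches` helper, batch size, learning rate, and epoch count are illustrative assumptions.

```r
# Hypothetical helper: shuffle row indices and cut them into mini-batches
make_batches <- function(n, batch_size) {
  idx <- sample(n)                                  # new random order each epoch
  split(idx, ceiling(seq_along(idx) / batch_size))  # consecutive chunks
}

# Simulated data: intercept plus two features
set.seed(7)
X <- cbind(1, matrix(rnorm(2000), ncol = 2))
y <- rbinom(1000, 1, plogis(X %*% c(0.3, 1, -1)))

beta <- rep(0, ncol(X))
lr   <- 0.1

for (epoch in 1:20) {
  for (b in make_batches(nrow(X), batch_size = 64)) {
    Xb   <- X[b, , drop = FALSE]
    p    <- plogis(Xb %*% beta)
    grad <- t(Xb) %*% (p - y[b]) / length(b)  # gradient from this batch only
    beta <- beta - lr * as.vector(grad)
  }
}
beta
```

The only difference from full-batch descent is that each update sees `length(b)` rows instead of all of them.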

Steps 2 and 3 compare my simple neural network to the logistic regression. The binary cross-entropy on the out-of-sample test is lower for the logistic regression (.623 vs. .647 for the neural network).
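
For reference, binary cross-entropy is conventionally defined as follows, where $y_i$ is the observed outcome, $\hat{p}_i$ is the predicted win probability, and $n$ is the number of test games (the notation is mine, not taken from the script):

$$
\mathrm{BCE} = -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i \log \hat{p}_i + (1 - y_i)\log\left(1 - \hat{p}_i\right) \right]
$$

A model that always predicts $\hat{p}_i = 0.5$ scores $\log 2 \approx 0.693$, so both models beat a coin flip, and the logistic regression sits further from that baseline.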

### Takeaways
I have built much more complicated versions of each (with many more variables). The difficulty of using the neural network scales with the number and complexity of the features you choose to include, but both produce fairly accurate results once you have cross-validated successfully. You should view this as a "proof of concept" that I know what underlies the neural network process. More complicated code scales in a similar manner, and I usually use pre-existing packages in R and Python if the network I am fitting is relatively straightforward.