# Results

```python
import pandas as pd
```

In the modelling part, we fitted several models, using different methodologies, on different versions of the dataset.  
Here we present the results of these approaches and discuss the elements that should guide our choice when selecting a model.  

Previously, we discussed the different criteria that will guide our choice. While maximising accuracy is the main selection criterion, we have to stay attentive to the complexity of the models. According to Occam's razor, the simpler solution is preferable when accuracies are similar. In the same spirit, we aim at finding a good trade-off between accuracy and complexity: even if a more complex model is slightly more accurate, we might still prefer a simpler one, depending on how large the differences in accuracy and complexity are. We also compare the precision, recall and f1 scores averaged over all variants, as well as these metrics for the "blitz" variant, which is an important one.  
Regarding computation time, we put the emphasis on the training time (the time needed to fit the already tuned models on the full training set). The tuning time depends heavily on the grid used in the tuning part, which could in some cases be reduced (especially in future iterations, as we already have a sense of the parameters that appear to work well).  
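One way to make this accuracy/complexity trade-off explicit is a tolerance rule: keep every model whose accuracy is within a chosen margin of the best one, then pick the fastest to train among them. The sketch below is a hypothetical illustration only (the function `pick_model` and the 0.01 margin are our assumptions, not the procedure used in this analysis), applied to accuracy and training-time values copied from the tables in this section.

```python
def pick_model(results, tol=0.01):
    """Hypothetical Occam's-razor rule: among the models whose accuracy is
    within `tol` of the best accuracy, return the fastest one to train."""
    best_acc = max(acc for acc, _ in results.values())
    candidates = {name: v for name, v in results.items() if best_acc - v[0] <= tol}
    return min(candidates, key=lambda name: candidates[name][1])


# (accuracy, training time in seconds), copied from the result tables
RESULTS = {
    "SVM (full)": (0.881, 54.954),
    "NN (full)": (0.854, 8.049),
    "logistic (full)": (0.853, 10.718),
    "SVM_89 (PCA)": (0.876, 2.348),
    "logistic_89 (PCA)": (0.849, 3.762),
    "NN_5000": (0.854, 107.456),
}

print(pick_model(RESULTS))           # SVM_89 (PCA)
print(pick_model(RESULTS, tol=0.0))  # SVM (full)
```

With a margin of one accuracy point the rule favours the PCA-based SVM; with no margin it falls back to the most accurate model. The margin simply encodes how much accuracy one is willing to give up for speed.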

With these elements in mind, let's look at our results once again.

```python
# Load the result tables saved during the modelling part
results_full = pd.read_pickle('results/results_full')
results_eng = pd.read_pickle('results/results_eng')
results_pca = pd.read_pickle('results/results_pca')
results_NN_5000 = pd.read_pickle('results/results_NN_5000')
results_nested_0 = pd.read_pickle('results/results_nested_0')
results_nested_1 = pd.read_pickle('results/results_nested_1')
results_nested_2 = pd.read_pickle('results/results_nested_2')
results_nested_3 = pd.read_pickle('results/results_nested_3')
```

## Full dataset

```python
results_full
```

| Model | Accuracy | Tuning time (s) | Training time (s) | F1 (avg) | Precision (avg) | Recall (avg) | F1 (blitz) | Precision (blitz) | Recall (blitz) | Support (blitz) |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.881 | 2295.734 | 54.954 | 0.88 | 0.881 | 0.881 | 0.88 | 0.72 | 0.6 | 120 |
| NN | 0.854 | 1416.433 | 8.049 | 0.853 | 0.854 | 0.854 | 0.657 | 0.71 | 0.612 | 116 |
| logistic | 0.853 | 80.75 | 10.718 | 0.853 | 0.855 | 0.853 | 0.614 | 0.66 | 0.574 | 115 |
| random forest | 0.819 | 252.935 | 25.133 | 0.819 | 0.82 | 0.819 | 0.589 | 0.63 | 0.553 | 114 |
| decision tree | 0.742 | 7.227 | 0.28 | 0.739 | 0.737 | 0.742 | 0.532 | 0.54 | 0.524 | 103 |


Reminder, best model (SVM) parameters: `{'C': 17.78279410038923, 'gamma': 0.0031622776601683794, 'kernel': 'rbf'}`

## Engineered dataset

```python
results_eng
```

| Model | Accuracy | Tuning time (s) | Training time (s) | F1 (avg) | Precision (avg) | Recall (avg) | F1 (blitz) | Precision (blitz) | Recall (blitz) | Support (blitz) |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.83 | 468.885 | 14.845 | 0.826 | 0.832 | 0.83 | 0.826 | 0.74 | 0.493 | 150 |
| logistic | 0.804 | 48.848 | 9.33 | 0.802 | 0.804 | 0.804 | 0.595 | 0.69 | 0.523 | 132 |
| NN | 0.803 | 1203.451 | 150.76 | 0.708 | 0.714 | 0.708 | 0.472 | 0.51 | 0.44 | 116 |
| random forest | 0.797 | 88.561 | 11.305 | 0.795 | 0.796 | 0.797 | 0.566 | 0.64 | 0.508 | 126 |
| decision tree | 0.708 | 6.128 | 0.053 | 0.708 | 0.714 | 0.708 | 0.472 | 0.51 | 0.44 | 116 |


## PCA on full dataset

```python
results_pca
```

| Model | Accuracy | Tuning time (s) | Training time (s) | F1 (avg) | Precision (avg) | Recall (avg) | F1 (blitz) | Precision (blitz) | Recall (blitz) | Support (blitz) |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM_89 | 0.876 | 633.61 | 2.348 | 0.874 | 0.874 | 0.876 | 0.642 | 0.7 | 0.593 | 118 |
| logistic_89 | 0.849 | 82.249 | 3.762 | 0.849 | 0.851 | 0.849 | 0.613 | 0.65 | 0.58 | 112 |
| SVM_26 | 0.839 | 223.375 | 1.083 | 0.839 | 0.841 | 0.839 | 0.642 | 0.7 | 0.593 | 118 |
| random_forest | 0.826 | 1630.714 | 62.678 | 0.684 | 0.692 | 0.679 | 0.442 | 0.44 | 0.444 | 99 |
| random_forest_89 | 0.823 | 2118.101 | 184.197 | 0.826 | 0.831 | 0.823 | 0.612 | 0.63 | 0.594 | 106 |
| SVM | 0.686 | 144.454 | 1.669 | 0.694 | 0.71 | 0.686 | 0.485 | 0.56 | 0.427 | 131 |
| NN | 0.679 | 286.411 | 4.2 | 0.688 | 0.708 | 0.679 | 0.47 | 0.47 | 0.47 | 100 |
| decision_tree | 0.647 | 9.928 | 0.162 | 0.655 | 0.678 | 0.647 | 0.381 | 0.36 | 0.404 | 89 |
| logistic | 0.631 | 5.127 | 0.215 | 0.652 | 0.697 | 0.631 | 0.514 | 0.66 | 0.42 | 157 |


Reminder, best model parameters (SVM in the PCA pipeline): `{'model__C': 31.622776601683793, 'model__gamma': 0.0031622776601683794, 'model__kernel': 'rbf'}`
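The `model__` prefix follows scikit-learn's `<step>__<parameter>` convention for addressing a parameter of a step inside a `Pipeline` (here, presumably, a step named `model` placed after the PCA step). Stripping it makes the two best parameter sets directly comparable. The sketch below also shows that the selected values are exact powers of ten, which is consistent with a logarithmically spaced tuning grid; this is an inference, since the grid itself is not shown here, and the helper `strip_step_prefix` is hypothetical.

```python
import math

# Best parameters reported for the full dataset and for the PCA pipeline
best_full = {'C': 17.78279410038923, 'gamma': 0.0031622776601683794, 'kernel': 'rbf'}
best_pca = {'model__C': 31.622776601683793, 'model__gamma': 0.0031622776601683794,
            'model__kernel': 'rbf'}


def strip_step_prefix(params, step="model"):
    """Remove a Pipeline '<step>__' prefix so keys match a bare estimator's."""
    prefix = f"{step}__"
    return {(k[len(prefix):] if k.startswith(prefix) else k): v for k, v in params.items()}


best_pca_bare = strip_step_prefix(best_pca)
for name in ("C", "gamma"):
    # log10 exposes the exponent behind each selected value
    print(name, round(math.log10(best_full[name]), 2), round(math.log10(best_pca_bare[name]), 2))
```

Both searches settle on the same gamma (10^-2.5), while the PCA pipeline selects a slightly larger C (10^1.5 instead of 10^1.25).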

## Neural network on 5000 samples per variant using Keras

```python
results_NN_5000
```

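| Model | Accuracy | Tuning time (s) | Training time (s) | F1 (avg) | Precision (avg) | Recall (avg) | F1 (blitz) | Precision (blitz) | Recall (blitz) | Support (blitz) |
|---|---|---|---|---|---|---|---|---|---|---|
| NN_5000 | 0.854 | 3214.1 | 107.456 | 0.853 | 0.867 | 0.854 | 0.634 | 0.78 | 0.534 | 146 |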


## Nested models

```python
from IPython.display import display  # built in inside Jupyter; needed when run as a script

display("results_nested_0", results_nested_0,
        "results_nested_1", results_nested_1,
        "results_nested_2", results_nested_2,
        "results_nested_3", results_nested_3)
```

**results_nested_0**

| Model | Accuracy | Training time (s) | F1 (avg) | Precision (avg) | Recall (avg) | F1 (blitz) | Precision (blitz) | Recall (blitz) | Support (blitz) |
|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.733 | 69.633 | 0.734 | 0.743 | 0.733 | 0.468 | 0.52 | 0.426 | 122 |
| logistic | 0.728 | 69.167 | 0.726 | 0.728 | 0.728 | 0.459 | 0.5 | 0.424 | 118 |
| random forest | 0.714 | 71.404 | 0.714 | 0.72 | 0.714 | 0.714 | 0.53 | 0.408 | 130 |
| NN | 0.713 | 69.765 | 0.726 | 0.728 | 0.728 | 0.459 | 0.5 | 0.424 | 118 |


**results_nested_1**

| Model | Accuracy | Training time (s) | F1 (avg) | Precision (avg) | Recall (avg) | F1 (blitz) | Precision (blitz) | Recall (blitz) | Support (blitz) |
|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.791 | 9.414 | 0.786 | 0.798 | 0.791 | 0.559 | 0.73 | 0.453 | 161 |
| logistic | 0.786 | 9.152 | 0.781 | 0.789 | 0.786 | 0.552 | 0.69 | 0.46 | 150 |
| random forest | 0.758 | 23.757 | 0.755 | 0.761 | 0.758 | 0.755 | 0.6 | 0.432 | 139 |
| NN | 0.751 | 33.932 | 0.781 | 0.789 | 0.786 | 0.552 | 0.69 | 0.46 | 150 |


**results_nested_2**

| Model | Accuracy | Training time (s) | F1 (avg) | Precision (avg) | Recall (avg) | F1 (blitz) | Precision (blitz) | Recall (blitz) | Support (blitz) |
|---|---|---|---|---|---|---|---|---|---|
| random forest | 0.773 | 28.365 | 0.773 | 0.786 | 0.773 | 0.773 | 0.7 | 0.47 | 149 |
| SVM | 0.772 | 37.948 | 0.772 | 0.788 | 0.772 | 0.568 | 0.69 | 0.483 | 143 |
| logistic | 0.766 | 20.664 | 0.765 | 0.778 | 0.766 | 0.577 | 0.64 | 0.525 | 122 |
| NN | 0.749 | 26.272 | 0.765 | 0.778 | 0.766 | 0.577 | 0.64 | 0.525 | 122 |


**results_nested_3**

| Model | Accuracy | Training time (s) | F1 (avg) | Precision (avg) | Recall (avg) | F1 (blitz) | Precision (blitz) | Recall (blitz) | Support (blitz) |
|---|---|---|---|---|---|---|---|---|---|
| logistic | 0.7 | 15.691 | 0.7 | 0.705 | 0.7 | 0.384 | 0.38 | 0.388 | 98 |
| SVM | 0.699 | 16.201 | 0.7 | 0.709 | 0.699 | 0.374 | 0.37 | 0.378 | 98 |
| random forest | 0.682 | 18.725 | 0.682 | 0.683 | 0.682 | 0.682 | 0.37 | 0.366 | 101 |
| NN | 0.659 | 16.296 | 0.7 | 0.705 | 0.7 | 0.384 | 0.38 | 0.388 | 98 |
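Two identities make it possible to sanity-check these tables. The F1 score is the harmonic mean of precision and recall, F1 = 2PR/(P+R), and, assuming the `_avg` columns are support-weighted averages (as the matching accuracy and average recall in most rows suggest), the average recall must equal the accuracy, since it sums all true positives and divides by the number of test games. The sketch below applies both checks to a few rows copied from the tables; the 0.01 tolerance is our assumption, meant to absorb the two-decimal rounding of the blitz precision.

```python
# A few rows copied from the tables above:
# (table, model, accuracy, recall_avg, precision_blitz, recall_blitz, f1_blitz)
ROWS = [
    ("full", "SVM", 0.881, 0.881, 0.72, 0.6, 0.88),
    ("full", "NN", 0.854, 0.854, 0.71, 0.612, 0.657),
    ("full", "logistic", 0.853, 0.853, 0.66, 0.574, 0.614),
    ("eng", "SVM", 0.83, 0.83, 0.74, 0.493, 0.826),
    ("eng", "NN", 0.803, 0.708, 0.51, 0.44, 0.472),
    ("pca", "random_forest", 0.826, 0.679, 0.44, 0.444, 0.442),
    ("nested_0", "random forest", 0.714, 0.714, 0.53, 0.408, 0.714),
]


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def flag_inconsistencies(rows, tol=0.01):
    """Return (table, model, issues) for rows that break either identity."""
    flagged = []
    for table, model, acc, recall_avg, p, r, f1 in rows:
        issues = []
        # Assuming support-weighted averages, recall_avg = total TP / N = accuracy.
        if abs(recall_avg - acc) > tol:
            issues.append("recall_avg differs from accuracy")
        if abs(f1_from(p, r) - f1) > tol:
            issues.append("f1_blitz differs from 2PR/(P+R)")
        if issues:
            flagged.append((table, model, issues))
    return flagged


for table, model, issues in flag_inconsistencies(ROWS):
    print(f"{table:>9} | {model:<14} | {'; '.join(issues)}")
```

The rows that pass (for instance the neural network and logistic regression on the full dataset) show that the identities hold for correctly recorded entries, while the flagged ones include the SVM on the full dataset, whose reported blitz f1 of 0.88 is far from 2 × 0.72 × 0.6 / 1.32 ≈ 0.65.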


# Interpretation

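Regarding the different results, we have a clear winner in terms of accuracy: the **SVM model applied to the full dataset outperforms all the other models, with an accuracy of more than 88%**, and it also has the highest average f1 score (0.88). Its blitz f1 score is reported as 0.88 too, but this value is identical to its average f1 score and does not match its blitz precision (0.72) and recall (0.6): their harmonic mean is only about 0.65, on par with the neural network on the full dataset (0.657). The blitz f1 entry should therefore be recomputed before claiming an advantage on this variant.  

On blitz precision, its 0.72 is a bit below the 0.74 of the SVM model applied to the engineered dataset. But looking at the support of this last model, we see that blitz has been predicted in 150 cases out of 900, while the testing set contains only 100 blitz games, so at least 50 of these predictions are false positives. This is reflected in the value of 0.493 in the recall column (about 74 correct blitz predictions out of 150): since the support here counts predictions rather than actual blitz games, this column behaves as a precision, and its low value points to a high rate of false positives (blitz is over-predicted) rather than false negatives. Similar comments can be made for the neural network run on the bigger sample.  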
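The reading of the support discussed here can be reproduced on a small, hypothetical example. In scikit-learn's `classification_report(y_true, y_pred)`, the support is the number of true instances of each class; swapping the two arguments swaps precision and recall and makes the support count predictions instead. The sketch below re-implements the per-class formulas in plain Python (the helper `per_class_report` and the catch-all label `"other"` are hypothetical) on a 900-game test set with 100 blitz games, 150 blitz predictions and 74 correct ones.

```python
def per_class_report(y_true, y_pred, label):
    """Precision, recall and support for one class, with scikit-learn's
    definitions: support is the number of *true* instances of the class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    n_pred = sum(1 for p in y_pred if p == label)
    n_true = sum(1 for t in y_true if t == label)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    return round(precision, 3), round(recall, 3), n_true


# Hypothetical test set: 100 actual blitz games, 150 blitz predictions, 74 correct.
y_true = ["blitz"] * 100 + ["other"] * 800
y_pred = (["blitz"] * 74 + ["other"] * 26      # predictions for the 100 blitz games
          + ["blitz"] * 76 + ["other"] * 724)  # predictions for the 800 other games

print(per_class_report(y_true, y_pred, "blitz"))  # (0.493, 0.74, 100): correct order
print(per_class_report(y_pred, y_true, "blitz"))  # (0.74, 0.493, 150): swapped order
```

With the arguments swapped, the output matches the blitz row of the SVM on the engineered dataset (precision 0.74, recall 0.493, support 150), which suggests, without proving it, that the reports were computed with the true and predicted labels in reversed order.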

The **main issue of this model, however, is its complexity**. While we could reduce the tuning time by testing fewer parameter combinations, the **training time remains high**. Considering that we are working with "only" 8000 features in the training part, 55 s of training time is quite a lot. Depending on the available resources and on the sample size, which could reach billions of games, such a model might not be an option.  
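To put this training time in perspective, scikit-learn's documentation for `SVC` notes that its fit time scales at least quadratically with the number of samples. The back-of-the-envelope sketch below extrapolates the 55 s under that quadratic assumption, which is the optimistic end; it also assumes, hypothetically, that the 55 s were measured on about 8000 training games. The helper `extrapolate_fit_time` is ours, and real timings also depend on the solver, the kernel cache and the hardware.

```python
def extrapolate_fit_time(t0_s, n0, n, exponent=2.0):
    """Scale a measured fit time t0_s (on n0 samples) to n samples,
    assuming time grows as (n / n0) ** exponent."""
    return t0_s * (n / n0) ** exponent


# Hypothetical baseline: ~55 s measured on ~8000 training games.
T0_S, N0 = 55.0, 8000

for n in (80_000, 800_000, 8_000_000):
    hours = extrapolate_fit_time(T0_S, N0, n) / 3600  # quadratic: optimistic estimate
    print(f"{n:>9,} games: at least ~{hours:,.1f} h")
```

Even under this optimistic assumption, ten times more games already means well over an hour of training, which supports looking for a cheaper alternative.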

If we rather consider the trade-off between accuracy and complexity, the **SVM model applied to the full dataset after a PCA with 89 components seems to be an excellent choice**. Indeed, training it takes about 23 times less time than training the SVM without PCA (2.3 s versus 55 s), while the other metrics are only slightly lower (accuracy drops by about 0.5 percentage points, from 0.881 to 0.876).  
Therefore, depending on whether the user gives more importance to accuracy or to a model that scales up easily, **first applying a PCA with 89 components and then using the SVM model might be a very attractive option**.   
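The training-time ratio and the accuracy gap between the two SVM models can be recomputed directly from their table rows; this small sketch does so with values copied from the "Full dataset" and "PCA on full dataset" tables.

```python
# Values copied from the "Full dataset" and "PCA on full dataset" tables
svm_full = {"accuracy": 0.881, "training_s": 54.954}
svm_pca_89 = {"accuracy": 0.876, "training_s": 2.348}

speedup = svm_full["training_s"] / svm_pca_89["training_s"]
accuracy_gap_pp = (svm_full["accuracy"] - svm_pca_89["accuracy"]) * 100  # percentage points

print(f"training time ratio: {speedup:.1f}x")         # training time ratio: 23.4x
print(f"accuracy gap: {accuracy_gap_pp:.1f} points")  # accuracy gap: 0.5 points
```

The ratio comes out at about 23, and the accuracy gap at half a percentage point, i.e. a relative drop of roughly 0.6%.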
  
Regarding the other models, none performs as well as, or offers a trade-off close to, the two SVM models.

To sum up, two models clearly outperform the rest of the candidates and should be considered:  
- SVM on the full dataset: outperforms the other models in terms of accuracy and average f1 score, but is complex  
- SVM + PCA (89 components) on the full dataset: excellent candidate, with good accuracy and low training time  