In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

%matplotlib inline

import seaborn as sns

In [None]:
sns.set_style("dark")

In [None]:
%load_ext autoreload
%autoreload 2
import chess_utility as cu

In [None]:
games = pd.read_csv('games_new_vars.csv')

## Getting the plan straight

After going through the grand experience of the EDA, you may have forgotten what kind of modelling wonders we're trying to do. So here's the dealio again. 

We imagine we're sitting down to play a chess match. Since we're both pretty bad ass, we've already racked up some kind of rating. We'll say you've been practicing more so you're now touting that higher rating. What's the chance that you'll end up winning? What if you're playing as white instead of black? What if the game has been going on for like a bajillion moves now?

Using Python's groovy machine learning libraries, we'll answer these dire questions. 

We'll get cracking on predicting wins and losses first (binary case) for the higher rated player. 

## Cooking up binary models

#### Normalization, response setting, and other params

Before fitting models, we will normalize our data. This matters because we intend to use multiple predictors, and our predictors could be on different scales. We'll knight these predictors with a big 'X' for a name. 

In [None]:
from sklearn.preprocessing import normalize
X = normalize(games[ [ 'abs_diff_rating', 'turns', 'white_higher_rated' ] ])
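One thing worth knowing about the cell above: with its defaults, `sklearn.preprocessing.normalize` rescales each row (each game) to unit L2 length rather than putting each column on a common scale. The sketch below contrasts it with `StandardScaler`, which works column by column. The toy array is made up for illustration and only stands in for the three predictors; it is not taken from the data set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, normalize

# Made-up stand-ins for abs_diff_rating, turns, white_higher_rated.
toy = np.array([
    [300.0, 40.0, 1.0],
    [50.0, 90.0, 0.0],
    [120.0, 15.0, 1.0],
])

# Default normalize: norm='l2', axis=1, so every ROW ends up with length 1.
row_normalized = normalize(toy)
row_lengths = np.linalg.norm(row_normalized, axis=1)

# StandardScaler: every COLUMN ends up with mean 0 and unit variance.
col_standardized = StandardScaler().fit_transform(toy)
col_means = col_standardized.mean(axis=0)
col_stds = col_standardized.std(axis=0)

print(row_lengths)
print(col_means.round(6), col_stds.round(6))
```

If the goal is to put predictors on comparable scales, the column-wise version is the one that does so. Swapping it in would change the numbers reported below, so treat this as something to try rather than a drop-in fix.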

We set up a response and give it a wildly descriptive 'y'. 

In [None]:
y = games[ 'higher_rating_won' ]

Some of the algorithms below automatically randomize. So, if we're not careful, we could end up always getting different numbers for the model results. By setting a RANDOM_STATE, we can pick a result and stick with it throughout the analysis. Also, when we cross validate using K-fold cross validation, we will specify 10 folds through NUMBER_FOLDS. 

In [None]:
RANDOM_STATE = 1
NUMBER_FOLDS = 10

#### Logistic regression

We will begin with a Logistic Regression and perform a stratified 10-fold cross validation on the model. 

In [None]:
lg_confusion_matrix = cu.run_logistic_regression(NUMBER_FOLDS, X, y, RANDOM_STATE)
lg_results = cu.get_cm_results(lg_confusion_matrix)
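The internals of `cu.run_logistic_regression` aren't shown here, so the following is only a guess at the general shape of such a helper: stratified K-fold splitting, out-of-fold predictions, and a single pooled confusion matrix. The function name `cv_confusion_matrix` and the synthetic data are assumptions made for this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def cv_confusion_matrix(model, X, y, n_folds, random_state):
    """Pool out-of-fold predictions into one confusion matrix (hypothetical helper)."""
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=random_state)
    # Each row is predicted exactly once, by a model that never saw it in training.
    predictions = cross_val_predict(model, X, y, cv=folds)
    return confusion_matrix(y, predictions)


# Synthetic stand-in data (NOT the chess games): 200 rows, 3 predictors.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

cm = cv_confusion_matrix(LogisticRegression(), X_demo, y_demo, n_folds=10, random_state=1)
print(cm)  # rows are true classes, columns are predicted classes
```

Stratifying keeps the win/loss mix roughly the same in every fold, which matters when one class is more common than the other.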

#### Tree based methods

Next up, we fit a bag of trees to our data. In the case of fitting a bag of trees and a random forest, we will use 100 estimators. This means that we will be using 100 trees in each case to build the models. 

In [None]:
NUMBER_ESTIMATORS = 100

In [None]:
bag_confusion_matrix = cu.run_bag_trees(NUMBER_FOLDS, NUMBER_ESTIMATORS, RANDOM_STATE, X, y)
bag_results = cu.get_cm_results(bag_confusion_matrix)

In [None]:
r_forest_confusion_matrix = cu.run_random_forest(NUMBER_FOLDS, NUMBER_ESTIMATORS, RANDOM_STATE, X, y)
r_forest_results = cu.get_cm_results(r_forest_confusion_matrix)
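For the curious, here is a rough sketch of how a bag of trees and a random forest differ when built with scikit-learn directly. It is not the code inside `cu`, which isn't shown; the synthetic data and the `models` dictionary are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the chess games).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 3))
y_demo = (X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

models = {
    # Bagging: each tree sees a bootstrap sample and may split on any predictor.
    'Bag of Trees': BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=1),
    # Random forest: bootstrap samples too, but each split only considers
    # a random subset of the predictors.
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=1),
}

folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
mean_accuracy = {
    name: cross_val_score(model, X_demo, y_demo, cv=folds).mean()
    for name, model in models.items()
}
print(mean_accuracy)
```

With only three predictors, the random subset at each split is tiny, so it would not be surprising for the two ensembles to land close together.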

#### Results readings

How did our beautiful models do? We can plot a confusion matrix for each model and gaze in wonder. 

In [None]:
cu.create_cm_plot( 'Logistic Regression', lg_confusion_matrix )

In [None]:
cu.create_cm_plot( 'Bag of Trees', bag_confusion_matrix )

In [None]:
cu.create_cm_plot( 'Random Forest', r_forest_confusion_matrix )

In case you're rusty on the magic of confusion matrices, recall that confusion matrices help us determine how well the model performed by comparing our guesses to the truth and nothing but the truth. After admiring the excellent shades of blue, you will probably notice the bottom right hand square in the matrices above. This square denotes an accurate prediction for wins, and it looks like the models do very well when predicting wins. 

A whole tribe of metrics based on a confusion matrix exist. We'll acquire these then plot these to determine more specific model performance.

In [None]:
accuracy, precision, recall, fmeasure, specificity, negative_pv = cu.group_important_results(lg_results, bag_results, r_forest_results)

In [None]:
cu.create_specific_results_plot(r_forest_results, lg_results, bag_results, 3, 2) # 3 cols, 2 rows

We've got six different metrics to chew on here so put on your seat belt. 

- First, is accuracy. This is a general metric. It tells us how often the model correctly predicts the value. It's not hard to see that all models seem to hover at around 62% accuracy. In other words, the model makes a good prediction (good boy!) 62% of the time. 

- Second, is precision. This tells us how often the model is right when it predicts a win. Here, the Logistic Regression pulls ahead at 0.794 and says cheers to the other two models.  

- Third, is recall. This tells us what share of the actual wins our models caught. Our models seem to hover at around 68%. 

- Fourth, is the famous fabulous fmeasure. This combines precision and recall into one spiffy measurement. In our case, it looks like most models hover around 70%. 

- Fifth, is specificity. The specifics on specificity are simple. The measure lets us know stuff about losses. In particular, out of all the true losses, what percentage did our models catch? And, goodness gracious the Logistic Regression just sags here at only 37% compared to the others at around 46%. 

- Finally, we got negative pv. This is short for negative predictive value, and this tells us how often we were right out of the times we predicted a loss. Most of the models do equally pitifully at around 52%.  
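If formulas stick better than words, here is a small sketch of how these six metrics fall out of a 2x2 confusion matrix. It assumes the layout `[[TN, FP], [FN, TP]]` (rows are the truth, columns are the prediction, a win is the positive class), which matches the bottom right square being the correctly predicted wins. The function `confusion_metrics` and the counts are made up for illustration; what `cu.get_cm_results` does internally isn't shown.

```python
def confusion_metrics(cm):
    """Six metrics from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)   # of predicted wins, share that were wins
    recall = tp / (tp + fn)      # of actual wins, share we caught
    return {
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
        'precision': precision,
        'recall': recall,
        'fmeasure': 2 * precision * recall / (precision + recall),
        'specificity': tn / (tn + fp),   # of actual losses, share we caught
        'negative_pv': tn / (tn + fn),   # of predicted losses, share that were losses
    }


# Made-up counts, purely illustrative (not the notebook's results).
toy_cm = [[40, 60], [30, 70]]
metrics = confusion_metrics(toy_cm)
for name, value in metrics.items():
    print(f'{name}: {value:.3f}')
```

Notice that precision and negative pv divide by what the model predicted, while recall and specificity divide by what actually happened.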

##### Conclusion
So what can we say? These models perform similarly across most measurements. The models do splendidly when they predict wins (Logistic Regression in particular), but they trip awkwardly when predicting losses (Logistic Regression in particular). 

We could also compare with the null rate. This measures the accuracy if we just predicted every response as the dominant class. In other words, this is the accuracy of a model that simply predicts a win no matter what. 

In [None]:
null_rate = y.mean()    # share of games the higher rated player won, i.e. the accuracy of always predicting a win
null_rate

Hold this number up against the roughly 62% accuracy from above: the models only earn their keep if they beat it. 

For those pursuing extra credit, we also show plots for each model packaged with its confusion matrix measurements.

In [None]:
cu.create_cumulative_results_plot(r_forest_results, lg_results, bag_results)

They all look pretty similar. 

## Beyond binary 

In this section, we go to town. We refit our tree based models with 'result' as our response. Recall this encoded whether games were a loss, draw, or win for the higher rated player. We eschew trying a three class classification with a logistic regression 'cuz it isn't a popular choice. 

In [None]:
y = games['result']

#### Bag those Trees

We kick it off again with a bag of trees. Here, we also run stratified 10-fold cross validation.  

In [None]:
bag_multi_confusion_matrix = cu.run_bag_trees(NUMBER_FOLDS, NUMBER_ESTIMATORS, RANDOM_STATE, X, y)

We also fit the random forest in exactly the same way. 

In [None]:
r_forest_multi_confusion_matrix = cu.run_random_forest(NUMBER_FOLDS, NUMBER_ESTIMATORS, RANDOM_STATE, X, y)

We can start by comparing the accuracy of our two models. 

In [None]:
r_forest_accuracy = cu.get_accuracy_three_class( r_forest_multi_confusion_matrix )
bag_accuracy = cu.get_accuracy_three_class( bag_multi_confusion_matrix )
accuracies = {'Random Forest': r_forest_accuracy, 'Bag of Trees': bag_accuracy}

In [None]:
fig, axs = plt.subplots(figsize=[5, 5], gridspec_kw={'wspace': 0.2})
cu.create_bar_results(accuracies, 'Accuracy', axs)
plt.ylim([0, 1])

Our hearts say 'bummer'. Both models classify correctly about 60% of the time. We, however, don't immediately lose hope since maybe more specific metrics will uplift us. 

In a three class confusion matrix, we usually consider precision and recall for each class. So, we'll join the herd and do so as well. 
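Per-class precision and recall come straight from the column and row totals of the matrix. The sketch below assumes rows are the true classes and columns the predicted ones, ordered loss, draw, win. The helper `per_class_precision_recall` and the counts are made up for illustration and are not what `cu.make_plot_multi_label` necessarily does.

```python
def per_class_precision_recall(cm, label):
    """Precision and recall for one class of a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    true_positive = cm[label][label]
    predicted_as_label = sum(row[label] for row in cm)  # column total
    actually_label = sum(cm[label])                     # row total
    return true_positive / predicted_as_label, true_positive / actually_label


# Made-up 3x3 counts ordered loss, draw, win (illustrative only).
toy_cm = [
    [50, 10, 40],
    [5, 2, 13],
    [45, 8, 147],
]
for label, name in enumerate(['loss', 'draw', 'win']):
    precision, recall = per_class_precision_recall(toy_cm, label)
    print(f'{name}: precision={precision:.2f}, recall={recall:.2f}')
```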

##### Loss

In [None]:
cu.make_plot_multi_label( bag_multi_confusion_matrix, r_forest_multi_confusion_matrix, 0 )    # 0 for loss

Uncannily, the precision and recall are quite similar for the bag and forest. Both metrics here look at losses. So, when we predict a loss, we are right around 43% of the time (precision). And out of all the losses we should've predicted, we only got 48% (recall). 

##### Draw

In [None]:
cu.make_plot_multi_label( bag_multi_confusion_matrix, r_forest_multi_confusion_matrix, 1 )    # 1 for draw

Egad. Our models sink to new lows when trying to predict a draw between the two players. When we predict a draw, we are right about 13% of the time (precision). Also, out of all the draws we should've predicted, we only got 20% (recall). 

##### Winning

In [None]:
cu.make_plot_multi_label( bag_multi_confusion_matrix, r_forest_multi_confusion_matrix, 2 )    # 2 for win

Applauds wildly. Our models rise to new heights when predicting wins. When we predict a win, we are right around 74% of the time (precision). But out of all the wins we should've predicted, we only properly predicted 68% (recall). 

##### Conclusion

So, we can say our model wins at predicting wins, loses at predicting losses, and jumps off a cliff when trying to predict draws. 

## Looking back fondly

This wraps up our analysis. Although the models do well when predicting wins, further work will need to be done if we want them to win at predicting losses and draws. Also note how the bar graphs gave off orange and blue, just like my alma mater completely by accident. Or was it an accident?

## Now what

A recent one person survey said that 100% of me was thankful for your taking the time to peruse this document. 

You may be wondering what to do now. I suggest watching dance videos on YouTube. But say you wanted to see the next episode of this chess analysis, I would probably attempt to better the predictive models by: 

- Using more predictors (like the kinds of openings used). It's possible that other predictors could give more determining info about wins. 

- Getting my hands on more observations. This data set carried almost 20 000 records. What if we tried this with 200 000 or 1 000 000 000 records? That would be cool.   

- Trying new models. There's a whole zoo of models out there, and we only visited the logistic regression and trees. Perhaps an additive GLM. Or a neural network. 

- It's possible that the models do poorly with losses and draws because not many losses and draws are present in the data. Perhaps bootstrapping loss and draw data would help the models. (I'm just freestyling here). 
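To make the last idea a bit more concrete, here is a freestyle sketch of resampling every class with replacement until all classes are as large as the biggest one. The helper `oversample_to_balance` and the toy arrays are assumptions for illustration. In a real run this would be applied to the training folds only, otherwise copies of the same game could land in both training and test data.

```python
import numpy as np


def oversample_to_balance(X, y, random_state):
    """Resample every class with replacement up to the size of the largest class."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    # For each class, draw `target` row indices (with replacement) from that class.
    picked = [
        rng.choice(np.flatnonzero(y == c), size=target, replace=True)
        for c in classes
    ]
    indices = np.concatenate(picked)
    return X[indices], y[indices]


# Toy data: class 2 dominates, class 1 is rare.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 0, 0, 1, 2, 2, 2, 2, 2, 2])

X_balanced, y_balanced = oversample_to_balance(X_demo, y_demo, random_state=1)
balanced_classes, balanced_counts = np.unique(y_balanced, return_counts=True)
print(balanced_classes, balanced_counts)
```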

## Cheers

I'm currently my own agent so if you liked this drop me a line and let me know whassup. Wishing you a splendid day.