# Summary of the methodology and outcomes:

### Methodology

The idea is to tune the models with a nested strategy:  
1. Create "groups of variants", tune the models and predict the group of variants for each game of the testing set (the groups themselves are also "tuned" iteratively by testing different combinations)  
2. Split the testing set into subsets based on this prediction  
3. Predict the variant within each subset and evaluate the accuracy  

The rationale behind this idea is to reduce the computational complexity by first using only the engineered features. As the variants are grouped together, we can expect the engineered features alone to predict these groups well.  
In a second step, we use all the features to predict the variant. As we are working with subsets of the data (one subset per group of variants), we can also expect the models to be less complex than when they have to separate all the variants at once.  

As we do not know which variants should be grouped together to maximise the likelihood of accurate predictions, we treat the groups of variants like hyperparameters.  
More precisely, we test a set of groups and change it iteratively according to the accuracy obtained.  
In this notebook, we present and comment on one option. Other iterations based on other groups are presented in the notebooks "2.3.X Capstone project - Nested models - Copy.X".  
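
The three steps above can be sketched in a few lines of plain Python. This is only an illustration of the routing logic: `nested_predict`, the toy rules and the sample dictionaries are hypothetical stand-ins, not the models fitted in this notebook.

```python
from collections import defaultdict

def nested_predict(group_model, variant_models, samples):
    """Route each sample to the variant model of its predicted group."""
    # Step 1: predict a group for every sample.
    groups = [group_model(x) for x in samples]
    # Step 2: bucket the sample positions by predicted group.
    buckets = defaultdict(list)
    for pos, grp in enumerate(groups):
        buckets[grp].append(pos)
    # Step 3: each group-specific model predicts the variant of its bucket.
    predictions = [None] * len(samples)
    for grp, positions in buckets.items():
        for pos in positions:
            predictions[pos] = variant_models[grp](samples[pos])
    return predictions

# Toy stand-ins for fitted models (hypothetical rules, for illustration only).
group_model = lambda x: "easy_grp" if x["nb_P"] in (0, 32) else "hard_grp"
variant_models = {
    "easy_grp": lambda x: "horde" if x["nb_P"] == 32 else "racingKings",
    "hard_grp": lambda x: "threeCheck" if x["nb_checks"] > 0 else "blitz",
}
samples = [
    {"nb_P": 32, "nb_checks": 0},
    {"nb_P": 0, "nb_checks": 0},
    {"nb_P": 7, "nb_checks": 1},
]
predictions = nested_predict(group_model, variant_models, samples)
print(predictions)
```

A group-specific model only knows the variants of its own group, so a sample routed to the wrong group in step 1 cannot be recovered in step 3.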

### Outcomes

The accuracy exceeds 90% when predicting the groups of variants created in the first step of the process (baseline = 44.4%). However, the models struggle to predict the variants within these groups, and the overall accuracy for the final variant (our objective) is clearly lower than when predicting the variant directly, without a nested strategy.  

---
# Nested Models

In [1]:
import pandas as pd
import numpy as np
import time
import matplotlib.pyplot as plt
import graphviz
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_validate
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

In [2]:
# loading both sets
full_df = pd.read_pickle('datasets/full_df')
df_feat_eng = pd.read_pickle('datasets/df_feat_eng')

We can now create the groups of variants for this first iteration. The idea is to split the variants by difficulty first: 

In [3]:
variants = ['antichess', 'atomic', 'blitz', 'chess960', 'crazyhouse', 'horde', 'kingOfTheHill', 'racingKings', 'threeCheck']
var_grps = ['easy_grp', 'middle_grp', 'hard_grp']

# creating the groups of variants
easy_grp = ['horde', 'racingKings']
middle_grp = ['chess960', 'antichess', 'atomic', 'crazyhouse']
hard_grp = [v for v in variants if v not in easy_grp + middle_grp]  # the remaining variants

As the largest group contains 4 of the 9 variants and every variant has the same number of games, the baseline for this first step is to always predict "middle_grp", which gives an accuracy of 4/9 = 44.4%. 
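
As a quick check of this baseline, the majority-class accuracy can be computed from the group sizes. The sketch below assumes 1,000 games per variant, which is what the group counts of this data set imply; the variable names are illustrative.

```python
from collections import Counter

# Assumption: a balanced data set with 1,000 games for each of the 9 variants.
games_per_variant = 1000
group_sizes = Counter({
    "easy_grp": 2 * games_per_variant,
    "middle_grp": 4 * games_per_variant,
    "hard_grp": 3 * games_per_variant,
})

# A majority-class baseline always predicts the largest group.
majority_group, majority_count = group_sizes.most_common(1)[0]
baseline_accuracy = majority_count / sum(group_sizes.values())
print(majority_group, round(100 * baseline_accuracy, 1))
```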

In [4]:
df_feat_eng.loc[df_feat_eng['variant'].isin(easy_grp), 'var_grp'] = 'easy_grp'
df_feat_eng.loc[df_feat_eng['variant'].isin(middle_grp), 'var_grp'] = 'middle_grp'
df_feat_eng.loc[df_feat_eng['variant'].isin(hard_grp), 'var_grp'] = 'hard_grp'

In [5]:
df_feat_eng['var_grp'].value_counts()

middle_grp    4000
hard_grp      3000
easy_grp      2000
Name: var_grp, dtype: int64

# Creating Train/Validation/Test set

In [6]:
X_tr, X_te, y_tr, y_te = train_test_split(
    df_feat_eng.drop(['var_grp'], axis=1),
    df_feat_eng['var_grp'],
    test_size=1/10,
    random_state=0,
    stratify=df_feat_eng['variant'])

# X_tr, X_va, y_tr, y_va = train_test_split(
#    X_tr, y_tr, test_size=1/4, random_state=1, stratify = y_tr)

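# cross validation strategy
# (no shuffling here: the rows were already shuffled by train_test_split,
# and random_state only applies to KFold when shuffle=True)
kfold = KFold(n_splits=5)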
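
Stratifying on `variant` rather than on `var_grp` keeps every individual variant at the same 10% share of the testing set. The following is a rough standard-library sketch of that idea with toy labels; `stratified_split` is a hypothetical helper and not scikit-learn's implementation.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) with the same test share for each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_label.values():
        rng.shuffle(indices)  # shuffle within each label before cutting
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Toy labels: 1,000 games for each of three variants.
labels = [v for v in ["horde", "blitz", "atomic"] for _ in range(1000)]
train_idx, test_idx = stratified_split(labels, test_fraction=0.1)
test_counts = Counter(labels[i] for i in test_idx)
print(test_counts)
```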

In [7]:
print("Shape X_tr:" , X_tr.shape, 
      "\nShape X_te:", X_te.shape,      
      "\nShape y_tr:" , y_tr.shape, 
      "\nShape y_te:" , y_te.shape)

Shape X_tr: (8100, 30) 
Shape X_te: (900, 30) 
Shape y_tr: (8100,) 
Shape y_te: (900,)


In [8]:
y_tr.value_counts()

middle_grp    3600
hard_grp      2700
easy_grp      1800
Name: var_grp, dtype: int64

In [9]:
y_te.value_counts()

middle_grp    400
hard_grp      300
easy_grp      200
Name: var_grp, dtype: int64

The dependent variable is the group of variants, not the variant. We therefore drop the variant from the features for the moment, but save it as we will need it in the second step.

In [10]:
tr_variant = X_tr['variant']
X_tr = X_tr.drop(['variant'],axis=1)

In [11]:
te_variant = X_te['variant']
X_te = X_te.drop(['variant'],axis=1)

In [12]:
X_tr.head()

Unnamed: 0,nb_moves,nb_K,nb_Q,nb_R,nb_B,nb_N,nb_P,nb_k,nb_q,nb_r,...,nb_pawns_ext_center,nb_pawns_third_row_c_f,nb_pawns_b_g_col,nb_pieces_center,nb_pieces_ext_center,nb_pieces_third_row_c_f,nb_pieces_b_g_col,knight_squares,king_castle_squares,nb_checks
1743,10,1,1,2,2,1,7,1,1,2,...,0,2,0,1,0,2,1,1,0,1
7772,10,0,0,0,0,0,32,1,1,2,...,4,3,6,0,0,0,0,1,0,0
8400,10,1,1,1,0,2,0,1,0,1,...,0,0,0,0,0,0,1,1,0,0
2760,10,1,1,2,1,2,7,1,1,2,...,0,3,2,0,2,1,0,1,0,0
7741,10,0,0,0,0,0,32,1,1,2,...,3,2,6,0,0,1,0,0,0,0


# Modelling

# Simple Decision Tree

In [13]:
# model 
model = DecisionTreeClassifier()

# grid search
grid = {
    'criterion': ['gini','entropy'], 
    'max_depth': [2,5,8,10,12,15,20,50,100,150]
}

# grid-search + CV
dt_cv = GridSearchCV(
    model, 
    grid, 
    cv=kfold,    
    verbose = 1,
    n_jobs = -1,
    return_train_score=True)

In [14]:
start_time = time.time()
dt_cv.fit(X_tr, y_tr)
dt_tune_time = time.time() - start_time

Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    2.0s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed:    2.9s finished


Next, we look at the parameters that lead to the best cross-validated accuracy.
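
The "20 candidates, totalling 100 fits" reported by the grid search can be reproduced by enumerating the grid. This is a sketch with `itertools.product`; for this grid the enumeration order also appears to match the row index shown in the results table (row 13 is `entropy` with `max_depth` 10), but that ordering should be read as an observation rather than a guarantee.

```python
from itertools import product

grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 5, 8, 10, 12, 15, 20, 50, 100, 150],
}
n_folds = 5

# One candidate per combination of parameter values.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
n_fits = len(candidates) * n_folds
print(len(candidates), n_fits)
print(candidates[13])
```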

In [15]:
# Collect results in a DataFrame
cv_results = pd.DataFrame(dt_cv.cv_results_)

# getting the columns of interest:
cols = ['param_' + name for name in grid] + ['mean_test_score', 'std_test_score']

# printing the results
display(cv_results[cols].sort_values('mean_test_score', ascending=False).head(10))
print("We can expect an accuracy close to {:.1f}% on the testing set with the following parameters: \n{}. \nThe tuning time is {} sec.".format(max(cv_results['mean_test_score']*100), dt_cv.best_params_, round(dt_tune_time, 1)))

Unnamed: 0,param_criterion,param_max_depth,mean_test_score,std_test_score
13,entropy,10,0.807531,0.001532
12,entropy,8,0.805679,0.002388
3,gini,10,0.805309,0.004427
14,entropy,12,0.804815,0.005156
15,entropy,15,0.804198,0.008249
4,gini,12,0.803827,0.00636
2,gini,8,0.803333,0.003064
5,gini,15,0.8,0.011275
19,entropy,150,0.79716,0.005557
16,entropy,20,0.795309,0.007516


We can expect an accuracy close to 80.8% on the testing set with the following parameters: 
{'criterion': 'entropy', 'max_depth': 10}. 
The tuning time is 3.0 sec.


In [16]:
dt = DecisionTreeClassifier(
    **dt_cv.best_params_, random_state=0)

In [17]:
start_time = time.time()
dt.fit(X_tr, y_tr)
dt_train_time = time.time() - start_time

In [18]:
dt_accuracy = dt.score(X_te, y_te)
print('Accuracy on Testing Set: accuracy: {:.1f}%.'.format(dt_accuracy*100))

Accuracy on Testing Set: accuracy: 84.9%.


In [19]:
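# confusion_matrix orders the classes alphabetically, so the labels are sorted as well
# rows: predicted group, columns: true group (the predictions are passed first)
dt_conf=pd.DataFrame(confusion_matrix(dt.predict(X_te), y_te), index=sorted(var_grps), columns=sorted(var_grps))
round(dt_conf / dt_conf.astype(float).sum(axis=1),3)

Unnamed: 0,easy_grp,hard_grp,middle_grp
easy_grp,0.995,0.0,0.003
hard_grp,0.0,0.731,0.281
middle_grp,0.0,0.109,0.886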
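
For reference, scikit-learn's `confusion_matrix` takes the true labels first and, unless `labels=` is given, orders the classes by sorted label value. Below is a minimal standard-library sketch of a confusion matrix with an explicit label order, normalised per true class. The helper names and the toy data are hypothetical.

```python
def confusion_counts(y_true, y_pred, labels):
    """Rows follow the true label, columns the predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts

def normalise_rows(counts):
    """Divide each row by its total, so every row sums to 1 (or stays 0)."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

labels = ["easy_grp", "middle_grp", "hard_grp"]  # explicit, unsorted order
y_true = ["easy_grp", "easy_grp", "middle_grp", "middle_grp", "middle_grp", "hard_grp", "hard_grp"]
y_pred = ["easy_grp", "easy_grp", "middle_grp", "hard_grp", "middle_grp", "hard_grp", "middle_grp"]

counts = confusion_counts(y_true, y_pred, labels)
rates = normalise_rows(counts)
for label, row in zip(labels, rates):
    print(label, [round(r, 3) for r in row])
```

With this layout the diagonal of `rates` is the per-class recall.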


In [20]:
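# the predictions are passed as the first argument (y_true), so the "precision" and "recall"
# columns are swapped with respect to the usual definitions and "support" counts the predicted groups
dt_metrics=pd.DataFrame(classification_report(dt.predict(X_te), y_te, output_dict=True)).T.round(3)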
dt_metrics

Unnamed: 0,f1-score,precision,recall,support
easy_grp,0.998,1.0,0.995,201.0
hard_grp,0.795,0.87,0.731,357.0
middle_grp,0.817,0.758,0.886,342.0
micro avg,0.849,0.849,0.849,900.0
macro avg,0.87,0.876,0.871,900.0
weighted avg,0.848,0.856,0.849,900.0
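
The f1-score column is the harmonic mean of the precision and recall columns. The sketch below recomputes it from the rounded values displayed in the table, so small differences in the third decimal are expected; `f1_from` is an illustrative helper.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, f1-score) as displayed in the decision tree report.
reported = {
    "easy_grp": (1.0, 0.995, 0.998),
    "hard_grp": (0.87, 0.731, 0.795),
    "middle_grp": (0.758, 0.886, 0.817),
}
recomputed = {name: f1_from(p, r) for name, (p, r, _) in reported.items()}
for name, (_, _, f1) in reported.items():
    print(name, round(recomputed[name], 3), f1)
```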


# Logistic Regression

In [21]:
# Define our steps
model = LogisticRegression()

# grid 
grid = {
    'C': np.logspace(-3, 4, 8),
}

# Grid search
log_cv = GridSearchCV(
    model, 
    grid, 
    cv=kfold,    
    verbose = 1,
    n_jobs = -1,
    return_train_score=True)

In [22]:
start_time = time.time()
log_cv.fit(X_tr, y_tr)
logistic_tune_time = time.time() - start_time

Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:    3.1s finished


In [23]:
# Collect results in a DataFrame
cv_results = pd.DataFrame(log_cv.cv_results_)

# getting the columns of interest:
cols = ['param_' + name for name in grid] + ['mean_test_score', 'std_test_score']

# printing the results
display(cv_results[cols].sort_values('mean_test_score', ascending=False).head(10))
print("We can expect an accuracy close to {:.1f}% on the testing set with the following parameters: \n{}. \nThe tuning time is {} sec.".format(max(cv_results['mean_test_score']*100), log_cv.best_params_, round(logistic_tune_time, 1)))

Unnamed: 0,param_C,mean_test_score,std_test_score
6,1000.0,0.807284,0.006432
7,10000.0,0.807284,0.006432
4,10.0,0.80716,0.006118
5,100.0,0.807037,0.006217
3,1.0,0.805185,0.006847
2,0.1,0.801605,0.00731
1,0.01,0.795062,0.00666
0,0.001,0.777654,0.006141


We can expect an accuracy close to 80.7% on the testing set with the following parameters: 
{'C': 1000.0}. 
The tuning time is 3.5 sec.


In [24]:
logistic = LogisticRegression(
    **log_cv.best_params_, random_state=0)

In [25]:
start_time = time.time()
logistic.fit(X_tr, y_tr)
logistic_train_time = time.time() - start_time



In [26]:
logistic_accuracy = logistic.score(X_te, y_te)
print('Accuracy on Testing Set: accuracy: {:.1f}%.'.format(logistic_accuracy*100))

Accuracy on Testing Set: accuracy: 80.6%.


In [27]:
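# confusion_matrix orders the classes alphabetically, so the labels are sorted as well
# rows: predicted group, columns: true group (the predictions are passed first)
logistic_conf=pd.DataFrame(confusion_matrix(logistic.predict(X_te), y_te), index=sorted(var_grps), columns=sorted(var_grps))
round(logistic_conf / logistic_conf.astype(float).sum(axis=1),3)

Unnamed: 0,easy_grp,hard_grp,middle_grp
easy_grp,1.0,0.0,0.0
hard_grp,0.0,0.713,0.206
middle_grp,0.0,0.311,0.776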


In [28]:
logistic_metrics=pd.DataFrame(classification_report(logistic.predict(X_te), y_te, output_dict=True)).T.round(3)
logistic_metrics

Unnamed: 0,f1-score,precision,recall,support
easy_grp,1.0,1.0,1.0,200.0
hard_grp,0.705,0.697,0.713,293.0
middle_grp,0.783,0.79,0.776,407.0
micro avg,0.806,0.806,0.806,900.0
macro avg,0.829,0.829,0.83,900.0
weighted avg,0.806,0.806,0.806,900.0


# Random Forest

In [29]:
# Define our steps
model = RandomForestClassifier()

# grid 
grid = {
    'n_estimators': [20, 100, 500, 750, 1000],
    'max_depth': [5, 10, 20, 50, 75, 100],
}

# Grid search
rf_cv = GridSearchCV(
    model, 
    grid, 
    cv=kfold,    
    verbose = 1,
    n_jobs = -1)

In [30]:
start_time = time.time()
rf_cv.fit(X_tr, y_tr)
rf_tune_time = time.time() - start_time

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  46 tasks      | elapsed:   18.2s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:  1.4min finished


In [31]:
# Collect results in a DataFrame
cv_results = pd.DataFrame(rf_cv.cv_results_)

# getting the columns of interest:
cols = ['param_' + name for name in grid] + ['mean_test_score', 'std_test_score']

# printing the results
display(cv_results[cols].sort_values('mean_test_score', ascending=False).head(10))
print("We can expect an accuracy close to {:.1f}% on the testing set with the following parameters: \n{}. \nThe tuning time is {} sec.".format(max(cv_results['mean_test_score']*100), rf_cv.best_params_, round(rf_tune_time, 1)))



Unnamed: 0,param_n_estimators,param_max_depth,mean_test_score,std_test_score
14,1000,20,0.862593,0.005247
17,500,50,0.862346,0.004451
13,750,20,0.862222,0.002853
12,500,20,0.862099,0.002124
19,1000,50,0.861975,0.005679
29,1000,100,0.861852,0.005081
27,500,100,0.861481,0.003089
24,1000,75,0.861481,0.004031
28,750,100,0.861235,0.005096
22,500,75,0.860864,0.005361


We can expect an accuracy close to 86.3% on the testing set with the following parameters: 
{'max_depth': 20, 'n_estimators': 1000}. 
The tuning time is 87.6 sec.


In [32]:
rf = RandomForestClassifier(
    **rf_cv.best_params_, random_state=0)

In [33]:
start_time = time.time()
rf.fit(X_tr, y_tr)
rf_train_time = time.time() - start_time

In [34]:
rf_accuracy = rf.score(X_te, y_te)
print('Accuracy on Testing Set: accuracy: {:.1f}%.'.format(rf_accuracy*100))

Accuracy on Testing Set: accuracy: 88.0%.


In [35]:
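# confusion_matrix orders the classes alphabetically, so the labels are sorted as well
# rows: predicted group, columns: true group (the predictions are passed first)
rf_conf=pd.DataFrame(confusion_matrix(rf.predict(X_te), y_te), index=sorted(var_grps), columns=sorted(var_grps))
round(rf_conf / rf_conf.astype(float).sum(axis=1),3)

Unnamed: 0,easy_grp,hard_grp,middle_grp
easy_grp,1.0,0.0,0.0
hard_grp,0.0,0.789,0.19
middle_grp,0.0,0.114,0.897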


In [36]:
rf_metrics=pd.DataFrame(classification_report(rf.predict(X_te), y_te, output_dict=True)).T.round(3)
rf_metrics

Unnamed: 0,f1-score,precision,recall,support
easy_grp,1.0,1.0,1.0,200.0
hard_grp,0.829,0.873,0.789,332.0
middle_grp,0.859,0.825,0.897,368.0
micro avg,0.88,0.88,0.88,900.0
macro avg,0.896,0.899,0.895,900.0
weighted avg,0.879,0.882,0.88,900.0


# SVM

In [37]:
# Define our steps
model = SVC()

# grid 
grid = {
    'C': np.logspace(0, 5, 5),  # other sets of parameters were tested; higher C generally performed better
    'kernel': ['sigmoid', 'rbf'],  # the linear kernel was too slow to be a good candidate
    'gamma': np.logspace(-5, 0, 5),  # other sets of parameters were tested; lower gammas generally performed better
}

# Grid search
svm_cv = GridSearchCV(
    model, 
    grid, 
    cv=kfold,    
    verbose = 1,
    n_jobs = -1)
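
To make the SVM grid concrete, the values produced by `np.logspace(0, 5, 5)` and `np.logspace(-5, 0, 5)` can be reproduced without NumPy. The `logspace` function below is a small stand-in written for this illustration.

```python
def logspace(start, stop, num):
    """Return num values evenly spaced on a log10 scale from 10**start to 10**stop."""
    step = (stop - start) / (num - 1)
    return [10 ** (start + i * step) for i in range(num)]

c_values = logspace(0, 5, 5)
gamma_values = logspace(-5, 0, 5)
kernels = ["sigmoid", "rbf"]

# Number of candidates evaluated by the grid search.
n_candidates = len(c_values) * len(kernels) * len(gamma_values)

print([round(c, 3) for c in c_values])
print(gamma_values)
print(n_candidates)
```

The exponents move in steps of 1.25, which is why values such as 5623.41 and 0.00316228 show up in the results table.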

In [38]:
start_time = time.time()
svm_cv.fit(X_tr, y_tr)
svm_tune_time = time.time() - start_time

Fitting 5 folds for each of 50 candidates, totalling 250 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:   26.9s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:  2.2min
[Parallel(n_jobs=-1)]: Done 250 out of 250 | elapsed:  5.1min finished


In [39]:
# Collect results in a DataFrame
cv_results = pd.DataFrame(svm_cv.cv_results_)

# getting the columns of interest:
cols = ['param_' + name for name in grid] + ['mean_test_score', 'std_test_score']

# printing the results
display(cv_results[cols].sort_values('mean_test_score', ascending=False).head(10))
print("We can expect an accuracy close to {:.1f}% on the testing set with the following parameters: \n{}. \nThe tuning time is {} sec.".format(max(cv_results['mean_test_score']*100), svm_cv.best_params_, round(svm_tune_time, 1)))



Unnamed: 0,param_C,param_kernel,param_gamma,mean_test_score,std_test_score
35,5623.41,rbf,0.00316228,0.904691,0.004358
25,316.228,rbf,0.00316228,0.895062,0.005339
43,100000.0,rbf,0.000177828,0.893333,0.003813
17,17.7828,rbf,0.0562341,0.88321,0.003059
45,100000.0,rbf,0.00316228,0.882593,0.002983
7,1.0,rbf,0.0562341,0.87963,0.003219
27,316.228,rbf,0.0562341,0.862593,0.004563
37,5623.41,rbf,0.0562341,0.861728,0.004468
47,100000.0,rbf,0.0562341,0.861728,0.004468
33,5623.41,rbf,0.000177828,0.86,0.003322


We can expect an accuracy close to 90.5% on the testing set with the following parameters: 
{'C': 5623.413251903491, 'gamma': 0.0031622776601683794, 'kernel': 'rbf'}. 
The tuning time is 317.5 sec.


In [40]:
svm = SVC(
    **svm_cv.best_params_, random_state=0, probability=True
)

In [41]:
start_time = time.time()
svm.fit(X_tr, y_tr)
svm_train_time = time.time() - start_time

In [42]:
svm_accuracy = svm.score(X_te, y_te)
print('Accuracy on Testing Set: accuracy: {:.1f}%.'.format(svm_accuracy*100))

Accuracy on Testing Set: accuracy: 92.0%.


In [43]:
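# confusion_matrix orders the classes alphabetically, so the labels are sorted as well
# rows: predicted group, columns: true group (the predictions are passed first)
svm_conf=pd.DataFrame(confusion_matrix(svm.predict(X_te), y_te), index=sorted(var_grps), columns=sorted(var_grps))
round(svm_conf / svm_conf.astype(float).sum(axis=1),3)

Unnamed: 0,easy_grp,hard_grp,middle_grp
easy_grp,1.0,0.0,0.0
hard_grp,0.0,0.841,0.145
middle_grp,0.0,0.057,0.948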


In [44]:
svm_metrics=pd.DataFrame(classification_report(svm.predict(X_te), y_te, output_dict=True)).T.round(3)
svm_metrics

Unnamed: 0,f1-score,precision,recall,support
easy_grp,1.0,1.0,1.0,200.0
hard_grp,0.886,0.937,0.841,334.0
middle_grp,0.906,0.868,0.948,366.0
micro avg,0.92,0.92,0.92,900.0
macro avg,0.931,0.935,0.93,900.0
weighted avg,0.92,0.923,0.92,900.0


# Neural Network with Sklearn

In [45]:
# Define our steps
model = MLPClassifier()

# grid 
grid = {'hidden_layer_sizes': [5,10,15,20,50,100],
        'alpha': np.logspace(-5,5,5),
        'solver': ['adam', 'sgd'],
        'activation': ['logistic', 'relu'],
        'early_stopping': [True]
       }  

# Grid search
nnet_cv = GridSearchCV(
    model, 
    grid, 
    cv=kfold,    
    verbose = 1,
    n_jobs = -1)

In [46]:
start_time = time.time()
nnet_cv.fit(X_tr, y_tr)
nnet_tune_time = time.time() - start_time

Fitting 5 folds for each of 120 candidates, totalling 600 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    6.6s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:   47.0s
[Parallel(n_jobs=-1)]: Done 434 tasks      | elapsed:  1.4min
[Parallel(n_jobs=-1)]: Done 600 out of 600 | elapsed:  1.7min finished


In [47]:
# Collect results in a DataFrame
cv_results = pd.DataFrame(nnet_cv.cv_results_)

# getting the columns of interest:
cols = ['param_' + name for name in grid] + ['mean_test_score', 'std_test_score']

# printing the results
display(cv_results[cols].sort_values('mean_test_score', ascending=False).head(10))
print("We can expect an accuracy close to {:.1f}% on the testing set with the following parameters: \n{}. \nThe tuning time is {} sec.".format(max(cv_results['mean_test_score']*100), nnet_cv.best_params_, round(nnet_tune_time, 1)))



Unnamed: 0,param_hidden_layer_sizes,param_alpha,param_solver,param_activation,param_early_stopping,mean_test_score,std_test_score
70,100,1e-05,adam,relu,True,0.846173,0.011669
82,100,0.00316228,adam,relu,True,0.831235,0.011664
94,100,1.0,adam,relu,True,0.824691,0.015518
92,50,1.0,adam,relu,True,0.819506,0.005084
80,50,0.00316228,adam,relu,True,0.818395,0.018417
64,15,1e-05,adam,relu,True,0.812346,0.005045
68,50,1e-05,adam,relu,True,0.811728,0.008868
74,10,0.00316228,adam,relu,True,0.810247,0.00906
90,20,1.0,adam,relu,True,0.809506,0.009873
22,100,0.00316228,adam,logistic,True,0.809136,0.011531


We can expect an accuracy close to 84.6% on the testing set with the following parameters: 
{'activation': 'relu', 'alpha': 1e-05, 'early_stopping': True, 'hidden_layer_sizes': 100, 'solver': 'adam'}. 
The tuning time is 104.8 sec.


In [48]:
nnet = MLPClassifier(
    **nnet_cv.best_params_, random_state=0)

In [49]:
start_time = time.time()
nnet.fit(X_tr, y_tr)
nnet_train_time = time.time() - start_time

In [50]:
nnet_accuracy = nnet.score(X_te, y_te)
print('Accuracy on Testing Set: accuracy: {:.1f}%.'.format(nnet_accuracy*100))

Accuracy on Testing Set: accuracy: 87.1%.


In [51]:
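# confusion_matrix orders the classes alphabetically, so the labels are sorted as well
# rows: predicted group, columns: true group (the predictions are passed first)
nnet_conf=pd.DataFrame(confusion_matrix(nnet.predict(X_te), y_te), index=sorted(var_grps), columns=sorted(var_grps))
round(nnet_conf / nnet_conf.astype(float).sum(axis=1),3)

Unnamed: 0,easy_grp,hard_grp,middle_grp
easy_grp,1.0,0.0,0.0
hard_grp,0.0,0.777,0.201
middle_grp,0.0,0.127,0.886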


In [52]:
nnet_metrics=pd.DataFrame(classification_report(nnet.predict(X_te), y_te, output_dict=True)).T.round(3)
nnet_metrics

Unnamed: 0,f1-score,precision,recall,support
easy_grp,1.0,1.0,1.0,200.0
hard_grp,0.816,0.86,0.777,332.0
middle_grp,0.849,0.815,0.886,368.0
micro avg,0.871,0.871,0.871,900.0
macro avg,0.888,0.892,0.888,900.0
weighted avg,0.871,0.873,0.871,900.0


# Intermediary results

In [53]:
dt_scores_avg=list(dt_metrics.loc['weighted avg',:])[:3]
#dt_scores_blitz=list(dt_metrics.loc['blitz',:])
logistic_scores_avg=list(logistic_metrics.loc['weighted avg',:])[:3]
#logistic_scores_blitz=list(logistic_metrics.loc['blitz',:])
rf_scores_avg=list(rf_metrics.loc['weighted avg',:])[:3]
#rf_scores_blitz=list(rf_metrics.loc['blitz',:])
svm_scores_avg=list(svm_metrics.loc['weighted avg',:])[:3]
#svm_scores_blitz=list(svm_metrics.loc['blitz',:])
nnet_scores_avg=list(nnet_metrics.loc['weighted avg',:])[:3]
#nnet_scores_blitz=list(nnet_metrics.loc['blitz',:])

In [54]:
results_full=pd.DataFrame({
              'decision tree': [dt_accuracy, dt_tune_time, dt_train_time, dt_scores_avg[0], dt_scores_avg[1], dt_scores_avg[2]],
              'logistic': [logistic_accuracy, logistic_tune_time, logistic_train_time, logistic_scores_avg[0], logistic_scores_avg[1], logistic_scores_avg[2]],
              'random forest': [rf_accuracy, rf_tune_time, rf_train_time, rf_scores_avg[0], rf_scores_avg[1], rf_scores_avg[2]],
              'SVM': [svm_accuracy, svm_tune_time, svm_train_time, svm_scores_avg[0], svm_scores_avg[1], svm_scores_avg[2]],
              'NN': [nnet_accuracy, nnet_tune_time, nnet_train_time, nnet_scores_avg[0], nnet_scores_avg[1], nnet_scores_avg[2]],
}, index = ['Accuracy', 'Tuning time', 'Training time', 'f1_avg', 'precision_avg', 'recall_avg']).round(3).T.sort_values('Accuracy', ascending=False)
results_full

Unnamed: 0,Accuracy,Tuning time,Training time,f1_avg,precision_avg,recall_avg
SVM,0.92,317.461,58.084,0.92,0.923,0.92
random forest,0.88,87.609,5.164,0.879,0.882,0.88
NN,0.871,104.819,1.079,0.871,0.873,0.871
decision tree,0.849,2.954,0.03,0.848,0.856,0.849
logistic,0.806,3.472,0.317,0.806,0.806,0.806


We have completed our first iteration and obtained predictions of the groups of variants.  
As we are predicting only three groups instead of nine variants, the accuracy of the different models is clearly higher than the one observed previously when working on the individual variants.  
The SVM performs much better than the other models (92.0% against 88.0% for the random forest) but takes much longer to tune and train, which is problematic. As its accuracy is clearly higher than that of the second-best model, we still choose an SVM for this first step. 

Now, for these different groups, we are going to predict the variant that has been played.   

We therefore split the original dataframe into three subsets. For the training set, the split follows the group of variants that each observation actually belongs to. For the testing set, the split follows the group of variants predicted in the first step.  
As we are working on subsets of the data and therefore on fewer samples, we can use more complex models in this second step. The independent features that are now used are the ones derived from the "full dataframe".  
The data split can be summarised as follows:  
- X_tr --> all the features, split into 3 sets  
    - "easy" set containing all variants that belong to the "easy" group  
    - "middle" set containing all variants that belong to the "middle" group  
    - "hard" set containing all variants that belong to the "hard" group  
- y_tr --> the variant  
- X_te --> all the features, split into 3 sets  
    - "easy" set containing all variants that have been **predicted** as "easy"  
    - "middle" set containing all variants that have been **predicted** as "middle"  
    - "hard" set containing all variants that have been **predicted** as "hard"  
- y_te --> the variant that we want to predict, used to evaluate the accuracy of the models  
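
The asymmetry between the two splits can be sketched with toy records: training rows are grouped by their true group, testing rows by their predicted group. The field names `true_grp` and `pred_grp` and the helper `split_by_group` are hypothetical and only serve this illustration.

```python
def split_by_group(rows, group_of):
    """Bucket rows by the group returned by group_of(row)."""
    subsets = {}
    for row in rows:
        subsets.setdefault(group_of(row), []).append(row)
    return subsets

train_rows = [
    {"id": 1, "variant": "horde", "true_grp": "easy_grp"},
    {"id": 2, "variant": "blitz", "true_grp": "hard_grp"},
    {"id": 3, "variant": "atomic", "true_grp": "middle_grp"},
]
test_rows = [
    {"id": 4, "variant": "crazyhouse", "pred_grp": "hard_grp"},  # misrouted in step 1
    {"id": 5, "variant": "blitz", "pred_grp": "hard_grp"},
]

# Training set: split on the known group. Testing set: split on the prediction.
train_subsets = split_by_group(train_rows, lambda r: r["true_grp"])
test_subsets = split_by_group(test_rows, lambda r: r["pred_grp"])

print(sorted(train_subsets))
print([r["variant"] for r in test_subsets["hard_grp"]])
```

The training subsets therefore stay "clean", while a testing subset can contain variants from other groups.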

# 2nd step

We first have to select the model used for the first step, i.e. the prediction of the group of variants. Here, we simply select the one that maximised the accuracy, as training times have been quite low so far.  
We train this model with the best parameters found previously.  

In [55]:
model_step1 = SVC(
    **svm_cv.best_params_, random_state=0)

In [56]:
model_step1.fit(X_tr, y_tr)

SVC(C=5623.413251903491, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.0031622776601683794,
  kernel='rbf', max_iter=-1, probability=False, random_state=0,
  shrinking=True, tol=0.001, verbose=False)

As explained above, we create our new dataset of interest by splitting the data accordingly.  

In [57]:
X_tr['var_grp'] = y_tr # adding the "variant group" to the training set
X_tr.head()

Unnamed: 0,nb_moves,nb_K,nb_Q,nb_R,nb_B,nb_N,nb_P,nb_k,nb_q,nb_r,...,nb_pawns_third_row_c_f,nb_pawns_b_g_col,nb_pieces_center,nb_pieces_ext_center,nb_pieces_third_row_c_f,nb_pieces_b_g_col,knight_squares,king_castle_squares,nb_checks,var_grp
1743,10,1,1,2,2,1,7,1,1,2,...,2,0,1,0,2,1,1,0,1,middle_grp
7772,10,0,0,0,0,0,32,1,1,2,...,3,6,0,0,0,0,1,0,0,easy_grp
8400,10,1,1,1,0,2,0,1,0,1,...,0,0,0,0,0,1,1,0,0,easy_grp
2760,10,1,1,2,1,2,7,1,1,2,...,3,2,0,2,1,0,1,0,0,middle_grp
7741,10,0,0,0,0,0,32,1,1,2,...,2,6,0,0,1,0,0,0,0,easy_grp


In [58]:
y_tr = tr_variant # the new dependent variable is the variant that we want to predict
y_tr.head()

1743     crazyhouse
7772          horde
8400    racingKings
2760       chess960
7741          horde
Name: variant, dtype: object

In [59]:
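# adding the predicted variant group to the testing set
# (the prediction comes from `logistic`, not from `model_step1`: the subset sizes below, 407 and 293, match the logistic predictions)
X_te['var_grp'] = logistic.predict(X_te)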
X_te.head()

Unnamed: 0,nb_moves,nb_K,nb_Q,nb_R,nb_B,nb_N,nb_P,nb_k,nb_q,nb_r,...,nb_pawns_third_row_c_f,nb_pawns_b_g_col,nb_pieces_center,nb_pieces_ext_center,nb_pieces_third_row_c_f,nb_pieces_b_g_col,knight_squares,king_castle_squares,nb_checks,var_grp
3035,10,1,1,2,1,1,8,1,1,2,...,2,0,0,0,1,1,1,0,0,middle_grp
8710,10,1,1,1,1,1,0,1,0,2,...,0,0,0,0,1,1,2,0,0,easy_grp
3382,10,1,1,2,2,2,7,1,1,2,...,2,0,0,1,2,2,2,0,0,hard_grp
236,10,1,0,2,2,2,5,1,0,2,...,1,0,0,1,4,1,4,1,1,hard_grp
620,10,1,1,2,2,2,8,1,1,2,...,3,2,0,0,2,2,3,0,0,middle_grp


In [60]:
y_te = te_variant # the new dependent variable is the variant that we want to predict
y_te.head()

3035    kingOfTheHill
8710      racingKings
3382    kingOfTheHill
236             blitz
620             blitz
Name: variant, dtype: object

Splitting the original dataset into three subsets: 

In [61]:
X_tr_easy = X_tr[X_tr['var_grp'] == 'easy_grp']
X_tr_middle = X_tr[X_tr['var_grp'] == 'middle_grp']
X_tr_hard = X_tr[X_tr['var_grp'] == 'hard_grp']

In [62]:
X_tr_easy.head()

Unnamed: 0,nb_moves,nb_K,nb_Q,nb_R,nb_B,nb_N,nb_P,nb_k,nb_q,nb_r,...,nb_pawns_third_row_c_f,nb_pawns_b_g_col,nb_pieces_center,nb_pieces_ext_center,nb_pieces_third_row_c_f,nb_pieces_b_g_col,knight_squares,king_castle_squares,nb_checks,var_grp
7772,10,0,0,0,0,0,32,1,1,2,...,3,6,0,0,0,0,1,0,0,easy_grp
8400,10,1,1,1,0,2,0,1,0,1,...,0,0,0,0,0,1,1,0,0,easy_grp
7741,10,0,0,0,0,0,32,1,1,2,...,2,6,0,0,1,0,0,0,0,easy_grp
7470,10,0,0,0,0,0,30,1,1,2,...,5,7,0,0,0,0,1,0,0,easy_grp
8448,5,1,0,1,0,2,0,1,1,2,...,0,0,0,0,0,0,2,1,0,easy_grp


In [63]:
y_tr_easy = y_tr.loc[y_tr.index.isin(X_tr_easy.index)] 
y_tr_middle = y_tr.loc[y_tr.index.isin(X_tr_middle.index)] 
y_tr_hard = y_tr.loc[y_tr.index.isin(X_tr_hard.index)] 

In [64]:
X_tr_easy = X_tr_easy.drop(['var_grp'], axis=1)
X_tr_middle = X_tr_middle.drop(['var_grp'], axis=1)
X_tr_hard = X_tr_hard.drop(['var_grp'], axis=1)

In [65]:
y_tr_easy.value_counts()

racingKings    900
horde          900
Name: variant, dtype: int64

In [66]:
y_tr_middle.value_counts()

crazyhouse    900
chess960      900
atomic        900
antichess     900
Name: variant, dtype: int64

In [67]:
y_tr_hard.value_counts()

kingOfTheHill    900
blitz            900
threeCheck       900
Name: variant, dtype: int64

Doing the same splits for the testing set:

In [68]:
X_te_easy = X_te[X_te['var_grp'] == 'easy_grp']
X_te_middle = X_te[X_te['var_grp'] == 'middle_grp']
X_te_hard = X_te[X_te['var_grp'] == 'hard_grp']

In [69]:
y_te_easy = y_te.loc[y_te.index.isin(X_te_easy.index)] 
y_te_middle = y_te.loc[y_te.index.isin(X_te_middle.index)] 
y_te_hard = y_te.loc[y_te.index.isin(X_te_hard.index)] 

In [70]:
X_te_easy = X_te_easy.drop(['var_grp'], axis=1)
X_te_middle = X_te_middle.drop(['var_grp'], axis=1)
X_te_hard = X_te_hard.drop(['var_grp'], axis=1)

In [71]:
y_te_easy.value_counts()

racingKings    100
horde          100
Name: variant, dtype: int64

In [72]:
y_te_middle.value_counts()

antichess        98
atomic           88
chess960         85
crazyhouse       45
threeCheck       36
blitz            36
kingOfTheHill    19
Name: variant, dtype: int64

In [73]:
y_te_hard.value_counts()

kingOfTheHill    81
blitz            64
threeCheck       64
crazyhouse       55
chess960         15
atomic           12
antichess         2
Name: variant, dtype: int64

We now have our different sets. In the training set, we know which group of variants each observation belongs to, so we have "clean" sets containing only the relevant variants. In the testing set, however, some games have already been misclassified: for instance, 55 of the 100 crazyhouse games were assigned to the hard group in the first step, so the second-step model for that group cannot predict them correctly.
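
The counts above put a ceiling on the final accuracy: a game can only be predicted correctly in the second step if its true variant belongs to the group it was routed to. The sketch below computes that ceiling from the `value_counts` shown for the three testing subsets; the variable names are illustrative.

```python
group_members = {
    "easy_grp": {"horde", "racingKings"},
    "middle_grp": {"chess960", "antichess", "atomic", "crazyhouse"},
    "hard_grp": {"blitz", "kingOfTheHill", "threeCheck"},
}

# True variants found in each predicted testing subset (counts shown above).
test_subsets = {
    "easy_grp": {"racingKings": 100, "horde": 100},
    "middle_grp": {"antichess": 98, "atomic": 88, "chess960": 85, "crazyhouse": 45,
                   "threeCheck": 36, "blitz": 36, "kingOfTheHill": 19},
    "hard_grp": {"kingOfTheHill": 81, "blitz": 64, "threeCheck": 64, "crazyhouse": 55,
                 "chess960": 15, "atomic": 12, "antichess": 2},
}

# Games whose true variant is a member of the group they were routed to.
reachable = sum(
    n
    for grp, counts in test_subsets.items()
    for variant, n in counts.items()
    if variant in group_members[grp]
)
total = sum(sum(counts.values()) for counts in test_subsets.values())
upper_bound = reachable / total
print(reachable, total, round(100 * upper_bound, 1))
```

Under these counts, at most 725 of the 900 testing games (80.6%) can still be assigned the right variant, whatever the second-step models do.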

In [74]:
# reminder
print(' easy_grp:', easy_grp , '\n', 
      'middle_grp:', middle_grp , '\n',
      'hard_grp:' , hard_grp)

 easy_grp: ['horde', 'racingKings'] 
 middle_grp: ['chess960', 'antichess', 'atomic', 'crazyhouse'] 
 hard_grp: ['blitz', 'kingOfTheHill', 'threeCheck']


The next step consists of fitting the different models on the three subsets.

# Logistic Regression

## Hard set

In [75]:
# Define our steps
model = LogisticRegression()

# grid 
grid = {
    'C': np.logspace(-3, 4, 8),
}

# Grid search
grid_cv = GridSearchCV(
    model, 
    grid, 
    cv=kfold,    
    verbose = 1,
    n_jobs = -1,
    return_train_score=True)

In [76]:
start_time = time.time()
grid_cv.fit(X_tr_hard, y_tr_hard)
logistic_hard_tune_time = time.time() - start_time

Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:    0.6s finished


In [77]:
# Collect results in a DataFrame
cv_results = pd.DataFrame(grid_cv.cv_results_)

# getting the columns of interest:
cols = ['param_' + name for name in grid] + ['mean_test_score', 'std_test_score']

# printing the results
display(cv_results[cols].sort_values('mean_test_score', ascending=False).head(10))
print("We can expect an accuracy close to {:.1f}% on the testing set with the following parameters: \n{}. \nThe tuning time is {} sec.".format(max(cv_results['mean_test_score']*100), grid_cv.best_params_, round(logistic_hard_tune_time, 1)))

Unnamed: 0,param_C,mean_test_score,std_test_score
2,0.1,0.691852,0.031959
5,100.0,0.69037,0.031458
6,1000.0,0.69037,0.031458
7,10000.0,0.69037,0.031458
3,1.0,0.69,0.031657
4,10.0,0.69,0.03144
1,0.01,0.673333,0.029504
0,0.001,0.609259,0.032898


We can expect an accuracy close to 69.2% on the testing set with the following parameters: 
{'C': 0.1}. 
The tuning time is 0.7 sec.


In [78]:
logistic_hard = LogisticRegression(
    **grid_cv.best_params_, random_state=0)

In [79]:
start_time = time.time()
logistic_hard.fit(X_tr_hard, y_tr_hard)
logistic_hard_train_time = time.time() - start_time



In [80]:
logistic_hard_accuracy = logistic_hard.score(X_te_hard, y_te_hard)
print('Accuracy on Testing Set: accuracy: {:.1f}%.'.format(logistic_hard_accuracy*100))

Accuracy on Testing Set: accuracy: 52.6%.


In [81]:
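# Note: the predictions are passed first, so rows are predicted labels and columns are true labels.
# Dividing by 100 turns counts into shares because each variant has 100 observations in the testing set.
logistic_hard_conf=pd.DataFrame(confusion_matrix(logistic_hard.predict(X_te_hard), y_te_hard), index=sorted(list(set(y_te_hard))), columns=sorted(list(set(y_te_hard))))/100
logistic_hard_conf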

Unnamed: 0,antichess,atomic,blitz,chess960,crazyhouse,kingOfTheHill,threeCheck
antichess,0.0,0.0,0.0,0.0,0.0,0.0,0.0
atomic,0.0,0.0,0.0,0.0,0.0,0.0,0.0
blitz,0.0,0.01,0.5,0.09,0.3,0.17,0.11
chess960,0.0,0.0,0.0,0.0,0.0,0.0,0.0
crazyhouse,0.0,0.0,0.0,0.0,0.0,0.0,0.0
kingOfTheHill,0.0,0.03,0.12,0.04,0.11,0.57,0.06
threeCheck,0.02,0.08,0.02,0.02,0.14,0.07,0.47


In [82]:
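# Note: classification_report expects (y_true, y_pred). The predictions are passed first here,
# so in the output the "precision" and "recall" columns are exchanged and "support" counts predicted labels.
logistic_hard_metrics=pd.DataFrame(classification_report(logistic_hard.predict(X_te_hard), y_te_hard, output_dict=True)).T.round(3)
logistic_hard_metrics[(logistic_hard_metrics.index.isin(hard_grp)) | (logistic_hard_metrics.index == 'weighted avg')]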


Unnamed: 0,f1-score,precision,recall,support
blitz,0.549,0.781,0.424,118.0
kingOfTheHill,0.655,0.704,0.613,93.0
threeCheck,0.644,0.734,0.573,82.0
weighted avg,0.609,0.744,0.526,293.0


## Middle set

In [83]:
# Define our steps
model = LogisticRegression()

# grid 
grid = {
    'C': np.logspace(-3, 4, 8),
}

# Grid search
grid_cv = GridSearchCV(
    model, 
    grid, 
    cv=kfold,    
    verbose = 1,
    n_jobs = -1,
    return_train_score=True)

In [84]:
start_time = time.time()
grid_cv.fit(X_tr_middle, y_tr_middle)
logistic_middle_tune_time = time.time() - start_time

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.


Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:    1.9s finished


In [85]:
# Collect results in a DataFrame
cv_results = pd.DataFrame(grid_cv.cv_results_)

# getting the columns of interest:
cols = ['param_' + key for key in grid] + ['mean_test_score', 'std_test_score']

# printing the results
display(cv_results[cols].sort_values('mean_test_score', ascending=False).head(10))
print("We can expect an accuracy close to {:.1f}% on the testing set with the following parameters: \n{}. \nThe tuning time is {} sec.".format(max(cv_results['mean_test_score']*100), grid_cv.best_params_, round(logistic_middle_tune_time, 1)))

Unnamed: 0,param_C,mean_test_score,std_test_score
7,10000.0,0.915278,0.010244
6,1000.0,0.912778,0.010408
5,100.0,0.909722,0.010936
4,10.0,0.904167,0.011146
3,1.0,0.888333,0.012317
2,0.1,0.876389,0.014245
1,0.01,0.852222,0.012868
0,0.001,0.806389,0.013076


We can expect an accuracy close to 91.5% on the testing set with the following parameters: 
{'C': 10000.0}. 
The tuning time is 2.2 sec.


In [86]:
logistic_middle = LogisticRegression(
    **grid_cv.best_params_, random_state=0)

In [87]:
start_time = time.time()
logistic_middle.fit(X_tr_middle, y_tr_middle)
logistic_middle_train_time = time.time() - start_time



In [88]:
logistic_middle_accuracy = logistic_middle.score(X_te_middle, y_te_middle)
print('Accuracy on Testing Set: accuracy: {:.1f}%.'.format(logistic_middle_accuracy*100))

Accuracy on Testing Set: accuracy: 74.0%.


In [89]:
logistic_middle_conf=pd.DataFrame(confusion_matrix(logistic_middle.predict(X_te_middle), y_te_middle), index=sorted(list(set(y_te_middle))), columns=sorted(list(set(y_te_middle))))/100
logistic_middle_conf

Unnamed: 0,antichess,atomic,blitz,chess960,crazyhouse,kingOfTheHill,threeCheck
antichess,0.98,0.01,0.01,0.0,0.0,0.0,0.0
atomic,0.0,0.85,0.0,0.0,0.0,0.0,0.0
blitz,0.0,0.0,0.0,0.0,0.0,0.0,0.0
chess960,0.0,0.02,0.16,0.77,0.04,0.1,0.12
crazyhouse,0.0,0.0,0.19,0.08,0.41,0.09,0.24
kingOfTheHill,0.0,0.0,0.0,0.0,0.0,0.0,0.0
threeCheck,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [90]:
logistic_middle_metrics=pd.DataFrame(classification_report(logistic_middle.predict(X_te_middle), y_te_middle, output_dict=True)).T.round(3)
logistic_middle_metrics[(logistic_middle_metrics.index.isin(middle_grp)) | (logistic_middle_metrics.index == 'weighted avg')]


Unnamed: 0,f1-score,precision,recall,support
antichess,0.99,1.0,0.98,100.0
atomic,0.983,0.966,1.0,85.0
chess960,0.748,0.906,0.636,121.0
crazyhouse,0.562,0.911,0.406,101.0
weighted avg,0.81,0.943,0.74,407.0


## Easy set

In [91]:
# Define our steps
model = LogisticRegression()

# grid 
grid = {
    'C': np.logspace(-3, 4, 8),
}

# Grid search
grid_cv = GridSearchCV(
    model, 
    grid, 
    cv=kfold,    
    verbose = 1,
    n_jobs = -1,
    return_train_score=True)

In [92]:
start_time = time.time()
grid_cv.fit(X_tr_easy, y_tr_easy)
logistic_easy_tune_time = time.time() - start_time

Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:    0.4s finished


In [93]:
# Collect results in a DataFrame
cv_results = pd.DataFrame(grid_cv.cv_results_)

# getting the columns of interest:
cols = ['param_' + key for key in grid] + ['mean_test_score', 'std_test_score']

# printing the results
display(cv_results[cols].sort_values('mean_test_score', ascending=False).head(10))
print("We can expect an accuracy close to {:.1f}% on the testing set with the following parameters: \n{}. \nThe tuning time is {} sec.".format(max(cv_results['mean_test_score']*100), grid_cv.best_params_, round(logistic_easy_tune_time, 1)))

Unnamed: 0,param_C,mean_test_score,std_test_score
0,0.001,1.0,0.0
1,0.01,1.0,0.0
2,0.1,1.0,0.0
3,1.0,1.0,0.0
4,10.0,1.0,0.0
5,100.0,1.0,0.0
6,1000.0,1.0,0.0
7,10000.0,1.0,0.0


We can expect an accuracy close to 100.0% on the testing set with the following parameters: 
{'C': 0.001}. 
The tuning time is 0.4 sec.


In [94]:
logistic_easy = LogisticRegression(
    **grid_cv.best_params_, random_state=0)

In [95]:
start_time = time.time()
logistic_easy.fit(X_tr_easy, y_tr_easy)
logistic_easy_train_time = time.time() - start_time



In [96]:
logistic_easy_accuracy = logistic_easy.score(X_te_easy, y_te_easy)
print('Accuracy on Testing Set: accuracy: {:.1f}%.'.format(logistic_easy_accuracy*100))

Accuracy on Testing Set: accuracy: 100.0%.


In [97]:
logistic_easy_conf=pd.DataFrame(confusion_matrix(logistic_easy.predict(X_te_easy), y_te_easy), index=sorted(list(set(y_te_easy))), columns=sorted(list(set(y_te_easy))))/100
logistic_easy_conf

Unnamed: 0,horde,racingKings
horde,1.0,0.0
racingKings,0.0,1.0


In [98]:
logistic_easy_metrics=pd.DataFrame(classification_report(logistic_easy.predict(X_te_easy), y_te_easy, output_dict=True)).T.round(3)
logistic_easy_metrics[(logistic_easy_metrics.index.isin(easy_grp)) | (logistic_easy_metrics.index == 'weighted avg')]

Unnamed: 0,f1-score,precision,recall,support
horde,1.0,1.0,1.0,100.0
racingKings,1.0,1.0,1.0,100.0
weighted avg,1.0,1.0,1.0,200.0


# Accuracy logistic

The global accuracy of our nested models will be lower than the cross-validation scores of the 3 sub-models suggest. Those scores are computed on "clean" training subsets, whereas some test datapoints have already been misclassified in the first step of the analysis (i.e. some observations were classified as "hard_grp" while they were for instance "middle_grp") and can no longer be predicted correctly in the second step.  
So let's calculate the "real accuracy".  
We start by merging all the predictions and all the true labels into two arrays.

In [99]:
y_pred = np.concatenate([logistic_easy.predict(X_te_easy), logistic_middle.predict(X_te_middle), logistic_hard.predict(X_te_hard)])
true_y = np.concatenate([y_te_easy, y_te_middle, y_te_hard])

We are then ready to calculate our relevant metrics.

In [100]:
logistic_nested_conf=pd.DataFrame(confusion_matrix(y_pred, true_y), index=sorted(list(set(variants))), columns=sorted(list(set(variants))))/100
logistic_nested_conf

Unnamed: 0,antichess,atomic,blitz,chess960,crazyhouse,horde,kingOfTheHill,racingKings,threeCheck
antichess,0.98,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
atomic,0.0,0.85,0.0,0.0,0.0,0.0,0.0,0.0,0.0
blitz,0.0,0.01,0.5,0.09,0.3,0.0,0.17,0.0,0.11
chess960,0.0,0.02,0.16,0.77,0.04,0.0,0.1,0.0,0.12
crazyhouse,0.0,0.0,0.19,0.08,0.41,0.0,0.09,0.0,0.24
horde,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
kingOfTheHill,0.0,0.03,0.12,0.04,0.11,0.0,0.57,0.0,0.06
racingKings,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
threeCheck,0.02,0.08,0.02,0.02,0.14,0.0,0.07,0.0,0.47


In [101]:
accuracy_score(y_pred, true_y)

0.7277777777777777
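As a sanity check, this global figure can be recovered from the three testing subsets alone. Because those subsets already contain the units misrouted in the first step, their size-weighted average accuracy equals the global accuracy. The sketch below uses counts read off the confusion matrices and support columns printed earlier (diagonal shares times 100); the dictionary layout is only an illustration.

```python
# (correct predictions, subset size) for each testing subset.
subsets = {
    'easy': (200, 200),     # 100 + 100 correct
    'middle': (301, 407),   # 98 + 85 + 77 + 41 correct
    'hard': (154, 293),     # 50 + 57 + 47 correct
}

total_correct = sum(correct for correct, _ in subsets.values())
total_size = sum(size for _, size in subsets.values())

# Size-weighted average of the per-subset accuracies.
weighted_avg = sum((correct / size) * (size / total_size)
                   for correct, size in subsets.values())

# Accuracy over all 900 test observations.
global_accuracy = total_correct / total_size
print(round(global_accuracy, 4), round(weighted_avg, 4))
```

The gap to worry about is therefore the one between these test accuracies and the cross-validation scores (69.2%, 91.5% and 100%), which were measured on clean training subsets.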

In [102]:
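# Note: the predictions are passed first (classification_report expects y_true first),
# so the "precision" and "recall" columns of the output are exchanged and "support" counts predicted labels.
logistic_nested_metrics=pd.DataFrame(classification_report(y_pred, true_y, output_dict=True)).T.round(3)
logistic_nested_metrics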

Unnamed: 0,f1-score,precision,recall,support
antichess,0.98,0.98,0.98,100.0
atomic,0.919,0.85,1.0,85.0
blitz,0.459,0.5,0.424,118.0
chess960,0.697,0.77,0.636,121.0
crazyhouse,0.408,0.41,0.406,101.0
horde,1.0,1.0,1.0,100.0
kingOfTheHill,0.591,0.57,0.613,93.0
racingKings,1.0,1.0,1.0,100.0
threeCheck,0.516,0.47,0.573,82.0
micro avg,0.728,0.728,0.728,900.0


# Going to the point

Let's now condense the code so that the whole procedure can be run quickly for a given model, with the sole goal of maximising the accuracy and retrieving the different metrics.
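The cells that follow repeat the same tune/refit/time pattern for each subset and each model. One possible way to factor it out is sketched below. The helpers `fit_second_stage` and `predict_nested`, the factory `make_logistic` and the synthetic data are all assumptions for illustration and do not appear in the notebook, which uses its own `kfold` object and the real easy/middle/hard subsets.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split


def fit_second_stage(make_model, grid, train_subsets, cv=5, n_jobs=None):
    """Tune and refit one model per subset.

    Returns the fitted models and the summed refit time in seconds.
    """
    models, train_time = {}, 0.0
    for name, (X_tr, y_tr) in train_subsets.items():
        search = GridSearchCV(make_model(), grid, cv=cv, n_jobs=n_jobs)
        search.fit(X_tr, y_tr)
        model = make_model(**search.best_params_)
        start = time.time()
        model.fit(X_tr, y_tr)
        train_time += time.time() - start
        models[name] = model
    return models, train_time


def predict_nested(models, test_subsets):
    """Concatenate predictions and true labels over all subsets, in the same order."""
    y_pred = np.concatenate([models[name].predict(X_te)
                             for name, (X_te, _) in test_subsets.items()])
    y_true = np.concatenate([y_te for _, y_te in test_subsets.values()])
    return y_pred, y_true


def make_logistic(**params):
    """Model factory, so that tuned parameters can be passed to a fresh estimator."""
    return LogisticRegression(random_state=0, max_iter=1000, **params)


# Synthetic stand-in data, split into two arbitrary "subsets".
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
train_subsets = {'easy': (X_tr[:75], y_tr[:75]), 'hard': (X_tr[75:], y_tr[75:])}
test_subsets = {'easy': (X_te[:25], y_te[:25]), 'hard': (X_te[25:], y_te[25:])}

models, train_time = fit_second_stage(make_logistic, {'C': np.logspace(-3, 4, 8)}, train_subsets)
y_pred, y_true = predict_nested(models, test_subsets)
print(accuracy_score(y_true, y_pred))
```

Switching to another model family would then only require a different factory and grid.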

## Logistic

In [103]:
# "Easy model"
model = LogisticRegression()
grid = {'C': np.logspace(-3, 4, 8),}
log_easy_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
log_easy_cv.fit(X_tr_easy, y_tr_easy)
log_easy = LogisticRegression(**log_easy_cv.best_params_, random_state=0)

start_time = time.time()
log_easy.fit(X_tr_easy, y_tr_easy)
logistic_2nd_train_time = time.time() - start_time

# "Middle model"
model = LogisticRegression()
grid = {'C': np.logspace(-3, 4, 8),}
log_middle_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
log_middle_cv.fit(X_tr_middle, y_tr_middle)
log_middle = LogisticRegression(**log_middle_cv.best_params_, random_state=0)

start_time = time.time()
log_middle.fit(X_tr_middle, y_tr_middle)
logistic_2nd_train_time = logistic_2nd_train_time + time.time() - start_time

# "Hard model"
model = LogisticRegression()
grid = {'C': np.logspace(-3, 4, 8),}
log_hard_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
log_hard_cv.fit(X_tr_hard, y_tr_hard)
log_hard = LogisticRegression(**log_hard_cv.best_params_, random_state=0)

start_time = time.time()
log_hard.fit(X_tr_hard, y_tr_hard)
logistic_2nd_train_time = logistic_2nd_train_time + time.time() - start_time


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.


Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:    0.3s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.


Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:    2.0s finished


Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:    0.7s finished


In [104]:
y_pred = np.concatenate([log_easy.predict(X_te_easy), log_middle.predict(X_te_middle), log_hard.predict(X_te_hard)])
true_y = np.concatenate([y_te_easy, y_te_middle, y_te_hard])

In [105]:
logistic_nested_accuracy = accuracy_score(y_pred, true_y)
logistic_nested_accuracy

0.7277777777777777

In [106]:
logistic_nested_metrics=pd.DataFrame(classification_report(y_pred, true_y, output_dict=True)).T.round(3)

## SVM

In [107]:
# "Easy model"
model = SVC()
grid = {'C': np.logspace(-0,5,5), 'kernel': ['sigmoid', 'rbf'], 'gamma': np.logspace(-5,0,5)} 
svm_easy_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
svm_easy_cv.fit(X_tr_easy, y_tr_easy)
svm_easy = SVC(**svm_easy_cv.best_params_, random_state=0)

start_time = time.time()
svm_easy.fit(X_tr_easy, y_tr_easy)
svm_2nd_train_time = time.time() - start_time

# "Middle model"
model = SVC()
grid = {'C': np.logspace(-0,5,5), 'kernel': ['sigmoid', 'rbf'], 'gamma': np.logspace(-5,0,5)} 
svm_middle_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
svm_middle_cv.fit(X_tr_middle, y_tr_middle)
svm_middle = SVC(**svm_middle_cv.best_params_, random_state=0)

start_time = time.time()
svm_middle.fit(X_tr_middle, y_tr_middle)
svm_2nd_train_time = svm_2nd_train_time + time.time() - start_time

# "Hard model"
model = SVC()
grid = {'C': np.logspace(-0,5,5), 'kernel': ['sigmoid', 'rbf'], 'gamma': np.logspace(-5,0,5)} 
svm_hard_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
svm_hard_cv.fit(X_tr_hard, y_tr_hard)
svm_hard = SVC(**svm_hard_cv.best_params_, random_state=0)

start_time = time.time()
svm_hard.fit(X_tr_hard, y_tr_hard)
svm_2nd_train_time = svm_2nd_train_time + time.time() - start_time

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.


Fitting 5 folds for each of 50 candidates, totalling 250 fits


[Parallel(n_jobs=-1)]: Done 250 out of 250 | elapsed:    3.9s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.


Fitting 5 folds for each of 50 candidates, totalling 250 fits


[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    6.1s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:   25.7s
[Parallel(n_jobs=-1)]: Done 250 out of 250 | elapsed:   37.7s finished


Fitting 5 folds for each of 50 candidates, totalling 250 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    3.7s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:   20.7s
[Parallel(n_jobs=-1)]: Done 250 out of 250 | elapsed:  1.0min finished


In [108]:
y_pred = np.concatenate([svm_easy.predict(X_te_easy), svm_middle.predict(X_te_middle), svm_hard.predict(X_te_hard)])
true_y = np.concatenate([y_te_easy, y_te_middle, y_te_hard])

In [109]:
svm_nested_accuracy = accuracy_score(y_pred, true_y)
svm_nested_accuracy

0.7333333333333333

In [110]:
svm_nested_metrics=pd.DataFrame(classification_report(y_pred, true_y, output_dict=True)).T.round(3)

## Random Forest

In [111]:
# "Easy model"
model = RandomForestClassifier()
grid = {'n_estimators': [20,100,500,750,1000], 'max_depth': [5,10,20,50,75,100]} 
rf_easy_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
rf_easy_cv.fit(X_tr_easy, y_tr_easy)
rf_easy = RandomForestClassifier(**rf_easy_cv.best_params_, random_state=0)

start_time = time.time()
rf_easy.fit(X_tr_easy, y_tr_easy)
rf_2nd_train_time = time.time() - start_time

# "Middle model"
model = RandomForestClassifier()
grid = {'n_estimators': [20,100,500,750,1000], 'max_depth': [5,10,20,50,75,100]} 
rf_middle_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
rf_middle_cv.fit(X_tr_middle, y_tr_middle)
rf_middle = RandomForestClassifier(**rf_middle_cv.best_params_, random_state=0)

start_time = time.time()
rf_middle.fit(X_tr_middle, y_tr_middle)
rf_2nd_train_time = rf_2nd_train_time + time.time() - start_time

# "Hard model"
model = RandomForestClassifier()
grid = {'n_estimators': [20,100,500,750,1000], 'max_depth': [5,10,20,50,75,100]} 
rf_hard_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
rf_hard_cv.fit(X_tr_hard, y_tr_hard)
rf_hard = RandomForestClassifier(**rf_hard_cv.best_params_, random_state=0)

start_time = time.time()
rf_hard.fit(X_tr_hard, y_tr_hard)
rf_2nd_train_time = rf_2nd_train_time + time.time() - start_time

Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:   19.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Done  78 tasks      | elapsed:   22.5s
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:   47.4s finished


Fitting 5 folds for each of 30 candidates, totalling 150 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done 150 out of 150 | elapsed:   44.6s finished


In [112]:
y_pred = np.concatenate([rf_easy.predict(X_te_easy), rf_middle.predict(X_te_middle), rf_hard.predict(X_te_hard)])
true_y = np.concatenate([y_te_easy, y_te_middle, y_te_hard])

In [113]:
rf_nested_accuracy = accuracy_score(y_pred, true_y)
rf_nested_accuracy 

0.7188888888888889

In [114]:
rf_nested_metrics=pd.DataFrame(classification_report(y_pred, true_y, output_dict=True)).T.round(3)

## Neural network

In [115]:
# "Easy model"
model = MLPClassifier()
grid = {'hidden_layer_sizes': [5,10,15,20,50,100],
        'alpha': np.logspace(-5,5,5),
        'solver': ['adam', 'sgd'],
        'activation': ['logistic', 'relu'],
        'early_stopping': [True]
       } 
nnet_easy_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
nnet_easy_cv.fit(X_tr_easy, y_tr_easy)
nnet_easy = MLPClassifier(**nnet_easy_cv.best_params_, random_state=0)

start_time = time.time()
nnet_easy.fit(X_tr_easy, y_tr_easy)
nnet_2nd_train_time = time.time() - start_time

# "Middle model"
model = MLPClassifier()
grid = {'hidden_layer_sizes': [5,10,15,20,50,100],
        'alpha': np.logspace(-5,5,5),
        'solver': ['adam', 'sgd'],
        'activation': ['logistic', 'relu'],
        'early_stopping': [True]
       } 
nnet_middle_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
nnet_middle_cv.fit(X_tr_middle, y_tr_middle)
nnet_middle = MLPClassifier(**nnet_middle_cv.best_params_, random_state=0)

start_time = time.time()
nnet_middle.fit(X_tr_middle, y_tr_middle)
nnet_2nd_train_time = nnet_2nd_train_time + time.time() - start_time

# "Hard model"
model = MLPClassifier()
grid = {'hidden_layer_sizes': [5,10,15,20,50,100],
        'alpha': np.logspace(-5,5,5),
        'solver': ['adam', 'sgd'],
        'activation': ['logistic', 'relu'],
        'early_stopping': [True]
       } 
nnet_hard_cv = GridSearchCV(model, grid, cv=kfold, verbose = 1,n_jobs = -1,return_train_score=True)
nnet_hard_cv.fit(X_tr_hard, y_tr_hard)
nnet_hard = MLPClassifier(**nnet_hard_cv.best_params_, random_state=0)

start_time = time.time()
nnet_hard.fit(X_tr_hard, y_tr_hard)
nnet_2nd_train_time = nnet_2nd_train_time + time.time() - start_time



Fitting 5 folds for each of 120 candidates, totalling 600 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  52 tasks      | elapsed:    1.0s
[Parallel(n_jobs=-1)]: Done 352 tasks      | elapsed:    5.9s
[Parallel(n_jobs=-1)]: Done 600 out of 600 | elapsed:   11.6s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.


Fitting 5 folds for each of 120 candidates, totalling 600 fits


[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    3.8s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:   24.3s
[Parallel(n_jobs=-1)]: Done 434 tasks      | elapsed:   45.2s
[Parallel(n_jobs=-1)]: Done 600 out of 600 | elapsed:   55.5s finished


Fitting 5 folds for each of 120 candidates, totalling 600 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  52 tasks      | elapsed:    3.7s
[Parallel(n_jobs=-1)]: Done 214 tasks      | elapsed:   14.8s
[Parallel(n_jobs=-1)]: Done 464 tasks      | elapsed:   29.1s
[Parallel(n_jobs=-1)]: Done 600 out of 600 | elapsed:   35.5s finished


In [116]:
y_pred = np.concatenate([nnet_easy.predict(X_te_easy), nnet_middle.predict(X_te_middle), nnet_hard.predict(X_te_hard)])
true_y = np.concatenate([y_te_easy, y_te_middle, y_te_hard])

In [117]:
nnet_nested_accuracy=accuracy_score(y_pred, true_y)
nnet_nested_accuracy

0.71

In [118]:
nnet_nested_metrics=pd.DataFrame(classification_report(y_pred, true_y, output_dict=True)).T.round(3)

# Results

In [119]:
logistic_nested_scores_avg=list(logistic_nested_metrics.loc['weighted avg',:])[:3]
logistic_nested_scores_blitz=list(logistic_nested_metrics.loc['blitz',:])
rf_nested_scores_avg=list(rf_nested_metrics.loc['weighted avg',:])[:3]
rf_nested_scores_blitz=list(rf_nested_metrics.loc['blitz',:])
svm_nested_scores_avg=list(svm_nested_metrics.loc['weighted avg',:])[:3]
svm_nested_scores_blitz=list(svm_nested_metrics.loc['blitz',:])
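# Fixed: these two lines originally read from logistic_nested_metrics, which is why the NN row
# of the recorded table below repeats the logistic scores.
nnet_nested_scores_avg=list(nnet_nested_metrics.loc['weighted avg',:])[:3]
nnet_nested_scores_blitz=list(nnet_nested_metrics.loc['blitz',:])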

In [120]:
# total train time = train time of the first step (the SVM that predicts the groups, hence svm_train_time in every line)
#                    + sum of the training times of the 3 subset models in the second step
logistic_nested_train_time = svm_train_time + logistic_2nd_train_time
rf_nested_train_time = svm_train_time + rf_2nd_train_time
svm_nested_train_time = svm_train_time + svm_2nd_train_time
nnet_nested_train_time = svm_train_time + nnet_2nd_train_time

In [123]:
results_nested_0=pd.DataFrame({
              'logistic': [logistic_nested_accuracy, logistic_nested_train_time, logistic_nested_scores_avg[0], logistic_nested_scores_avg[1], logistic_nested_scores_avg[2], logistic_nested_scores_blitz[0], logistic_nested_scores_blitz[1], logistic_nested_scores_blitz[2], logistic_nested_scores_blitz[3]],
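              # Fixed: the sixth entry originally used rf_nested_scores_avg[0], so f1_blitz in the recorded table below shows the average F1 (0.719).
              'random forest': [rf_nested_accuracy, rf_nested_train_time, rf_nested_scores_avg[0], rf_nested_scores_avg[1], rf_nested_scores_avg[2], rf_nested_scores_blitz[0], rf_nested_scores_blitz[1], rf_nested_scores_blitz[2], rf_nested_scores_blitz[3]],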
              'SVM': [svm_nested_accuracy, svm_nested_train_time, svm_nested_scores_avg[0], svm_nested_scores_avg[1], svm_nested_scores_avg[2], svm_nested_scores_blitz[0], svm_nested_scores_blitz[1], svm_nested_scores_blitz[2], svm_nested_scores_blitz[3]],
              'NN': [nnet_nested_accuracy, nnet_nested_train_time, nnet_nested_scores_avg[0], nnet_nested_scores_avg[1], nnet_nested_scores_avg[2], nnet_nested_scores_blitz[0], nnet_nested_scores_blitz[1], nnet_nested_scores_blitz[2], nnet_nested_scores_blitz[3]],
}, index = ['Accuracy', 'Training time', 'f1_avg', 'precision_avg', 'recall_avg', 'f1_blitz', 'precision_blitz', 'recall_blitz', 'support_blitz']).round(3).T.sort_values('Accuracy', ascending=False)
results_nested_0

Unnamed: 0,Accuracy,Training time,f1_avg,precision_avg,recall_avg,f1_blitz,precision_blitz,recall_blitz,support_blitz
SVM,0.733,58.9,0.734,0.743,0.733,0.468,0.52,0.426,122.0
logistic,0.728,58.402,0.726,0.728,0.728,0.459,0.5,0.424,118.0
random forest,0.719,62.176,0.719,0.723,0.719,0.719,0.54,0.422,128.0
NN,0.71,58.93,0.726,0.728,0.728,0.459,0.5,0.424,118.0


In [124]:
results_nested_0.to_pickle('datasets/results_nested_0')

Unfortunately, in this first iteration, the results are clearly worse than those obtained when predicting the variant directly: the best nested model (SVM) only reaches an accuracy of 73.3%.  
The training time is very similar for all of the models (roughly 58 to 62 seconds). This comes from the first step, where the SVM model has a fairly high computation time that is included in every total. This could be reduced by using another model for the first step, but it would not solve the accuracy problems that we observe. This idea will not be explored further.  

However, these results were obtained after a "random" split of the variants into the 3 groups easy/middle/hard.  
In the Jupyter notebooks "2.3.X Capstone project - Nested (Copy ...).ipynb", we test other groups of variants to create the 3 subsets.
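Since the grouping is treated like a hyperparameter, it helps to describe each candidate grouping as plain data and to derive the first-step target from it. The following is a minimal sketch: `build_group_labels` and the second grouping are hypothetical and only illustrate the idea, while the first grouping is the one used in this notebook.

```python
def build_group_labels(variants, grouping):
    """Map each variant label to the name of its group.

    `grouping` is a dict {group_name: [variant, ...]}; every variant must
    belong to exactly one group.
    """
    variant_to_group = {}
    for group_name, members in grouping.items():
        for variant in members:
            if variant in variant_to_group:
                raise ValueError(f"{variant!r} appears in more than one group")
            variant_to_group[variant] = group_name
    missing = set(variants) - set(variant_to_group)
    if missing:
        raise ValueError(f"variants without a group: {sorted(missing)}")
    return [variant_to_group[v] for v in variants]


candidate_groupings = {
    # Grouping used in this notebook.
    'notebook': {
        'easy': ['horde', 'racingKings'],
        'middle': ['chess960', 'antichess', 'atomic', 'crazyhouse'],
        'hard': ['blitz', 'kingOfTheHill', 'threeCheck'],
    },
    # Made-up alternative, for illustration only.
    'alternative': {
        'easy': ['horde', 'racingKings', 'antichess', 'atomic'],
        'hard': ['chess960', 'crazyhouse', 'blitz', 'kingOfTheHill', 'threeCheck'],
    },
}

# Toy labels; in practice this would be the vector of true variants.
y = ['blitz', 'horde', 'crazyhouse', 'atomic']
for name, grouping in candidate_groupings.items():
    print(name, build_group_labels(y, grouping))
```

Each candidate can then be run through the same two-step procedure and compared on the final accuracy.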