# Benchmarking Models 

In this notebook we take several candidate data sets, fit models to each, and compare their accuracy. The aim is to find the best data set to use and to gain some insight into our models.

In [26]:
import pandas as pd
import numpy as np
from tqdm import tqdm
from sklearn import model_selection
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.linear_model import LogisticRegression

import xgboost as xgb
import warnings
import matplotlib.pyplot as plt
import seaborn as sns

warnings.filterwarnings('ignore')

In [2]:
train_benchmark = pd.read_csv("Benchmarks/Benchmark_dataset.csv")
train_full = pd.read_csv("full_train_data.csv")
train_Without_avg = pd.read_csv("without_avg_train.csv")
train_Without_avg_BS_S = pd.read_csv("without_avg_train_BS_S.csv")
train_Without_avg_BS_S_sum = pd.read_csv("without_avg_train_BS_S_sum.csv")

                            
train_scores = pd.read_csv('Train_Data/Y_train.csv', index_col=0)
train_scores = train_scores.loc[train_benchmark.index] # This is our target


## Modelling Wins

To compare as directly as possible with the benchmark given on the website, we start by predicting wins only, using the `AWAY_WINS` column as the target.

In [3]:
train_y_AWAY_WINS = train_scores['AWAY_WINS']

In [9]:
def get_train_valid_test(X, y):
    X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, train_size=0.8, random_state=42)
    X_train, X_valid, y_train, y_valid = model_selection.train_test_split(X_train, y_train, train_size=0.8, random_state=42)

    return X_train, X_valid, X_test, y_train, y_valid, y_test 
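
As a quick check of what the two nested 80/20 splits above produce, the sketch below works out the share of the data that ends up in each subset. It uses exact fractions only; `split_shares` is a hypothetical helper for illustration and is not part of the notebook.

```python
from fractions import Fraction

def split_shares(outer_train=Fraction(4, 5), inner_train=Fraction(4, 5)):
    """Return the (train, valid, test) shares produced by two nested splits."""
    test = 1 - outer_train                   # held out by the first split
    train = outer_train * inner_train        # kept by both splits
    valid = outer_train * (1 - inner_train)  # held out by the second split
    return train, valid, test

train, valid, test = split_shares()
print(float(train), float(valid), float(test))  # 0.64 0.16 0.2
```

So roughly 64% of rows are used for fitting, 16% sit in `X_valid`, and 20% are used for the test score. Note that `X_valid` is returned but not used by the training functions in this notebook.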

In [10]:
def format_1D_predictions(predictions):
    predictions[2] = 0 # Add an all-zero column so that no draws are predicted
    predictions.columns = [0,2,1] # Relabel the columns; the all-zero column becomes label 1
    # Reorder the columns to [0,1,2] and mark the top-ranked entry in each row with a 1
    return (predictions.reindex(columns=[0,1,2]).rank(axis=1,ascending=False)==1).astype(int).values
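
To make the ranking step concrete, here is a small sketch on made-up rows (the values are illustrative, not model output). It assumes the frame already has the three columns `[0, 1, 2]` and shows that `rank(...) == 1` marks the single largest entry per row. With pandas' default `method='average'`, tied maxima share a rank above 1, so such rows come out all zero.

```python
import pandas as pd

# Toy scores with columns already in the order [0, 1, 2] (illustrative values)
toy = pd.DataFrame(
    [[1, 0, 0],   # clear maximum in column 0
     [0, 0, 1],   # clear maximum in column 2
     [0, 0, 0],   # three-way tie
     [1, 0, 1]],  # two-way tie for the maximum
    columns=[0, 1, 2],
)

# Rank within each row, largest value first, and keep only rank-1 entries
one_hot = (toy.rank(axis=1, ascending=False) == 1).astype(int).values
print(one_hot)
```

Rows that come out all zero can never match a one-hot target row, so they count as misses when scored with `accuracy_score`.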

In [11]:
XGB_Benchmark_params = {
    'booster': 'gbtree',
    'tree_method':'hist',
    'max_depth': 8, 
    'learning_rate': 0.025,
    'objective': 'multi:softprob',
    'num_class': 2,
    'eval_metric': 'mlogloss'
    }

In [12]:
def get_prediction_score(model, X_test, y_test):
    predictions = model.predict(X_test, iteration_range=(0, model.best_iteration))
    predictions = pd.DataFrame(predictions)
    
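    # Convert to a three-column 0/1 array in which draws are never predicted
    predictions = format_1D_predictions(predictions)
    
    # Score against the full target rows for the same indices (y_test itself is not used)
    target = train_scores.loc[X_test.index].copy()
    return np.round(accuracy_score(predictions,target),4)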

    

In [13]:
def training_1D_XGB(training_data, replace_0_with_nan = False, XGB_params = XGB_Benchmark_params):
    X_train, X_valid, X_test, y_train, y_valid, y_test = get_train_valid_test(training_data, train_y_AWAY_WINS)
    
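    if replace_0_with_nan:
        # Treat zeros as missing values; note that only X_train is modified, X_test is left unchanged
        X_train = X_train.replace({0:np.nan})
    
    xgb_model = xgb.XGBClassifier(random_state=42, **XGB_params)
    xgb_model.fit(X_train, y_train) # fit() returns the estimator itself, so no separate handle is needed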
    
    test_acc, train_acc = get_prediction_score(xgb_model,X_test,y_test), get_prediction_score(xgb_model,X_train,y_train)
    
    print(f"Test accuracy: {test_acc}; Training accuracy: {train_acc}")
    return xgb_model

In [39]:
training_1D_XGB(train_benchmark, False); 
XGB_bench = training_1D_XGB(train_benchmark, True); 
training_1D_XGB(train_full); 
XGB_without_avg = training_1D_XGB(train_Without_avg); # I think that this is our best model 
training_1D_XGB(train_Without_avg_BS_S);
training_1D_XGB(train_Without_avg_BS_S_sum);

Test accuracy: 0.4689; Training accuracy: 0.6847
Test accuracy: 0.4742; Training accuracy: 0.7071
Test accuracy: 0.4722; Training accuracy: 0.669
Test accuracy: 0.4791; Training accuracy: 0.679
Test accuracy: 0.477; Training accuracy: 0.6624
Test accuracy: 0.4779; Training accuracy: 0.6369


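I think the best data set we have is `train_Without_avg_BS_S_sum`: its test accuracy (0.4779) is within about 0.001 of the highest score (0.4791, from `train_Without_avg`), and it has the smallest gap between training and test accuracy. Even so, there is serious overfitting in every run. Note that we aren't predicting draws here, so a 'perfect' score would be ~75% (about 25% of matches are draws).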

I want to plot the confusion matrices for this: I think we are basically just predicting that the home team wins.

Let's create some functions to generate analytics for this data set and consider how to proceed. A few things need to be dealt with:
1. I think we are overfitting the data (see the difference between the test and training scores).
2. The model mostly just predicts the most common class (i.e. home wins), and the improvement over always predicting that class is only slight.
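
For point 1, one possible direction is a more constrained parameter set. The fragment below is only a sketch: the name `XGB_regularised_params_sketch` is hypothetical, and every value that differs from the benchmark parameters is an untuned assumption, not a result from this notebook. The keys themselves are standard XGBoost parameters commonly used to limit tree complexity.

```python
# Hypothetical, untuned starting point for reducing overfitting
XGB_regularised_params_sketch = {
    'booster': 'gbtree',
    'tree_method': 'hist',
    'max_depth': 4,            # shallower trees than the benchmark's 8 (assumed value)
    'learning_rate': 0.025,
    'subsample': 0.8,          # fit each tree on a random 80% of rows (assumed value)
    'colsample_bytree': 0.8,   # and a random 80% of columns (assumed value)
    'min_child_weight': 5,     # require more evidence per leaf (assumed value)
    'reg_lambda': 5.0,         # stronger L2 penalty on leaf weights (assumed value)
    'objective': 'multi:softprob',
    'num_class': 3,
    'eval_metric': 'mlogloss',
}
```

Whether any of these settings actually narrows the train/test gap would need to be checked on the held-out data.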

## Modelling All Classes

In [22]:
# Map each one-hot target row to a single integer class label
def class_encoder(y, mapping = {(1, 0, 0): 2, (0, 1, 0): 1, (0, 0, 1): 0}):
    return y.apply(lambda x : mapping[tuple(x)], axis = 1)
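
A small usage sketch of the encoder on made-up targets follows. The function is re-declared so the snippet runs on its own. Only `AWAY_WINS` appears in this notebook; the other two column names and their order are assumptions for illustration.

```python
import pandas as pd

def class_encoder(y, mapping={(1, 0, 0): 2, (0, 1, 0): 1, (0, 0, 1): 0}):
    # Same logic as the notebook's encoder: one-hot row -> integer class label
    return y.apply(lambda row: mapping[tuple(row)], axis=1)

# Column names other than AWAY_WINS are assumed for illustration
toy_y = pd.DataFrame(
    {"HOME_WINS": [1, 0, 0], "DRAW": [0, 1, 0], "AWAY_WINS": [0, 0, 1]}
)
print(class_encoder(toy_y).tolist())  # [2, 1, 0]
```

Under that assumed column order, label 2 is a home win, label 1 a draw, and label 0 an away win.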

In [23]:
XGB_Benchmark_params_multi = {
    'booster': 'gbtree',
    'tree_method':'hist',
    'max_depth': 8, 
    'learning_rate': 0.025,
    'objective': 'multi:softprob',
    'num_class': 3,
    'eval_metric': 'mlogloss'
    }

In [24]:
def training_3D_XGB(training_data, XGB_params = XGB_Benchmark_params_multi):
    X_train, X_valid, X_test, y_train, y_valid, y_test = get_train_valid_test(training_data, train_scores)
    
    
    y_train_encoded = class_encoder(y_train)
    y_test_encoded = class_encoder(y_test)
    
    # Initialize XGBoost classifier
    model = xgb.XGBClassifier(random_state=42, **XGB_params)
    model.fit(X_train, y_train_encoded)
    
    # Predict the labels for the test set
    y_pred = model.predict(X_test)

    # plt.hist(y_pred, bins = 3, density=True)
    print(confusion_matrix(y_test_encoded, y_pred))
    
    # Evaluate accuracy
    test_acc, train_acc = accuracy_score(y_test_encoded, y_pred), accuracy_score(y_train_encoded, model.predict(X_train))
    test_f1, train_f1 = f1_score(y_test_encoded, y_pred, average='weighted'), f1_score(y_train_encoded, model.predict(X_train), average='weighted')
    print(f"Test accuracy: {np.round(test_acc,4)}; Training accuracy: {np.round(train_acc,4)}")
    print(f"Test f1: {np.round(test_f1,4)}; Training f1: {np.round(train_f1,4)}")
    print(f"{'-'*40}")
    return model

In [25]:
XGB_bench = training_3D_XGB(train_benchmark); 
training_3D_XGB(train_full); 
XGB_without_avg = training_3D_XGB(train_Without_avg); 
training_3D_XGB(train_Without_avg_BS_S);
training_3D_XGB(train_Without_avg_BS_S_sum);

[[327  48 383]
 [173  37 410]
 [206  56 821]]
Test accuracy: 0.4815; Training accuracy: 0.9592
Test f1: 0.43; Training f1: 0.9591
----------------------------------------
[[318  66 374]
 [174  53 393]
 [190  72 821]]
Test accuracy: 0.4844; Training accuracy: 0.9564
Test f1: 0.4395; Training f1: 0.9565
----------------------------------------
[[301  68 389]
 [178  43 399]
 [205  65 813]]
Test accuracy: 0.4701; Training accuracy: 0.9552
Test f1: 0.4224; Training f1: 0.9551
----------------------------------------
[[307  66 385]
 [183  51 386]
 [208  76 799]]
Test accuracy: 0.4701; Training accuracy: 0.9473
Test f1: 0.4266; Training f1: 0.9472
----------------------------------------
[[304  62 392]
 [168  39 413]
 [181  73 829]]
Test accuracy: 0.4762; Training accuracy: 0.9153
Test f1: 0.426; Training f1: 0.9151
----------------------------------------
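
To read the confusion matrices more easily, the sketch below summarises the first one printed above (the `train_benchmark` run). It assumes the scikit-learn convention that rows are true classes and columns are predicted classes, and computes per-class support and recall, overall accuracy, and the accuracy of always predicting the largest class.

```python
import numpy as np

# First confusion matrix printed above (rows: true class, columns: predicted class)
cm = np.array([[327,  48, 383],
               [173,  37, 410],
               [206,  56, 821]])

support = cm.sum(axis=1)                      # number of true rows per class
recall = np.diag(cm) / support                # share of each true class recovered
accuracy = np.trace(cm) / cm.sum()            # overall accuracy
majority_baseline = support.max() / cm.sum()  # always predict the largest class

print("support:", support)
print("recall:", np.round(recall, 4))
print("accuracy:", np.round(accuracy, 4))
print("majority baseline:", np.round(majority_baseline, 4))
```

For this matrix the accuracy (0.4815) is about four points above the majority-class baseline (0.4401), and only 37 of the 620 true class-1 rows are recovered (roughly 6%). Under the class encoding used above, class 1 is the middle target column, which is presumably the draw indicator.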
