# Benchmarking Models 

In this notebook we take the different data sets, fit a model to each one and test its accuracy. From this we hope to find the best data set to use and to gain some insight into our models.

In [87]:
import pandas as pd
import numpy as np
from tqdm import tqdm
from sklearn import model_selection
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix, f1_score

import xgboost as xgb
import warnings
import matplotlib.pyplot as plt
import seaborn as sns

warnings.filterwarnings('ignore')

In [52]:
train_benchmark = pd.read_csv("Benchmarks/Benchmark_dataset.csv")
train_full = pd.read_csv("full_train_data.csv")
train_Without_avg = pd.read_csv("without_avg_train.csv")
train_Without_avg_BS_S = pd.read_csv("without_avg_train_BS_S.csv")
train_Without_avg_BS_S_sum = pd.read_csv("without_avg_train_BS_S_sum.csv")

                            
train_scores = pd.read_csv('Train_Data/Y_train.csv', index_col=0)
train_scores = train_scores.loc[train_benchmark.index] # This is our target


## Modelling Wins

To compare as directly as possible with the benchmark given on the website, we start by predicting away wins only: the target is the `AWAY_WINS` column.

In [3]:
train_y_AWAY_WINS = train_scores['AWAY_WINS']

In [4]:
def get_train_valid_test(X, y):
    """Split X and y into 64% train, 16% validation and 20% test sets."""
    # First hold out 20% for testing, then 20% of the remainder for validation
    X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, train_size=0.8, random_state=42)
    X_train, X_valid, y_train, y_valid = model_selection.train_test_split(X_train, y_train, train_size=0.8, random_state=42)

    return X_train, X_valid, X_test, y_train, y_valid, y_test
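
Because the second split is applied to the 80% left over from the first, the three parts are not 80/10/10. A quick check of the arithmetic in plain Python, using a hypothetical row count (`n_rows = 10_000` is an assumption, not the size of the actual data):

```python
# Fractions implied by two chained splits that each use train_size=0.8.
train_size = 0.8

test_frac = 1 - train_size                   # held out by the first split
valid_frac = train_size * (1 - train_size)   # held out of the remainder by the second split
train_frac = train_size * train_size         # what is left for fitting

n_rows = 10_000  # hypothetical row count, for illustration only
for name, frac in [("train", train_frac), ("valid", valid_frac), ("test", test_frac)]:
    print(f"{name}: {frac:.2f} of the data, about {round(frac * n_rows)} rows")
```

So the model is fitted on roughly 64% of the rows, with 16% available for validation and 20% for testing.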

In [5]:
def format_1D_predictions(predictions):
    predictions[2] = 0 # Append an all-zero column so that no draws are predicted
    predictions.columns = [0,2,1] # Relabel: the class-1 column becomes 2 and the all-zero column becomes 1
    # Reorder the columns to [0, 1, 2] and one-hot encode the top-ranked column in each row
    return (predictions.reindex(columns=[0,1,2]).rank(axis=1, ascending=False)==1).astype(int).values
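
To make the column shuffling concrete, here is a NumPy-only sketch of the same idea on made-up scores. It assumes, as the column labels suggest, that the three output columns are ordered home win, draw, away win. The function name `to_three_way_one_hot` is hypothetical, and it uses `argmax` rather than `rank`, so ties are resolved differently from the function above.

```python
import numpy as np

def to_three_way_one_hot(two_class_scores):
    """Map (n, 2) scores for [not away win, away win] to (n, 3) one-hot rows."""
    scores = np.asarray(two_class_scores, dtype=float)
    n_rows = scores.shape[0]

    # Assumed column order: 0 = home win, 1 = draw, 2 = away win.
    three_way = np.zeros((n_rows, 3))
    three_way[:, 0] = scores[:, 0]  # "not away win" is treated as a home win
    three_way[:, 2] = scores[:, 1]  # "away win" moves to the last column
    # Column 1 stays at zero, so for non-negative scores a draw is never selected.

    one_hot = np.zeros((n_rows, 3), dtype=int)
    one_hot[np.arange(n_rows), three_way.argmax(axis=1)] = 1
    return one_hot

# Made-up scores: the first row favours class 0, the second favours class 1.
example = to_three_way_one_hot([[0.7, 0.3], [0.2, 0.8]])
print(example)
```

The first row comes out as a home win and the second as an away win; the middle (draw) column is zero in both.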

In [62]:
XGB_Benchmark_params = {
    'booster': 'gbtree',
    'tree_method':'hist',
    'max_depth': 8, 
    'learning_rate': 0.025,
    'objective': 'multi:softprob',
    'num_class': 2,
    'eval_metric': 'mlogloss'
    }

In [84]:
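def get_prediction_score(model, X_test, y_test):
    """Return (accuracy, F1) for the model on X_test and plot a confusion-matrix heatmap."""
    # iteration_range is half-open, so best_iteration + 1 includes the best tree
    predictions = model.predict(X_test, iteration_range=(0, model.best_iteration + 1))
    predictions = pd.DataFrame(predictions)
    predictions = format_1D_predictions(predictions)

    # The target is looked up in train_scores by index, so y_test is not used here
    target = train_scores.loc[X_test.index].copy()
    # Both arrays are one-hot (multilabel-indicator) here, a format that confusion_matrix rejects
    sns.heatmap(confusion_matrix(target.to_numpy(), predictions))

    return np.round(accuracy_score(target, predictions), 4), np.round(f1_score(target, predictions), 4)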

In [85]:
def training_1D_XGB(training_data, replace_0_with_nan = False, XGB_params = XGB_Benchmark_params):
    X_train, X_valid, X_test, y_train, y_valid, y_test = get_train_valid_test(training_data, train_y_AWAY_WINS)

    if replace_0_with_nan:
        # Treat zeros as missing values; note that only the training features are changed
        X_train = X_train.replace({0:np.nan})

    xgb_model = xgb.XGBClassifier(random_state=42, **XGB_params)
    xgb_model.fit(X_train, y_train)

    # get_prediction_score returns an (accuracy, F1) pair for each data set
    test_acc, test_f1 = get_prediction_score(xgb_model,X_test,y_test)
    train_acc, train_f1 = get_prediction_score(xgb_model,X_train,y_train)

    print(f"Test accuracy: {test_acc} (F1: {test_f1}); Training accuracy: {train_acc} (F1: {train_f1})")
    return xgb_model

In [86]:
training_1D_XGB(train_benchmark, False); 
XGB_bench = training_1D_XGB(train_benchmark, True); 
training_1D_XGB(train_full); 
XGB_without_avg = training_1D_XGB(train_Without_avg); # I think that this is our best model 
training_1D_XGB(train_Without_avg_BS_S);
training_1D_XGB(train_Without_avg_BS_S_sum);

ValueError: multilabel-indicator is not supported
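
This `ValueError` is the message scikit-learn's `confusion_matrix` raises when it is given 2-D indicator arrays, which is what `get_prediction_score` passes it: one-hot predictions and the multi-column slice of `train_scores`. One way around it, sketched here with NumPy only and made-up data, is to collapse each one-hot row to a class index with `argmax` before scoring. The helper names `to_labels` and `count_matrix` are hypothetical, and the default `f1_score` call may need the same treatment.

```python
import numpy as np

def to_labels(one_hot):
    """Collapse an (n, k) one-hot array to n class indices."""
    return np.asarray(one_hot).argmax(axis=1)

def count_matrix(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs; rows are true classes, columns are predictions."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(counts, (y_true, y_pred), 1)
    return counts

# Made-up one-hot targets and predictions with three outcome columns.
target = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
predicted = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]])

y_true = to_labels(target)      # 1-D class indices
y_pred = to_labels(predicted)   # 1-D class indices

matrix = count_matrix(y_true, y_pred, n_classes=3)
accuracy = float((y_true == y_pred).mean())
print(matrix)
print(accuracy)
```

The 1-D label arrays `y_true` and `y_pred` are in the form that `confusion_matrix` accepts, so the same conversion could be applied before the heatmap is drawn.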

Okay, so I think the best data set we have is `train_Without_avg_BS_S_sum`, but there is some serious overfitting going on. Note also that because we never predict draws, a 'perfect' score would be about 75% (25% of the results are draws).
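
The 75% ceiling can be checked with a small plain-Python sketch. The 25% draw share is the figure quoted above; the split between home and away wins is made up for illustration.

```python
# Hypothetical outcome labels: 0 = home win, 1 = draw, 2 = away win.
# The 25% draw share comes from the text; the 45/30 home/away counts are made up.
true_labels = [0] * 45 + [1] * 25 + [2] * 30

# An idealised model that gets every non-draw right but never predicts a draw
# (here every draw is guessed as a home win).
predicted = [label if label != 1 else 0 for label in true_labels]

correct = sum(t == p for t, p in zip(true_labels, predicted))
ceiling = correct / len(true_labels)
print(ceiling)  # 0.75
```

Whatever the draws are guessed as, they are always counted wrong, so accuracy cannot exceed one minus the draw share.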

## Modelling all Classes