# Evaluation
“Everybody wants to be a bodybuilder, but nobody wants to lift no heavy-ass weights.”
― Ronnie Coleman

We often hear a lot about neural networks, deep learning, convolutional networks and deep cross vectored reinforced Bayes systems. Well, the last one was a joke, but nevertheless it sounds pretty 'juicy'. A lot of the time people want to dive head first into these sexy machine learning techniques but neglect the grunt work that comes with them. Evaluation is a crucial, if not paramount, step in machine learning; along with preprocessing the data, it takes up the bulk of a developer's effort to get right. Building a model is all well and good, but if you cannot assess its performance in a meaningful and demonstrable way then it might as well be left behind. Throughout this series, we will take ourselves on a journey of assessing a model under various facets of evaluation and critically analyse these different methods.







# Which instances to test? Which instances to train? Both train and test?

Let's say we have a model and mountains of data to train it on. One's first thought would be to simply train the model on the whole dataset and then test it on that same dataset. Although this sounds great on paper, it is deeply flawed and is the mortal sin of machine learning. If there's one thing to take away, it is DO NOT TEST ON TRAINING DATA! Ever. Imagine taking a subject where you learn the answers to a test before sitting it: would your knowledge be fairly assessed? No, definitely not. Therefore we must think of training and evaluation as involving two separate datasets (sampled from the same population, more on this later). This leads us to our first evaluation strategy: Holdout.
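To make the point concrete, here is a minimal sketch that is not part of the original notebook. It assumes a hypothetical `Memoriser` model that simply stores every training instance alongside its label, and a made-up toy dataset whose labels are coin flips, so there is nothing real to learn. Tested on its own training data the model looks perfect; tested on unseen instances it should do no better than guessing.

```python
import random
from collections import Counter

random.seed(0)

# Toy dataset (illustrative only): random features with coin-flip labels,
# so there is no genuine pattern for any model to learn.
data = [((random.random(), random.random()), random.choice("AB")) for _ in range(1000)]
train, test = data[:800], data[800:]


class Memoriser:
    """Hypothetical model that memorises its training instances."""

    def __init__(self, instances):
        self.lookup = {features: label for features, label in instances}
        # fall back to the most common training label for unseen instances
        self.default = Counter(label for _, label in instances).most_common(1)[0][0]

    def predict(self, features):
        return self.lookup.get(features, self.default)


def accuracy(model, instances):
    correct = sum(model.predict(features) == label for features, label in instances)
    return correct / len(instances)


model = Memoriser(train)
train_accuracy = accuracy(model, train)
test_accuracy = accuracy(model, test)
print(f"accuracy on training data: {train_accuracy:.2f}")  # looks perfect
print(f"accuracy on held-out data: {test_accuracy:.2f}")   # close to a coin flip
```

The training-set score says nothing about how the model handles instances it has never seen, which is exactly what we want evaluation to tell us.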

## Split Strategy 1: Holdout

What we do here is randomly sample x% of the dataset without replacement and use this as the training set, with the remaining instances used as the test set. The test set is thereby 'held out' from training.

#### Let's look at a quick implementation of this:

What we want to do is take a random selection of x% of the instances for training and the remaining (100 - x)% for testing:

```python
def get_train_test_split(df, split):
    # split is a (train, test) pair given as fractions, e.g. (0.9, 0.1)
    train_percent, test_percent = split[0], split[1]

    # randomise our dataset and reset the indexes
    df_shuffled = df.sample(frac=1).reset_index(drop=True)

    # find the index we want to slice at: the first train_percent of the
    # rows go to training (multiply, not divide, by the fraction)
    train_portion_index = int(len(df_shuffled.index) * train_percent)

    # slice the instances according to the split ratio;
    # everything after the training rows is held out for testing
    train_portion = df_shuffled[:train_portion_index].reset_index(drop=True)
    test_portion = df_shuffled[train_portion_index:].reset_index(drop=True)

    return train_portion, test_portion
```


#### We're done now right?
No. Simple holdout still leaves us with the burden of choosing the 'right' split ratio. A large amount of training data lets the model see more instances and thus become better trained, but it leaves only a small amount of test data, which doesn't really let us test the generalisability of our model. In other words, the accuracy measured on such a split is an unreliable evaluation of our model. Going the other way, a large test set gives a more reliable estimate, but of a model that was trained on less data.
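One rough way to see the cost of a small test set: if we assume the test instances are independent and the model has some true accuracy p, then the accuracy measured on n test instances has a standard error of sqrt(p(1 - p) / n). The sketch below applies that formula to a few split ratios. The dataset size and the true accuracy are made-up values for illustration, and `accuracy_standard_error` is a hypothetical helper, not something from the original notebook.

```python
import math


def accuracy_standard_error(true_accuracy, n_test):
    """Standard error of an accuracy estimate from n_test independent test instances."""
    return math.sqrt(true_accuracy * (1 - true_accuracy) / n_test)


dataset_size = 1000    # assumed, for illustration only
true_accuracy = 0.85   # assumed, for illustration only

for train_fraction in (0.5, 0.8, 0.9, 0.99):
    n_test = dataset_size - int(dataset_size * train_fraction)
    se = accuracy_standard_error(true_accuracy, n_test)
    print(f"train={train_fraction:.0%}  test instances={n_test:4d}  standard error={se:.3f}")
```

The standard error grows as the test set shrinks, so the more data we hand to training, the noisier our single accuracy number becomes.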



## Building on holdout: Repeated Sub-sampling

To get an evaluation metric with considerably less variance than a single holdout run, we can run holdout multiple times (keeping the split ratio constant, but drawing a fresh random split each time) and average the accuracy metric over all runs.
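A minimal sketch of repeated sub-sampling follows. To stay self-contained it uses plain Python lists rather than a pandas DataFrame, and every name in it (`holdout_split`, `repeated_subsampling`, `majority_class_accuracy`) is hypothetical. The stand-in "model" just predicts the most common training label, and the toy data is invented for illustration; in practice you would pass in a function that trains and scores your own classifier.

```python
import random
from statistics import mean, stdev


def holdout_split(instances, train_fraction, rng):
    """Shuffle a copy of the instances and split off the first train_fraction for training."""
    shuffled = instances[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def repeated_subsampling(instances, train_fraction, n_runs, train_and_score, seed=0):
    """Run holdout n_runs times with a fixed split ratio and return the per-run scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        train, test = holdout_split(instances, train_fraction, rng)
        scores.append(train_and_score(train, test))
    return scores


def majority_class_accuracy(train, test):
    """Hypothetical stand-in model: always predict the most common training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return sum(label == majority for _, label in test) / len(test)


# Toy data for illustration: 70% of instances are labelled "A", 30% "B".
data = [(i, "A" if i % 10 < 7 else "B") for i in range(200)]

scores = repeated_subsampling(data, train_fraction=0.9, n_runs=50,
                              train_and_score=majority_class_accuracy)
print(f"mean accuracy over {len(scores)} runs: {mean(scores):.3f}")
print(f"spread between single runs (std dev): {stdev(scores):.3f}")
```

The spread between individual runs shows how much a single holdout result can wobble; the mean over all runs is the steadier figure to report.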


## Cross Validation

```python
import pandas as pd
from collections import defaultdict as dd
import random
random.seed(3000)
import numpy as np
from NaiveBayes import *
```

```python
def preprocess(file, normal=True):
    if normal:
        df = pd.read_csv("./2018S1-proj1_data/" + file + ".csv", header=None)
        # the last column holds the class label
        unnamed = df.columns[-1]
        df.rename(columns={unnamed: 'class'}, inplace=True)
    return df
```

```python
car_dataset = preprocess("car")
nb = NaiveBayes(car_dataset)
```

```python
def get_train_test_split(df, split):

    train_percent = split[0]
    test_percent = split[1]

    print(train_percent, test_percent)

    df_shuffled = df.sample(frac=1).reset_index(drop=True)

    train_portion_index = int(len(df_shuffled.index) * train_percent)

    test_portion_index = train_portion_index + 1

    print(train_portion_index, test_portion_index)

    train_portion = df_shuffled[:train_portion_index].reset_index(drop=True)

    test_portion = df_shuffled[test_portion_index - 1:].reset_index(drop=True)

    return train_portion, test_portion
```

```python
nb_holdout_90_10 = NaiveBayes(train)
results = nb_holdout_90_10.predict(test)
```

```
TypeError: predict() takes 1 positional argument but 2 were given
```