# Classifier Evaluation

## Task 1: Hold-Out

Implement the Hold-Out method in Python. Your function should accept a list of arbitrary elements and a parameter `training_set_ratio` (e.g. 0.8).
It should return a training set and a testing set whose sizes follow `training_set_ratio`.

Note: In practice, we would use an existing implementation such as scikit-learn's `train_test_split` (see below).

In [130]:
import numpy as np
from sklearn.model_selection import train_test_split

In [131]:
x = list("Christmas!!")
train_test_split(x, train_size=.8)

[['s', 'm', 'h', 'C', 'r', 'i', '!', 'a'], ['s', '!', 't']]

In [132]:
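def holdOut(data, training_set_ratio=0.8):
    """Return shuffled index arrays (train, test) for a hold-out split of `data`."""
    assert 0.0 <= training_set_ratio <= 1.0, "training_set_ratio not in [0, 1]"

    n = len(data)
    # Shuffle the positions 0..n-1 rather than the data itself.
    index_array = np.arange(n, dtype=np.int32)
    np.random.shuffle(index_array)
    # int() truncates, so the training set gets floor(training_set_ratio * n) indices.
    s_index = int(training_set_ratio * n)

    return [index_array[:s_index], index_array[s_index:]]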

In [135]:
train_indices, test_indices = holdOut(x, 0.8)
train_indices, test_indices

(array([ 7,  8, 10,  6,  9,  4,  2,  0], dtype=int32),
 array([3, 5, 1], dtype=int32))
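
The `holdOut` function above returns index arrays, whereas the task statement asks for the training and testing sets themselves. The following is a minimal sketch of a variant that returns the elements directly. The name `hold_out_split` and the `seed` parameter are hypothetical additions for illustration, and the sketch uses only the standard library instead of NumPy.

```python
import random


def hold_out_split(data, training_set_ratio=0.8, seed=None):
    """Return (train, test) lists of elements instead of index arrays."""
    if not 0.0 <= training_set_ratio <= 1.0:
        raise ValueError("training_set_ratio must lie in [0, 1]")

    # A local generator keeps the split reproducible without touching global state.
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)

    # int() truncates, matching the behaviour of holdOut above.
    split = int(training_set_ratio * len(data))
    train = [data[i] for i in indices[:split]]
    test = [data[i] for i in indices[split:]]
    return train, test


train, test = hold_out_split(list("Christmas!!"), 0.8, seed=0)
print(len(train), len(test))  # 8 3
```

Passing a fixed `seed` makes the split repeatable, which is convenient when comparing classifiers on the same partition.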

## Task 2: $k$-fold Cross-Validation

The example below shows how to use $k$-fold CV in scikit-learn.

In [137]:
x = np.array(list("ABCDEFGHIJK"))
x

array(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K'], dtype='<U1')

In [138]:
from sklearn.model_selection import KFold

splits = KFold(3, shuffle=False)

for train_index, test_index in splits.split(x):
    print(train_index, test_index)
    print(x[train_index], x[test_index])
    print()

[ 4  5  6  7  8  9 10] [0 1 2 3]
['E' 'F' 'G' 'H' 'I' 'J' 'K'] ['A' 'B' 'C' 'D']

[ 0  1  2  3  8  9 10] [4 5 6 7]
['A' 'B' 'C' 'D' 'I' 'J' 'K'] ['E' 'F' 'G' 'H']

[0 1 2 3 4 5 6 7] [ 8  9 10]
['A' 'B' 'C' 'D' 'E' 'F' 'G' 'H'] ['I' 'J' 'K']



Now, implement the $k$-fold splitting for cross-validation in Python. 

Note: This is more of a Python programming exercise than a machine-learning one.

In [139]:
def cv(data, k):
    """Print the train/test index split for each of k contiguous folds."""
    n = len(data)
    index_arr = np.arange(n)

    # Every fold gets n // k elements; the last fold also absorbs the remainder.
    fold_size = n // k
    remaining_elements_for_last_fold = n % k

    print(f"N {n} FoldSize {fold_size} RemainingElements?: {remaining_elements_for_last_fold}")

    for i in range(k):
        fold_strt_i = i * fold_size
        fold_end_i = (i + 1) * fold_size

        # The last fold runs to the end of the data, so it absorbs the left-over elements.
        if i == k - 1:
            fold_end_i = n

        print(f"Fold {i} StrtIndex {fold_strt_i} EndIndex {fold_end_i}")

        # The current fold is the test set; everything before and after it is the training set.
        test_indices = index_arr[fold_strt_i:fold_end_i]
        train_indices = np.concatenate([index_arr[:fold_strt_i], index_arr[fold_end_i:]])

        print(train_indices, test_indices)

cv(x, 5)

N 11 FoldSize 2 RemainingElements?: 1
Fold 0 StrtIndex 0 EndIndex 2
[ 2  3  4  5  6  7  8  9 10] [0 1]
Fold 1 StrtIndex 2 EndIndex 4
[ 0  1  4  5  6  7  8  9 10] [2 3]
Fold 2 StrtIndex 4 EndIndex 6
[ 0  1  2  3  6  7  8  9 10] [4 5]
Fold 3 StrtIndex 6 EndIndex 8
[ 0  1  2  3  4  5  8  9 10] [6 7]
Fold 4 StrtIndex 8 EndIndex 11
[0 1 2 3 4 5 6 7] [ 8  9 10]
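
The `cv` function above puts all left-over elements into the last fold, so its folds can differ from the `KFold` output shown earlier, where the first `n % k` folds each receive one extra element (sizes 4, 4, 3 for 11 elements and 3 folds). The following is a minimal sketch of a generator that distributes the remainder in that way. The name `k_fold_indices` is hypothetical, and the sketch uses plain Python lists rather than NumPy arrays.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")

    # base is the minimum fold size; the first `extra` folds get one more element.
    base, extra = divmod(n, k)
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        # Everything outside the current fold forms the training set.
        train = list(range(0, start)) + list(range(stop, n))
        yield train, test
        start = stop


for train, test in k_fold_indices(11, 3):
    print(train, test)
```

Yielding the index pairs instead of printing them lets the caller iterate over the folds in the same way as with `splits.split(x)`.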
