### How to generate training/testing data from a Dataset

In [14]:
data = [
    [3, 2, "no"],
    [6, 6, "yes"],
    [4, 1, "no"],
    [4, 4, "no"],
    [1, 2, "yes"],
    [2, 0, "no"],
    [0, 3, "yes"],
    [1, 6, "yes"],
]

1. holdout method
1. random subsampling
1. k fold cross validation
1. bootstrap method

1. holdout method
    * you "hold out" some instances from the dataset for testing
    * train on the remaining instances
    * test on the held out instances
    * e.g. test_size=2 -> hold out 2 instances 
    * **NOTE**: shuffle the dataset first so the test set is not biased by the original ordering *(seed the random number generator beforehand so the split is reproducible)*
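
The holdout steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name `holdout_split` and the default `seed=0` are assumptions made for illustration, and `data` repeats the table from the top of these notes.

```python
import random

def holdout_split(table, test_size=2, seed=0):
    """Shuffle a copy of `table`, then hold out the last `test_size` rows.

    `holdout_split` and `seed` are illustrative names, not from the notes.
    """
    rng = random.Random(seed)   # seeded so the split is reproducible
    shuffled = table[:]         # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    split_index = len(shuffled) - test_size
    return shuffled[:split_index], shuffled[split_index:]

data = [
    [3, 2, "no"], [6, 6, "yes"], [4, 1, "no"], [4, 4, "no"],
    [1, 2, "yes"], [2, 0, "no"], [0, 3, "yes"], [1, 6, "yes"],
]

train, test = holdout_split(data, test_size=2, seed=0)
print(len(train), "training instances,", len(test), "test instances")
# 6 training instances, 2 test instances
```

Shuffling a copy rather than the original list keeps the row indices of `data` stable for any later cells that refer to them.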

2. random subsampling
    * repeat the holdout method `k` times
    * this is a different `k` from kNN
        * so you can make a (kNN classifier with `k = 3`) on a (random subsampled dataset with `k = 10`)
    * the accuracy is the average accuracy over the `k` holdout methods
    * this reduces the bias of any single test set (see **NOTE** above)
    * for each of the `k` repetitions:
        * run the holdout method (with a fresh shuffle)
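
A rough sketch of random subsampling, averaging the accuracy over `k` holdout runs. The classifier here is a stand-in: `majority_label` (a hypothetical helper, not from the notes) always predicts the most common training label, which keeps the example self-contained. Any real classifier, such as kNN, would take its place.

```python
import random
from collections import Counter

def majority_label(train):
    """Placeholder classifier: predict the most common label in `train`."""
    return Counter(row[-1] for row in train).most_common(1)[0][0]

def random_subsample_accuracy(table, k=10, test_size=2, seed=0):
    """Run the holdout method `k` times and average the accuracies."""
    rng = random.Random(seed)           # one seeded generator for all runs
    accuracies = []
    for _ in range(k):
        shuffled = table[:]
        rng.shuffle(shuffled)           # fresh shuffle for every repetition
        train, test = shuffled[:-test_size], shuffled[-test_size:]
        prediction = majority_label(train)
        correct = sum(1 for row in test if row[-1] == prediction)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

data = [
    [3, 2, "no"], [6, 6, "yes"], [4, 1, "no"], [4, 4, "no"],
    [1, 2, "yes"], [2, 0, "no"], [0, 3, "yes"], [1, 6, "yes"],
]

print(random_subsample_accuracy(data, k=10, test_size=2, seed=0))
```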

3. k-fold cross validation
    * more deliberate about how the test sets are generated
    * each instance is in the test set exactly one time
    * create `k` folds (groups)
    * for each fold in folds:
        * hold out the fold for testing
        * train on the remaining folds (folds - fold)
    * accuracy is (total correct) / (total predicted) over **ALL** folds
    * when `k` doesn't divide `N` evenly, the earlier folds end up one instance larger than the later folds
        * like dealing cards in a card game: the leftover instances go, one each, to the first folds
    * types:
        * LOOCV (leave-one-out cross validation)
            * k = N (the number of instances in the dataset)
            * when do you use it?
                * when you have a small dataset and you need all the training data you can get
            * train on N-1, test on 1
        * stratified k-fold cross validation
            * where every fold has roughly the same distribution of class labels as the original set
            * first, group by class
            * for each group, distribute the instances to each fold
                * like a card dealer
                * NOTE: when you move on to the next group, continue dealing at the fold where you left off (do not restart at the first fold)
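
To make the uneven-fold case concrete, here is a small sketch of plain (non-stratified) k-fold index generation. The helper name `kfold_indices` is an assumption for illustration, and the indices are dealt in their original order without shuffling to keep the example easy to follow.

```python
def kfold_indices(n_instances, k):
    """Deal indices 0..n_instances-1 into k folds one at a time, like cards."""
    folds = [[] for _ in range(k)]
    for index in range(n_instances):
        folds[index % k].append(index)
    return folds

# N = 8 and k = 3: 3 does not divide 8, so the first two folds get 3 instances
folds = kfold_indices(8, 3)
print(folds)  # [[0, 3, 6], [1, 4, 7], [2, 5]]

# each fold is the test set exactly once; the other folds form the training set
for test_fold, test_indices in enumerate(folds):
    train_indices = [i for f, fold in enumerate(folds) if f != test_fold for i in fold]
    print("test:", test_indices, "train:", train_indices)

# LOOCV is the special case k = N: every fold holds a single instance
print(kfold_indices(8, 8))
```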
    

4. Bootstrap method
    * like random subsampling (2.) but with replacement (so you can get the same instance more than once)
    * create a training set by sampling `N` instances with replacement
    * ~63.2% of the instances will be sampled (at least once) into your training set
    * ~36.8% will not be sampled, and will end up in your test set
        * this is called the out of bag sample
    * repeat `k` times
    * accuracy is the weighted average accuracy over the `k` samples
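
The percentages come from the chance that a given instance is never picked in `N` draws with replacement: $(1 - 1/N)^N$, which approaches $1/e \approx 0.368$ as `N` grows. A minimal sketch of one bootstrap sample is below. The name `bootstrap_split` and `N = 1000` are assumptions chosen for illustration.

```python
import random

def bootstrap_split(n_instances, seed=0):
    """Sample n_instances indices with replacement; unsampled ones are the test set."""
    rng = random.Random(seed)
    train_indices = [rng.randrange(n_instances) for _ in range(n_instances)]
    sampled = set(train_indices)
    # the out of bag sample: every index that was never drawn
    test_indices = [i for i in range(n_instances) if i not in sampled]
    return train_indices, test_indices

train_indices, test_indices = bootstrap_split(1000, seed=0)
print("fraction sampled:    ", len(set(train_indices)) / 1000)  # close to 0.632
print("fraction out of bag: ", len(test_indices) / 1000)        # close to 0.368
```

On a dataset as small as the 8-row example the fractions vary a lot from sample to sample, which is one reason the method is repeated `k` times.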

In [15]:
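# first we group the row indices of `data` by class label
yes = [1, 4, 6, 7]
no = [0, 2, 3, 5]

# now we create the folds
# we want to split the data into 4 folds
k = 4
folds = [[] for _ in range(k)]

# deal the indices out like cards, continuing where the previous group left off
next_fold = 0
for group in [yes, no]:
    for index in group:
        folds[next_fold % k].append(index)
        next_fold += 1

print(folds)

# each fold takes one turn as the test set; the remaining folds are the training set
# (compare by position, not by value, so two folds with equal contents stay distinct)
for test_index, X_test in enumerate(folds):
    X_train = [fold for i, fold in enumerate(folds) if i != test_index]
    print("the training used against ", X_test, "will be ", X_train)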

[[1, 0], [4, 2], [6, 3], [7, 5]]
the training used against  [1, 0] will be  [[4, 2], [6, 3], [7, 5]]
the training used against  [4, 2] will be  [[1, 0], [6, 3], [7, 5]]
the training used against  [6, 3] will be  [[1, 0], [4, 2], [7, 5]]
the training used against  [7, 5] will be  [[1, 0], [4, 2], [6, 3]]


## Evaluating Classifier Performance
* binary classification 
    * 2 class labels
    * e.g. pos/neg, good/bad, yes/no, etc.
* multi-class classification
    * 3 or more class labels
    * pos/neg/neut, yes/no/maybe, etc. 
    

* $P$ = the # of positive instances in our test set  
* $N$ = the # of negative instances in our test set  
* $TP$ (True Positives) = the # of positive instances in our test set that were correctly classified as positive  
* $TN$ (True Negatives) = the # of negative instances in our test set that were correctly classified as negative  
    * combined, these are our "successful" predictions ($TP$ + $TN$)
* $FP$ (False Positives) = the # of negative instances in our test set that were incorrectly classified as positive
* $FN$ (False Negatives) = the # of positive instances in our test set that were incorrectly classified as negative
    * combined, these are our "failed" predictions ($FP$ + $FN$)
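
As a quick illustration of these definitions, the sketch below tallies the four counts from a pair of label lists. The function name `confusion_counts` and the `y_true` / `y_pred` values are made up for the example, with "yes" treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive="yes"):
    """Return (TP, TN, FP, FN) for a binary problem with the given positive label."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1     # positive instance, classified as positive
            else:
                fn += 1     # positive instance, classified as negative
        else:
            if predicted == positive:
                fp += 1     # negative instance, classified as positive
            else:
                tn += 1     # negative instance, classified as negative
    return tp, tn, fp, fn

# hypothetical labels, for illustration only
y_true = ["yes", "yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "yes", "no"]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)                      # 2 2 1 1
print("P =", tp + fn, " N =", tn + fp)     # P = 3  N = 3
print("accuracy =", (tp + tn) / len(y_true))
```

Note that $P = TP + FN$ and $N = TN + FP$, so the four counts always add up to the size of the test set.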
    

### Generalized confusion matrix for binary classification
![](https://www.dataschool.io/content/images/2015/01/confusion_matrix2.png)