# Yacine Mahdid May 18 2020
This notebook directly addresses [issue 28](https://github.com/BIAPT/eeg-pain-detection/issues/28).
We need to develop the permutation test module to check whether the classifier performs better than chance.

To do so I will:
- [X] take out the parts from `ex_14` that are directly relevant to the classification and put them here
- [X] refactor them so that they are easier to work with
- [X] build the permutation module
- [X] apply it to our current classification and check the result

To read:
- [X] [Pipeline article](https://towardsdatascience.com/a-simple-example-of-pipeline-in-machine-learning-with-scikit-learn-e726ffbb6976)

In [2]:
import numpy as np
import pandas as pd

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

from machine_learning_tools.classification import classify_loso
from machine_learning_tools.pre_processing import pre_process
from machine_learning_tools.classification import permutation_test

input_filename = '/home/yacine/Documents/BIAPT/data_window_10.csv'

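# Pipeline: mean imputation -> standardization -> SVM, fitted as a single estimator
pipe = Pipeline([
    ('imputer', SimpleImputer(missing_values=np.nan, strategy='mean')),  # replace NaN features with the column mean
    ('scaler', StandardScaler()),  # scale each feature to zero mean, unit variance
    ('SVM', SVC())])  # SVM classifier (default RBF kernel)

X,y,group = pre_process(input_filename, "MSK")  # features, labels and group labels
accuracies = classify_loso(X, y, group, pipe)  # one accuracy per left-out group
np.mean(accuracies)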

0.5509481738477935
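The internals of `classify_loso` are not shown in this notebook. As a rough illustration of what a leave-one-subject-out evaluation of `pipe` can look like in plain sklearn, here is a minimal sketch that uses `LeaveOneGroupOut` with `cross_val_score` as a stand-in. It assumes `group` holds one participant id per row. The data is synthetic (`make_classification`), and the six participants are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in data: 120 rows, 19 features, 6 participants with 20 rows each.
X, y = make_classification(n_samples=120, n_features=19, random_state=0)
group = np.repeat(np.arange(6), 20)  # assumed: one participant id per row

pipe = Pipeline([
    ('imputer', SimpleImputer(missing_values=np.nan, strategy='mean')),
    ('scaler', StandardScaler()),
    ('SVM', SVC())])

# One fold per participant: train on the others, test on the held-out one.
# cross_val_score refits a clone of the whole pipeline in every fold, so the
# imputer means and scaler statistics come from the training participants only.
accuracies = cross_val_score(pipe, X, y, groups=group,
                             cv=LeaveOneGroupOut(), scoring='accuracy')
print(len(accuracies), np.mean(accuracies))
```

In this sketch `np.mean(accuracies)` is an unweighted mean over folds, so each participant counts equally regardless of how many rows they contribute.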

The analysis has been refactored to make deeper use of sklearn concepts such as `Pipeline`. It flows as follows:
1. preprocessing of the data using `pre_process`
2. leave-one-subject-out (LOSO) classification with `classify_loso`
3. permutation test to validate the classifier with `permutation_test`
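The source of `permutation_test` is not shown here. One plausible equivalent, assumed rather than confirmed, is sklearn's `permutation_test_score` combined with `LeaveOneGroupOut`. The sketch below shows that call. It assumes `group` identifies participants and uses synthetic data. `n_permutations=50` is only there to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import LeaveOneGroupOut, permutation_test_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in data: 120 rows, 19 features, 6 participants.
X, y = make_classification(n_samples=120, n_features=19, random_state=0)
group = np.repeat(np.arange(6), 20)

pipe = Pipeline([
    ('imputer', SimpleImputer(missing_values=np.nan, strategy='mean')),
    ('scaler', StandardScaler()),
    ('SVM', SVC())])

# score: LOSO accuracy with the true labels.
# permutation_scores: LOSO accuracy after each shuffle of y (shuffled within each group).
# p_value: (count of permuted scores >= score, plus 1) / (n_permutations + 1).
score, permutation_scores, p_value = permutation_test_score(
    pipe, X, y, groups=group, cv=LeaveOneGroupOut(),
    n_permutations=50, scoring='accuracy', random_state=0)
print(score, np.mean(permutation_scores), p_value)
```

Passing `groups` serves two purposes here. It drives the leave-one-group-out split, and it restricts label shuffling to within each group. As a consequence, if every row of a participant shared a single label, the within-group shuffles would leave `y` unchanged.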

In [3]:
(accuracy, permutation_scores, p_value) = permutation_test(X, y, group, pipe, num_permutation=1000)

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:    3.1s
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    3.1s
[Parallel(n_jobs=-1)]: Done   3 tasks      | elapsed:    3.1s
[Parallel(n_jobs=-1)]: Done   4 tasks      | elapsed:    3.2s
[Parallel(n_jobs=-1)]: Done   5 tasks      | elapsed:    3.2s

KeyboardInterrupt: 

In [45]:
print("Accuracy: ", accuracy)
print("Permutation Score: ", np.mean(permutation_scores))
print("P value: ", p_value)

Accuracy:  0.5509481738477935
Permutation Score:  0.4916073266806892
P value:  0.000999000999000999
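The reported p-value equals 1/1001. That is consistent with the empirical p-value formula (C + 1) / (n_permutations + 1), where C counts permuted scores at least as good as the true score. Under that formula, 1/1001 is the smallest value attainable with 1000 permutations, meaning no permutation reached the observed accuracy. The helper below is hypothetical (not part of `machine_learning_tools`) and only illustrates the formula.

```python
import numpy as np

def permutation_p_value(score, permutation_scores):
    """Empirical p-value with the +1 correction (the true labelling counts as one permutation)."""
    permutation_scores = np.asarray(permutation_scores)
    n_at_least_as_good = np.sum(permutation_scores >= score)
    return (n_at_least_as_good + 1) / (len(permutation_scores) + 1)

# If none of 1000 permuted accuracies reaches the observed one,
# the p-value sits at its floor of 1 / 1001.
print(permutation_p_value(0.55, np.full(1000, 0.49)))  # 0.000999000999000999
```

Because of this floor, resolving p-values smaller than about 0.001 would require more permutations.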
