# Experiment 7: First Naive Classifier
In this experiment we will be using the very simple processing pipeline from [commit #7](https://github.com/BIAPT/eeg-pain-detection/commits/master) to put together all the plumbing necessary to go from raw data to a trained model.

The model will be trained on a single feature: **Average Alpha Power Across All Channels**.
This feature was chosen for analysis convenience. A better feature would have been power in the different hemispheres, but that would have required dealing with missing channels and similar issues.
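
For intuition only, here is a minimal Python sketch of how an "average alpha power across all channels" value could be computed. It is not the computation done in `preprocessing_pipeline.m`: the function name `average_alpha_power`, the 8 to 13 Hz band limits, and the plain periodogram estimate are all assumptions made for illustration.

```python
import numpy as np

def average_alpha_power(eeg, fs, band=(8.0, 13.0)):
    """Mean alpha-band power across channels (hypothetical helper).

    eeg: array of shape (n_channels, n_samples); fs: sampling rate in Hz.
    The 8-13 Hz band is an assumed definition of "alpha".
    """
    eeg = np.asarray(eeg, dtype=float)
    n_samples = eeg.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    # Periodogram estimate of the power at each frequency, per channel
    power = np.abs(np.fft.rfft(eeg, axis=1)) ** 2 / n_samples
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Average within the band for each channel, then across channels
    return power[:, in_band].mean(axis=1).mean()

# Synthetic 2-channel signals: a 10 Hz sine (inside the band) and a 30 Hz sine (outside)
fs = 250
t = np.arange(2 * fs) / fs
alpha_signal = np.vstack([np.sin(2 * np.pi * 10 * t)] * 2)
beta_signal = np.vstack([np.sin(2 * np.pi * 30 * t)] * 2)
alpha_feature = average_alpha_power(alpha_signal, fs)
beta_feature = average_alpha_power(beta_signal, fs)
print(alpha_feature, beta_feature)
```

The 10 Hz signal yields a much larger value than the 30 Hz one, which is the behaviour the feature is meant to capture.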

The code presented below is easy to extend by adding more features to the data.csv file produced by `preprocessing_pipeline.m`.
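
As a rough sketch of what that extension could look like on the Python side, the snippet below selects several feature columns at once. The column `avg_theta_power` is hypothetical and stands in for whatever extra feature the pipeline might export; the small DataFrame is a toy stand-in for data.csv.

```python
import pandas as pd

# Toy stand-in for data.csv; `avg_theta_power` is a hypothetical extra feature
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "avg_alpha_power": [1.6, 2.4, 0.6, 0.5],
    "avg_theta_power": [0.9, 1.1, 0.4, 0.3],
    "is_hot": [0, 1, 0, 1],
})

feature_cols = ["avg_alpha_power", "avg_theta_power"]
# Selecting a list of columns already gives a 2-D array of shape
# (n_samples, n_features), so no reshape is needed
X = df[feature_cols].to_numpy()
y = df["is_hot"].to_numpy()
print(X.shape, y.shape)
```

With this pattern, adding a feature only means appending its column name to `feature_cols`.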

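The classifier described here is the simple [linear support vector machine from sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html). Note that the code below instantiates `SVC(gamma='auto')`, whose default kernel is RBF rather than linear; `LinearSVC()` or `SVC(kernel='linear')` would match this description.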
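
A minimal sketch of fitting the linked `LinearSVC` on a single feature is shown here. The data is made up for illustration, and the `C`, `max_iter` and `random_state` values are arbitrary choices, not settings taken from this experiment.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy one-feature data: low values labelled 0, high values labelled 1
X = np.array([[0.4], [0.6], [0.8], [3.0], [3.5], [4.2]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LinearSVC(C=1.0, max_iter=10000, random_state=0)
clf.fit(X, y)

# Predict the label of one low and one high value
pred = clf.predict(np.array([[0.5], [4.0]]))
print(pred)
```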

The cross-validation scheme is leave-one-subject-out. We'll assess classifier performance using accuracy (not ideal, but it will do for now).
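
One possible way to express leave-one-subject-out cross-validation is scikit-learn's `LeaveOneGroupOut`, using the subject id as the group. The sketch below runs on toy data and is not the loop used in this notebook. It also reports `balanced_accuracy_score` next to plain accuracy, as one option for when the two classes are not equally frequent.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import LinearSVC

# Toy data: 3 subjects, 4 windows each, one feature per window
X = np.array([[0.5], [0.6], [3.0], [3.2],
              [0.4], [0.7], [2.8], [3.5],
              [0.3], [0.8], [3.1], [2.9]])
y = np.array([0, 0, 1, 1] * 3)
groups = np.repeat([1, 2, 3], 4)  # subject id for each row

accuracies, balanced = [], []
# Each split holds out every row belonging to exactly one subject
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = LinearSVC(max_iter=10000, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    accuracies.append(accuracy_score(y[test_idx], y_pred))
    balanced.append(balanced_accuracy_score(y[test_idx], y_pred))

print(np.mean(accuracies), np.mean(balanced))
```

Because the splitter only yields groups that are present in the data, no fold can end up with an empty test set.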

In [23]:
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Setup
input_filename = '/media/yacine/Data/pain_and_eeg/machine_learning_data/data.csv'

# Experimental Variables
is_healthy = 1
clf = SVC(gamma='auto')


df = pd.read_csv(input_filename)
p_ids = df.id.unique()

# Select the right dataset
df = df[df.type == is_healthy]

accuracies = []
for p_id in p_ids:
    mask = df.id.isin([p_id])
    
    df_test = df[mask]
    df_train = df[~mask]
    
    
    # Reshape to a 2-D array: there is only one feature and fit() expects shape (n_samples, n_features)
    X_train = df_train.avg_alpha_power.to_numpy().reshape(-1,1)
    y_train = df_train.is_hot
    X_test = df_test.avg_alpha_power.to_numpy().reshape(-1,1)
    y_test = df_test.is_hot
    
    
    print(X_train)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    
print(np.mean(accuracies))    

[[ 0.62014]
 [ 0.4635 ]
 [ 0.7438 ]
 [ 0.78269]
 [ 0.43052]
 [ 0.75476]
 [ 0.48378]
 [ 0.32518]
 [ 0.76571]
 [ 1.0274 ]
 [ 0.84681]
 [ 0.93191]
 [ 0.72969]
 [ 1.3287 ]
 [ 0.95487]
 [ 0.90529]
 [ 0.40122]
 [ 0.48202]
 [ 0.77752]
 [ 0.63647]
 [ 0.50111]
 [ 0.83107]
 [ 0.65358]
 [ 0.76707]
 [ 0.51502]
 [ 0.74822]
 [ 1.4768 ]
 [ 0.83127]
 [ 1.5101 ]
 [ 0.8711 ]
 [ 0.87362]
 [ 0.86094]
 [ 1.6323 ]
 [ 0.88557]
 [ 1.0781 ]
 [ 1.1071 ]
 [ 1.7147 ]
 [ 1.5133 ]
 [ 1.5373 ]
 [ 0.97453]
 [ 2.5863 ]
 [ 1.377  ]
 [ 1.638  ]
 [ 0.73973]
 [ 2.4614 ]
 [ 1.8635 ]
 [ 1.2192 ]
 [ 2.0332 ]
 [ 4.7213 ]
 [ 2.7798 ]
 [ 3.7454 ]
 [ 6.4041 ]
 [ 4.4027 ]
 [ 3.2286 ]
 [ 2.6088 ]
 [ 3.5881 ]
 [ 3.1305 ]
 [ 6.6794 ]
 [ 4.2866 ]
 [ 2.8286 ]
 [ 1.6893 ]
 [ 2.6021 ]
 [ 2.7875 ]
 [ 2.2076 ]
 [ 2.222  ]
 [ 3.437  ]
 [ 4.1587 ]
 [ 3.2694 ]
 [ 3.0044 ]
 [ 4.6793 ]
 [ 3.6487 ]
 [ 7.5468 ]
 [ 2.0728 ]
 [ 5.2167 ]
 [ 8.0722 ]
 [12.387  ]
 [ 4.0027 ]
 [ 9.4641 ]
 [ 4.5754 ]
 [ 2.1855 ]
 [ 1.8245 ]
 [ 1.8144 ]
 [ 2.2404 ]
 [ 2

ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required.
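
The most likely cause of this error is the order of operations in the cell above: `p_ids` is collected from the full table, and only afterwards is `df` filtered by `type`. Any participant whose rows were filtered out then produces an empty `df_test`, and `clf.predict` is called on zero samples. The sketch below reproduces the situation on a four-row toy table (values borrowed from the `df` display) and shows one possible fix, which is to filter first and collect the ids afterwards. The names `df_healthy`, `p_ids_before`, `p_ids_after` and `empty_test_ids` are introduced here for illustration.

```python
import pandas as pd

# Toy stand-in for data.csv: subject 1 has type 1, subject 64 has type 0
df = pd.DataFrame({
    "id": [1, 1, 64, 64],
    "type": [1, 1, 0, 0],
    "avg_alpha_power": [1.5982, 2.4383, 2.9964, 3.3376],
    "is_hot": [0, 0, 1, 1],
})
is_healthy = 1

# Order used in the cell above: ids are collected before filtering
p_ids_before = df.id.unique()

# Possible fix: filter first, then collect ids from the filtered table
df_healthy = df[df.type == is_healthy]
p_ids_after = df_healthy.id.unique()

# Ids that would give an empty test set with the original ordering
empty_test_ids = [p for p in p_ids_before if df_healthy[df_healthy.id == p].empty]
print(list(p_ids_after), empty_test_ids)
```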

In [8]:
df

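| Unnamed: 0 | id | type | avg_alpha_power | is_hot |
|---|---|---|---|---|
| 0 | 1 | 1 | 1.5982 | 0 |
| 1 | 1 | 1 | 2.4383 | 0 |
| 2 | 1 | 1 | 2.0472 | 0 |
| 3 | 1 | 1 | 2.9419 | 0 |
| 4 | 1 | 1 | 2.9029 | 0 |
| ... | ... | ... | ... | ... |
| 1342 | 64 | 0 | 2.9964 | 1 |
| 1343 | 64 | 0 | 3.3376 | 1 |
| 1344 | 64 | 0 | 6.8673 | 1 |
| 1345 | 64 | 0 | 4.4508 | 1 |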
