In [0]:
%matplotlib inline
import pandas as pd


<img src="https://raw.githubusercontent.com/sdgroeve/Machine_Learning_course_UGent_D012554_kaggle/master/header.png" alt="drawing"/>


A multi-channel electroencephalography (EEG) system enables a broad range of applications, including neurotherapy, biofeedback, and brain-computer interfacing. The dataset you will analyse was recorded with the [Emotiv EPOC+](https://www.emotiv.com/product/emotiv-epoc-14-channel-mobile-eeg).

It has 14 EEG channels with names based on the International 10-20 locations: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4:

<br/>
<br/>
<center>
<img src="https://raw.githubusercontent.com/sdgroeve/Machine_Learning_course_UGent_D012554_kaggle/master/EEG.png" alt="drawing" width="200"/>
</center>
<br/>
<br/>


All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. 

The experiment was conducted on one person only. The duration of the measurement was around 117 seconds.

From the paper:

> *The experiment was carried out in a quiet room. During
the experiment, the proband was being videotaped. To prevent
artifacts, the proband was not aware of the exact start time
of the measurement. Instead, he was told to sit relaxed, look
straight to the camera, and change the eye state at free will.
Only additional constraint was that, accumulated over the
entire session, the duration of both eye states should be about
the same and that the individual intervals should vary greatly
in length (from eye blinking to longer stretches)...*

The eye state was detected via a camera during the EEG measurement and later added manually to the file after analyzing the video frames. 

A label of '1' indicates the eye-closed state and '0' the eye-open state.

(*Source: Oliver Roesler, Stuttgart, Germany*)

Let's load the training set, the test set, and the sample submission file:

In [0]:
trainset = pd.read_csv("https://raw.githubusercontent.com/sdgroeve/Machine_Learning_course_UGent_D012554_kaggle/master/eeg_train.csv")

testset = pd.read_csv("https://raw.githubusercontent.com/sdgroeve/Machine_Learning_course_UGent_D012554_kaggle/master/eeg_test.csv")

sample_submission = pd.read_csv("https://raw.githubusercontent.com/sdgroeve/Machine_Learning_course_UGent_D012554_kaggle/master/sample_submission.csv")


You will fit a model on the trainset and make predictions on the testset. 

To submit these predictions to Kaggle you need to write a .csv file with two columns: 
- `index` that matches the `index` column in the test set.
- `label` which is your prediction.

Here is an example predictions file for Kaggle:

In [76]:
sample_submission.head(10)

Unnamed: 0,index,label
0,0,0.168801
1,1,0.124169
2,2,0.947757
3,3,0.069585
4,4,0.635325
5,5,0.659027
6,6,0.653697
7,7,0.85003
8,8,0.160489
9,9,0.843272


Make sure to save your results without the extra Pandas index column that is written by default, i.e. pass `index=False` to `to_csv`.
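To make the effect of the default index column concrete, here is a minimal sketch. The tiny `demo` frame and its values are made up for illustration and are not taken from the EEG data; calling `to_csv()` without a path simply returns the CSV text so the two variants can be compared.

```python
import pandas as pd

# Made-up predictions, for illustration only (not from the EEG data)
demo = pd.DataFrame({"index": [0, 1, 2], "label": [0.2, 0.7, 0.5]})

# Default: pandas prepends its own row index as an unnamed first column
with_index = demo.to_csv()

# index=False: only the `index` and `label` columns are written
without_index = demo.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty field
print(without_index.splitlines()[0])  # header contains only the two columns
```

The first header has three fields, which does not match the two-column format described above; the second one does.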

In [0]:
# Separate the target from the features; pop() removes the 'label' column from trainset in place
train_label = trainset.pop('label')

In [78]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import numpy as np

# Baseline: logistic regression scored with 10-fold cross-validation
model = LogisticRegression(C=1, max_iter=10000)
scores = cross_val_score(model, trainset, train_label, cv=10)

print(np.mean(scores))


0.631
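As a rough illustration of what `cross_val_score(..., cv=10)` computes for a classifier, the sketch below repeats the procedure by hand. It assumes synthetic stand-in data from `make_classification` (not the EEG set) and relies on scikit-learn's documented behaviour that an integer `cv` with a classifier uses unshuffled stratified folds and the estimator's default `score` (accuracy). The names `manual_scores` and `library_scores` are only for this example.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption: NOT the EEG set); 14 features, like the 14 channels
X, y = make_classification(n_samples=500, n_features=14, random_state=0)

model = LogisticRegression(C=1, max_iter=10000)

# Manual version: fit a fresh copy on 9 folds, score accuracy on the held-out fold
manual_scores = []
for train_idx, val_idx in StratifiedKFold(n_splits=10).split(X, y):
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_model.score(X[val_idx], y[val_idx]))

# Library version, as used in the cell above
library_scores = cross_val_score(model, X, y, cv=10)

print(np.mean(manual_scores), np.mean(library_scores))
```

Both means should agree, since the two versions use the same folds and the same metric.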


In [79]:
from sklearn.model_selection import GridSearchCV

search_space = [0.001, 0.01, 0.1, 1, 10, 100]

###Start code here
# The iteration limit has to be raised above the default
model = LogisticRegression(max_iter=10000)

parameters = dict(C=search_space)

grid_search = GridSearchCV(model, param_grid=parameters)
grid_search.fit(trainset, train_label)

# Mean score, two standard deviations, and parameter setting for each candidate
results = grid_search.cv_results_
for mean_score, std, params in zip(results['mean_test_score'], results['std_test_score'], results['params']):
    print(mean_score, std * 2, params)
###End code here

print(grid_search.best_estimator_)

0.627 0.02745906043549193 {'C': 0.001}
0.6275000000000001 0.03301514803843832 {'C': 0.01}
0.6285000000000001 0.035580893749314356 {'C': 0.1}
0.627 0.035128336140500566 {'C': 1}
0.6275000000000001 0.0339116499156263 {'C': 10}
0.6275000000000001 0.03376388603226827 {'C': 100}
LogisticRegression(C=0.1, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
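Besides `best_estimator_`, a fitted `GridSearchCV` exposes the winning setting and its score directly. The following is a small sketch on synthetic stand-in data (an assumption, not the EEG set); the variable name `search` is only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: NOT the EEG set)
X, y = make_classification(n_samples=300, n_features=14, random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=10000),
    param_grid={"C": [0.001, 0.01, 0.1, 1, 10, 100]},
)
search.fit(X, y)

print(search.best_params_)  # the C value with the highest mean CV score
print(search.best_score_)   # that mean CV score

# refit=True is the default, so the best model is already retrained on all of X, y
proba = search.predict_proba(X[:5])
```

Note that `GridSearchCV` uses 5-fold cross-validation by default in recent scikit-learn versions, so its scores are not directly comparable with a 10-fold `cross_val_score` run unless `cv=10` is passed explicitly.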


In [89]:
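# Keep the test-set index for the submission file; pop() also removes it from the features
index = testset.pop('index')

model = LogisticRegression(C=.1)
model.fit(trainset, train_label)

# Column 1 holds the predicted probability of label 1 (eyes closed)
predictions = model.predict_proba(testset)
predictions[:, 1]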


array([0.2266744 , 0.36980789, 0.3123071 , ..., 0.47596087, 0.62913298,
       0.43349393])
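The slice `[:, 1]` works because `predict_proba` returns one column per class, ordered as in `classes_`. Here is a short sketch on synthetic stand-in data (an assumption, not the EEG set) that looks the column up explicitly instead of hard-coding it; `clf`, `pos_col`, and `p_closed` are names made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with labels 0 and 1 (assumption: NOT the EEG set)
X, y = make_classification(n_samples=200, n_features=14, random_state=0)
clf = LogisticRegression(C=0.1, max_iter=10000).fit(X, y)

proba = clf.predict_proba(X[:5])

# classes_ gives the column order of predict_proba
print(clf.classes_)

# Find the column that belongs to label 1 rather than assuming its position
pos_col = list(clf.classes_).index(1)
p_closed = proba[:, pos_col]
print(p_closed)
```

Each row of `proba` sums to 1, so in the binary case the other column is simply `1 - p_closed`.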

In [91]:
my_prediction_results = pd.DataFrame({'index': index , 'label': predictions[:,1]})
my_prediction_results

Unnamed: 0,index,label
0,0,0.226674
1,1,0.369808
2,2,0.312307
3,3,0.396771
4,4,0.808697
...,...,...
12887,12887,0.440925
12888,12888,0.554251
12889,12889,0.475961
12890,12890,0.629133


In [0]:
filename = "my_prediction_results.csv"

# Do not write the Pandas index column (index=False)
my_prediction_results.to_csv(filename, index=False)