In [0]:
%matplotlib inline
import pandas as pd


<img src="https://raw.githubusercontent.com/sdgroeve/Machine_Learning_course_UGent_D012554_kaggle/master/header.png" alt="drawing"/>


A multi-channel electroencephalography (EEG) system enables a broad range of applications, including neurotherapy, biofeedback, and brain-computer interfacing. The dataset you will analyse was recorded with the [Emotiv EPOC+](https://www.emotiv.com/product/emotiv-epoc-14-channel-mobile-eeg).

It has 14 EEG channels with names based on the International 10-20 locations: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4:

<br/>
<br/>
<center>
<img src="https://raw.githubusercontent.com/sdgroeve/Machine_Learning_course_UGent_D012554_kaggle/master/EEG.png" alt="drawing" width="200"/>
</center>
<br/>
<br/>
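
The channel names above can double as the feature column names when working with the data. The snippet below is a minimal sketch of that idea: `CHANNELS`, `toy`, `X`, and `y` are illustrative names, the two rows are made-up values rather than real EEG samples, and it assumes the data files use exactly these 14 channel names plus a `label` column.

```python
import pandas as pd

# The 14 channel names listed above, in the same order.
CHANNELS = ["AF3", "F7", "F3", "FC5", "T7", "P7", "O1", "O2",
            "P8", "T8", "FC6", "F4", "F8", "AF4"]

# Hypothetical two-row frame standing in for the real data (values are made up).
toy = pd.DataFrame([[4300.0] * 14 + [1], [4310.0] * 14 + [0]],
                   columns=CHANNELS + ["label"])

X = toy[CHANNELS]  # feature matrix: one column per EEG channel
y = toy["label"]   # target: eye state
print(X.shape)
```

Selecting the features through an explicit list keeps helper columns such as `label` or `index` out of the model input.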


All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. 

The experiment was conducted on one person only. The duration of the measurement was around 117 seconds.

From the paper:

> *The experiment was carried out in a quiet room. During
the experiment, the proband was being videotaped. To prevent
artifacts, the proband was not aware of the exact start time
of the measurement. Instead, he was told to sit relaxed, look
straight to the camera, and change the eye state at free will.
Only additional constraint was that, accumulated over the
entire session, the duration of both eye states should be about
the same and that the individual intervals should vary greatly
in length (from eye blinking to longer stretches)...*

The eye state was detected via a camera during the EEG measurement and later added manually to the file after analyzing the video frames. 

A label '1' indicates the eye-closed and '0' the eye-open state.

(*Source: Oliver Roesler, Stuttgart, Germany*)
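
Since both eye states were meant to last about equally long, it can be worth checking how balanced the labels are. The following is a small sketch on a made-up `labels` series (not the real data); on the actual training data the same call would be applied to its `label` column.

```python
import pandas as pd

# Hypothetical labels standing in for the real `label` column (1 = closed, 0 = open).
labels = pd.Series([1, 0, 0, 0, 1, 1, 0, 1], name="label")

# Fraction of samples per eye state, with readable names for the two classes.
balance = labels.value_counts(normalize=True).sort_index()
balance.index = balance.index.map({0: "open", 1: "closed"})
print(balance)
```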

Let's load the train and test set:

In [0]:
trainset = pd.read_csv("https://raw.githubusercontent.com/sdgroeve/Machine_Learning_course_UGent_D012554_kaggle/master/eeg_train.csv")

testset = pd.read_csv("https://raw.githubusercontent.com/sdgroeve/Machine_Learning_course_UGent_D012554_kaggle/master/eeg_test.csv")

sample_submission = pd.read_csv("https://raw.githubusercontent.com/sdgroeve/Machine_Learning_course_UGent_D012554_kaggle/master/sample_submission.csv")


You will fit a model on the trainset and make predictions on the testset. 

To submit these predictions to Kaggle you need to write a .csv file with two columns: 
- `index` that matches the `index` column in the test set.
- `label` which is your prediction.
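
The workflow described above could look roughly like the sketch below. It is only an illustration under several assumptions: the data is synthetic (random numbers, not EEG recordings), only three of the 14 channels are used, logistic regression from scikit-learn is an arbitrary baseline choice, and predicted probabilities are written to `label` because the sample file in this notebook contains values between 0 and 1. Check the competition rules for the expected prediction format.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
channels = ["AF3", "F7", "F3"]  # shortened, hypothetical subset of the 14 channels

# Synthetic stand-ins for trainset and testset (random values, not real EEG data).
train = pd.DataFrame(rng.normal(4300, 30, size=(200, 3)), columns=channels)
train["label"] = (train["AF3"] > 4300).astype(int)
test = pd.DataFrame(rng.normal(4300, 30, size=(50, 3)), columns=channels)
test["index"] = np.arange(len(test))

# Fit on the training features, then predict the probability of label 1.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(train[channels], train["label"])
proba = model.predict_proba(test[channels])[:, 1]

# Two columns, as required: `index` from the test set and the predicted `label`.
submission = pd.DataFrame({"index": test["index"], "label": proba})
print(submission.head())
```

With the real data, `train` and `test` would be replaced by `trainset` and `testset`, and `channels` by the full list of channel columns.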

Here are the train set, the test set, and an example predictions file for Kaggle:

In [14]:
# Print the train set (pandas abbreviates the middle rows and columns)
print(trainset)

          AF3       F7       F3      FC5  ...       F4       F8      AF4  label
0     4299.49  3997.44  4277.95  4116.92  ...  4278.97  4600.00  4369.23      1
1     4302.05  3985.64  4261.03  4129.74  ...  4283.08  4607.18  4358.46      0
2     4321.03  4015.90  4265.13  4122.56  ...  4286.15  4608.21  4371.79      0
3     4408.21  4104.10  4380.00  4232.31  ...  4388.21  4715.90  4464.10      0
4     4347.18  3975.38  4266.67  4102.56  ...  4313.33  4664.10  4411.79      1
...       ...      ...      ...      ...  ...      ...      ...      ...    ...
1995  4211.79  4015.90  4230.26  4107.69  ...  4240.51  4544.62  4265.13      1
1996  4268.72  4035.38  4237.95  4112.82  ...  4250.77  4586.67  4321.54      0
1997  4287.69  4007.69  4267.18  4128.21  ...  4260.51  4597.44  4353.33      0
1998  4297.95  4031.79  4275.90  4147.69  ...  4279.49  4604.10  4340.51      0
1999  4303.08  4010.26  4270.77  4148.21  ...  4281.54  4626.67  4349.23      0

[2000 row

In [15]:
# Print the test set (pandas abbreviates the middle rows and columns)
print(testset)

           AF3       F7       F3      FC5  ...       F4       F8      AF4  index
0      4296.41  4040.51  4253.33  4124.10  ...  4268.72  4598.46  4342.56      0
1      4291.28  3994.36  4247.18  4102.56  ...  4260.51  4593.33  4337.95      1
2      4299.49  4019.49  4269.74  4116.41  ...  4280.51  4596.92  4350.26      2
3      4280.00  4004.62  4263.59  4120.51  ...  4271.79  4608.72  4344.10      3
4      4317.44  3968.72  4260.51  4101.54  ...  4282.05  4592.31  4372.82      4
...        ...      ...      ...      ...  ...      ...      ...      ...    ...
12887  4278.97  3986.67  4243.08  4116.41  ...  4267.18  4597.44  4343.08  12887
12888  4319.49  3998.46  4267.18  4106.67  ...  4286.15  4615.38  4386.67  12888
12889  4286.15  4006.67  4259.49  4122.56  ...  4269.74  4610.26  4350.26  12889
12890  4376.92  4024.10  4278.97  4120.00  ...  4305.13  4643.08  4442.56  12890
12891  4337.44  4038.97  4300.51  4142.56  ...  4308.21  4636.92  4401.03  1289

In [16]:
sample_submission.head(10)

Unnamed: 0,index,label
0,0,0.168801
1,1,0.124169
2,2,0.947757
3,3,0.069585
4,4,0.635325
5,5,0.659027
6,6,0.653697
7,7,0.85003
8,8,0.160489
9,9,0.843272


Make sure to save your results without the extra Pandas index column that is written by default:

In [0]:
filename = "my_prediction_results.csv"

# Keep only the two required columns and do not write the pandas index column (index=False)
sample_submission[["index", "label"]].to_csv(filename, index=False)
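
Before uploading, it may help to read the file back and confirm that it contains only the two expected columns. The sketch below uses a made-up `submission` frame and a temporary directory so that it runs on its own; `path` and `check` are illustrative names.

```python
import os
import tempfile

import pandas as pd

# Hypothetical predictions standing in for real model output.
submission = pd.DataFrame({"index": [0, 1, 2], "label": [0.17, 0.12, 0.95]})

# Write to a temporary directory without the pandas index column.
path = os.path.join(tempfile.mkdtemp(), "my_prediction_results.csv")
submission.to_csv(path, index=False)

# Read the file back and confirm it has exactly the two expected columns.
check = pd.read_csv(path)
print(list(check.columns))
```

An extra `Unnamed: 0` column in the result would indicate that the pandas index was written to the file at some point.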