# Dataset
We will explore this dataset: https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State#

> All data is from one continuous EEG measurement with the Emotiv EEG Neuroheadset. The duration of the measurement was 117 seconds. The eye state was detected via a camera during the EEG measurement and added later manually to the file after analysing the video frames. '1' indicates the eye-closed and '0' the eye-open state. All values are in chronological order with the first measured value at the top of the data.

In [1]:
import tensorflow as tf

data_dir = "../../data/raw"
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00264/EEG%20Eye%20State.arff"
# download the file once; later calls reuse the copy cached under data_dir
datapath = tf.keras.utils.get_file("eeg", origin=url, untar=False, cache_dir=data_dir)

You can load the arff file with scipy

In [None]:
from scipy.io import arff
data = arff.loadarff(datapath)

The result is a tuple of the observations and a description (metadata)

In [None]:
len(data), type(data)

Description

In [None]:
data[1]

There are about 15k observations

In [None]:
len(data[0])

The observations are tuples of floats and a byte as label

In [None]:
data[0][0]

In [None]:
# the label is the last field (index 14) of every observation
labels = [int(x[14]) for x in data[0]]

In [None]:
import numpy as np
np.array(labels).mean()
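
For modelling it is usually handier to have the channels as one float matrix and the labels as an integer vector. The block below is a minimal sketch of that conversion. It assumes a structured array laid out like `data[0]` (float fields first, a bytes label last); the `toy` array and its field names are made up for illustration and are not the real EEG columns.

```python
import numpy as np

# Toy stand-in for data[0]: two float channels and a bytes label,
# mimicking the structured array that loadarff returns (hypothetical names).
toy = np.array(
    [(4329.2, 4009.2, b"0"), (4324.6, 4004.6, b"1"), (4327.7, 4006.7, b"1")],
    dtype=[("ch1", "f8"), ("ch2", "f8"), ("label", "S1")],
)

names = toy.dtype.names
# Stack every field except the last into a (n_observations, n_channels) matrix
X = np.column_stack([toy[name] for name in names[:-1]]).astype(np.float32)
# Decode the bytes labels into plain integers
y = np.array([int(v) for v in toy[names[-1]]])

print(X.shape, y)
```

With the real data you would pass `data[0]` instead of `toy`; the rest should carry over as long as the label is the last field.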

About 45% of the data has closed eyes.

# Exercise 1

- create a `get_eeg` function that downloads the data to a given path
- build a Dataset that yields an $(X, y)$ tuple of tensors. $X$ should be sequential in time. Remember: a dataset should implement `__getitem__` and `__len__`.
- note that you could model this either as a classification task or as a sequence-to-sequence task. For this exercise, make it a classification task: every sequence contains consecutive 0s or consecutive 1s only.
- for a real training task, a seq2seq model would probably be more realistic. However, the classification version is a nice exercise because it is harder to set up.
- figure out the length distribution of your dataset: how many timestamps does every consecutive run of 0s or 1s have? What are the mean, median, min and max?
- create a dataloader that yields timeseries with shape (batch, sequence_length). You can implement windowed, padded and batched:
    1. yielding a windowed item is the easy level
    2. yielding windowed and padded is the medium level
    3. yielding windowed, padded and batched is the expert level, because the windowing will cause the timeseries to have different sizes. You will need to buffer before you can yield a batch.
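
Two building blocks from the list above, finding runs of identical labels and padding sequences of different lengths, can be sketched on toy data. This is one possible approach and not a full solution: the `labels` and `signal` tensors are invented stand-ins, and windowing and buffering are left out on purpose.

```python
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence

# Toy labels and a toy (time, channels) signal; stand-ins for the EEG data
labels = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 1])
signal = torch.arange(len(labels) * 2, dtype=torch.float32).reshape(-1, 2)

# Positions where the label changes mark the start of a new run
change = np.flatnonzero(np.diff(labels)) + 1
starts = np.concatenate(([0], change))
ends = np.concatenate((change, [len(labels)]))
run_lengths = ends - starts

# One (X, y) pair per run of identical labels
chunks = [(signal[s:e], int(labels[s])) for s, e in zip(starts, ends)]

# Zero-pad the variable-length runs to a common length:
# the result has shape (n_runs, longest_run, channels)
batch = pad_sequence([x for x, _ in chunks], batch_first=True)

print(run_lengths, batch.shape)
```

`run_lengths` is what you would summarise for the length distribution. `pad_sequence` pads with zeros by default, so a model may need the original lengths to ignore the padding.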

1. Upload this to GitHub.
2. Put your dev notebooks in a separate folder.
3. Put all your functions in the src folder.
4. Use a formatter and a linter.
5. Add a single notebook that imports from the src folder. Indicate which level you reached (1, 2 or 3).
6. In that notebook, show that your dataloader works:
    - it should not raise errors because it runs out of data! Either let it stop by itself, or let it run forever.
    - batch size should be consistent (for levels 1 and 2, the batch size is 1)
    - sequence length is allowed to vary

The first exercise is ex1, this one is ex2. You will get $max(ex1, average(ex1, ex2))$ as a final mark.
Level 3 can get you an 11, because it exceeds expectations.

# Exercise 2
- build a Dataset that yields sequences of X, y. This time, y is a sequence and can contain both 0s and 1s
- create a Dataloader with this
- Test appropriate architectures (RNN, Attention)
- for the loss, note that you will need a BCELoss instead of a CrossEntropyLoss
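
To make the shapes concrete, here is a minimal sketch of a per-timestep classifier trained against a label sequence. `TinySeqTagger`, the hidden size, and the random tensors are all hypothetical; the 14 input channels match the 14 EEG features that precede the label in each observation.

```python
import torch
import torch.nn as nn


class TinySeqTagger(nn.Module):
    """Hypothetical per-timestep classifier: a GRU followed by a linear layer."""

    def __init__(self, n_channels: int = 14, hidden: int = 16) -> None:
        super().__init__()
        self.rnn = nn.GRU(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)                 # (batch, seq, hidden)
        logits = self.head(out).squeeze(-1)  # (batch, seq)
        return torch.sigmoid(logits)         # probabilities between 0 and 1


torch.manual_seed(0)
x = torch.randn(8, 20, 14)                # random stand-in for EEG windows
y = torch.randint(0, 2, (8, 20)).float()  # one random 0/1 label per timestep

model = TinySeqTagger()
yhat = model(x)
seq_loss = nn.BCELoss()(yhat, y)
print(yhat.shape, seq_loss.item())
```

The model emits one probability per timestep, so `yhat` and `y` have the same (batch, sequence) shape, which is what BCELoss expects.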

# BCELoss example
In this example, which input would you prefer for the given target?

In [None]:
import torch
import torch.nn as nn

input1 = torch.tensor([0.1, 0.1, 0.7, 0.9])
input2 = torch.tensor([0.1, 0.3, 0.6, 0.7])
target = torch.tensor([0., 0., 1., 1.])

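So, which loss should you pick? CrossEntropyLoss is not suited to this: it expects a row of class scores per sample, with class indices as targets. Depending on your PyTorch version, the call below either raises an error or treats the four values as scores for four classes of a single sample, which is not what we want: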

In [None]:
loss = nn.CrossEntropyLoss()
try:
    loss(input1, target)
except Exception as e:
    print(e)
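
For contrast, this is roughly the input that CrossEntropyLoss is designed for: one row of raw scores (logits) per sample and an integer class index per sample. The numbers are made up for illustration.

```python
import torch
import torch.nn as nn

# Four samples, two classes: one row of raw logits per sample
logits = torch.tensor([[2.0, -1.0], [1.5, 0.0], [-0.5, 1.0], [-2.0, 2.0]])
# Class indices are integers (dtype long), not floats
target_idx = torch.tensor([0, 0, 1, 1])

ce = nn.CrossEntropyLoss()(logits, target_idx)

# The same value by hand: mean negative log-softmax of the correct class
by_hand = -torch.log_softmax(logits, dim=1)[torch.arange(4), target_idx].mean()
print(ce, by_hand)
```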

You will need BCELoss for this.
Binary cross entropy loss works like this:
$$X = \{x_1, \dots, x_n\}$$

$$l_i = -(y_i \cdot \log(x_i) + (1-y_i) \cdot \log(1-x_i))$$
$$BCELoss = \text{mean}(l)$$

Note that the labels are assumed to be either 0 or 1 (hence the "binary" part).
If a label is 0, only the second term is relevant. If the label is 1, only the first term is relevant:

$$
l_i =
\begin{cases}
-\log(1 - x_i) & \text{if } y_i = 0\\
-\log(x_i) & \text{if } y_i = 1
\end{cases}
$$

The default reduction is "mean", so the loss is the average of these per-element terms.



This works nicely for a sequence of 0s and 1s.
input1 is preferred: its predictions are closer to the targets (for example 0.9 instead of 0.7 for the last element), so its loss is lower (about 0.17 versus 0.33 for input2).

In [None]:
loss = nn.BCELoss()
loss(input1, target), loss(input2, target)
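
As a sanity check, the formula for $l_i$ can be computed by hand and compared with `nn.BCELoss`. This sketch reuses the values of `input1` and `target` from above; the names `l`, `manual` and `builtin` are only for illustration.

```python
import torch
import torch.nn as nn

input1 = torch.tensor([0.1, 0.1, 0.7, 0.9])
target = torch.tensor([0.0, 0.0, 1.0, 1.0])

# Per-element loss l_i, written exactly as in the formula
l = -(target * torch.log(input1) + (1 - target) * torch.log(1 - input1))
manual = l.mean()  # the default "mean" reduction

builtin = nn.BCELoss()(input1, target)
print(manual, builtin)
```

Both values should agree up to floating point precision.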

Or a more generic example

In [None]:
m = nn.Sigmoid() # make sure outputs are between 0 and 1
X = torch.randn(100) # generate 100 random inputs
yhat = m(X) # our dummy model

p = torch.ones_like(yhat) / 2
y = torch.bernoulli(p) # we create a random label sequence of 0s and 1s
loss(yhat, y)
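
When the model outputs raw scores instead of probabilities, PyTorch also offers `nn.BCEWithLogitsLoss`, which applies the sigmoid internally and is numerically more stable than a separate Sigmoid followed by BCELoss. A minimal sketch on random data, assuming nothing about your actual model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(100)                     # raw model outputs, before the sigmoid
y = torch.bernoulli(torch.full((100,), 0.5))  # random 0/1 labels

two_step = nn.BCELoss()(torch.sigmoid(logits), y)  # sigmoid first, then BCELoss
one_step = nn.BCEWithLogitsLoss()(logits, y)       # sigmoid folded into the loss
print(two_step, one_step)
```

If you go this route, remove the final sigmoid from the model and apply it only when you need probabilities at prediction time.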