# AlgoLYNXathon EEG Dataset

This notebook goes through the process of reading EEG data in feather format.

Feather allows us to store/load entire pandas dataframes.

In [None]:
import pyarrow.feather as feather
import matplotlib.pyplot as plt

In [None]:
# Read the dataset
path = 'fill_in'
df = feather.read_feather(path)
df.head()

### Analysing EEG data

The dataset consists of recordings from 48 participants doing 3 trials each (48 × 3 = 144 recordings, consistent with the row count below). Each recording is 300 seconds long with a sampling frequency of 100 Hz (30000 data points) and has a total of 61 electrode channels. All participants have been concatenated together, so the dimensions of the dataframe are (8784, 30000), i.e. 144 × 61 rows, where each row represents a single channel.
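The dimensions quoted above can be checked with a little arithmetic. The sketch below also maps a row number to a (recording, channel) pair. It assumes that recordings were stacked one after another and that each contributes 61 consecutive rows; which participant and trial a given recording belongs to is not stated here, so the helper `locate_row` (a hypothetical name) deliberately stops at the recording index.

```python
N_PARTICIPANTS = 48
TRIALS_PER_PARTICIPANT = 3
N_CHANNELS = 61
DURATION_S = 300
SFREQ_HZ = 100

# Expected dataframe dimensions: one row per channel, one column per sample.
n_rows = N_PARTICIPANTS * TRIALS_PER_PARTICIPANT * N_CHANNELS
n_cols = DURATION_S * SFREQ_HZ


def locate_row(row_number):
    """Return (recording_index, channel_index) for a dataframe row.

    ASSUMPTION: each recording occupies N_CHANNELS consecutive rows.
    """
    if not 0 <= row_number < n_rows:
        raise IndexError(f"row_number must be in [0, {n_rows})")
    return divmod(row_number, N_CHANNELS)


print((n_rows, n_cols))  # (8784, 30000)
print(locate_row(61))    # (1, 0): first channel of the second recording
```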

Each electrode channel is one row in the dataframe, and each column is a timestep. To plot a single row, select it by position with the `iloc` indexer.

In [None]:
row_number = 0
plt.plot(df.iloc[row_number])

To plot a subset of the columns, specify the range in the second position of `iloc` in the form `index_start:index_end`. The end index is exclusive, so `0:50` selects columns 0 through 49.

In [None]:
# Plot the first 50 datapoints
plt.plot(df.iloc[0, 0:50])
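Because the data is sampled at 100 Hz, a column index can be converted to a time in seconds by dividing by 100. The following sketch selects a window given in seconds rather than in column indices. The helper `window_to_columns` and the synthetic dataframe `demo` are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd

SFREQ_HZ = 100  # sampling frequency of the dataset


def window_to_columns(start_s, end_s, sfreq=SFREQ_HZ):
    """Convert a time window in seconds to (start, end) column indices."""
    return int(round(start_s * sfreq)), int(round(end_s * sfreq))


# Synthetic stand-in for the EEG dataframe: 2 channels, 3 seconds of data.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.standard_normal((2, 3 * SFREQ_HZ)))

# Select the window from 0.5 s to 1.0 s of the first row.
start, end = window_to_columns(0.5, 1.0)
segment = demo.iloc[0, start:end]

# Matching time axis in seconds, one value per selected sample.
times_s = np.arange(start, end) / SFREQ_HZ

print(start, end, len(segment))  # 50 100 50
```

Passing `times_s` as the first argument, as in `plt.plot(times_s, segment)`, would label the x-axis in seconds instead of sample indices.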

### Using mne-bids to preprocess the dataset

If your data is in BIDS format, use the following code to create preprocessed pandas dataframes with a band-pass filter of 1 Hz to 40 Hz and a resampling frequency of 100 Hz.

In [None]:
import os
import os.path as op
import openneuro
import matplotlib.pyplot as plt

import mne
from mne.datasets import sample
from mne_bids import BIDSPath, read_raw_bids, print_dir_tree, make_report
import pandas as pd
import pyarrow.feather as feather
from tqdm import tqdm

In [None]:
dataset = 'ds003685'
bids_root = op.join(op.dirname(sample.data_path()), dataset)
datatype = 'eeg'
session = 'session1' # change to session2, session3 after done with first one
task = 'mathematic' # change when finished
suffix = 'eeg'
bids_path = BIDSPath(task=task,
                     suffix=suffix, datatype=datatype, root=bids_root)

In [None]:
# get the paths of the .vhdr (BrainVision header) files
matches = bids_path.match()  # scan the dataset once instead of on every iteration
basenames = [m for m in tqdm(matches) if m.basename.endswith('vhdr')]

In [None]:
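# iterate through all files and create preprocessed dataframes
frames = []
for basename in tqdm(basenames):
    raw = read_raw_bids(bids_path=basename, verbose=False)
    raw.load_data()
    raw = raw.filter(1, 40)  # band-pass filter: 1 Hz to 40 Hz
    raw = raw.resample(100, npad="auto")  # resample to 100 Hz
    raw_data = raw.get_data()  # array of shape (n_channels, n_samples)
    frames.append(pd.DataFrame(raw_data))
# concatenate once at the end rather than growing the dataframe inside the loop
df = pd.concat(frames, sort=True)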
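To see what the stacked result looks like without downloading any EEG files, the sketch below repeats the concatenation step on small random arrays. The sizes (3 recordings, 4 channels, 10 samples) are made-up values for illustration, not properties of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sizes, far smaller than the real recordings.
n_recordings, n_channels, n_samples = 3, 4, 10

rng = np.random.default_rng(0)
frames = []
for _ in range(n_recordings):
    # Stand-in for raw.get_data(): one row per channel, one column per sample.
    data = rng.standard_normal((n_channels, n_samples))
    frames.append(pd.DataFrame(data))

stacked = pd.concat(frames, sort=True)

# Rows add up across recordings; the number of columns is unchanged.
print(stacked.shape)  # (12, 10)

# Each frame keeps its own row labels, so labels repeat in the result.
print(stacked.index.tolist()[:6])  # [0, 1, 2, 3, 0, 1]
```

Since the row labels repeat for every recording, selecting rows by position with `iloc` is the unambiguous choice on a dataframe built this way.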

In [None]:
# save dataframe in feather format
feather.write_feather(df, 'filename.feather')

### Download Sample Data from IPFS

The sample data contains 30 seconds of trial 1 for the first 30 participants, with all electrodes included.