# offline_experiment_unicorn-clean_data

This Python notebook cleans up and reformats the data recorded by Unicorn Recorder when running the experiment in [offline_experiment_unicorn.py](offline_experiment_unicorn.py).

The code first adds column headers to the EEG data columns and the stimulation-marker column: `EEG_Ch1(mV)`, `EEG_Ch2(mV)`, `EEG_Ch3(mV)`, `EEG_Ch4(mV)`, `EEG_Ch5(mV)`, `EEG_Ch6(mV)`, `EEG_Ch7(mV)` and `EEG_Ch8(mV)` for the first eight columns, and `stim` for the last column. All other columns are left unnamed.

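It then fills the small gaps (runs of zeros) in the `stim` column that are caused by lag in the keyboard input, and writes the result to a new column at the end called `filled_stim`. A gap is filled with the value of the marker that opens it only when the next non-zero marker is `9`.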
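To make the rule concrete, here is a minimal sketch that reproduces the gap-filling logic of the notebook's `fill_gaps` as a standalone function: a gap of zeros is overwritten with the value of the marker that opens it, but only when the marker that closes it is `9`. The marker sequence is made up for illustration and is not taken from a recording.

```python
import numpy as np
import pandas as pd


def fill_gaps(series: pd.Series) -> pd.Series:
    """Same rule as the notebook: copy a marker forward over the following
    zeros, but only when the next non-zero marker is 9."""
    result = series.copy()
    # Positions of all non-zero marker samples
    non_zero_idx = np.flatnonzero(series.to_numpy() != 0)
    for start_idx, end_idx in zip(non_zero_idx[:-1], non_zero_idx[1:]):
        if series.iloc[end_idx] == 9:
            # Overwrite start..end (inclusive) with the opening marker value
            result.iloc[start_idx:end_idx + 1] = series.iloc[start_idx]
    return result


# Hypothetical marker sequence: marker 2, a gap, marker 9, then markers 3 and 4
stim = pd.Series([0, 2, 0, 0, 9, 0, 3, 0, 4, 0])
print(fill_gaps(stim).tolist())
# → [0, 2, 2, 2, 2, 0, 3, 0, 4, 0]
```

The zeros between marker `2` and marker `9` are filled, and the `9` itself is overwritten with `2`. The gap between `3` and `4` is left alone because the marker that closes it is not `9`.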

You need to specify the file path of the original data file. The output will be saved in the same directory under the same file name with the suffix `_cleaned` added before the extension (for example, `Katie_7-10_EEG.csv` becomes `Katie_7-10_EEG_cleaned.csv`).
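The notebook builds the output name with `file_path.rsplit('.', 1)`, which assumes the file name has an extension and that no directory name contains a dot. A minimal sketch of the same naming rule using the standard-library `pathlib` follows. `cleaned_path` is a hypothetical helper, and `PureWindowsPath` is used only so that the backslash-separated example paths are parsed the same way on any operating system.

```python
from pathlib import PureWindowsPath


def cleaned_path(file_path: str, tag: str = "_cleaned") -> str:
    """Hypothetical helper: insert `tag` between the file stem and its extension."""
    path = PureWindowsPath(file_path)
    # path.stem is the name without extension; path.suffix is the extension
    # including the dot, or "" when there is none
    return str(path.with_name(path.stem + tag + path.suffix))


print(cleaned_path(r"~\Downloads\Katie_7-10_EEG.csv"))
# → ~\Downloads\Katie_7-10_EEG_cleaned.csv

# A dot in a folder name and a missing extension are both handled
print(cleaned_path(r"C:\data.v2\recording"))
# → C:\data.v2\recording_cleaned
```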

When decoding, please use the raw EEG channels as input data, and use `filled_stim` as labels.
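As an illustration of that split, here is a minimal sketch that loads a cleaned file and separates the eight EEG channels (inputs) from `filled_stim` (labels). It assumes the column names produced by this notebook and that the file was written with the row index as its first column, which is what `df.to_csv(new_file_path)` does by default. `load_decoding_arrays` is a hypothetical helper, and the in-memory demo data are made up.

```python
import io

import numpy as np
import pandas as pd

EEG_COLUMNS = [f"EEG_Ch{i}(mV)" for i in range(1, 9)]


def load_decoding_arrays(csv_source):
    """Hypothetical helper: return (X, y) = raw EEG samples and per-sample labels."""
    # The cleaning notebook calls to_csv() without index=False,
    # so the first column of the file holds the row index
    df = pd.read_csv(csv_source, index_col=0)
    X = df[EEG_COLUMNS].to_numpy(dtype=float)  # shape (n_samples, 8)
    y = df["filled_stim"].to_numpy()  # shape (n_samples,)
    return X, y


# Demo on a tiny synthetic "cleaned" file held in memory (made-up values)
demo = pd.DataFrame(np.zeros((4, 8)), columns=EEG_COLUMNS)
demo["stim"] = [2, 0, 9, 0]
demo["filled_stim"] = [2, 2, 2, 0]
buffer = io.StringIO()
demo.to_csv(buffer)
buffer.seek(0)

X, y = load_decoding_arrays(buffer)
print(X.shape, y.tolist())
# → (4, 8) [2, 2, 2, 0]
```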

In [3]:
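```python
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt

# Path to the raw CSV exported by Unicorn Recorder (edit for your own recording)
file_path = r"~\Downloads\Katie_7-10_EEG.csv"
df = pd.read_csv(file_path, header=None)  # the raw file has no header row

num_columns = df.shape[1]
col_headers = ["" for _ in range(num_columns)]

# Name the first 8 columns as EEG channels and the last column as the
# stimulus marker; every other column keeps an empty name
for i in range(num_columns):
    if i < 8:
        col_headers[i] = f"EEG_Ch{i+1}(mV)"
    if i == num_columns - 1:
        col_headers[i] = "stim"

df.columns = col_headers

print(df.head())
```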

```
   EEG_Ch1(mV)  EEG_Ch2(mV)  EEG_Ch3(mV)  EEG_Ch4(mV)  EEG_Ch5(mV)  \
0          0.0          0.0          0.0          0.0          0.0   
1          0.0          0.0          0.0          0.0          0.0   
2          0.0          0.0          0.0          0.0          0.0   
3          0.0          0.0          0.0          0.0          0.0   
4          0.0          0.0          0.0          0.0          0.0   

   EEG_Ch6(mV)  EEG_Ch7(mV)  EEG_Ch8(mV)                                      \
0          0.0          0.0          0.0  0.052  0.970  0.150  0.0  0.0  0.0   
1          0.0          0.0          0.0  0.061  0.995  0.145  0.0  0.0  0.0   
2          0.0          0.0          0.0  0.062  1.006  0.142  0.0  0.0  0.0   
3          0.0          0.0          0.0  0.067  1.011  0.139  0.0  0.0  0.0   
4          0.0          0.0          0.0  0.069  1.011  0.141  0.0  0.0  0.0   

                         stim  
0  1  93.333  1   0.000     0  
1  2  93.333  1  14.298     0  
2
```

In [4]:
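```python
def fill_gaps(series):
    """Fill runs of zeros between two markers with the first marker's value,
    but only when the later marker is 9."""
    result = series.copy()
    # Positions (not index labels) of all non-zero marker samples
    non_zero_idx = np.where(series.to_numpy() != 0)[0]

    for i in range(len(non_zero_idx) - 1):
        start_idx = non_zero_idx[i]
        end_idx = non_zero_idx[i + 1]
        if series.iloc[end_idx] == 9:
            # Overwrite start..end (inclusive) with the starting marker value
            result.iloc[start_idx:end_idx + 1] = series.iloc[start_idx]
    return result

# Keep the raw markers in `stim` and store the filled labels in a new column
df['filled_stim'] = fill_gaps(df['stim'])

print(df.head())

# Save next to the original file, adding "_cleaned" before the extension
file_path_parts = file_path.rsplit('.', 1)
new_file_path = file_path_parts[0] + '_cleaned.' + file_path_parts[1]
df.to_csv(new_file_path)

print(f"Reformatted data saved to {new_file_path}")
```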

```
   EEG_Ch1(mV)  EEG_Ch2(mV)  EEG_Ch3(mV)  EEG_Ch4(mV)  EEG_Ch5(mV)  \
0          0.0          0.0          0.0          0.0          0.0   
1          0.0          0.0          0.0          0.0          0.0   
2          0.0          0.0          0.0          0.0          0.0   
3          0.0          0.0          0.0          0.0          0.0   
4          0.0          0.0          0.0          0.0          0.0   

   EEG_Ch6(mV)  EEG_Ch7(mV)  EEG_Ch8(mV)                                      \
0          0.0          0.0          0.0  0.052  0.970  0.150  0.0  0.0  0.0   
1          0.0          0.0          0.0  0.061  0.995  0.145  0.0  0.0  0.0   
2          0.0          0.0          0.0  0.062  1.006  0.142  0.0  0.0  0.0   
3          0.0          0.0          0.0  0.067  1.011  0.139  0.0  0.0  0.0   
4          0.0          0.0          0.0  0.069  1.011  0.141  0.0  0.0  0.0   

                         stim  
0  1  93.333  1   0.000     0  
1  2  93.333  1  14.298     0  
2
```