# Converting the MATLAB Data Structure to a Pandas Dataframe

In [1]:
import numpy as np
import pandas as pd
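# Note: some newer MNE releases no longer bundle pymatreader under mne.externals;
# in that case, install the standalone package and use `from pymatreader import read_mat`.
from mne.externals.pymatreader import read_mat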
import warnings


In [2]:
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    raw = read_mat('/Users/anastasiakuzmich/Downloads/Marios_data/Data.mat')

raw.keys()

dict_keys(['__header__', '__version__', '__globals__', 'CarData', 'FaceData', 'CarData2', 'FaceData2', 'AllData', 'AllData2', 'stiml', 'stim'])

## First Dataframe: EEG Signals

In [3]:
data = raw['AllData']
data.shape

(60, 333900)

In [4]:
neural_data = pd.DataFrame(data=data.T)
neural_data.head()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.101563 | 0.488281 | -1.074219 | -1.074219 | -0.78125 | -1.220703 | -1.220703 | -1.220703 | -0.585938 | -1.171875 | ... | -0.146484 | -0.634766 | -4.44336 | 0.585938 | 0.341797 | 0.0 | -0.341797 | -0.830078 | -1.220703 | -0.878906 |
| 1 | 4.296875 | -0.244141 | -1.220703 | -0.927734 | -0.634766 | -1.31836 | -1.269531 | -1.31836 | -0.292969 | -1.074219 | ... | -0.341797 | -0.830078 | -4.44336 | 0.341797 | 0.048828 | -0.195313 | -0.488281 | -1.074219 | -1.367188 | -1.025391 |
| 2 | 3.417969 | -0.78125 | -1.171875 | -0.439453 | -0.439453 | -1.123047 | -1.123047 | -1.220703 | -0.146484 | -0.878906 | ... | -0.537109 | -0.830078 | -4.052735 | 0.341797 | 0.048828 | -0.244141 | -0.488281 | -1.025391 | -1.025391 | -0.732422 |
| 3 | 1.953125 | -0.732422 | -0.878906 | -0.048828 | -0.146484 | -0.830078 | -0.927734 | -1.025391 | 0.048828 | -0.634766 | ... | -0.488281 | -0.585938 | -3.466797 | 0.634766 | 0.390625 | -0.097656 | -0.292969 | -0.78125 | -0.585938 | -0.341797 |
| 4 | 1.123047 | -0.341797 | -0.585938 | -0.244141 | 0.0 | -0.488281 | -0.976563 | -1.025391 | -0.048828 | -0.585938 | ... | -0.292969 | -0.341797 | -2.929688 | 1.31836 | 1.074219 | 0.146484 | 0.0 | -0.439453 | -0.292969 | -0.097656 |


In [5]:
neural_data.to_pickle("neural_data.pkl")

## Second Dataframe: Events

In [6]:
event_code = raw['stiml']
face_or_car = raw['stim']

In [7]:
events_data = pd.DataFrame(data={"event_code" : event_code, 
                                 "face_or_car" : face_or_car})
events_data

| | event_code | face_or_car |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| ... | ... | ... |
| 472 | 12 | 2 |
| 473 | 12 | 2 |
| 474 | 12 | 2 |
| 475 | 12 | 2 |


✏️ The event codes correspond to a given coherence level and face/car stimulus combination as follows:

    - 1: face, level 1
    - 2: face, level 2
    - 3: face, level 3
    - 4: face, level 4
    - 5: face, level 5
    - 6: face, level 6
    - 7: car, level 1
    - 8: car, level 2
    - 9: car, level 3
    - 10: car, level 4
    - 11: car, level 5
    - 12: car, level 6
    
I will therefore add a column that indicates only the coherence level of the stimulus, regardless of its type.

In [8]:
def map_difficulty_level(x):
    """Map an event code (1-12) to its coherence level (1-6)."""
    # Codes 7-12 are the car trials; subtracting 6 gives their coherence level.
    # Face codes (1-6) already equal their coherence level.
    if 7 <= x <= 12:
        return x - 6
    return x

In [9]:
events_data['coherence_level'] = events_data['event_code'].map(map_difficulty_level)

In [27]:
events_data

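| | event_code | face_or_car | coherence_level |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 1 |
| 3 | 1 | 1 | 1 |
| 4 | 1 | 1 | 1 |
| ... | ... | ... | ... |
| 472 | 12 | 2 | 6 |
| 473 | 12 | 2 | 6 |
| 474 | 12 | 2 | 6 |
| 475 | 12 | 2 | 6 |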


In [26]:
events_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 477 entries, 0 to 476
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   event_code       477 non-null    uint8
 1   face_or_car      477 non-null    uint8
 2   coherence_level  477 non-null    int64
dtypes: int64(1), uint8(2)
memory usage: 4.8 KB
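
Because the twelve event codes are laid out as two blocks of six, both the coherence level and the stimulus type can also be recovered arithmetically. The sketch below is an optional cross-check, not part of the original pipeline; it uses a small toy Series as a stand-in for `events_data['event_code']`, and the cast to `int64` is my own precaution against `uint8` wrap-around.

```python
import numpy as np
import pandas as pd

# Toy stand-in for events_data['event_code'] (uint8, as in the real data).
event_code = pd.Series(np.array([1, 6, 7, 12], dtype=np.uint8))

# Cast to a wider signed type first so the arithmetic cannot wrap around in uint8.
codes = event_code.astype("int64")

coherence_level = (codes - 1) % 6 + 1   # 1..6 within each stimulus type
face_or_car = (codes - 1) // 6 + 1      # 1 = face (codes 1-6), 2 = car (codes 7-12)

print(coherence_level.tolist())  # [1, 6, 1, 6]
print(face_or_car.tolist())      # [1, 1, 2, 2]
```

On the real data, comparing these two results with the `coherence_level` and `face_or_car` columns would be a quick consistency check.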


In [11]:
events_data.to_pickle("events_data.pkl")
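
To confirm that a pickled DataFrame loads back unchanged, a round trip through `pd.read_pickle` can be compared with the original. This is a minimal sketch on a toy frame written to a temporary directory (the toy values and the temporary path are assumptions for illustration). Pickle files should only be loaded from sources you trust.

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for events_data.
events = pd.DataFrame({"event_code": [1, 12], "face_or_car": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "events_data.pkl")
    events.to_pickle(path)            # write
    restored = pd.read_pickle(path)   # read back

print(restored.equals(events))  # True
```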

## Dataframe Descriptions

✏️ We now have two pandas DataFrames that can be used for machine learning operations. The data can be described as follows:

### Recorded EEG Signals: neural_data.pkl

#### Data Description

- This dataframe contains 333900 rows and 60 columns. 
- The recorded values are the electrical signals produced by the brain, recorded in volts (V). 
- Each row represents one millisecond of recording, and each column is the channel on which the signal was recorded. 
- The channels are encoded as follows:

        0 : Fp1
        1 : Fp2
        2 : Fpz
        3 : AF3
        4 : AF4
        5 : AF7
        6 : AF8
        7 : F1
        8 : F3
        9 : F5
        10 : F7
        11 : Fz
        12 : F2
        13 : F4
        14 : F6
        15 : F8
        16 : FT7
        17 : FC5
        18 : FC3
        19 : FC1
        20 : FCz
        21 : FC2
        22 : FC4
        23 : FC6
        24 : FT8
        25 : C1
        26 : C3
        27 : C5
        28 : Cz
        29 : C2
        30 : C4
        31 : C6
        32 : T8
        33 : TP7
        34 : TP8
        35 : T7
        36 : CP5
        37 : CP3
        38 : CP1
        39 : CP6
        40 : CP4
        41 : CP2
        42 : CPz
        43 : P1
        44 : P3
        45 : P5
        46 : P7
        47 : Pz
        48 : P2
        49 : P4
        50 : P6
        51 : P8
        52 : PO7
        53 : PO3
        54 : POz
        55 : PO4
        56 : PO8
        57 : O1
        58 : Oz
        59 : O2
        
These channels correspond to the following locations of the scalp electrodes:

![cap_64_layout_medium.jpeg](attachment:cap_64_layout_medium.jpeg)
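
If named columns are more convenient than integer positions, the channel list above can be applied to the dataframe directly. The following is a sketch on a small placeholder frame with the same 60-column layout as `neural_data`; the variable names `channel_names` and `named` are my own.

```python
import numpy as np
import pandas as pd

# Channel names in column order, copied from the list above.
channel_names = [
    "Fp1", "Fp2", "Fpz", "AF3", "AF4", "AF7", "AF8",
    "F1", "F3", "F5", "F7", "Fz", "F2", "F4", "F6", "F8",
    "FT7", "FC5", "FC3", "FC1", "FCz", "FC2", "FC4", "FC6", "FT8",
    "C1", "C3", "C5", "Cz", "C2", "C4", "C6",
    "T8", "TP7", "TP8", "T7",
    "CP5", "CP3", "CP1", "CP6", "CP4", "CP2", "CPz",
    "P1", "P3", "P5", "P7", "Pz", "P2", "P4", "P6", "P8",
    "PO7", "PO3", "POz", "PO4", "PO8",
    "O1", "Oz", "O2",
]
assert len(channel_names) == 60

# Placeholder with integer column labels 0..59, like neural_data.
neural_data = pd.DataFrame(np.zeros((5, 60)))

# Map each integer column label to its channel name.
named = neural_data.rename(columns=dict(enumerate(channel_names)))

print(named.columns[:3].tolist())  # ['Fp1', 'Fp2', 'Fpz']
```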

### Event Codes: events_data.pkl

#### Data Dictionary

**event_code**: trial conditions in terms of the stimulus (face/car) and coherence level, encoded as follows:
    
    1 = Face + Coherence Level 1 (20%)
    2 = Face + Coherence Level 2 (25%)
    3 = Face + Coherence Level 3 (30%)
    4 = Face + Coherence Level 4 (35%)
    5 = Face + Coherence Level 5 (40%)
    6 = Face + Coherence Level 6 (45%)
    7 = Car + Coherence Level 1 (20%)
    8 = Car + Coherence Level 2 (25%)
    9 = Car + Coherence Level 3 (30%)
    10 = Car + Coherence Level 4 (35%)
    11 = Car + Coherence Level 5 (40%)
    12 = Car + Coherence Level 6 (45%)
   
**face_or_car**: whether a face or a car stimulus was shown

    1 - face, 2 - car

**coherence_level**: the level of image coherence as a function of noise, i.e. how "easy" or "hard" it was to distinguish the stimulus within the image

    1 - 20% coherence (hardest)
    2 - 25% coherence 
    3 - 30% coherence 
    4 - 35% coherence 
    5 - 40% coherence 
    6 - 45% coherence (easiest)
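
The numeric codes in this dictionary can be turned into readable labels when needed. The sketch below works on a toy frame; the column names `stimulus` and `coherence_pct` are hypothetical additions, and the percentage formula simply restates the table above (level 1 = 20%, rising in steps of 5%).

```python
import pandas as pd

# Toy frame with the same coded columns as events_data.
events = pd.DataFrame({"face_or_car": [1, 2], "coherence_level": [1, 6]})

# 1 = face, 2 = car, as in the data dictionary.
stimulus_names = {1: "face", 2: "car"}
events["stimulus"] = events["face_or_car"].map(stimulus_names)

# Level 1 -> 20 %, level 2 -> 25 %, ..., level 6 -> 45 %.
events["coherence_pct"] = 15 + 5 * events["coherence_level"]

print(events["stimulus"].tolist())       # ['face', 'car']
print(events["coherence_pct"].tolist())  # [20, 45]
```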

#### Data Description

- This dataframe is structured as 477 rows × 3 columns. 
- Each row describes the stimulus shown to the participant in one trial, and thus the conditions under which the neural data were recorded. 
- Each of the 477 trials was 700 milliseconds long. The conditions in the first row of this dataframe therefore correspond to the first 700 rows (the first 700 ms of recorded signal) of the main neural_data dataframe, the second row corresponds to the conditions used when acquiring the next 700 rows, and so on. The 477 trials multiplied by 700 ms per trial give a total of 333900 milliseconds, which is the length of the neural_data dataframe. 
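
The row-to-trial correspondence described above can be sketched in code. The helpers `trial_slice` and `to_epochs` are hypothetical names of my own, and the example runs on a small toy array (3 trials, 2 channels) rather than the real recording, so it only illustrates the indexing.

```python
import numpy as np

TRIAL_LEN = 700  # rows (milliseconds) per trial, as described above


def trial_slice(trial_idx, trial_len=TRIAL_LEN):
    """Row range in neural_data for the trial at position trial_idx in events_data."""
    start = trial_idx * trial_len
    return slice(start, start + trial_len)


def to_epochs(signal, trial_len=TRIAL_LEN):
    """Reshape (n_rows, n_channels) into (n_trials, trial_len, n_channels)."""
    n_rows, n_channels = signal.shape
    if n_rows % trial_len != 0:
        raise ValueError("row count is not a multiple of the trial length")
    return signal.reshape(n_rows // trial_len, trial_len, n_channels)


# Toy signal: 3 trials, 2 channels; every value equals its own row number.
toy = np.repeat(np.arange(3 * TRIAL_LEN)[:, None], 2, axis=1)
epochs = to_epochs(toy)

print(epochs.shape)     # (3, 700, 2)
print(trial_slice(1))   # slice(700, 1400, None)
print(epochs[1, 0, 0])  # 700
```

With the real data, `to_epochs(neural_data.to_numpy())` would be expected to have shape (477, 700, 60), and `neural_data.iloc[trial_slice(i)]` would select the 700 rows recorded under the conditions in row `i` of the events dataframe.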