# About
Use this notebook to extract ECG data from the DREAMER dataset. DREAMER contains recordings of 23 participants, each watching 18 movie clips. This notebook generates one CSV for each person/movie-clip combination, named like so:
    person_#_clip_#.csv

Each CSV has three columns and no header row:
    TIMESTAMP, CHANNEL1, CHANNEL2

DREAMER does not provide a start time for each ECG, so the TIMESTAMP starts at 0 and gives the elapsed time, in seconds, from the start of the trial recording.
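
As a rough illustration of this timestamp convention, the sketch below builds elapsed-time values for a made-up 1024-sample recording at DREAMER's 256 Hz ECG rate. The helper name `make_timestamps` and the sample count are assumptions for illustration only and are not part of the notebook's own cells.

```python
import numpy as np

ECG_SAMPLE_RATE = 256  # Hz, the ECG rate used later in this notebook

def make_timestamps(number_of_samples, sample_rate=ECG_SAMPLE_RATE):
    """Hypothetical helper: elapsed seconds for each sample index, starting at 0."""
    return np.arange(number_of_samples) / sample_rate

# 1024 samples is an arbitrary example length, not a real DREAMER trial
ts = make_timestamps(1024)
print(ts[0], ts[1], ts[-1])  # 0.0 0.00390625 3.99609375
```

Sample `i` therefore lands at `i / 256` seconds, so sample 256 sits at exactly 1 second.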

NOTE: This will generate 3.2GB of CSV files.

# Setup
Before using this notebook, extract the `DREAMER.Data` structure from the dataset's DREAMER.mat file and save it as a JSON file. In MATLAB or Octave, run the following:
```matlab
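dreamer = load('DREAMER.mat');
% Depending on how the .mat file was saved, the struct may be nested
% one level deeper (for example dreamer.DREAMER.Data); adjust if needed.

% Open the output file for writing ('w'); fopen defaults to read-only.
fid = fopen('DREAMER_Data.json', 'w');
fprintf(fid, '%s', jsonencode(dreamer.Data));
fclose(fid);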
```

*NOTE:* If you try to encode the entire `dreamer` struct, you'll get a truncated JSON file, apparently because of a 2GB file limit. There is probably a way to work around that, but we only need the `dreamer.Data` value anyway, and it fits within the limit.

Finally, set `PATH_TO_DREAMERDATA_JSON` in the first cell to the path of your DREAMER_Data.json file.

In [1]:
import numpy as np
import json

# Path to the DREAMER_Data.json file created in the Setup step
PATH_TO_DREAMERDATA_JSON='/home/timcsf/PycharmProjects/pythonProject/dreamer/DREAMER_Data.json'

# Path to write CSV files for each ECG
PATH_TO_OUTPUT_ECGCSV='./processed'

# ECG signals in DREAMER are sampled at 256 Hz
ECG_SAMPLE_RATE=256

In [None]:
# Use a context manager so the file is closed even if parsing fails
with open(PATH_TO_DREAMERDATA_JSON) as dreamer_data_json_file:
    dreamer_data=json.load(dreamer_data_json_file)
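
Before running the export loop, it can help to confirm that the loaded JSON has the expected shape. The sketch below assumes the layout implied by the loop in the next cell: a list of participants, each with an `ECG` entry whose `stimuli` field is a list of samples-by-channels arrays. The helper `summarize_ecg` and the `fake_data` stand-in are hypothetical and use synthetic values, not real DREAMER data.

```python
import numpy as np

def summarize_ecg(dreamer_data):
    """Hypothetical helper: list (participant, clip, n_samples, n_channels) per recording."""
    rows = []
    for p, participant in enumerate(dreamer_data):
        for c, clip in enumerate(participant['ECG']['stimuli']):
            ecg = np.array(clip)
            rows.append((p, c, ecg.shape[0], ecg.shape[1]))
    return rows

# Tiny synthetic stand-in with the assumed layout: 1 participant, 1 clip, 3 samples, 2 channels
fake_data = [{'ECG': {'baseline': [], 'stimuli': [[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]]}}]
summary = summarize_ecg(fake_data)
print(summary)  # [(0, 0, 3, 2)]
```

Calling `summarize_ecg(dreamer_data)` on the real data should report two channels for every recording if the layout assumption holds.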

In [None]:
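import os

# Create the output directory if it does not exist; np.savetxt will not create it
os.makedirs(PATH_TO_OUTPUT_ECGCSV, exist_ok=True)

PARTICIPANT_COUNT=len(dreamer_data)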

for p in range(PARTICIPANT_COUNT):
    participantData     = dreamer_data[p]
    participantECGData  = participantData['ECG']

    # participantECGData has two fields: baseline and stimuli
    # baseline is an ECG recorded watching a 'neutral' video clip
    # stimuli has the ECG recordings taken during each of the emotional movie clips.
    stimuli             = participantECGData['stimuli']

    MOVIE_CLIP_COUNT = len(stimuli)
    for c in range(MOVIE_CLIP_COUNT):
        ecg_data = np.array(stimuli[c])
        number_of_samples=len(ecg_data)

        sample_interval = 1/ECG_SAMPLE_RATE

        # Generate one timestamp per ECG sample (in seconds, starting at 0) and place it
        # in the first column, ahead of the two ECG channels. Deriving the timestamps from
        # the sample index guarantees exactly one timestamp per sample.
        # Probably not needed, but this matches how the ASCERTAIN data was formatted.
        ts=(np.arange(number_of_samples)*sample_interval).reshape(-1,1)
        data=np.append(ts,ecg_data,1)

        np.savetxt(f'{PATH_TO_OUTPUT_ECGCSV}/person_{p}_clip_{c}.csv', data, delimiter=',')
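
To check the output format, a generated file can be read back with `np.loadtxt`. The sketch below is self-contained, so it writes a synthetic two-channel recording to a temporary directory with the same `np.savetxt` call pattern and then reloads it. The all-zero signal and the 512-sample length are placeholders, not real DREAMER data.

```python
import os
import tempfile
import numpy as np

ECG_SAMPLE_RATE = 256  # Hz

# Synthetic two-channel stand-in for one ECG recording (512 samples of zeros)
ecg_data = np.zeros((512, 2))
ts = (np.arange(len(ecg_data)) / ECG_SAMPLE_RATE).reshape(-1, 1)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'person_0_clip_0.csv')
    np.savetxt(csv_path, np.append(ts, ecg_data, 1), delimiter=',')
    # The file has no header row, so loadtxt can parse it directly
    loaded = np.loadtxt(csv_path, delimiter=',')

# Columns are TIMESTAMP, CHANNEL1, CHANNEL2
timestamps, channel1, channel2 = loaded.T
step = np.diff(timestamps)
print(loaded.shape, np.allclose(step, 1 / ECG_SAMPLE_RATE))
```

For a real file, replace `csv_path` with a path under `PATH_TO_OUTPUT_ECGCSV`; consecutive timestamps should differ by 1/256 of a second.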