# Biosignal Data Processing

This notebook processes biosignal data collected from an Empatica E4. The intended per-user file layout is:
```
user_id:
    eda.json
    hr.json
    
# Inside eda.json
    session_id: 123 # in case we have multiple sessions with the same user
    user_id: 478
    tags: ["pilot", "bio", "time-series"]  # time-series tells us to expect a "data" key
    name: "EDA"
    description: "Electrodermal activity"
    channels: 1
    sampling_frequency: 3 #Hertz
    units: "microsiemens"
    ... # any other metadata
    timestamp: 12301923012390 #unix
    data: [1, 2, 3, 4, ..., 100900]
    
# Inside video.json
    session_id: 123 #in case we have multiple data sessions with the same user
    user_id: 478
    tags: ["pilot", "video", "resource"] # resource tells us to expect a "url" key
    name: "SC"
    description: "Screencapture of user screen"
    ...
    timestamp: 12301923012390 #unix
    url: "adjadllkf.mp4"
    
```
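
The tag convention in the layout above ("time-series" promises a "data" key, "resource" promises a "url" key) can be checked mechanically. The following is a minimal sketch; `REQUIRED_KEY_BY_TAG` and `missing_keys` are hypothetical names that do not appear elsewhere in this notebook, and the two records are trimmed-down illustrations.

```python
# Hypothetical helper: map each tag to the key it promises, per the layout above.
REQUIRED_KEY_BY_TAG = {"time-series": "data", "resource": "url"}

def missing_keys(record):
    """Return the keys that the record's tags promise but the record lacks."""
    return [key for tag, key in REQUIRED_KEY_BY_TAG.items()
            if tag in record.get("tags", []) and key not in record]

eda = {"tags": ["pilot", "bio", "time-series"], "name": "EDA", "data": [1, 2, 3]}
video = {"tags": ["pilot", "video", "resource"], "name": "SC"}  # "url" left out on purpose

print(missing_keys(eda))    # []
print(missing_keys(video))  # ['url']
```

Running such a check right after loading a file would surface a malformed record before any analysis touches it.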

## Create and Save JSON Files

In [1]:
import json

def save_jsonfile(name, data):
    """Write data as JSON to the path name (a string that already includes ".json")."""
    with open(name, 'w') as outfile:
        json.dump(data, outfile)
    print("File saved!", name)
    

## Bio_Data

The .csv files in this archive share the following format:

* The first row is the initial time of the session, expressed as a Unix timestamp in UTC.
* The second row is the sample rate, expressed in Hz.
* Each remaining row holds one sample.
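
The two header rows described above are enough to reconstruct the time of every sample. Below is a minimal sketch of reading such a file; `read_biosignal_csv` is a hypothetical helper, and the file content is made up for illustration rather than taken from a real recording.

```python
import csv
import io

def read_biosignal_csv(fileobj):
    """Return (start_timestamp, sample_rate_hz, samples) from an open csv file."""
    reader = csv.reader(fileobj)
    start = float(next(reader)[0])   # row 1: session start, Unix time (UTC)
    rate = float(next(reader)[0])    # row 2: sample rate in Hz
    samples = [float(row[0]) for row in reader if row]  # remaining rows: one sample each
    return start, rate, samples

# Illustrative content only; these numbers are made up.
example = io.StringIO("1530213000.0\n4.0\n0.10\n0.12\n0.11\n0.13\n")
start, rate, samples = read_biosignal_csv(example)

# The i-th sample was taken at start + i / rate.
times = [start + i / rate for i in range(len(samples))]
print(rate, len(samples), times[-1])
```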

### EDA.csv
Data from the electrodermal activity sensor expressed as microsiemens (μS).

### HR.csv
Average heart rate extracted from the BVP signal.
The first row is the initial time of the session expressed as a Unix timestamp in UTC.
The second row is the sample rate expressed in Hz.

### Naming of csv files
`userID_sessionID_type.csv`    
e.g. `25_0_HR.csv`
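
With this naming convention, the user ID, session ID, and signal type can be read from the filename instead of being typed in by hand. A minimal sketch, assuming filenames always contain exactly these three underscore-separated parts; `parse_bio_filename` is a hypothetical helper.

```python
import os

def parse_bio_filename(filename):
    """Split 'userID_sessionID_type.csv' into (user_id, session_id, signal_type)."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext.lower() != ".csv":
        raise ValueError("expected a .csv file, got " + filename)
    # Unpacking raises ValueError if the name has fewer than three parts.
    user_id, session_id, signal_type = stem.split("_", 2)
    return int(user_id), int(session_id), signal_type

print(parse_bio_filename("25_0_HR.csv"))
```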

## TODO:

* Generalize the following code to create a specific folder under each user's folder
* The code should automatically generate sessionID, userID, and name
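
One possible shape for the first TODO item is sketched below. The `<root>/<user_id>/<session_id>/` folder layout and the name `user_json_path` are assumptions made for illustration; the notebook does not prescribe them.

```python
import os
import tempfile

def user_json_path(root, user_id, session_id, bio_name):
    """Build <root>/<user_id>/<session_id>/<bio_name>.json, creating folders as needed."""
    folder = os.path.join(root, str(user_id), str(session_id))
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return os.path.join(folder, bio_name + ".json")

# Demonstrated in a temporary directory so nothing is written under data/.
with tempfile.TemporaryDirectory() as tmp:
    path = user_json_path(tmp, 25, 0, "HR")
    print(os.path.relpath(path, tmp))
```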

In [2]:
BIO_ROOT = "data/biosignals/"
USER_ROOT = "data/users/"
import os
import pandas as pd
import numpy as np
import csv

# Build the JSON record for one biosignal and save it under USER_ROOT
def create_bio_json(bioName, sessionID, userID, name, sampFreq, timestamp, data):
    unit = ""
    description = ""
    if bioName == "HR":
        unit = "bpm"
        description = "Heart rate"
    elif bioName == "EDA":
        unit = "microsiemens"
        description = "Electrodermal activity"
    tags = ["bio", "time-series"]
    # Use a separate name for the record so the "data" argument is not shadowed
    record = {"sessionID": sessionID,
              "userID": userID,
              "tags": tags,
              "name": name,
              "description": description,
              "sampling_frequency": sampFreq,
              "timestamp": timestamp,
              "unit": unit,
              "data": data}
    filePath = USER_ROOT + bioName + ".json"
    save_jsonfile(filePath, record)
    

# Read one csv file from BIO_ROOT and save it as a .json file of the same name
def grab_and_save(filename):
    data = []
    file = BIO_ROOT + filename
    name = filename[:-4]  # strip the ".csv" extension
    with open(file, newline='') as csvfile:
        csvreader = csv.reader(csvfile)
        timestamp = next(csvreader)[0]      # row 1: session start (Unix time, UTC)
        sampling_rate = next(csvreader)[0]  # row 2: sample rate in Hz
        for row in csvreader:
            if row:  # skip blank lines
                data.append(row[0])  # values stay strings; convert with float() when read
    # TODO: derive sessionID, userID and name instead of hard-coding them
    # arguments: bioName, sessionID, userID, name, sampFreq, timestamp, data
    create_bio_json(name, 0, 25, "cesar",
                    sampling_rate, timestamp, data)

# Iterate through the csv files in BIO_ROOT and generate json objects
for filename in os.listdir(BIO_ROOT):
    if filename.endswith(".csv"):
        grab_and_save(filename)
    

File saved! data/users/HR.json
File saved! data/users/EDA.json


## Video Data

Video data are stored in `data/videos`, and the coded csv files live in `data/videos/coded`. We align the video with the biosignals using Unix timestamps.
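
Aligning by Unix timestamp comes down to one subtraction and one multiplication: the offset in seconds between an event in the video and the start of the biosignal recording, times the sampling frequency, gives the index of the matching biosignal sample. A minimal sketch with made-up numbers; `sample_index` is a hypothetical name.

```python
def sample_index(event_time, bio_start, sampling_frequency):
    """Index of the biosignal sample recorded at event_time (times in Unix seconds)."""
    offset_seconds = event_time - bio_start
    if offset_seconds < 0:
        raise ValueError("event happened before the biosignal recording started")
    return int(offset_seconds * sampling_frequency)

# Made-up example: the video starts 120 s after a 4 Hz recording began.
bio_start = 1530213000.0
video_start = bio_start + 120
print(sample_index(video_start, bio_start, 4))        # 480
print(sample_index(video_start + 30, bio_start, 4))   # 600
```

Rejecting negative offsets matters because a negative index would silently count from the end of a Python list.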

In [3]:
VIDEO_ROOT = "data/videos/"

import datetime, platform

# Seconds added to the file creation time to adjust for the delayed save of the
# screencast. This value applies to this recording only.
SAVE_DELAY_SECONDS = 77

# get the Unix timestamp at which the video of this user starts
def get_video_timestamp(userID):
    uid = str(userID)
    raw_video = VIDEO_ROOT + uid + "H" + ".MOV"
    if platform.system() != "Darwin":
        # fail loudly instead of silently returning None on other platforms
        raise NotImplementedError("get_video_timestamp currently supports macOS only")
    posix_time = os.stat(raw_video).st_birthtime  # file creation time
    return posix_time + SAVE_DELAY_SECONDS

# For this specific subject run
get_video_timestamp(25)

1530213266.0
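
A quick sanity check is to render the returned value as a UTC date and confirm that it matches the day of the recording session. A minimal sketch using only the standard library:

```python
from datetime import datetime, timezone

video_timestamp = 1530213266.0  # value printed above
readable = datetime.fromtimestamp(video_timestamp, tz=timezone.utc)
print(readable.strftime("%Y-%m-%dT%H:%M:%SZ"))
```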

## Biosignal + Video Data Analysis

In [4]:
CODED_ROOT = "data/videos/coded/"
from pprint import pprint

# Difference in seconds (Unix time) between a video timestamp and the start of the biosignal.
# If the difference is positive, the video time falls after the start of the biosignal recording.
# If the difference is negative, the video time falls before the start of the biosignal recording.
# userID is currently unused because the json files are not yet stored per user.
def time_difference(videoT, bioname, userID):
    filepath = USER_ROOT + bioname + ".json"
    with open(filepath) as f:
        data = json.load(f)
    bioT = float(data["timestamp"])
    difference = int(videoT) - int(bioT)
    return difference

# Special case of the above: difference between the start of the video and the biosignal
def timestamp_difference(bioname, userID):
    videoT = get_video_timestamp(userID)
    return time_difference(videoT, bioname, userID)


# Get the sampling frequency of this particular biodata
def get_frequency(bioname):
    filepath = USER_ROOT + bioname + ".json"
    with open(filepath) as f:
        data = json.load(f)
    frequency = int(float(data["sampling_frequency"]))
    return frequency

# Number of samples to skip in the biosignal "data" array to reach the sample recorded at videoT
def get_adjusted_index(videoT, bioname, userID):
    difference = time_difference(videoT, bioname, userID)
    frequency = get_frequency(bioname)
    return difference * frequency

# Special case of the above: index of the sample recorded when the video starts
def get_beginning_index(bioname, userID):
    return get_adjusted_index(get_video_timestamp(userID), bioname, userID)

# add spliced data to the biosignal json object
def add_spliced_data_to_json(bioname, startIndex):
    filepath = USER_ROOT + bioname + ".json"
    with open(filepath) as f:
        data = json.load(f)
    data["spliced_data"] = data["data"][startIndex:]
    # rewrite this dictionary back to json file
    save_jsonfile(filepath, data)
    
# Converts "HH:mm:ss.S" or "mm:ss.S" to a whole number of seconds (the fraction is dropped)
def elapsed_to_seconds(elapsed):
    parts = elapsed.strip().split(":")
    seconds = int(float(parts[-1]))
    minutes = int(parts[-2])
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 60 * 60 + minutes * 60 + seconds
    
# Calculate the average biosignal value over all coded intervals of a stage
def average(bioname, stage, userID):
    file = CODED_ROOT + str(userID) + "-" + stage + ".csv"
    bio_filepath = USER_ROOT + bioname + ".json"
    videoT = get_video_timestamp(userID)
    spliced_data = []
    # load the biosignal once instead of once per coded interval
    with open(bio_filepath) as bf:
        data = json.load(bf)
    with open(file) as csvfile:
        csvreader = csv.reader(csvfile)
        next(csvreader) # pass the headings
        for times in csvreader:
            # columns 2 and 3 hold the begin and end of the interval as elapsed video time
            begin = elapsed_to_seconds(times[2])
            end = elapsed_to_seconds(times[3])
            vT_begin = videoT + begin
            vT_end = videoT + end
            index_begin = get_adjusted_index(vT_begin, bioname, userID)
            index_end = get_adjusted_index(vT_end, bioname, userID)
            # the slice excludes the sample at index_end
            spliced_data.extend(data["data"][index_begin:index_end])
    # samples are stored as strings, so convert them to floats before averaging
    spliced_data = list(map(float, spliced_data))
    if not spliced_data:
        raise ValueError("No samples found for stage: " + stage)
    return sum(spliced_data) / len(spliced_data)

# Average HR and EDA for each stage (user 25)
HR_GETSTART = average("HR", "Getting Started", 25)
EDA_GETSTART = average("EDA", "Getting Started", 25)
HR_SUCCESS = average("HR", "Success", 25)
EDA_SUCCESS = average("EDA", "Success", 25)
HR_ENCOUNTERDIFF = average("HR", "Encountering Difficulties", 25)
EDA_ENCOUNTERDIFF = average("EDA", "Encountering Difficulties", 25)
HR_DEALDIFF = average("HR", "Dealing with Difficulties", 25)
EDA_DEALDIFF = average("EDA", "Dealing with Difficulties", 25)

print("Average HR for Getting Started:")
print(HR_GETSTART)
print("Average EDA for Getting Started:")
print(EDA_GETSTART)
print("Average HR for Success:")
print(HR_SUCCESS)
print("Average EDA for Success:")
print(EDA_SUCCESS)  
print("Average HR for Encountering Difficulties:")
print(HR_ENCOUNTERDIFF)
print("Average EDA for Encountering Difficulties:")
print(EDA_ENCOUNTERDIFF)
print("Average HR for Dealing with Difficulties:")
print(HR_DEALDIFF)
print("Average EDA for Dealing with Difficulties:")
print(EDA_DEALDIFF)

    

Average HR for Getting Started:
70.63698598130843
Average EDA for Getting Started:
2.176238580023365
Average HR for Success:
95.77905405405407
Average EDA for Success:
2.7984748547297285
Average HR for Encountering Difficulties:
110.53377777777781
Average EDA for Encountering Difficulties:
3.026694483333332
Average HR for Dealing with Difficulties:
94.03788321167876
Average EDA for Dealing with Difficulties:
2.8561579133211685
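
The averaging above boils down to converting each coded interval into a slice of the sample array and pooling the slices. The sketch below shows that core on in-memory data, without any file access; `mean_over_intervals` is a hypothetical name and the signal is made up.

```python
def mean_over_intervals(samples, intervals, offset_seconds, sampling_frequency):
    """Average the samples that fall inside any (begin, end) interval.

    Intervals are in seconds relative to the video start; offset_seconds is
    the video start minus the biosignal start.
    """
    selected = []
    for begin, end in intervals:
        i0 = int((offset_seconds + begin) * sampling_frequency)
        i1 = int((offset_seconds + end) * sampling_frequency)
        # Clamp at 0 so a negative index cannot wrap around to the end of the list.
        selected.extend(samples[max(i0, 0):max(i1, 0)])
    if not selected:
        raise ValueError("no samples fall inside the given intervals")
    return sum(selected) / len(selected)

# Made-up 1 Hz signal: sample i has value i.
signal = [float(i) for i in range(100)]
print(mean_over_intervals(signal, [(0, 5), (10, 15)],
                          offset_seconds=20, sampling_frequency=1))  # 27.0
```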


## Finer Biosignal Analysis

We generate "cropped" biosignal csv files for each biosignal at each stage. For now this yields 8 files (2 biosignals × 4 stages).


Naming convention:

GS - Getting Started

S - Success

ED - Encountering Difficulties

DD - Dealing with Difficulties

e.g: ED_HR.csv, ED_EDA.csv

Each csv file has a single column containing all biosignal samples recorded during that stage.
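
Because the files have one value per row and no header, reading them back takes only a few lines. A minimal sketch that writes a small made-up file in the same shape to a temporary directory and reads it back; `read_single_column` is a hypothetical helper.

```python
import csv
import os
import tempfile

def read_single_column(path):
    """Read a one-column csv of numbers into a list of floats."""
    with open(path, newline="") as f:
        return [float(row[0]) for row in csv.reader(f) if row]

# Write a made-up file in the single-column shape, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ED_HR.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([[v] for v in (101.5, 103.0, 99.5)])
    values = read_single_column(path)

print(len(values), sum(values) / len(values))
```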

In [5]:
CROPPED_ROOT = "data/cropped/"
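# Write the values in mylist to CROPPED_ROOT/<filename>.csv, one value per row
def generate_csv_at_cropped(filename, mylist):
    name = CROPPED_ROOT + filename + ".csv"
    # newline='' prevents the csv module from writing blank rows on Windows
    with open(name, 'w', newline='') as csvfile:
        wr = csv.writer(csvfile)
        for v in mylist:
            wr.writerow([v])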

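# Map a stage name to its abbreviation (see the naming convention above)
STAGE_ABBREVIATIONS = {
    "Getting Started": "GS",
    "Success": "S",
    "Encountering Difficulties": "ED",
    "Dealing with Difficulties": "DD",
}

def shorten(stage):
    if stage not in STAGE_ABBREVIATIONS:
        # fail here instead of returning None, which would break the filename later
        raise ValueError("No match for this stage: " + stage)
    return STAGE_ABBREVIATIONS[stage]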

# Save all biosignal samples recorded during a stage to a cropped csv file
def splice(bioname, stage, userID):
    file = CODED_ROOT + str(userID) + "-" + stage + ".csv"
    bio_filepath = USER_ROOT + bioname + ".json"
    videoT = get_video_timestamp(userID)
    spliced_data = []
    # load the biosignal once instead of once per coded interval
    with open(bio_filepath) as bf:
        data = json.load(bf)
    with open(file) as csvfile:
        csvreader = csv.reader(csvfile)
        next(csvreader) # pass the headings
        for times in csvreader:
            begin = elapsed_to_seconds(times[2])
            end = elapsed_to_seconds(times[3])
            vT_begin = videoT + begin
            vT_end = videoT + end
            index_begin = get_adjusted_index(vT_begin, bioname, userID)
            index_end = get_adjusted_index(vT_end, bioname, userID)
            # the slice excludes the sample at index_end
            spliced_data.extend(data["data"][index_begin:index_end])
    # samples are stored as strings, so convert them to floats
    spliced_data = list(map(float, spliced_data))
    filename = shorten(stage) + "_" + bioname
    generate_csv_at_cropped(filename, spliced_data)
    
splice("HR", "Encountering Difficulties", 25)
splice("EDA", "Encountering Difficulties", 25)
splice("HR", "Getting Started", 25)
splice("EDA", "Getting Started", 25)
splice("HR", "Success", 25)
splice("EDA", "Success", 25)
splice("HR", "Dealing with Difficulties", 25)
splice("EDA", "Dealing with Difficulties", 25)


# generate_csv_at_cropped("dog")

## Experiment with Tau Value

In [6]:
TAU = 18  # seconds
# Calculate the average biosignal value during a stage, with every interval delayed by TAU seconds
def average_shifted(bioname, stage, userID):
    file = CODED_ROOT + str(userID) + "-" + stage + ".csv"
    bio_filepath = USER_ROOT + bioname + ".json"
    videoT = get_video_timestamp(userID)
    spliced_data = []
    # load the biosignal once instead of once per coded interval
    with open(bio_filepath) as bf:
        data = json.load(bf)
    with open(file) as csvfile:
        csvreader = csv.reader(csvfile)
        next(csvreader) # pass the headings
        for times in csvreader:
            begin = elapsed_to_seconds(times[2]) + TAU
            end = elapsed_to_seconds(times[3]) + TAU
            vT_begin = videoT + begin
            vT_end = videoT + end
            index_begin = get_adjusted_index(vT_begin, bioname, userID)
            index_end = get_adjusted_index(vT_end, bioname, userID)
            # the slice excludes the sample at index_end
            spliced_data.extend(data["data"][index_begin:index_end])
    # samples are stored as strings, so convert them to floats before averaging
    spliced_data = list(map(float, spliced_data))
    if not spliced_data:
        raise ValueError("No samples found for stage: " + stage)
    return sum(spliced_data) / len(spliced_data)

HR_GETSTART_S = average_shifted("HR", "Getting Started", 25)
EDA_GETSTART_S = average_shifted("EDA", "Getting Started", 25)
HR_SUCCESS_S = average_shifted("HR", "Success", 25)
EDA_SUCCESS_S = average_shifted("EDA", "Success", 25)
HR_ENCOUNTERDIFF_S = average_shifted("HR", "Encountering Difficulties", 25)
EDA_ENCOUNTERDIFF_S = average_shifted("EDA", "Encountering Difficulties", 25)
HR_DEALDIFF_S = average_shifted("HR", "Dealing with Difficulties", 25)
EDA_DEALDIFF_S = average_shifted("EDA", "Dealing with Difficulties", 25)
print("Average HR for Getting Started:")
print(HR_GETSTART_S)
print("Average EDA for Getting Started:")
print(EDA_GETSTART_S)
print("Average HR for Success:")
print(HR_SUCCESS_S)
print("Average EDA for Success:")
print(EDA_SUCCESS_S)  
print("Average HR for Encountering Difficulties:")
print(HR_ENCOUNTERDIFF_S)
print("Average EDA for Encountering Difficulties:")
print(EDA_ENCOUNTERDIFF_S)
print("Average HR for Dealing with Difficulties:")
print(HR_DEALDIFF_S)
print("Average EDA for Dealing with Difficulties:")
print(EDA_DEALDIFF_S)


Average HR for Getting Started:
70.42018691588785
Average EDA for Getting Started:
2.172140256425234
Average HR for Success:
97.45864864864866
Average EDA for Success:
2.8615414493243243
Average HR for Encountering Difficulties:
108.91074074074078
Average EDA for Encountering Difficulties:
2.997055990740741
Average HR for Dealing with Difficulties:
94.45558394160587
Average EDA for Dealing with Difficulties:
2.9017885857664303
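
To see what the shift changes, the unshifted and shifted averages printed in this notebook can be put side by side. The sketch below only copies the printed values and subtracts them; it performs no new measurement.

```python
# Averages copied from the two outputs in this notebook: (unshifted, shifted by TAU = 18 s).
averages = {
    ("HR", "Getting Started"): (70.63698598130843, 70.42018691588785),
    ("EDA", "Getting Started"): (2.176238580023365, 2.172140256425234),
    ("HR", "Success"): (95.77905405405407, 97.45864864864866),
    ("EDA", "Success"): (2.7984748547297285, 2.8615414493243243),
    ("HR", "Encountering Difficulties"): (110.53377777777781, 108.91074074074078),
    ("EDA", "Encountering Difficulties"): (3.026694483333332, 2.997055990740741),
    ("HR", "Dealing with Difficulties"): (94.03788321167876, 94.45558394160587),
    ("EDA", "Dealing with Difficulties"): (2.8561579133211685, 2.9017885857664303),
}

# Positive change: the shifted average is higher than the unshifted one.
changes = {key: shifted - base for key, (base, shifted) in averages.items()}
for (signal, stage), delta in changes.items():
    print(f"{signal:>3} {stage:<26} {delta:+.3f}")
```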


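## Tau Value Experiment 2

For every coded occurrence of a stage, we save the biosignal from 5 seconds before the occurrence begins until it ends as a separate csv file in `data/TAU`.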

In [8]:
TAU_ROOT = "data/TAU/"
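# Write the values in mylist to TAU_ROOT/<filename>.csv, one value per row
def generate_csv_at_TAU(filename, mylist):
    name = TAU_ROOT + filename + ".csv"
    # newline='' prevents the csv module from writing blank rows on Windows
    with open(name, 'w', newline='') as csvfile:
        wr = csv.writer(csvfile)
        for v in mylist:
            wr.writerow([v])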

# For every coded occurrence of a stage, save the biosignal around it as its own csv file
def splice_TAU(bioname, stage, userID):
    file = CODED_ROOT + str(userID) + "-" + stage + ".csv"
    bio_filepath = USER_ROOT + bioname + ".json"
    videoT = get_video_timestamp(userID)
    shortened = shorten(stage)
    # load the biosignal once instead of once per coded occurrence
    with open(bio_filepath) as bf:
        data = json.load(bf)
    with open(file) as csvfile:
        csvreader = csv.reader(csvfile)
        next(csvreader) # pass the headings
        for index, times in enumerate(csvreader):
            begin = elapsed_to_seconds(times[2])
            end = elapsed_to_seconds(times[3])
            vT_begin = videoT + begin - 5 # include the 5 seconds of biosignal before the stimulus
            vT_end = videoT + end
            index_begin = get_adjusted_index(vT_begin, bioname, userID)
            index_end = get_adjusted_index(vT_end, bioname, userID)
            # samples are stored as strings, so convert them to floats
            spliced_data = list(map(float, data["data"][index_begin:index_end]))
            # one csv file per occurrence, numbered in the order of the coded file
            filename = shortened + "_" + bioname + "_" + str(index)
            generate_csv_at_TAU(filename, spliced_data)

splice_TAU("EDA", "Encountering Difficulties", 25)
splice_TAU("HR", "Encountering Difficulties", 25)
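
Because each per-occurrence file starts 5 seconds before the coded onset, its first `5 * sampling_frequency` rows form a pre-stimulus baseline. A minimal sketch of separating baseline from response, with a made-up window; `split_baseline` and `mean` are hypothetical helpers.

```python
PRE_STIMULUS_SECONDS = 5  # the 5-second lead used when cutting each window above

def split_baseline(window, sampling_frequency):
    """Split one occurrence window into (baseline, response) sample lists."""
    n_baseline = PRE_STIMULUS_SECONDS * sampling_frequency
    return window[:n_baseline], window[n_baseline:]

def mean(values):
    return sum(values) / len(values)

# Made-up 1 Hz heart-rate window: 5 s of baseline followed by 4 s after onset.
window = [80.0, 81.0, 80.0, 79.0, 80.0, 90.0, 95.0, 97.0, 98.0]
baseline, response = split_baseline(window, sampling_frequency=1)
print(mean(baseline), mean(response), mean(response) - mean(baseline))  # 80.0 95.0 15.0
```

Comparing the response against its own baseline, rather than against a global average, is one way to judge how long the signal takes to react after the onset.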
