# Skips and delays analysis

During the study, participants could skip any film clip.
To detect a skip, the duration of the signal recorded during each stimulus was compared with the duration of the original film clip.
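As a rough illustration of that comparison, the duration of a recording can be taken as the time between its first and last sample. The sketch below assumes each sample is a sequence whose first element is a timestamp string in the format used later in this notebook. The helper name `recording_duration` and the sample values are hypothetical.

```python
from datetime import datetime

# Timestamp format of the BVP samples, as used later in this notebook.
TIME_FORMAT = '%Y-%m-%dT%H:%M:%S:%f'


def recording_duration(samples):
    """Return the time between the first and last sample as a timedelta."""
    first = datetime.strptime(samples[0][0], TIME_FORMAT)
    last = datetime.strptime(samples[-1][0], TIME_FORMAT)
    return last - first


# Hypothetical samples: [timestamp, value] pairs with made-up values.
example = [
    ['2020-01-01T10:00:00:000000', 0.1],
    ['2020-01-01T10:00:30:500000', 0.2],
    ['2020-01-01T10:02:00:250000', 0.3],
]
duration = recording_duration(example)
print(duration.total_seconds())
```

Subtracting this duration from the nominal clip length gives the difference that is later checked against the skip and delay tolerances.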

Delays in the stimulus phase were caused by connection errors with the application server.
Delays in the washout phase were caused by the subjects' reaction times.

All data are available in the dataset.

In [1]:
import json
import os
import configparser
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
from pprint import pprint
from datetime import datetime
from helpers import STIMULATION_TIMES

In [2]:
config = configparser.ConfigParser()
config.read("config.ini")

if not os.path.exists(config['DataDirectories']['unzipped_dataset']):
    raise Exception("Please set path for unzipped dataset in config.ini")

DATA_DIR = config['DataDirectories']['unzipped_dataset']

In [3]:
# generate paths as (participant, emotion, file path) tuples
washout_paths, stimuli_paths = [], []

for part in os.listdir(DATA_DIR):
    part_dir = os.path.join(DATA_DIR, part)
    for f_name in os.listdir(part_dir):
        if 'EMPATICA' in f_name:
            f_path = os.path.join(part_dir, f_name)
            if 'WASHOUT' in f_name:
                # files recorded during washouts
                washout_paths.append((part, f_name.split('_')[1], f_path))
            elif 'STIMULUS' in f_name:
                # files recorded during stimuli
                stimuli_paths.append((part, f_name.split('_')[1], f_path))

## Stimuli

A skip (recording shorter than the film clip) means that the participant skipped part of the film clip. The amount of skipped time is stored in the `skips` dictionary. The tolerance for skips was set to 1 s.

A delay (recording longer than the film clip) means that the participant watched the whole film clip, but the questionnaire did not load instantly. The amount of delay time is stored in the `delays` dictionary. The tolerance for delays was set to 5 s.
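The two rules can be sketched as a single classification function. This is only an illustration under stated assumptions: it uses the standard library's `datetime.timedelta` instead of `pd.Timedelta` so that it runs without pandas, and the names `classify`, `SKIP_TOLERANCE` and `DELAY_TOLERANCE` are hypothetical, not part of the notebook.

```python
from datetime import timedelta

# Tolerances taken from the text above: 1 s for skips, 5 s for delays.
SKIP_TOLERANCE = timedelta(seconds=1)
DELAY_TOLERANCE = timedelta(seconds=5)


def classify(expected, observed):
    """Label a recording as 'skip', 'delay' or 'ok'.

    expected: nominal duration of the film clip
    observed: duration of the recorded signal
    Returns the label and the amount of skipped or delayed time.
    """
    diff = expected - observed
    if diff > SKIP_TOLERANCE:
        # recording is shorter than the clip by more than the tolerance
        return 'skip', diff
    if -diff > DELAY_TOLERANCE:
        # recording is longer than the clip by more than the tolerance
        return 'delay', -diff
    return 'ok', timedelta(0)


clip = timedelta(minutes=2)
print(classify(clip, timedelta(seconds=100)))  # 20 s shorter than the clip
print(classify(clip, timedelta(seconds=123)))  # 3 s longer, within tolerance
print(classify(clip, timedelta(seconds=130)))  # 10 s longer than the clip
```

Keeping the two tolerances separate reflects the asymmetry described above: a short overrun is expected while the questionnaire loads, whereas even a small shortfall suggests part of the clip was not watched.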

In [4]:
skip_delta = pd.Timedelta(seconds=1)    # tolerance for skips
delay_delta = -pd.Timedelta(seconds=5)  # tolerance for delays (negative diff)
skips, delays = dict(), dict()

for part, emotion, path in tqdm(stimuli_paths):
    if emotion == 'BASELINE':
        continue
    with open(path, 'r') as f:
        bvp_times = [datetime.strptime(p[0], '%Y-%m-%dT%H:%M:%S:%f') for p in json.load(f)['BVP']]

    # positive diff: recording shorter than the clip; negative diff: longer
    diff = STIMULATION_TIMES[emotion] - (bvp_times[-1] - bvp_times[0])
    if diff > skip_delta:
        # if a skip occurred, add it to skips
        skips.setdefault(part, []).append((emotion, diff))
    elif diff < delay_delta:
        # if a delay occurred, add it to delays
        delays.setdefault(part, []).append((emotion, -diff))


In [5]:
pprint(skips)

{'37': [('FEAR', Timedelta('0 days 00:01:47.015031'))],
 '58': [('AWE', Timedelta('0 days 00:01:06.013337')),
        ('DISGUST', Timedelta('0 days 00:00:04.012696')),
        ('ENTHUSIASM', Timedelta('0 days 00:01:36.014573')),
        ('LIKING', Timedelta('0 days 00:00:28.027452'))],
 '62': [('FEAR', Timedelta('0 days 00:01:22.013886'))]}


In [6]:
delays

{'44': [('ENTHUSIASM', Timedelta('0 days 00:08:28.981870'))]}

## Washouts

A skip (recording shorter than the washout clip) means that only part of the washout was presented. The amount of skipped time is stored in the `skips` dictionary. The tolerance for skips was set to 1 s.

A delay (recording longer than the washout clip) means that the washout phase lasted longer than the washout film clip. In general, such delays are caused by the participant's reaction time, as subjects had to confirm they wanted to watch the next stimulus film clip after the washout ended. The amount of delay time is stored in the `delays` dictionary. The tolerance for delays was set to 5 s.

In [7]:
skip_delta = pd.Timedelta(seconds=1)    # tolerance for skips
delay_delta = -pd.Timedelta(seconds=5)  # tolerance for delays (negative diff)
skips, delays = dict(), dict()

for part, emotion, path in tqdm(washout_paths):
    with open(path, 'r') as f:
        bvp_times = [datetime.strptime(p[0], '%Y-%m-%dT%H:%M:%S:%f') for p in json.load(f)['BVP']]

    # positive diff: recording shorter than the washout clip; negative diff: longer
    diff = STIMULATION_TIMES['WASHOUT'] - (bvp_times[-1] - bvp_times[0])
    if diff > skip_delta:
        # if a skip occurred, add it to skips
        skips.setdefault(part, []).append((emotion, diff))
    elif diff < delay_delta:
        # if a delay occurred, add it to delays
        delays.setdefault(part, []).append((emotion, -diff))


In [8]:
# the shorter washout for participant 48 occurred due to an application malfunction
# after that, the rest of the Empatica data for this participant was lost
pprint(skips)

{'48': [('FEAR', Timedelta('0 days 00:00:32.527246'))]}


In [9]:
# every washout except the FEAR washout of participant 48 (cell above)
# lasted longer than the washout clip by more than the delay tolerance
print('All washouts:', len(washout_paths))
print('Washouts longer than delay tolerance:', sum(len(times_list) for times_list in delays.values()))

All washouts: 422
Washouts longer than delay tolerance: 421


In [10]:
# average washout delay across all participants and washouts
np.mean([t[1] for times_list in delays.values() for t in times_list])

Timedelta('0 days 00:00:08.746948824')
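A mean alone does not show how the delays are spread. As a possible follow-up, the sketch below flattens a dictionary shaped like `delays` into a list of seconds and reports a few summary values. It uses only the standard library. The helper `delay_seconds` and the example values are hypothetical and are not taken from the dataset.

```python
from datetime import timedelta
from statistics import mean, median


def delay_seconds(delays):
    """Flatten {participant: [(emotion, delay), ...]} into a list of seconds."""
    return [
        delay.total_seconds()
        for entries in delays.values()
        for _, delay in entries
    ]


# Made-up example with the same structure as the `delays` dictionary.
example_delays = {
    'A': [('FEAR', timedelta(seconds=6)), ('AWE', timedelta(seconds=8))],
    'B': [('LIKING', timedelta(seconds=13))],
}

seconds = delay_seconds(example_delays)
summary = {
    'mean': mean(seconds),
    'median': median(seconds),
    'max': max(seconds),
}
print(summary)
```

`pd.Timedelta` also provides `total_seconds()`, so the same helper should work on the `delays` dictionary built in this notebook.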