## Data Format

This notebook wrangles data acquired from environment sensors into a common format. It covers three sources:
* Video codes
* IMU data streams
* Empatica biosignals
```
compiled_data = {
    user1: {
        video: {
            web: "adkada/akdlakds/asdkad.webm",
            mp4: "video.mp4",
            codes: "codes.json"
        },
        sensors: {
            iron: {
                imu: "imu.json"
            },
            bio: {
                acc: "acc.json",
                ...
            }
        }
    }
}
```
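
As a rough illustration of how this nested layout can be consumed, the sketch below flattens it into `(user, key path, file path)` rows. The helper name `flatten_compiled` and the sample dictionary are hypothetical; only the nesting itself is taken from the structure above.

```python
def flatten_compiled(compiled):
    """Yield (user, key_path, file_path) for every file path in the nested dict."""
    def walk(node, keys):
        for key, value in node.items():
            if isinstance(value, dict):
                # Descend into nested categories such as sensors -> iron
                yield from walk(value, keys + (key,))
            else:
                yield keys + (key,), value

    for user, entries in compiled.items():
        for key_path, file_path in walk(entries, ()):
            yield user, "/".join(key_path), file_path


# Hypothetical sample mirroring the layout above
sample = {
    "user1": {
        "video": {"mp4": "video.mp4", "codes": "codes.json"},
        "sensors": {"iron": {"imu": "imu.json"}},
    }
}

rows = list(flatten_compiled(sample))
for row in rows:
    print(row)
```

A flat list like this is convenient for checking which files each user has without knowing how deep each category is nested.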

In [262]:
import glob, os, json
import pandas as pd
import numpy as np

# Maps each user name to the paths of that user's data files, grouped by category
compiled_data = {}

def save_jsonfile(fn, data):
    """Serialize `data` as JSON to the path `fn`."""
    with open(fn, 'w') as outfile:
        json.dump(data, outfile)
    print("File saved!", fn)

def append_data(root, directory, category, kind):
    """Register every file matching `root + directory` in compiled_data.

    The user name is the file's base name up to the first dot, and the path
    is stored under compiled_data[user][category][kind].
    """
    for path in glob.glob(root + directory):
        name = os.path.basename(path).split('.')[0]
        # setdefault creates the user and category entries on first use
        compiled_data.setdefault(name, {}).setdefault(category, {})[kind] = "/" + path

## Video Codes
Root directory `VIDEO_ROOT = data/video_data`

### Routine
Processing video obtained from the session should be done as follows:
1. [TODO] Run an ffmpeg batch script to convert the files in `VIDEO_ROOT/raw` to optimized `MP4` and store them in `VIDEO_ROOT/processed`.
2. [TODO] Run an ffmpeg batch script to convert the files in `VIDEO_ROOT/processed` to `WEBM` and store them in `VIDEO_ROOT/web`. These should be at a lower resolution ([TODO] ... 720?).
3. In MaxQDA, code the video with the following code system: `TODO`
4. Activate all codes and export them. Remove redundant columns in Excel and save the result as `VIDEO_ROOT/video_data.csv`.
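
The batch conversion in step 1 is still a TODO. One possible shape for it is sketched below: list the raw files, skip those that already have a converted counterpart, and build an ffmpeg command for the rest. The function name `plan_conversions`, the encoder settings (`libx264`, `-crf 23`, `aac`), and the demo file names are assumptions, not settings used by this project.

```python
import os
import shlex
import tempfile


def plan_conversions(raw_dir, out_dir, ext=".mp4"):
    """Return (source, target, command) for each raw file without a converted target."""
    plans = []
    for filename in sorted(os.listdir(raw_dir)):
        source = os.path.join(raw_dir, filename)
        if not os.path.isfile(source):
            continue
        stem = os.path.splitext(filename)[0]
        target = os.path.join(out_dir, stem + ext)
        if os.path.exists(target):
            continue  # already processed, skip
        # Assumed encoder settings; tune them for the actual footage
        command = ["ffmpeg", "-i", source,
                   "-c:v", "libx264", "-crf", "23", "-c:a", "aac", target]
        plans.append((source, target, command))
    return plans


# Demonstration on a throwaway directory tree (file names are made up)
with tempfile.TemporaryDirectory() as root:
    raw_dir = os.path.join(root, "raw")
    mp4_dir = os.path.join(root, "processed")
    os.makedirs(raw_dir)
    os.makedirs(mp4_dir)
    for name in ("cesar.mov", "molly.mov"):
        open(os.path.join(raw_dir, name), "w").close()
    open(os.path.join(mp4_dir, "molly.mp4"), "w").close()  # pretend molly is done

    plans = plan_conversions(raw_dir, mp4_dir)
    pending = [os.path.basename(source) for source, _, _ in plans]
    print(pending)
    for _, _, command in plans:
        print(shlex.join(command))  # pass `command` to subprocess.run to execute it
```

Building the commands separately from running them makes it easy to review what would be converted before spending time on encoding.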


```
brew install ffmpeg --with-libvpx --with-vorbis --with-libvorbis --with-vpx --with-theora --with-libogg --with-gpl --with-version3 --with-nonfree --with-postproc --with-libaacplus --with-libass --with-libcelt --with-libfaac --with-libfdk-aac --with-libfreetype --with-libmp3lame --with-libopencore-amrnb --with-libopencore-amrwb --with-libopenjpeg --with-openssl --with-libopus --with-libschroedinger --with-libspeex --with-libtheora --with-libvo-aacenc --with-libx264 --with-libxvid
```

Use [ffmpeg](https://gist.github.com/clayton/6196167) to convert videos to WebM:
```
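ffmpeg -i cesar.mp4 -c:v libvpx-vp9 -pass 1 -b:v 0 -crf 33 -threads 8 -speed 2 -tile-columns 6 -frame-parallel 1 -auto-alt-ref 1 -lag-in-frames 25 -an -f null /dev/null && \
ffmpeg -i cesar.mp4 -c:v libvpx-vp9 -pass 2 -b:v 0 -crf 33 -threads 8 -speed 2 -tile-columns 6 -frame-parallel 1 -auto-alt-ref 1 -lag-in-frames 25 -f webm cesar.webm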
```

In [263]:
VIDEO_ROOT = "data/video_data/"
WEB = "web/*.webm"
RAW = "raw/*"
MP4 = "mp4/*.mp4"

def process_video(raw, mp4):
    # TODO: skip files that are already processed
    # TODO: run the ffmpeg script
    pass
    
append_data(VIDEO_ROOT, RAW, "video", "raw")
append_data(VIDEO_ROOT, MP4, "video", "mp4")
append_data(VIDEO_ROOT, WEB, "video", "web")

### MaxQDA codes

In [279]:
MAXQDA_OUTPUT="codes.csv"
CODES_ROOT="codes/"
file = VIDEO_ROOT + CODES_ROOT + MAXQDA_OUTPUT
print(file)
df = pd.read_csv(file)

# Express code times as whole seconds relative to the start of the first code
start = pd.to_datetime(df['Begin'])
end = pd.to_datetime(df['End'])
t0 = start[0]
df['t_i'] = (start - t0).dt.total_seconds().astype(int) # Don't need millisecond precision for hand-coded codes
df['t_f'] = (end - t0).dt.total_seconds().astype(int)
df = df.drop(columns=['Begin', 'End'])  # drop returns a new frame, so keep the result

# Group coded segments by user; the first two columns hold the user and the code path
data = {}
for index, row in df.iterrows():
    user = row.iloc[0]
    codes = row.iloc[1].split("\\")
    data.setdefault(user, []).append({
        'codes': codes,
        'start': int(row['t_i']),
        'end': int(row['t_f'])
    })
   

for user in data:
    file = VIDEO_ROOT + CODES_ROOT + user + ".json"
#     print(file)
    save_jsonfile(file, data[user])
append_data(VIDEO_ROOT, CODES_ROOT + "*.json", "video", "codes")

data/video_data/codes/codes.csv
File saved! data/video_data/codes/kevin.json
File saved! data/video_data/codes/chris.json
File saved! data/video_data/codes/cesar.json
File saved! data/video_data/codes/molly.json
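
To make the time conversion in the cell above concrete, here is a small worked example on made-up rows. The `Begin` and `End` column names come from the cell; the sample values and the `%H:%M:%S.%f` time format are assumptions about what the MaxQDA export looks like.

```python
import pandas as pd

# Hypothetical coded segments; real exports may use a different time format
df = pd.DataFrame({
    "Begin": ["00:00:05.3", "00:00:12.9", "00:01:00.0"],
    "End":   ["00:00:10.0", "00:00:20.5", "00:01:30.2"],
})

fmt = "%H:%M:%S.%f"  # assumed format of the Begin/End columns
start = pd.to_datetime(df["Begin"], format=fmt)
end = pd.to_datetime(df["End"], format=fmt)

# Offsets are measured from the start of the first coded segment
t0 = start.iloc[0]
df["t_i"] = (start - t0).dt.total_seconds().astype(int)  # truncates to whole seconds
df["t_f"] = (end - t0).dt.total_seconds().astype(int)

print(df[["t_i", "t_f"]])
```

Note that the origin is the first segment's start, not the start of the video, so a first code that begins at 5.3 s maps to `t_i = 0`.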


## IMU Data

In [280]:
# Gather sensor files
SENSOR_ROOT = "data/sensor_data/"
IMU = "*.json"
append_data(SENSOR_ROOT, IMU, "iron", "imu")

## Bio Data

The `.csv` files in this archive use the following format unless noted otherwise:
The first row is the initial time of the session expressed as unix timestamp in UTC.
The second row is the sample rate expressed in Hz.
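
A minimal sketch of reading this layout with the standard library follows. It assumes that the first sample is taken at the initial time and that samples are evenly spaced at the stated rate; the helper name `parse_sensor_csv` and the sample values are made up for illustration.

```python
import csv
import io


def parse_sensor_csv(text):
    """Split a sensor CSV into start time, sample rate, samples, and sample times."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    start_time = float(rows[0][0])    # first row: Unix timestamp (UTC)
    sample_rate = float(rows[1][0])   # second row: sample rate in Hz
    samples = [[float(value) for value in row] for row in rows[2:]]
    # Assumption: sample i was taken i / sample_rate seconds after the start
    times = [start_time + i / sample_rate for i in range(len(samples))]
    return start_time, sample_rate, samples, times


# Hypothetical single-column file sampled at 4 Hz
example = "1500000000.00\n4.000000\n30.1\n30.2\n30.3\n"
start_time, sample_rate, samples, times = parse_sensor_csv(example)
print(start_time, sample_rate)
print(list(zip(times, samples)))
```

Multi-column files such as `ACC.csv` repeat the timestamp and sample rate in every column, so reading only the first cell of each header row is enough.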

### temp.csv
Data from the temperature sensor, expressed in degrees Celsius (°C).

### EDA.csv
Data from the electrodermal activity sensor expressed as microsiemens (μS).

### BVP.csv
Data from photoplethysmograph.

### ACC.csv
Data from 3-axis accelerometer sensor. The accelerometer is configured to measure acceleration in the range [-2g, 2g]. Therefore the unit in this file is 1/64g.
Data from the x, y, and z axes are in the first, second, and third columns, respectively.
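
Since one unit in this file is 1/64 g, dividing a raw value by 64 gives the acceleration in g. The sketch below applies that to one sample and also computes the magnitude; the function name `acc_to_g` and the sample values are hypothetical.

```python
import math


def acc_to_g(x, y, z):
    """Convert one raw accelerometer sample (units of 1/64 g) to g, plus its magnitude."""
    gx, gy, gz = x / 64.0, y / 64.0, z / 64.0
    magnitude = math.sqrt(gx * gx + gy * gy + gz * gz)
    return gx, gy, gz, magnitude


# A raw reading of 64 on a single axis corresponds to 1 g on that axis
print(acc_to_g(64, 0, 0))
print(acc_to_g(0, -32, 0))
```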

### IBI.csv
Time between individual heartbeats, extracted from the BVP signal.
No sample rate is needed for this file.
The first column is the time (relative to the initial time) of the detected inter-beat interval, expressed in seconds (s).
The second column is the duration in seconds (s) of the detected inter-beat interval (i.e., the distance in seconds from the previous beat).
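
Each inter-beat interval can be turned into an instantaneous heart rate with `60 / duration`. The sketch below does this for a few made-up rows; the function name `ibi_to_bpm` is hypothetical, and this per-beat rate is not necessarily how the averaged values in `HR.csv` are produced.

```python
def ibi_to_bpm(ibi_rows):
    """Map (time, duration) rows to (time, beats per minute)."""
    return [(time, 60.0 / duration) for time, duration in ibi_rows]


# Hypothetical rows: seconds since session start, interval duration in seconds
rows = [(1.0, 0.75), (1.75, 0.75), (2.75, 1.0), (3.25, 0.5)]
bpm = ibi_to_bpm(rows)
print(bpm)
```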

### HR.csv
Average heart rate extracted from the BVP signal.
The first row is the initial time of the session expressed as unix timestamp in UTC.
The second row is the sample rate expressed in Hz.

### tags.csv
Event mark times.
Each row corresponds to a physical button press on the device, which is the moment the status LED first illuminates.
The time is expressed as a unix timestamp in UTC and it is synchronized with initial time of the session indicated in the related data files from the corresponding session.
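
Because tag times and session start times share the same Unix clock, subtracting the session's initial time gives each tag's offset into the recording. A minimal sketch, with a hypothetical function name and made-up timestamps:

```python
def tag_offsets(tag_times, session_start):
    """Return each tag's offset in seconds from the start of the session."""
    return [tag - session_start for tag in tag_times]


# Hypothetical values: session start from a data file, tags from tags.csv
session_start = 1500000000.0
tags = [1500000012.5, 1500000090.0]
offsets = tag_offsets(tags, session_start)
print(offsets)
```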


In [281]:
BIO_ROOT = "data/bio_data/"

videos = glob.glob(BIO_ROOT + "*")

def grab_and_save_data(user, sensor, name, file, df, columns):
    # read_csv consumed the first CSV row (the session start time) as the header,
    # so the timestamp is the first column label and the sample rate is the first data row
    timestamp = df.columns[0]
    sampling_rate = int(float(df.iloc[0, 0]))
    df = df.iloc[1:]
    data = {
        'name': name,
        'timestamp': int(float(timestamp)),
        'sampling_rate': sampling_rate
    }

    # Store each data column under its label
    for i, column in enumerate(columns):
        data[column] = df.iloc[:, i].values.tolist()

    save_jsonfile(file, data)
    # setdefault creates the user and 'bio' entries on first use
    compiled_data.setdefault(user, {}).setdefault('bio', {})[sensor] = "/" + file
        
for session in videos:  # one directory per user under BIO_ROOT
    bio_data = glob.glob(session + "/*.csv")
    user = os.path.basename(session)
    for sensor in bio_data:
        sensor_name = os.path.basename(sensor).split('.')[0].lower()
        print(sensor)
        try:
            df = pd.read_csv(sensor)
        except pd.errors.EmptyDataError:
            print("Empty file:", sensor)
            continue

        # Each CSV file is converted into a JSON file alongside it
        file = session + "/" + sensor_name + ".json"

        # Sensor-specific parsing; tags and any other files are not converted yet
        if sensor_name == "temp":
            grab_and_save_data(user, sensor_name, "Temperature (C)", file, df, ['celsius'])
        elif sensor_name == "acc":
            grab_and_save_data(user, sensor_name, "3-Axis Accelerometer (1/64g)", file, df, ['x', 'y', 'z'])
        elif sensor_name == "eda":
            grab_and_save_data(user, sensor_name, "Electrodermal Activity (μS)", file, df, ['mag'])
        elif sensor_name == "bvp":
            grab_and_save_data(user, sensor_name, "Blood Volume Pulse (BVP) from PPG", file, df, ['mag'])
        elif sensor_name == "hr":
            grab_and_save_data(user, sensor_name, "Heart rate", file, df, ['mag'])
            


## Save compiled data as a JSON

In [282]:
import json
COMPILED_DATA = "data/compiled.json"
save_jsonfile(COMPILED_DATA, compiled_data)
print(json.dumps(compiled_data, indent=2))

File saved! data/compiled.json
{
  "chris": {
    "video": {
      "mp4": "/data/video_data/mp4/chris.mp4",
      "codes": "/data/video_data/codes/chris.json"
    },
    "iron": {
      "imu": "/data/sensor_data/chris.json"
    }
  },
  "cesar": {
    "video": {
      "mp4": "/data/video_data/mp4/cesar.mp4",
      "codes": "/data/video_data/codes/cesar.json"
    },
    "iron": {
      "imu": "/data/sensor_data/cesar.json"
    }
  },
  "kevin": {
    "video": {
      "mp4": "/data/video_data/mp4/kevin.mp4",
      "codes": "/data/video_data/codes/kevin.json"
    },
    "iron": {
      "imu": "/data/sensor_data/kevin.json"
    }
  },
  "molly": {
    "video": {
      "mp4": "/data/video_data/mp4/molly.mp4",
      "codes": "/data/video_data/codes/molly.json"
    },
    "iron": {
      "imu": "/data/sensor_data/molly.json"
    }
  }
}


### Copy to Rails App

In [283]:
# Run the transfer script; large video files must not be copied
import subprocess
print("start")
# call() waits for the script and returns its exit code (0 on success)
output = subprocess.call(["bash", "transfer.sh"])
print(output)
print("end")

start
0
end
