# How to process codes using the EmoCodes Library
This notebook details how to take a codes CSV (exported from Datavyu) and fully process it using the emocodes library.

## 1. Validate the code file

In [1]:
import emocodes as ec

# first assign the code and video file paths to variables (for readability)
code_file = '/Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/raw/AHKJS1E2_objective_codes_DB.csv'
video_file = '/Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/episodes/AHKJ_S1E2.mp4'

# now run the validation class
ec.ValidateTimeSeries().run(code_file, video_file)

<emocodes.processing.codes.ValidateTimeSeries at 0x103c42640>

By default, that command writes a series of reports to the same folder as the code file. The report for this file is reproduced below. Based on this report, I need to double-check the onsets for has_faces and num_chars.

### EmoCodes Code Validation Report

**Datavyu file:** /Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/raw/AHKJS1E2_objective_codes_DB.csv 

**video file:** /Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/episodes/AHKJ_S1E2.mp4 

**Full Report Table**: /Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/raw/AHKJS1E2_objective_codes_DB_report_20211026.csv

**Code labels found**: closeup, collective, has_body, has_faces, has_words, num_chars, time_of_day

#### Timestamps Brief Report 

Please note that the cell numbers are zero-indexed, meaning the count starts at 0, not 1.

| Label | Cells with Bad Onsets | Cells with Bad Offsets | Cells with Bad Durations |
| :---- | :-------------------: | :--------------------: | :----------------------: |
| closeup | None | None | None |
| collective | None | None | None |
| has_body | None | None | None |
| has_faces | 9,25,26,39,41,62,66,68,80,86,95,100,105 | None | None |
| has_words | None | None | None |
| num_chars | 115 | None | None |
| time_of_day | None | None | None |
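To follow up on flagged cells, it can help to pull them out by position. The sketch below uses a toy table in place of a real Datavyu export; the column names `onset` and `offset` and the overlap rule are assumptions for illustration, not necessarily the check that `ValidateTimeSeries` applies.

```python
import pandas as pd

# Toy stand-in for the cells of one coded label (times in ms).
# The column names are hypothetical, not the Datavyu export format.
cells = pd.DataFrame({
    "onset": [0, 1000, 1800, 3000],
    "offset": [1000, 2000, 3000, 4000],
})

# Cell numbers as they might be copied from the "Cells with Bad Onsets" column.
flagged_text = "2"
flagged = [int(i) for i in flagged_text.split(",")]

# The report is zero-indexed, so the numbers map directly onto .iloc positions.
print(cells.iloc[flagged])

# One plausible definition of a bad onset: the cell starts before the
# previous cell has ended.
previous_offset = cells["offset"].shift(1)
overlapping = cells.index[cells["onset"] < previous_offset].tolist()
print(overlapping)
```

Because the report counts cells from 0, no offset correction is needed when indexing with `.iloc`.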

******

#### Values Brief Report 

Please note that the cell numbers are zero-indexed, meaning the count starts at 0, not 1.

| Label | Unique Values | # Empty Cells | List Empty Cells |
| :---- | :-----------: | :-----------: | :--------------: |
| closeup | 0.0,1.0 | 0 | None |
| collective | 0.0,1.0 | 0 | None |
| has_body | 0.0,1.0 | 0 | None |
| has_faces | 0.0,1.0 | 0 | None |
| has_words | 0.0,1.0 | 0 | None |
| num_chars | 0,1,2,3,4,5 | 0 | None |
| time_of_day | 0.0,1.0 | 0 | None |

## 2. Process validated code file
Once the codes and their timestamps/values have been finalized, the next step is to convert the file to a time series. In this example, I sample the data at 1.2 Hz, or one sample roughly every 833 ms, which approximates the 800 ms TR of my fMRI data (an exact match to an 800 ms TR would be 1.25 Hz).
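The conversion between a sampling rate in Hz and a sample spacing in milliseconds is simple arithmetic, sketched below. The variable names are illustrative, and truncating the period to a whole number of milliseconds is an assumption about how the onsets in this example come about, not a documented library behavior.

```python
sampling_rate_hz = 1.2

# Spacing between samples, in milliseconds.
period_ms = 1000 / sampling_rate_hz      # about 833.33 ms

# Sampling rate that would match an 800 ms TR exactly.
tr_ms = 800
rate_for_tr_hz = 1000 / tr_ms            # 1.25 Hz

# Assumed: onsets are multiples of the period truncated to whole milliseconds.
onsets_ms = [i * int(period_ms) for i in range(5)]
print(round(period_ms, 2), rate_for_tr_hz, onsets_ms)
```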

**Note**: The CodeTimeSeries class prints any assumptions it makes when converting the time segments to time series values. In the example below, each label has short periods that were not covered by a code segment, so the class filled in their values using nearest-neighbor interpolation. This feature can be turned off using the "interpolate_gaps=False" argument. Read more here: https://emocodes.readthedocs.io/en/main/autoapi/emocodes/processing/index.html#emocodes.processing.CodeTimeSeries
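To make the idea of nearest-neighbor gap filling concrete, here is a minimal NumPy sketch. The function `fill_gaps_nearest` is hypothetical and written for illustration; it is not the emocodes implementation, and its tie-breaking rule (ties go to the earlier sample) is an assumption.

```python
import numpy as np

def fill_gaps_nearest(values):
    """Replace NaN samples with the value of the nearest non-NaN sample."""
    values = np.asarray(values, dtype=float)
    known = np.flatnonzero(~np.isnan(values))
    if known.size == 0:
        return values.copy()
    positions = np.arange(values.size)
    # Index (into `known`) of the first known sample at or after each position.
    right = np.searchsorted(known, positions).clip(max=known.size - 1)
    left = (right - 1).clip(min=0)
    # Pick whichever known neighbor is closer; ties go to the earlier one.
    use_left = np.abs(positions - known[left]) <= np.abs(known[right] - positions)
    nearest = np.where(use_left, known[left], known[right])
    return values[nearest]

# NaN marks samples that fall outside every coded segment.
series = [0, np.nan, np.nan, 1, 1, np.nan]
filled = fill_gaps_nearest(series)
print(filled)
```

Gaps at the start or end of the series simply take the value of the closest coded sample.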

In [2]:
# first assign the input and output file paths to variables (for readability)
code_file = '/Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/raw/AHKJS1E2_objective_codes_DB.csv'
video_file = '/Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/episodes/AHKJ_S1E2.mp4'
out_file = '/Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/processed/AHKJS1E2_objective_codes_timeseries_DB'

ec.CodeTimeSeries(sampling_rate=1.2).proc_codes_file(code_file, video_file, out_file)

Code time series saved at /Users/catcamacho/Dropbox/Mac/Documents/GitHub/emocodes/examples/AHKJS1E2_objective_codes_timeseries_DB_20211026.csv


Here is what the output looks like:

In [3]:
import pandas as pd

codes = pd.read_csv('/Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/processed/AHKJS1E2_objective_codes_timeseries_DB_20211026.csv', index_col=0)
codes.head()

| onset_ms | closeup | collective | has_body | has_faces | has_words | num_chars | time_of_day |
| :------- | :-----: | :--------: | :------: | :-------: | :-------: | :-------: | :---------: |
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 833 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1666 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 2499 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3332 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
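A quick sanity check on the processed file is to confirm that the onsets are evenly spaced and match the requested sampling rate. The sketch below reads a small inline CSV instead of the real file; the inline data and its two feature columns are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Made-up excerpt with the same layout as the time series shown above.
csv_text = """onset_ms,closeup,num_chars
0,0.0,0.0
833,0.0,0.0
1666,0.0,1.0
2499,1.0,1.0
3332,1.0,2.0
"""
codes = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Differences between consecutive onsets give the sample spacing in ms.
steps_ms = np.diff(codes.index.to_numpy())
is_uniform = bool(np.all(steps_ms == steps_ms[0]))
effective_rate_hz = 1000 / steps_ms[0]
print(is_uniform, round(effective_rate_hz, 3))
```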


## 3. Make a summary report
To better visualize the data, it can be helpful to produce a summary report. These reports can also be used before neuroimaging analysis to gauge, among other things, how well each code is represented across the video and how collinear the coded features are. Below is the report for the example file.

In [4]:
in_file = '/Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/processed/AHKJS1E2_objective_codes_timeseries_DB_20211026.csv'
out_folder = '/Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/processed/report'

ec.SummarizeVideoFeatures().compile(in_file, out_folder)

<emocodes.analysis.features.SummarizeVideoFeatures at 0x103c42580>


# EmoCodes Analysis Summary Report

**in_file:** /Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/processed/AHKJS1E2_objective_codes_timeseries_DB_20211026.csv 

| Feature | Non-Zero | Min Value | Max Value |
| :------ | :------: | :-------: | :-------: |
| closeup | 20.23% | 0.0 | 1.0 |
| collective | 14.72% | 0.0 | 1.0 |
| has_body | 93.39% | 0.0 | 1.0 |
| has_faces | 90.73% | 0.0 | 1.0 |
| has_words | 4.86% | 0.0 | 1.0 |
| num_chars | 94.81% | 0.0 | 5.0 |
| time_of_day | 84.7% | 0.0 | 1.0 |
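The columns of this table can be approximated directly with pandas. The sketch below uses a small made-up DataFrame and illustrative column labels; it shows one way to compute these statistics, not necessarily how `SummarizeVideoFeatures` does it.

```python
import pandas as pd

# Made-up time series with one binary and one count feature.
codes = pd.DataFrame({
    "closeup": [0, 1, 0, 0],
    "num_chars": [0, 2, 5, 1],
})

summary = pd.DataFrame({
    # Share of samples where the feature is non-zero, as a percentage.
    "Non-Zero (%)": (codes != 0).mean() * 100,
    "Min Value": codes.min(),
    "Max Value": codes.max(),
})
print(summary)
```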

******

## Features Included in this Analysis

### Original Features

![feature plots](figs/features_plot.png)

### After HRF convolution (6s peak, 12s undershoot)

![hrf-convolved feature plots](figs/hrf_features_plot.png)
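For readers unfamiliar with HRF convolution, here is a minimal sketch using a double-gamma function. Treating 6 and 12 as gamma shape parameters and using an undershoot ratio of 1/6 (a common SPM-style default) are assumptions on my part; the report does not state the exact parameterization, and the function name is hypothetical.

```python
import math

import numpy as np

def double_gamma_hrf(t, peak=6.0, undershoot=12.0, ratio=1 / 6):
    """Difference of two gamma densities (unit scale) evaluated at times t (s)."""
    t = np.asarray(t, dtype=float)
    response = t ** (peak - 1) * np.exp(-t) / math.gamma(peak)
    dip = t ** (undershoot - 1) * np.exp(-t) / math.gamma(undershoot)
    return response - ratio * dip

sampling_rate_hz = 1.2
t = np.arange(0, 32, 1 / sampling_rate_hz)   # 32 s kernel
hrf = double_gamma_hrf(t)
hrf = hrf / hrf.sum()                        # normalize to unit sum

# Made-up binary feature: "on" for samples 5 through 14.
boxcar = np.zeros(60)
boxcar[5:15] = 1.0

# Convolve and trim back to the original length.
convolved = np.convolve(boxcar, hrf)[: boxcar.size]
print(convolved.round(3)[:20])
```

The convolved series rises and falls more slowly than the boxcar, which is why features that look distinct in the original plots can become more similar after convolution.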

******

## Spearman Correlations

![correlation plots](figs/corr_plot.png)
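Spearman correlation is the Pearson correlation of the ranked data, which is easy to sketch with pandas. The data below are made up, and this is an illustration of the statistic rather than the library's plotting code; `DataFrame.corr(method="spearman")` should give the same matrix.

```python
import pandas as pd

# Made-up features: b increases monotonically with a, c decreases.
codes = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1, 4, 9, 16, 25],
    "c": [5, 4, 3, 2, 1],
})

# Rank each column (ties get average ranks), then take Pearson correlations.
spearman = codes.rank().corr()
print(spearman)
```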

******
## Mean Instantaneous Phase Synchrony

![mean IPS plots](figs/mean_ips_plot.png)
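Instantaneous phase synchrony compares the phase of two signals at every time point. The sketch below derives phases from an FFT-based analytic signal and uses the formula 1 - sin(|phase difference| / 2); both the formula and the function names are assumptions for illustration, since the report does not say how emocodes computes this measure.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (zero out negative frequencies)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def mean_phase_synchrony(a, b):
    """Mean of 1 - sin(|phase difference| / 2): 1 is in phase, 0 is anti-phase."""
    phase_a = np.angle(analytic_signal(a))
    phase_b = np.angle(analytic_signal(b))
    synchrony = 1 - np.sin(np.abs(phase_a - phase_b) / 2)
    return float(synchrony.mean())

# Made-up signal with a whole number of cycles.
n = 120
x = np.sin(2 * np.pi * 5 * np.arange(n) / n)
same = mean_phase_synchrony(x, x)
opposite = mean_phase_synchrony(x, -x)
print(same, opposite)
```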

******
## Variance Inflation Factors

![VIF plots](figs/vif_plot.png)
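The variance inflation factor of a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the others. Equivalently, the VIFs are the diagonal of the inverse correlation matrix, which the sketch below uses. The function name and the random data are illustrative and not taken from emocodes.

```python
import numpy as np

def variance_inflation_factors(features):
    """VIF per column, from the diagonal of the inverse correlation matrix."""
    features = np.asarray(features, dtype=float)
    corr = np.corrcoef(features, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)                 # unrelated to x1
x3 = x1 + 0.1 * rng.normal(size=200)      # nearly a copy of x1

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs.round(2))
```

Here the first and third features inflate each other's VIF because they are nearly collinear, while the unrelated second feature stays close to 1.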
