# How to check the consistency between two code sets
This notebook demonstrates how to use the emocodes library to measure how closely two sets of codes agree.

**Note**: the files used in this notebook have already been validated and processed (converted to timeseries format).

## Use Case 1: Check consistency between two coders and combine their codes into a single set for analysis

In [8]:
import emocodes as ec
import pandas as pd

coder1 = '/Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/processed/AHKJ_S1E2/subjective_character_Clover_timeseries_DB_20210705-163848.csv'
coder2 = '/Users/catcamacho/Box/CCP/EmoCodes_project/reliability_data/processed/AHKJ_S1E2/subjective_character_Clover_timeseries_RK_20210705-163849.csv'

# read the csvs
coder1_df = pd.read_csv(coder1, index_col=0)
coder2_df = pd.read_csv(coder2, index_col=0)

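# run the consensus class
# the labels in rater_list are assumed to follow the order of the DataFrames:
# coder2 is the RK file and coder1 is the DB file
results = ec.Consensus().interrater_consensus([coder2_df, coder1_df], rater_list=['RK', 'DB'])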

In [9]:
results.consensus_scores

,RatingsA,RatingsB,ColumnVariable,PercentOverlap
0_0,RK,DB,char_intensity,88.897546
0_1,RK,DB,char_valence_negative,70.440203
0_2,RK,DB,char_valence_positive,87.245812
0_3,RK,DB,on_screen,98.441761
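
To make the `PercentOverlap` column more concrete, here is a minimal sketch of one way such a score could be computed with pandas alone. It is not taken from the emocodes source; the function name `percent_overlap` and the toy data are hypothetical, and the library's actual calculation may differ (for example in how it handles missing values).

```python
import pandas as pd


def percent_overlap(a: pd.DataFrame, b: pd.DataFrame) -> pd.Series:
    """Percentage of shared timepoints where two raters gave the same value, per column."""
    # Keep only the timepoints and columns present in both DataFrames.
    a, b = a.align(b, join="inner")
    # Assumption: two missing values at the same timepoint count as agreement.
    same = a.eq(b) | (a.isna() & b.isna())
    return same.mean() * 100


# Toy data, made up for illustration (not from the files used in this notebook).
idx = pd.Index([100, 200, 300, 400], name="time")
rater_a = pd.DataFrame({"on_screen": [0, 1, 1, 1]}, index=idx)
rater_b = pd.DataFrame({"on_screen": [0, 1, 0, 1]}, index=idx)

overlap = percent_overlap(rater_a, rater_b)
print(overlap)
```

In this toy example the raters agree at three of the four timepoints, so `on_screen` scores 75 percent.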


In [11]:
# save the mismatch report, then display it
results.mismatch_segments.to_csv('RK_DB_mismatch_report.csv')
results.mismatch_segments

,variable,mismatch_onset,mismatch_offset,rater1,rater2
0_0,char_intensity,25600,25600,RK,DB
0_1,char_intensity,40400,40600,RK,DB
0_2,char_intensity,232100,232300,RK,DB
0_3,char_intensity,241400,241400,RK,DB
0_4,char_intensity,243700,243800,RK,DB
...,...,...,...,...,...
3_16,char_valence_negative,1029500,1030900,RK,DB
3_17,char_valence_negative,1040300,1040700,RK,DB
3_18,char_valence_negative,1074700,1074800,RK,DB
3_19,char_valence_negative,1078900,1174800,RK,DB
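
The mismatch report lists each stretch of time where the two raters disagree. As a rough illustration of the idea, the sketch below finds contiguous runs of disagreement between two rating series using pandas alone. The function `find_mismatch_segments` and the toy data are hypothetical and are not the emocodes implementation; only the `mismatch_onset` and `mismatch_offset` column names are borrowed from the report above.

```python
import pandas as pd


def find_mismatch_segments(a: pd.Series, b: pd.Series) -> pd.DataFrame:
    """Onset and offset time of each contiguous run where two raters differ."""
    # Compare only the timepoints both raters coded.
    a, b = a.align(b, join="inner")
    # Assumption: two missing values at the same timepoint are not a mismatch.
    differs = a.ne(b) & ~(a.isna() & b.isna())
    # A new run starts wherever the mismatch flag changes value.
    run_id = differs.astype(int).diff().ne(0).cumsum()
    times = differs.index.to_series()
    # Within each mismatching run, the first and last times are onset and offset.
    runs = times[differs].groupby(run_id[differs]).agg(["first", "last"])
    runs.columns = ["mismatch_onset", "mismatch_offset"]
    return runs.reset_index(drop=True)


# Toy data, made up for illustration.
idx = pd.Index([100, 200, 300, 400, 500, 600], name="time")
toy_rk = pd.Series([0, 1, 1, 0, 0, 1], index=idx)
toy_db = pd.Series([0, 0, 1, 1, 1, 1], index=idx)

segments = find_mismatch_segments(toy_rk, toy_db)
print(segments)
```

A disagreement confined to a single timepoint produces a row whose onset equals its offset, which matches the shape of rows such as the first one in the report above.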


If we are happy with the overlap, we can average the two coders' ratings at each timepoint to create one final version:

In [14]:
combined_codes = pd.concat([coder1_df, coder2_df])
combined_codes.head()

time,char_intensity,char_valence_negative,char_valence_positive,on_screen
100,0,,,0.0
200,0,,,0.0
300,0,,,0.0
400,0,,,0.0
500,0,,,0.0


In [15]:
averaged_codes = combined_codes.groupby('time').mean()
averaged_codes.head()

time,char_intensity,char_valence_negative,char_valence_positive,on_screen
100,0.0,,,0.0
200,0.0,,,0.0
300,0.0,,,0.0
400,0.0,,,0.0
500,0.0,,,0.0


In [16]:
averaged_codes.to_csv('final_codes.csv')
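
To see what the concatenate-then-average step does where the coders disagree, here is a small self-contained sketch on made-up data. The variable names (`toy_a`, `toy_b`, `disputed`) are hypothetical; the pandas calls are the same `pd.concat` and `groupby('time').mean()` used above.

```python
import pandas as pd

# Toy ratings from two coders on a shared time index (made up for illustration).
idx = pd.Index([100, 200, 300], name="time")
toy_a = pd.DataFrame({"on_screen": [0, 1, 1]}, index=idx)
toy_b = pd.DataFrame({"on_screen": [0, 1, 0]}, index=idx)

# Stack the two sets of ratings, then average the rows that share a timepoint.
stacked = pd.concat([toy_a, toy_b])
toy_avg = stacked.groupby("time").mean()
print(toy_avg)

# For 0/1 codes, a fractional average marks a timepoint where the coders disagreed.
disputed = toy_avg[(toy_avg["on_screen"] % 1) != 0]
print(disputed)
```

For binary codes the average is 0.5 wherever the two coders disagree, so it may be worth deciding how to treat those timepoints (for example, by reviewing them against the mismatch report) before using the final file in an analysis.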