# Homework 1

Matheus Schmitz  
USC ID: 5039286453  
mschmitz@usc.edu   

### Imports

In [1]:
import numpy as np
import pandas as pd
pd.options.display.max_columns = 30
pd.options.display.float_format = '{:.8f}'.format
from statsmodels.stats.inter_rater import fleiss_kappa as statsmodels_fleiss_kappa
from statsmodels.stats.weightstats import ttest_ind



## Task 1

Compute the Fleiss’ kappa inter-annotator agreement on the set of videos in the provided **tabulatedVotes.csv**  
Report and interpret the results using Table 1.  

In [2]:
# Custom function to calculate Fleiss' Kappa
def fleiss_kappa(table):

    # N = number of items | k = number of categories
    N, k = table.shape

    # n = number of ratings per item; assumes a complete matrix, i.e. all items have the same number of ratings
    n_max = table.sum(axis='columns').max()
    n_min = table.sum(axis='columns').min()
    assert n_max == n_min, "Complete Matrix Required: All items must have the same number of votes."
    n = n_max

    # p_j = (1/Nn) * (sum{i=1,...,N} n_ij)
    p_j = table.sum(axis='index') / (N * n)
    # Check the function argument (not the global df_votes) so the function works on any table
    assert (table.sum(axis='columns') == n).all(), "Complete Matrix Required: All items must have the same number of votes."
    # Compare with a tolerance: exact equality on floating-point sums can fail spuriously
    assert np.isclose(p_j.sum(), 1), "Sum of baseline probabilities must add to 1."
    
    # P_i = 1/(n*(n-1)) * [(sum{j=1,...,k} (n_ij)^2) - n]
    P_i = ((table**2).sum(axis='columns') - n) / ((n) * (n-1))

    # P_bar = 1/N * (sum{i=1,...,N} P_i)
    P_bar = P_i.sum() / N

    # Pe_bar = sum{j=1,...,k} (p_j)^2
    Pe_bar = (p_j**2).sum()

    # Fleiss' kappa = (P_bar - Pe_bar) / (1 - Pe_bar)
    kappa = (P_bar - Pe_bar) / (1 - Pe_bar)
    
    return kappa

In [3]:
# Load data
df = pd.read_csv('tabulatedVotes.csv', index_col='Unnamed: 0')
df_votes = df[['A', 'D', 'F', 'H', 'N', 'S']]
df_votes.head()

Unnamed: 0,A,D,F,H,N,S
100001,0,0,0,1,10,0
100003,0,0,0,4,7,0
100012,0,3,5,0,3,0
100013,2,3,3,1,2,0
100014,0,2,1,0,7,1


In [4]:
# Run my function to calculate Fleiss' kappa
kappa = fleiss_kappa(df_votes)
print(f"Fleiss' kappa: {kappa}")

# Run the statsmodels implementation to check if my results are accurate
kappa_sm = statsmodels_fleiss_kappa(df_votes)
print(f"Statsmodels: {kappa_sm}")

Fleiss' kappa: 0.4802678299370798
Statsmodels: 0.4802678299370798
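
Beyond the comparison with statsmodels, the same formulas can be checked on tables small enough to verify by hand. The sketch below is a NumPy restatement of the steps in the function above; the name `fleiss_kappa_np` and both toy tables are made up for illustration and are not part of the assignment data.

```python
import numpy as np

def fleiss_kappa_np(table):
    """Fleiss' kappa for an (items x categories) table of vote counts."""
    table = np.asarray(table, dtype=float)
    N, k = table.shape
    n = table.sum(axis=1)[0]                               # ratings per item
    assert np.all(table.sum(axis=1) == n), "every item needs the same number of votes"

    p_j = table.sum(axis=0) / (N * n)                      # share of all votes per category
    P_i = ((table ** 2).sum(axis=1) - n) / (n * (n - 1))   # per-item agreement
    P_bar = P_i.mean()                                     # observed agreement
    Pe_bar = (p_j ** 2).sum()                              # agreement expected by chance
    return (P_bar - Pe_bar) / (1 - Pe_bar)

# Hypothetical toy tables: 3 raters, 2 categories
perfect = [[3, 0], [0, 3], [3, 0]]   # raters always agree
split = [[2, 1], [1, 2]]             # raters split 2-1 on every item

kappa_perfect = fleiss_kappa_np(perfect)
kappa_split = fleiss_kappa_np(split)
print(kappa_perfect, kappa_split)
```

For the split table, the hand calculation gives P_bar = 1/3 and Pe_bar = 1/2, so kappa = (1/3 - 1/2) / (1 - 1/2) = -1/3, i.e. agreement below chance level. The perfect table gives kappa = 1.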


For this dataset, the inter-rater agreement, as measured by Fleiss' kappa, is 0.48, which according to Table 1 indicates *moderate agreement*.
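
Table 1 is not reproduced in this notebook. Assuming it follows the commonly used Landis and Koch bands (an assumption, not something confirmed by the assignment text), the mapping from a kappa value to a verbal label could be sketched as follows; `interpret_kappa` is a hypothetical helper name.

```python
def interpret_kappa(kappa):
    """Map a kappa value to a verbal label (ASSUMED Landis & Koch bands)."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    # Return the first band whose upper bound is not exceeded
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

label = interpret_kappa(0.4802678299370798)
print(label)
```

Under these assumed bands, the value computed above falls in the 0.41 to 0.60 range.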

## Task 2

Now go through the videos and pay attention to both verbal and nonverbal behaviors. Note one audio and one
visual cue which you believe to have correlations with ANGRY, HAPPY, and SAD. For example, "downward
gaze" may have a positive correlation with the SAD emotion. Don’t limit yourself to one modality when
selecting the behavioral cues.

**Answer:**  
To me it seems that ANGRY people are distinguishable visually by a "raising eyebrows" movement, and also distinguishable auditorily by "accelerated talking speed".  
Also, even though in class we spoke about how depressed people don't smile less often, just less intensely, I nevertheless get the impression that the HAPPY people both smile more and squint their eyes more. Some HAPPY people also use a higher pitch.  
Then, for SAD, most seem to have a much lower voice volume and to blink more.

## Task 3

Confirm your suspicions and subjective observations from the previous task with statistical analysis and by
using the expert annotations. You can do this by manually annotating the videos for each specific behavior
and performing correlation analysis. You need to annotate and calculate the p-value from the Student t-test
between the group of videos with emotion X and the group of videos without emotion X. You can use the
emotions in the video names as the labels (e.g. the emotion for the video 1001_IEO_ANG_HI is ANGRY). For
simplicity, pick 2 of your observations from above (different emotions), annotate 20 videos of different subjects
(for each emotion), and save your annotations in a .csv file. The file should have 40 rows and 3 columns:
filename, observation1(boolean), observation2(boolean).

**Answer:**   
I chose to label 3 cues, namely "raising eyebrows", "accelerated talking speed", and "squints eyes".  
I also chose to label all three emotions, meaning my .csv file has 4 columns (one extra for the extra cue) and 60 rows (20 for each emotion).

In [5]:
# Load csv
labels = pd.read_csv('manual_labels.csv')

# Extract label from file name
labels['label'] = labels.apply(lambda row: row['Video Name'].split('_')[2], axis='columns')

print(labels.shape)
labels.head()

(60, 5)


Unnamed: 0,Video Name,Raising Eyebrows,Accelerated Talking Speed,Squints Eyes,label
0,1091_IEO_SAD_HI,0,0,0,SAD
1,1091_IEO_HAP_HI,1,1,1,HAP
2,1091_IEO_ANG_HI,1,0,0,ANG
3,1090_IEO_SAD_HI,0,0,0,SAD
4,1090_IEO_HAP_HI,1,0,0,HAP


In [6]:
### statsmodels.stats.weightstats.ttest_ind ###
# Returns

# tstat: float
# test statistic

# pvalue: float
# pvalue of the t-test

# df: int or float
# degrees of freedom used in the t-test

In [7]:
# Run all pairwise tests
emotions = ['ANG', 'HAP', 'SAD']
cues = ['Raising Eyebrows', 'Accelerated Talking Speed', 'Squints Eyes']
rows = []

# Compare all pairwise emotions (but not against itself)
for emotion1 in emotions:
    for emotion2 in emotions:
        # Only compare emotions if emotion1 comes first alphabetically; this avoids duplicated comparisons
        if emotion1 >= emotion2:
            continue

        # Compare over all cues
        for cue in cues:

            # Select samples
            samples1 = labels.loc[labels['label'] == emotion1, cue]
            samples2 = labels.loc[labels['label'] == emotion2, cue]

            # Run Student t-test (named dof to avoid shadowing the DataFrame df loaded earlier)
            tstat, pvalue, dof = ttest_ind(samples1, samples2)

            # Store results, flagging the two emotions being compared
            rows.append({emotion1: 1,
                         emotion2: 1,
                         'cue': cue,
                         'tstat': tstat,
                         'pvalue': pvalue,
                         'df': dof})

# Build the table in one step (DataFrame.append was removed in pandas 2.0)
results = pd.DataFrame(rows, columns=['ANG', 'HAP', 'SAD', 'cue', 'tstat', 'pvalue', 'df'])

# Assign 0 to the third emotion, which was not compared; fill only the emotion flags
# so that an undefined (NaN) p-value is never turned into a "significant" 0
results[emotions] = results[emotions].fillna(0)
results[emotions + ['df']] = results[emotions + ['df']].astype(int)
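
The task statement describes comparing videos with emotion X against videos without emotion X, while the cell above compares emotions pairwise. A minimal sketch of the one-vs-rest grouping is shown below, with the pooled-variance (Student) t statistic written out in NumPy. The function name `pooled_t` and the toy annotations are hypothetical and are not the homework data; the p-value would then come from the t distribution with the returned degrees of freedom, for example by passing the two groups to `ttest_ind` as in the cell above.

```python
import numpy as np

def pooled_t(sample1, sample2):
    """Student (pooled-variance) t statistic and degrees of freedom."""
    a = np.asarray(sample1, dtype=float)
    b = np.asarray(sample2, dtype=float)
    dof = len(a) + len(b) - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / dof
    se = np.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (a.mean() - b.mean()) / se, dof

# Hypothetical toy annotations (NOT the homework data): one binary cue, eight videos
toy_labels = np.array(['HAP', 'HAP', 'HAP', 'HAP', 'SAD', 'SAD', 'ANG', 'ANG'])
toy_cue = np.array([1, 1, 1, 0, 0, 0, 0, 1])

# One-vs-rest split: videos labelled HAP against all other videos
is_happy = toy_labels == 'HAP'
tstat, dof = pooled_t(toy_cue[is_happy], toy_cue[~is_happy])
print(tstat, dof)
```

With three emotions, this design would yield three tests per cue (one per emotion) instead of three pairwise tests, and each "rest" group would pool the two remaining emotions.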

## Task 4

**Statistically Significant Results**

In [8]:
# Check statistically significant results
significant = results[results['pvalue'] < 0.05]
display(significant)

# Compare all pairwise emotions (but not against itself)
for emotion1 in emotions:
    for emotion2 in emotions:
        # Only compare emotions if emotion1 comes first alphabetically; this avoids duplicated comparisons
        if emotion1 >= emotion2:
            continue
        count = significant[(significant[emotion1]==1) & (significant[emotion2]==1)].shape[0]
        print(f'{emotion1} vs {emotion2} has {count} significant cues')

print()
print("Number of significant tests per cue:")
display(significant['cue'].value_counts())

Unnamed: 0,ANG,HAP,SAD,cue,tstat,pvalue,df
2,1,1,0,Squints Eyes,-6.29420537,2.2e-07,38
3,1,0,1,Raising Eyebrows,3.23564333,0.00251622,38
4,1,0,1,Accelerated Talking Speed,3.55902608,0.00101913,38
6,0,1,1,Raising Eyebrows,2.84722087,0.00707579,38
7,0,1,1,Accelerated Talking Speed,3.55902608,0.00101913,38
8,0,1,1,Squints Eyes,6.29420537,2.2e-07,38


ANG vs HAP has 1 significant cues
ANG vs SAD has 2 significant cues
HAP vs SAD has 3 significant cues

Number of significant tests per cue:


Squints Eyes                 2
Raising Eyebrows             2
Accelerated Talking Speed    2
Name: cue, dtype: int64

**Non-Statistically Significant Results**

In [9]:
# Check NOT statistically significant results
not_significant = results[results['pvalue'] >= 0.05]
display(not_significant)

# Compare all pairwise emotions (but not against itself)
for emotion1 in emotions:
    for emotion2 in emotions:
        # Only compare emotions if emotion1 comes first alphabetically; this avoids duplicated comparisons
        if emotion1 >= emotion2:
            continue
        count = not_significant[(not_significant[emotion1]==1) & (not_significant[emotion2]==1)].shape[0]
        print(f'{emotion1} vs {emotion2} has {count} non-significant cues')
        
print()
print("Number of non-significant tests per cue:")
display(not_significant['cue'].value_counts())

Unnamed: 0,ANG,HAP,SAD,cue,tstat,pvalue,df
0,1,1,0,Raising Eyebrows,0.31214724,0.75663507,38
1,1,1,0,Accelerated Talking Speed,0.0,1.0,38
5,1,0,1,Squints Eyes,0.0,1.0,38


ANG vs HAP has 2 non-significant cues
ANG vs SAD has 1 non-significant cues
HAP vs SAD has 0 non-significant cues

Number of non-significant tests per cue:


Raising Eyebrows             1
Accelerated Talking Speed    1
Squints Eyes                 1
Name: cue, dtype: int64

### Conclusion

**Task 2**  
To me it seems that ANGRY people are distinguishable visually by a "raising eyebrows" movement, and also distinguishable auditorily by "accelerated talking speed".  
Also, even though in class we spoke about how depressed people don't smile less often, just less intensely, I nevertheless get the impression that the HAPPY people both smile more and squint their eyes more. Some HAPPY people also use a higher pitch.  
Then, for SAD, most seem to have a much lower voice volume and to blink more.  

**Task 3**  
I chose to label 3 cues, namely "raising eyebrows", "accelerated talking speed", and "squints eyes".  
I also chose to label all three emotions, meaning my .csv file has 4 columns (one extra for the extra cue) and 60 rows (20 for each emotion).

**Task 4**  
Using the selected cues, we can consistently differentiate between the HAPPY and SAD emotions: all three cues are statistically significant.  
Between ANGRY and SAD, *Raising Eyebrows* and *Accelerated Talking Speed* distinguish the emotions with statistical significance, whereas *Squints Eyes* does not.  
Lastly, ANGRY and HAPPY are the hardest emotions to distinguish with the chosen cues, with only *Squints Eyes* being significant.  
Overall, SAD was involved in 5 significant tests, making it the most identifiable emotion with the selected cues. It is followed by HAPPY with 4 significant tests. ANGRY has only 3 significant tests, meaning it was involved in every test that failed to reach statistical significance.  
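
The per-emotion counts can be re-derived from the table of significant results in Task 4. The sketch below copies the emotion pairs and cues of the six significant rows from that table and tallies how often each emotion appears; the variable names are made up for illustration.

```python
from collections import Counter

# (emotion1, emotion2, cue) for each significant test, copied from the Task 4 table
significant_tests = [
    ('ANG', 'HAP', 'Squints Eyes'),
    ('ANG', 'SAD', 'Raising Eyebrows'),
    ('ANG', 'SAD', 'Accelerated Talking Speed'),
    ('HAP', 'SAD', 'Raising Eyebrows'),
    ('HAP', 'SAD', 'Accelerated Talking Speed'),
    ('HAP', 'SAD', 'Squints Eyes'),
]

# Each significant test counts once for both emotions in the pair
per_emotion = Counter()
for emotion1, emotion2, _ in significant_tests:
    per_emotion[emotion1] += 1
    per_emotion[emotion2] += 1

print(dict(per_emotion))
```

Since every emotion takes part in 6 tests in total (2 pairs times 3 cues), the non-significant counts follow by subtraction: 3 for ANGRY, 2 for HAPPY, and 1 for SAD.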

Each of the chosen cues is statistically significant in two comparisons and not significant in one.  
It is also interesting to note that *Squints Eyes* has the most extreme p-values, which are either very large (1.0) or very small (2.2e-7). *Raising Eyebrows*, on the other hand, has the least extreme p-values, ranging from 0.757 down to 2.5e-3; the latter, while significant, is orders of magnitude larger than the smallest p-value for *Squints Eyes* and about 2.5x larger than that of *Accelerated Talking Speed*.  
Based on this, we have some indication that, despite all cues having the same count of significant / non-significant tests in this small analysis, a larger analysis might find that *Squints Eyes* performs best, followed by *Accelerated Talking Speed*, and lastly by *Raising Eyebrows*.
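
The p-value comparisons in the paragraph above can be reproduced with a few lines of arithmetic. The values are copied from the results tables as displayed; note that 2.2e-07 is a rounded display value, so the second figure is only approximate.

```python
import math

# p-values as displayed in the results tables
p_raising_eyebrows = 0.00251622   # ANG vs SAD (smallest for this cue)
p_talking_speed = 0.00101913      # ANG vs SAD and HAP vs SAD
p_squints_eyes = 2.2e-07          # ANG vs HAP and HAP vs SAD (rounded in the display)

# How many times larger the Raising Eyebrows p-value is
ratio_vs_speed = p_raising_eyebrows / p_talking_speed

# Gap to Squints Eyes expressed in orders of magnitude (powers of ten)
orders_vs_squint = math.log10(p_raising_eyebrows / p_squints_eyes)

print(f"{ratio_vs_speed:.2f}x, {orders_vs_squint:.1f} orders of magnitude")
```

This gives a factor of roughly 2.5 relative to *Accelerated Talking Speed* and a gap of about four orders of magnitude relative to *Squints Eyes*.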

# End