## First Look at the Data

1. How consistent are the emotion annotations? Do different annotators
agree in their ratings of the same excerpt?
2. Derive discrete quadrant-based class labels from the raw
arousal/valence annotations. How should conflicting annotations be
aggregated, and how should outliers be handled?
3. How are the resulting discrete class labels distributed? Are the classes
unbalanced, and if so by how much? How are the features distributed? Do
any pairs or subsets of features seem highly correlated or
redundant?
4. How are GEMS9 and GEMMES related to arousal and valence?
5. Which features seem useful for classification? Which ones are
correlated with the labels?
6. Any interesting conclusions you can draw from this for the next project
phase?
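
One possible starting point for question 2 is sketched below: aggregate the ratings of each excerpt with the median (which is less sensitive to single outlying annotators than the mean) and map the result to one of four arousal/valence quadrants. The function name `quadrant_labels`, the quadrant names, the `midpoint=0.0` threshold, and the toy ratings are all assumptions made for illustration; the actual rating scale should be checked before choosing a threshold.

```python
import pandas as pd


def quadrant_labels(annot, keys=("pianist_id", "segment_id"), midpoint=0.0):
    """Aggregate ratings per excerpt with the median and assign a quadrant.

    `midpoint` is an assumed neutral point of the rating scale; ratings that
    sit exactly on it are counted as "low".
    """
    agg = annot.groupby(list(keys))[["arousal", "valence"]].median()
    high_arousal = agg["arousal"] > midpoint
    high_valence = agg["valence"] > midpoint

    # Quadrant names are illustrative (HA = high arousal, LV = low valence, ...).
    agg["quadrant"] = "LA-LV"
    agg.loc[high_arousal & high_valence, "quadrant"] = "HA-HV"
    agg.loc[high_arousal & ~high_valence, "quadrant"] = "HA-LV"
    agg.loc[~high_arousal & high_valence, "quadrant"] = "LA-HV"
    return agg.reset_index()


# Made-up ratings for two excerpts, three annotators each (not real data).
toy = pd.DataFrame({
    "pianist_id": [1, 1, 1, 2, 2, 2],
    "segment_id": [0, 0, 0, 0, 0, 0],
    "arousal": [1, 2, 2, -1, -2, 1],
    "valence": [-1, -1, 2, 1, 2, 1],
})

labels = quadrant_labels(toy)
print(labels)
print(labels["quadrant"].value_counts())
```

Calling `value_counts()` on the resulting `quadrant` column gives a first answer to question 3 about class balance.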

### Imports

In [1]:
import os
import glob
import numpy as np
import pandas as pd
import altair as alt
import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets
from IPython.display import display

dirname = 'Task2_due_April-30'

### Files

In [2]:
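# With recursive=True, '**' also matches the directory itself, which glob lists first.
files = glob.glob(os.path.join(dirname, '**'), recursive=True)
# Drop that first entry and group the files by extension; the indices used below rely on this order.
files = sorted(files[1:], key=lambda x: os.path.splitext(x)[1])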
for i, file in enumerate(files):
    print(f'index {i}:', file[len(dirname)+1:])

index 0: task_2_annotations_82d1d6d1093eaab6_e330cbf_weka.arff
index 1: task_2_features_1d8b658c21ddc127_e330cbf_weka.arff
index 2: task_2_annotations_82d1d6d1093eaab6_e330cbf_generic.csv
index 3: task_2_features_1d8b658c21ddc127_e330cbf_generic.csv
index 4: task_2_annotations_82d1d6d1093eaab6_e330cbf_matlab.mat
index 5: task_2_features_1d8b658c21ddc127_e330cbf_matlab.mat
index 6: task_2_annotations_82d1d6d1093eaab6_e330cbf_numpy.pkl
index 7: task_2_annotations_82d1d6d1093eaab6_e330cbf_pandas.pkl
index 8: task_2_features_1d8b658c21ddc127_e330cbf_numpy.pkl
index 9: task_2_features_1d8b658c21ddc127_e330cbf_pandas.pkl


#### .arff: Attribute-Relation File Format

In [3]:
from scipy.io import arff

In [4]:
weka_annot = arff.loadarff(files[0])
weka_annot_df = pd.DataFrame(weka_annot[0])
weka_annot_df.head()

Unnamed: 0,pianist_id,segment_id,annotator_id,arousal,valence,gems_wonder,gems_transcendence,gems_tenderness,gems_nostalgia,gems_peacefulness,gems_power,gems_joyful_activation,gems_tension,gems_sadness,gemmes_flow,gemmes_movement,gemmes_force,gemmes_interior,gemmes_wandering
0,1.0,0.0,91.0,1.0,-1.0,2.0,1.0,2.0,4.0,2.0,1.0,1.0,1.0,2.0,3.0,2.0,1.0,1.0,2.0
1,1.0,0.0,19.0,2.0,-1.0,3.0,3.0,3.0,4.0,4.0,1.0,2.0,3.0,3.0,3.0,2.0,2.0,3.0,3.0
2,1.0,0.0,189.0,2.0,0.0,2.0,1.0,2.0,1.0,4.0,2.0,2.0,1.0,1.0,3.0,2.0,1.0,1.0,4.0
3,1.0,0.0,126.0,2.0,2.0,4.0,5.0,2.0,3.0,5.0,2.0,4.0,1.0,3.0,5.0,1.0,2.0,2.0,5.0
4,1.0,0.0,26.0,4.0,2.0,3.0,5.0,2.0,3.0,3.0,1.0,3.0,4.0,1.0,4.0,1.0,2.0,3.0,1.0
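
The five rows above all belong to the same excerpt (pianist 1, segment 0) but come from different annotators, so they already hint at how much ratings can differ. A minimal sketch of one way to quantify this is the per-excerpt standard deviation of `arousal` and `valence`. The helper name `rating_spread` is hypothetical, and the small frame below just re-enters the five rows shown above; on the full data one would pass `weka_annot_df` instead.

```python
import pandas as pd

# The five rows printed above, restricted to the columns needed here.
head = pd.DataFrame({
    "pianist_id": [1, 1, 1, 1, 1],
    "segment_id": [0, 0, 0, 0, 0],
    "annotator_id": [91, 19, 189, 126, 26],
    "arousal": [1, 2, 2, 2, 4],
    "valence": [-1, -1, 0, 2, 2],
})


def rating_spread(annot, keys=("pianist_id", "segment_id")):
    """Per-excerpt sample standard deviation of arousal and valence."""
    grouped = annot.groupby(list(keys))
    spread = grouped[["arousal", "valence"]].std()  # pandas default: ddof=1
    spread.columns = ["arousal_std", "valence_std"]
    spread["n_annotators"] = grouped["annotator_id"].nunique()
    return spread.reset_index()


spread = rating_spread(head)
print(spread)
```

A large spread relative to the width of the rating scale suggests weak agreement for that excerpt. The standard deviation is only a rough indicator; a chance-corrected agreement coefficient would be a more rigorous choice.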


In [5]:
weka_features = arff.loadarff(files[1])
weka_feature_df = pd.DataFrame(weka_features[0])
# weka_feature_df.head()

#### .csv: Comma-Separated Values

In [6]:
import csv

In [7]:
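# The csv module expects newline=''; every cell is read as a string,
# and the first line of the file becomes row 0 of the DataFrame.
with open(files[2], newline='') as fh:
    generic_annot = list(csv.reader(fh))
generic_annot_df = pd.DataFrame(generic_annot)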
# generic_annot_df.head()

In [8]:
# Same approach as for the annotations: close the file once the rows are read.
with open(files[3], newline='') as fh:
    generic_features = list(csv.reader(fh))
generic_feature_df = pd.DataFrame(generic_features)
# generic_feature_df.head()

#### .mat: MATLAB

In [9]:
from mat4py import loadmat

In [10]:
matlab_annot = loadmat(f'{files[4]}')
matlab_annot_df = pd.DataFrame(matlab_annot)
# matlab_annot_df.head()

In [11]:
matlab_features = loadmat(f'{files[5]}')
matlab_feature_df = pd.DataFrame(matlab_features)
# matlab_feature_df.head()

#### .pkl: NumPy and pandas Pickles

In [12]:
import pickle

In [13]:
numpy_annot_dict = pd.read_pickle(f'{files[6]}')
# print(numpy_annot_dict)

In [14]:
numpy_feature_dict = pd.read_pickle(f'{files[8]}')
# print(numpy_feature_dict)

In [15]:
pandas_annot = pd.read_pickle(f'{files[7]}')
pandas_annot_df = pd.DataFrame.from_dict(pandas_annot)
# pandas_annot_df.head()

In [16]:
pandas_features = pd.read_pickle(f'{files[9]}')
pandas_feature_df = pd.DataFrame.from_dict(pandas_features)
# pandas_feature_df.head()

### Plots

In [17]:
d = widgets.Dropdown(options=['weka_annot_df', 'weka_feature_df', 'generic_annot_df', 'generic_feature_df', 
                              'matlab_annot_df', 'matlab_feature_df', 'numpy_annot_dict', 'numpy_feature_dict',
                              'pandas_annot_df', 'pandas_feature_df'], description='Dataset')
display(d)

Dropdown(description='Dataset', options=('weka_annot_df', 'weka_feature_df', 'generic_annot_df', 'generic_feat…

In [18]:
# Map each dropdown option to the object loaded above.
datasets = {
    'weka_annot_df': weka_annot_df,
    'weka_feature_df': weka_feature_df,
    'generic_annot_df': generic_annot_df,
    'generic_feature_df': generic_feature_df,
    'matlab_annot_df': matlab_annot_df,
    'matlab_feature_df': matlab_feature_df,
    'numpy_annot_dict': numpy_annot_dict,
    'numpy_feature_dict': numpy_feature_dict,
    'pandas_annot_df': pandas_annot_df,
    'pandas_feature_df': pandas_feature_df,
}

def ret_data(name):
    """Return the dataset currently selected in the dropdown."""
    return datasets[name]

source = ret_data(d.value)

In [19]:
f = widgets.Dropdown(options=source.columns, description='Feature')
c = widgets.Dropdown(options=['arousal'], description='Label')
display(f, c)

Dropdown(description='Feature', options=('pianist_id', 'segment_id', 'annotator_id', 'arousal', 'valence', 'ge…

Dropdown(description='Label', options=('arousal',), value='arousal')

In [20]:
# against Segment ID

In [21]:
alt.Chart(source).mark_circle(size=60).encode(
    x='segment_id',
    y=f.value,
    color=f'{c.value}:N',
).interactive()

In [22]:
# against Pianist ID

In [23]:
alt.Chart(source).mark_circle(size=60).encode(
    x='pianist_id',
    y=f.value,
    color=f'{c.value}:N',
).interactive()