## Multilabel binary classification of speaker traits
### Laura Fernández Gallardo

After evaluating the binary classification of speakers' warmth-attractiveness (WAAT), in this notebook I examine multilabel classification, that is, predicting several traits attributed to speakers, which are not mutually exclusive.

* For each perceptual interpersonal speaker dimension obtained from [factor analysis](https://github.com/laufergall/Subjective_Speaker_Characteristics/tree/master/speaker_characteristics/factor_analysis), the scores are thresholded at percentiles to define 3 classes ("high", "mid", and "low") with approximately the same number of samples. These dimensions are: *warmth*, *attractiveness*, *confidence*, *compliance*, and *maturity*.
* Five labels, each with the classes "high", "mid", and "low" -> **multilabel multiclass classification** (called multiclass-multioutput in scikit-learn).
* As evaluation metric, I will consider the average per-class accuracy (average of sensitivity and specificity).
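A minimal sketch of percentile-based thresholding, assuming one score per speaker and using `pandas.qcut` with tertiles. The scores below are random placeholders (`toy_scores` and `toy_classes` are hypothetical names), and the exact thresholds used in the original repository may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical factor scores for one dimension (random placeholders, one per speaker)
rng = np.random.default_rng(0)
toy_scores = pd.Series(rng.normal(size=300), name='warmth')

# Cut at the 33.3rd and 66.7th percentiles -> three classes of (nearly) equal size
toy_classes = pd.qcut(toy_scores, q=3, labels=['low', 'mid', 'high'])

print(toy_classes.value_counts())
```

Because the cut points are quantiles of the scores themselves, the three classes stay balanced regardless of the score distribution.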

In [1]:
import io
import requests

import numpy as np
import pandas as pd

from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, recall_score

%matplotlib inline

## Load features and labels

The train and test sets contain different speakers.
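A quick way to verify a speaker-independent split is to intersect the speaker IDs of both sets. The helper `shared_speakers` and the toy frames below are hypothetical stand-ins for `feats_ratings_scores_train` and `feats_ratings_scores_test`.

```python
import pandas as pd

def shared_speakers(train_df, test_df, col='spkID'):
    """Return the speaker IDs that occur in both data frames (should be empty)."""
    return set(train_df[col]) & set(test_df[col])

# Toy stand-ins: several samples per speaker, disjoint speakers between the sets
toy_train = pd.DataFrame({'spkID': [101, 101, 102, 102], 'feat': [0.1, 0.2, 0.3, 0.4]})
toy_test = pd.DataFrame({'spkID': [201, 202], 'feat': [0.5, 0.6]})

overlap = shared_speakers(toy_train, toy_test)
assert not overlap, f'speakers in both sets: {sorted(overlap)}'
```

On the real data, `shared_speakers(feats_ratings_scores_train, feats_ratings_scores_test)` is expected to return an empty set if the split is speaker-independent as stated above.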

In [2]:
path = 'https://raw.githubusercontent.com/laufergall/ML_Speaker_Characteristics/master/data/generated_data/'

url = path + "feats_ratings_scores_train.csv"
s = requests.get(url).content
feats_ratings_scores_train = pd.read_csv(io.StringIO(s.decode('utf-8')))

url = path + "feats_ratings_scores_test.csv"
s = requests.get(url).content
feats_ratings_scores_test = pd.read_csv(io.StringIO(s.decode('utf-8')))

import os

# local name lists (one name per line, possibly quoted); os.path.join keeps the paths portable
def read_names(filename):
    with open(os.path.join('..', 'data', 'generated_data', filename)) as f:
        return [x.strip().strip('\'') for x in f]

feats_names = read_names('feats_names.txt')
items_names = read_names('items_names.txt')
traits_names = read_names('traits_names.txt')

# read speaker trait classes
url = path + "classes_train.csv"
s = requests.get(url).content
classes_train = pd.read_csv(io.StringIO(s.decode('utf-8')))

url = path + "classes_test.csv"
s = requests.get(url).content
classes_test = pd.read_csv(io.StringIO(s.decode('utf-8')))

In [3]:
# Looking at the target classes

classes_test.head()

| | sample_heard | warmth | attractiveness | confidence | compliance | maturity | gender | spkID |
|---|---|---|---|---|---|---|---|---|
| 0 | w282_gizo_stimulus.wav | high | high | high | high | high | w | 282 |
| 1 | w252_thessaloniki_stimulus.wav | high | high | low | high | low | w | 252 |
| 2 | w296_avarua_d5.wav | mid | low | mid | high | mid | w | 296 |
| 3 | m155_blantyre_stimulus.wav | high | mid | low | high | low | m | 155 |
| 4 | m298_copenhagen_stimulus.wav | low | mid | low | low | high | m | 298 |


In [4]:
# appending classes to features

dropcolumns = ['name', 'speaker_gender'] + items_names + traits_names  # 'spkID' is kept for the merge
feats_train = feats_ratings_scores_train.drop(dropcolumns, axis=1)  # shape (2700, 88)
feats_test = feats_ratings_scores_test.drop(dropcolumns, axis=1)  # shape (891, 88)

# pd.merge joins on the shared column(s), here spkID: every sample receives its speaker's classes
feats_class_train = pd.merge(feats_train, classes_train.drop(['sample_heard', 'gender'], axis=1))  # shape (2700, 94)
feats_class_test = pd.merge(feats_test, classes_test.drop(['sample_heard', 'gender'], axis=1))  # shape (891, 94)

# classes as categorical; fixed categories give the same codes in train and test (high = 0, low = 1, mid = 2)
class_dtype = pd.CategoricalDtype(categories=['high', 'low', 'mid'])
for col in traits_names:
    feats_class_train[col] = feats_class_train[col].astype(class_dtype)
    feats_class_test[col] = feats_class_test[col].astype(class_dtype)
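The merge above relies on `pd.merge` joining on the columns both frames share, so each speaker's classes are copied to every sample of that speaker. A toy sketch (the column `F0mean` and all values are hypothetical); naming the key with `on='spkID'` and passing `validate='many_to_one'` is a suggestion rather than what the notebook does, and makes pandas raise an error if a speaker had more than one row of classes.

```python
import pandas as pd

# Toy stand-ins: several speech samples per speaker, one row of classes per speaker
toy_feats = pd.DataFrame({'spkID': [1, 1, 2], 'F0mean': [0.3, 0.5, -1.2]})
toy_spk_classes = pd.DataFrame({'spkID': [1, 2], 'warmth': ['high', 'low']})

# Explicit key plus a check that many samples map to exactly one class row per speaker
toy_merged = pd.merge(toy_feats, toy_spk_classes, on='spkID', validate='many_to_one')
print(toy_merged)
```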

In [5]:
# Standardize speech features

dropcolumns2 = ['spkID'] + traits_names

# learn the transformation on training data only, then apply it to both sets
scaler = StandardScaler()

# numpy arrays, n_instances x n_feats
feats_s_train = scaler.fit_transform(feats_class_train.drop(dropcolumns2, axis=1))
feats_s_test = scaler.transform(feats_class_test.drop(dropcolumns2, axis=1))

### quick example

Using `KNeighborsClassifier()` with default parameters: no model tuning and no feature selection.
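scikit-learn's `KNeighborsClassifier` accepts a 2-D target array (one column per label) and predicts every column by a majority vote of the same neighbours, which is why it can be fitted directly on all five traits at once. A toy sketch with made-up data (`X_toy`, `Y_toy`, and `knn_toy` are hypothetical):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: two well-separated groups of samples, two labels (traits) per sample
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
Y_toy = np.array([[0, 1], [0, 1], [0, 1],
                  [2, 0], [2, 0], [2, 0]])

# One model for all labels: each output column is voted on by the same 3 neighbours
knn_toy = KNeighborsClassifier(n_neighbors=3).fit(X_toy, Y_toy)
pred_toy = knn_toy.predict(np.array([[0.05, 0.05], [5.05, 5.05]]))
print(pred_toy.shape)  # (2, 2): one row per sample, one column per label
```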

#### model training

Class encoding (pandas category codes, which follow alphabetical order): high = 0, low = 1, mid = 2.
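The codes come from pandas, which sorts inferred string categories alphabetically. A small sketch with toy labels showing the mapping, and why fixing the categories explicitly (here with `pd.CategoricalDtype`) keeps train and test codes aligned when a class is missing from one split:

```python
import pandas as pd

toy_labels = pd.Series(['mid', 'high', 'low', 'mid'])

# Inferred categories are sorted alphabetically, so the codes are high = 0, low = 1, mid = 2
inferred = toy_labels.astype('category')
print(list(inferred.cat.categories))  # ['high', 'low', 'mid']
print(inferred.cat.codes.tolist())    # [2, 0, 1, 2]

# If one split lacked a class, inferred codes would shift ('low' -> 0, 'mid' -> 1) ...
shifted = pd.Series(['mid', 'low']).astype('category')
print(shifted.cat.codes.tolist())     # [1, 0]

# ... whereas fixed categories keep the same mapping in every split
class_dtype_toy = pd.CategoricalDtype(categories=['high', 'low', 'mid'])
fixed = pd.Series(['mid', 'low']).astype(class_dtype_toy)
print(fixed.cat.codes.tolist())       # [2, 1]
```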

In [6]:
X = feats_s_train
Y = feats_class_train[traits_names].apply(lambda x: x.cat.codes).to_numpy()  # .as_matrix() was removed in pandas 1.0

In [7]:
feats_class_train[traits_names].head()

| | warmth | attractiveness | compliance | confidence | maturity |
|---|---|---|---|---|---|
| 0 | mid | mid | low | high | mid |
| 1 | mid | mid | low | high | mid |
| 2 | mid | mid | low | high | mid |
| 3 | mid | mid | low | high | mid |
| 4 | mid | mid | low | high | mid |


In [8]:
Y

array([[2, 2, 1, 0, 2],
       [2, 2, 1, 0, 2],
       [2, 2, 1, 0, 2],
       ..., 
       [0, 0, 2, 2, 2],
       [0, 0, 2, 2, 2],
       [0, 0, 2, 2, 2]], dtype=int8)

In [9]:
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier()

model.fit(X, Y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')

#### testing performance

In [10]:
Xt = feats_s_test
Yt = feats_class_test[traits_names].apply(lambda x: x.cat.codes).to_numpy()  # .as_matrix() was removed in pandas 1.0

In [11]:
Y_pred = model.predict(Xt)

In [12]:
feats_class_test[traits_names].head()

| | warmth | attractiveness | compliance | confidence | maturity |
|---|---|---|---|---|---|
| 0 | low | mid | low | high | low |
| 1 | low | mid | low | high | low |
| 2 | low | mid | low | high | low |
| 3 | low | mid | low | high | low |
| 4 | low | mid | low | high | low |


In [13]:
Yt

array([[1, 2, 1, 0, 1],
       [1, 2, 1, 0, 1],
       [1, 2, 1, 0, 1],
       ..., 
       [2, 1, 0, 2, 2],
       [2, 1, 0, 2, 2],
       [2, 1, 0, 2, 2]], dtype=int8)

In [14]:
Y_pred

array([[2, 0, 2, 0, 2],
       [2, 0, 2, 0, 2],
       [0, 0, 2, 0, 2],
       ..., 
       [1, 2, 2, 1, 2],
       [0, 0, 0, 0, 0],
       [1, 1, 0, 2, 1]], dtype=int8)

#### confusion matrix for each of the labels

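Treating each pair of classes separately (high vs. low, high vs. mid, and low vs. mid) as in the binary case, counting only the samples whose true and predicted classes both belong to the pair, with the first class of the pair as the positive class:

$$\text{spec} = \frac{TN}{TN + FP}$$

$$\text{sens} = \frac{TP}{TP + FN}$$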
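The pairwise rates can be read off the 3x3 confusion matrix by keeping only the 2x2 sub-block of the pair. A minimal sketch (the helper `pairwise_rates` and the toy labels are hypothetical) that follows the same formulas as the loop below:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def pairwise_rates(cm, a, b):
    """Sensitivity of class a and specificity (= recall of class b) for the pair a vs. b.

    cm has rows = true class and columns = predicted class; only samples whose
    true and predicted classes are both a or b enter the counts.
    """
    sens = cm[a, a] / (cm[a, a] + cm[a, b])
    spec = cm[b, b] / (cm[b, b] + cm[b, a])
    return sens, spec

# Toy labels with codes high = 0, low = 1, mid = 2
y_true_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
y_pred_toy = np.array([0, 0, 0, 1, 1, 1, 2, 0, 2, 2, 1, 0])
cm_toy = confusion_matrix(y_true_toy, y_pred_toy, labels=[0, 1, 2])

class_names = {0: 'high', 1: 'low', 2: 'mid'}
for a, b in [(0, 1), (0, 2), (1, 2)]:
    sens, spec = pairwise_rates(cm_toy, a, b)
    print('%s vs. %s: sens = %.2f, spec = %.2f, average = %.2f'
          % (class_names[a], class_names[b], sens, spec, (sens + spec) / 2))
```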

In [15]:
for i, trait in enumerate(traits_names):

    print(trait)

    # 3x3 confusion matrix, rows = true class, columns = predicted class (high = 0, low = 1, mid = 2);
    # labels=[0, 1, 2] keeps the 3x3 shape even if a class is missing from the true or predicted labels
    #   row high: [tph,  fnlh, fnmh]  true high predicted as high, low, mid
    #   row low:  [fplh, tpl,  fnlm]  true low predicted as high, low, mid
    #   row mid:  [fpmh, fplm, tpm ]  true mid predicted as high, low, mid
    cm = confusion_matrix(Yt[:, i], Y_pred[:, i], labels=[0, 1, 2])
    tph, fnlh, fnmh, fplh, tpl, fnlm, fpmh, fplm, tpm = cm.ravel()

    # high vs. low

    sens_hl = tph / (tph + fnlh)
    spec_hl = tpl / (tpl + fplh)

    print('rate of correctly classifying speakers as high (vs. low): %.2f' % sens_hl)
    print('rate of correctly classifying speakers as low (vs. high): %.2f' % spec_hl)

    # high vs. mid

    sens_mh = tph / (tph + fnmh)
    spec_mh = tpm / (tpm + fpmh)

    print('rate of correctly classifying speakers as high (vs. mid): %.2f' % sens_mh)
    print('rate of correctly classifying speakers as mid (vs. high): %.2f' % spec_mh)

    # low vs. mid

    sens_lm = tpl / (tpl + fnlm)
    spec_lm = tpm / (tpm + fplm)

    print('rate of correctly classifying speakers as low (vs. mid): %.2f' % sens_lm)
    print('rate of correctly classifying speakers as mid (vs. low): %.2f' % spec_lm)

    # average per-class accuracy for the high vs. low pair only (as in the binary case)
    avg_pc_acc = (sens_hl + spec_hl) / 2
    print('Average per-class accuracy, high vs. low: %.2f' % avg_pc_acc)

warmth
rate of correctly classifying speakers as high (vs. low): 0.83
rate of correctly classifying speakers as low (vs. high): 0.51
rate of correctly classifying speakers as high (vs. mid): 0.71
rate of correctly classifying speakers as mid (vs. high): 0.46
rate of correctly classifying speakers as low (vs. mid): 0.48
rate of correctly classifying speakers as mid (vs. low): 0.58
Average per-class accuracy, high vs. low: 0.67
attractiveness
rate of correctly classifying speakers as high (vs. low): 0.80
rate of correctly classifying speakers as low (vs. high): 0.42
rate of correctly classifying speakers as high (vs. mid): 0.75
rate of correctly classifying speakers as mid (vs. high): 0.28
rate of correctly classifying speakers as low (vs. mid): 0.40
rate of correctly classifying speakers as mid (vs. low): 0.47
Average per-class accuracy, high vs. low: 0.61
compli
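The average per-class accuracy printed above uses only the high vs. low pair. A possible three-class counterpart, sketched here as an assumption rather than the notebook's metric, is scikit-learn's `balanced_accuracy_score` (available since version 0.20), i.e. the mean recall over high, low, and mid, computed separately for each trait column. Toy arrays stand in for `Yt` and `Y_pred`, and the helper name is hypothetical.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def balanced_accuracy_per_trait(Y_true, Y_hat, names):
    """Mean recall over the three classes, computed separately for each trait column."""
    return {name: balanced_accuracy_score(Y_true[:, i], Y_hat[:, i])
            for i, name in enumerate(names)}

# Toy integer-coded labels (high = 0, low = 1, mid = 2) for two traits
Y_true_toy = np.array([[0, 0], [0, 1], [1, 1], [1, 2], [2, 2], [2, 0]])
Y_hat_toy = np.array([[0, 0], [1, 1], [1, 2], [1, 2], [2, 2], [0, 0]])

print(balanced_accuracy_per_trait(Y_true_toy, Y_hat_toy, ['warmth', 'attractiveness']))
```

On the notebook's arrays this would be called as `balanced_accuracy_per_trait(Yt, Y_pred, traits_names)`.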

#### Future work

* Better performance metrics can be defined.
    * It would be interesting to take a look at the [Hamming score](http://stackoverflow.com/q/32239577/395857).
    * See also the review by Zhang and Zhou (2014): "A Review on Multi-Label Learning Algorithms".
* Try different classifiers and model tuning according to the performance metrics of interest.
    * As done for binary classification, nested hyperparameter tuning with feature selection can be performed.
    * Classifiers that support multiclass-multioutput (see http://scikit-learn.org/stable/modules/multiclass.html):
        * `sklearn.tree.DecisionTreeClassifier`
        * `sklearn.tree.ExtraTreeClassifier`
        * `sklearn.ensemble.ExtraTreesClassifier`
        * `sklearn.neighbors.KNeighborsClassifier`
        * `sklearn.neighbors.RadiusNeighborsClassifier`
        * `sklearn.ensemble.RandomForestClassifier`
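As a first step towards such multilabel metrics, two simple summaries can be computed directly with NumPy on the integer-coded arrays: the fraction of correctly predicted trait labels (the multiclass analogue of 1 - Hamming loss) and the exact-match ratio (all traits of a sample right). This is a sketch under those definitions, not the Hamming score from the linked answer, which is defined for binary label sets; the helper names and toy arrays (standing in for `Yt` and `Y_pred`) are hypothetical.

```python
import numpy as np

def label_wise_accuracy(Y_true, Y_hat):
    """Fraction of (sample, trait) entries predicted correctly, i.e. 1 - Hamming loss."""
    return float(np.mean(Y_true == Y_hat))

def exact_match_ratio(Y_true, Y_hat):
    """Fraction of samples for which every trait is predicted correctly."""
    return float(np.mean(np.all(Y_true == Y_hat, axis=1)))

# Toy integer-coded labels (high = 0, low = 1, mid = 2) for three traits
Y_true_ml = np.array([[0, 2, 1], [1, 1, 0], [2, 0, 2], [0, 0, 0]])
Y_hat_ml = np.array([[0, 2, 1], [1, 2, 0], [2, 0, 1], [1, 1, 1]])

print(label_wise_accuracy(Y_true_ml, Y_hat_ml))  # 7 of 12 entries correct
print(exact_match_ratio(Y_true_ml, Y_hat_ml))    # 1 of 4 samples fully correct
# per-trait accuracy, one value per column
print(np.mean(Y_true_ml == Y_hat_ml, axis=0))
```

The exact-match ratio is much stricter than the label-wise accuracy, which is one reason why label-wise or per-trait metrics are usually reported alongside it.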
