## How to evaluate uncertainty quantification methods for speech emotion classifiers in real-world scenarios

Below, we load the models trained for the paper from Zenodo (available for non-commercial use) and evaluate their uncertainty quantification capabilities on the proposed tests.

### Evaluate the model trained with cross-entropy

In [9]:
import w2v2_cat as w2v2

DEVICE = 'cpu'

model = w2v2.ModelCategorical
model = model.from_pretrained('./pre-trained_models/cat/torch')
model.to(DEVICE)
model.eval()


ModelCategorical(
  (wav2vec2): Wav2Vec2Model(
    (feature_extractor): Wav2Vec2FeatureEncoder(
      (conv_layers): ModuleList(
        (0): Wav2Vec2LayerNormConvLayer(
          (conv): Conv1d(1, 512, kernel_size=(10,), stride=(5,))
          (layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          (activation): GELUActivation()
        )
        (1-4): 4 x Wav2Vec2LayerNormConvLayer(
          (conv): Conv1d(512, 512, kernel_size=(3,), stride=(2,))
          (layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          (activation): GELUActivation()
        )
        (5-6): 2 x Wav2Vec2LayerNormConvLayer(
          (conv): Conv1d(512, 512, kernel_size=(2,), stride=(2,))
          (layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          (activation): GELUActivation()
        )
      )
    )
    (feature_projection): Wav2Vec2FeatureProjection(
      (layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
      (proj

Define the prediction function

In [10]:
import numpy as np
import torch


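def predict_func(
        signal: np.ndarray,
        sampling_rate: int,
) -> np.ndarray:
    """Return class probabilities for one audio signal.

    `sampling_rate` is not used in the body.
    """
    y = torch.from_numpy(signal).to(DEVICE)

    with torch.no_grad():
        # Convert the raw logits into probabilities over the emotion classes
        probabilities = torch.softmax(model(y)['logits'], dim=-1)

    return probabilities.squeeze().cpu().numpy()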

categorical_predictions = ["anger", "happiness", "neutral", "sadness"]
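
The prediction function returns softmax probabilities, and the configuration further below selects `entropy` as the uncertainty method. As a minimal sketch of how an entropy-based uncertainty score can be derived from such probabilities, consider the following. The helper names `softmax` and `entropy_uncertainty` and the normalization by `log(num_classes)` are assumptions for illustration, not the implementation used in the `evaluation` package.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def entropy_uncertainty(probs: np.ndarray, normalize: bool = True) -> np.ndarray:
    """Shannon entropy of class probabilities over the last axis.

    With normalize=True the score is divided by log(num_classes),
    so a uniform distribution maps to 1 (hypothetical convention).
    """
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    entropy = -(probs * np.log(probs)).sum(axis=-1)
    if normalize:
        entropy = entropy / np.log(probs.shape[-1])
    return entropy


# Toy logits for the four classes: one confident and one undecided prediction
confident = softmax(np.array([8.0, 0.0, 0.0, 0.0]))
uniform = softmax(np.array([1.0, 1.0, 1.0, 1.0]))
u_confident = entropy_uncertainty(confident)
u_uniform = entropy_uncertainty(uniform)
print(u_confident, u_uniform)
```

A peaked distribution yields a score close to 0, while a uniform distribution yields the maximum score, which is the behavior an uncertainty measure should show when the classifier cannot decide between emotions.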

Load Emodb

In [29]:
from omegaconf import OmegaConf
from data.create_data_frames import load_testing_data
import audeer

conf = OmegaConf.create({
    "test": {
        "important_columns_labels": {
            "emotion": ["anger", "happiness", "neutral", "sadness"],
        },
        "cache_path": "./notebook_csv/emodb",
        "data_source_labeled_test": [{
            "name": "emodb",
            "version": "1.4.1",
            "table": "emotion.categories.test.gold_standard",
        }],
        "uncertainty_method": "entropy",
        "type": "correctness",
    },
    "results_root": "./notebook_results/cat",
    "sampling_rate": 16_000,
    "testing": {"combined_df": "combined"},
})
audeer.mkdir(conf.results_root)
df_true = load_testing_data(conf.test)
df_true

Get:   emodb v1.4.1
Cache: /cache/audb/emodb/1.4.1/fe182b91


                                                                                                    

| file | start | end | emotion |
|---|---|---|---|
| /cache/audb/emodb/1.4.1/fe182b91/wav/12a01Fb.wav | 0 days | 0 days 00:00:01.863625 | happiness |
| /cache/audb/emodb/1.4.1/fe182b91/wav/12a01Nb.wav | 0 days | 0 days 00:00:01.721937500 | neutral |
| /cache/audb/emodb/1.4.1/fe182b91/wav/12a01Wc.wav | 0 days | 0 days 00:00:02.358812500 | anger |
| /cache/audb/emodb/1.4.1/fe182b91/wav/12a02Nb.wav | 0 days | 0 days 00:00:01.731937500 | neutral |
| /cache/audb/emodb/1.4.1/fe182b91/wav/12a02Wa.wav | 0 days | 0 days 00:00:01.507437500 | anger |
| ... | ... | ... | ... |
| /cache/audb/emodb/1.4.1/fe182b91/wav/16b10Fb.wav | 0 days | 0 days 00:00:02.583500 | happiness |
| /cache/audb/emodb/1.4.1/fe182b91/wav/16b10Tb.wav | 0 days | 0 days 00:00:03.500625 | sadness |
| /cache/audb/emodb/1.4.1/fe182b91/wav/16b10Td.wav | 0 days | 0 days 00:00:03.934187500 | sadness |
| /cache/audb/emodb/1.4.1/fe182b91/wav/16b10Wa.wav | 0 days | 0 days 00:00:02.414125 | anger |


emodb: false vs. correct predictions

In [30]:
from evaluation.tests import test_categorical, test_uncertainty
results = test_categorical(df_true, categorical_predictions, predict_func, "test", conf.test, conf)
print(results)
results = test_uncertainty(df_true, categorical_predictions, predict_func, "test", conf.test, conf)
print(results)

{'UAR': 0.8425925925925926, 'ACC': 0.875}


ConfigAttributeError: Missing key testing
    full_key: testing
    object_type=dict

<Figure size 640x480 with 0 Axes>
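
The classification results printed above report `UAR` and `ACC`. Assuming these follow the standard definitions (unweighted average recall, i.e. the mean of the per-class recalls, and plain accuracy), the following sketch shows why the two can differ on imbalanced test sets. The function names and the toy labels are illustrative assumptions and are not taken from `evaluation.tests`.

```python
import numpy as np


def accuracy(y_true, y_pred) -> float:
    """Fraction of samples whose predicted label matches the truth."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())


def unweighted_average_recall(y_true, y_pred) -> float:
    """Mean of per-class recalls; every class counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [
        (y_pred[y_true == label] == label).mean()
        for label in np.unique(y_true)
    ]
    return float(np.mean(recalls))


# Toy example (made-up labels): the minority class is recognized poorly
truth = ["anger"] * 8 + ["sadness"] * 2
prediction = ["anger"] * 8 + ["anger", "sadness"]
acc = accuracy(truth, prediction)
uar = unweighted_average_recall(truth, prediction)
print({"UAR": uar, "ACC": acc})
```

Because UAR averages over classes rather than samples, errors on a rare class lower it more than they lower accuracy, which is one possible reason for a UAR below ACC as in the output above.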

emodb vs cochlscene vs whitenoise

emodb + whitenoise
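
One common way to build a test that combines speech with white noise is to add Gaussian noise at a fixed signal-to-noise ratio. The sketch below illustrates that idea only: the function `add_white_noise`, the SNR value, and the synthetic sine signal standing in for an emodb recording are assumptions, not the procedure used for the paper.

```python
import numpy as np


def add_white_noise(signal: np.ndarray, snr_db: float, seed: int = 0) -> np.ndarray:
    """Mix Gaussian white noise into `signal` at the requested SNR in dB."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10 ** (snr_db / 10)
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return (signal + scale * noise).astype(signal.dtype)


# Synthetic one-second tone at 16 kHz as a stand-in for a speech signal
sampling_rate = 16_000
t = np.arange(sampling_rate) / sampling_rate
clean = 0.5 * np.sin(2 * np.pi * 220 * t).astype(np.float32)

noisy = add_white_noise(clean, snr_db=10.0)
measured_snr = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(float(measured_snr), 2))
```

The noisy signal keeps the shape and dtype of the input, so it could be passed to a prediction function such as `predict_func` in place of the clean signal.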

Everything for CE + KL(out)