Just a couple of questions #1

mirix · 2023-08-08T07:36:12Z

Hi,

I see you have several models for Speech Emotion Recognition (SER).

Would you say Vesper is the best?

I have also noticed that you use acted databases for training. In your experience, does learning transfer well between acted and so-called natural databases?

I was considering training a (unimodal audio) model on CMU-MOSEI to check whether training on a natural database would produce a better-performing model in real-life scenarios.

What do you think?

Of course, one could argue that a significant percentage of the utterances from YouTube are also acted and do not reflect real emotions, in which case it would be better to go with professional actors than with amateur YouTubers...

Best,

Ed

mirix · 2023-08-10T12:49:55Z

I have forked MOSEI to build a unimodal SER dataset:

https://github.com/mirix/messaih/tree/main
