AUDIO-CLASSIFICATION-HF

Introduction

Emotion recognition is a powerful tool that can be used in a variety of applications, such as improving customer service, personalizing user experiences, and helping people with speech disorders communicate more effectively.

This project aims to develop a solution that can accurately detect and analyze the emotional state of call center employees during customer interactions, leveraging transformer models such as Wav2Vec2.0, HuBERT, and WavLM.

Datasets

Two datasets are used in this project:

  • Dusha is a bi-modal corpus suitable for speech emotion recognition (SER) tasks. The dataset consists of about 300,000 audio recordings of Russian speech with transcripts and emotional labels, totaling approximately 350 hours of data. Four basic emotions that usually appear in a dialog with a virtual assistant were selected: Happiness (Positive), Sadness, Anger, and Neutral.

NB: In this project, only a small subset of the Dusha dataset was used.

  • EmoCall is a dataset of 329 telephone recordings of Russian speech from 10 actors. Actors spoke from a selection of 10 sentences for each emotion. The sentences were presented using one of five emotions (Anger, Positive, Neutral, Sad, and Other).

The files contain speech that is sampled at 16 kHz and saved as 16-bit PCM WAV files.
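
As a minimal sketch (not taken from this repo's code), a recording could be loaded and resampled to the mono 16 kHz format the models expect, e.g. with torchaudio; the file path is a placeholder:

```python
import torchaudio

TARGET_SR = 16_000  # both corpora are used at 16 kHz

def load_waveform(path: str):
    """Load an audio file as a mono 16 kHz 1-D float tensor."""
    waveform, sr = torchaudio.load(path)           # shape: (channels, samples)
    waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
    if sr != TARGET_SR:
        waveform = torchaudio.transforms.Resample(sr, TARGET_SR)(waveform)
    return waveform.squeeze(0)

# Placeholder path, for illustration only:
# wav = load_waveform("data/emocall/recording_001.wav")
```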

Models

Pre-trained speech models based on the transformer architecture

All checkpoints can be found here

| Model | Pretrained checkpoint |
| --- | --- |
| Wav2Vec2.0 | jonatasgrosman/wav2vec2-large-xlsr-53-russian |
| HuBERT | facebook/hubert-large-ls960-ft |
| WavLM | microsoft/wavlm-large |
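
These checkpoints can be loaded with the standard Hugging Face pattern. The snippet below is a generic sketch, not the repo's exact code; the five-label setup matches the emotion classes listed above:

```python
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

# Any of the checkpoints from the table above can be substituted here.
checkpoint = "microsoft/wavlm-large"

feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = AutoModelForAudioClassification.from_pretrained(
    checkpoint,
    num_labels=5,  # Anger, Positive, Neutral, Sad, Other
)
```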

Speech models after training on the Dusha dataset

| Model (Group_1) | Checkpoint |
| --- | --- |
| Wav2Vec2.0 | dusha/wav2vec2/audio-model |
| HuBERT | dusha/hubert/audio-model |
| WavLM | dusha/wavlm/audio-model |

Speech models after training on the Dusha and EmoCall datasets

| Model (Group_2) | Checkpoint |
| --- | --- |
| Wav2Vec2.0 | emocall/wav2vec/audio-model |
| HuBERT | emocall/hubert/audio-model |
| WavLM | emocall/wavlm/audio-model |

Training

In the scripts folder you can find training and evaluation scripts for Wav2Vec2.0, HuBERT, and WavLM on the Dusha and EmoCall datasets. A simplified sketch of the fine-tuning loop is shown below.
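
The repo's own scripts are the authoritative version; as a rough illustration of the usual Hugging Face Trainer loop, fine-tuning might look like the following. Here `train_ds` / `eval_ds` are assumed to be preprocessed datasets of `{"input_values", "label"}` pairs, and the hyperparameters and output path are only examples:

```python
import numpy as np
from transformers import Trainer, TrainingArguments

def compute_metrics(eval_pred):
    """Accuracy, as reported in the Results section below."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

args = TrainingArguments(
    output_dir="dusha/wavlm/audio-model",  # example path, mirrors the tables above
    per_device_train_batch_size=8,
    num_train_epochs=5,
    learning_rate=3e-5,
)

trainer = Trainer(
    model=model,                     # the model loaded in the previous snippet
    args=args,
    train_dataset=train_ds,          # assumed: preprocessed training split
    eval_dataset=eval_ds,            # assumed: preprocessed evaluation split
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```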

Results

| Model | Accuracy on EmoCall (Group_1) | Accuracy on EmoCall (Group_2) |
| --- | --- | --- |
| Wav2Vec2.0 | 0.88 | 0.98 |
| HuBERT | 0.73 | 0.98 |
| WavLM | 0.93 | 0.99 |

Based on the evaluation results on the EmoCall dataset, the WavLM model (Group_2) was chosen for the prototype application, as it recognizes emotions in speech with 99% accuracy.
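
For reference, inference with the fine-tuned model could look like the following sketch. It reuses the `load_waveform` helper from the Datasets section; the checkpoint path mirrors the Group_2 table above, and the audio file name is a placeholder:

```python
import torch
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

checkpoint = "emocall/wavlm/audio-model"  # Group_2 WavLM checkpoint path from above
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = AutoModelForAudioClassification.from_pretrained(checkpoint)
model.eval()

wav = load_waveform("call_fragment.wav")  # placeholder file; helper defined earlier
inputs = feature_extractor(wav.numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted = model.config.id2label[int(logits.argmax(dim=-1))]
print(predicted)  # one of: Anger, Positive, Neutral, Sad, Other
```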

Requirements

To install the necessary dependencies, first clone the repository:

git clone https://github.com/AlinaShapiro/Audio-Classification-HF.git

Install the requirements:

pip install -r requirements.txt

Usage

To run the app, use the following command:

python app.py

Screenshots of the App

Main Window

Select a video (.mp4) or audio (.wav, .mp3, .m4a) file to analyze by clicking the "Выбрать файл" ("Select file") button.

Click the playback button to play the selected video or audio file.

Click the "Анализировать эмоции" ("Analyze emotions") button to analyze the file for the presence of each emotion (Anger, Positive, Neutral, Sad, and Other) with the WavLM model.

After the analysis is done, you can see a graphical report of the emotional state. In addition, you can download the emotional state report as a .json file by clicking the "Выгрузить отчет" ("Export report") button.

Conclusion

Emotion recognition in speech is a challenging but important task with many practical applications. By using an application that accurately identifies the emotional state of call-center employees, call-center management can work to improve employees' emotional well-being and productivity.
