MemoCMT

Official code repository for the paper "MemoCMT: Cross-Modal Transformer-Based Multimodal Emotion Recognition System", submitted to Scientific Reports.


Abstract · How To Use · Citation · References

Abstract

Recent speech recognition models leverage transformers to analyze the long-range context (both spatial and temporal) within speech signals. While audio transformers excel at capturing global context through self-attention, this comes at a significant computational cost. In contrast, convolutional designs can be more efficient but are ineffective at modeling long-range dependencies. To address these limitations, we propose MemoCMT, a multimodal emotion recognition system that incorporates a cross-modal transformer (CMT) to effectively capture both local and global contexts in speech signals and text transcripts. Our system builds on advances in self-supervised speech representation models and pre-trained deep language representation models to improve efficiency: HuBERT extracts embeddings from the audio input, while BERT extracts embeddings from the text input. The CMT then leverages the feature representations from both modalities, and the resulting representations are combined by a fusion mechanism for final classification. We evaluated our system on the IEMOCAP and ESD datasets, achieving 81.33% and 91.93% unweighted accuracy (UW-Acc), and 81.85% and 91.84% weighted accuracy (W-Acc), respectively.
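
As a rough illustration of this pipeline (not the official implementation), the PyTorch sketch below fuses HuBERT-style audio embeddings and BERT-style text embeddings with bidirectional cross-attention and classifies the pooled result. The 768-dimensional embeddings, 8 attention heads, mean-pooling fusion, and 4 emotion classes are assumptions made for the example, not details taken from the paper.

# Minimal sketch of cross-modal fusion; random tensors stand in for real HuBERT/BERT features.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    def __init__(self, dim=768, num_heads=8, num_classes=4):
        super().__init__()
        # text attends to audio and audio attends to text
        self.audio_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.text_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, audio_emb, text_emb):
        # audio_emb: (batch, audio_frames, dim), e.g. HuBERT hidden states
        # text_emb:  (batch, text_tokens, dim),  e.g. BERT hidden states
        a, _ = self.audio_to_text(audio_emb, text_emb, text_emb)
        t, _ = self.text_to_audio(text_emb, audio_emb, audio_emb)
        # mean-pool each modality and fuse by concatenation before classification
        fused = torch.cat([a.mean(dim=1), t.mean(dim=1)], dim=-1)
        return self.classifier(fused)


if __name__ == "__main__":
    audio = torch.randn(2, 100, 768)   # dummy audio embeddings
    text = torch.randn(2, 20, 768)     # dummy text embeddings
    logits = CrossModalFusion()(audio, text)
    print(logits.shape)                # torch.Size([2, 4])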

How To Use

  • Clone this repository
git clone https://github.com/namphuongtran9196/MemoCMT.git 
cd MemoCMT
  • Create a conda environment and install requirements
conda create -n MemoCMT python=3.8 -y
conda activate MemoCMT
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
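Optionally, verify that the environment was set up correctly by checking that PyTorch imports and detects the GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"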
  • The datasets used in this project are IEMOCAP and ESD. Preprocess IEMOCAP with the command below; an analogous ESD call is sketched after it.
cd scripts && python preprocess.py -ds IEMOCAP --data_root ./data/IEMOCAP_full_release
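ESD should be preprocessed in the same way. Assuming preprocess.py accepts ESD through the same -ds flag and the dataset is extracted to ./data/ESD (both assumptions), the analogous call would be:
cd scripts && python preprocess.py -ds ESD --data_root ./data/ESD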
  • Before starting training, modify the config file in the config folder as needed; refer to the existing config files for details. A rough, hypothetical sketch of such a config follows the training command below.
cd scripts && python train.py -cfg ../src/configs/hubert_base.py
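The exact options in hubert_base.py are not reproduced here. As a purely illustrative sketch, a Python config of this kind typically collects model and training hyperparameters as attributes; all names and values below are assumptions, not the repository's actual settings.

# Hypothetical config sketch; field names are illustrative only.
class Config:
    # model
    audio_encoder = "facebook/hubert-base-ls960"  # HuBERT checkpoint
    text_encoder = "bert-base-uncased"            # BERT checkpoint
    num_classes = 4                               # emotion categories
    # training
    batch_size = 32
    learning_rate = 1e-4
    num_epochs = 50
    data_dir = "./data/IEMOCAP_preprocessed"      # output of preprocess.py (path is an assumption)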
  • You can also find our pre-trained models in the releases section of this repository.
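Assuming the released checkpoints are standard PyTorch files, a downloaded model can be inspected as sketched below; the file name and the layout of its contents are assumptions, not guaranteed by the repository.

import torch

# "memocmt_iemocap.pt" is a hypothetical file name; replace it with the file downloaded from the release
checkpoint = torch.load("memocmt_iemocap.pt", map_location="cpu")
# released checkpoints are commonly either a plain state_dict or a dict that wraps one; inspect the keys first
print(list(checkpoint.keys())[:10])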

Citation

