Official code repository for the paper "MemoCMT: Cross-Modal Transformer-Based Multimodal Emotion Recognition System", submitted to Scientific Reports.
Recent speech recognition models leverage transformers to analyze long-range context (both spatial and temporal) within speech signals. Audio transformers excel at capturing global context through self-attention, but at a significant computational cost. Convolutional designs can be more efficient, but they struggle to model long-range dependencies. To address these limitations, we propose MemoCMT, a multimodal emotion recognition system that incorporates a cross-modal transformer (CMT) to capture both local and global context in speech signals and text transcripts. The system builds on advances in self-supervised speech representation models and pre-trained language representation models to improve efficiency: HuBERT extracts audio embeddings from the audio input, and BERT extracts text embeddings from the text input. The CMT then operates on both the audio and text embeddings to produce feature representations, which a fusion mechanism combines for final classification. We evaluated the system on the IEMOCAP and ESD datasets, achieving 81.33% and 91.93% unweighted accuracy (UW-Acc) and 81.85% and 91.84% weighted accuracy (W-Acc), respectively.
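The cross-modal idea described above, where one modality's features attend over the other's, can be illustrated with a minimal, dependency-free Python sketch. It is not the MemoCMT implementation: it assumes both modalities are already projected to a shared embedding dimension, it omits the learned query/key/value projections, multiple heads, and feed-forward layers of a real transformer block, and it uses mean pooling plus concatenation as a hypothetical fusion step. The function names are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are time steps (audio frames or text tokens)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over all key rows.

    Sketch only: a single head with no learned Q/K/V projections.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output dimension at a time
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(x: Matrix) -> List[float]:
    """Average over the time axis, giving one vector per sequence."""
    return [sum(col) / len(x) for col in zip(*x)]


def cross_modal_fusion(audio: Matrix, text: Matrix) -> List[float]:
    """Hypothetical fusion: cross-attend in both directions, pool, concatenate."""
    audio_attends_text = cross_attention(audio, text, text)  # audio queries, text keys/values
    text_attends_audio = cross_attention(text, audio, audio)  # text queries, audio keys/values
    return mean_pool(audio_attends_text) + mean_pool(text_attends_audio)


if __name__ == "__main__":
    audio = [[0.1, 0.3], [0.2, -0.1], [0.0, 0.5]]  # 3 toy "HuBERT" frames, dim 2
    text = [[1.0, 2.0], [0.5, -1.0]]               # 2 toy "BERT" tokens, dim 2
    fused = cross_modal_fusion(audio, text)
    print(len(fused))  # 4: pooled audio->text (2) + pooled text->audio (2)
```

In a full model the fused vector would feed a classification head over the emotion labels; the sequences can differ in length (3 frames vs. 2 tokens here) because attention only requires matching feature dimensions.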
- Clone this repository
git clone https://github.com/namphuongtran9196/MemoCMT.git
cd MemoCMT
- Create a conda environment and install requirements
conda create -n MemoCMT python=3.8 -y
conda activate MemoCMT
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
- Preprocess the IEMOCAP dataset
cd scripts && python preprocess.py -ds IEMOCAP --data_root ./data/IEMOCAP_full_release
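The preprocessing command expects the IEMOCAP release to be extracted under the path given by --data_root. The full IEMOCAP release is organized into five folders, Session1 to Session5, so a quick pre-flight check can catch a wrong or incomplete path. This is a hedged sketch: the helper name is hypothetical and not part of the repository, and it only checks that the session folders exist, not what preprocess.py actually reads from them. Run it from the scripts folder so the relative path resolves the same way as in the command above.

```python
from pathlib import Path
from typing import List


def missing_iemocap_sessions(data_root: str) -> List[str]:
    """Hypothetical helper: return the Session1..Session5 folders missing under data_root."""
    root = Path(data_root)
    expected = [f"Session{i}" for i in range(1, 6)]
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_iemocap_sessions("./data/IEMOCAP_full_release")
    if missing:
        print("Missing IEMOCAP folders:", ", ".join(missing))
    else:
        print("All five IEMOCAP session folders found.")
```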
- Before training, edit the config file in the src/configs folder (for example, src/configs/hubert_base.py) to match your setup, referring to the existing config files there for details. Then start training:
cd scripts && python train.py -cfg ../src/configs/hubert_base.py
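The abstract reports unweighted accuracy (UW-Acc) and weighted accuracy (W-Acc). Under the definitions commonly used in speech emotion recognition, W-Acc is overall accuracy (so frequent classes weigh more) and UW-Acc is the mean of per-class recalls (every emotion counts equally). The sketch below assumes these definitions, which have not been confirmed against the repository's own evaluation code; the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def emotion_accuracies(y_true: Sequence[Hashable],
                       y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (UW-Acc, W-Acc) for paired true and predicted labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    # W-Acc: fraction of all utterances classified correctly
    w_acc = sum(t == p for t, p in pairs) / len(pairs)
    # UW-Acc: recall for each class present in y_true, then averaged over classes
    support = Counter(y_true)
    hits = Counter(t for t, p in pairs if t == p)
    uw_acc = sum(hits[c] / n for c, n in support.items()) / len(support)
    return uw_acc, w_acc


if __name__ == "__main__":
    y_true = ["ang", "ang", "ang", "neu"]
    y_pred = ["ang", "ang", "ang", "ang"]
    print(emotion_accuracies(y_true, y_pred))  # (0.5, 0.75)
```

With imbalanced classes the two metrics diverge: in this toy example, always predicting the majority class gives 0.75 W-Acc but only 0.5 UW-Acc, which is why both are usually reported.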
- Pre-trained models are available in the repository's releases.
GitHub @namphuongtran9196