From September 27, 2019, all new papers I read will be recorded in this document. Because I will graduate soon and have many things to do, updates are slow; after some time, I will comprehensively organize all the papers from my graduate study.

Multimodal Sentiment Analysis

1、ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection

2、Context-Dependent Sentiment Analysis in User-Generated Videos (ACL 2017)

3、Multi-level Multiple Attentions for Contextual Multimodal Sentiment Analysis (ICDM 2017)

Code for 2、3:

4、Tensor Fusion Network for Multimodal Sentiment Analysis (EMNLP 2017)

Code for 4:
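The core idea of paper 4 (TFN) is to fuse the modalities with an outer product: each unimodal embedding is extended with a constant 1 so the fused tensor retains unimodal and bimodal interaction terms alongside the trimodal ones. A minimal NumPy sketch with toy dimensions (the function name and dimensions here are illustrative, not from the paper's code):

```python
import numpy as np

def tensor_fusion(h_text, h_audio, h_video):
    """Outer-product fusion in the style of TFN (EMNLP 2017):
    appending 1 to each embedding keeps unimodal and bimodal
    interaction terms inside the trimodal outer product."""
    zt = np.append(h_text, 1.0)
    za = np.append(h_audio, 1.0)
    zv = np.append(h_video, 1.0)
    # 3-way outer product -> tensor of shape (dt+1, da+1, dv+1)
    fused = np.einsum('i,j,k->ijk', zt, za, zv)
    return fused.flatten()

rng = np.random.default_rng(0)
z = tensor_fusion(rng.standard_normal(4),
                  rng.standard_normal(3),
                  rng.standard_normal(2))
# fused vector length: (4+1) * (3+1) * (2+1) = 60
```

Note the cost: the fused dimension is the product of the (extended) unimodal dimensions, which is what motivates the low-rank variants listed later.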

5、Multimodal Transformer for Unaligned Multimodal Language Sequences (ACL 2019)

Code for 5:

6、Memory Fusion Network for Multi-view Sequential Learning (AAAI 2018)

Code for 6:


Code for 7: (April 15th, 2020)

8、Multimodal Transformer for Unaligned Multimodal Language Sequences (ACL 2019)

Code for 8:

9、Multimodal Sentiment Analysis using Hierarchical Fusion with Context Modeling

Code for 9:

10、Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities (AAAI 2019)

Code for 10:

11、Words Can Shift: Dynamically Adjusting Word Representations Using Nonverbal Behaviors (AAAI 2019)

Code for 11:

12、A Transformer-based Joint-Encoding for Emotion Recognition and Sentiment Analysis (ACL 2020 workshop)

Code for 12:

13、Low Rank Fusion based Transformers for Multimodal Sequences (ACL 2020 workshop)
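Entry 13 builds on low-rank modality fusion (Liu et al., 2018), which avoids materializing the full outer-product tensor of entry 4: each modality is projected by rank-many factor matrices, the projections are multiplied element-wise across modalities, and the result is summed over the rank dimension. A minimal NumPy sketch with illustrative names and toy dimensions (not from either paper's released code):

```python
import numpy as np

def low_rank_fusion(embeddings, factors):
    """Low-rank approximation of tensor fusion: equivalent in spirit
    to projecting the outer-product tensor with a weight tensor that
    is factored into per-modality, per-rank matrices."""
    out = None
    for z, W in zip(embeddings, factors):
        z1 = np.append(z, 1.0)                # keep unimodal terms
        proj = np.einsum('rod,d->ro', W, z1)  # (rank, out_dim)
        out = proj if out is None else out * proj
    return out.sum(axis=0)                    # (out_dim,)

rng = np.random.default_rng(0)
dims, out_dim, rank = [4, 3, 2], 8, 4
emb = [rng.standard_normal(d) for d in dims]
# factors[m] has shape (rank, out_dim, dim_m + 1)
factors = [rng.standard_normal((rank, out_dim, d + 1)) for d in dims]
h = low_rank_fusion(emb, factors)
```

The output dimension is now fixed at `out_dim` regardless of how many modalities are fused, instead of growing multiplicatively as in the full outer product.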

Multimodal BERT

1、VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV 2019)

2、ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS 2019)

3、VisualBERT: A Simple and Performant Baseline for Vision and Language

4、Selfie: Self-supervised Pretraining for Image Embedding

5、Contrastive Bidirectional Transformer for Temporal Representation Learning

6、M-BERT: Injecting Multimodal Information in the BERT Structure

7、LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP 2019)

8、Fusion of Detected Objects in Text for Visual Question Answering (EMNLP 2019)

9、Unified Vision-Language Pre-Training for Image Captioning and VQA

Code for 9:

10、VL-BERT: Pre-training of Generic Visual-Linguistic Representations

11、Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training

12、UNITER: Learning UNiversal Image-TExt Representations

13、SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering

14、Multimodal Transformer for Unaligned Multimodal Language Sequences

Code for 14:

15、Integrating Multimodal Information in Large Pretrained Transformers (ACL 2020)

Code for 15:

16、CM-BERT: Cross-Modal BERT for Text-Audio Sentiment Analysis (ACM MM 2020, ours)

Code for 16:

Multi-task Sentiment Analysis

1、Attention-augmented end-to-end multi-task learning for emotion prediction from speech.

2、Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis.

3、Multi-task Learning for Target-dependent Sentiment Classification.

4、Sentiment and Sarcasm Classification with Multitask Learning.


This paper list is about multimodal sentiment analysis.



