This work follows the papers:
1. CH-SIMS: A Chinese Multimodal Sentiment Analysis Dataset with Fine-grained Annotations of Modality
2. Towards Multimodal Sarcasm Detection (An Obviously Perfect Paper)
The code is also adapted from the above works.
Dataset: MUStARD
Modalities:
t: text
v: visual
a: audio
Model: a three-level late-fusion framework with residual connections.
Options:
i: speaker-independent setup
context: textual context of the utterance
speaker: speaker one-hot identification
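The three-level late-fusion idea above can be sketched as follows: level 1 encodes each modality (t, v, a) separately, level 2 fuses modality pairs, and level 3 fuses all three, with residual connections carrying lower-level features into each fused representation. This is a minimal NumPy sketch under assumed feature dimensions and randomly initialized layers, not the repo's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 8  # shared hidden size (hypothetical)

def layer(in_dim, out_dim):
    # Random affine layer parameters (illustrative, untrained).
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def relu_dense(x, wb):
    w, b = wb
    return np.maximum(0.0, x @ w + b)

# Level 1: unimodal encoders (input dims are illustrative, not the repo's).
enc_t, enc_v, enc_a = layer(16, H), layer(12, H), layer(10, H)
# Level 2: bimodal fusion layers over concatenated modality pairs.
fuse_tv, fuse_ta, fuse_va = layer(2 * H, H), layer(2 * H, H), layer(2 * H, H)
# Level 3: trimodal fusion plus a linear classifier head.
fuse_tva = layer(3 * H, H)
head = layer(H, 1)

def forward(t, v, a):
    ht, hv, ha = relu_dense(t, enc_t), relu_dense(v, enc_v), relu_dense(a, enc_a)
    # Residual connections: each fused vector adds back its lower-level inputs.
    htv = relu_dense(np.concatenate([ht, hv], -1), fuse_tv) + ht + hv
    hta = relu_dense(np.concatenate([ht, ha], -1), fuse_ta) + ht + ha
    hva = relu_dense(np.concatenate([hv, ha], -1), fuse_va) + hv + ha
    htva = relu_dense(np.concatenate([htv, hta, hva], -1), fuse_tva) + htv + hta + hva
    w, b = head
    return htva @ w + b  # sarcasm logit

t = rng.normal(size=(1, 16))
v = rng.normal(size=(1, 12))
a = rng.normal(size=(1, 10))
print(forward(t, v, a).shape)  # (1, 1)
```

Because each fusion level adds the lower-level representations back in, gradients (in a trained version) can flow directly to the unimodal encoders, which is the motivation for the residual connections.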