Multimodal Discourse

PyTorch code for the following paper, published in Findings of EMNLP 2022:
Title: Understanding Social Media Cross-Modality Discourse in Linguistic Space
Authors: Chunpu Xu, Hanzhuo Tan, Jing Li, Piji Li
Institute: PolyU and NUAA
Abstract
Multimedia communication with texts and images is popular on social media. However, few studies examine how images are structured with texts to form coherent meanings in human cognition. To fill this gap, we present a novel concept of cross-modality discourse, reflecting how human readers couple image and text understandings. Text descriptions (named subtitles) are first derived from images in the multimedia contexts. Five labels (entity-level insertion, projection, and concretization, and scene-level restatement and extension) are further employed to shape the structure of subtitles and texts and present their joint meanings. As a pilot study, we also build the very first dataset containing 16K multimedia tweets with manually annotated discourse labels. The experimental results show that the multimedia encoder based on multi-head attention with captions is able to obtain state-of-the-art results.
Framework illustration

Data

The annotated 16K multimedia tweets can be found in data/social_text_all.json, which contains the tweet texts, annotated labels, and generated captions. The raw tweet images are available here. You can also download the extracted image features from here.
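
A quick way to inspect the annotation file before training is sketched below. The schema of data/social_text_all.json is not documented in this README, so the sketch makes no assumption about field names: it only counts the records and reports which fields appear in them. The function name summarize_annotations is hypothetical and is not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_annotations(path):
    """Return (number of records, {field name: number of records containing it})."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the top level is either a list of records or a dict that maps
    # an ID to a record. Both cases are handled because the layout is undocumented.
    records = list(data.values()) if isinstance(data, dict) else list(data)
    field_counts = Counter()
    for record in records:
        if isinstance(record, dict):
            field_counts.update(record.keys())
    return len(records), dict(field_counts)


if __name__ == "__main__":
    path = Path("data/social_text_all.json")
    if path.exists():
        n_records, fields = summarize_annotations(path)
        print(f"{n_records} records")
        for name, count in sorted(fields.items()):
            print(f"  {name}: present in {count} records")
```

Running the script from the repository root prints the record count and the field names actually present, which helps identify the keys that hold the tweet text, the discourse label, and the generated caption.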

Installation

# Create the conda environment
conda create -n multimodal_discourse python=3.6
# Install PyTorch
conda install -n multimodal_discourse -c pytorch pytorch==1.10.0 torchvision

Training

python run_img_text_caption.py --img_feature_path final_dataset_features_att

Our pretrained models are available here.

License

This project is licensed under the terms of the MIT license.
