This repository contains:
- Instructions on how to get the OCR extraction (text) and labels through a request form.
- Dataset download script and baselines
Please make sure you understand the data source here.
The dataset contains memes that may be offensive to readers. Please see the Dataset Statement section of our paper here to understand the risks before you proceed!
To request the dataset (labels and extracted texts), please fill out the following form.
After submitting the required info, you will see a link to a folder containing the datasets in a zip format and the password to uncompress the files.
Note: this dataset can only be used for non-commercial, research purposes.
Don't hesitate to report an issue if something is broken, or if you have further questions or feedback. Email: figmemes22 AT gmail DOT com
https://www.ukp.tu-darmstadt.de/
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
We present the code we used to run the different baseline models.
Install all Python requirements listed in requirements.txt (check here to see how to install PyTorch on your system).
Tested with torch==1.10.0 and transformers==4.17.0; newer versions will likely work, but there are no guarantees.
You can install the requirements like this:
pip install --upgrade pip
pip install -r requirements.txt
Data (FigMemes):
The script expects the data to reside in {data_root} (specified as an input argument when running the script). The {data_root} must contain:
- figmemes_annotations.tsv with the labels and style annotations,
- figmemes_ocrs.tsv with the OCR captions,
- a {name}_split.tsv that assigns each example to the train, validation, or test set, as well as (optionally) a cluster for OOD experiments.
CLIP and image-only models expect the images in {data_root}/images.
You can download the images following these instructions.
Data (other):
You need to obtain the images and labels for the other datasets (Memotion 2, MAMI) yourself.
The script again expects a {name}_split.tsv that re-splits the data (NOTE: we append the image folder name to the image name to differentiate between same-name images from different original splits).
The script also expects our style annotation TSVs in {data_root}/style_labels.
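The split-file handling above can be sketched as follows. This is purely illustrative: the actual column layout of {name}_split.tsv is defined by the repository's data-loading code; the "img" and "split" column names are assumptions, and the exact way the folder name is combined with the image name may differ.

```python
import csv
import io

# Illustrative sketch only: the real column layout of {name}_split.tsv is
# defined by the repository's data-loading code. We assume "img" and "split"
# columns for demonstration. Per the note above, the image folder name is
# combined with the file name so that same-name images from different
# original splits stay distinct (the "{folder}_{name}" form is our guess).
def write_split_tsv(rows, fh):
    """rows: iterable of (folder, image_name, split) tuples."""
    writer = csv.DictWriter(fh, fieldnames=["img", "split"], delimiter="\t")
    writer.writeheader()
    for folder, img, split in rows:
        writer.writerow({"img": f"{folder}_{img}", "split": split})

buf = io.StringIO()
write_split_tsv([("memotion2_train", "1.jpg", "train")], buf)
```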
Image Features: If you use VinVL or BERT+CLIP, you need to pre-extract the image features.
See here for a description.
Feature files have to be in {data_root}/features.
VinVL checkpoint: You have to download the VinVL checkpoint manually. See here for more information. You can run
azcopy copy 'https://biglmdiag.blob.core.windows.net/vinvl/model_ckpts/vqa' base --recursive
and then you will find the model in vqa/base/checkpoint-2000000. This is not a VQA-finetuned model but the checkpoint after pretraining (as can be seen from the instructions here: https://github.com/microsoft/Oscar/blob/master/VinVL_MODEL_ZOO.md).
Our code uses the Hugging Face Trainer class, so most config values are defined there.
Image-only and multimodal CLIP:
python -u run_classification_img.py \
--linear_probe False \ #Toggle linear probe or full finetuning
--model_name_or_path "RN50x4" \ #CLIP model name or ConvNext model name
--clip_text True \ #False to use image-only CLIP
--class_weights True \ #False to not up-weight positive examples
--data_root "/path/to" \
--split text_cluster \ #Name of the split file: {split}_split.tsv
--train_split $train_cluster \ #For OOD experiments: name of style or cluster to train on
--test_split "cluster_0,cluster_1,no_text" \ #For OOD experiments: name of styles or clusters to test on
--validation_split $train_cluster \ #For OOD experiments: style/cluster for dev set
--metric_for_best_model "f1" \
--lr_scheduler_type linear \
--greater_is_better True \
--per_device_train_batch_size 64 \
--per_device_eval_batch_size 128 \
--num_train_epochs 20 \
--warmup_ratio 0.00 \
--dataloader_num_workers 8 \
--gradient_accumulation_steps 1 \
--max_grad_norm 1.0 \
--learning_rate 2e-5 \
--weight_decay 0.05 \
--output_dir $output \
--do_predict \
--do_train \
--do_eval \
--evaluation_strategy epoch \
--save_strategy epoch \
--save_total_limit 2 \
--load_best_model_at_end True \
--logging_steps 25 \
--seed $seed \
--fp16 \
--report_to all \
--all_feature_type ""
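The --class_weights flag up-weights positive examples. One common scheme for multi-label classification (a sketch, not necessarily the exact formula the repository uses) sets each class's positive weight to #negatives / #positives, e.g. as the pos_weight argument of torch.nn.BCEWithLogitsLoss:

```python
# Sketch of one common way to up-weight positive examples in multi-label
# classification. Illustrative only; the repository's exact weighting
# formula may differ.
def positive_class_weights(labels):
    """labels: list of multi-hot lists, one per example.
    Returns one weight per class: (#negatives / #positives)."""
    n = len(labels)
    num_classes = len(labels[0])
    weights = []
    for c in range(num_classes):
        pos = sum(row[c] for row in labels)
        neg = n - pos
        weights.append(neg / pos if pos else 1.0)
    return weights
```

Rare positive classes thus contribute proportionally more to the loss, which matters for imbalanced figurative-language labels.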
Text-only or VinVL/Bert+CLIP:
python -u run_classification.py \
--all_feature_type "clip-RN50x4-6,vinvl-obj36" \ # "" for text-only or all used image features (avoids redundant dataset creation with different features)
--feature_type vinvl-obj36 \ # Features to use for model. "" for text-only
--visual_feat_dim 2054 \ #image feature dimension; 2054 for VinVL, 2560 for RN50x4
#--no_vision \ # Toggle to use no image features for text-only
--data_root "/path/to" \
--class_weights True \ #False to not up-weight positive examples
--model_name_or_path "bert-base-uncased" \
--split text_cluster \ #Name of the split file: {split}_split.tsv
--train_split $train_cluster \ #For OOD experiments: name of style or cluster to train on
--test_split "cluster_0,cluster_1,no_text" \ #For OOD experiments: name of styles or clusters to test on
--validation_split $train_cluster \ #For OOD experiments: style/cluster for dev set
--metric_for_best_model "f1" \
--greater_is_better True \
--lr_scheduler_type linear \
--per_device_train_batch_size 64 \
--per_device_eval_batch_size 128 \
--num_train_epochs 20 \
--warmup_ratio 0.00 \
--dataloader_num_workers 8 \
--gradient_accumulation_steps 1 \
--max_grad_norm 1.0 \
--learning_rate 3e-5 \
--weight_decay 0.05 \
--output_dir $output \
--do_predict \
--do_train \
--do_eval \
--evaluation_strategy epoch \
--save_strategy epoch \
--save_total_limit 2 \
--load_best_model_at_end True \
--logging_steps 25 \
--seed $seed \
--report_to all \
--fp16
CLIP-MM-OOD: This is identical to CLIP above except that the script is changed and one option is added to set the learning rate for the second (full fine-tune) stage.
python -u run_classification_img_twostage.py \
--learning_rate2 2e-5 \
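Conceptually, the two-stage run trains only the classifier head first (the linear-probe stage) and then unfreezes everything for a full fine-tune at --learning_rate2. A hypothetical sketch of that plan (the function and field names are illustrative, not the repository's actual API):

```python
# Conceptual sketch of the two-stage schedule; names are illustrative,
# not the repository's actual API.
def two_stage_plan(probe_lr, finetune_lr):
    """Stage 1: linear probe (head only). Stage 2: full fine-tune."""
    return [
        {"stage": 1, "trainable": ["head"], "lr": probe_lr},
        {"stage": 2, "trainable": ["backbone", "head"], "lr": finetune_lr},
    ]
```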
If you find this repository helpful, feel free to cite the following publication:
@inproceedings{liu-etal-2022-figmemes,
title = "{F}ig{M}emes: A Dataset for Figurative Language Identification in Politically-Opinionated Memes",
author = "Liu, Chen and
Geigle, Gregor and
Krebs, Robin and
Gurevych, Iryna",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-main.476",
pages = "7069--7086",
}