model | source/framework | usage | download link |
---|---|---|---|
Res2Net Patch Detector | mmdetection/pytorch | extract image patch | link |
DeepFillV2 | mmediting/pytorch | remove text from meme image | link |
InceptionV2 OID | tf-hub/tensorflow | extract meme image bbox | auto download |
FairFace Classifier | fairface/pytorch | classify race of faces/heads detected by InceptionV2 | link is in the repo |
Visual Genome FasterRCNN | py-bottom-up-attention/pytorch | extract RoI features from inpainted images | auto download |
Some of the memes are composed of multiple images (patches), and often the meaning of a meme differs depending on which text is placed on which image patch. So I trained an image patch detector using the off-the-shelf Faster R-CNN detector provided by mmdetection, with training data generated by randomly stitching together 2~3 images from GQA. The scripts and configs for training the model and generating the dataset can be found in the repo linked in the table above.
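The random-stitch generation described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual script: the function name, the purely horizontal layout, and the zero-padding are all assumptions. The point is the bookkeeping: each source image's placement becomes a ground-truth box for the patch detector.

```python
import numpy as np

def stitch_patches(patches):
    """Stitch 2-3 images horizontally into one synthetic "meme".

    Returns the composed image plus one bounding box per patch
    (x1, y1, x2, y2), usable as detector training labels.
    """
    height = max(p.shape[0] for p in patches)
    columns, boxes, x = [], [], 0
    for p in patches:
        h, w = p.shape[:2]
        # Pad each patch to the tallest height so they can be concatenated.
        column = np.zeros((height, w, 3), dtype=np.uint8)
        column[:h] = p
        columns.append(column)
        boxes.append((x, 0, x + w, h))
        x += w
    return np.concatenate(columns, axis=1), boxes
```

In the real pipeline the GQA images would also be resized and arranged in varied layouts; this only shows how placement coordinates turn into box labels.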
NOTE: If you want to reproduce the original competition result, you should use the pretrained model, since both the training and the dataset generation process are non-deterministic.
Data processing is divided into two stages. The first stage detects objects and text and creates clean images. The second stage uses that information to extract more features from the images and integrates the extra features into the original meme annotations.
First, download and unzip the Facebook hateful memes data to ROOT/data
. We also need to download the pretrained models to ROOT/pretrain_model
. After doing all that, your project structure should look like:
.
├── VL-BERT
├── data
│ └── hateful_memes
│ ├── LICENSE.txt
│ ├── README.md
│ ├── dev_seen.jsonl
│ ├── dev_unseen.jsonl
│ ├── img
│ ├── test_seen.jsonl
│ ├── test_unseen.jsonl
│ └── train.jsonl
├── data_utils
├── pretrain_model
│ ├── deepfillv2_256x256_8x2_places_20200619-10d15793.pth
│ └── faster_rcnn_r2_101_fpn_2x_img_clip
│ ├── epoch_1.pth
│ ├── epoch_2.pth
│ ├── epoch_3.pth
│ ├── faster_rcnn_r2_101_fpn_2x_img_clip.py
│ └── latest.pth
└── test
If you want to go straight to model training, you can download the pre-built data from here:
meme annotations with entities, and box annotations with race tags
image feature pt file for ERNIE-ViL
image feature npz file for UNITER
And you can find out where to put those files in the following sections.
- OCR
- Image Inpainting
- Get OID bbox
- Face detection and human race classification
# data/hateful_memes/img
# --> data/hateful_memes/img_clean
# --> data/hateful_memes/box_annos.race.json
# --> data/hateful_memes/split_img_clean
# --> data/hateful_memes/split_img_clean_boxes.json
bash run_stage_1.sh
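The img_mask_3px folder name (in the tree below) suggests that the detected text regions are padded by a few pixels before being handed to DeepFillV2, so the inpainter also repaints glyph edges. A rough sketch of building such a mask, assuming axis-aligned (x1, y1, x2, y2) text boxes and a 3 px pad (both assumptions, not the repo's exact code):

```python
import numpy as np

def build_text_mask(img_shape, text_boxes, pad=3):
    """Build a binary inpainting mask: 1 where text was detected.

    text_boxes: iterable of (x1, y1, x2, y2) OCR boxes; each box is
    grown by `pad` pixels, clipped to the image bounds.
    """
    h, w = img_shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    for x1, y1, x2, y2 in text_boxes:
        mask[max(0, y1 - pad):min(h, y2 + pad),
             max(0, x1 - pad):min(w, x2 + pad)] = 1
    return mask
```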
After the first stage your data folder should look like:
.
├── VL-BERT
├── data
│ ├── hateful_memes
│ │ ├── img
│ │ ├── img_clean # images with text removed
│ │ ├── img_mask_3px # text masks used when removing text with DeepFillV2
│ │ ├── split_img_clean # if a meme image is composed of multiple patches, it is broken down and stored here
│ │ ├── split_img_clean_boxes.json # results of image patch detection
│ │ ├── box_annos.json # OID V4 object detection results
│ │ ├── box_annos.race.json # same as box_annos.json but with race and gender tags attached to bboxes
│ │ ├── face_race_boxes.json # race and gender tags
│ │ ├── ocr.json # raw EasyOCR detection results
│ │ └── ocr.box.json # same EasyOCR detection results but in bbox format
│ └── ...
└── ...
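EasyOCR reports each detected text line as a four-corner quadrilateral, so converting ocr.json to the bbox-format ocr.box.json presumably means collapsing each quad to an axis-aligned box. A sketch of that conversion (the output record keys are assumptions; only the quad-to-box math is standard):

```python
def quad_to_box(quad):
    """Reduce a 4-point quadrilateral [[x, y], ...] to an axis-aligned
    (x1, y1, x2, y2) bounding box."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return [min(xs), min(ys), max(xs), max(ys)]

def ocr_to_boxes(ocr_results):
    """Convert raw EasyOCR results [(quad, text, confidence), ...]
    into bbox-format records."""
    return [
        {"box": quad_to_box(quad), "text": text, "conf": conf}
        for quad, text, conf in ocr_results
    ]
```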
- Extract RoI feature

# data/hateful_memes/img_clean
# --> data/hateful_memes/hateful_memes_v2*.pt
# --> data/uniter/MEME_NPZ_CLEAN
bash run_extract_feat.sh
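As the paths above show, the extracted RoI features are stored in two formats: .pt files for ERNIE-ViL and per-image npz files for UNITER. A sketch of the npz round trip for one image, using numpy in place of the actual extraction code; the array keys ("features", "norm_bb") are assumptions about the expected layout, not verified against the repo:

```python
import io
import numpy as np

def save_roi_features(features, boxes):
    """Serialize one image's RoI features as an npz payload.

    features: (num_boxes, feat_dim) array from the Faster R-CNN head
    boxes:    (num_boxes, 4) RoI coordinates
    """
    buf = io.BytesIO()  # stands in for a file under data/uniter/MEME_NPZ_CLEAN
    np.savez(buf, features=features, norm_bb=boxes)  # key names assumed
    buf.seek(0)
    return buf

def load_roi_features(buf):
    data = np.load(buf)
    return data["features"], data["norm_bb"]
```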
- Web Entity Detection

NOTE: If you want to reproduce the original experiment or don't have access to GCP services, you can just download the detection results from link and place them under ROOT/data
. If you want to generate web entity results for a new dataset, follow this guide to set up the gcloud environment.

bash run_web_det.sh
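For reference, pulling usable entity labels out of a Cloud Vision web-detection response boils down to filtering the nested entity list. A minimal sketch, assuming the JSON shape of the Vision API's webDetection response ("webDetection" -> "webEntities" with "description" and "score" fields) and a hypothetical 0.5 score cutoff:

```python
def parse_web_entities(response, min_score=0.5):
    """Extract (description, score) pairs from a Cloud Vision
    web-detection response dict, keeping confident entities only.

    Entities without a description are skipped.
    """
    entities = response.get("webDetection", {}).get("webEntities", [])
    return [
        (e["description"], e["score"])
        for e in entities
        if "description" in e and e.get("score", 0) >= min_score
    ]
```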
- Insert extra features into annotations
bash run_anno_preocessing.sh
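Conceptually, this step joins the per-image results from the previous steps back onto the original jsonl annotations, producing the *.entity.jsonl files. A sketch of such a merge; the added field names ("entity", "race") and the lookup-by-id scheme are assumptions for illustration, not the script's actual schema:

```python
import json

def add_extra_features(jsonl_lines, entity_map, race_map):
    """Merge web entities and race/gender tags into meme annotations.

    jsonl_lines: lines of e.g. train.jsonl, one JSON object per meme
    entity_map:  {img_id: [entity descriptions]} from web detection
    race_map:    {img_id: [race/gender tags]} from FairFace
    Returns lines for the corresponding *.entity.jsonl file.
    """
    out = []
    for line in jsonl_lines:
        rec = json.loads(line)
        img_id = rec["id"]
        rec["entity"] = entity_map.get(img_id, [])  # field names assumed
        rec["race"] = race_map.get(img_id, [])
        out.append(json.dumps(rec))
    return out
```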
After the second stage your data folder should look like:
.
├── VL-BERT
├── data
│ ├── uniter
│ │ └── MEME_NPZ # img feature for uniter
│ ├── hateful_memes
│ │ ├── LICENSE.txt
│ │ ├── img
│ │ ├── img_clean
│ │ ├── README.md
│ │ ├── hateful_boxes_attr.json
│ │ ├── hateful_memes_v2.pt # img feature for ernie-vil
│ │ ├── hateful_memes_v2.aug.0.pt # img feature for ernie-vil
│ │ ├── hateful_memes_v2.aug.1.pt # img feature for ernie-vil
│ │ ├── hateful_memes_v2.aug.2.pt # img feature for ernie-vil
│ │ ├── dev_seen.entity.jsonl
│ │ ├── dev_unseen.entity.jsonl
│ │ ├── test_seen.entity.jsonl
│ │ ├── test_unseen.entity.jsonl
│ │ ├── train_dev_all.entity.jsonl
│ │ └── train.entity.jsonl
│ └── entity_json # web entity detection
│ ├── 05819.json
│ ├── ...
│ ├── 01236.0.json
│ └── 01236.1.json
└── ...
The JSONL files with the .entity.jsonl
extension are the final output we will use to train the modified VL-BERT in the next step.