<a href="https://colab.research.google.com/github/dameem4/new-project/blob/master/Kasali_CSC_40098_Disseration_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Table of Contents

<details><summary>
<font color='Blue'> I. Installation of MMF & dependencies </font></summary>

- Install MMF from source
</details>

<details><summary>
<font color='Blue'> II. Download the datasets & convert them into MMF format </font></summary>


</details>


<details><summary>
<font color='Blue'> III. Feature Extraction </font></summary>

</details>

<details><summary>
<font color='Blue'> IV. Fine-tuning pre-trained VisualBERT models on Hateful Memes </font></summary>

</details>


<details><summary>
<font color='Blue'> V. Generate predictions for the Challenge (`test_unseen.jsonl`) </font></summary>

</details>


## <font color='green'> <b> I. Installation of MMF & dependencies </b> </font>

Please set your `$HOME` directory.\
**e.g.** For *Linux* users it can be: `"/home"`,\
For *Colab* it would be: `"/content"`

In [1]:
import os
home = "/content"
os.chdir(home)
os.getcwd()

'/content'

In [None]:
# Install specified versions of `torch` and `torchvision`, before installing mmf (causes an issue)
!pip install torch torchvision -f https://download.pytorch.org/whl/torch_stable.html

#### *Install MMF from source* 


In [None]:
# Clone the following repo where mmf does not install default image features, 
# since we will use our own features
!git clone https://github.com/facebookresearch/mmf.git
%cd /content/mmf
!pip install --editable .

In [5]:
!rm -rf mmf

In [6]:
os.chdir(os.path.join(home, "mmf"))

---
## <font color='green'> <b> II. Download the datasets & convert them into *MMF* format </b> </font> <font color='red'><b> --Action required!-- </b></font>

### <font color='Orchid'> <b> Hateful Memes  dataset </b> </font>

Please download the `Hateful Memes Dataset` from the official challenge webpage: https://hatefulmemeschallenge.com/#download

After you fill in the form, `hateful_memes.zip` will be downloaded; it contains all the required data, including the images. In the following code cell, set the variable `PATH_TO_ZIP_FILE` to the full path of the downloaded `.zip` file:


In [9]:
PATH_TO_ZIP_FILE = "/content/drive/Othercomputers/MyLaptop/memes/hateful_memes.zip"
!cp -r $PATH_TO_ZIP_FILE /content/mmf/

In [8]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install mmf

In [11]:
# Add the mmf folder to Python Path
os.environ['PYTHONPATH'] = os.environ.get('PYTHONPATH', '') + ":/content/mmf/"

In [None]:
!mmf_convert_hm --zip_file="hateful_memes.zip" --password DontTellYou

In [None]:
!pip install mmf

In [None]:
# Check how many images we have in total
!ls /root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/img/ | wc -l

That means there are `12,140` **'uniquely named'** images in total. You might recall that the size of each set was the following:

- `|train.jsonl| = 8,500`
- `|dev_seen.jsonl| = 500`
- `|dev_unseen.jsonl| = 540`
- `|test_seen.jsonl| = 1,000`
- `|test_unseen.jsonl| = 2,000`

Well, this makes `8,500 + 500 + 540 + 1,000 + 2,000 = 12,540` in total. \
> *Is there something wrong?*\
> **TL;DR:** Nope. Some images in `dev_seen` are also used in `dev_unseen`. To be specific, the two sets share `400` images. Hence, in total we have `12,540 - 400 = 12,140` *'unique'* images.\
See <font color='orange'> <b> Extras </b> </font> --> <font color='Gold'><b> Number of 'unique' (based on file names) images </b></font> at the end of this notebook for a detailed explanation.
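
The arithmetic above can be double-checked with a few lines of Python. This is only a sanity check on the numbers quoted in this section; the dictionary and variable names are illustrative and not part of MMF.

```python
# Number of entries per annotation file, as quoted above
split_sizes = {
    "train": 8500,
    "dev_seen": 500,
    "dev_unseen": 540,
    "test_seen": 1000,
    "test_unseen": 2000,
}
# Images that appear in both dev_seen and dev_unseen
shared_dev_images = 400

total_entries = sum(split_sizes.values())
unique_images = total_entries - shared_dev_images

print("entries across all splits:", total_entries)
print("uniquely named images    :", unique_images)
```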

In [19]:
# Free up the disk by removing .zip, .tar files
!rm -rf /root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/hateful_memes.zip
!rm -rf $home/mmf/hateful_memes.zip

### <font color='Orchid'> <b> Memotion dataset </b> </font>

There are 2 options for downloading the dataset: 
1. download the dataset (a `.zip` file) using `Kaggle API`\
OR
2. download the dataset (a `.zip` file) from [Kaggle](https://www.kaggle.com/williamscott701/memotion-dataset-7k) directly, **(<font color='red' >preferred </font> if you're not familiar with Kaggle API)**

#### <font color='Thistle'> <b> 1. Download Memotion dataset using `Kaggle API` </b> </font>

Check out the official documentation to get more information on Kaggle API and how to create a Kaggle API Key:
- [Link#1](https://github.com/Kaggle/kaggle-api#api-credentials)
- [Link#2](https://www.kaggle.com/docs/api)

The API Key is stored in a file named `kaggle.json`, which has the following line inside: 
`{"username":"your_user_name","key":"some_values_here"}`

> Upload the `kaggle.json` file to your `$HOME` directory and run the following cell.
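
Before running the download cell, it can help to confirm that the uploaded key file has the expected shape. The sketch below is a minimal check, assuming only the two fields shown above (`username` and `key`); the helper name `has_kaggle_credentials` is hypothetical and not part of the Kaggle API.

```python
import json
import os


def has_kaggle_credentials(path):
    """Return True if `path` is a JSON file with non-empty 'username' and 'key' fields."""
    if not os.path.isfile(path):
        return False
    try:
        with open(path) as f:
            creds = json.load(f)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(creds, dict)
        and bool(creds.get("username"))
        and bool(creds.get("key"))
    )


# Check the location where the download cell expects the key to end up
print(has_kaggle_credentials(os.path.expanduser("~/.kaggle/kaggle.json")))
```

The same helper can be pointed at the freshly uploaded file in your `$HOME` directory before it is moved.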

In [None]:
# Install kaggle library
!pip install -q kaggle
# Create a directory where API key will be stored
!mkdir -p ~/.kaggle
# Move the API key to where Kaggle expects it to be
!mv $home/kaggle.json ~/.kaggle/
# Restrict access to the key file (owner read/write only)
!chmod 600 ~/.kaggle/kaggle.json
# Finally, download the dataset (.zip file)
!kaggle datasets download -d williamscott701/memotion-dataset-7k
# Unzip the data 
!unzip -qq memotion-dataset-7k.zip -d $home/

#### <font color='Thistle'> <b> 2. Download Memotion dataset directly from [Kaggle](https://www.kaggle.com/williamscott701/memotion-dataset-7k) </b> </font>

Download the dataset and put the `.zip` file into your `$HOME` directory and then run the following cell:

In [None]:
# Unzip the data 
# !unzip memotion-dataset-7k.zip -d $home/

#### <font color='Thistle'> <b> Labeling Memotion Dataset </b> </font>

We added the `Memotion Dataset` to the `Hateful Memes Dataset` and fine-tuned some models on the *aggregated* data, but saw no significant improvement in either the `ROC-AUC score` or the `accuracy`. We then discovered that the dataset is *horribly* labeled, so it needs to be relabeled.

So we went through the dataset and cherry-picked the memes that would be suitable for the challenge, considering the idea of `Hateful Memes Challenge`.

The following cell clones a repository that includes helpful scripts for the project, such as a script for labeling the `Memotion Dataset` and saving the data in the same format as the `Hateful Memes Dataset`.

In [None]:
os.chdir(home)
!git clone https://github.com/rizavelioglu/hateful_memes-hate_detectron.git

Cloning into 'hateful_memes-hate_detectron'...
remote: Enumerating objects: 47, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 47 (delta 19), reused 43 (delta 15), pack-reused 0
Unpacking objects: 100% (47/47), done.


Labeling the dataset is not necessary for reproducing our results but one can check out the [labeling script](https://github.com/rizavelioglu/hateful_memes-hate_detectron/tree/main/utils/label_memotion.py) and execute the following line of code to run the script and see how the labeling is done.

```
# Start labeling Memotion dataset and save it at the end
%run $home/hateful_memes-hate_detectron/utils/label_memotion.py --home $home
```

---
> In total, we have labeled $328$ memes.\
Check out the following file to find the ones we labeled: [/hateful_memes-hate_detectron/utils/label_memotion.jsonl](https://github.com/rizavelioglu/hateful_memes-hate_detectron/tree/main/utils/label_memotion.jsonl)

Next, we move those labeled images from `Memotion Dataset` into the same folder where the images from `Hateful Memes Dataset` are, so that when the image features are being extracted all the images are inside the same folder.


In [None]:
import pandas as pd
# read the .jsonl file and get the img column
labeled_memo_samples = pd.read_json(os.path.join(home, "hateful_memes-hate_detectron/utils/label_memotion.jsonl"), lines=True)['img']
# parse the img entries and get the image names
labeled_memo_samples = [i.split('/')[1] for i in list(labeled_memo_samples)]

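img_dir = os.path.join(home, "memotion_dataset_7k/images/")
hm_img_dir = "/root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/img/"
# move each labeled Memotion image next to the Hateful Memes images
for img in labeled_memo_samples:
    os.rename(os.path.join(img_dir, img), os.path.join(hm_img_dir, img))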

In [None]:
# Check how many images we have in total
!ls /root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/img/ | wc -l

12468


### <font color='Orchid'> <b> Merging the two datasets to get a larger training data</b> </font>


Simply execute the following cell, which concatenates:
- Labeled Memotion dataset,
- Hateful Memes' training data, and
- 100 images from `dev_seen.jsonl` that are not in `dev_unseen.jsonl`

and generates `train_v10.jsonl`, which will be used for fine-tuning.
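
The merge described above can be sketched as follows. This is a simplified illustration and not the actual `concat_memotion-hm.py` script: it assumes each record is a dict with an `img` field (as in the `.jsonl` files), and the function names are hypothetical.

```python
import json


def read_jsonl(path):
    """Load a .jsonl file into a list of dicts (one JSON object per line)."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(records, path):
    """Write a list of dicts as one JSON object per line."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def merge_training_data(memotion, hm_train, dev_seen, dev_unseen):
    """Concatenate the three sources listed above.

    Only dev_seen entries whose image does not appear in dev_unseen are
    kept, so the validation images never leak into the training data.
    """
    unseen_imgs = {rec["img"] for rec in dev_unseen}
    dev_seen_only = [rec for rec in dev_seen if rec["img"] not in unseen_imgs]
    return memotion + hm_train + dev_seen_only


# Toy records standing in for the real annotation files
toy_merged = merge_training_data(
    memotion=[{"img": "img/memo_1.jpg", "label": 1}],
    hm_train=[{"img": "img/00001.png", "label": 0}],
    dev_seen=[{"img": "img/00002.png", "label": 0}, {"img": "img/00003.png", "label": 1}],
    dev_unseen=[{"img": "img/00003.png", "label": 1}],
)
print(len(toy_merged))
```

Filtering on the `img` field mirrors how the overlap between `dev_seen.jsonl` and `dev_unseen.jsonl` is measured elsewhere in this notebook (by file name).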

In [None]:
!python $home/hateful_memes-hate_detectron/utils/concat_memotion-hm.py --home $home

---
## <font color='green'> <b> III. Feature Extraction </b> </font>

### <font color='lightgreen'> <b> Extract image features using [`mmf/tools/scripts/features/extract_features_vmb.py`](https://github.com/facebookresearch/mmf/blob/master/tools/scripts/features/extract_features_vmb.py) </b> </font>

#### Install packages & repos

In [None]:
import os
os.chdir(home)
!git clone https://gitlab.com/vedanuj/vqa-maskrcnn-benchmark

In [None]:
!pip install ninja yacs cython matplotlib

In [None]:
os.chdir(os.path.join(home, "vqa-maskrcnn-benchmark"))
!rm -rf build
!python setup.py build develop

#### Extract!

In [None]:
# !wget https://dl.fbaipublicfiles.com/pythia/detectron_model/FAST_RCNN_MLP_DIM2048_FPN_DIM512.pkl
# !wget https://dl.fbaipublicfiles.com/pythia/detectron_model/e2e_faster_rcnn_X-101-64x4d-FPN_1x_MLP_2048_FPN_512.yaml
os.chdir(os.path.join(home, "mmf/tools/scripts/features/"))
out_folder = os.path.join(home, "features/")

!python extract_features_vmb.py --config_file "https://dl.fbaipublicfiles.com/pythia/detectron_model/detectron_model_x152.yaml" \
                                --model_name "X-152" \
                                --output_folder $out_folder \
                                --image_dir "/root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/img/" \
                                --num_features 100 \
                                # --exclude_list "/content/exclude.txt"
                                # --feature_name "fc6" \
                                # --confidence_threshold 0. \

---
## <font color='green'> <b> IV. Fine-tuning pre-trained VisualBERT models on Hateful Memes </b> </font>

### <font color='Violet'> <b> Fine tuning  </b> </font>

In [None]:
"""
Uncomment it if needed
"""

# os.chdir(home)
# # Define where image features are
# feats_dir = os.path.join(home, "features")
# # Define where train.jsonl is
# train_dir = os.path.join(home, "train_v9.jsonl")

# !mmf_run config="projects/visual_bert/configs/hateful_memes/from_coco.yaml" \
#         model="visual_bert" \
#         dataset=hateful_memes \
#         run_type=train_val \
#         checkpoint.max_to_keep=1 \
#         checkpoint.resume_zoo=visual_bert.pretrained.cc.full \
#         training.tensorboard=True \
#         training.checkpoint_interval=50 \
#         training.evaluation_interval=50 \
#         training.max_updates=3000 \
#         training.log_interval=100 \
#         dataset_config.hateful_memes.max_features=100 \
#         dataset_config.hateful_memes.annotations.train[0]=$train_dir \
#         dataset_config.hateful_memes.annotations.val[0]=hateful_memes/defaults/annotations/dev_unseen.jsonl \
#         dataset_config.hateful_memes.annotations.test[0]=hateful_memes/defaults/annotations/test_unseen.jsonl \
#         dataset_config.hateful_memes.features.train[0]=$feats_dir \
#         dataset_config.hateful_memes.features.val[0]=$feats_dir \
#         dataset_config.hateful_memes.features.test[0]=$feats_dir \
#         training.lr_ratio=0.3 \
#         training.use_warmup=True \
#         training.batch_size=32 \
#         optimizer.params.lr=5.0e-05 \
#         env.save_dir=./sub1 \
#         env.tensorboard_logdir=logs/fit/sub1 \

##### **Visualize losses/accuracy via Tensorboard**

In [None]:
# Load the TensorBoard notebook extension
# %load_ext tensorboard

In [None]:
# %tensorboard --logdir logs/fit

---
## <font color='green'> <b> V. Generate predictions for the Challenge (`test_unseen.jsonl`) </b> </font>

### <font color='Thistle'> <b> Testing Phase 1 </b> </font>

In [None]:
"""
Uncomment it if needed
"""

# os.chdir(home)
# # where checkpoint is
# ckpt_dir = os.path.join(home, "sub1/best.ckpt")
# feats_dir = os.path.join(home, "features/feats_hm")

# !mmf_predict config="projects/visual_bert/configs/hateful_memes/defaults.yaml" \
#     model="visual_bert" \
#     dataset=hateful_memes \
#     run_type=test \
#     checkpoint.resume_file=$ckpt_dir \
#     checkpoint.reset.optimizer=True \
#     dataset_config.hateful_memes.annotations.val[0]=hateful_memes/defaults/annotations/dev_unseen.jsonl \
#     dataset_config.hateful_memes.annotations.test[0]=hateful_memes/defaults/annotations/test_unseen.jsonl \
#     dataset_config.hateful_memes.features.train[0]=$feats_dir \
#     dataset_config.hateful_memes.features.val[0]=$feats_dir \
#     dataset_config.hateful_memes.features.test[0]=$feats_dir \


### <font color='Gold'> <b> Image feature type conversion </b> </font>
Convert image features from `.npy` --> `.lmdb` and vice versa

You can also try to use the `.npy` files directly: just point to the folder that contains them in your config. LMDB is not required.
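
The key/value layout used by the conversion code in this section can be illustrated without LMDB at all. In the sketch below a plain `dict` stands in for the LMDB transaction (an assumption made purely for illustration), and `make_record` is a hypothetical helper; only the key derivation and the pickled `keys` list mirror the conversion class.

```python
import os
import pickle


def make_record(features_folder, npy_path, features):
    """Build the (key, value) pair stored for one feature file."""
    # The key is the path relative to the features folder, without ".npy"
    rel = os.path.relpath(npy_path, features_folder).split(".npy")[0]
    item = {"feature_path": rel, "features": features}
    return rel.encode(), pickle.dumps(item)


store = {}  # plain dict standing in for the LMDB transaction
key, value = make_record(
    "/content/features", "/content/features/01243.npy", [[0.0, 1.0]]
)
store[key] = value
# The list of all keys is stored under the special key b"keys"
store[b"keys"] = pickle.dumps([key])

# Reading back mirrors the extraction step: fetch the key list, then each item
for k in pickle.loads(store[b"keys"]):
    item = pickle.loads(store[k])
    print(k.decode("utf-8"), item["feature_path"])
```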

#### <font color='PaleGoldenrod'> <b> Convert .npy files to .lmdb </b> </font>

In [None]:
import argparse
import glob
import os
import pickle

import lmdb
import numpy as np
import tqdm


class LMDBConversion():
    def __init__(self, features_folder, lmdb_path):
        self.features_folder = features_folder
        self.lmdb_path = lmdb_path

    def convert(self):
        env = lmdb.open(self.lmdb_path, map_size=1099511627776)
        id_list = []
        features = glob.glob(
            os.path.join(self.features_folder, "**", "*.npy"), recursive=True
        )


        with env.begin(write=True) as txn:
            for infile in tqdm.tqdm(features):
                reader = np.load(infile, allow_pickle=True)
                item = {}
                split = os.path.relpath(infile, self.features_folder).split(
                    ".npy"
                )[0]
                item["feature_path"] = split
                key = split.encode()
                id_list.append(key)

                item["features"] = reader.item().get("features")
                item["image_height"] = reader.item().get("image_height")
                item["image_width"] = reader.item().get("image_width")
                item["num_boxes"] = reader.item().get("num_boxes")
                item["objects"] = reader.item().get("objects")
                item["cls_prob"] = reader.item().get("cls_prob", None)
                item["bbox"] = reader.item().get("bbox")

                txn.put(key, pickle.dumps(item))

            txn.put(b"keys", pickle.dumps(id_list))


In [None]:
features_folder = '/content/features/'
lmdb_path = "/content/"
lmdb_converter = LMDBConversion(features_folder, lmdb_path)
lmdb_converter.convert()

100%|██████████| 16522/16522 [01:30<00:00, 181.86it/s]


#### <font color='PaleGoldenrod'> <b> Convert .lmdb to .npy </b> </font>
Just to check that everything is okay.

In [None]:
features_folder = "/content/features_from_lmdb/"
lmdb_path = "/content/drive/MyDrive/"

def extract():
    os.makedirs(features_folder, exist_ok=True)
    env = lmdb.open(
        lmdb_path,
        max_readers=1,
        readonly=True,
        lock=False,
        readahead=False,
        meminit=False,
    )
    with env.begin(write=False) as txn:
        _image_ids = pickle.loads(txn.get(b"keys"))
        for img_id in tqdm.tqdm(_image_ids):
            item = pickle.loads(txn.get(img_id))
            img_id = img_id.decode("utf-8")
            
            tmp_dict = {
                "image_id"    : img_id,
                "bbox"        : item["bbox"],
                "num_boxes"   : item["num_boxes"],
                "image_height": item["image_height"],
                "image_width" : item["image_width"],
                "objects"     : item["objects"],
                "cls_prob"    : item["cls_prob"],
            }

            info_file_base_name = str(img_id) + "_info.npy"
            file_base_name = str(img_id) + ".npy"

            np.save(
                os.path.join(features_folder, file_base_name),
                item["features"],
            )
            np.save(
                os.path.join(features_folder, info_file_base_name),
                tmp_dict,
            )

In [None]:
extract()

100%|██████████| 16522/16522 [04:24<00:00, 62.54it/s] 


In [None]:
data = np.load("/content/features_from_lmdb/01243.npy", allow_pickle=True)
data_info = np.load("/content/features_from_lmdb/01243_info.npy", allow_pickle=True)

In [None]:
data_info.item()

{'bbox': array([[1.39431671e+02, 2.48186661e+02, 4.36927917e+02, 7.72290344e+02],
        [1.02417374e+02, 2.38131317e+02, 3.80355438e+02, 6.80631226e+02],
        [2.33086594e+02, 3.11982147e+02, 3.19798279e+02, 3.83706116e+02],
        [2.26076691e+02, 3.67014771e+02, 3.88840271e+02, 5.06836884e+02],
        [1.80767273e+02, 1.68776794e+02, 5.02467957e+02, 6.86551453e+02],
        [1.05255386e+02, 2.35436798e+02, 4.55437012e+02, 6.23370544e+02],
        [1.89080795e+02, 6.48054321e+02, 2.63529999e+02, 7.08438538e+02],
        [1.52574844e+02, 2.76297699e+02, 4.75713959e+02, 6.58706787e+02],
        [6.44385071e+01, 1.51709244e+02, 4.10103607e+02, 5.57978638e+02],
        [1.49728775e+02, 1.49498413e+02, 4.97210022e+02, 5.43969360e+02],
        [3.32468300e+01, 2.24651459e+02, 3.89471802e+02, 6.22743164e+02],
        [7.07439804e+01, 1.73837357e+02, 5.24785095e+02, 5.11764557e+02],
        [1.02569702e+02, 1.88668671e+02, 2.00243805e+02, 2.64704437e+02],
        [6.80242767e+01, 3.265

In [None]:
data.shape

(36, 2048)

### <font color='Gold'><b> Number of 'unique' (based on file names) images </b></font>


In [None]:
import pandas as pd
import os

annotation_dir = "/root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/annotations"
img_dir = "/root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/img/"

# Collect all the annotations (from Phase-2)
train       = pd.read_json(f"{annotation_dir}/train.jsonl", lines=True)
dev_seen    = pd.read_json(f"{annotation_dir}/dev_seen.jsonl", lines=True)
dev_unseen  = pd.read_json(f"{annotation_dir}/dev_unseen.jsonl", lines=True)
test_seen   = pd.read_json(f"{annotation_dir}/test_seen.jsonl", lines=True)
test_unseen = pd.read_json(f"{annotation_dir}/test_unseen.jsonl", lines=True)

# Create 2 lists:
#   'a': the names of all image files in the /img/ directory,
#   'b': one list of image names per split, e.g. train, dev_seen, etc.
a = os.listdir(img_dir)
b = []
for i in [train, dev_seen, dev_unseen, test_seen, test_unseen]:
    b.append(list(i["img"].str.split("/").str.get(1)))

set_mapping = ['train', 'dev_seen', 'dev_unseen', 'test_seen', 'test_unseen']
total_size = 0
print("#of images in: ")
for idx, i in enumerate(b):
    total_size += len(set(i))
    print(f"\t'{set_mapping[idx]}'  \t:", len(set(i)))

print(f"\nIn total there are {total_size} images,",
      "\nBut the # of images in /img/ directory is: ", len(a))

In [None]:
# First, let's check if all the images are within jsonl files, in other words: 
# 'do we have an image in /img folder that's not in one of the .jsonl files?'
# 0 means every image in /img directory is in a jsonl file
print("#of images that are not in one of the .jsonl files: ", 
      len(set(a).symmetric_difference(set(b[0] + b[1] + b[2] + b[3] + b[4]))))

In [None]:
print("#of same images in between: ")
for i in range(0, 5):
    print("\n")
    for j in range(0, 5):
        if i != j:
            print(f"{set_mapping[i], set_mapping[j]}   \t: {len(set(b[i]) & set(b[j]))}")

As seen, `dev_seen.jsonl` and `dev_unseen.jsonl` share `400` images. Let's double check that:

In [None]:
print(f"#of same images in {set_mapping[1], set_mapping[2]}: {len(set(b[1]) & set(b[2]))}",
      f"\n#of different images: {len(set(b[1]).symmetric_difference(set(b[2])))}")

#of same images in ('dev_seen', 'dev_unseen'): 400 
#of different images: 240


That means that in Phase-2, `100` images were removed from `dev_seen.jsonl` and `140` new images were added to the validation set.\
Hence, `|dev_unseen.jsonl| = 500 - 100 + 140 = 540`.
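
The same conclusion follows from the two numbers printed above (the overlap and the symmetric difference) together with the size of `dev_seen.jsonl`. A small worked check, with illustrative variable names:

```python
dev_seen_size = 500
shared = 400     # images in both dev_seen and dev_unseen
sym_diff = 240   # images in exactly one of the two sets

removed = dev_seen_size - shared   # images only in dev_seen
added = sym_diff - removed         # images only in dev_unseen
dev_unseen_size = shared + added

print(removed, added, dev_unseen_size)
```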

### <font color='Gold'> <b> Image Feature Extraction </b> </font>

#### <font color='PaleGoldenrod'> <b> Discovering default image features from MMF </b> </font>

Download the features for Phase-2, which were published by MMF on 01.10.2020 [[source of link]](https://github.com/facebookresearch/mmf/blob/518a5a675586e4dc1b415a52a8a80c75edfc2960/mmf/configs/zoo/datasets.yaml#L232)

In [None]:
# Download the features for Phase-2 from the following link
!wget https://dl.fbaipublicfiles.com/mmf/data/datasets/hateful_memes/defaults/features/features_2020_10_01.tar.gz

Extract `features_2020_10_01.tar.gz`:
> the archive extracts to a folder `detectron.lmdb/`, which holds the `data.mdb` file where all the image features are stored.

In [None]:
!tar -xzf /content/features_2020_10_01.tar.gz

Extract image features from `detectron.lmdb/` folder to `/features/`.

**Note**
Interrupt the execution, as we only want a sneak peek at a few features; there is no need to extract all the image features.

In [None]:
!python /content/mmf/tools/scripts/features/lmdb_conversion.py \
        --mode "extract" \
        --lmdb_path "/content/detectron.lmdb" \
        --features_folder "/content/features/" \

In [None]:
# Load only one image feature
import numpy as np
import os

img_list = os.listdir("/content/features/")
feat_dir = "/content/features/"


if len(img_list[0].split('_'))==2:
    img_id = img_list[0].split('_')[0]
else:
    img_id = img_list[0].split('_')[0].split('.')[0]

# There are 2 .npy files for each image, 
#   e.g. : for the image with 'image_id=75349':
#       - '75349.npy' : actual feature embedding
#       - '75349_info.npy' : meta-data about the image
data      = np.load(f"{feat_dir + img_id}.npy", allow_pickle=True)
data_info = np.load(f"{feat_dir + img_id}_info.npy", allow_pickle=True)
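
The `if`/`else` in the cell above recovers the image id from whichever of the two files happens to be listed first. A more explicit version of that mapping is sketched below; `image_id_from_feature_file` is a hypothetical helper name, and it assumes the `<id>.npy` / `<id>_info.npy` naming described in the comments above.

```python
def image_id_from_feature_file(file_name):
    """Map '75349.npy' or '75349_info.npy' to the image id '75349'."""
    stem = file_name[:-len(".npy")] if file_name.endswith(".npy") else file_name
    if stem.endswith("_info"):
        stem = stem[:-len("_info")]
    return stem


for name in ["75349.npy", "75349_info.npy"]:
    print(name, "->", image_id_from_feature_file(name))
```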

In [None]:
# Each box is embedded into 2048 dimensions!
# There are 100 bounding boxes
data.shape

In [None]:
# The meta-data about the image
data_info.item().keys()

In [None]:
print(f"image_id\t: {data_info.item()['image_id']}",
      f"\nnum_boxes\t: {data_info.item()['num_boxes']}",
      f"\nimage_height\t: {data_info.item()['image_height']}",
      f"\nimage_width\t: {data_info.item()['image_width']}",
      f"\nshape(bbox)\t: {data_info.item()['bbox'].shape}",
      f"\nshape(objects)\t: {data_info.item()['objects'].shape}",
      f"\nshape(cls_prob)\t: {data_info.item()['cls_prob'].shape}")

In [None]:
data_info.item()["objects"]

In [None]:
for i in [0, 1, 2]:
    print(f"max cls_prob of box #{i}: {data_info.item()['cls_prob'][i].max()}",
          f"\nindex of that class\t: {data_info.item()['cls_prob'][i].argmax()}\n",
          "-"*10)

In [None]:
data_info.item()["bbox"]

#### <font color='PaleGoldenrod'> <b> Different techniques for image feature extraction </b> </font>

##### <font color='PeachPuff'> <b> Extract image features using `Detectron2` & `ResNet-152` </b> </font>

In [None]:
!python extract_region_feature.py

	nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
	nonzero(Tensor input, *, bool as_tuple) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
  inds = torch.nonzero(level_assignments == level).squeeze(1)
100% 3/3 [00:20<00:00,  6.84s/it]


In [None]:
data = np.load("/content/features/hateful_memes/image_1.npy", allow_pickle=True)
data.item(0)

{'bbox': array([[207.1685 , 102.26065, 653.63135, 386.67548],
        [116.85211,  46.02007, 470.8916 , 412.14105],
        [388.9102 , 190.30983, 491.19983, 226.42296]], dtype=float32),
 'cls_prob': array([0.9472457 , 0.79215443, 0.5706494 ], dtype=float32),
 'features': array([[0.5375976 , 0.41097498, 0.49014562, ..., 0.41597521, 0.27225962,
         0.42191768],
        [0.48594958, 0.323035  , 0.50595093, ..., 0.33366024, 0.319493  ,
         0.40102425],
        [0.34906894, 1.17614079, 0.90047979, ..., 0.23805985, 0.04871404,
         0.25833166],
        ...,
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ]]),
 'image_height': 500,
 'image_id': 'image_1',
 'image_width': 735,
 'num_boxes': 3,
 'objects': array([ 0,  0, 79])}

In [None]:
data.item(0)["features"].shape

(20, 2048)

In [None]:
data.item().keys()

dict_keys(['image_id', 'bbox', 'num_boxes', 'image_height', 'image_width', 'objects', 'cls_prob', 'features'])

##### <font color='PeachPuff'> <b> Extract image features using [`facebookresearch/grid-feats-vqa`](https://github.com/facebookresearch/grid-feats-vqa) </b> </font>

In [None]:
# Install required packages
!pip install -U git+https://github.com/facebookresearch/fvcore
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git@ffff8ac'

Collecting git+https://github.com/facebookresearch/fvcore
  Cloning https://github.com/facebookresearch/fvcore to /tmp/pip-req-build-3dp0yqz1
  Running command git clone -q https://github.com/facebookresearch/fvcore /tmp/pip-req-build-3dp0yqz1
Collecting yacs>=0.1.6
  Downloading https://files.pythonhosted.org/packages/38/4f/fe9a4d472aa867878ce3bb7efb16654c5d63672b86dc0e6e953a67018433/yacs-0.1.8-py3-none-any.whl
Collecting pyyaml>=5.1
  Downloading https://files.pythonhosted.org/packages/64/c2/b80047c7ac2478f9501676c988a5411ed5572f35d1beff9cae07d321512c/PyYAML-5.3.1.tar.gz (269kB)
     |████████████████████████████████| 276kB 4.1MB/s 
Collecting portalocker
  Downloading https://files.pythonhosted.org/packages/89/a6/3814b7107e0788040870e8825eebf214d72166adf656ba7d4bf14759a06a/portalocker-2.0.0-py2.py3-none-any.whl
Building wheels for collected packages: fvcore, pyyaml
  Building wheel for fvcore (setup.py) ... done
  Created wheel for fvcore: filename=fvcore-0.1.2-

In [None]:
!git clone https://github.com/vedanuj/grid-feats-vqa.git --branch region_features

Cloning into 'grid-feats-vqa'...
remote: Enumerating objects: 102, done.
remote: Counting objects: 100% (102/102), done.
remote: Compressing objects: 100% (67/67), done.
remote: Total 102 (delta 61), reused 75 (delta 35), pack-reused 0
Receiving objects: 100% (102/102), 36.21 KiB | 210.00 KiB/s, done.
Resolving deltas: 100% (61/61), done.


In [None]:
cd grid-feats-vqa/

/content/grid-feats-vqa


In [None]:
!python extract_region_feature.py \
              --config-file configs/X-152-region-c4.yaml \
              --dataset "hateful_memes" \
              --dataset-path "/root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/img/" \

Command Line Args: Namespace(config_file='configs/X-152-region-c4.yaml', dataset='hateful_memes', dataset_path='/root/.cache/torch/mmf/data/datasets/hateful_memes/defaults/images/img/', feature_name='fc7', opts=[])
[09/30 16:01:05 detectron2]: Rank of current process: 0. World size: 1
[09/30 16:01:06 detectron2]: Environment info:
------------------------  ---------------------------------------------------------------
sys.platform              linux
Python                    3.6.9 (default, Jul 17 2020, 12:50:27) [GCC 8.4.0]
numpy                     1.18.5
detectron2                0.1.1 @/usr/local/lib/python3.6/dist-packages/detectron2
detectron2 compiler       GCC 7.5
detectron2 CUDA compiler  10.1
detectron2 arch flags     sm_75
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.6.0+cu101 @/usr/local/lib/python3.6/dist-packages/torch
PyTorch debug build       False
CUDA available            True
GPU 0                     Tesla T4
CUDA_HOME          

In [None]:
!ls /content/grid-feats-vqa/output/features/hateful_memes/

01235.npy  14728.npy  28764.npy  42650.npy  57183.npy  71390.npy  85209.npy
01236.npy  14765.npy  28765.npy  42653.npy  57189.npy  71392.npy  85213.npy
01243.npy  14769.npy  28790.npy  42658.npy  57193.npy  71396.npy  85237.npy
01245.npy  14782.npy  28793.npy  42673.npy  57198.npy  71398.npy  85239.npy
01247.npy  14783.npy  28905.npy  42675.npy  57203.npy  71403.npy  85243.npy
01256.npy  14789.npy  28930.npy  42681.npy  57208.npy  71428.npy  85261.npy
01258.npy  14793.npy  28935.npy  42685.npy  57209.npy  71429.npy  85269.npy
01264.npy  14802.npy  28936.npy  42687.npy  57236.npy  71430.npy  85271.npy
01268.npy  14823.npy  28945.npy  42690.npy  57248.npy  71432.npy  85290.npy
01269.npy  14829.npy  28951.npy  42691.npy  57249.npy  71436.npy  85291.npy
01274.npy  14830.npy  28954.npy  42693.npy  57260.npy  71450.npy  85307.npy
01275.npy  14836.npy  28957.npy  42705.npy  57261.npy  71452.npy  85310.npy
01276.npy  14837.npy  28964.npy  42706.npy  57268.npy  71453.npy  85314.npy
01284.npy  1

In [None]:
import numpy as np
data = np.load("/content/grid-feats-vqa/output/features/hateful_memes/01235.npy", allow_pickle=True)

In [None]:
data.item(0)

{'bbox': array([[259.67154, 161.62383, 278.1095 , 173.00247],
        [252.43747, 161.38416, 272.26633, 180.26404],
        [257.69632, 153.35738, 278.7746 , 171.86337],
        [316.04187,  91.28587, 339.1968 , 105.36515],
        [249.3239 , 166.94267, 270.6754 , 180.54642],
        [268.07816, 146.82487, 280.08188, 184.44374],
        [241.51526, 187.56104, 260.1389 , 204.11038],
        [259.67154, 161.62383, 278.1095 , 173.00247],
        [281.35248, 159.39366, 305.5214 , 179.62006]], dtype=float32),
 'cls_prob': array([[3.7774025e-05, 2.7411718e-07, 1.4468035e-04, ..., 3.3372964e-07,
         8.6575026e-05, 2.7763026e-05],
        [1.9372252e-04, 3.7264224e-06, 1.1961237e-04, ..., 5.5164464e-06,
         6.1771495e-04, 1.5503154e-04],
        [3.2860064e-04, 8.9784944e-06, 3.0677838e-04, ..., 9.0119511e-06,
         4.8007708e-04, 2.8090217e-04],
        ...,
        [7.5297671e-06, 1.0568231e-05, 2.4204587e-03, ..., 1.0737324e-06,
         1.0481614e-03, 2.0361526e-04],
        

In [None]:
data.item(0)["bbox"].shape

(9, 4)

##### <font color='PeachPuff'> <b> Extract image features using [`airsplay/py-bottom-up-attention`](https://github.com/airsplay/py-bottom-up-attention) </b> </font>

###### **Install packages**

In [None]:
import os
os.chdir("/content/")
!git clone https://github.com/airsplay/py-bottom-up-attention.git
os.chdir("py-bottom-up-attention/")

Cloning into 'py-bottom-up-attention'...
remote: Enumerating objects: 1991, done.
remote: Total 1991 (delta 0), reused 0 (delta 0), pack-reused 1991
Receiving objects: 100% (1991/1991), 8.94 MiB | 34.95 MiB/s, done.
Resolving deltas: 100% (1225/1225), done.


In [None]:
!pip install -r requirements.txt

Collecting git+https://github.com/facebookresearch/fvcore.git (from -r requirements.txt (line 1))
  Cloning https://github.com/facebookresearch/fvcore.git to /tmp/pip-req-build-xlnh10sa
  Running command git clone -q https://github.com/facebookresearch/fvcore.git /tmp/pip-req-build-xlnh10sa
Collecting torch==1.4.0
  Downloading https://files.pythonhosted.org/packages/24/19/4804aea17cd136f1705a5e98a00618cb8f6ccc375ad8bfa437408e09d058/torch-1.4.0-cp36-cp36m-manylinux1_x86_64.whl (753.4MB)
     |████████████████████████████████| 753.4MB 21kB/s 
Collecting torchvision==0.5.0
  Downloading https://files.pythonhosted.org/packages/7e/90/6141bf41f5655c78e24f40f710fdd4f8a8aff6c8b7c6f0328240f649bdbe/torchvision-0.5.0-cp36-cp36m-manylinux1_x86_64.whl (4.0MB)
     |████████████████████████████████| 4.0MB 50.8MB/s 
Collecting yacs>=0.1.6
  Downloading https://files.pythonhosted.org/packages/38/4f/fe9a4d472aa867878ce3bb7efb16654c5d63672b86dc0e6e953a67018433/yacs-0.1.8-py3-non

In [None]:
!pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
# Install detectron2
!python setup.py build develop

Collecting git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI
  Cloning https://github.com/cocodataset/cocoapi.git to /tmp/pip-req-build-0j280n2k
  Running command git clone -q https://github.com/cocodataset/cocoapi.git /tmp/pip-req-build-0j280n2k
Building wheels for collected packages: pycocotools
  Building wheel for pycocotools (setup.py) ... done
  Created wheel for pycocotools: filename=pycocotools-2.0-cp36-cp36m-linux_x86_64.whl size=266454 sha256=9655c7c86ccf9a18d1c64fdb5ea74841877be994fae1953f90005aedcd78f6af
  Stored in directory: /tmp/pip-ephem-wheel-cache-ik2jc_lf/wheels/90/51/41/646daf401c3bc408ff10de34ec76587a9b3ebfac8d21ca5c3a
Successfully built pycocotools
Installing collected packages: pycocotools
  Found existing installation: pycocotools 2.0.2
    Uninstalling pycocotools-2.0.2:
      Successfully uninstalled pycocotools-2.0.2
Successfully installed pycocotools-2.0
running build
running build_py
creating build
creating build/lib.linux-x8

###### **Extract!**

In [None]:
os.chdir("/content/hateful_memes/region_feature_extraction/")
!python extract.py

In [None]:
!ls /content/features/ | wc -l

16522


*What is the size of the training data?*
> 

*Did you use an additional dataset?*
> Yes. I used Memotion as an additional dataset.

*Did you use pre-trained models?*
> Yes. I used `VisualBERT` which was pre-trained on `Masked Conceptual Captions` dataset, see <font color='magenta'> <b> IV. Fine-tuning pre-trained VisualBERT models on Hateful Memes </b> </font>. Then, the model was fine-tuned on the HM dataset. The pre-trained model is available from MMF: [See all the available pre-trained VisualBERT models from MMF](https://github.com/facebookresearch/mmf/tree/master/projects/pretrain_vl_right)

*Did you use default image features provided by MMF?*
> No. I extracted my own image features using Facebook's Detectron model, which uses ResNet-152 as its backbone. See the <font color='magenta'> <b> III. Feature Extraction </b> </font> part of the notebook.

*What is the impact of the `Majority Voting` technique on the ROC-AUC score, and what do you think is the reason behind it?*
>
