# **Inspiration**

https://medium.com/pytorch/bootstrapping-a-multimodal-project-using-mmf-a-pytorch-powered-multimodal-framework-464f75164af7

# **Installing particular dependencies**

In [None]:
#!pip install yacs cython matplotlib
!pip install --upgrade matplotlib
!pip install sentencepiece
!pip install torch pytorch-lightning

# **Installing MMF**

**Using drive**

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [16]:
# if drive
%cd /content/drive/MyDrive/0-VQA
root = '/content/drive/MyDrive/0-VQA'

/content/drive/MyDrive/0-VQA


In [None]:
# if not drive
%cd /content
root = '/content'
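
The two cells above set `root` by hand, depending on whether Drive is mounted. A minimal sketch of making that choice automatically, assuming a mounted Drive shows up as an existing project directory. The helper name `pick_root` is hypothetical and is not part of Colab or MMF:

```python
import os


def pick_root(drive_dir="/content/drive/MyDrive/0-VQA", fallback="/content"):
    """Return drive_dir if it exists on disk, otherwise the fallback directory."""
    return drive_dir if os.path.isdir(drive_dir) else fallback


root = pick_root()
print(root)
```

`root` can then be used in the following cells exactly as before.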

In [17]:
%rm -rf explainableVQA
!git clone https://github.com/albertkjoller/explainableVQA.git explainableVQA

%cd ./explainableVQA/mmf
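# Keep the preinstalled torch version: drop every requirement line containing "torch"
!sed -i '/torch/d' requirements.txt
!pip install -e .

import sys
# The repository was cloned into root/explainableVQA, so mmf lives one level deeper
sys.path.append(root + '/explainableVQA/mmf')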

Cloning into 'explainableVQA'...
remote: Enumerating objects: 1692, done.
remote: Counting objects: 100% (1692/1692), done.
remote: Compressing objects: 100% (1262/1262), done.
remote: Total 1692 (delta 386), reused 1591 (delta 294), pack-reused 0
Receiving objects: 100% (1692/1692), 264.88 MiB | 11.89 MiB/s, done.
Resolving deltas: 100% (386/386), done.
Checking out files: 100% (1121/1121), done.
/content/drive/MyDrive/0-VQA/explainableVQA/mmf
Obtaining file:///content/drive/MyDrive/0-VQA/explainableVQA/mmf
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting iopath==0.1.8
  Downloading iopath-0.1.8-py3-none-any.whl (19 kB)
Collecting nltk==3.4.5
  Downloading nltk-3.4.5.zip (1.5 MB)
     |████████████████████████████████| 1.5 MB 5.2 MB/s
Collecting tqdm<4.50.0,>=4.43.0
  Downloading tqdm-4.49.0-py2.py3-none-any.whl (69 kB)
     |██████████████████
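
The `sed -i '/torch/d'` step in the install cell deletes every line that contains the text "torch", not only the line for `torch` itself. A minimal sketch of the same filter in Python shows the effect. The requirement entries are hypothetical and only for illustration, they are not the contents of the real MMF `requirements.txt`:

```python
def drop_matching(lines, needle="torch"):
    """Keep only the lines that do not contain `needle` (what sed '/needle/d' does)."""
    return [line for line in lines if needle not in line]


# Hypothetical requirements, for illustration only (not the real MMF file)
requirements = [
    "torch==1.9.0",
    "torchvision==0.10.0",
    "pytorch-lightning==1.5.0",
    "nltk==3.4.5",
    "iopath==0.1.8",
]

kept = drop_matching(requirements)
print(kept)  # ['nltk==3.4.5', 'iopath==0.1.8']
```

Any package whose name merely contains "torch" is dropped as well, so such packages have to be installed separately if they are needed.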

**If needed:** refresh git

In [None]:
# Only needed to pull updates into an existing clone
!pwd  # print the working directory
# Remote: https://github.com/albertkjoller/explainableVQA.git
!git pull


# **Downloading dataset** for visualization (not working)

In [None]:
# Importing
# The registry is needed to register the dataset or our new model so that MMF can discover it
from mmf.common.registry import registry

from mmf.models.mmbt import MMBT
from mmf.utils.build import build_dataset
from mmf.utils.env import setup_imports

import matplotlib.pyplot as plt


In [None]:
# downloading
'''
setup_imports()
dataset = build_dataset("okvqa")

# visualizing
plt.rcParams["figure.figsize"] = (20, 20)
dataset.visualize(num_samples=8, size=(512, 512), nrow=4)

!curl -o /content/vqa2.zip "$url" -H 'Referer: http://mscoco.org/dataset/#download' --compressed

setup_imports()
dataset = build_dataset("hateful_memes")
'''

'\nsetup_imports()\ndataset = build_dataset("okvqa")\n\n# visualizing\nplt.rcParams["figure.figsize"] = (20, 20)\ndataset.visualize(num_samples=8, size=(512, 512), nrow=4)\n'

# **Building the model**

In [20]:
# importing
import torch
# All models using MMF need to inherit from BaseModel
from mmf.models.base_model import BaseModel

# The registry is needed to register the dataset or our new model so that MMF can discover it
from mmf.common.registry import registry

# Builder methods for the image encoder, text encoder, and classifier
from mmf.utils.build import (
    build_classifier_layer,
    build_image_encoder,
    build_text_encoder,
)


In [21]:


# Register the model for MMF; the "first_model" key is used to find the model
@registry.register_model("first_model")
class First_Model(BaseModel):
    # All models in MMF get first argument as config which contains all
    # of the information you stored in this model's config (hyperparameters)
    def __init__(self, config):
        # This is not needed in most cases, as it just calls the parent's init
        # with the same parameters. It is kept here to show how the config is
        # initialized
        super().__init__(config)
        self.build()

    # This classmethod tells MMF where to look for default config of this model
    @classmethod
    def config_path(cls):
        # Relative to user dir root
        return "configs/models/first_model/defaults.yaml"

    # Each model needs to define a build method where the model's modules
    # are actually built and assigned to the model
    def build(self):
        """
        Config's image_encoder attribute will be used to build an MMF image
        encoder. This config in yaml will look like:

        # "type" parameter specifies the type of encoder we are using here.
        # In this particular case, we are using resnet152
        type: resnet152
        # Parameters are passed to underlying encoder class by
        # build_image_encoder
        params:
            # Specifies whether to use a pretrained version
            pretrained: true
            # Pooling type, use max to use AdaptiveMaxPool2D
            pool_type: avg
            # Number of output features from the encoder, -1 for original;
            # otherwise, supports values from 1 to 9
            num_output_features: 1
        """
        self.vision_module = build_image_encoder(self.config.image_encoder)

        """
        For the text encoder, the configuration would look like:
        # Specifies the type of the language encoder, in this case transformer
        type: transformer
        # Parameters for the encoder are passed through build_text_encoder
        params:
            # BERT model type
            bert_model_name: bert-base-uncased
            hidden_size: 768
            # Number of BERT layers
            num_hidden_layers: 12
            # Number of attention heads in the BERT layers
            num_attention_heads: 12
        """
        self.language_module = build_text_encoder(self.config.text_encoder)

        """
        For the classifier, the configuration would look like:
        # Specifies the type of the classifier, in this case mlp
        type: mlp
        # Parameters for the classifier are passed through build_classifier_layer
        params:
            # Dimension of the tensor coming into the classifier
            # Visual feature dim + Language feature dim : 2048 + 768
            in_dim: 2816
            # Dimension of the tensor going out of the classifier
            out_dim: 2
            # Number of MLP layers in the classifier
            num_layers: 2
        """
        self.classifier = build_classifier_layer(self.config.classifier)

    # Each model in MMF gets a dict called sample_list which contains
    # all of the necessary information returned by the dataset
    def forward(self, sample_list):
        # Text input features will be in "input_ids" key
        text = sample_list["input_ids"]
        # Similarly, image input will be in "image" key
        image = sample_list["image"]

        # Get the text and image features from the encoders
        text_features = self.language_module(text)[1]
        image_features = self.vision_module(image)

        # Flatten the embeddings before concatenation
        image_features = torch.flatten(image_features, start_dim=1)
        text_features = torch.flatten(text_features, start_dim=1)

        # Concatenate the features returned from two modality encoders
        combined = torch.cat([text_features, image_features], dim=1)

        # Pass final tensor to classifier to get scores
        logits = self.classifier(combined)

        # For loss calculations (automatically done by MMF
        # as per the loss defined in the config),
        # we need to return a dict with "scores" key as logits
        output = {"scores": logits}

        # MMF will automatically calculate loss
        return output
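
The `forward` method above flattens both feature tensors and concatenates them, which is where the classifier's `in_dim` of 2816 (2048 + 768) comes from. A minimal sketch of just that shape arithmetic follows, using random tensors as stand-ins for the encoder outputs. The tensor shapes and the single linear layer are assumptions for illustration, not what MMF's builders actually return:

```python
import torch
from torch import nn

batch_size = 4
# Stand-ins for encoder outputs; shapes are assumptions matching the config comments
image_features = torch.randn(batch_size, 1, 2048)  # num_output_features = 1
text_features = torch.randn(batch_size, 768)       # pooled text representation

# Flatten everything after the batch dimension, as in forward()
image_flat = torch.flatten(image_features, start_dim=1)  # (4, 2048)
text_flat = torch.flatten(text_features, start_dim=1)    # (4, 768)

# Concatenate along the feature dimension: 768 + 2048 = 2816
combined = torch.cat([text_flat, image_flat], dim=1)

# A plain linear layer stands in for the MLP built by build_classifier_layer
classifier = nn.Linear(2816, 2)
logits = classifier(combined)

print(combined.shape, logits.shape)
```

If `num_output_features` were larger than 1, the flattened image tensor would grow accordingly and `in_dim` would have to change to match.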


# **Training**
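
The training cell in this section passes its settings to `run` as a list of `key=value` strings. A minimal sketch of building such a list from a dictionary follows. The helper `build_opts` is hypothetical and not part of MMF, and the extra quoting around the config path used in the cell is left out here:

```python
def build_opts(overrides):
    """Turn a dict of overrides into a list of 'key=value' strings."""
    return [f"{key}={value}" for key, value in overrides.items()]


opts = build_opts({
    "config": "mmf/configs/models/first_model/defaults.yaml",
    "model": "first_model",
    "dataset": "okvqa",
    "run_type": "train_val",
})
print(opts)
```

Keeping the overrides in a dictionary makes it easier to change a single value, such as `run_type`, between runs.
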

In [22]:
from mmf_cli.run import run

#!mmf_run config="configs/experiments/first_model/defaults.yaml" model=first_model dataset=okvqa run_type=train_val

registry.mapping["state"] = {}
opts = [
    "config='mmf/configs/models/first_model/defaults.yaml'",
    "model=first_model",
    "dataset=okvqa",
    "run_type=train_val",
]
run(opts=opts)


2022-03-05T12:18:42 | mmf.utils.configuration: Overriding option config to 'mmf/configs/models/first_model/defaults.yaml'
2022-03-05T12:18:42 | mmf.utils.configuration: Overriding option model to first_model
2022-03-05T12:18:42 | mmf.utils.configuration: Overriding option datasets to okvqa
2022-03-05T12:18:42 | mmf.utils.configuration: Overriding option run_type to train_val


  "The `env` resolver is deprecated, see https://github.com/omry/omegaconf/issues/573"
  "Device specified is 'cuda' but cuda is not present. "


2022-03-05T12:18:43 | mmf: Logging to: ./save/train.log
2022-03-05T12:18:43 | mmf_cli.run: Namespace(config_override=None, opts=["config='mmf/configs/models/first_model/defaults.yaml'", 'model=first_model', 'dataset=okvqa', 'run_type=train_val'])
2022-03-05T12:18:43 | mmf_cli.run: Torch version: 1.10.0+cu111
2022-03-05T12:18:43 | mmf_cli.run: Using seed 43034480
2022-03-05T12:18:43 | mmf.trainers.mmf_trainer: Loading datasets
[ Downloading: https://dl.fbaipublicfiles.com/mmf/data/datasets/okvqa/defaults/images/images.tar.gz to /root/.cache/torch/mmf/data/datasets/okvqa/defaults/images/images.tar.gz ]


Downloading images.tar.gz: 100%|██████████| 2.30G/2.30G [00:50<00:00, 45.1MB/s]


[ Starting checksum for images.tar.gz]
[ Checksum successful for images.tar.gz]
Unpacking images.tar.gz
[ Checksum not provided, skipping for annotations.tar.gz]
[ Downloading: https://dl.fbaipublicfiles.com/mmf/data/datasets/okvqa/defaults/annotations.tar.gz to /root/.cache/torch/mmf/data/datasets/okvqa/defaults/annotations/annotations.tar.gz ]


Downloading annotations.tar.gz: 100%|██████████| 2.74G/2.74G [00:54<00:00, 50.3MB/s]


[ Checksum not provided, skipping for annotations.tar.gz]
Unpacking annotations.tar.gz
[ Downloading: https://dl.fbaipublicfiles.com/mmf/data/datasets/okvqa/defaults/extras.tar.gz to /root/.cache/torch/mmf/data/datasets/okvqa/defaults/extras.tar.gz ]


Downloading extras.tar.gz: 100%|██████████| 280k/280k [00:00<00:00, 521kB/s]


[ Starting checksum for extras.tar.gz]
[ Checksum successful for extras.tar.gz]
Unpacking extras.tar.gz
2022-03-05T12:22:35 | torchtext.vocab.vectors: Downloading vectors from http://nlp.stanford.edu/data/glove.6B.zip


/root/.cache/torch/mmf/glove.6B.zip: 862MB [02:40, 5.37MB/s]                           

2022-03-05T12:25:16 | torchtext.vocab.vectors: Extracting vectors into /root/.cache/torch/mmf


2022-03-05T12:25:42 | torchtext.vocab.vectors: Loading vectors from /root/.cache/torch/mmf/glove.6B.300d.txt


100%|█████████▉| 399999/400000 [00:51<00:00, 7797.47it/s]


2022-03-05T12:26:35 | torchtext.vocab.vectors: Saving vectors to /root/.cache/torch/mmf/glove.6B.300d.txt.pt
2022-03-05T12:26:38 | torchtext.vocab.vectors: Loading vectors from /root/.cache/torch/mmf/glove.6B.300d.txt.pt
  cpuset_checked))

  cpuset_checked))

2022-03-05T12:26:40 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-03-05T12:26:40 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-03-05T12:26:40 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2022-03-05T12:26:40 | mmf.trainers.mmf_trainer: Loading model


Downloading: "https://download.pytorch.org/models/resnet152-394f9c45.pth" to /root/.cache/torch/hub/checkpoints/resnet152-394f9c45.pth


  0%|          | 0.00/230M [00:00<?, ?B/s]

https://huggingface.co/bert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpfgl5kphc


Downloading:   0%|          | 0.00/570 [00:00<?, ?B/s]

storing https://huggingface.co/bert-base-uncased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
creating metadata file for /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range":

Downloading:   0%|          | 0.00/440M [00:00<?, ?B/s]

storing https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
creating metadata file for /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/a8041bf617d7f94ea26d15e218abd04afc2004805632abc0ed2066aa16d50d04.faf6ea826ae9c5867d12b22257f9877e6b8367890837bd60f7c54a29633f7f2f
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModelJit: ['cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.de

  "No losses are defined in model configuration. You are expected "

  "No losses are defined in model configuration. You are expected "

2022-03-05T12:27:09 | mmf.trainers.mmf_trainer: Loading optimizer


ValueError: ignored