# **Inspriation**

https://medium.com/pytorch/bootstrapping-a-multimodal-project-using-mmf-a-pytorch-powered-multimodal-framework-464f75164af7

# **Installing particular dependencies**

In [2]:
#!pip install yacs cython matplotlib
!pip install --upgrade matplotlib
!pip install sentencepiece
!pip install torch pytorch-lightning

# TODO: can make is present in requirements

Collecting matplotlib
  Using cached matplotlib-3.5.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
Installing collected packages: matplotlib
  Attempting uninstall: matplotlib
    Found existing installation: matplotlib 3.3.4
    Uninstalling matplotlib-3.3.4:
      Successfully uninstalled matplotlib-3.3.4
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.[0m
Successfully installed matplotlib-3.5.1




# **Installing MMF**

**Using drive**

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# if drive
%cd /content/drive/MyDrive/0-VQA
root = '/content/drive/MyDrive/0-VQA'

# if not drive
#%cd /content
#root = '/content'

%rm -rf explainableVQA # just to restart
!git clone https://github.com/albertkjoller/explainableVQA.git explainableVQA

%cd ./explainableVQA/mmf
# Don't modify torch version
!sed -i '/torch/d' requirements.txt
!pip install -e .

import sys
sys.path.append(root+'/mmf')

/content/drive/MyDrive/0-VQA
Cloning into 'explainableVQA'...
remote: Enumerating objects: 1811, done.[K
remote: Counting objects: 100% (1811/1811), done.[K
remote: Compressing objects: 100% (1331/1331), done.[K
remote: Total 1811 (delta 465), reused 1667 (delta 329), pack-reused 0[K
Receiving objects: 100% (1811/1811), 264.90 MiB | 12.46 MiB/s, done.
Resolving deltas: 100% (465/465), done.
Checking out files: 100% (1125/1125), done.
/content/drive/MyDrive/0-VQA/explainableVQA/mmf
Obtaining file:///content/drive/MyDrive/0-VQA/explainableVQA/mmf
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
    Preparing wheel metadata ... [?25l[?25hdone
Collecting matplotlib==3.3.4
  Using cached matplotlib-3.3.4-cp37-cp37m-manylinux1_x86_64.whl (11.5 MB)
Installing collected packages: matplotlib, mmf
  Attempting uninstall: matplotlib
    Found existing installation: matplotlib 3.5.1
    Uninstalling matplotlib-3.5.1:
      Succes

**If needed:** refresh git

# **Downloading dataset:** for visualization (not working)

In [5]:
# Importing
# registry is need to register the dataset or our new model so as to be MMF discoverable
from app.mmf.mmf.common.registry import registry

from app.mmf.mmf.models.mmbt import MMBT
from app.mmf.mmf.utils.build import build_dataset
from app.mmf.mmf.utils.env import setup_imports

import matplotlib.pyplot as plt


In [None]:
# downloading
'''
setup_imports()
dataset = build_dataset("okvqa")

# visualizing
plt.rcParams["figure.figsize"] = (20, 20)
dataset.visualize(num_samples=8, size=(512, 512), nrow=4)

!curl -o /content/vqa2.zip "$url" -H 'Referer: http://mscoco.org/dataset/#download' --compressed

setup_imports()
dataset = build_dataset("hateful_memes")
'''

'\nsetup_imports()\ndataset = build_dataset("okvqa")\n\n# visualizing\nplt.rcParams["figure.figsize"] = (20, 20)\ndataset.visualize(num_samples=8, size=(512, 512), nrow=4)\n'

# **Building the model:** present within git

In [6]:
# importing
import torch
# All model using MMF need to inherit BaseModel
from app.mmf.mmf.models.base_model import BaseModel

# registry is need to register the dataset or our new model so as to be MMF discoverable
from app.mmf.mmf.common.registry import registry

# Builder methods for image encoder and classifier
from app.mmf.mmf.utils.build import (
    build_classifier_layer,
    build_image_encoder,
    build_text_encoder,
)


In [7]:


# Register the model for MMF, "concat_bert_tutorial" key would be used to find the model
@registry.register_model("first_model")
class First_Model(BaseModel):
    # All models in MMF get first argument as config which contains all
    # of the information you stored in this model's config (hyperparameters)
    def __init__(self, config):
        # This is not needed in most cases as it just calling parent's init
        # with same parameters. But to explain how config is initialized we
        # have kept this
        super().__init__(config)
        self.build()

    # This classmethod tells MMF where to look for default config of this model
    @classmethod
    def config_path(cls):
        # Relative to user dir root
        return "configs/models/first_model/defaults.yaml"

    # Each method need to define a build method where the model's modules
    # are actually build and assigned to the model
    def build(self):
        """
        Config's image_encoder attribute will be used to build an MMF image
        encoder. This config in yaml will look like:

        # "type" parameter specifies the type of encoder we are using here.
        # In this particular case, we are using resnet152
        type: resnet152
        # Parameters are passed to underlying encoder class by
        # build_image_encoder
        params:
            # Specifies whether to use a pretrained version
            pretrained: true
            # Pooling type, use max to use AdaptiveMaxPool2D
            pool_type: avg
            # Number of output features from the encoder, -1 for original
            # otherwise, supports between 1 to 9
            num_output_features: 1
        """
        self.vision_module = build_image_encoder(self.config.image_encoder)

        """
        For text encoder, configuration would look like:
        # Specifies the type of the langauge encoder, in this case mlp
        type: transformer
        # Parameter to the encoder are passed through build_text_encoder
        params:
            # BERT model type
            bert_model_name: bert-base-uncased
            hidden_size: 768
            # Number of BERT layers
            num_hidden_layers: 12
            # Number of attention heads in the BERT layers
            num_attention_heads: 12
        """
        self.language_module = build_text_encoder(self.config.text_encoder)

        """
        For classifer, configuration would look like:
        # Specifies the type of the classifier, in this case mlp
        type: mlp
        # Parameter to the classifier passed through build_classifier_layer
        params:
            # Dimension of the tensor coming into the classifier
            # Visual feature dim + Language feature dim : 2048 + 768
            in_dim: 2816
            # Dimension of the tensor going out of the classifier
            out_dim: 2
            # Number of MLP layers in the classifier
            num_layers: 2
        """
        self.classifier = build_classifier_layer(self.config.classifier)

    # Each model in MMF gets a dict called sample_list which contains
    # all of the necessary information returned from the image
    def forward(self, sample_list):
        # Text input features will be in "input_ids" key
        text = sample_list["input_ids"]
        # Similarly, image input will be in "image" key
        image = sample_list["image"]

        # Get the text and image features from the encoders
        text_features = self.language_module(text)[1]
        image_features = self.vision_module(image)

        # Flatten the embeddings before concatenation
        image_features = torch.flatten(image_features, start_dim=1)
        text_features = torch.flatten(text_features, start_dim=1)

        # Concatenate the features returned from two modality encoders
        combined = torch.cat([text_features, image_features], dim=1)

        # Pass final tensor to classifier to get scores
        logits = self.classifier(combined)

        # For loss calculations (automatically done by MMF
        # as per the loss defined in the config),
        # we need to return a dict with "scores" key as logits
        output = {"scores": logits}

        # MMF will automatically calculate loss
        return output


# **Training**

In [None]:
from mmf_cli.run import run


opts = opts=[
             "config='mmf/configs/experiments/first_model/defaults.yaml'", 
             "model=first_model", 
             "dataset=okvqa", 
             "run_type=train"
             "training.fp16=True",
             "training.batch_size=32"
             ]
run(opts=opts)


[32m2022-03-05T15:49:51 | mmf.utils.configuration: [0mOverriding option config to 'mmf/configs/experiments/first_model/defaults.yaml'
[32m2022-03-05T15:49:51 | mmf.utils.configuration: [0mOverriding option model to first_model
[32m2022-03-05T15:49:51 | mmf.utils.configuration: [0mOverriding option datasets to okvqa
[32m2022-03-05T15:49:51 | mmf.utils.configuration: [0mOverriding option run_type to traintraining.fp16=True
[32m2022-03-05T15:49:51 | mmf.utils.configuration: [0mOverriding option training.batch_size to 32


  "The `env` resolver is deprecated, see https://github.com/omry/omegaconf/issues/573"
  "Device specified is 'cuda' but cuda is not present. "


[32m2022-03-05T15:49:51 | mmf: [0mLogging to: ./save/train.log
[32m2022-03-05T15:49:52 | mmf_cli.run: [0mNamespace(config_override=None, opts=["config='mmf/configs/experiments/first_model/defaults.yaml'", 'model=first_model', 'dataset=okvqa', 'run_type=traintraining.fp16=True', 'training.batch_size=32'])
[32m2022-03-05T15:49:52 | mmf_cli.run: [0mTorch version: 1.10.0+cu111
[32m2022-03-05T15:49:52 | mmf_cli.run: [0mUsing seed 52024135
[32m2022-03-05T15:49:52 | mmf.trainers.mmf_trainer: [0mLoading datasets


loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.10.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

loading file https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/

  cpuset_checked))

  cpuset_checked))

[32m2022-03-05T15:49:53 | mmf.datasets.multi_datamodule: [0mMultitasking disabled by default for single dataset training
[32m2022-03-05T15:49:53 | mmf.datasets.multi_datamodule: [0mMultitasking disabled by default for single dataset training
[32m2022-03-05T15:49:53 | mmf.datasets.multi_datamodule: [0mMultitasking disabled by default for single dataset training
[32m2022-03-05T15:49:53 | mmf.trainers.mmf_trainer: [0mLoading model


loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.10.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

loading weights file https://huggingface.co/bert-base-uncased/resolve/main/pytorch_model.bin from cache 

[32m2022-03-05T15:50:10 | mmf.trainers.mmf_trainer: [0mLoading optimizer
[32m2022-03-05T15:50:10 | mmf.trainers.mmf_trainer: [0mLoading metrics
[32m2022-03-05T15:50:10 | mmf.trainers.mmf_trainer: [0m===== Model =====
[32m2022-03-05T15:50:10 | mmf.trainers.mmf_trainer: [0mFirst_Model(
  (vision_module): ResNet152ImageEncoder(
    (model): Sequential(
      (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (4): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): Batch

In [None]:
i = []

while(True):

    i.append('a')