### Modeling DAQUAR
* [Dataset](https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/vision-and-language/visual-turing-challenge)

* [Original Paper](https://proceedings.neurips.cc/paper_files/paper/2014/file/d516b13671a4179d9b7b458a6ebdeb92-Paper.pdf)

In [1]:
%cd ..

/home/datascience/Data Fusion


### Setup Environment

In [2]:
import os
import pandas as pd

from src.classifiers import process_labels, split_data
from src.classifiers_base import preprocess_df

from transformers import BertTokenizer

from src.multimodal_data_loader import VQADataset
from torch.utils.data import DataLoader

from src.classifiers_base_cpu_metrics import calculate_memory

In [3]:
PATH = 'datasets/daquar/'

In [4]:
text_path = os.path.join(PATH, 'labels.csv')
images_path = os.path.join(PATH, 'images')

## Get data

In [5]:
df = pd.read_csv(text_path)
df

Unnamed: 0,question,image_id,answer,split
0,what is on the right side of the black telepho...,image3,desk,train
1,what is in front of the white door on the left...,image3,telephone,train
2,what is on the desk in the image3 ?,image3,"book, scissor, papers, tape_dispenser",train
3,what is the largest brown objects in this imag...,image3,carton,train
4,what color is the chair in front of the white ...,image3,red,train
...,...,...,...,...
12463,what is found below the chandelier in the imag...,image1448,table,test
12464,what is on the floor in the image1449 ?,image1449,rug,test
12465,what are around dining table in the image1449 ?,image1449,chair,test
12466,what is at the opposite side of the dining tab...,image1449,decoration_item,test


## Data Preparation

In [6]:
# Select the feature (text, image) and label columns
text_columns = 'question'
image_columns = 'image_id'
label_columns = 'answer'

df = preprocess_df(df, image_columns, images_path)

# Split the data
train_df, test_df = split_data(df)

# Process and one-hot encode labels for training set
train_labels, mlb, train_columns = process_labels(train_df, col=label_columns)
test_labels = process_labels(test_df, col=label_columns, train_columns=train_columns)

100%|██████████| 12468/12468 [00:00<00:00, 16865.30it/s]
100%|██████████| 12468/12468 [00:08<00:00, 1491.12it/s]


Train Shape: (6795, 4)
Test Shape: (5673, 4)
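
The body of `process_labels` is not shown in this notebook (it lives in `src.classifiers`). As a rough illustration only, the dependency-free sketch below shows one way comma-separated answers such as `"book, scissor, papers, tape_dispenser"` could be turned into multi-hot vectors. The helper names `split_answers`, `fit_classes`, and `encode` are hypothetical and are not the project's API.

```python
# Hypothetical sketch: multi-hot encoding of comma-separated answers.
# This is NOT the implementation of process_labels from src.classifiers.

def split_answers(answer):
    """Split a raw answer string such as 'candelabra, book' into clean tokens."""
    return [part.strip() for part in answer.split(",") if part.strip()]


def fit_classes(answers):
    """Collect the sorted vocabulary of answer tokens seen in the training split."""
    return sorted({token for answer in answers for token in split_answers(answer)})


def encode(answer, classes):
    """Return a 0/1 vector with one slot per class; unseen tokens are ignored."""
    present = set(split_answers(answer))
    return [1 if cls in present else 0 for cls in classes]


# Answers taken from the first rows of the DataFrame displayed earlier.
train_answers = ["desk", "telephone", "book, scissor, papers, tape_dispenser", "carton", "red"]
classes = fit_classes(train_answers)

multi_hot = encode("book, scissor, papers, tape_dispenser", classes)
unseen = encode("rug", classes)  # 'rug' is not in this toy training vocabulary
```

Fitting the vocabulary on the training split and reusing it for the test split mirrors the `train_columns` argument passed to the second `process_labels` call. In this sketch, an answer that never appears in training encodes to an all-zero vector; whether the project handles unseen answers the same way is not confirmed here.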


In [7]:
train_df

Unnamed: 0,question,image_id,answer,split
0,what is on the right side of the black telepho...,datasets/daquar/images/image3.png,desk,train
1,what is in front of the white door on the left...,datasets/daquar/images/image3.png,telephone,train
2,what is on the desk in the image3 ?,datasets/daquar/images/image3.png,"book, scissor, papers, tape_dispenser",train
3,what is the largest brown objects in this imag...,datasets/daquar/images/image3.png,carton,train
4,what color is the chair in front of the white ...,datasets/daquar/images/image3.png,red,train
...,...,...,...,...
6790,what are stuck on the wall in the image1440 ?,datasets/daquar/images/image1440.png,photo,train
6791,what is in the top right corner in the image14...,datasets/daquar/images/image1440.png,window,train
6792,what is in front of the window in the image1440 ?,datasets/daquar/images/image1440.png,cabinet,train
6793,what are the things on the cabinet in the imag...,datasets/daquar/images/image1440.png,"candelabra, book",train
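
Comparing this table with the raw CSV shows that `image_id` now holds a file path instead of a bare identifier. A minimal sketch of that mapping follows, assuming every image is stored as `<image_id>.png` directly under `images_path`. The helper `image_id_to_path` is hypothetical; the actual logic is in `preprocess_df`, which is not shown here.

```python
import posixpath  # forward-slash joins, matching the Linux paths shown in the output


def image_id_to_path(image_id, images_dir, extension=".png"):
    """Build the on-disk path for an image identifier (assumed naming scheme)."""
    return posixpath.join(images_dir, image_id + extension)


# Same directory layout as PATH / images_path in the notebook.
images_path = posixpath.join("datasets/daquar/", "images")
path = image_id_to_path("image3", images_path)
```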


In [8]:
# Instantiate tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

In [9]:
train_dataset = VQADataset(train_df, text_columns, image_columns, label_columns, mlb, train_columns, tokenizer)
test_dataset = VQADataset(test_df, text_columns, image_columns, label_columns, mlb, train_columns, tokenizer)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False, num_workers=2)

### Models

In [10]:
output_size = len(mlb.classes_)
multilabel = True

In [11]:
calculate_memory(train_loader, test_loader, output_size)

Early fusion:
Average Memory per Batch in Train: 36.59 MB
Total Memory Usage per Epoch Train: 3914.75 MB (excluding model parameters)
Test:
Average Memory per Batch in Test: 30.55 MB
Total Memory Usage per Epoch Test: 2718.53 MB (excluding model parameters)
Model: 
Model Memory Usage: 748.19 MB
Late fusion:
Average Memory per Batch in Train: 36.59 MB
Total Memory Usage per Epoch Train: 3914.75 MB (excluding model parameters)
Test:
Average Memory per Batch in Test: 30.55 MB
Total Memory Usage per Epoch Test: 2718.53 MB (excluding model parameters)
Model: 
Model Memory Usage: 747.81 MB
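
As a sanity check on the figures above, each per-epoch total should be close to the average per-batch memory multiplied by the number of batches. The sketch below assumes the loaders keep the final partial batch (the `DataLoader` default, `drop_last=False`), so the batch count is the ceiling of the split size divided by the batch size of 64. The function name `batches_per_epoch` is illustrative and is not part of `calculate_memory`.

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)


# Split sizes come from the "Train Shape" / "Test Shape" output; batch size from the loaders.
train_batches = batches_per_epoch(6795, 64)
test_batches = batches_per_epoch(5673, 64)

# Reported averages (MB per batch) multiplied by the batch counts.
train_total_mb = 36.59 * train_batches
test_total_mb = 30.55 * test_batches
```

With these inputs the products land within about 0.5 MB of the reported totals (3914.75 MB and 2718.53 MB), a gap consistent with the per-batch averages being rounded to two decimals.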
