# Part 0: Dataloader and Visualizations

In [7]:
import torch
import wandb
import scipy.io

import numpy as np

import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from torchvision import transforms, datasets
from torch.utils.data import DataLoader

from voc_dataset import VOCDataset

from PIL import Image

from utils import *

USE_WANDB = True

## Editing the Dataloader
The first part of the assignment is to edit the dataloader so that it returns the bounding-box proposals as well as the ground-truth bounding boxes. The ground-truth boxes come from the VOC dataset annotations themselves. Unsupervised bounding-box proposals are produced by methods such as [Selective Search](https://ivi.fnwi.uva.nl/isis/publications/2013/UijlingsIJCV2013/UijlingsIJCV2013.pdf).

Since Selective Search is slow to run on each image, we have pre-computed the bounding box proposals. You should be able to access the `.mat` files using `scipy.io.loadmat('file.mat')`. Feel free to experiment with the data in the files to figure out the number of proposals per image, their scores, etc.
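A minimal sketch of what that exploration might look like. It assumes the proposals are stored as MATLAB cell arrays (one entry per image) under the hypothetical keys `boxes` and `boxScores`. Print the keys of the real file to find the actual names. To stay self-contained, the sketch first writes a tiny fake `.mat` file. It also keeps only the `top_n` highest-scoring proposals, which is presumably what the `top_n` argument of `VOCDataset` controls. Finally, it optionally reorders columns, because some selective-search releases (the one used by Fast R-CNN, for instance) store 1-based `[y1, x1, y2, x2]` boxes. Check which layout your file uses before relying on that step.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Build a tiny fake proposal file so the sketch runs on its own.
# The keys "boxes" and "boxScores" are hypothetical: print the real file's keys first.
boxes_cell = np.empty((1, 2), dtype=object)   # saved as a MATLAB cell array, one entry per image
scores_cell = np.empty((1, 2), dtype=object)
boxes_cell[0, 0] = np.array([[20., 10., 220., 110.],
                             [5., 5., 50., 60.],
                             [100., 40., 300., 200.]])
scores_cell[0, 0] = np.array([[0.2], [0.9], [0.5]])
boxes_cell[0, 1] = np.array([[10., 20., 30., 40.]])
scores_cell[0, 1] = np.array([[0.7]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "proposals.mat")
    scipy.io.savemat(path, {"boxes": boxes_cell, "boxScores": scores_cell})
    mat = scipy.io.loadmat(path)

# loadmat also returns header entries such as "__header__"; skip them.
print([key for key in mat if not key.startswith("__")])
print("proposals in image 0:", mat["boxes"][0, 0].shape[0])


def top_n_proposals(mat, image_index, top_n=10, yx_order=True):
    """Return the top_n highest-scoring boxes of one image as [x1, y1, x2, y2]."""
    boxes = mat["boxes"][0, image_index].astype(np.float64)
    scores = mat["boxScores"][0, image_index].ravel()
    order = np.argsort(-scores)[:top_n]  # indices sorted by descending score
    boxes = boxes[order]
    if yx_order:
        boxes = boxes[:, [1, 0, 3, 2]]  # assumed [y1, x1, y2, x2] -> [x1, y1, x2, y2]
    return boxes, scores[order]


rois, roi_scores = top_n_proposals(mat, image_index=0, top_n=2)
print(rois)
print(roi_scores)
```

Note that `loadmat` keeps MATLAB's 2-D shapes (a 1×N cell comes back as an object array of shape `(1, N)`), which is why each image is indexed as `[0, image_index]`.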

Your task is to modify the dataloader so that, for each image, it returns the ground-truth bounding boxes as well as the proposed bounding boxes. Returning a dictionary is convenient here. For the boxes themselves, relative coordinates (pixel coordinates divided by the image width or height) are usually preferable, since they remain valid when the image is resized.
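One possible shape for that dictionary, sketched with NumPy only. The helper names `to_relative` and `make_sample` and the `[x1, y1, x2, y2]` box order are assumptions. The keys `image`, `gt_boxes`, `gt_classes` and `rois` mirror the ones this notebook reads from `ret` later on. Relative coordinates are obtained by dividing x-coordinates by the image width and y-coordinates by the image height.

```python
import numpy as np


def to_relative(boxes, width, height):
    """Convert pixel [x1, y1, x2, y2] boxes to fractions of the image size."""
    boxes = np.asarray(boxes, dtype=np.float64)
    scale = np.array([width, height, width, height], dtype=np.float64)
    return boxes / scale  # broadcasts the (4,) scale over every row


def make_sample(image, width, height, gt_boxes, gt_classes, rois):
    """Hypothetical __getitem__ payload: everything needed for one image."""
    return {
        "image": image,
        "gt_boxes": to_relative(gt_boxes, width, height),
        "gt_classes": list(gt_classes),
        "rois": to_relative(rois, width, height),
    }


sample = make_sample(
    image=None,  # stand-in for the image tensor
    width=400,
    height=200,
    gt_boxes=[[40, 20, 200, 100]],
    gt_classes=[14],  # arbitrary class index for illustration
    rois=[[0, 0, 400, 200], [100, 50, 300, 150]],
)
print(sample["gt_boxes"])  # [[0.1 0.1 0.5 0.5]]
print(sample["rois"])
```

Because every coordinate is a fraction of the image size, the same values still describe the same regions if the image is later resized.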

In [8]:
# Load the dataset; items at a particular index can be accessed with the usual indexing notation (dataset[idx])
dataset = VOCDataset('trainval', top_n=10)

In [9]:
#TODO: get the image information from index 2020
idx = 2020

ret = dataset[idx]

[[  1. 187. 220. 479.]
 [191. 187. 293. 381.]
 [  1.  86. 216. 481.]
 [191. 186. 256. 381.]
 [  1. 257. 192. 479.]
 [177. 186. 332. 363.]
 [116. 127. 217. 363.]
 [181. 189. 321. 400.]
 [  1. 431. 173. 500.]
 [186. 143. 309. 382.]]
[[0.00268097 0.374      0.58981233 0.958     ]
 [0.51206434 0.374      0.78552279 0.762     ]
 [0.00268097 0.172      0.57908847 0.962     ]
 [0.51206434 0.372      0.68632708 0.762     ]
 [0.00268097 0.514      0.51474531 0.958     ]
 [0.47453083 0.372      0.89008043 0.726     ]
 [0.31099196 0.254      0.58176944 0.726     ]
 [0.48525469 0.378      0.86058981 0.8       ]
 [0.00268097 0.862      0.46380697 1.        ]
 [0.49865952 0.286      0.82841823 0.764     ]]
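The output above appears to show the same ten proposals twice, first in pixel coordinates and then in relative coordinates. A quick NumPy check (a sketch; the divisors 373 and 500 are inferred from the printed values, not read from the dataset) shows that dividing columns 0 and 2 by 373 and columns 1 and 3 by 500 reproduces the second array. This is consistent with an image whose sides measure 373 and 500 pixels. Which side is the width depends on the box order stored in the `.mat` file.

```python
import numpy as np

# First three rows of each printed array, copied from the cell output above.
pixel = np.array([[1., 187., 220., 479.],
                  [191., 187., 293., 381.],
                  [1., 86., 216., 481.]])
relative = np.array([[0.00268097, 0.374, 0.58981233, 0.958],
                     [0.51206434, 0.374, 0.78552279, 0.762],
                     [0.00268097, 0.172, 0.57908847, 0.962]])

# Divisors inferred from the output: 373 for columns 0 and 2, 500 for columns 1 and 3.
scale = np.array([373., 500., 373., 500.])
print(np.allclose(pixel / scale, relative, atol=1e-6))  # True
```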


## Wandb Init and Logging
Initialize a Weights & Biases project, then convert the image tensor to a PIL image and plot it (see `utils.py` for helper functions).

You can use [this](https://docs.wandb.ai/library/log) as a reference for logging syntax.
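`utils.py` already provides `tensor_to_PIL`. The sketch below only illustrates what such a conversion typically involves. Its name `chw_to_pil` is hypothetical, and the optional ImageNet mean/std un-normalization is an assumption, since this notebook does not show whether `VOCDataset` normalizes its images. It takes a NumPy array in place of a torch tensor. For a tensor, call `.cpu().numpy()` first.

```python
import numpy as np
from PIL import Image

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def chw_to_pil(chw, mean=None, std=None):
    """Convert a float (C, H, W) array in [0, 1], optionally normalized, to a PIL image."""
    img = np.asarray(chw, dtype=np.float32)
    if mean is not None and std is not None:
        # Undo a per-channel normalization: x * std + mean.
        std_arr = np.asarray(std, dtype=np.float32)[:, None, None]
        mean_arr = np.asarray(mean, dtype=np.float32)[:, None, None]
        img = img * std_arr + mean_arr
    img = np.clip(img, 0.0, 1.0)
    hwc = (img.transpose(1, 2, 0) * 255).round().astype(np.uint8)  # (H, W, C) bytes
    return Image.fromarray(hwc)


pil = chw_to_pil(np.full((3, 2, 5), 0.5))
print(pil.size, pil.mode)  # (5, 2) RGB
```

PIL reports `size` as `(width, height)`, so a `(3, 2, 5)` input becomes a 5×2 RGB image.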

In [10]:
if USE_WANDB:
    wandb.init(project="vlr2", reinit=True)

The cell below is an example of plotting the ground-truth box for an image.

In [11]:
original_image = tensor_to_PIL(ret['image'])
gt_labels = ret['gt_classes']
gt_boxes = ret['gt_boxes']

class_id_to_label = dict(enumerate(dataset.CLASS_NAMES))

img = wandb.Image(original_image, boxes={
    "predictions": {
        "box_data": get_box_data(gt_labels, gt_boxes),
        "class_labels": class_id_to_label,
    },
})

if USE_WANDB:
    wandb.log({'gt_boxes': img})

Read the `get_box_data` function in `utils.py` to understand how it is used, and log the image with its GT bounding box to wandb.
After this, plotting the top 10 bounding-box proposals should be straightforward.
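For reference, W&B expects each entry of `box_data` to be a dictionary with a `position` and an integer `class_id`. The `position` is given in relative coordinates by default, via `minX`, `maxX`, `minY` and `maxY`. The `class_id` is looked up in `class_labels` to name the box. The function below is a hypothetical stand-in for `get_box_data`, not its actual implementation in `utils.py`. It assumes relative boxes in `[x1, y1, x2, y2]` order.

```python
def box_data_sketch(class_ids, boxes):
    """Hypothetical equivalent of utils.get_box_data: one W&B box dict per box."""
    box_list = []
    for class_id, (x1, y1, x2, y2) in zip(class_ids, boxes):
        box_list.append({
            "position": {  # relative coordinates, W&B's default domain
                "minX": float(x1),
                "minY": float(y1),
                "maxX": float(x2),
                "maxY": float(y2),
            },
            "class_id": int(class_id),  # key into the class_labels dict
        })
    return box_list


print(box_data_sketch([3], [[0.1, 0.2, 0.5, 0.6]]))
```

This format also explains the proposal cell that follows. There, each proposal's index doubles as its `class_id`, so `class_labels` simply maps every index to its own string.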

In [12]:
rois = ret['rois']
nums = list(range(len(rois)))  # each proposal's index serves as its class id and name
class_labels = {i: str(i) for i in nums}

#TODO: plot top ten proposals (of bounding boxes)
img = wandb.Image(original_image, boxes={
    "predictions": {
        "box_data": get_box_data(nums, rois),
        "class_labels": class_labels,
    },
})

if USE_WANDB:
    wandb.log({'proposal_boxes': img})