#### Useful Links

These links may be useful:
- Why does `num_classes` need to be `91`? https://github.com/facebookresearch/detr/issues/23
- Training DETR on custom data: https://github.com/facebookresearch/detr/issues/9

### Connect Google Drive

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# %cd /content/drive/MyDrive/AIISC-Internship/text-based-object-discovery

/content


### Installing Necessary Libraries

Here we install the libraries required to set up a Vision Transformer with relative positional encoding (RPE). We rely on an existing implementation of image RPE (iRPE) [here](https://github.com/microsoft/Cream/tree/main/iRPE/DETR-with-iRPE).

In [2]:
%%capture
# iRPE
!git clone https://github.com/RishiDarkDevil/ViT-RPE.git # clone the repository if it is not already cloned
%cd ViT-RPE/iRPE/DETR-with-iRPE
!pip install -r ./requirements.txt

# iRPE for CUDA
%cd rpe_ops
!python setup.py install --user
%cd ..

### Import Libraries

In [6]:
# General
import requests
import argparse
from io import BytesIO
import sys
import os
import json
from tqdm import tqdm

# Plotting
import matplotlib.pyplot as plt
from util.plot_utils import plot_logs

# Image Processing
import cv2
from PIL import Image

# Data Handling
from pycocotools.coco import COCO

### Load Data

In [None]:
# %cd /content/drive/MyDrive/AIISC-Internship/text-based-object-discovery

/content/drive/MyDrive/AIISC-Internship/text-based-object-discovery


In [None]:
# ls Data-Generated

[0m[01;34mannotations[0m/  [01;34mcaptions[0m/  [01;34mtrain[0m/


If the `annotations` folder does not contain a single JSON file with all the annotations merged, run the following cell.

In [None]:
# # UNCOMMENT AND RUN IF NEEDED
# print('Starting Annotation Files Merge...')
# # Annotation File Names present in the annotations directory
# ann_file_names = os.listdir('Data-Generated/annotations')
# print('Number of Annotation Files found:', len(ann_file_names))
# print('Annotation Files found:', ' '.join(ann_file_names))
# ann_files = list() # Contains the list of loaded annotation json files
# for ann_file_name in tqdm(ann_file_names): # Loads the annotation json files and appends to ann_files
#   with open(os.path.join('Data-Generated/annotations', ann_file_name)) as json_file:
#     ann_file = json.load(json_file)
#     ann_files.append(ann_file)
# # Creating the single annotation file
# annotation_file = {
#     'info': ann_files[0]['info'],
#     'licenses': ann_files[0]['licenses'],
#     'images': [image for ann_file in ann_files for image in ann_file['images']],
#     'annotations': [ann for ann_file in ann_files for ann in ann_file['annotations']],
#     'categories': [cat for ann_file in ann_files for cat in ann_file['categories']]
# }
# # Serializing json
# print('Serializing...')
# ann_json_file = json.dumps(annotation_file, indent=4)
# # Writing json
# with open(f"Data-Generated/annotations/train_annotations.json", "w") as outfile:
#   outfile.write(ann_json_file)
# print()
# print('Saved Annotation file... train_annotations.json')
# # UNCOMMENT IF YOU WISH TO REMOVE ALL THE ANNOTATION FILES EXCEPT ONE BIG ANNOTATION FILE
# # print('Removing the annotation files other than train_annotations.json')
# # for ann_file_name in ann_file_names:
# #   os.remove(os.path.join('Data-Generated/annotations', ann_file_name))
# print('A successful merge!')
# # Frees up space
# del ann_file_names, ann_files, annotation_file, ann_json_file

Starting Annotation Files Merge...
Number of Annotation Files found: 99
Annotation Files found: object_detect-1.json object_detect-2.json object_detect-3.json object_detect-4.json object_detect-5.json object_detect-6.json object_detect-7.json object_detect-8.json object_detect-9.json object_detect-10.json object_detect-11.json object_detect-12.json object_detect-13.json object_detect-14.json object_detect-15.json object_detect-16.json object_detect-17.json object_detect-18.json object_detect-19.json object_detect-20.json object_detect-21.json object_detect-22.json object_detect-23.json object_detect-24.json object_detect-25.json object_detect-26.json object_detect-27.json object_detect-28.json object_detect-29.json object_detect-30.json object_detect-31.json object_detect-32.json object_detect-33.json object_detect-34.json object_detect-35.json object_detect-36.json object_detect-37.json object_detect-38.json object_detect-39.json object_detect-40.json object_detect-41.json object_dete

100%|██████████| 99/99 [01:20<00:00,  1.24it/s]



Saved Annotation file... train_annotations.json
A successful merge!
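
If the per-file `categories` lists overlap, concatenating them repeats entries in the merged file. The following is a minimal sketch of the same merge written as a function over already-loaded dicts that keeps each category `id` once. The name `merge_coco_dicts` is hypothetical, and the sketch assumes that a given category `id` means the same category in every file and that image and annotation ids are already unique across files.

```python
def merge_coco_dicts(parts):
    """Merge already-loaded COCO-style dicts, keeping each category id once.

    Assumes a given category id means the same category in every part and
    that image and annotation ids are already unique across parts.
    """
    merged = {"info": {}, "licenses": [], "images": [], "annotations": [], "categories": []}
    seen_category_ids = set()
    for index, part in enumerate(parts):
        if index == 0:
            # Take the metadata from the first part, as the merge cell does.
            merged["info"] = part.get("info", {})
            merged["licenses"] = part.get("licenses", [])
        merged["images"].extend(part["images"])
        merged["annotations"].extend(part["annotations"])
        for category in part["categories"]:
            # Skip categories whose id has already been collected.
            if category["id"] not in seen_category_ids:
                seen_category_ids.add(category["id"])
                merged["categories"].append(category)
    return merged
```

With the list `ann_files` from the cell above, this could be called as `annotation_file = merge_coco_dicts(ann_files)` before serializing.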


In [8]:
# coco_annotation = COCO(annotation_file='/content/drive/MyDrive/AIISC-Internship/text-based-object-discovery/Data-Generated/annotations/train_annotations.json')

loading annotations into memory...
Done (t=2.56s)
creating index...
index created!


In [10]:
# num_anns = list()
# for image_id in [x['id'] for x in coco_annotation.dataset['images']]:
#   ann_ids = coco_annotation.getAnnIds(imgIds=[image_id], iscrowd=None)
#   num_anns.append(len(ann_ids))

In [13]:
# num_queries for the DETR model should be strictly greater than this
max(num_anns)

167
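
The same per-image count can also be taken directly from the merged JSON without building the `COCO` index. This is a minimal sketch, assuming every entry in `annotations` carries an `image_id` key as in the COCO format. The helper name `max_annotations_per_image` is hypothetical.

```python
from collections import Counter


def max_annotations_per_image(coco_dict):
    """Return the largest number of annotations attached to any single image."""
    # Count how many annotations reference each image id.
    counts = Counter(ann["image_id"] for ann in coco_dict["annotations"])
    # default=0 covers a file with no annotations at all.
    return max(counts.values(), default=0)
```

Following the comment in the cell above, `--num_queries` would then be chosen larger than the returned value.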

In [None]:
# id_list = [x['id'] for x in coco_annotation.dataset['categories']]

The maximum category `id` computed below is needed to configure the model.

In [None]:
# max(id_list)

1135

In [None]:
# del ann_file

If the `captions` folder does not contain a single JSON file with all the captions merged, run the following cell.

In [None]:
# # UNCOMMENT AND RUN IF NEEDED
# print('Starting Caption Files Merge...')
# # Caption File Names present in the captions directory
# cap_file_names = os.listdir('Data-Generated/captions')
# print('Number of Caption Files found:', len(cap_file_names))
# print('Caption Files found:', ' '.join(cap_file_names))
# cap_files = list() # Contains the list of loaded caption json files
# for cap_file_name in tqdm(cap_file_names): # Loads the caption json files and appends to cap_files
#   with open(os.path.join('Data-Generated/captions', cap_file_name)) as json_file:
#     cap_file = json.load(json_file)
#     cap_files.append(cap_file)
# # Creating the single caption file
# caption_file = {
#     'info': cap_files[0]['info'],
#     'licenses': cap_files[0]['licenses'],
#     'images': [image for cap_file in cap_files for image in cap_file['images']],
#     'annotations': [ann for cap_file in cap_files for ann in cap_file['annotations']],
# }
# # Serializing json
# print('Serializing...')
# cap_json_file = json.dumps(caption_file, indent=4)
# # Writing json
# with open(f"Data-Generated/captions/train_captions.json", "w") as outfile:
#   outfile.write(cap_json_file)
# print()
# print('Saved Caption file... train_captions.json')
# # UNCOMMENT IF YOU WISH TO REMOVE ALL THE CAPTION FILES EXCEPT ONE BIG CAPTION FILE
# # print('Removing the caption files other than train_captions.json')
# # for cap_file_name in cap_file_names:
# #   os.remove(os.path.join('Data-Generated/captions', cap_file_name))
# print('A successful merge!')
# # Frees up space
# del cap_file_names, cap_files, caption_file, cap_json_file

Starting Caption Files Merge...
Number of Caption Files found: 99
Caption Files found: object_caption-1.json object_caption-2.json object_caption-3.json object_caption-4.json object_caption-5.json object_caption-6.json object_caption-7.json object_caption-8.json object_caption-9.json object_caption-10.json object_caption-11.json object_caption-12.json object_caption-13.json object_caption-14.json object_caption-15.json object_caption-16.json object_caption-17.json object_caption-18.json object_caption-19.json object_caption-20.json object_caption-21.json object_caption-22.json object_caption-23.json object_caption-24.json object_caption-25.json object_caption-26.json object_caption-27.json object_caption-28.json object_caption-29.json object_caption-30.json object_caption-31.json object_caption-32.json object_caption-33.json object_caption-34.json object_caption-35.json object_caption-36.json object_caption-37.json object_caption-38.json object_caption-39.json object_caption-40.json ob

100%|██████████| 99/99 [01:04<00:00,  1.53it/s]

Serializing...

Saved Caption file... captions.json
A successful merge!





Here we load the dataset for training the object detection model with RPE.

The dataset should be organized as follows:

```
Data-Generated --> The folder that contains all the data
├── annotations --> The subfolder containing annotations
│   ├── train_annotations.json --> The train images annotations
│   └── val_annotations.json --> The val images annotations
├── train/ <images> --> The train set images
└── val/ <images> --> The val set images
```
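
Before launching training, it can help to confirm that the folder matches this layout. The following is a minimal sketch, not part of the DETR code base. The helper `missing_dataset_entries` and its `require_val` parameter are hypothetical names introduced here for illustration.

```python
from pathlib import Path


def missing_dataset_entries(root, require_val=True):
    """List the expected files and folders that are absent under `root`."""
    root = Path(root)
    # Entries needed for the training split.
    expected = [root / "annotations" / "train_annotations.json", root / "train"]
    if require_val:
        # Entries needed for the validation split.
        expected += [root / "annotations" / "val_annotations.json", root / "val"]
    # Report paths relative to the dataset root, with forward slashes.
    return [path.relative_to(root).as_posix() for path in expected if not path.exists()]
```

For example, `missing_dataset_entries("Data-Generated")` returns an empty list when every expected entry is present.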

### Training

The following cell starts training the Vision Transformer object detection model, `DETR`.

In [None]:
# %mkdir output

In [None]:
ls

[0m[01;34mCaption-Processing[0m/   [01;34mData-Generated[0m/  [01;34mPOS-Tagger-Comparison[0m/
[01;34mCaption-Processing1[0m/  [01;34mLAION[0m/           [01;34mResults[0m/
[01;34mData[0m/                 [01;34moutput[0m/          [01;34mViT-RPE[0m/


In [None]:
ls ViT-RPE/iRPE/DETR-with-iRPE/

[0m[01;34mdatasets[0m/  LICENSE  [01;34mmodels[0m/       README.md         [01;34mrpe_ops[0m/              [01;34mutil[0m/
engine.py  main.py  [01;34m__pycache__[0m/  requirements.txt  run_with_submitit.py


The `max_class_id` should be set to the largest category `id` in the `categories` of the annotation file, plus 1.
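
As a small worked example of this rule, the value can be derived from the loaded annotation JSON. This is a sketch with a hypothetical helper name, `max_class_id_argument`, assuming the file has a non-empty `categories` list whose entries carry an integer `id`.

```python
def max_class_id_argument(coco_dict):
    """Return (largest category id + 1) from a COCO-style dict."""
    return max(category["id"] for category in coco_dict["categories"]) + 1
```

With the maximum category `id` of 1135 found earlier, this gives 1136, the value passed to `--max_class_id` in the command below.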

In [None]:
!python ViT-RPE/iRPE/DETR-with-iRPE/main.py --lr_drop 40 --epochs 50 --coco_path ./Data-Generated --max_class_id 1136 --num_queries 200 --enc_rpe2d rpe-2.0-product-ctx-1-k --output_dir ./output

Not using distributed mode
git:
  sha: f61ab8b8fec3c3ef1cd4e010028f962108e611db, status: has uncommited changes, branch: main

Namespace(lr=0.0001, lr_backbone=1e-05, batch_size=2, weight_decay=0.0001, epochs=50, lr_drop=40, clip_max_norm=0.1, frozen_weights=None, backbone='resnet50', dilation=False, position_embedding='sine', enc_layers=6, dec_layers=6, dim_feedforward=2048, hidden_dim=256, dropout=0.1, nheads=8, num_queries=100, pre_norm=False, enc_rpe2d='rpe-2.0-product-ctx-1-k', masks=False, aux_loss=True, set_cost_class=1, set_cost_bbox=5, set_cost_giou=2, mask_loss_coef=1, dice_loss_coef=1, bbox_loss_coef=5, giou_loss_coef=2, eos_coef=0.1, dataset_file='coco', coco_path='./Data-Generated', max_class_id=1136, val_present=False, coco_panoptic_path=None, remove_difficult=False, output_dir='./output', device='cuda', seed=42, resume='', start_epoch=0, eval=False, num_workers=4, world_size=1, dist_url='env://', distributed=False)
The number of buckets on rpe_k in encoder: 81
number of 