Dataset: Cars Overhead with Context (COWC) - https://gdo152.llnl.gov/cowc/

Upload the dataset to Google Drive and mount the drive in Google Colab for easy access. Uploading the dataset folders directly to Google Colab takes a long time, and the files are erased when the session terminates.
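
Mounting the drive can be sketched as below. This is a minimal sketch, assuming the notebook runs in a Colab runtime, where the google.colab package provides drive.mount. The helper name mount_drive and the fallback for non-Colab environments are illustrative additions, not part of the original notebook.

```python
def mount_drive(mount_point='/content/drive'):
    """Mount Google Drive when running inside Colab; return True if mounted."""
    try:
        # google.colab only exists inside a Colab runtime
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mount_point)
    return True

mounted = mount_drive()
```

With the default mount point, the Drive contents appear under /content/drive, which matches the path used by the unzip command in the next cell.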

In [0]:
!unzip /content/drive/"My Drive"/Detection_Patches.zip

Import all the necessary libraries and packages.

In [0]:
from __future__ import print_function, division
import os
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models
import torch.nn.functional as F
import pandas as pd
from skimage import io
import numpy as np
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

There are many image sets in the COWC dataset. We will work with the DetectionPatches set, which contains 256 x 256 image patches from 6 cities. In every city folder, we take the base name (like an ID) of each annotated image and append its path, without the file extension, to the list.

In [0]:
path = './DetectionPatches_256x256/'

cities = os.listdir(path)

files = []

for city in cities:
  all_files = os.listdir(path + city + '/')
  for file in all_files:
    if '.txt' in file:
      files.append(path + city + '/' + file.replace('.txt',''))

print(len(files))

29549
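
The same collection step can be written a little more defensively. The sketch below is an illustrative variant, not the notebook's own code: the function name collect_stems is hypothetical, and it assumes the layout described above (one sub-folder per city, with a .txt annotation next to each .jpg). It matches the extension exactly instead of searching for '.txt' anywhere in the file name, and it skips stray files that sit next to the city folders.

```python
import os

def collect_stems(root):
    """Return paths (without extension) of every .txt annotation under root/<city>/."""
    stems = []
    for city in sorted(os.listdir(root)):
        city_dir = os.path.join(root, city)
        if not os.path.isdir(city_dir):
            continue  # skip stray files next to the city folders
        for name in sorted(os.listdir(city_dir)):
            stem, ext = os.path.splitext(name)
            if ext == '.txt':
                stems.append(os.path.join(city_dir, stem))
    return stems
```

Calling collect_stems('./DetectionPatches_256x256/') would then play the role of the files list built above.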


In [0]:
images = []
targets = []

transform = transforms.Compose([transforms.ToTensor()])

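Append the image tensors to the images list. targets is a list of dictionaries that hold the bounding boxes and the labels for every image. Each box is a fixed-size square centred on an annotated car position, and every box gets label 3, the COCO category id for "car".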

In [0]:
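for j in range(len(files)):
	# Load the image patch as a float tensor with values in [0, 1]
	f_path = files[j] + '.jpg'
	img = io.imread(f_path)
	img_tensor = transform(img)
	images.append(img_tensor.float())
	height, width = img.shape[:2]

	# Columns 1 and 2 of each annotation row hold the car centre,
	# assumed normalized to [0, 1] (as the clamping below implies)
	f_path = files[j] + '.txt'
	df = pd.read_csv(f_path, sep = " ", header=None)
	target = {}
	boxes = []
	labels = []
	for i in range(len(df)):
		labels.append(3)  # COCO category id for "car"
		# Fixed-size box around the centre, clamped to the patch
		x0 = max(df.iloc[i,1] - 0.0625, 0)
		y0 = max(df.iloc[i,2] - 0.0625, 0)
		x1 = min(df.iloc[i,1] + 0.0625, 1)
		y1 = min(df.iloc[i,2] + 0.0625, 1)
		# torchvision detection models expect absolute pixel coordinates
		boxes.append([x0*width, y0*height, x1*width, y1*height])
	target['boxes'] = torch.tensor(boxes, dtype=torch.float32).reshape(-1, 4)
	target['labels'] = torch.tensor(labels)

	targets.append(target)

	if (j%4000 == 0):
		print(j)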

0
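
torchvision's detection models document ground-truth boxes as [x1, y1, x2, y2] in absolute pixel coordinates, so normalized centres need to be scaled by the image size. The conversion for a single annotation can be sketched as below. The function name center_to_box and its default arguments are illustrative; the half-size of 0.0625 and the 256 x 256 patch size come from the notebook, and the centre is assumed to be normalized to [0, 1].

```python
def center_to_box(cx, cy, half=0.0625, width=256, height=256):
    """Turn a normalized centre into a clamped [x0, y0, x1, y1] box in pixels."""
    # Clamp the normalized corners so the box stays inside the patch
    x0 = max(cx - half, 0.0)
    y0 = max(cy - half, 0.0)
    x1 = min(cx + half, 1.0)
    y1 = min(cy + half, 1.0)
    # Scale to absolute pixel coordinates
    return [x0 * width, y0 * height, x1 * width, y1 * height]

print(center_to_box(0.5, 0.5))  # [112.0, 112.0, 144.0, 144.0]
print(center_to_box(0.0, 1.0))  # [0.0, 240.0, 16.0, 256.0]
```

A car in the middle of the patch gets a 32 x 32 pixel box, while a car at a corner gets a box cut off at the patch border.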


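Option 1: create a Faster R-CNN model without loading pre-trained detection weights. Because no pre-trained detection head is loaded, we can set the number of classes as required for our task. The pre-trained COCO model has 91 classes. Note that torchvision still initializes the ResNet-50 backbone from ImageNet weights by default; pass pretrained_backbone=False for fully random weights. Run only one of the two model cells, since the next cell replaces this model.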

In [0]:
model = models.detection.fasterrcnn_resnet50_fpn(pretrained=False, num_classes=5)

Option 2: fine-tune a pre-trained Faster R-CNN model. In this case, we should not alter the number of output classes, so the model keeps the 91 COCO classes and cars use label 3.

In [0]:
model = models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

model = model.float()
model.cuda()

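Train the model with the images and the targets. The loss used here is the sum of the classification loss and the bounding-box regression loss. In training mode the model also returns the two RPN losses (loss_objectness and loss_rpn_box_reg), which are not included in this sum. The model weights are saved after every epoch.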

In [0]:
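optimizer = optim.Adam(model.parameters(), lr=0.001)

model.train()
running_loss = 0.0
batch_size = 4

for epoch in range(3):
	for i in range(7000):
		# Move the current batch to the GPU as float tensors, matching the model
		start = batch_size * i
		batch_images = [img.float().cuda() for img in images[start:start+batch_size]]
		batch_targets = [{'boxes': t['boxes'].float().cuda(), 'labels': t['labels'].cuda()}
			for t in targets[start:start+batch_size]]

		optimizer.zero_grad()
		# In training mode the model returns a dictionary of losses
		output = model(batch_images, batch_targets)
		loss = output['loss_classifier'] + output['loss_box_reg']
		loss.backward()
		optimizer.step()
		running_loss += loss.item()

		if (i+1) % 1000 == 0:
			# Average over the 1000 iterations since the last report
			print('[%d, %5d] loss: %.3f' %(epoch + 1, i + 1, running_loss / 1000))
			running_loss = 0.0

	model_path = './model' + '_' + str(epoch+1) + '.pth'
	torch.save(model.state_dict(), model_path)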
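
In training mode, torchvision's Faster R-CNN returns a dictionary with four entries: loss_classifier, loss_box_reg, loss_objectness and loss_rpn_box_reg. The sketch below shows one way to sum either a chosen subset or all of them. The helper name total_loss is hypothetical, and the numbers are stand-in values rather than results from this notebook; in practice the dictionary values would be the loss tensors returned by the model.

```python
def total_loss(loss_dict, keys=None):
    """Sum the selected entries of a loss dictionary (all entries if keys is None)."""
    if keys is None:
        keys = list(loss_dict)
    return sum(loss_dict[k] for k in keys)

# Stand-in values for illustration only
example = {
    'loss_classifier': 0.5,
    'loss_box_reg': 0.25,
    'loss_objectness': 0.125,
    'loss_rpn_box_reg': 0.125,
}

head_only = total_loss(example, ['loss_classifier', 'loss_box_reg'])
all_terms = total_loss(example)
print(head_only, all_terms)  # 0.75 1.0
```

Summing only the two detection-head terms reproduces the loss used in the training cell. Summing all four also trains the region proposal network, which may matter when the target objects differ from those seen in pre-training.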

Load one of the saved models and predict the bounding boxes and their labels. Here two images from the loaded set are used as a quick check; a held-out test set would give a fairer evaluation.

In [0]:
model = models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.load_state_dict(torch.load('./model_3.pth'))
model = model.float().cuda()

model.eval()

# The inputs must have the same device and dtype as the model
predictions = model([img.float().cuda() for img in images[2:4]])



In evaluation mode the model returns one dictionary per input image, containing the predicted boxes, labels, and scores. The first element is the prediction for the first image.

In [0]:
print(predictions[0])

{'boxes': tensor([], device='cuda:0', size=(0, 4), grad_fn=<StackBackward>), 'labels': tensor([], device='cuda:0', dtype=torch.int64), 'scores': tensor([], device='cuda:0', grad_fn=<IndexBackward>)}
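
The empty tensors above mean that the model returned no detections for this image. When detections are returned, a common next step is to keep only those above a confidence threshold. The sketch below assumes the prediction has first been converted to plain Python lists; the function name filter_by_score, the threshold of 0.5 and the sample values are illustrative and not taken from the notebook.

```python
def filter_by_score(prediction, threshold=0.5):
    """Keep only the detections whose score is at least the threshold.

    prediction is a dict of equally long lists: 'boxes', 'labels', 'scores'.
    """
    keep = [i for i, score in enumerate(prediction['scores']) if score >= threshold]
    return {key: [values[i] for i in keep] for key, values in prediction.items()}

# Made-up prediction for illustration only
sample = {
    'boxes': [[10.0, 10.0, 42.0, 42.0], [100.0, 80.0, 132.0, 112.0]],
    'labels': [3, 3],
    'scores': [0.9, 0.2],
}
confident = filter_by_score(sample)
print(confident['boxes'])  # [[10.0, 10.0, 42.0, 42.0]]
```

A model output could be converted beforehand with {k: v.detach().cpu().tolist() for k, v in predictions[0].items()}.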
