### Building an Object Detection Framework

<p style="font-family: times, serif; font-size:14pt; font-style:bold"> To begin with, the snippet below shows how to use exisiting resnet50 model to perform object detection on custom images. Below is example of REST API calls to fetch images from my personal media library https://github.com/Saptarshi-SBU/APIserver</p>

In [None]:
import requests
import json

response = requests.get('http://10.2.59.13:4040/api/v1/listphotos')
data = response.content
print (data)

In [None]:
j_data = json.loads(data)
albums_uuid = []
for kv in j_data
    albums_uuid.append(kv['value']['uuid'])
print (albums_uuid)

In [52]:
payload = {'img' : albums_uuid[0]}
response = requests.get('http://10.2.59.13:4040/api/v1/scaledphoto', params=payload)
print(response)
print (response.headers)

{'Content-Type': 'image/jpg', 'Content-Length': '80698', 'Server': 'Werkzeug/0.16.0 Python/2.7.5', 'Date': 'Mon, 11 Jan 2021 02:07:15 GMT'}


### Torch Installation

<p style="font-family: times, serif; font-size:14pt; font-style:bold">Next, lets use conda to install pytorch on the system. I had a lot of trouble installing opencv with python3.7, there seems to be an issue with conda. Follow the following steps:</p>
<p style="font-family: times, serif; font-size:13pt; font-style:italic">
 <br> 1. conda create -n py36 python=3.6</br>
 <br> 2. conda activate py36 </br>
 <br> 3. conda install pytorch torchvision torchaudio cudatoolkit=10.2 opencv -c pytorch </br>
</p>

In [53]:
# import necessary libraries
from PIL import Image
import matplotlib.pyplot as plt
import torch
import torchvision.transforms as T
import torchvision
import torch
import numpy as np


# get the pretrained model from torchvision.models
# Note: pretrained=True will get the pretrained weights for the model.
# model.eval() to use the model for inference
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

# Class labels from official PyTorch documentation for the pretrained model
# Note that there are some N/A's 
# for complete list check https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/
# we will use the same list for this notebook
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]


def get_prediction(img_path, threshold):
  """
  get_prediction
    parameters:
      - img_path - path of the input image
      - threshold - threshold value for prediction score
    method:
      - Image is obtained from the image path
      - the image is converted to image tensor using PyTorch's Transforms
      - image is passed through the model to get the predictions
      - class, box coordinates are obtained, but only prediction score > threshold
        are chosen.
    
  """
  img = Image.open(img_path)
  transform = T.Compose([T.ToTensor()])
  img = transform(img)
  pred = model([img])
  pred_class = [COCO_INSTANCE_CATEGORY_NAMES[i] for i in list(pred[0]['labels'].numpy())]
  pred_boxes = [[(i[0], i[1]), (i[2], i[3])] for i in list(pred[0]['boxes'].detach().numpy())]
  pred_score = list(pred[0]['scores'].detach().numpy())
  pred_t = [pred_score.index(x) for x in pred_score if x>threshold][-1]
  pred_boxes = pred_boxes[:pred_t+1]
  pred_class = pred_class[:pred_t+1]
  return pred_boxes, pred_class
  

In [None]:
from PIL import Image, ImageFont, ImageDraw
from io import BytesIO

def object_detection_api(img_path, threshold=0.5):
  """
  object_detection_api
    parameters:
      - img_path - path of the input image
    method:
      - prediction is obtained from get_prediction method
      - for each prediction, bounding box is drawn and text is written 
        with opencv
      - the final image is displayed
  """
  boxes, pred_cls = get_prediction(img_path, threshold)
  source_img = Image.open(img_path).convert("RGB")
  draw = ImageDraw.Draw(source_img)
  out_file = "torch.jpg"
  for i in range(len(boxes)):
    draw.rectangle(boxes[i], fill=None, outline="yellow", width=5)
    draw.text(boxes[i][0], pred_cls[i], font=ImageFont.truetype("/usr/share/fonts/gnu-free/FreeMono.ttf", 32), fill="red", width=32)
  source_img.save(out_file) 
  im = Image.open(out_file)
  im.show()  
  print (boxes, pred_cls)
  return pred_cls

### Sample Inference and Labeling

<p style="font-family: times, serif; font-size:14pt; font-style:bold">
Now that we had our object detection api ready, we can run inference on a sample image
using the above object detection API</p>

In [None]:
payload = {'img' : albums_uuid[0]}
response = requests.get('http://10.2.59.13:4040/api/v1/scaledphoto', params=payload)
file_jpgdata = BytesIO(response.content)
object_detection_api(file_jpgdata)

### Concurrent Labelling using DASK

<p style="font-family: times, serif; font-size:14pt; font-style:bold">
Next Below is a sample to run object detection and perform labeling concurrently on a batch of images
python big data framework DASK. Dask is a work scheduler which acts by paritioning dataset and assign
workers to a partition. Since we have thousands of images, the below few lines of code helps us achieve
the required throughput</p>

In [81]:
import requests
import json
import dask
import dask.bag as db
from io import BytesIO

API_HOST = '10.2.59.13'
LISTPHOTOS_API = 'http://{}:4040/api/v1/listphotos'.format(API_HOST)
GETPHOTO_API = 'http://{}:4040/api/v1/scaledphoto'.format(API_HOST)
LABELPHOTO_API = 'http://{}:4040/api/v1/label'.format(API_HOST)

def fetch_all_images():
    albums_uuid = []
    response = requests.get(LISTPHOTOS_URL)
    data = response.content
    j_data = json.loads(data)
    for kv in j_data:
        albums_uuid.append(kv['value']['uuid'])
    return albums_uuid

def label_image(img_uuid):
    response = requests.get(GETPHOTO_URL, params={'img':img_uuid})
    jpgdata = BytesIO(response.content)
    pred_cls = object_detection_api(jpgdata)
    labels = ' '.join(pred_cls)
    response = requests.post(LABELPHOTO_URL, data={'img':img_uuid, 'labels':labels})
    response = requests.get(LABELPHOTO_URL, params={'img':img_uuid})
    print (response.content)

#fetch_all_images()
b = db.from_sequence(albums_uuid[0:16], npartitions=4).map(label_image)
r = b.compute()
    

[[(49.699875, 61.867325), (527.6324, 750.3205)]] ['person']
b'"person"'
[[(310.35916, 18.110456), (665.9717, 459.42868)], [(310.77026, 609.9912), (435.44623, 711.0056)], [(10.206599, 444.7967), (1008.0, 756.0)], [(177.51633, 490.07703), (314.01196, 572.94586)], [(557.04407, 589.51434), (678.3497, 686.18567)], [(588.5785, 521.7588), (689.3299, 600.59076)], [(142.30682, 506.6534), (251.05882, 545.9602)], [(433.95364, 623.8414), (554.6413, 723.4723)], [(506.91428, 490.52795), (559.3247, 527.59546)]] ['person', 'bowl', 'dining table', 'bowl', 'bowl', 'bowl', 'spoon', 'bowl', 'donut']
b'"person bowl dining table bowl bowl bowl spoon bowl donut"'
[[(164.95193, 247.10779), (583.1691, 999.88477)], [(0.0, 1.0117171), (86.66291, 268.19583)], [(619.72235, 686.3288), (728.5662, 811.3008)], [(168.11859, 3.3667922), (628.5089, 186.56113)], [(39.11149, 34.989086), (715.9636, 874.48486)]] ['person', 'person', 'remote', 'person', 'bed']
[[(15.77534, 152.84142), (731.1579, 871.7225)], [(0.0, 525.70044),

### Datasets, DataLoaders

<p style="font-family: times, serif; font-size:14pt; font-style:bold">
Next Lets focus on building a pytorch based dataset for our Component Counting project
In order to train a PyTorch neural network you must write code to read training data into memory,
convert the data to PyTorch tensors, and serve the data up in batches. </p>

<p style="font-family: times, serif; font-size:14pt; font-style:bold">
The below code snippet creates a PyTorch based DataSet Class for our PID Dataset.
Images are annotated using the labelImg too. Next they are converted to a DataFrame using the below listed recipes.
</p>

<p style="font-family: times, serif; font-size:14pt; font-style:italic">
<br>pascal_voc_xml_to_csv method : Prepare a DataFrame Object from the xml files</br>
<br>CustomImageDataset class : Prepare a PyTorch Dataset Object from the DataFrame</br>
<br>CustomImageDataLoader class : Prepare a PyTorch DataLoader which is fed to the Model for training</br>
</p>

In [9]:
import os
import pandas as pd
from glob import glob
import xml.etree.ElementTree as ET

def pascal_voc_xml_to_csv(data_dir):
    '''
      The recipe is used to convert images annotations
      in pascal voc format to the csv format.
    '''
    xml_list = []
    counter = 0
    for xml_file in glob(data_dir + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        filename = root.find('filename').text
        size = root.find('size')
        width = size.find('width').text
        height = size.find('height').text
        for member in root.findall('object'):
            box = member.find('bndbox')
            label = member.find('name').text
            row = (filename, width, height, label, int(float(box.find('xmin').text)), int(float(box.find('ymin').text)),
               int(float(box.find('xmax').text)), int(float(box.find('ymax').text)), counter)
            xml_list.append(row)
        counter += 1
        column_names = ['filename', 'width', 'height', 'label', 'xmin', 'ymin', 'xmax', 'ymax', 'image_id']
        xml_df = pd.DataFrame(xml_list, columns=column_names)
        print (xml_df)
    #xml_df.to_csv('xml2csv.csv')
    return xml_df

In [37]:
import numpy
import torch
import torchvision
import cv2
from torchvision import transforms
from PIL import Image

class CustomImageDataset(torch.utils.data.Dataset):
    '''
        Map Style Dataset
        https://pytorch.org/docs/stable/data.html#dataset-types
    '''
    
    def __init__(self, data_dir):
        self.data_dir = data_dir
        self.df = pascal_voc_xml_to_csv(data_dir) # dataframe
        self.transform = transforms.Compose([transforms.ToTensor(), \
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])   
        self.classes = []
        
    def __len__(self):
        return len(self.df['image_id'].unique().tolist())
    
    def __img2tensor(self, img):
        if self.transform:
            for t in self.transform.transforms:
                img = t(img)
        return img
    
    def __bboxes2tensor(self, bboxes):
        return torch.tensor(bboxes).view(-1, 4)
    
    def __labels2tensor(self, labels):
        for label in set(labels):
            if label not in self.classes:
                self.classes.append(label)
        label_intarray = [self.classes.index(label) for label in labels]
        return torch.tensor(label_intarray)
    
    def __getitem__(self, idx):
        '''
            https://pandas.pydata.org/docs/user_guide/dsintro.html
            This function implement's Torch Dataset __getitem__
        '''
        boxes = []
        labels = []
        targets = {}
        if torch.is_tensor(idx):
            idx = idx.tolist()
        object_entries = self.df.loc[self.df['image_id'] == idx]
        filename = object_entries.iloc[0, 0]
        for object_idx, row in object_entries.iterrows():
            box = object_entries.iloc[object_idx, 4:8]
            boxes.append(box)
            label = object_entries.iloc[object_idx, 3]
            labels.append(label)
        imgpath = '{}/{}'.format(self.data_dir, filename)
        img = cv2.imread(imgpath)
        #img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = self.__img2tensor(img)
        targets['boxes'] = self.__bboxes2tensor(boxes)
        targets['labels'] = self.__labels2tensor(labels)
        print (imgpath, type(img))
        return img, targets
      
class CustomImageDataLoader(torch.utils.data.DataLoader):
    
    def __init__(self, dataset, **kwargs):
        super().__init__(dataset, collate_fn=CustomImageDataLoader.collate_data, **kwargs)
        
    @staticmethod    
    def collate_data(batch):
        images, targets = zip(*batch)
        return list(images), list(targets)

# Example Usage
data_dir='data'
pascal_voc_xml_to_csv(data_dir)

dataset = CustomImageDataset(data_dir)
N = dataset.__len__()
print('image count :', N)
for i in range(N):
    img, targets = dataset.__getitem__(i)
    #print (targets['boxes'])
        
my_ldr = CustomImageDataLoader(dataset, batch_size=10, shuffle=True)
for (idx, batch) in enumerate(my_ldr):
    print (idx, batch)

                   filename width height       label  xmin  ymin  xmax  ymax  \
0   6987_61_1112_REV0_1.jpg  1123   1588  ball valve   540   267   563   284   
1   6987_61_1112_REV0_1.jpg  1123   1588  ball valve   472   301   493   314   
2   6987_61_1112_REV0_1.jpg  1123   1588  ball valve   410   418   428   436   
3   6987_61_1112_REV0_1.jpg  1123   1588  ball valve   375   247   400   268   
4   6987_61_1112_REV0_1.jpg  1123   1588  ball valve   831   401   846   422   
5   6987_61_1112_REV0_1.jpg  1123   1588  ball valve   814   445   832   462   
6   6987_61_1112_REV0_1.jpg  1123   1588  ball valve   837   444   858   463   
7   6987_61_1112_REV0_1.jpg  1123   1588  ball valve   829   488   846   507   
8   6987_61_1112_REV0_1.jpg  1123   1588  ball valve   329   315   344   332   
9   6987_61_1112_REV0_1.jpg  1123   1588  ball valve   330   413   345   431   
10  6987_61_1112_REV0_1.jpg  1123   1588  ball valve   326   300   344   312   
11  6987_61_1112_REV0_1.jpg  1123   1588

### Apply Our DataLoader to a Model for Training

<p style="font-family: times, serif; font-size:14pt; font-style:bold">
 Next, lets feed our CustomDataset to a pre-trained model, using our customDataLoader
</p>

In [43]:
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.train()
learning_rate=0.005
momentum=0.9
weight_decay=0.0005
gamma=0.1
lr_step_size=3
# Get parameters that have grad turned on (i.e. parameters that should be trained)
parameters = [p for p in model.parameters() if p.requires_grad]
# Create an optimizer that uses SGD (stochastic gradient descent) to train the parameters
optimizer = torch.optim.SGD(parameters, lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
# Create a learning rate scheduler that decreases learning rate by gamma every lr_step_size epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=lr_step_size, gamma=gamma)
loader = CustomImageDataLoader(dataset, batch_size=1, shuffle=True)
epochs = 10
losses = []
for epoch in range(epochs):
    for images, targets in loader:
        #print(images[0].shape, targets[0])
        loss_dict = model(images, targets)
        #print (loss_dict)
        total_loss = sum(loss for loss in loss_dict.values())
        print (total_loss)
        # Zero any old/existing gradients on the model's parameters
        optimizer.zero_grad()
        # Compute gradients for each parameter based on the current loss calculation
        total_loss.backward()
        # Update model parameters from gradients: param -= learning_rate * param.grad
        optimizer.step()
    #lr_scheduler.step()

data/6987_61_1112_REV0_1.jpg <class 'torch.Tensor'>
tensor(1.6344, grad_fn=<AddBackward0>)
data/6987_61_1112_REV0_1.jpg <class 'torch.Tensor'>
tensor(345.6148, grad_fn=<AddBackward0>)
data/6987_61_1112_REV0_1.jpg <class 'torch.Tensor'>
tensor(2.3462e+08, grad_fn=<AddBackward0>)
data/6987_61_1112_REV0_1.jpg <class 'torch.Tensor'>
tensor(nan, grad_fn=<AddBackward0>)
data/6987_61_1112_REV0_1.jpg <class 'torch.Tensor'>
tensor(nan, grad_fn=<AddBackward0>)
data/6987_61_1112_REV0_1.jpg <class 'torch.Tensor'>
tensor(nan, grad_fn=<AddBackward0>)
data/6987_61_1112_REV0_1.jpg <class 'torch.Tensor'>
tensor(nan, grad_fn=<AddBackward0>)
data/6987_61_1112_REV0_1.jpg <class 'torch.Tensor'>
tensor(nan, grad_fn=<AddBackward0>)
data/6987_61_1112_REV0_1.jpg <class 'torch.Tensor'>
tensor(nan, grad_fn=<AddBackward0>)
data/6987_61_1112_REV0_1.jpg <class 'torch.Tensor'>
tensor(nan, grad_fn=<AddBackward0>)
