### This notebook requires a GPU runtime to run.
### Please select the menu option "Runtime" -> "Change runtime type", select "Hardware Accelerator" -> "GPU" and click "SAVE"

----------------------------------------------------------------------

# SSD

*Author: NVIDIA*

**Single Shot MultiBox Detector model for object detection**

_ | _
- | -
![alt](https://pytorch.org/assets/images/ssd_diagram.png) | ![alt](https://pytorch.org/assets/images/ssd.png)



### Model Description

This SSD300 model is based on the
[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) paper, which
describes SSD as “a method for detecting objects in images using a single deep neural network”.
The input size is fixed to 300x300.

The main difference between this model and the one described in the paper is in the backbone.
Specifically, the VGG model is obsolete and is replaced by the ResNet-50 model.

From the
[Speed/accuracy trade-offs for modern convolutional object detectors](https://arxiv.org/abs/1611.10012)
paper, the following enhancements were made to the backbone:
*   The conv5_x, avgpool, fc and softmax layers were removed from the original classification model.
*   All strides in conv4_x are set to 1x1.

The backbone is followed by 5 additional convolutional layers.
In addition to the convolutional layers, we attached 6 detection heads:
*   The first detection head is attached to the last conv4_x layer.
*   The other five detection heads are attached to the corresponding 5 additional layers.

The detection heads are similar to the ones described in the paper; however,
they are enhanced by additional BatchNorm layers after each convolution.
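
The six detection heads described above also explain the size of the raw network output. The sketch below is only an illustration: the feature-map sizes and the number of default boxes per location are assumptions taken from the SSD300 configuration in the SSD paper, not values stated in this notebook.

```python
# Feature-map side lengths for the six detection heads and the number of
# default boxes per location. Both lists are assumptions based on the
# SSD300 configuration in the SSD paper.
FEATURE_MAP_SIZES = [38, 19, 10, 5, 3, 1]
BOXES_PER_LOCATION = [4, 6, 6, 6, 4, 4]


def count_default_boxes(sizes, boxes_per_location):
    """Return the number of default boxes per head and their total."""
    per_head = [size * size * k for size, k in zip(sizes, boxes_per_location)]
    return per_head, sum(per_head)


per_head, total = count_default_boxes(FEATURE_MAP_SIZES, BOXES_PER_LOCATION)
for head_idx, n_boxes in enumerate(per_head, start=1):
    print(f"head {head_idx}: {n_boxes} boxes")
print("total:", total)  # total: 8732
```

Under these assumptions the total matches the 8732 boxes per image that the network returns before filtering.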

### Example

In the example below we will use the pretrained SSD model to detect objects in sample images and visualize the result.

To run the example you need some extra Python packages installed. These are needed for preprocessing images and visualization.

In [None]:
%%bash
pip install numpy scipy scikit-image matplotlib

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Load an SSD model pretrained on the COCO dataset, as well as a set of utility methods for conveniently formatting the input and output of the model.

In [None]:
import torch
ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd')
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd_processing_utils')

Using cache found in /root/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub
  "pytorch_quantization module not found, quantization will not be available"
  "pytorch_quantization module not found, quantization will not be available"
Using cache found in /root/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


Now, prepare the loaded model for inference.

In [None]:
ssd_model.to('cuda')
ssd_model.eval()

SSD300(
  (feature_extractor): ResNet(
    (feature_extractor): Sequential(
      (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (4): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (relu): ReLU(inplac

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)

Mounted at /content/gdrive/


In [None]:
from os import listdir
from os.path import join
from matplotlib import image

image_dir = 'gdrive/MyDrive/resize'
uris = []  # paths of the images passed to the model
fan = []   # matching 'samples/CAM_FRONT/<filename>' names used when exporting results
for filename in listdir(image_dir):
    path = join(image_dir, filename)
    # Load the image only to report its shape
    img_data = image.imread(path)
    uris.append(path)
    fan.append('samples/CAM_FRONT/' + filename)
    print('> loaded %s %s' % (filename, img_data.shape))

> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151604512404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151605012404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151603512404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151604012404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151606012404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151611862404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151609512404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151608512404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151609912404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151605512404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__1533151608012404.jpg (300, 300, 3)
> loaded n008-2018-08-01-15-16-36-0400__CAM_FRONT__153

Prepare input images for object detection.
(The example links below point to images from the COCO val2017 set and are commented out here, because `uris` already holds the local image paths collected above. You can also specify paths to your own local images.)

In [None]:
# uris = [
#     'http://images.cocodataset.org/val2017/000000397133.jpg',
#     'http://images.cocodataset.org/val2017/000000037777.jpg',
#     'http://images.cocodataset.org/val2017/000000252219.jpg'
# ]
#uris=loaded_images
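
When local images are used, `os.listdir` returns names in arbitrary order, so it can help to collect the paths in a reproducible order. The following is a minimal sketch under that assumption; `list_images` and the file names in the demonstration are hypothetical and not part of the notebook.

```python
import os
import tempfile
from pathlib import Path


def list_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return the image paths in `folder`, sorted by file name."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.jpg", "a.jpg", "notes.txt"]:
        Path(tmp, name).touch()
    found = [os.path.basename(p) for p in list_images(tmp)]
    print(found)  # ['a.jpg', 'b.jpg']
```

In the notebook, a call such as `uris = list_images('gdrive/MyDrive/resize')` would play the same role as the listing loop, with non-image files skipped.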

Format the images to comply with the network input and convert them to tensor.

In [None]:
inputs = [utils.prepare_input(uri) for uri in uris]
tensor = utils.prepare_tensor(inputs)

Run the SSD network to perform object detection.

In [None]:
with torch.no_grad():
    detections_batch = ssd_model(tensor)

By default, the raw output of the SSD network contains, for each input image,
8732 boxes with localization and class probability distribution.
Let's filter this output to keep only reasonable detections (confidence > 40%) in a more convenient format.

In [None]:
results_per_input = utils.decode_results(detections_batch)
best_results_per_input = [utils.pick_best(results, 0.40) for results in results_per_input]
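
The thresholding step can be pictured with a small, self-contained sketch. This is not the implementation of `utils.pick_best`; it only illustrates the idea of keeping detections above a confidence threshold. The function name and the sample detections are hypothetical.

```python
def pick_above_threshold(bboxes, classes, confidences, threshold):
    """Keep only detections whose confidence is strictly above `threshold`."""
    keep = [i for i, conf in enumerate(confidences) if conf > threshold]
    return (
        [bboxes[i] for i in keep],
        [classes[i] for i in keep],
        [confidences[i] for i in keep],
    )


# Hypothetical decoded detections for one image:
# normalized [left, bot, right, top] boxes, class IDs, confidences.
sample_bboxes = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.6, 0.7], [0.0, 0.0, 1.0, 1.0]]
sample_classes = [1, 3, 18]
sample_confidences = [0.78, 0.12, 0.41]

kept_bboxes, kept_classes, kept_confidences = pick_above_threshold(
    sample_bboxes, sample_classes, sample_confidences, 0.40
)
print(kept_classes)       # [1, 18]
print(kept_confidences)   # [0.78, 0.41]
```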

The model was trained on the COCO dataset, which we need to access in order to translate class IDs into object names.
The first time you run this, downloading the annotations may take a while.

In [None]:
classes_to_labels = utils.get_coco_object_dictionary()

In [None]:
best_results_per_input[0]

[array([[0.45423692, 0.5422543 , 0.51764435, 0.6245825 ],
        [0.888757  , 0.5337701 , 0.91178346, 0.648736  ],
        [0.8524687 , 0.53735894, 0.8794756 , 0.6496492 ],
        [0.2688211 , 0.54748785, 0.29050454, 0.65290594]], dtype=float32),
 array([3, 1, 1, 1]),
 array([0.6689399 , 0.7345909 , 0.77332634, 0.7820357 ], dtype=float32)]
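
Each result is a list of three arrays: normalized boxes, class IDs, and confidences. The sketch below shows how the class IDs from the sample output above could be turned into names. It assumes that the label list starts with the first three COCO object names (person, bicycle, car) and that class IDs are 1-based, as the `classes_to_labels[... - 1]` lookups later in this notebook imply.

```python
# First three COCO object names only (an assumption for this sketch); in the
# notebook the full list comes from utils.get_coco_object_dictionary().
labels = ["person", "bicycle", "car"]

# Class IDs and confidences copied from the sample output above.
class_ids = [3, 1, 1, 1]
confidences = [0.6689399, 0.7345909, 0.77332634, 0.7820357]

# Class IDs are 1-based while the label list is 0-based, hence the "- 1".
named = [
    (labels[class_id - 1], round(conf * 100))
    for class_id, conf in zip(class_ids, confidences)
]
print(named)  # [('car', 67), ('person', 73), ('person', 77), ('person', 78)]
```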

Finally, let's export our detections to a spreadsheet. The visualization code is kept, commented out, at the end of this section.

In [None]:
# textfile = list()
# for image_idx in range(len(best_results_per_input)):
#   #fan[0]
#   tr=''
#   k=0
#   for i in fan[image_idx]:
#     if(k==0):
#       tr=tr+i;
#     else:
#       tr=tr+'txt'
#       break
#     if(i=='.'):
#       k=1;
#   textfile.append(tr)

In [None]:
#textfile[9]

In [None]:
pip install xlsxwriter

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import xlsxwriter

In [None]:
workbook = xlsxwriter.Workbook('predictions.xlsx')

In [None]:
worksheet = workbook.add_worksheet()

In [None]:
len(best_results_per_input)

404

In [None]:
count=0

In [None]:
for image_idx in range(len(best_results_per_input)):
    bboxes1, classes1, confidences1 = best_results_per_input[image_idx]
    for idx in range(len(bboxes1)):
        count = count + 1  # next spreadsheet row (rows are 1-based)
        left1, bot1, right1, top1 = bboxes1[idx]
        # Box center in pixels, scaled as for a 1600x900 frame:
        # (left + right) / 2 * 1600 and (top + bot) / 2 * 900
        x1 = round((right1 + left1) * 800, 2)
        y1 = round((top1 + bot1) * 450, 2)
        # Box width and height in pixels
        w1 = round((right1 - left1) * 1600, 2)
        h1 = round((top1 - bot1) * 900, 2)
        # Class IDs are 1-based, the label list is 0-based
        l1 = classes_to_labels[classes1[idx] - 1]
        c1 = round(confidences1[idx] * 100, 2)  # confidence in percent
        worksheet.write('A' + str(count), '[' + str(x1) + ',' + str(y1) + ',' + str(w1) + ',' + str(h1) + ']')
        worksheet.write('B' + str(count), str(c1))
        worksheet.write('C' + str(count), l1)
        worksheet.write('D' + str(count), str(fan[image_idx]))

In [None]:
workbook.close()
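
The coordinate arithmetic used when writing the spreadsheet can be isolated into a small helper. This is a sketch under the assumption that the original frames are 1600x900 pixels, which is what the constants 800, 450, 1600 and 900 in the export loop suggest; `to_center_box` is a hypothetical name.

```python
def to_center_box(bbox, image_width=1600, image_height=900):
    """Convert a normalized [left, bot, right, top] box to pixel
    [center_x, center_y, width, height], rounded to two decimals."""
    left, bot, right, top = bbox
    center_x = (left + right) / 2 * image_width
    center_y = (bot + top) / 2 * image_height
    width = (right - left) * image_width
    height = (top - bot) * image_height
    return [round(value, 2) for value in (center_x, center_y, width, height)]


converted = to_center_box([0.25, 0.5, 0.75, 1.0])
print(converted)  # [800.0, 675.0, 800.0, 450.0]
```

Note that `(left + right) / 2 * 1600` is the same value as `(right + left) * 800` in the loop, so the helper reproduces the exported numbers while making the frame size an explicit parameter.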

In [None]:
# for image_idx in range(len(best_results_per_input)):
#   f=open('gdrive/MyDrive/text/'+textfile[image_idx], mode="w")
#   bboxes1, classes1, confidences1 = best_results_per_input[image_idx]
#   for idx in range(len(bboxes1)):
#     left1, bot1, right1, top1 = bboxes1[idx]
#     x1, y1, w1, h1 = [val * 300 for val in [left1, bot1, right1 - left1, top1 - bot1]]
#     l1=classes_to_labels[classes1[idx] - 1]
#     c1=confidences1[idx]*100
#     f.write(l1)
#     f.write(" ")
#     f.write(str(c1))
#     f.write(" ")
#     f.write(str(x1))
#     f.write(" ")
#     f.write(str(y1))
#     f.write(" ")
#     f.write(str(w1))
#     f.write(" ")
#     f.write(str(h1))
#     f.write('\n')
#   f.close()


In [None]:
#f.write('Create a new text file!')

In [None]:
# from matplotlib import pyplot as plt
# import matplotlib.patches as patches

# for image_idx in range(len(best_results_per_input)):
#     fig, ax = plt.subplots(1)
#     # Show original, denormalized image...
#     image = inputs[image_idx] / 2 + 0.5
#     ax.imshow(image)
#     # ...with detections
#     bboxes, classes, confidences = best_results_per_input[image_idx]
#     for idx in range(len(bboxes)):
#         left, bot, right, top = bboxes[idx]
#         x, y, w, h = [val * 300 for val in [left, bot, right - left, top - bot]]
#         rect = patches.Rectangle((x, y), w, h, linewidth=1, edgecolor='r', facecolor='none')
#         ax.add_patch(rect)
#         ax.text(x, y, "{} {:.0f}%".format(classes_to_labels[classes[idx] - 1], confidences[idx]*100), bbox=dict(facecolor='white', alpha=0.5))
#     plt.savefig('gdrive/MyDrive/result/{0}'.format(fan[image_idx]))
# #plt.show()


### Details
For detailed information on model input and output,
training recipes, inference and performance, visit
[GitHub](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch).

### References

 - [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) paper
 - [Speed/accuracy trade-offs for modern convolutional object detectors](https://arxiv.org/abs/1611.10012) paper
 - [SSD on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch)
 - [SSD on GitHub](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)