### This notebook requires a GPU runtime to run.
### Please select the menu option "Runtime" -> "Change runtime type", select "Hardware Accelerator" -> "GPU" and click "SAVE"

----------------------------------------------------------------------

# SSD

*Author: NVIDIA*

**Single Shot MultiBox Detector model for object detection**

_ | _
- | -
![alt](https://pytorch.org/assets/images/ssd_diagram.png) | ![alt](https://pytorch.org/assets/images/ssd.png)



### Model Description

This SSD300 model is based on the
[SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) paper, which
describes SSD as “a method for detecting objects in images using a single deep neural network”.
The input size is fixed to 300x300.

The main difference between this model and the one described in the paper is in the backbone.
Specifically, the VGG model is obsolete and is replaced by the ResNet-50 model.

From the
[Speed/accuracy trade-offs for modern convolutional object detectors](https://arxiv.org/abs/1611.10012)
paper, the following enhancements were made to the backbone:
*   The conv5_x, avgpool, fc and softmax layers were removed from the original classification model.
*   All strides in conv4_x are set to 1x1.

The backbone is followed by 5 additional convolutional layers.
In addition to the convolutional layers, we attached 6 detection heads:
*   The first detection head is attached to the last conv4_x layer.
*   The other five detection heads are attached to the corresponding 5 additional layers.

The detection heads are similar to the ones referenced in the paper; however,
they are enhanced by additional BatchNorm layers after each convolution.
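
As a rough check on the six detection heads described above, the sketch below counts the default boxes they would produce. The feature map sizes and the number of boxes per location are assumptions taken from the SSD300 configuration in the original paper; they are not read from this model. Under those assumptions the total matches the 8732 boxes per image reported for the raw network output later in this notebook.

```python
# Assumed SSD300 configuration from the original SSD paper (not read from the model).
feature_map_sizes = [38, 19, 10, 5, 3, 1]   # one square feature map per detection head
boxes_per_location = [4, 6, 6, 6, 4, 4]     # default boxes at each feature map cell


def count_default_boxes(sizes, boxes_per_cell):
    """Sum size * size * boxes over all detection heads."""
    return sum(s * s * b for s, b in zip(sizes, boxes_per_cell))


total_boxes = count_default_boxes(feature_map_sizes, boxes_per_location)
print(total_boxes)  # 8732
```

The first head alone contributes 38 * 38 * 4 = 5776 boxes, which is why most candidate boxes cover small objects.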

### Example

In the example below we will use the pretrained SSD model to detect objects in sample images and visualize the result.

To run the example, you need some extra Python packages installed. These are needed for preprocessing images and visualization.

In [1]:
%%bash
pip install numpy scipy scikit-image matplotlib

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Load an SSD model pretrained on the COCO dataset, as well as a set of utility methods for conveniently formatting the model's input and output.

In [2]:
import torch
ssd_model = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd')
utils = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub', 'nvidia_ssd_processing_utils')

Downloading: "https://github.com/NVIDIA/DeepLearningExamples/archive/torchhub.zip" to /root/.cache/torch/hub/torchhub.zip
  "pytorch_quantization module not found, quantization will not be available"
  "pytorch_quantization module not found, quantization will not be available"
Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /root/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth


  0%|          | 0.00/97.8M [00:00<?, ?B/s]

Downloading checkpoint from https://api.ngc.nvidia.com/v2/models/nvidia/ssd_pyt_ckpt_amp/versions/20.06.0/files/nvidia_ssdpyt_amp_200703.pt
Using cache found in /root/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


In [54]:
!wget https://api.ngc.nvidia.com/v2/models/nvidia/ssd_pyt_ckpt_amp/versions/20.06.0/files/nvidia_ssdpyt_amp_200703.pt

--2022-06-21 14:40:12--  https://api.ngc.nvidia.com/v2/models/nvidia/ssd_pyt_ckpt_amp/versions/20.06.0/files/nvidia_ssdpyt_amp_200703.pt
Resolving api.ngc.nvidia.com (api.ngc.nvidia.com)... 34.208.191.90, 44.241.224.68
Connecting to api.ngc.nvidia.com (api.ngc.nvidia.com)|34.208.191.90|:443... connected.
HTTP request sent, awaiting response... 302 
Location: https://prod-model-registry-ngc-bucket.s3.us-west-2.amazonaws.com/org/nvidia/models/ssd_pyt_ckpt_amp/versions/20.06.0/files/nvidia_ssdpyt_amp_200703.pt?response-content-disposition=attachment%3B%20filename%3D%22nvidia_ssdpyt_amp_200703.pt%22&response-content-type=application%2Foctet-stream&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEC4aCXVzLXdlc3QtMiJHMEUCIGk8sLVyN6YOKnIG3uPrmUNSIBhAg2kfrswObnYwWAy4AiEAlVUB9rzGoHD%2B5crRX8BZXIzFl9fCdULZcsdc1NcZH1wq0gQIRxAEGgw3ODkzNjMxMzUwMjciDNW7XMifiz1M5Uvv%2FCqvBIEHLjQgkd99pjaCrrAwCVptvoCLzHy5zzxe0%2FHz1eiBrv0FTOLtnRX8zSBxGiYXuoNtjQ1FUlf6jgUrOD76jK%2BZA78D98mMpBxorDFLs3XyePWXFC0oCcPKYewqo%2FgK5C%2BuJ8f

In [47]:
cd /root/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub

/root/.cache/torch/hub/NVIDIA_DeepLearningExamples_torchhub


In [50]:
ll

total 40
drwxr-xr-x 5 root 4096 Jun 21 13:49 Classification/
drwxr-xr-x 5 root 4096 Jun 21 13:49 Detection/
drwxr-xr-x 3 root 4096 Jun 21 13:49 DrugDiscovery/
drwxr-xr-x 3 root 4096 Jun 21 13:49 Forecasting/
drwxr-xr-x 5 root 4096 Jun 21 13:49 LanguageModeling/
drwxr-xr-x 4 root 4096 Jun 21 13:49 Recommendation/
drwxr-xr-x 5 root 4096 Jun 21 13:49 Segmentation/
drwxr-xr-x 5 root 4096 Jun 21 13:49 SpeechRecognition/
drwxr-xr-x 6 root 4096 Jun 21 13:49 SpeechSynthesis/
drwxr-xr-x 5 root 4096 Jun 21 13:49 Translation/


In [5]:
!git clone https://github.com/NVIDIA/DeepLearningExamples

Cloning into 'DeepLearningExamples'...
remote: Enumerating objects: 30207, done.
remote: Counting objects: 100% (1168/1168), done.
remote: Compressing objects: 100% (733/733), done.
remote: Total 30207 (delta 396), reused 1082 (delta 377), pack-reused 29039
Receiving objects: 100% (30207/30207), 93.66 MiB | 31.43 MiB/s, done.
Resolving deltas: 100% (21488/21488), done.
Checking out files: 100% (4737/4737), done.


In [8]:
pip install git+https://github.com/NVIDIA/dllogger.git

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/NVIDIA/dllogger.git
  Cloning https://github.com/NVIDIA/dllogger.git to /tmp/pip-req-build-hgyq4yim
  Running command git clone -q https://github.com/NVIDIA/dllogger.git /tmp/pip-req-build-hgyq4yim
Building wheels for collected packages: DLLogger
  Building wheel for DLLogger (setup.py) ... done
  Created wheel for DLLogger: filename=DLLogger-1.0.0-py3-none-any.whl size=5670 sha256=649bb26d3ff7aebf4e63f3f11fbe3eecba13c85cf3287d7b3fda1748bfc74310
  Stored in directory: /tmp/pip-ephem-wheel-cache-sirqokfo/wheels/db/ba/1b/87515aba93adffc7caccc21c0e93f80b70a857188790ce0436
Successfully built DLLogger
Installing collected packages: DLLogger
Successfully installed DLLogger-1.0.0


In [18]:
!git clone https://github.com/NVIDIA/apex


fatal: destination path 'apex' already exists and is not an empty directory.


In [19]:
cd apex

/content/DeepLearningExamples/PyTorch/Detection/SSD/apex


In [20]:
!python setup.py install



torch.__version__  = 1.11.0+cu113


running install
running bdist_egg
running egg_info
creating apex.egg-info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
writing manifest file 'apex.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'apex.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib
creating build/lib/apex
copying apex/_autocast_utils.py -> build/lib/apex
copying apex/__init__.py -> build/lib/apex
creating build/lib/apex/reparameterization
copying apex/reparameterization/__init__.py -> build/lib/apex/reparameterization
copying apex/reparameterization/weight_norm.py -> build/lib/apex/reparameterization
copying apex/reparameterization/reparameterization.py -> build/lib/apex/reparameterization
creating build/lib/apex/pyprof
copying apex/pyprof/__init__.py -> 

In [57]:
cd /content/DeepLearningExamples/PyTorch/Detection/SSD/

/content/DeepLearningExamples/PyTorch/Detection/SSD


In [58]:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda110

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/, https://developer.download.nvidia.com/compute/redist


In [63]:
!python ./main.py --backbone resnet50 --checkpoint /content/DeepLearningExamples/nvidia_ssdpyt_amp_200703.pt --data /content/DeepLearningExamples/train/

Streaming output truncated to the last 5000 lines.
  images_torch_type = to_torch_type[np.dtype(images[0].dtype())]
  bboxes_torch_type = to_torch_type[np.dtype(bboxes[0][0].dtype())]
  labels_torch_type = to_torch_type[np.dtype(labels[0][0].dtype())]
  images_torch_type = to_torch_type[np.dtype(images[0].dtype())]
  bboxes_torch_type = to_torch_type[np.dtype(bboxes[0][0].dtype())]
  labels_torch_type = to_torch_type[np.dtype(labels[0][0].dtype())]
  images_torch_type = to_torch_type[np.dtype(images[0].dtype())]
  bboxes_torch_type = to_torch_type[np.dtype(bboxes[0][0].dtype())]
  labels_torch_type = to_torch_type[np.dtype(labels[0][0].dtype())]
DLL 2022-06-21 16:33:04.121331 - (64, 17660) loss : 1.4777299165725708 
  images_torch_type = to_torch_type[np.dtype(images[0].dtype())]
  bboxes_torch_type = to_torch_type[np.dtype(bboxes[0][0].dtype())]
  labels_torch_type = to_torch_type[np.dtype(labels[0][0].dtype())]
  images_torch_type = to_torch_type[np.dtype(images[0].dtyp

Now prepare the loaded model for inference.

In [None]:
ssd_model.to('cuda')
ssd_model.eval()

Prepare input images for object detection.
(The example links below point to a few images from the COCO val2017 set, but you can also specify paths to your local images here.)

In [None]:
uris = [
    'http://images.cocodataset.org/val2017/000000397133.jpg',
    'http://images.cocodataset.org/val2017/000000037777.jpg',
    'http://images.cocodataset.org/val2017/000000252219.jpg'
]

Format the images to comply with the network input and convert them to a tensor.

In [None]:
inputs = [utils.prepare_input(uri) for uri in uris]
tensor = utils.prepare_tensor(inputs)
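
The visualization cell later in this notebook restores images with `image / 2 + 0.5`, which suggests that the prepared inputs are scaled to roughly the range [-1, 1]. The sketch below illustrates that round trip on single pixel values. The `normalize_pixel` helper is hypothetical and only an assumption about the preprocessing; it is not the code inside `utils.prepare_input`.

```python
def normalize_pixel(value):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1] (assumed preprocessing)."""
    return value / 127.5 - 1.0


def denormalize(value):
    """Map a value in [-1, 1] back to [0, 1], as the visualization cell does."""
    return value / 2 + 0.5


for pixel in (0, 64, 255):
    restored = denormalize(normalize_pixel(pixel))
    # restored is the pixel value rescaled to [0, 1], suitable for imshow
    print(pixel, round(restored, 4))
```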

Run the SSD network to perform object detection.

In [None]:
with torch.no_grad():
    detections_batch = ssd_model(tensor)

By default, the raw output from the SSD network contains 8732 boxes per input image,
each with a localization and a class probability distribution.
Let's filter this output to keep only reasonable detections (confidence > 40%) in a more readable format.

In [None]:
results_per_input = utils.decode_results(detections_batch)
best_results_per_input = [utils.pick_best(results, 0.40) for results in results_per_input]
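
To make the thresholding step concrete, here is a minimal sketch of confidence filtering on plain Python lists. The `keep_confident` function and the sample detections are made up for illustration; this is not the implementation of `utils.pick_best`, only the general idea of dropping detections below a cutoff.

```python
def keep_confident(bboxes, classes, confidences, threshold=0.40):
    """Return only the detections whose confidence exceeds the threshold."""
    kept = [
        (box, cls, conf)
        for box, cls, conf in zip(bboxes, classes, confidences)
        if conf > threshold
    ]
    if not kept:
        return [], [], []
    boxes_out, classes_out, confs_out = (list(column) for column in zip(*kept))
    return boxes_out, classes_out, confs_out


# Made-up detections for illustration only.
sample_boxes = [(0.1, 0.1, 0.5, 0.5), (0.2, 0.3, 0.4, 0.6), (0.0, 0.0, 1.0, 1.0)]
sample_classes = [1, 18, 62]
sample_confidences = [0.92, 0.15, 0.41]

boxes, classes, confidences = keep_confident(
    sample_boxes, sample_classes, sample_confidences
)
print(classes, confidences)  # [1, 62] [0.92, 0.41]
```

Raising the threshold trades recall for precision: fewer boxes survive, but the ones that do are more likely to be correct.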

The model was trained on the COCO dataset, which we need to access in order to translate class IDs into object names.
The first time you run this, downloading the annotations may take a while.

In [None]:
classes_to_labels = utils.get_coco_object_dictionary()
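
The visualization cell indexes the labels with `classes[idx] - 1`, which suggests that class IDs start at 1 while the label collection is 0-based. The sketch below shows that lookup with bounds checking. The three-entry `example_labels` list and the `label_for` helper are hypothetical stand-ins, not the output of `utils.get_coco_object_dictionary()`.

```python
# Hypothetical, shortened label list used only for illustration.
example_labels = ["person", "bicycle", "car"]


def label_for(class_id, labels):
    """Translate a 1-based class ID into its name."""
    if not 1 <= class_id <= len(labels):
        raise ValueError(f"class ID {class_id} is outside 1..{len(labels)}")
    return labels[class_id - 1]


print(label_for(1, example_labels))  # person
```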

Finally, let's visualize our detections.

In [None]:
from matplotlib import pyplot as plt
import matplotlib.patches as patches

for image_idx in range(len(best_results_per_input)):
    fig, ax = plt.subplots(1)
    # Show original, denormalized image...
    image = inputs[image_idx] / 2 + 0.5
    ax.imshow(image)
    # ...with detections
    bboxes, classes, confidences = best_results_per_input[image_idx]
    for idx in range(len(bboxes)):
        left, bot, right, top = bboxes[idx]
        x, y, w, h = [val * 300 for val in [left, bot, right - left, top - bot]]
        rect = patches.Rectangle((x, y), w, h, linewidth=1, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        ax.text(x, y, "{} {:.0f}%".format(classes_to_labels[classes[idx] - 1], confidences[idx]*100), bbox=dict(facecolor='white', alpha=0.5))
plt.show()
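
The coordinate arithmetic in the loop above can be isolated into a small helper. This is a sketch that assumes boxes are given as normalized `(x_min, y_min, x_max, y_max)` values in [0, 1] and that the image is 300 pixels on each side; `to_pixel_rect` is a hypothetical name, not part of the utilities loaded from Torch Hub.

```python
def to_pixel_rect(box, image_size=300):
    """Convert a normalized (x_min, y_min, x_max, y_max) box
    to pixel (x, y, width, height)."""
    x_min, y_min, x_max, y_max = box
    return (
        x_min * image_size,
        y_min * image_size,
        (x_max - x_min) * image_size,
        (y_max - y_min) * image_size,
    )


x, y, w, h = to_pixel_rect((0.25, 0.5, 0.75, 1.0))
print(x, y, w, h)  # 75.0 150.0 150.0 150.0
```

The resulting tuple has the anchor-plus-size layout that `patches.Rectangle((x, y), w, h, ...)` expects.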

### Details
For detailed information on model input and output,
training recipes, inference, and performance, visit
[GitHub](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)
and/or [NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch).

### References

 - [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) paper
 - [Speed/accuracy trade-offs for modern convolutional object detectors](https://arxiv.org/abs/1611.10012) paper
 - [SSD on NGC](https://ngc.nvidia.com/catalog/resources/nvidia:ssd_for_pytorch)
 - [SSD on GitHub](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)