# TensorFlow Object Detection API and AWS SageMaker

In this notebook, you will train and evaluate different models using the [TensorFlow Object Detection API](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/) and [AWS SageMaker](https://aws.amazon.com/sagemaker/).

If you ever feel stuck, you can refer to this [tutorial](https://aws.amazon.com/blogs/machine-learning/training-and-deploying-models-using-tensorflow-2-with-the-object-detection-api-on-amazon-sagemaker/).

## Dataset

We are using the [Waymo Open Dataset](https://waymo.com/open/) for this project. The dataset has already been exported to the TFRecord format, following the format described [here](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records). The data is stored on [AWS S3](https://aws.amazon.com/s3/), Amazon's object storage service. The images have a resolution of 640x640.
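The TFRecord container format is simple. Each record consists of four parts:

* an 8-byte little-endian length
* a masked CRC32C of that length
* the serialized payload (here a `tf.train.Example`)
* a masked CRC32C of the payload

The stdlib-only sketch below writes and reads that framing. It can be handy for checking that a downloaded shard is intact without installing TensorFlow. It does not decode the `tf.train.Example` protos, and the function names are illustrative. In the notebook itself you would normally read these files with `tf.data.TFRecordDataset`.

```python
import struct


def _make_crc32c_table():
    # Lookup table for CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path, payloads):
    """Write raw byte payloads using TFRecord framing."""
    with open(path, "wb") as f:
        for payload in payloads:
            header = struct.pack("<Q", len(payload))
            f.write(header)
            f.write(struct.pack("<I", masked_crc32c(header)))
            f.write(payload)
            f.write(struct.pack("<I", masked_crc32c(payload)))


def _read_exact(f, n):
    data = f.read(n)
    if len(data) != n:
        raise ValueError("truncated TFRecord file")
    return data


def iter_records(path):
    """Yield each record payload, checking both CRCs."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated TFRecord file")
            (length,) = struct.unpack("<Q", header)
            (length_crc,) = struct.unpack("<I", _read_exact(f, 4))
            if length_crc != masked_crc32c(header):
                raise ValueError("corrupt record length")
            payload = _read_exact(f, length)
            (payload_crc,) = struct.unpack("<I", _read_exact(f, 4))
            if payload_crc != masked_crc32c(payload):
                raise ValueError("corrupt record payload")
            yield payload
```

For example, `sum(1 for _ in iter_records("shard.tfrecord"))` counts the records in a local copy of a shard. The file name here is hypothetical.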

In [1]:
%%capture
%pip install tensorflow_io sagemaker -U

In [2]:
import os
import sagemaker
from sagemaker.estimator import Estimator
from framework import CustomFramework

Save the IAM role in a variable called `role`. It will be needed later to launch the training job.

In [3]:
role = sagemaker.get_execution_role()
print(role)

arn:aws:iam::789453636658:role/service-role/AmazonSageMaker-ExecutionRole-20230817T141326


In [4]:
# The train and val paths below point to a public S3 bucket created by Udacity for this project
inputs = {'train': 's3://cd2688-object-detection-tf2/train/',
          'val': 's3://cd2688-object-detection-tf2/val/'}

# Insert path of a folder in your personal S3 bucket to store tensorboard logs.
tensorboard_s3_prefix = 's3://object-detection-project-1/logs/'
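A typo in one of these S3 locations usually only surfaces after the training job has started. A quick syntax check can be run beforehand. The sketch below (the helper name `split_s3_uri` is hypothetical) uses only `urllib.parse` to split each URI into bucket and prefix. It does not check that the buckets exist or that the SageMaker role can access them.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Syntax-only check of the locations used in the cell above.
for name, uri in {
    "train": "s3://cd2688-object-detection-tf2/train/",
    "val": "s3://cd2688-object-detection-tf2/val/",
    "tensorboard": "s3://object-detection-project-1/logs/",
}.items():
    bucket, prefix = split_s3_uri(uri)
    print(f"{name}: bucket={bucket!r}, prefix={prefix!r}")
```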

## Container

To train the model, you first need to build a [docker](https://www.docker.com/) container with all the dependencies required by the TF Object Detection API. The code below does the following:
* clone the TensorFlow models repository
* copy the exporter and training scripts from the repository
* build the docker image and push it to ECR
* print the container name

In [5]:
%%bash

# clone the repo and get the scripts
git clone https://github.com/tensorflow/models.git docker/models

# get model_main and exporter_main files from TF2 Object Detection GitHub repository
cp docker/models/research/object_detection/exporter_main_v2.py source_dir 
cp docker/models/research/object_detection/model_main_tf2.py source_dir

fatal: destination path 'docker/models' already exists and is not an empty directory.
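The `fatal: destination path ... already exists` message above only means the repository was cloned during an earlier run. The `cp` commands in the same `%%bash` cell still execute. To make the cell cleanly re-runnable, you can skip the clone when a checkout is already present.

A minimal Python sketch of that idea is shown below, using `subprocess` and `shutil`. The helper name `fetch_od_scripts` is hypothetical. The shallow `--depth 1` clone is an optional choice to reduce download size; the original cell does a full clone.

```python
import os
import shutil
import subprocess

REPO_URL = "https://github.com/tensorflow/models.git"
SCRIPTS = ("exporter_main_v2.py", "model_main_tf2.py")


def fetch_od_scripts(clone_dir="docker/models", dest="source_dir"):
    """Clone the TF models repo once, then copy the two OD API scripts into dest."""
    if not os.path.isdir(os.path.join(clone_dir, ".git")):
        # Shallow clone: only the latest commit is needed to get the scripts.
        subprocess.run(["git", "clone", "--depth", "1", REPO_URL, clone_dir], check=True)
    src_dir = os.path.join(clone_dir, "research", "object_detection")
    os.makedirs(dest, exist_ok=True)
    copied = []
    for name in SCRIPTS:
        target = os.path.join(dest, name)
        shutil.copy2(os.path.join(src_dir, name), target)
        copied.append(target)
    return copied
```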


In [6]:
# Build and push the docker image. This cell can be commented out after it has been run once.
# This takes around 10 minutes.
image_name = 'tf2-object-detection'
!sh ./docker/build_and_push.sh $image_name

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Building image with name tf2-object-detection
Sending build context to Docker daemon  728.8MB
Step 1/17 : FROM tensorflow/tensorflow:2.9.0-gpu
 ---> c8d9ee2a0ff4
Step 2/17 : ARG DEBIAN_FRONTEND=noninteractive
 ---> Running in b0cb87fec27e
Removing intermediate container b0cb87fec27e
 ---> d18308495fcc
Step 3/17 : RUN rm /etc/apt/sources.list.d/cuda.list
 ---> Running in 3ccfb1dc4de2
Removing intermediate container 3ccfb1dc4de2
 ---> 26ebce743696
Step 4/17 : RUN apt-key del 7fa2af80
 ---> Running in 99ee93da62ce
OK
Removing intermediate container 99ee93da62ce
 ---> 428ae12b932f
Step 5/17 : RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
 ---> Running in 09e7eaa14ef4
Executing: /tmp/apt-key-gpghome.eqixb6qh7O/gpg.1.sh --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub


To verify that the image was correctly pushed to the [Elastic Container Registry](https://aws.amazon.com/ecr/) (ECR), you can look it up in the AWS Management Console. For example, the screenshot below shows three different images pushed to ECR; you should only see one, called `tf2-object-detection`.
![ECR Example](../data/example_ecr.png)


In [7]:
# display the container name (first line of the file, without the trailing newline)
with open(os.path.join('docker', 'ecr_image_fullname.txt'), 'r') as f:
    container = f.readline().strip()

print(container)

789453636658.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20230817200316
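The printed image URI packs several pieces of information: account ID, region, repository and tag. The tag appears to be a build timestamp. Splitting the URI makes it easy to check, for example, that the repository is `tf2-object-detection` and that the region matches the one used for training.

The sketch below uses a regular expression (the names `ECR_URI` and `parse_ecr_uri` are hypothetical). It assumes a tagged URI on a standard `amazonaws.com` endpoint. It does not handle digest references (`@sha256:...`) or other AWS partitions.

```python
import re

ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:@]+):(?P<tag>[\w.-]+)$"
)


def parse_ecr_uri(uri: str) -> dict:
    """Split '<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>' into its parts."""
    match = ECR_URI.match(uri.strip())
    if match is None:
        raise ValueError(f"unexpected ECR image URI: {uri!r}")
    return match.groupdict()
```

In the notebook, `parse_ecr_uri(container)["repository"]` should then equal `image_name`.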


## Pre-trained model from model zoo

As is common practice, we do not train from scratch; instead, we start from a pretrained model from the TF Object Detection model zoo. You can find pretrained checkpoints [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). Because your time for this project is limited, we recommend experimenting only with the following models:
* SSD MobileNet V2 FPNLite 640x640
* SSD ResNet50 V1 FPN 640x640 (RetinaNet50)
* Faster R-CNN ResNet50 V1 640x640
* EfficientDet D1 640x640
* Faster R-CNN ResNet152 V1 640x640
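Switching architectures mostly means changing the archive name in the download URL and in the `tar` member path. The mapping below is a sketch built on one assumption: the other four models follow the same `tf2/20200711/<name>.tar.gz` pattern as the EfficientDet D1 URL in the download cell. The archive names are taken from the TF2 Detection Model Zoo page and should be double-checked there. `ZOO_ARCHIVES` and `checkpoint_download` are hypothetical names.

```python
ZOO_BASE_URL = "http://download.tensorflow.org/models/object_detection/tf2/20200711"

# Archive names as listed in the TF2 Detection Model Zoo (verify against the zoo page).
ZOO_ARCHIVES = {
    "SSD MobileNet V2 FPNLite 640x640": "ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8",
    "SSD ResNet50 V1 FPN 640x640 (RetinaNet50)": "ssd_resnet50_v1_fpn_640x640_coco17_tpu-8",
    "Faster R-CNN ResNet50 V1 640x640": "faster_rcnn_resnet50_v1_640x640_coco17_tpu-8",
    "EfficientDet D1 640x640": "efficientdet_d1_coco17_tpu-32",
    "Faster R-CNN ResNet152 V1 640x640": "faster_rcnn_resnet152_v1_640x640_coco17_tpu-8",
}


def checkpoint_download(model_name: str):
    """Return (archive URL, path of the checkpoint folder inside the archive)."""
    archive = ZOO_ARCHIVES[model_name]
    return f"{ZOO_BASE_URL}/{archive}.tar.gz", f"{archive}/checkpoint"
```

Remember that each architecture also needs its own `pipeline.config`.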

In the code below, the EfficientDet D1 model is downloaded and extracted. Adjust this code if you experiment with other architectures.

In [8]:
%%bash
mkdir -p /tmp/checkpoint
mkdir -p source_dir/checkpoint
wget -O /tmp/efficientdet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
tar -zxvf /tmp/efficientdet.tar.gz --strip-components 2 --directory source_dir/checkpoint efficientdet_d1_coco17_tpu-32/checkpoint

--2023-08-17 20:16:51--  http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 142.251.167.128, 2607:f8b0:4004:c1b::80
Connecting to download.tensorflow.org (download.tensorflow.org)|142.251.167.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 51839363 (49M) [application/x-tar]
Saving to: ‘/tmp/efficientdet.tar.gz’

efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0.data-00000-of-00001
efficientdet_d1_coco17_tpu-32/checkpoint/checkpoint
efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0.index
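The `--strip-components 2` flag drops the leading `efficientdet_d1_coco17_tpu-32/checkpoint/` from each member name. As a result, the three files listed above land directly in `source_dir/checkpoint`.

The same extraction can be sketched in Python with the standard `tarfile` module. The helper name `extract_checkpoint` is hypothetical. The sketch only extracts regular files under the given prefix, and skips any entry whose path contains `..`.

```python
import os
import tarfile


def extract_checkpoint(archive_path, member_prefix, dest_dir, strip_components=2):
    """Extract files under member_prefix, dropping the first strip_components
    path parts (like `tar --strip-components`). Returns the extracted names."""
    os.makedirs(dest_dir, exist_ok=True)
    prefix = member_prefix.rstrip("/") + "/"
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.startswith(prefix):
                continue
            parts = member.name.split("/")[strip_components:]
            if not parts or ".." in parts:
                continue  # skip entries that would escape dest_dir
            target = os.path.join(dest_dir, *parts)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            extracted.append(os.path.relpath(target, dest_dir))
    return sorted(extracted)
```

Usage mirroring the cell above: `extract_checkpoint("/tmp/efficientdet.tar.gz", "efficientdet_d1_coco17_tpu-32/checkpoint", "source_dir/checkpoint")`.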


## Edit pipeline.config file

The [`pipeline.config`](source_dir/pipeline.config) file in the `source_dir` folder must be updated when you experiment with different models. The config files for the other architectures are available [here](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2).

>Note: The provided `pipeline.config` file works well with the `EfficientDet` model. You would need to modify it when working with other models.
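When switching configs, the fields that typically need attention include `num_classes`, `batch_size`, `fine_tune_checkpoint` and `fine_tune_checkpoint_type`. The OD API's `object_detection.utils.config_util` module edits configs through the protobuf schema. A lighter, stdlib-only alternative is a line-based regex replacement, sketched below.

The helper name `set_config_value` is hypothetical. It refuses to write if the field count is not what you expect, so a config with a different layout is not edited silently. The checkpoint path `checkpoint/ckpt-0` in the usage example is illustrative only; the correct value depends on how `run_training.sh` lays out files in the container.

```python
import re


def set_config_value(config_text: str, field: str, value, expected_count: int = 1) -> str:
    """Replace the value of every `field: ...` line in a pipeline.config text proto.

    Strings are written quoted, booleans as true/false, numbers as-is.
    Raises ValueError if the number of matching lines differs from expected_count.
    """
    if isinstance(value, bool):
        rendered = "true" if value else "false"
    elif isinstance(value, str):
        rendered = f'"{value}"'
    else:
        rendered = str(value)
    # Match the exact field name at the start of a line (after indentation),
    # keep the "field: " prefix and replace the rest of the line.
    pattern = re.compile(rf"^([ \t]*{re.escape(field)}:[ \t]*)\S.*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + rendered, config_text)
    if count != expected_count:
        raise ValueError(f"expected {expected_count} '{field}' line(s), found {count}")
    return new_text


# Hypothetical usage on the notebook's config file:
# text = open("source_dir/pipeline.config").read()
# text = set_config_value(text, "fine_tune_checkpoint", "checkpoint/ckpt-0")
```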

## Launch Training Job

Now that we have a dataset, a docker image and pretrained model weights, we can launch the training job. To do so, we create a [SageMaker Framework](https://sagemaker.readthedocs.io/en/stable/frameworks/index.html) estimator, in which we specify the container image, the name of the config file, the number of training steps, etc.
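The hyperparameters given to the estimator are delivered to `run_training.sh` inside the container. As far as we know, the SageMaker training toolkit passes them in two ways:

* as `--name value` command-line arguments to the entry point
* as `SM_HP_<NAME>` environment variables

The sketch below approximates that mapping for the values used in this notebook, to show what the script receives. It is an illustration, not the toolkit's actual code. `to_cli_args` and `to_sm_hp_env` are hypothetical names.

```python
def to_cli_args(hyperparameters: dict) -> list:
    """Approximate the `--name value` argument list the entry point receives."""
    args = []
    for name, value in hyperparameters.items():
        args.extend([f"--{name}", str(value)])
    return args


def to_sm_hp_env(hyperparameters: dict) -> dict:
    """Approximate the SM_HP_<NAME> environment variables set inside the container."""
    return {f"SM_HP_{name.upper()}": str(value) for name, value in hyperparameters.items()}


hyperparameters = {
    "model_dir": "/opt/training",
    "pipeline_config_path": "pipeline.config",
    "num_train_steps": "2000",
    "sample_1_of_n_eval_examples": "1",
}
print(" ".join(["./run_training.sh"] + to_cli_args(hyperparameters)))
```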

The `run_training.sh` script does the following:
* train the model for `num_train_steps` steps
* evaluate the model on the validation dataset
* export the model
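`run_training.sh` itself is not shown in this notebook. Assuming it drives the two scripts copied into `source_dir` earlier, the three steps could look roughly like the commands built below. All flags used are ones accepted by `model_main_tf2.py` and `exporter_main_v2.py`. The real script's invocation may differ, and `export_dir` is a hypothetical parameter.

In evaluation mode, `model_main_tf2.py` waits for new checkpoints in `--checkpoint_dir`, up to its `--eval_timeout`.

```python
import shlex


def training_commands(model_dir, pipeline_config_path, num_train_steps,
                      sample_1_of_n_eval_examples, export_dir):
    """Build (not run) hypothetical train / evaluate / export command lines."""
    common = [f"--pipeline_config_path={pipeline_config_path}", f"--model_dir={model_dir}"]
    train = ["python", "model_main_tf2.py", *common,
             f"--num_train_steps={num_train_steps}",
             f"--sample_1_of_n_eval_examples={sample_1_of_n_eval_examples}",
             "--alsologtostderr"]
    # Passing --checkpoint_dir makes model_main_tf2.py evaluate instead of train.
    evaluate = ["python", "model_main_tf2.py", *common,
                f"--checkpoint_dir={model_dir}",
                f"--sample_1_of_n_eval_examples={sample_1_of_n_eval_examples}",
                "--alsologtostderr"]
    export = ["python", "exporter_main_v2.py",
              "--input_type=image_tensor",
              f"--pipeline_config_path={pipeline_config_path}",
              f"--trained_checkpoint_dir={model_dir}",
              f"--output_directory={export_dir}"]
    return [train, evaluate, export]


for cmd in training_commands("/opt/training", "pipeline.config", 2000, 1, "/opt/ml/model"):
    print(shlex.join(cmd))
```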

Several metrics are displayed during the evaluation phase, including the mean average precision (mAP). Use them to quantify your model's performance and to compare the different iterations.
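Mean average precision builds on intersection over union (IoU) between predicted and ground-truth boxes. The sketch below computes IoU and a COCO-style, 101-point interpolated average precision (AP) for one class in one image at a single IoU threshold. The COCO mAP typically reported during evaluation averages this quantity over classes and over IoU thresholds from 0.50 to 0.95, with detections pooled across all images.

This is a simplified illustration: it has no crowd regions, area ranges or max-detection cap. It is not the evaluation code the API runs. Boxes use the `(ymin, xmin, ymax, xmax)` order of the OD API.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    ymin, xmin = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ymax, xmax = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """Simplified COCO-style 101-point interpolated AP for one class.

    detections: list of (score, box); ground_truth: list of boxes.
    Returns None when there is no ground truth (AP is undefined).
    """
    if not ground_truth:
        return None
    matched = [False] * len(ground_truth)
    is_tp = []
    # Highest score first: match each detection to the best still-unmatched box.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth):
            if matched[idx]:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            is_tp.append(True)
        else:
            is_tp.append(False)
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(is_tp, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truth))
    # Replace each precision by the best precision at equal or higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    total = 0.0
    for k in range(101):
        r = k / 100
        # Precision at the first rank reaching recall r (0 if never reached).
        total += next((p for p, rec in zip(precisions, recalls) if rec >= r), 0.0)
    return total / 101
```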

You can also monitor the training progress by navigating to **Training -> Training Jobs** from the Amazon SageMaker dashboard in the Web UI.

In [10]:
tensorboard_output_config = sagemaker.debugger.TensorBoardOutputConfig(
    s3_output_path=tensorboard_s3_prefix,
    container_local_output_path='/opt/training/'
)

estimator = CustomFramework(
    role=role,
    image_uri=container,
    entry_point='run_training.sh',
    source_dir='source_dir/',
    hyperparameters={
        "model_dir":"/opt/training",        
        "pipeline_config_path": "pipeline.config",
        "num_train_steps": "2000",    
        "sample_1_of_n_eval_examples": "1"
    },
    instance_count=1,
    instance_type='ml.m5.2xlarge',
    tensorboard_output_config=tensorboard_output_config,
    disable_profiler=True,
    base_job_name='tf2-object-detection'
)

estimator.fit(inputs)

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


Using provided s3_resource


INFO:sagemaker:Creating training-job with name: tf2-object-detection-2023-08-17-20-30-31-039


2023-08-17 20:30:33 Starting - Starting the training job...
2023-08-17 20:30:50 Starting - Preparing the instances for training......
2023-08-17 20:31:59 Downloading - Downloading input data...
2023-08-17 20:32:29 Training - Downloading the training image.........
2023-08-17 20:34:00 Training - Training image download completed. Training in progress....
2023-08-17 20:34:29,078 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-08-17 20:34:29,081 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-08-17 20:34:29,094 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-08-17 20:34:29,096 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-08-17 20:34:29,110 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2023-08-17 20:34:29,112 sagemaker-training-toolk

[34mInstructions for updating:[0m
[34mCreate a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.[0m
[34mW0817 20:34:43.572549 139674134546240 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.[0m
[34mInstructions for updating:[0m
[34mCreate a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.[0m
[34mInstructions for updating:[0m
[34mUse `tf.cast` instead.[0m
[34mW0817 20:34:46.877819 139674134546240 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.[0m
[34mInstructions for updating:[0m
[34mUse `tf.cast` instead.[0m
[34mI0817 20:34:54.379162 139669564020480 api.py:460] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (

INFO:tensorflow:Step 300 per-step time 7.014s
I0817 21:11:33.905983 139674134546240 model_lib_v2.py:705] Step 300 per-step time 7.014s
INFO:tensorflow:{'Loss/classification_loss': 0.28107458,
 'Loss/localization_loss': 0.025791163,
 'Loss/regularization_loss': 0.029547118,
 'Loss/total_loss': 0.33641288,
 'learning_rate': 0.010480001}
I0817 21:11:33.906237 139674134546240 model_lib_v2.py:708] {'Loss/classification_loss': 0.28107458,
 'Loss/localization_loss': 0.025791163,
 'Loss/regularization_loss': 0.029547118,
 'Loss/total_loss': 0.33641288,
 'learning_rate': 0.010480001}
INFO:tensorflow:Step 400 per-step time 6.998s
I0817 21:23:13.689984 139674134546240 model_lib_v2.py:705] Step 400 per-step time 6.998s
INFO:tensorflow:{'Loss/classification_loss': 0.30356172,
 'Loss/localization_loss': 0.016631898,
 'Loss/regularization_loss': 0.029553022,
 'Loss/total_loss': 0.34974664,
 'learning_rate': 0.0136400005}
I0817 21:23:

INFO:tensorflow:Step 1700 per-step time 7.015s
I0817 23:54:46.943836 139674134546240 model_lib_v2.py:705] Step 1700 per-step time 7.015s
INFO:tensorflow:{'Loss/classification_loss': 0.2497263,
 'Loss/localization_loss': 0.014288208,
 'Loss/regularization_loss': 0.030242968,
 'Loss/total_loss': 0.2942575,
 'learning_rate': 0.05472}
I0817 23:54:46.944103 139674134546240 model_lib_v2.py:708] {'Loss/classification_loss': 0.2497263,
 'Loss/localization_loss': 0.014288208,
 'Loss/regularization_loss': 0.030242968,
 'Loss/total_loss': 0.2942575,
 'learning_rate': 0.05472}
INFO:tensorflow:Step 1800 per-step time 7.010s
I0818 00:06:27.899603 139674134546240 model_lib_v2.py:705] Step 1800 per-step time 7.010s
INFO:tensorflow:{'Loss/classification_loss': 0.35540766,
 'Loss/localization_loss': 0.022979518,
 'Loss/regularization_loss': 0.03034816,
 'Loss/total_loss': 0.40873533,
 'learning_rate': 0.05788}
I0818 00:06:27.899876 1396

I0818 00:30:16.495590 140436034864960 api.py:460] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
I0818 00:30:29.766751 140436034864960 api.py:460] feature_map_spatial_dims: [(80, 80), (40, 40), (20, 20), (10, 10), (5, 5)]
Instructions for updating:
Use `tf.cast` instead.
W0818 00:30:35.397368 140436034864960 deprecation.py:364] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Finished eval step 0
I0818 00:30:35.414900 140436034864960 model_lib_v2.py:966] Finished eval step 0
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    ten

I0818 00:30:40.528520 140226765850432 efficientnet_model.py:143] round_filter input=40 output=40
I0818 00:30:40.528690 140226765850432 efficientnet_model.py:143] round_filter input=80 output=80
I0818 00:30:40.879019 140226765850432 efficientnet_model.py:143] round_filter input=80 output=80
I0818 00:30:40.879152 140226765850432 efficientnet_model.py:143] round_filter input=112 output=112
I0818 00:30:41.222218 140226765850432 efficientnet_model.py:143] round_filter input=112 output=112
I0818 00:30:41.222360 140226765850432 efficientnet_model.py:143] round_filter input=192 output=192
I0818 00:30:41.656105 140226765850432 efficientnet_model.py:143] round_filter input=192 output=192
I0818 00:30:41.656241 140226765850432 efficientnet_model.py:143] round_filter input=320 output=320
I0818 00:30:41.839905 140226765850432 efficientnet_model.py:143] round_filter input=1280 output=1280
I0818 00:30:41.886730 14022

You should be able to see your training job in the AWS console, as shown below:
![Training jobs example](../data/example_trainings.png)
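To compare iterations, you can extract the step times and losses from the job log. The sketch below uses a hypothetical helper that assumes the `model_lib_v2.py` line format seen in the output above: a `Step N per-step time Xs` line followed by a loss dictionary. Each message appears twice in the log, once from `INFO:tensorflow` and once from the `I0817 ...` logger, so the parser keeps one value per step. At roughly 7 s per step on the CPU-only `ml.m5.2xlarge` instance, 2000 steps take close to 4 hours. That agrees with the timestamps above, where training starts around 20:34 and evaluation runs at 00:30.

```python
import re
from statistics import mean

STEP_RE = re.compile(r"Step (\d+) per-step time ([\d.]+)s")
TOTAL_LOSS_RE = re.compile(r"'Loss/total_loss': ([\d.]+)")


def parse_training_log(text):
    """Return ({step: seconds_per_step}, {step: total_loss}) from a job log."""
    step_times, total_losses = {}, {}
    current_step = None
    for line in text.splitlines():
        m = STEP_RE.search(line)
        if m:
            current_step = int(m.group(1))
            step_times[current_step] = float(m.group(2))
            continue
        m = TOTAL_LOSS_RE.search(line)
        if m and current_step is not None:
            # The loss dict printed after a step line belongs to that step;
            # the duplicated copy simply overwrites it with the same value.
            total_losses[current_step] = float(m.group(1))
    return step_times, total_losses


def estimate_training_hours(step_times, num_train_steps):
    """Extrapolate the total training time from the mean per-step time."""
    if not step_times:
        raise ValueError("no step times found in the log")
    return mean(list(step_times.values())) * num_train_steps / 3600
```

With the step 300 and 400 entries from the log above, `estimate_training_hours(step_times, 2000)` gives about 3.9 hours.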
