# Tensorflow Object Detection API and AWS Sagemaker

In this notebook, you will train and evaluate different models using the [Tensorflow Object Detection API](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/) and [AWS Sagemaker](https://aws.amazon.com/sagemaker/). 

If you ever feel stuck, you can refer to this [tutorial](https://aws.amazon.com/blogs/machine-learning/training-and-deploying-models-using-tensorflow-2-with-the-object-detection-api-on-amazon-sagemaker/).

## Dataset

We are using the [Waymo Open Dataset](https://waymo.com/open/) for this project. The dataset has already been exported to the tfrecords format, following the format described [here](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records). The data is stored on [AWS S3](https://aws.amazon.com/s3/), the AWS object storage service. The images have a resolution of 640x640.
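Because the records follow the TF Object Detection API format, bounding-box coordinates are stored normalized to the [0, 1] range under the `image/object/bbox/xmin`, `xmax`, `ymin` and `ymax` feature keys rather than in pixels. The following minimal sketch shows that conversion for 640x640 images in plain Python; the helper name `normalize_boxes` and its (xmin, ymin, xmax, ymax) pixel input layout are illustrative assumptions, not part of the project code.

```python
# Hypothetical helper: converts pixel boxes into the normalized per-coordinate
# lists used by TF Object Detection API tfrecords (image/object/bbox/*).
def normalize_boxes(pixel_boxes, width=640, height=640):
    """Normalize (xmin, ymin, xmax, ymax) pixel boxes by the image size."""
    features = {
        "image/object/bbox/xmin": [],
        "image/object/bbox/xmax": [],
        "image/object/bbox/ymin": [],
        "image/object/bbox/ymax": [],
    }
    for xmin, ymin, xmax, ymax in pixel_boxes:
        # Reject degenerate boxes and boxes that fall outside the image.
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            raise ValueError(
                f"box {(xmin, ymin, xmax, ymax)} is invalid for a {width}x{height} image"
            )
        features["image/object/bbox/xmin"].append(xmin / width)
        features["image/object/bbox/xmax"].append(xmax / width)
        features["image/object/bbox/ymin"].append(ymin / height)
        features["image/object/bbox/ymax"].append(ymax / height)
    return features
```

For example, `normalize_boxes([(64, 320, 320, 640)])` yields `xmin=[0.1]`, `ymin=[0.5]`, `xmax=[0.5]` and `ymax=[1.0]`.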

In [1]:
%%capture
%pip install tensorflow_io sagemaker -U

In [2]:
import os
import sagemaker
from sagemaker.estimator import Estimator
from framework import CustomFramework

Save the IAM role in a variable called `role`. It is needed later, when launching the training job.

In [3]:
role = sagemaker.get_execution_role()
print(role)

arn:aws:iam::406551161089:role/service-role/AmazonSageMaker-ExecutionRole-20230623T150985
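The printed value is an IAM role ARN, which follows the generic `arn:partition:service:region:account-id:resource` layout (the region field is empty for IAM). A small parser such as the hypothetical `parse_role_arn` below makes it easy to confirm that the role belongs to the same AWS account as the ECR image used later; it is a sketch for illustration, not a SageMaker SDK function.

```python
def parse_role_arn(arn):
    """Split an IAM role ARN into (account_id, role_name).

    Hypothetical helper. Layout: arn:partition:iam::account-id:role/[path/]name
    """
    parts = arn.strip().split(":", 5)
    if (
        len(parts) != 6
        or parts[0] != "arn"
        or parts[2] != "iam"
        or not parts[5].startswith("role/")
    ):
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    account_id = parts[4]
    # The role name is the last path segment (e.g. after "role/service-role/").
    role_name = parts[5].rsplit("/", 1)[-1]
    return account_id, role_name
```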


In [4]:
# The train and val paths below are public S3 buckets created by Udacity for this project
inputs = {'train': 's3://cd2688-object-detection-tf2/train/', 
        'val': 's3://cd2688-object-detection-tf2/val/'} 

# Insert path of a folder in your personal S3 bucket to store tensorboard logs.
tensorboard_s3_prefix = 's3://object-detection-project-subodh/logs/'
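A typo in one of these URIs usually only surfaces once the training job starts and fails to read its data, so a quick local check can save a round trip. A minimal sketch with the standard library; `split_s3_uri` is a hypothetical helper, not part of the SageMaker SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Return (bucket, key_prefix) for an s3://bucket/prefix URI."""
    parsed = urlparse(uri)
    # urlparse puts the bucket in netloc and the key prefix in path.
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```

In the notebook you could loop over `inputs.values()` and `tensorboard_s3_prefix` and call `split_s3_uri` on each before launching the job.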

## Container

To train the model, you first need to build a [docker](https://www.docker.com/) container with all the dependencies required by the TF Object Detection API. The code below does the following:
* clone the Tensorflow models repository
* copy the exporter and training scripts from the repository into `source_dir`
* build the docker image and push it to the Elastic Container Registry (ECR)
* print the container name

In [5]:
%%bash

# clone the repo and get the scripts
git clone https://github.com/tensorflow/models.git docker/models

# get model_main and exporter_main files from TF2 Object Detection GitHub repository
cp docker/models/research/object_detection/exporter_main_v2.py source_dir 
cp docker/models/research/object_detection/model_main_tf2.py source_dir

Cloning into 'docker/models'...
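If you prefer to stay in Python, the copy step of the cell above can be sketched with `shutil`, adding an explicit check that the clone actually produced both scripts. The function name `copy_od_scripts` is hypothetical; the paths mirror the bash commands above.

```python
import shutil
from pathlib import Path

# The two TF2 Object Detection API scripts copied by the bash cell above.
OD_SCRIPTS = ("exporter_main_v2.py", "model_main_tf2.py")


def copy_od_scripts(models_dir, dest_dir):
    """Copy the exporter and training scripts into dest_dir; return the new paths."""
    src_root = Path(models_dir) / "research" / "object_detection"
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in OD_SCRIPTS:
        src = src_root / name
        if not src.is_file():
            raise FileNotFoundError(f"{src} not found; did the git clone succeed?")
        copied.append(Path(shutil.copy2(src, dest / name)))
    return copied
```

Usage: `copy_od_scripts("docker/models", "source_dir")`.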


In [6]:
# Build and push the docker image. This cell can be commented out after it has run once.
# This takes around 10 minutes.
image_name = 'tf2-object-detection'
!sh ./docker/build_and_push.sh $image_name

https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
Building image with name tf2-object-detection
Sending build context to Docker daemon  727.8MB
Step 1/17 : FROM tensorflow/tensorflow:2.9.0-gpu
2.9.0-gpu: Pulling from tensorflow/tensorflow

17ec1767: Pulling fs layer
9ecd2bff: Pulling fs layer
4ae53552: Pulling fs layer
2d09b8c4: Pulling fs layer
0d530989: Pulling fs layer
81af025b: Pulling fs layer
c129f45e: Pulling fs layer
8fcb70c6: Pulling fs layer
9aa4a247: Pulling fs layer
3100c8d1: Pulling fs layer
3a6b487b: Pulling fs layer
e8773234: Pulling fs layer
36c9476c: Pulling fs layer


1b420cea: Pull complete

 ---> Running in 2a1d01b8e2e2
Removing intermediate container 2a1d01b8e2e2
 ---> 53a1de98ee40
Step 4/17 : RUN apt-key del 7fa2af80
 ---> Running in 9b2e937cc091
OK
Removing intermediate container 9b2e937cc091
 ---> 69bf85c45110
Step 5/17 : RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
 ---> Running in b19875fb1362
Executing: /tmp/apt-key-gpghome.VoKJcXarE2/gpg.1.sh --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
gpg: requesting key from 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub'
gpg: key A4B469963BF863CC: "cudatools <cudatools@nvidia.com>" not changed
gpg: Total number processed: 1
gpg:              unchanged: 1
Removing intermediate container b19875fb1362
 ---> 1882cceb8bef
Step 6/17 : RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x8

Get:11 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 gpgv amd64 2.2.19-3ubuntu2.2 [200 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 less amd64 551-1ubuntu0.1 [123 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal/main amd64 libbsd0 amd64 0.10.0-1 [45.4 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal/main amd64 python3-cffi-backend amd64 1.14.0-1build1 [68.7 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal/main amd64 shared-mime-info amd64 1.15-1 [430 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal/main amd64 ucf all 3.0038+nmu1 [51.6 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal/main amd64 libcbor0.6 amd64 0.6.0-0ubuntu1 [21.1 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libdrm-common all 2.4.107-8ubuntu1~20.04.2 [5396 B]
Get:19 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libdrm2 amd64 2.4.107-8ubuntu1~20.04.2 [34.1 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal/main amd64 libedit2 amd64 3.1-20191231-

Get:96 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libgdk-pixbuf2.0-common all 2.40.0+dfsg-3ubuntu0.4 [4592 B]
Get:97 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libgdk-pixbuf2.0-0 amd64 2.40.0+dfsg-3ubuntu0.4 [168 kB]
Get:98 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libgdk-pixbuf2.0-bin amd64 2.40.0+dfsg-3ubuntu0.4 [14.1 kB]
Get:99 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libgles1 amd64 1.3.2-1~ubuntu0.20.04.2 [10.3 kB]
Get:100 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libgles2 amd64 1.3.2-1~ubuntu0.20.04.2 [15.6 kB]
Get:101 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libgles-dev amd64 1.3.2-1~ubuntu0.20.04.2 [47.9 kB]
Get:102 http://archive.ubuntu.com/ubuntu focal/main amd64 libimagequant0 amd64 2.12.2-1.1 [31.4 kB]
Get:103 http://archive.ubuntu.com/ubuntu focal/main amd64 liblcms2-2 amd64 2.9-4 [140 kB]
Get:104 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libopengl0 amd64 1.3.2-1~

Selecting previously unselected package libxext6:amd64.
Preparing to unpack .../015-libxext6_2%3a1.3.4-0ubuntu1_amd64.deb ...
Unpacking libxext6:amd64 (2:1.3.4-0ubuntu1) ...
Selecting previously unselected package libxmuu1:amd64.
Preparing to unpack .../016-libxmuu1_2%3a1.1.3-0ubuntu1_amd64.deb ...
Unpacking libxmuu1:amd64 (2:1.1.3-0ubuntu1) ...
Selecting previously unselected package openssh-client.
Preparing to unpack .../017-openssh-client_1%3a8.2p1-4ubuntu0.7_amd64.deb ...
Unpacking openssh-client (1:8.2p1-4ubuntu0.7) ...
Selecting previously unselected package wget.
Preparing to unpack .../018-wget_1.20.3-1ubuntu2_amd64.deb ...
Unpacking wget (1.20.3-1ubuntu2) ...
Selecting previously unselected package xauth.
Preparing to unpack .../019-xauth_1%3a1.1-0ubuntu1_amd64.deb ...
Unpacking xauth (1:1.1-0ubuntu1) ...
Selecting previously unselected package libtcl8.6:amd64.
Preparing to unpack .../020-libtcl8.6_8.6.10+dfsg-1_amd64.deb ...
Unpacking libtcl8.6:amd64 (8.6.10+dfsg-1) ...
Sele

Selecting previously unselected package libsensors-config.
Preparing to unpack .../062-libsensors-config_1%3a3.6.0-2ubuntu1.1_all.deb ...
Unpacking libsensors-config (1:3.6.0-2ubuntu1.1) ...
Selecting previously unselected package libsensors5:amd64.
Preparing to unpack .../063-libsensors5_1%3a3.6.0-2ubuntu1.1_amd64.deb ...
Unpacking libsensors5:amd64 (1:3.6.0-2ubuntu1.1) ...
Selecting previously unselected package libvulkan1:amd64.
Preparing to unpack .../064-libvulkan1_1.2.131.2-1_amd64.deb ...
Unpacking libvulkan1:amd64 (1.2.131.2-1) ...
Selecting previously unselected package libgl1-mesa-dri:amd64.
Preparing to unpack .../065-libgl1-mesa-dri_21.2.6-0ubuntu0.1~20.04.2_amd64.deb ...
Unpacking libgl1-mesa-dri:amd64 (21.2.6-0ubuntu0.1~20.04.2) ...
Selecting previously unselected package libglx-mesa0:amd64.
Preparing to unpack .../066-libglx-mesa0_21.2.6-0ubuntu0.1~20.04.2_amd64.deb ...
Unpacking libglx-mesa0:amd64 (21.2.6-0ubuntu0.1~20.04.2) ...
Selecting previously unselected package l

Selecting previously unselected package python3-cffi.
Preparing to unpack .../106-python3-cffi_1.14.0-1build1_all.deb ...
Unpacking python3-cffi (1.14.0-1build1) ...
Selecting previously unselected package python3-xcffib.
Preparing to unpack .../107-python3-xcffib_0.8.1-0.8_amd64.deb ...
Unpacking python3-xcffib (0.8.1-0.8) ...
Selecting previously unselected package python3-cairocffi.
Preparing to unpack .../108-python3-cairocffi_0.9.0-4_all.deb ...
Unpacking python3-cairocffi (0.9.0-4) ...
Selecting previously unselected package python3-webencodings.
Preparing to unpack .../109-python3-webencodings_0.5.1-1ubuntu1_all.deb ...
Unpacking python3-webencodings (0.5.1-1ubuntu1) ...
Selecting previously unselected package python3-html5lib.
Preparing to unpack .../110-python3-html5lib_1.0.1-2_all.deb ...
Unpacking python3-html5lib (1.0.1-2) ...
Selecting previously unselected package python3-lxml:amd64.
Preparing to unpack .../111-python3-lxml_4.5.0-1ubuntu0.5_amd64.deb ...
Unpacking python3

Setting up libglx0:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Setting up libtk8.6:amd64 (8.6.10-1) ...
Setting up libgl1:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Setting up libglx-dev:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Setting up libgl-dev:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Setting up tk8.6-blt2.5 (2.5.3+dfsg-4) ...
Setting up libegl-dev:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Setting up blt (2.5.3+dfsg-4) ...
Setting up python3-tk:amd64 (3.8.10-0ubuntu1~20.04) ...
Setting up libgles-dev:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Setting up libglvnd-dev:amd64 (1.3.2-1~ubuntu0.20.04.2) ...
Setting up libgl1-mesa-dev:amd64 (21.2.6-0ubuntu0.1~20.04.2) ...
Processing triggers for libc-bin (2.31-0ubuntu9.7) ...
Processing triggers for systemd (245.4-4ubuntu3.17) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Removing intermediate container 3203cb03af11
 ---> e310e7f36285
Step 8/17 : COPY models/research/object_detection /home/tensorflow/models/research/object_detection
 ---> ed46132658be
Step 9/17 :

Collecting pyyaml<6.0,>=5.1 (from tf-models-official>=2.5.1->object-detection==0.1)
  Downloading PyYAML-5.4.1-cp38-cp38-manylinux1_x86_64.whl (662 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 662.4/662.4 kB 85.6 MB/s eta 0:00:00
Collecting sentencepiece (from tf-models-official>=2.5.1->object-detection==0.1)
  Downloading sentencepiece-0.1.99-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 104.3 MB/s eta 0:00:00
Collecting seqeval (from tf-models-official>=2.5.1->object-detection==0.1)
  Downloading seqeval-1.2.2.tar.gz (43 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.6/43.6 kB 13.0 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting tensorflow-addons (from tf-models-official>=2.5.1->object-detection==0.1)
  Downloading tensorflow_addons-0.20.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (591 kB)
     ━━━━━━━━━

     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.4/2.4 MB 24.9 MB/s eta 0:00:00
Collecting google-auth-httplib2>=0.1.0 (from google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading google_auth_httplib2-0.1.0-py2.py3-none-any.whl (9.3 kB)
Collecting google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0.dev0,>=1.31.5 (from google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading google_api_core-2.11.1-py3-none-any.whl (120 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 120.5/120.5 kB 26.3 MB/s eta 0:00:00
Collecting uritemplate<5,>=3.0.1 (from google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading uritemplate-4.1.1-py2.py3-none-any.whl (10 kB)
Collecting docopt (from hdfs<3.0.0,>=2.1.0->apache-beam->object-detection==0.1)
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Col

  Preparing metadata (setup.py): finished with status 'done'
Collecting tensorflow-metadata (from tensorflow-datasets->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading tensorflow_metadata-1.13.1-py3-none-any.whl (28 kB)
Collecting toml (from tensorflow-datasets->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting googleapis-common-protos<2.0.dev0,>=1.56.2 (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0.dev0,>=1.31.5->google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading googleapis_common_protos-1.59.1-py2.py3-none-any.whl (224 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 224.5/224.5 kB 4.8 MB/s eta 0:00:00
Collecting google-auth<3.0.0.dev0,>=1.19.0 (from google-api-python-client>=1.6.7->tf-models-official>=2.5.1->object-detection==0.1)
  Downloading google_auth-2.20.0-py2.py3-none-any.whl (181 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 181.5/181

Installing collected packages: text-unidecode, sentencepiece, pytz, py-cpuinfo, gin-config, flatbuffers, docopt, dm-tree, crcmod, zstandard, uritemplate, tzdata, typeguard, tqdm, toml, threadpoolctl, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, tabulate, regex, pyyaml, python-slugify, python-dateutil, pyparsing, psutil, protobuf, promise, portalocker, orjson, objsize, numpy, kiwisolver, keras, joblib, importlib-resources, immutabledict, grpcio, fonttools, fasteners, fastavro, etils, dnspython, dill, Cython, cycler, contextlib2, colorama, cloudpickle, click, charset-normalizer, avro-python3, absl-py, tf-slim, tensorflow-model-optimization, tensorflow_io, tensorflow-hub, scipy, sacrebleu, requests, pymongo, pydot, pyarrow, proto-plus, pandas, opencv-python-headless, opencv-python, ml-dtypes, httplib2, googleapis-common-protos, google-auth, contourpy, tensorflow-metadata, tensorflow-addons, scikit-learn, oauth2client, matplotlib, kaggle, jax, hdfs, google-a

Collecting bcrypt>=3.2 (from paramiko>=2.4.2->sagemaker-training)
  Downloading bcrypt-4.0.1-cp36-abi3-manylinux_2_28_x86_64.whl (593 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 593.7/593.7 kB 88.4 MB/s eta 0:00:00
Collecting cryptography>=3.3 (from paramiko>=2.4.2->sagemaker-training)
  Downloading cryptography-41.0.1-cp37-abi3-manylinux_2_28_x86_64.whl (4.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 120.9 MB/s eta 0:00:00
Collecting pynacl>=1.5 (from paramiko>=2.4.2->sagemaker-training)
  Downloading PyNaCl-1.5.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (856 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 856.7/856.7 kB 103.7 MB/s eta 0:00:00
Collecting botocore<1.30.0,>=1.29.160 (from boto3->sagemaker-training)
  Downloading botocore-1.29.160-py3-none-any.whl (10.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.9/10.9 MB 111.9 MB/s eta 0:00:00
Collecting jmespath<2.0.0,>=0.7.1 (from boto3->sagemaker-training)
  Download

f34b3fc: Pushing  1.341GB/3.605GB

f34b3fc: Pushing  3.594GB/3.605GB

f34b3fc: Pushed   3.641GB/3.605GB
20230623221853: digest: sha256:8fef98163032d1766e5cf31bc8bd6b40ecc7638c578e99be393bc6cb803b44ed size: 5782
Saving ECR image URI into ecr_image_fullname.txt


To verify that the image was pushed correctly to the [Elastic Container Registry](https://aws.amazon.com/ecr/), you can look for it in the AWS web console. The example screenshot below shows three different images pushed to ECR; in your account you should see only one, called `tf2-object-detection`.
![ECR Example](../data/example_ecr.png)


In [7]:
# display the container name
with open(os.path.join('docker', 'ecr_image_fullname.txt'), 'r') as f:
    # the first line holds the full ECR image URI; strip the trailing newline
    container = f.readline().strip()

print(container)

406551161089.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20230623221853
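The printed URI has the shape `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`; here the tag appears to be the build timestamp written by `build_and_push.sh`. If you rebuild the image several times, splitting the URI is a quick way to check which build the estimator will use and that its account and region match your role. A minimal regex-based sketch; the helper is hypothetical and the pattern only covers tagged images, not digest references.

```python
import re

# <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_IMAGE_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>[\w.-]+)$"
)


def parse_ecr_image_uri(uri):
    """Split a private ECR image URI into account, region, repository and tag."""
    match = ECR_IMAGE_URI.match(uri.strip())
    if match is None:
        raise ValueError(f"not a tagged ECR image URI: {uri!r}")
    return match.groupdict()


parts = parse_ecr_image_uri(
    "406551161089.dkr.ecr.us-east-1.amazonaws.com/tf2-object-detection:20230623221853"
)
print(parts["region"], parts["repository"], parts["tag"])
```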


## Pre-trained model from model zoo

As is common practice, we do not train from scratch; instead, we start from a pretrained model from the TF Object Detection model zoo. You can find pretrained checkpoints [here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). Because your time for this project is limited, we recommend experimenting only with the following models:
* SSD MobileNet V2 FPNLite 640x640
* SSD ResNet50 V1 FPN 640x640 (RetinaNet50)
* Faster R-CNN ResNet50 V1 640x640
* EfficientDet D1 640x640
* Faster R-CNN ResNet152 V1 640x640

In the code below, the EfficientDet D1 model is downloaded and its checkpoint is extracted into `source_dir/checkpoint`. Adjust this code if you experiment with other architectures.

In [8]:
%%bash
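# -p: do not fail if the directories already exist (e.g. when re-running the cell)
mkdir -p /tmp/checkpoint
mkdir -p source_dir/checkpoint
# download the EfficientDet D1 archive from the TF2 model zoo
wget -O /tmp/efficientdet.tar.gz http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
# extract only the checkpoint files, dropping the two leading directories
tar -zxvf /tmp/efficientdet.tar.gz --strip-components 2 --directory source_dir/checkpoint efficientdet_d1_coco17_tpu-32/checkpoint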

--2023-06-23 22:41:09--  http://download.tensorflow.org/models/object_detection/tf2/20200711/efficientdet_d1_coco17_tpu-32.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 172.253.115.128, 2607:f8b0:4004:c08::80
Connecting to download.tensorflow.org (download.tensorflow.org)|172.253.115.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 51839363 (49M) [application/x-tar]
Saving to: ‘/tmp/efficientdet.tar.gz’

efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0.data-00000-of-00001
efficientdet_d1_coco17_tpu-32/checkpoint/checkpoint
efficientdet_d1_coco17_tpu-32/checkpoint/ckpt-0.index
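The `tar --strip-components 2` call above keeps only the files under `efficientdet_d1_coco17_tpu-32/checkpoint/` and drops the two leading directories. A rough Python equivalent with the standard `tarfile` module is sketched below. `extract_checkpoint` is a hypothetical helper; to adapt it to another model-zoo architecture you would pass that archive and its top-level folder name (check the folder name inside each archive rather than assuming it).

```python
import tarfile
from pathlib import Path, PurePosixPath


def extract_checkpoint(archive_path, model_folder, dest_dir):
    """Extract <model_folder>/checkpoint/* from a model-zoo .tar.gz into dest_dir.

    Like `tar --strip-components 2`, only the file names are kept, so the
    checkpoint files land directly in dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    checkpoint_dir = PurePosixPath(model_folder, "checkpoint")
    extracted = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            path = PurePosixPath(member.name)
            # Skip directories, links and anything outside <model_folder>/checkpoint/.
            if not member.isfile() or path.parent != checkpoint_dir:
                continue
            # Writing only the base name also guards against path traversal.
            source = tar.extractfile(member)
            with source, open(dest / path.name, "wb") as target:
                target.write(source.read())
            extracted.append(path.name)
    return sorted(extracted)
```

Usage: `extract_checkpoint("/tmp/efficientdet.tar.gz", "efficientdet_d1_coco17_tpu-32", "source_dir/checkpoint")`.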


## Edit pipeline.config file

Update the [`pipeline.config`](source_dir/pipeline.config) file in the `source_dir` folder whenever you experiment with a different model. The config files for the other architectures are available [here](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2).

>Note: The provided `pipeline.config` file works well with the `EfficientDet` model. You would need to modify it when working with other models.
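Typical fields to revisit when switching models or datasets include `num_classes`, `batch_size` and `fine_tune_checkpoint`. The regex-based sketch below edits simple `key: value` lines in the config text. The helper `set_config_value` is hypothetical and deliberately naive: it rewrites every occurrence of the key. For anything beyond simple scalar fields, the API's own `object_detection.utils.config_util` module, which works on the parsed protobuf, is the more robust route.

```python
import re


def set_config_value(config_text, key, value):
    """Replace every `key: <old value>` line in pipeline.config text.

    Only suitable for scalar keys that should have the same value everywhere.
    """
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        rendered = "true" if value else "false"
    elif isinstance(value, str):
        rendered = f'"{value}"'
    else:
        rendered = str(value)
    # Match the key at the start of a line (after indentation) and keep the
    # indentation and "key: " prefix, replacing only the value.
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}:[ \t]*).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + rendered, config_text)
    if count == 0:
        raise KeyError(f"{key!r} not found in config")
    return new_text
```

Usage: read `source_dir/pipeline.config`, call `set_config_value(text, "num_classes", 3)`, and write the result back.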

## Launch Training Job

Now that we have a dataset, a docker image, and pretrained model weights, we can launch the training job. To do so, we create a [Sagemaker Framework](https://sagemaker.readthedocs.io/en/stable/frameworks/index.html) estimator, in which we specify the container name, the name of the config file, the number of training steps, etc.

The `run_training.sh` script does the following:
* train the model for `num_train_steps` steps
* evaluate the model on the val dataset
* export the model
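These steps are parameterized by the hyperparameters passed to the estimator. To our understanding, the SageMaker training toolkit hands each hyperparameter to a script entry point as a `--key value` command-line pair, which `run_training.sh` then parses. The sketch below mimics that convention in Python, with `argparse` standing in for the shell script's own argument handling. `to_cmd_args` and `parse_training_args` are hypothetical names, and the exact flag rendering is an assumption worth checking against the job logs.

```python
import argparse


def to_cmd_args(hyperparameters):
    """Render hyperparameters as a flat ['--key', 'value', ...] list.

    Keys are sorted only to make the output deterministic; this illustrates
    the --key value convention and is not the toolkit's own code.
    """
    args = []
    for key, value in sorted(hyperparameters.items()):
        args.extend([f"--{key}", str(value)])
    return args


def parse_training_args(argv):
    """Parse the flags the training script is assumed to receive."""
    parser = argparse.ArgumentParser(description="Stand-in for run_training.sh flag parsing")
    parser.add_argument("--model_dir", required=True)
    parser.add_argument("--pipeline_config_path", required=True)
    parser.add_argument("--num_train_steps", type=int, required=True)
    parser.add_argument("--sample_1_of_n_eval_examples", type=int, default=1)
    return parser.parse_args(argv)


hyperparameters = {
    "model_dir": "/opt/training",
    "pipeline_config_path": "pipeline.config",
    "num_train_steps": "1",
    "sample_1_of_n_eval_examples": "1",
}
args = parse_training_args(to_cmd_args(hyperparameters))
print(args.model_dir, args.num_train_steps)  # prints "/opt/training 1"
```

Note that every value arrives as a string, which is why the estimator cell passes `"1"` rather than `1`; the receiving script is responsible for converting types.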

Several metrics are displayed during the evaluation phase, including the mean average precision (mAP). Use them to quantify your model's performance and to compare it across iterations.
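Mean average precision rests on two ingredients: the intersection over union (IoU) that decides whether a detection matches a ground-truth box, and the area under the precision-recall curve of each class. The sketch below computes both for a single class in plain Python. It assumes boxes in the (ymin, xmin, ymax, xmax) order used by the TF Object Detection API and uses all-point interpolated AP; the COCO-style metrics reported during evaluation additionally average over several IoU thresholds and use 101 recall points, so the numbers will not match exactly.

```python
def iou(box_a, box_b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    inter_h = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_w = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_h * inter_w
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class.

    scores: confidence of each detection; is_true_positive: whether that
    detection matched a ground-truth box (e.g. IoU >= 0.5).
    """
    # Walk the detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the area under the stepwise precision-recall curve.
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap
```

Averaging `average_precision` over classes gives a single-threshold mAP.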

You can also monitor the training progress by navigating to **Training -> Training Jobs** from the Amazon Sagemaker dashboard in the Web UI.

In [9]:
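# Copy TensorBoard event files written to /opt/training/ inside the container to S3
tensorboard_output_config = sagemaker.debugger.TensorBoardOutputConfig(
    s3_output_path=tensorboard_s3_prefix,
    container_local_output_path='/opt/training/'
)

estimator = CustomFramework(
    role=role,
    image_uri=container,                  # training image built in the Container section
    entry_point='run_training.sh',        # trains, evaluates and exports the model
    source_dir='source_dir/',             # local folder packaged with the job; holds pipeline.config
    hyperparameters={
        "model_dir": "/opt/training",     # same folder that TensorBoard output is collected from
        "pipeline_config_path": "pipeline.config",
        "num_train_steps": "1",           # 1 step only smoke-tests the pipeline; increase for real training
        "sample_1_of_n_eval_examples": "1"
    },
    instance_count=1,
    instance_type='ml.p3.2xlarge',        # single NVIDIA V100 GPU
    tensorboard_output_config=tensorboard_output_config,
    disable_profiler=True,
    base_job_name='tf2-object-detection'
)

estimator.fit(inputs)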

Using provided s3_resource


INFO:sagemaker:Creating training-job with name: tf2-object-detection-2023-06-23-22-41-45-481


2023-06-23 22:41:48 Starting - Starting the training job.........
2023-06-23 22:42:56 Starting - Preparing the instances for training......
2023-06-23 22:44:01 Downloading - Downloading input data...
2023-06-23 22:44:26 Training - Downloading the training image...............
2023-06-23 22:47:02 Training - Training image download completed. Training in progress...
2023-06-23 22:47:27,357 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-06-23 22:47:27,390 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-06-23 22:47:27,422 sagemaker-training-toolkit INFO     No Neurons detected (normal if no neurons installed)
2023-06-23 22:47:27,435 sagemaker-training-toolkit INFO     Invoking user script
Training Env:
{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "train": "/opt/ml/input/data/train",
        "val": "/opt/ml/input/da

I0623 22:47:39.349212 139715833239360 efficientnet_model.py:143] round_filter input=192 output=192
I0623 22:47:39.349378 139715833239360 efficientnet_model.py:143] round_filter input=320 output=320
I0623 22:47:39.645745 139715833239360 efficientnet_model.py:143] round_filter input=1280 output=1280
I0623 22:47:39.711168 139715833239360 efficientnet_model.py:453] Building model efficientnet with params ModelConfig(width_coefficient=1.0, depth_coefficient=1.1, resolution=240, dropout_rate=0.2, blocks=(BlockConfig(input_filters=32, output_filters=16, kernel_size=3, num_repeat=1, expand_ratio=1, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=16, output_filters=24, kernel_size=3, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=24, output_filters=40, kernel_size=5, num_repeat=2, expand_ratio=6, strides=

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).
For more information see: https://github.com/tensorflow/addons/issues/2807
W0623 22:48:28.126019 139944693475136 model_lib_v2.py:1089] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0623 22:48:28.126250 139944693475136 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0623 22:48:28.126353 139944693475136 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0623 22:4


2023-06-23 22:48:34 Uploading - Uploading generated training model
2023-06-23 22:48:34 Failed - Training job failed
INFO:tensorflow:Timed-out waiting for a checkpoint.
I0623 22:48:49.664507 139944693475136 checkpoint_utils.py:231] Timed-out waiting for a checkpoint.
==EXPORTING THE MODEL==
TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP).
For more information see: https://github.com/tensorflow/addons/issues/2807
I0623 22:48:55.354015 139777878959936 ssd_efficientnet_bifpn_feature_extractor.py:150] EfficientDet EfficientNet backbone version: efficientnet-b1
I0623 22:48:55.354203 139777878959936 ssd_efficientnet_bifpn_feature_extractor.py:152] E

UnexpectedStatusException: Error for Training job tf2-object-detection-2023-06-23-22-41-45-481: Failed. Reason: ClientError: Artifact upload failed:Unable to upload file to s3://object-detection-project/logs/tf2-object-detection-2023-06-23-22-41-45-481/tensorboard-output/train/events.out.tfevents.1687560473.ip-10-2-119-26.ec2.internal.43.0.v2 (AccessDenied: Access Denied)
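This failure happened when SageMaker uploaded TensorBoard event files: the job was denied access to `s3://object-detection-project/logs/`. That bucket differs from the `tensorboard_s3_prefix` value set earlier in this notebook, so make sure the prefix points to a bucket the execution role can write to. A cheap pre-flight check is to write a tiny object under the prefix before calling `estimator.fit`. The helpers below (`split_s3_uri`, `can_write_to_s3_prefix`, and the probe object name) are hypothetical; they use only standard boto3 calls (`put_object`, `delete_object`). From a SageMaker notebook instance, boto3 normally runs with the notebook's execution role, which is the same role passed to the estimator here, but verify that assumption for your setup.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def can_write_to_s3_prefix(uri, probe_name="_write_probe"):
    """Try to write (then delete) an empty object under `uri`; True on success."""
    # Imported here so split_s3_uri can be used without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    bucket, prefix = split_s3_uri(uri)
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    key = prefix + probe_name
    s3 = boto3.client("s3")
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=b"")
    except ClientError as err:
        print(f"Cannot write to s3://{bucket}/{key}: {err.response['Error']['Code']}")
        return False
    try:
        s3.delete_object(Bucket=bucket, Key=key)
    except ClientError:
        pass  # write access is what matters; an empty leftover probe is harmless
    return True
```

Running `can_write_to_s3_prefix(tensorboard_s3_prefix)` before `estimator.fit(inputs)` would surface an `AccessDenied` in seconds rather than at the end of a training job.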

You should be able to see your training job in the SageMaker console, as shown below:
![Training jobs example](../data/example_trainings.png)


## Improve on the initial model

Most likely, this initial experiment did not yield optimal results. However, you can make multiple changes to the `pipeline.config` file to improve this model. One obvious change is to improve the data augmentation strategy. The [`preprocessor.proto`](https://github.com/tensorflow/models/blob/master/research/object_detection/protos/preprocessor.proto) file lists the data augmentation methods available in the TF Object Detection API. Justify your choice of augmentations in the writeup.
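When choosing augmentations, keep in mind that geometric ones must transform the bounding boxes along with the pixels, while photometric ones (for example `random_adjust_brightness`) leave the boxes untouched. In `pipeline.config`, an option from `preprocessor.proto` is enabled inside `train_config` with a block such as `data_augmentation_options { random_horizontal_flip { } }`. The pure-Python sketch below only illustrates what a horizontal flip has to do to the API's normalized `[ymin, xmin, ymax, xmax]` boxes; it is not the API's implementation, and the image is represented as plain nested lists for simplicity.

```python
def flip_image_horizontally(image):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def flip_boxes_horizontally(boxes):
    """Mirror normalized [ymin, xmin, ymax, xmax] boxes about the vertical centre line.

    The new left edge comes from the old right edge (1 - xmax) and vice versa,
    so xmin <= xmax still holds after the flip; y coordinates are unchanged.
    """
    return [[ymin, 1.0 - xmax, ymax, 1.0 - xmin] for ymin, xmin, ymax, xmax in boxes]


image = [[1, 2, 3],
         [4, 5, 6]]
boxes = [[0.1, 0.2, 0.5, 0.4]]
print(flip_image_horizontally(image))  # [[3, 2, 1], [6, 5, 4]]
print(flip_boxes_horizontally(boxes))
```

For a driving dataset, a horizontal flip keeps scenes plausible, whereas a vertical flip would put the road in the sky; reasoning of this kind is what the writeup should capture.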

Keep in mind that you can also:
* experiment with the optimizer: type of optimizer, learning rate, scheduler, etc.
* experiment with the architecture. The TF Object Detection API model zoo offers many architectures. Keep in mind that the `pipeline.config` file is specific to each architecture, so you will have to edit it.
* visualize results on the test frames using the `2_deploy_model` notebook available in this repository.
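When experimenting with the learning rate, it helps to see the schedule as a function of the step. The EfficientDet configs in the TF2 config directory use a `cosine_decay_learning_rate` block (fields `learning_rate_base`, `total_steps`, `warmup_learning_rate`, `warmup_steps`). The sketch below re-implements that idea as a linear warmup followed by cosine decay to zero. It is written from the formula rather than copied from the API (the API's version also supports `hold_base_rate_steps`, omitted here), so check it against `object_detection/utils/learning_schedules.py` before relying on it.

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base, total_steps,
                             warmup_learning_rate=0.0, warmup_steps=0):
    """Learning rate at `step`: linear warmup, then cosine decay to 0 at total_steps."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must be larger than warmup_steps")
    if step > total_steps:
        return 0.0
    if step < warmup_steps:
        # Straight line from warmup_learning_rate (step 0) to learning_rate_base.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / float(total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))


# Hypothetical values for illustration; read the real ones from your pipeline.config.
for step in (0, 250, 500, 1500, 2500):
    print(step, cosine_decay_with_warmup(step, 0.04, 2500, 0.001, 500))
```

If you shorten training with `num_train_steps`, check that `total_steps` and `warmup_steps` in the schedule still fit inside the run; otherwise training may end while the learning rate is still in its warmup or high-rate phase.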

In the cell below, write down all the approaches you experimented with, why you chose them, and what you would have done with more time and resources. Justify your choices using the TensorBoard visualizations (take screenshots and insert them in your writeup), the metrics on the evaluation set, and the animation you generated with [this tool](../2_run_inference/2_deploy_model.ipynb).

In [None]:
# your writeup goes here.