# Performing Object Detection in GovCloud

## GDMS Use Case

The AI/ML use case is to process video from cameras mounted in the autoclaves and recognize the loading and unloading of radomes. To do this, we need to complete the following steps:

![](./GDMS-obj-detection-flow.png)

 - Image Extraction - We extract image frames from an uploaded video.
 - *Image Labeling* - We skip the labeling step because the SageMaker Ground Truth labeling service is not currently available in GovCloud. For this workshop we use pre-labelled images instead.
 - Train Object Detection Model - Train the built-in Object Detection algorithm in SageMaker.
 - Deploy to an Endpoint - Deploy the trained model to an endpoint.
 - Perform Predictions - Run predictions against the endpoint and list the objects detected.
 
The initial approach was to use an AI service called Rekognition Custom Labels. However, it is not yet available in GovCloud, so this workshop uses SageMaker Object Detection instead. Let's see how the two approaches would have differed.

![wombat](./obj-detection-complete-flow.png)

As you can see, the flow does not differ much between Rekognition and SageMaker.

So let's first concentrate on splitting a large video into images, which has to happen before any image labelling can take place.

## Process Video into Images using a basic SageMaker Processing Script

### Introduction

Object Detection requires images to be extracted from a video file. Each image then needs to be labelled with bounding boxes around each of the objects in it. Once this process is complete, the built-in Object Detection algorithm in SageMaker can take the images and labels and start training an object detection model.
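
To make the idea of a bounding-box label concrete, here is a minimal sketch of one annotation record. It assumes the JSON annotation layout described in the SageMaker Object Detection documentation (`file`, `image_size`, `annotations`, `categories`). The helper `make_annotation`, the class name `radome`, and the box coordinates are hypothetical and only for illustration.

```python
import json

def make_annotation(image_file, width, height, boxes, class_names):
    """Build one annotation record for a single image.

    boxes: list of (class_id, left, top, box_width, box_height) in pixels.
    class_names: dict mapping class_id -> human-readable name.
    """
    # Only list the categories that actually appear in this image
    used_ids = sorted({box[0] for box in boxes})
    return {
        "file": image_file,
        "image_size": [{"width": width, "height": height, "depth": 3}],
        "annotations": [
            {"class_id": c, "left": l, "top": t, "width": w, "height": h}
            for c, l, t, w, h in boxes
        ],
        "categories": [{"class_id": c, "name": class_names[c]} for c in used_ids],
    }

# Hypothetical example: a single 'radome' box in a 512x384 frame
record = make_annotation(
    "image_2022-01-29-08:00:50.0.png",
    512,
    384,
    boxes=[(0, 120, 80, 200, 150)],
    class_names={0: "radome"},
)
print(json.dumps(record, indent=2))
```

Each extracted image would get one such record, with one entry in `annotations` per object visible in the frame.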

![](./stage1-gov-obj-detect.png)

This notebook shows a very basic example of using SageMaker Processing to extract images from a video file and build an image dataset. The resulting images are then written back to S3.

First, let’s create an SKLearnProcessor object, passing the scikit-learn version we want to use, as well as our managed infrastructure requirements.

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import SKLearnProcessor
import json

region = boto3.session.Session().region_name

staging_bucket = 'PUT YOUR STAGING BUCKET NAME HERE'
staging_prefix = 'input_data'

output_bucket = 'PUT YOUR OUTPUT BUCKET NAME HERE'
output_prefix = 'training_images'

source_video = 'Loading_Trucks_At_The_Warehouse.mp4'
config_file = 'config_file.json'

role = get_execution_role()
sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1", role=role, instance_type="ml.m5.xlarge", instance_count=1
)

Copy the video to your staging bucket.

In [None]:
! aws s3 cp s3://ml-materials/loading_warehouse/Loading_Trucks_At_The_Warehouse.mp4 s3://{staging_bucket}/{staging_prefix}/

Now we create a config file that holds the attributes controlling how the video file is converted to images: when the video was created, the time window to capture, and the sampling interval in seconds.

In [None]:
config_dict = {
    'video_creation_time':'2022-01-29 08:00:00',
    'capture_start_time': '2022-01-29 08:00:50',
    'capture_end_time': '2022-01-29 08:04:30',
    'capture_interval_in_seconds': 0.5
}
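
As a rough check of what this config asks for, the sketch below converts the timestamps into frame offsets and an expected image count. It mirrors the arithmetic used later in the processing script, but the helper `capture_window` is hypothetical and the frame rate of 30 fps is an assumption; the real value is read from the video at run time.

```python
import datetime as dt

FMT = "%Y-%m-%d %H:%M:%S"

def capture_window(config, fps):
    """Return (start_frame, end_frame, expected_images) for a config dict."""
    base = dt.datetime.strptime(config["video_creation_time"], FMT)
    start = dt.datetime.strptime(config["capture_start_time"], FMT)
    end = dt.datetime.strptime(config["capture_end_time"], FMT)

    # Offsets are measured from the moment the video recording started
    start_frame = int((start - base).total_seconds() * fps)
    end_frame = int((end - base).total_seconds() * fps)

    # Number of frames to skip between two captured images (at least 1)
    step = max(1, int(config["capture_interval_in_seconds"] * fps))
    expected_images = len(range(start_frame, end_frame, step))
    return start_frame, end_frame, expected_images

example_config = {
    "video_creation_time": "2022-01-29 08:00:00",
    "capture_start_time": "2022-01-29 08:00:50",
    "capture_end_time": "2022-01-29 08:04:30",
    "capture_interval_in_seconds": 0.5,
}

# Assumed frame rate of 30 fps, for illustration only
print(capture_window(example_config, fps=30))  # (1500, 8100, 440)
```

With these values the capture starts 50 seconds into the video, ends at 270 seconds, and samples every 15th frame, which gives 440 images when frames are sampled strictly inside the window.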

Now we will save the config file locally and transfer it to your staging bucket.

In [None]:
with open('config_file.json', 'w') as f:
    json.dump(config_dict, f)

In [None]:
! aws s3 cp "config_file.json" s3://{staging_bucket}/{staging_prefix}/

Create a Python file on the local file system which will be used by SageMaker Processing.  
This is the code that actually does the work of extracting images from the video file.  
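
Before the full script, here is a small stand-alone sketch of two ideas it relies on: grouping the sampled frame indices into blocks so they can be decoded in batches, and naming each image after the wall-clock time of its frame. The helpers `frame_blocks` and `image_filename` are hypothetical simplifications, not the script's own functions, and the 30 fps frame rate is an assumption.

```python
import datetime as dt

def frame_blocks(start_frame, end_frame, frame_interval, frames_per_block=15):
    """Group sampled frame indices into blocks for batched decoding."""
    sampled = list(range(start_frame, end_frame, frame_interval))
    return [
        sampled[i:i + frames_per_block]
        for i in range(0, len(sampled), frames_per_block)
    ]

def image_filename(base_time, frame_index, fps):
    """Name an image after its frame's timestamp, truncated to tenths of a second."""
    ts = base_time + dt.timedelta(seconds=frame_index / fps)
    return "image_" + ts.strftime("%Y-%m-%d-%H:%M:%S.%f")[:21] + ".png"

base = dt.datetime(2022, 1, 29, 8, 0, 0)

# Two seconds of video at an assumed 30 fps, one image every 15 frames
blocks = frame_blocks(1500, 1560, 15, frames_per_block=3)
print(blocks)  # [[1500, 1515, 1530], [1545]]

print(image_filename(base, 1515, fps=30))  # image_2022-01-29-08:00:50.5.png
```

Decoding in blocks keeps only a handful of frames in memory at a time, which matters for long videos.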


In [None]:
%%writefile preprocessing.py
import pandas as pd
import argparse
import os
import json
import datetime as dt
from datetime import datetime
from io import BytesIO
import gc

# Install some libraries that we are going to use for image extraction and formatting
os.system('pip3 install decord Pillow')

import decord as de
from PIL import Image, ImageOps

def convert_video_to_images(input_filename, config_file):
    
    cfg_end_time = None
    end_time_in_secs = 0
    # Setup the path to the video and config file that SageMaker Processing maps locally
    input_data_path = os.path.join("/opt/ml/processing/input", input_filename)
    input_config_path = os.path.join("/opt/ml/processing/input", config_file)

    print(f"\n\nLoading Config File {input_config_path}")
    
    # Load config file
    with open(input_config_path, "r") as cfd:
        job_config = json.load(cfd)
    
    # Convert the attributes from the config file into the values we will use for this job
    cfg_video_base_time = dt.datetime.strptime(job_config['video_creation_time'], '%Y-%m-%d %H:%M:%S')
    cfg_start_time = dt.datetime.strptime(job_config['capture_start_time'], '%Y-%m-%d %H:%M:%S')
    if job_config['capture_end_time']:
        cfg_end_time = dt.datetime.strptime(job_config['capture_end_time'], '%Y-%m-%d %H:%M:%S')
        end_time_in_secs = (cfg_end_time - cfg_video_base_time).total_seconds()

    interval_time = float(job_config['capture_interval_in_seconds'])
    start_time_in_secs = (cfg_start_time - cfg_video_base_time).total_seconds()
    
    print(f"Reading Video File {input_data_path}\n")
    
    fid=open(input_data_path, 'rb')
    vrd = de.VideoReader(input_data_path, width=512, height=384)
    num_frames = len(vrd)
    end_frame_number = num_frames-1
    #print('Video frames #:', len(vrd))
    print('\nVideo frames #:', num_frames)
    print('First frame shape:', vrd[0].shape)
    fps_vid = int(vrd.get_avg_fps())
    print(f'Average Frame Rate: {fps_vid}\n')
    # Convert the capture start and end times into frame numbers
    start_frame_number = int(start_time_in_secs*fps_vid)
    start_frame = vrd[start_frame_number].asnumpy()
    if end_time_in_secs:
        end_frame_number = int(end_time_in_secs*fps_vid)
    
    end_frame = vrd[end_frame_number].asnumpy()

    frame_list = []
    frame_sample_interval = interval_time
    # Use at least 1 so a very short interval cannot produce a zero range step
    frame_interval = max(1, int(frame_sample_interval * fps_vid))
    frame_req_block = frame_interval * 15
    
    job_total_images = 0
    if frame_req_block>end_frame_number:
        frame_req_block=end_frame_number
    
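    for y in range(start_frame_number,end_frame_number,frame_req_block):
        block_list=[]
        # Clamp each block at end_frame_number so we never request frames past the capture window
        for x in range(y,min(y+frame_req_block,end_frame_number),frame_interval):
            block_list.append(x)
            job_total_images += 1
        frame_list.append(block_list)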

    output_prefix = '/opt/ml/processing/output'
    image_count = 0

    print(f"\nGenerating {job_total_images} Images ...")
    print(f"Completed 0% ...")
    for block in frame_list:
        image_frames = vrd.get_batch(block).asnumpy()
        num_sec = float(block[0]/fps_vid)
        for images in image_frames: 
            buffer = BytesIO()
            image_array = Image.fromarray(images)
            # Resize Image to fit within 512x512 pixels and maintain aspect ratio
            # and save new PNG image into buffer 
            scaled_image = ImageOps.contain(image_array,(512,512))
            scaled_image.save(buffer, format="png")
            
            # Use frame to seconds calculated to generate meaningful image filename 
            current_frame_ts = cfg_video_base_time + dt.timedelta(0,num_sec)
            image_base_fname = datetime.fromtimestamp(current_frame_ts.timestamp()).strftime("%Y-%m-%d-%H:%M:%S.%f")
            image_base_fname = image_base_fname[0:21]
            
            # Write out buffer to image filename
            with open(f'{output_prefix}/image_{image_base_fname}.png','wb') as imgfd:
                imgfd.write(buffer.getvalue())
                image_count += 1
            num_sec += frame_sample_interval
            
            # Print completion percentage status at each exact 10% step
            # (integer arithmetic avoids missed matches from floating-point rounding)
            if (image_count * 10) % job_total_images == 0:
                print(f"Completed {image_count * 100 // job_total_images}% ...")
                
            # delete temporary image buffers and force garbage collect to keep memory footprint low  
            del buffer
            del scaled_image
            del image_array
            gc.collect()
        # delete temporary image buffers and force garbage collect to keep memory footprint low
        del image_frames
        gc.collect()
    fid.close()
    return image_count

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--video-filename", type=str, default="NONE")
    parser.add_argument("--config-file", type=str, default="NONE")
    args, _ = parser.parse_known_args()

    input_video_filename = args.video_filename
    input_config_file = args.config_file
    if input_video_filename == "NONE" or input_config_file == "NONE":
        print("Must provide --video-filename and --config-file. Exiting")
        # Raising stops the script and marks the processing job as failed
        raise Exception("Must provide --video-filename and --config-file. Exiting")
    
    print(f"Received arguments {args}")
    
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print (f"Started Processing Job at : {now}")
    
    images_created = convert_video_to_images(input_video_filename, input_config_file)
    
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    print (f"Finished Processing Job at : {now}")
    
    print(f"Images Created {images_created}")
    print("Finished running processing job")

Now execute the SageMaker Processing job which will run the *preprocessing.py* code

In [None]:
from sagemaker.processing import ProcessingInput, ProcessingOutput

sklearn_processor.run(
    code="./preprocessing.py",
    arguments = ['--video-filename', source_video, '--config-file', config_file],
    inputs=[
        ProcessingInput(source=f's3://{staging_bucket}/{staging_prefix}',
                        destination='/opt/ml/processing/input')
    ],
    outputs=[
        ProcessingOutput(source='/opt/ml/processing/output',
                         destination=f's3://{output_bucket}/{output_prefix}'),
    ],
)

Once this process has finished you can go and inspect the extracted images in your S3 bucket.
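
One way to inspect the results from the notebook is to list the PNG keys under the output prefix. This is a minimal sketch: the helper `list_image_keys` is hypothetical, and it accepts any S3 client object so it can be tried without credentials. It relies on the boto3 `list_objects_v2` paginator, which returns results in pages.

```python
def list_image_keys(s3_client, bucket, prefix):
    """Return all .png object keys under s3://bucket/prefix, following pagination."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is absent from a page when no objects match
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".png"):
                keys.append(obj["Key"])
    return keys

# Usage inside the notebook (requires AWS credentials):
#   import boto3
#   keys = list_image_keys(boto3.client("s3"), output_bucket, output_prefix)
#   print(len(keys), keys[:5])
```

The number of keys returned should match the "Images Created" count printed at the end of the processing job log.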

## Conclusion - First Step

We just saw how to convert a large video file into separate images in a fully managed way by leveraging SageMaker Processing.