## Amazon Augmented AI (Amazon A2I) integration with Amazon SageMaker Hosted Endpoint for Audio Classification and Model Retraining

### Architecture 

<img src="./images/part2.png" alt="architecture" width="800"/>


### 5. A2I Setup 

a. [Introduction](#Introduction)

b. [Setup](#Setup)

c. [Create Control Plane Resources](#Create-Control-Plane-Resources)

    
### 6. Setup workforce and Labeling Manually    
a. [Starting Human Loops](#Starting-Human-Loops)

b. [Configure a2i status change to SQS](#sqs_a2i)

c. [Wait For Workers to Complete Task](#Wait-For-Workers-to-Complete-Task)

d. [Check Status of Human Loop](#Check-Status-of-Human-Loop)

e. [View Task Results](#View-Task-Results)
   
### 7. Retrain and Redeploy    
[Incremental training with SageMaker](#Incremental-training-with-SageMaker)

### 8. Configure Lambda and Api gateway
[Create Lambda Function triggering a2i process](#lambda)


## Introduction

Amazon Augmented AI (Amazon A2I) makes it easy to build the workflows required for human review of ML predictions. Amazon A2I brings human review to all developers, removing the undifferentiated heavy lifting associated with building human review systems or managing large numbers of human reviewers. 

You can create your own workflows for ML models built on Amazon SageMaker or any other tools. Using Amazon A2I, you can allow human reviewers to step in when a model is unable to make a high confidence prediction or to audit its predictions on an on-going basis. 

Learn more here: https://aws.amazon.com/augmented-ai/

In this tutorial, we will show how you can use **Amazon A2I with an Amazon SageMaker Hosted Endpoint.** We will be using an exisiting audio classification model in this notebook. We will also demonstrate how to manipulate the A2I output to perform incremental training to improve the model accuracy with the newly labeled data using A2I.

For more in depth instructions, visit https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-getting-started.html

To incorporate Amazon A2I into your human review workflows, you need three resources:

* A **worker task template** to create a worker UI. The worker UI displays your input data, such as documents or images, and instructions to workers. It also provides interactive tools that the worker uses to complete your tasks. For more information, see https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-instructions-overview.html

* A **human review workflow**, also referred to as a flow definition. You use the flow definition to configure your human workforce and provide information about how to accomplish the human review task. You can create a flow definition in the Amazon Augmented AI console or with Amazon A2I APIs. To learn more about both of these options, see https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-create-flow-definition.html

* A **human loop** to start your human review workflow. When you use one of the built-in task types, the corresponding AWS service creates and starts a human loop on your behalf when the conditions specified in your flow definition are met or for each object if no conditions were specified. When a human loop is triggered, human review tasks are sent to the workers as specified in the flow definition.

When using a custom task type, as this tutorial will show, you start a human loop using the Amazon Augmented AI Runtime API. When you call `start_human_loop()` in your custom application, a task is sent to human reviewers.

## Setup
This notebook is developed and tested in a SageMaker Notebook Instance with a `ml.t2.medium` instance with SageMaker Python SDK v2. It is recommended to execute the notebook in the same environment for best experience.
### Install Latest SDK

In [1]:
!pip install -U sagemaker==2.23.1

Collecting sagemaker==2.23.1
  Downloading sagemaker-2.23.1.tar.gz (400 kB)
[K     |████████████████████████████████| 400 kB 8.7 MB/s eta 0:00:01
Building wheels for collected packages: sagemaker
  Building wheel for sagemaker (setup.py) ... [?25ldone
[?25h  Created wheel for sagemaker: filename=sagemaker-2.23.1-py2.py3-none-any.whl size=559547 sha256=b769a316736b9f83d730ba3a9da532805a76c557e2d05a98f37fa958f5dade43
  Stored in directory: /home/ec2-user/.cache/pip/wheels/f6/ea/42/c6241b7aef8d2f4cbe4af5672ecb3889f95fc3df8c599239a4
Successfully built sagemaker
Installing collected packages: sagemaker
  Attempting uninstall: sagemaker
    Found existing installation: sagemaker 2.45.0
    Uninstalling sagemaker-2.45.0:
      Successfully uninstalled sagemaker-2.45.0
Successfully installed sagemaker-2.23.1
You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.[0m


In [40]:
import sagemaker
from pkg_resources import parse_version

assert parse_version(sagemaker.__version__) >= parse_version('2'), \
    '''This notebook is only compatible with sagemaker python SDK >= 2. 
Current version is %s. Please make sure you upgrade the library.''' % sagemaker.__version__

print('SageMaker python SDK version: %s' % sagemaker.__version__)

SageMaker python SDK version: 2.23.1


We need to set up the following data:
* `region` - Region to call A2I.
* `BUCKET` - A S3 bucket accessible by the given role
    * Used to store the sample images & output results
    * Must be within the same region A2I is called from
* `role` - The IAM role used as part of StartHumanLoop. By default, this notebook will use the execution role
* `workteam` - Group of people to send the work to

In [6]:

import boto3 

my_session = boto3.session.Session()
region = my_session.region_name

In [7]:
%store -r endpoint_name 

In [8]:
endpoint_name

'audio-20210716'

### Role and Permissions

The AWS IAM Role used to execute the notebook needs to have the following permissions:

* SagemakerFullAccess
* AmazonSageMakerMechanicalTurkAccess (if using MechanicalTurk as your Workforce)

In [9]:
from sagemaker import get_execution_role
import sagemaker

# Setting Role to the default SageMaker Execution Role
role = get_execution_role()
display(role)

'arn:aws:iam::355444812467:role/service-role/AmazonSageMaker-ExecutionRole-20210702T211675'

In [10]:
import os
import boto3
import botocore

sess = sagemaker.Session()
BUCKET = sess.default_bucket()
TRAIN_PATH = f's3://{BUCKET}/tomofun'
OUTPUT_PATH = f's3://{BUCKET}/a2i-results'


#### Setup Bucket and Paths

**Important**: The bucket you specify for `BUCKET` must have CORS enabled. You can enable CORS by adding a policy similar to the following to your Amazon S3 bucket. To learn how to add CORS to an S3 bucket, see [CORS Permission Requirement](https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-permissions-security.html#a2i-cors-update) in the Amazon A2I documentation. 


```
[{
   "AllowedHeaders": [],
   "AllowedMethods": ["GET"],
   "AllowedOrigins": ["*"],
   "ExposeHeaders": []
}]
```

If you do not add a CORS configuration to the S3 buckets that contains your image input data, human review tasks for those input data objects will fail. 


In [11]:
cors_configuration = {
    'CORSRules': [{
       "AllowedHeaders": [],
       "AllowedMethods": ["GET"],
       "AllowedOrigins": ["*"],
       "ExposeHeaders": []
    }]
}

# Set the CORS configuration
s3 = boto3.client('s3')
s3.put_bucket_cors(Bucket=BUCKET,
                   CORSConfiguration=cors_configuration)

{'ResponseMetadata': {'RequestId': 'B52KSHW0A0AVTJBT',
  'HostId': 'MGoNyrXSUw9IbfT6zr56qUxqkKPA6p9ECblLThcY+zcdY65s/qt+0rgTiUUr4EC6yb9ZD2HNraM=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'MGoNyrXSUw9IbfT6zr56qUxqkKPA6p9ECblLThcY+zcdY65s/qt+0rgTiUUr4EC6yb9ZD2HNraM=',
   'x-amz-request-id': 'B52KSHW0A0AVTJBT',
   'date': 'Sat, 17 Jul 2021 03:30:18 GMT',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0}}

### Audio Classification with Amazon SageMaker

To demonstrate A2I with Amazon SageMaker hosted endpoint, we will take a trained audio classification model from a S3 bucket and host it on the SageMaker endpoint for real-time prediction. 

#### Load the model and create an endpoint
The next cell will setup an endpoint from a trained model. It will take about 3 minutes.

In [12]:
import boto3 

my_session = boto3.session.Session()
client = boto3.client("sts")
account_id = client.get_caller_identity()["Account"]
# algorithm_name = "vgg16-audio"
algorithm_name = "vgg-audio"
image_uri=f"{account_id}.dkr.ecr.{region}.amazonaws.com/{algorithm_name}"

In [13]:
image_uri

'355444812467.dkr.ecr.us-west-2.amazonaws.com/vgg-audio'

#### Helper functions

In [14]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches    
import matplotlib.image as mpimg
import random
import numpy as np
import json

runtime_client = boto3.client('runtime.sagemaker')


    
def load_and_predict(file_name):
    """
    load an audio file, make audio classification to an predictor
    Parameters:
    ----------
    file_name : str
        image file location, in str format
    predictor : sagemaker.predictor.RealTimePredictor
        a predictor loaded from hosted endpoint
    threshold : float
        score threshold for bounding box display
    """
    with open(file_name, 'rb') as image:
        f = image.read()
        b = bytearray(f)
    response = runtime_client.invoke_endpoint(EndpointName=endpoint_name, 
                                   ContentType='application/octet-stream', 
                                   Body=b)
    results = response['Body'].read().decode('utf-8')

    print(results)

    detections = json.loads(results)
    return results, detections

In [15]:
# object_categories = ["Barking", "Howling", "Crying", "COSmoke","GlassBreaking","Other"]
object_categories = ["Barking", "Howling", "Crying", "COSmoke","GlassBreaking","Other", 
                     "Doorbell", 'Bird', 'Music_Instrument', 'Laugh_Shout_Scream']

#### Sample Data
Let's take a look how the audio classification model looks like using some audio clips on our hands. The predicted class and the prediction probability is presented.

In [16]:
# !mkdir audios 
!cp ../01-byoc/train_data/train_00001.wav audios 
!cp ../01-byoc/train_data/train_00010.wav audios 
!cp ../01-byoc/train_data/train_00021.wav audios 

In [17]:
test_audios = ['audios/train_00001.wav', # motorcycle
               'audios/train_00010.wav', # bicycle
               'audios/train_00021.wav'] # sofa

In [18]:
import IPython.display as ipd
ipd.Audio(test_audios[0], autoplay=True)

In [19]:
for audio in test_audios: 
    results, detections = load_and_predict(audio)
    print(detections) 

{"label": 0, "probability": [0.9903320670127869, 0.0001989333686651662, 0.0007428666576743126, 0.00016775673429947346, 3.222770828870125e-05, 3.8894515455467626e-05, 0.0003140326589345932, 0.0003089535457547754, 3.332734195282683e-05, 0.007830953225493431]}
{'label': 0, 'probability': [0.9903320670127869, 0.0001989333686651662, 0.0007428666576743126, 0.00016775673429947346, 3.222770828870125e-05, 3.8894515455467626e-05, 0.0003140326589345932, 0.0003089535457547754, 3.332734195282683e-05, 0.007830953225493431]}
{"label": 0, "probability": [0.8528938293457031, 0.0019418800948187709, 0.0011375222820788622, 0.0007829791866242886, 0.013402803801000118, 0.0073211463168263435, 0.03736455738544464, 0.00028037233278155327, 0.0017033849144354463, 0.083171546459198]}
{'label': 0, 'probability': [0.8528938293457031, 0.0019418800948187709, 0.0011375222820788622, 0.0007829791866242886, 0.013402803801000118, 0.0073211463168263435, 0.03736455738544464, 0.00028037233278155327, 0.0017033849144354463, 0.

Probability of 0.465 is considered quite low in modern computer vision and there is a mislabeling. This is due to the fact that the SSD model was under-trained for demonstration purposes in the [training notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb). However this under-trained model serves as a perfect example of brining human reviewers when a model is unable to make a high confidence prediction.

### Creating human review Workteam or Workforce

A workforce is the group of workers that you have selected to label your dataset. You can choose either the Amazon Mechanical Turk workforce, a vendor-managed workforce, or you can create your own private workforce for human reviews. Whichever workforce type you choose, Amazon Augmented AI takes care of sending tasks to workers. 

When you use a private workforce, you also create work teams, a group of workers from your workforce that are assigned to Amazon Augmented AI human review tasks. You can have multiple work teams and can assign one or more work teams to each job.

To create your Workteam, visit the instructions here: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management.html

After you have created your workteam, replace YOUR_WORKTEAM_ARN below

In [41]:
my_session = boto3.session.Session()
my_region = my_session.region_name
client = boto3.client("sts")
account_id = client.get_caller_identity()["Account"]

# WORKTEAM_ARN = "arn:aws:sagemaker:{}:{}:workteam/private-crowd/seal-squad".format(my_region, account_id)
WORKTEAM_ARN = "arn:aws:sagemaker:{}:{}:workteam/private-crowd/fish-squad".format(my_region, account_id)

WORKTEAM_ARN

'arn:aws:sagemaker:us-west-2:355444812467:workteam/private-crowd/fish-squad'

Visit: https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-permissions-security.html to add the necessary permissions to your role

## Client Setup

Here we are going to setup the rest of our clients. 

In [42]:
import io
import uuid
import time 

timestamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
# Amazon SageMaker client
sagemaker_client = boto3.client('sagemaker', region)
s3_client = boto3.client('s3')

# Amazon Augment AI (A2I) client
a2i = boto3.client('sagemaker-a2i-runtime')

# Amazon S3 client 
s3 = boto3.client('s3', region)

# Flow definition name - this value is unique per account and region. You can also provide your own value here.
flowDefinitionName = 'fd-sagemaker-audio-classification-demo-' + timestamp

# Task UI name - this value is unique per account and region. You can also provide your own value here.
taskUIName = 'ui-sagemaker-audio-classification-demo-' + timestamp

## Create Control Plane Resources

### Create Human Task UI

Create a human task UI resource, giving a UI template in liquid html. This template will be rendered to the human workers whenever human loop is required.

For over 70 pre built UIs, check: https://github.com/aws-samples/amazon-a2i-sample-task-uis.

We will be taking an [audio classification UI](https://github.com/aws-samples/amazon-sagemaker-ground-truth-task-uis/blob/master/audio/audio-classification.liquid.html) and filling in the object categories in the `labels` variable in the template.

In [22]:
# task.input.taskObject

template = r"""
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<crowd-form>
    <crowd-classifier
      name="sentiment"
      categories="['Barking', 'Howling', 'Crying', 'COSmoke','GlassBreaking','Other','Doorbell', 'Bird', 'Music_Instrument', 'Laugh_Shout_Scream']"
      header="What class does this audio represent?"
    >
      <classification-target>
          <audio controls>
              <source src="{{ task.input.taskObject | grant_read_access }}" type="audio/wav">
              Your browser does not support the audio element.
          </audio>
      </classification-target>
      
      <full-instructions header="Audio Classification Analysis Instructions">
        <p><strong>Barking</strong>Barking </p>
        <p><strong>Howling</strong>Howling</p>
        <p><strong>Crying</strong>Crying</p>
        <p><strong>COSmoke</strong>COSmoke</p>
        <p><strong>GlassBreaking</strong>GlassBreaking</p>
        <p><strong>Other</strong>Other</p>
        <p><strong>Other</strong>Doorbell</p>
        <p><strong>Other</strong>Bird</p>
        <p><strong>Other</strong>Music_Instrument</p>
        <p><strong>Other</strong>Laugh_Shout_Scream</p>
      </full-instructions>

      <short-instructions>
        <p>Choose the primary sentiment that is expressed by the audio.</p>
      </short-instructions>
    </crowd-classifier>
</crowd-form>
"""

def create_task_ui():
    '''
    Creates a Human Task UI resource.

    Returns:
    struct: HumanTaskUiArn
    '''
    response = sagemaker_client.create_human_task_ui(
        HumanTaskUiName=taskUIName,
        UiTemplate={'Content': template})
    return response

In [43]:
# Create task UI
humanTaskUiResponse = create_task_ui()
humanTaskUiArn = humanTaskUiResponse['HumanTaskUiArn']
print(humanTaskUiArn)

arn:aws:sagemaker:us-west-2:355444812467:human-task-ui/ui-sagemaker-audio-classification-demo-2021-07-17-03-32-38


### Create the Flow Definition

In this section, we're going to create a flow definition definition. Flow Definitions allow us to specify:

* The workforce that your tasks will be sent to.
* The instructions that your workforce will receive. This is called a worker task template.
* The configuration of your worker tasks, including the number of workers that receive a task and time limits to complete tasks.
* Where your output data will be stored.

This demo is going to use the API, but you can optionally create this workflow definition in the console as well. 

For more details and instructions, see: https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-create-flow-definition.html.

In [44]:
create_workflow_definition_response = sagemaker_client.create_flow_definition(
        FlowDefinitionName= flowDefinitionName,
        RoleArn= role,
        HumanLoopConfig= {
            "WorkteamArn": WORKTEAM_ARN,
            "HumanTaskUiArn": humanTaskUiArn,
            "TaskCount": 1,
            "TaskDescription": "Classify the audio category.",
            "TaskTitle": "Audio Classification"
        },
        OutputConfig={
            "S3OutputPath" : OUTPUT_PATH
        }
    )
flowDefinitionArn = create_workflow_definition_response['FlowDefinitionArn'] # let's save this ARN for future use

In [45]:
# Describe flow definition - status should be active
for x in range(60):
    describeFlowDefinitionResponse = sagemaker_client.describe_flow_definition(FlowDefinitionName=flowDefinitionName)
    print(describeFlowDefinitionResponse['FlowDefinitionStatus'])
    if (describeFlowDefinitionResponse['FlowDefinitionStatus'] == 'Active'):
        print("Flow Definition is active")
        break
    time.sleep(2)

Initializing
Active
Flow Definition is active


### Create SQS queue and pass a2i task status change event to the queue
<a id="sqs_a2i"></a>

In [28]:
sqs = boto3.resource('sqs')
queue_name = 'a2itasks'
queue_arn = "arn:aws:sqs:{}:{}:{}".format(region, account_id, queue_name)

policy = '''{
            "Version": "2012-10-17",
            "Id": "MyQueuePolicy",
            "Statement": [{                     
                    "Effect": "Allow",
                    "Principal": {
                            "Service": ["events.amazonaws.com",
                            "sqs.amazonaws.com"]
                    },
                    "Action": "sqs:SendMessage"
            }]}'''
policy_obj = json.loads(policy)
policy_obj['Statement'][0]['Resource'] = queue_arn
policy = json.dumps(policy_obj)

# queue = sqs.create_queue(QueueName=queue_name, Attributes={'DelaySeconds': '0',
#                                                                 'Policy': policy})
queue = sqs.get_queue_by_name(QueueName=queue_name)
print(queue.url)


https://us-west-2.queue.amazonaws.com/355444812467/a2itasks


In [29]:
sqs_client = boto3.client('sqs')

sqs_client.add_permission(
    QueueUrl=queue.url,
    Label="a2i",
    AWSAccountIds=[
        account_id,
    ],
    Actions=[
        'SendMessage',
    ]
)

{'ResponseMetadata': {'RequestId': '415f78a2-2d9d-5e85-8bf7-21ce7ca9d0ad',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '415f78a2-2d9d-5e85-8bf7-21ce7ca9d0ad',
   'date': 'Sat, 17 Jul 2021 03:31:00 GMT',
   'content-type': 'text/xml',
   'content-length': '215'},
  'RetryAttempts': 0}}

In [31]:
iam = boto3.client("iam")

role_name = "AmazonSageMaker-SageMakerExecutionRole"
assume_role_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": ["sagemaker.amazonaws.com", "events.amazonaws.com"]
          },
          "Action": "sts:AssumeRole"
        }
    ]
}

# create_role_response = iam.create_role(
#     RoleName = role_name,
#     AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
# )


# Now add S3 support
iam.attach_role_policy(
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
    RoleName=role_name
)
time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate

# sm_role_arn = create_role_response["Role"]["Arn"]

sm_role_arn = 'arn:aws:iam::355444812467:role/AmazonSageMaker-SageMakerExecutionRole'
print(sm_role_arn)

arn:aws:iam::355444812467:role/AmazonSageMaker-SageMakerExecutionRole


In [33]:
%%bash  -s "$sm_role_arn" "$my_region" 
aws events put-rule --name "A2IHumanLoopStatusChanges" \
    --event-pattern "{\"source\":[\"aws.sagemaker\"],\"detail-type\":[\"SageMaker A2I HumanLoop Status Change\"]}" \
    --role-arn "$1" \
    --region $2 

{
    "RuleArn": "arn:aws:events:us-west-2:355444812467:rule/A2IHumanLoopStatusChanges"
}


In [35]:
!sed "s/<account_id>/$account_id/g" targets-template.json > targets-tmp.json 
!sed "s/<region>/$my_region/g" targets-tmp.json  > targets.json 

In [37]:
!aws events put-targets --rule A2IHumanLoopStatusChanges \
--targets file://$PWD/targets.json

{
    "FailedEntryCount": 0,
    "FailedEntries": []
}


#### Have newly created SQS queue as a target of the rule we just defined 

## Starting Human Loops

Now that we have setup our Flow Definition, we are ready to call our object detection endpoint on SageMaker and start our human loops. In this tutorial, we are interested in starting a HumanLoop only if the highest prediction probability score returned by our model for objects detected is less than 50%. 

So, with a bit of logic, we can check the response for each call to the SageMaker endpoint using `load_and_predict` helper function, and if the highest score is less than 50%, we will kick off a HumanLoop to engage our workforce for a human review. 

In [46]:
# Get the sample images to s3 bucket for a2i UI to display
!aws s3 sync ./audios/ s3://{BUCKET}/audios/
    
human_loops_started = []
SCORE_THRESHOLD = 1.0
import json
for fname in test_audios:
    # Call SageMaker endpoint and not display any object detected with probability lower than 0.4.

    # Sort by prediction score so that the first item has the highest probability
    result, detections = load_and_predict(audio)
    max_p = max(detections['probability']) 

    # Our condition for triggering a human review
    if max_p < SCORE_THRESHOLD:
        s3_fname='s3://%s/%s' % (BUCKET, fname)
        print(s3_fname)
        humanLoopName = str(uuid.uuid4())
        inputContent = {
            "initialValue": max_p,
            "taskObject": s3_fname # the s3 object will be passed to the worker task UI to render
        }
        # start an a2i human review loop with an input
        start_loop_response = a2i.start_human_loop(
            HumanLoopName=humanLoopName,
            FlowDefinitionArn=flowDefinitionArn,
            HumanLoopInput={
                "InputContent": json.dumps(inputContent)
            }
        )
        print(start_loop_response)
        human_loops_started.append(humanLoopName)
        print(f'Object detection Confidence Score of %s is less than the threshold of %.2f' % (max_p, SCORE_THRESHOLD))
        print(f'Starting human loop with name: {humanLoopName}  \n')
    else:
        print(f'Object detection Confidence Score of %s is above than the threshold of %.2f' % (max_p, SCORE_THRESHOLD))
        print('No human loop created. \n')

upload: audios/train_00010.wav to s3://sagemaker-us-west-2-355444812467/audios/train_00010.wav
upload: audios/train_00021.wav to s3://sagemaker-us-west-2-355444812467/audios/train_00021.wav
upload: audios/train_00001.wav to s3://sagemaker-us-west-2-355444812467/audios/train_00001.wav
{"label": 0, "probability": [0.999336838722229, 1.1203339454368688e-05, 4.069593342137523e-05, 1.0619601198413875e-05, 0.000325192348100245, 8.879670531314332e-06, 1.770690687408205e-05, 2.0911184037686326e-05, 4.967172117176233e-06, 0.0002231001853942871]}
s3://sagemaker-us-west-2-355444812467/audios/train_00001.wav
{'ResponseMetadata': {'RequestId': 'fdb945a1-0244-49cc-9067-08bcf03e15b6', 'HTTPStatusCode': 201, 'HTTPHeaders': {'date': 'Sat, 17 Jul 2021 03:33:01 GMT', 'content-type': 'application/json; charset=UTF-8', 'content-length': '248', 'connection': 'keep-alive', 'x-amzn-requestid': 'fdb945a1-0244-49cc-9067-08bcf03e15b6', 'access-control-allow-origin': '*', 'x-amz-apigw-id': 'CmFZEF1rvHcFyvQ=', 'x-

### Check Status of Human Loop

In [47]:
completed_human_loops = []
for human_loop_name in human_loops_started:
    resp = a2i.describe_human_loop(HumanLoopName=human_loop_name)
    print(resp) 
    print(f'HumanLoop Name: {human_loop_name}')
    print(f'HumanLoop Status: {resp["HumanLoopStatus"]}')
    print(f'HumanLoop Output Destination: {resp["HumanLoopOutput"]}')
    print('\n')
    
    if resp["HumanLoopStatus"] == "Completed":
        completed_human_loops.append(resp)

{'ResponseMetadata': {'RequestId': 'd85816aa-f812-4321-ab1b-5d28ca2746ca', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sat, 17 Jul 2021 03:33:34 GMT', 'content-type': 'application/json; charset=UTF-8', 'content-length': '819', 'connection': 'keep-alive', 'x-amzn-requestid': 'd85816aa-f812-4321-ab1b-5d28ca2746ca', 'access-control-allow-origin': '*', 'x-amz-apigw-id': 'CmFePETzPHcF-9A=', 'x-amzn-trace-id': 'Root=1-60f24f8e-24b06e004d2681ce493ef411'}, 'RetryAttempts': 0}, 'CreationTime': datetime.datetime(2021, 7, 17, 3, 33, 1, 185000, tzinfo=tzlocal()), 'HumanLoopStatus': 'Completed', 'HumanLoopName': '761d3373-e905-4143-ab74-32d6761652e2', 'HumanLoopArn': 'arn:aws:sagemaker:us-west-2:355444812467:human-loop/761d3373-e905-4143-ab74-32d6761652e2', 'FlowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-03-32-38', 'HumanLoopOutput': {'OutputS3Uri': 's3://sagemaker-us-west-2-355444812467/a2i-results/fd-sagemaker-au

### Wait For Workers to Complete Task
Since we are using private workteam, we should go to the labling UI to perform the inspection ourselves.

In [48]:
workteamName = WORKTEAM_ARN[WORKTEAM_ARN.rfind('/') + 1:]
print("Navigate to the private worker portal and do the tasks. Make sure you've invited yourself to your workteam!")
print('https://' + sagemaker_client.describe_workteam(WorkteamName=workteamName)['Workteam']['SubDomain'])

Navigate to the private worker portal and do the tasks. Make sure you've invited yourself to your workteam!
https://80w77ljpao.labeling.us-west-2.sagemaker.aws


In [49]:
completed_human_loops = []
for human_loop_name in human_loops_started:
    resp = a2i.describe_human_loop(HumanLoopName=human_loop_name)
    print(resp) 
    print(f'HumanLoop Name: {human_loop_name}')
    print(f'HumanLoop Status: {resp["HumanLoopStatus"]}')
    print(f'HumanLoop Output Destination: {resp["HumanLoopOutput"]}')
    print('\n')
    
    if resp["HumanLoopStatus"] == "Completed":
        completed_human_loops.append(resp)

{'ResponseMetadata': {'RequestId': 'b9d62bd2-d2d3-419b-ad02-2ff96c7448d1', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sat, 17 Jul 2021 03:33:38 GMT', 'content-type': 'application/json; charset=UTF-8', 'content-length': '819', 'connection': 'keep-alive', 'x-amzn-requestid': 'b9d62bd2-d2d3-419b-ad02-2ff96c7448d1', 'access-control-allow-origin': '*', 'x-amz-apigw-id': 'CmFe8ElJvHcF1rA=', 'x-amzn-trace-id': 'Root=1-60f24f92-3254402831e06a5432b71405'}, 'RetryAttempts': 0}, 'CreationTime': datetime.datetime(2021, 7, 17, 3, 33, 1, 185000, tzinfo=tzlocal()), 'HumanLoopStatus': 'Completed', 'HumanLoopName': '761d3373-e905-4143-ab74-32d6761652e2', 'HumanLoopArn': 'arn:aws:sagemaker:us-west-2:355444812467:human-loop/761d3373-e905-4143-ab74-32d6761652e2', 'FlowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-03-32-38', 'HumanLoopOutput': {'OutputS3Uri': 's3://sagemaker-us-west-2-355444812467/a2i-results/fd-sagemaker-au

### Collect data from a2i to build the training data for the next round 

In [50]:
queue.url

'https://us-west-2.queue.amazonaws.com/355444812467/a2itasks'

In [106]:
sqs = boto3.client('sqs')
completed_human_loops = []
while True: 
    response = sqs.receive_message(
        QueueUrl=queue.url,

        MaxNumberOfMessages=10,
        MessageAttributeNames=[
            'All'
        ],
        VisibilityTimeout=10,
        WaitTimeSeconds=0
    )
    if 'Messages' not in response: 
        break 
    messages = response['Messages']

    for m in messages: 
        task = json.loads(m['Body'])['detail']
        name = task['humanLoopName']
        output_s3 = task['humanLoopOutput']['outputS3Uri']
        completed_human_loops.append((name, output_s3))
        receipt_handle = m['ReceiptHandle']

        # Delete received message from queue
        sqs.delete_message(
            QueueUrl=queue.url,
            ReceiptHandle=receipt_handle
        )
    
print(completed_human_loops)

[('bb4f0a36-20d7-4003-b1af-a00239af85a4', 's3://sagemaker-us-west-2-355444812467/a2i-results/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03/2021/07/17/03/40/37/bb4f0a36-20d7-4003-b1af-a00239af85a4/output.json'), ('086bef32-0281-4e1f-bca7-54ae3e0ceb8e', 's3://sagemaker-us-west-2-355444812467/a2i-results/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03/2021/07/17/03/18/41/086bef32-0281-4e1f-bca7-54ae3e0ceb8e/output.json'), ('6c656b22-b15f-4bce-8bcf-766087042042', 's3://sagemaker-us-west-2-355444812467/a2i-results/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03/2021/07/17/03/47/13/6c656b22-b15f-4bce-8bcf-766087042042/output.json'), ('88697bd6-cf8e-4584-9cc8-b0c81473232c', 's3://sagemaker-us-west-2-355444812467/a2i-results/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03/2021/07/17/03/00/32/88697bd6-cf8e-4584-9cc8-b0c81473232c/output.json'), ('ed608df8-9f26-4e42-ae69-c9eddcad84c3', 's3://sagemaker-us-west-2-355444812467/a2i-results/fd-sagemaker-au

### View Task Results  

Once work is completed, Amazon A2I stores results in your S3 bucket and sends a Cloudwatch event. Your results should be available in the S3 OUTPUT_PATH when all work is completed. Note that the human answer, the label and the bounding box, is returned and saved in the json file.

In [107]:
import re
import pprint

pp = pprint.PrettyPrinter(indent=4)

for name, s3_output_path in completed_human_loops:
    splitted_string = re.split('s3://' +  BUCKET + '/',s3_output_path)
    output_bucket_key = splitted_string[1]

    response = s3.get_object(Bucket=BUCKET, Key=output_bucket_key)
    content = response["Body"].read()
    json_output = json.loads(content)
    pp.pprint(json_output)
    print('\n')

{   'flowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03',
    'humanAnswers': [   {   'acceptanceTime': '2021-07-17T03:47:58.469Z',
                            'answerContent': {'sentiment': {'label': 'Other'}},
                            'submissionTime': '2021-07-17T03:48:04.426Z',
                            'timeSpentInSeconds': 5.957,
                            'workerId': 'ec24e95d77c7c040',
                            'workerMetadata': {   'identityData': {   'identityProviderType': 'Cognito',
                                                                      'issuer': 'https://cognito-idp.us-west-2.amazonaws.com/us-west-2_9iI2T8Z22',
                                                                      'sub': 'c138e2f0-d140-4ed8-8c98-16ed048e51ed'}}}],
    'humanLoopName': 'bb4f0a36-20d7-4003-b1af-a00239af85a4',
    'inputContent': {   'initialValue': 0.9991849064826965,
                 

{   'flowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03',
    'humanAnswers': [   {   'acceptanceTime': '2021-07-17T03:50:18.478Z',
                            'answerContent': {'sentiment': {'label': 'Crying'}},
                            'submissionTime': '2021-07-17T03:50:23.573Z',
                            'timeSpentInSeconds': 5.095,
                            'workerId': 'ec24e95d77c7c040',
                            'workerMetadata': {   'identityData': {   'identityProviderType': 'Cognito',
                                                                      'issuer': 'https://cognito-idp.us-west-2.amazonaws.com/us-west-2_9iI2T8Z22',
                                                                      'sub': 'c138e2f0-d140-4ed8-8c98-16ed048e51ed'}}}],
    'humanLoopName': '902ff03a-65a1-4f73-8094-c0e4dff13bb1',
    'inputContent': {   'initialValue': 0.9900847673416138,
                

{   'flowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03',
    'humanAnswers': [   {   'acceptanceTime': '2021-07-17T03:49:56.362Z',
                            'answerContent': {   'sentiment': {   'label': 'Barking'}},
                            'submissionTime': '2021-07-17T03:50:01.746Z',
                            'timeSpentInSeconds': 5.384,
                            'workerId': 'ec24e95d77c7c040',
                            'workerMetadata': {   'identityData': {   'identityProviderType': 'Cognito',
                                                                      'issuer': 'https://cognito-idp.us-west-2.amazonaws.com/us-west-2_9iI2T8Z22',
                                                                      'sub': 'c138e2f0-d140-4ed8-8c98-16ed048e51ed'}}}],
    'humanLoopName': '426b7113-8d9b-45e4-b91f-03474e62db59',
    'inputContent': {   'initialValue': 0.7294201254844666,
         

{   'flowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03',
    'humanAnswers': [   {   'acceptanceTime': '2021-07-17T03:49:37.396Z',
                            'answerContent': {'sentiment': {'label': 'Other'}},
                            'submissionTime': '2021-07-17T03:49:48.804Z',
                            'timeSpentInSeconds': 11.408,
                            'workerId': 'ec24e95d77c7c040',
                            'workerMetadata': {   'identityData': {   'identityProviderType': 'Cognito',
                                                                      'issuer': 'https://cognito-idp.us-west-2.amazonaws.com/us-west-2_9iI2T8Z22',
                                                                      'sub': 'c138e2f0-d140-4ed8-8c98-16ed048e51ed'}}}],
    'humanLoopName': '15eb082d-14d8-4a36-afab-9d5a9ead2062',
    'inputContent': {   'initialValue': 0.3480794131755829,
                

## Incremental training with SageMaker
Now that we have used the model to generate prediction on some random out-of-sample images and got unsatisfactory prediction (low probability). We also demonstrated how to use Amazon Augmented AI to review and label the image based on custom criteria. Next step in a typical machine learning life cycle is to include these cases with which the model has trouble in the next batch of training data for retraining purposes so that the model can now learn from a set of new training data to improve the model. In machine learning we call it [incremental training](https://docs.aws.amazon.com/sagemaker/latest/dg/incremental-training.html).

Now we can obtain the result of a2i tasks and formulated the information into the format of our training data - 
* the meta data in csv file format
```
Filename,Label,Remark
train_00021,1,Howling
```
* and associating audio files on s3 

In [108]:
object_categories_dict = {j: i for i, j in enumerate(object_categories)}

def convert_a2i_to_augmented_manifest(a2i_output):
    label = a2i_output['humanAnswers'][0]['answerContent']['sentiment']['label']
    s3_path = a2i_output['inputContent']['taskObject']
    filename = s3_path.split('/')[-1][:-4]
    label_id = str(object_categories_dict[label]) 
    return '{},{},{}'.format(filename, label_id, label), s3_path



In [109]:
object_categories_dict

{'Barking': 0,
 'Howling': 1,
 'Crying': 2,
 'COSmoke': 3,
 'GlassBreaking': 4,
 'Other': 5,
 'Doorbell': 6,
 'Bird': 7,
 'Music_Instrument': 8,
 'Laugh_Shout_Scream': 9}

This function will take an A2I output json and result in a json object that is compatible to how Amazon SageMaker Ground Truth outputs the result and how SageMaker built-in object detection algorithm expects from the input. In order to create a cohort of training images from all the images re-labeled by human reviewers in A2I console. You can loop through all the A2I output, convert the json file, and concatenate them into a JSON Lines file, with each line represents results of one image. 

In [110]:
s3_paths=[]
with open('augmented.manifest', 'w') as outfile:
    outfile.write("Filename,Label,Remark\n")
    # convert the a2i json to augmented manifest for each human loop output
    for name, s3_output_path in completed_human_loops:
        splitted_string = re.split('s3://' +  BUCKET + '/', s3_output_path)
        output_bucket_key = splitted_string[1]

        response = s3.get_object(Bucket=BUCKET, Key=output_bucket_key)
        content = response["Body"].read()
        json_output = json.loads(content)
        print(json_output)
        # convert using the function
        augmented_manifest, s3_path = convert_a2i_to_augmented_manifest(json_output)
        s3_paths.append(s3_path)
        outfile.write(augmented_manifest)
        outfile.write('\n')


{'flowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03', 'humanAnswers': [{'acceptanceTime': '2021-07-17T03:47:58.469Z', 'answerContent': {'sentiment': {'label': 'Other'}}, 'submissionTime': '2021-07-17T03:48:04.426Z', 'timeSpentInSeconds': 5.957, 'workerId': 'ec24e95d77c7c040', 'workerMetadata': {'identityData': {'identityProviderType': 'Cognito', 'issuer': 'https://cognito-idp.us-west-2.amazonaws.com/us-west-2_9iI2T8Z22', 'sub': 'c138e2f0-d140-4ed8-8c98-16ed048e51ed'}}}], 'humanLoopName': 'bb4f0a36-20d7-4003-b1af-a00239af85a4', 'inputContent': {'initialValue': 0.9991849064826965, 'taskObject': 's3://sagemaker-us-west-2-355444812467/a2i-demo/bb4f0a36-20d7-4003-b1af-a00239af85a4.wav'}}
{'flowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03', 'humanAnswers': [{'acceptanceTime': '2021-07-17T03:50:11.780Z', 'answerContent':

{'flowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03', 'humanAnswers': [{'acceptanceTime': '2021-07-17T03:56:32.847Z', 'answerContent': {'sentiment': {'label': 'COSmoke'}}, 'submissionTime': '2021-07-17T03:56:40.157Z', 'timeSpentInSeconds': 7.31, 'workerId': 'ec24e95d77c7c040', 'workerMetadata': {'identityData': {'identityProviderType': 'Cognito', 'issuer': 'https://cognito-idp.us-west-2.amazonaws.com/us-west-2_9iI2T8Z22', 'sub': 'c138e2f0-d140-4ed8-8c98-16ed048e51ed'}}}], 'humanLoopName': 'ca225ed6-1fe6-421c-b7a1-9eeaa3b243bc', 'inputContent': {'initialValue': 0.9903430938720703, 'taskObject': 's3://sagemaker-us-west-2-355444812467/a2i-demo/ca225ed6-1fe6-421c-b7a1-9eeaa3b243bc.wav'}}
{'flowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03', 'humanAnswers': [{'acceptanceTime': '2021-07-17T03:56:49.311Z', 'answerContent'

{'flowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03', 'humanAnswers': [{'acceptanceTime': '2021-07-17T03:45:16.732Z', 'answerContent': {'sentiment': {'label': 'Barking'}}, 'submissionTime': '2021-07-17T03:47:58.379Z', 'timeSpentInSeconds': 161.647, 'workerId': 'ec24e95d77c7c040', 'workerMetadata': {'identityData': {'identityProviderType': 'Cognito', 'issuer': 'https://cognito-idp.us-west-2.amazonaws.com/us-west-2_9iI2T8Z22', 'sub': 'c138e2f0-d140-4ed8-8c98-16ed048e51ed'}}}], 'humanLoopName': 'e905fe12-60fb-46c6-bb30-037bf6ac59d1', 'inputContent': {'initialValue': 0.5870192646980286, 'taskObject': 's3://sagemaker-us-west-2-355444812467/a2i-demo/e905fe12-60fb-46c6-bb30-037bf6ac59d1.wav'}}
{'flowDefinitionArn': 'arn:aws:sagemaker:us-west-2:355444812467:flow-definition/fd-sagemaker-audio-classification-demo-2021-07-17-02-03-03', 'humanAnswers': [{'acceptanceTime': '2021-07-17T03:48:13.289Z', 'answerConte

In [111]:
# take a look at how Json Lines looks like
!head -n100 augmented.manifest

Filename,Label,Remark
bb4f0a36-20d7-4003-b1af-a00239af85a4,5,Other
086bef32-0281-4e1f-bca7-54ae3e0ceb8e,0,Barking
6c656b22-b15f-4bce-8bcf-766087042042,0,Barking
88697bd6-cf8e-4584-9cc8-b0c81473232c,6,Doorbell
ed608df8-9f26-4e42-ae69-c9eddcad84c3,9,Laugh_Shout_Scream
92f1b24c-8b3f-417c-b9b4-a865d44fc0d3,8,Music_Instrument
6796bd83-908c-46e1-b7b3-b14b82e13014,5,Other
e843396e-f7d2-4d2b-8967-9765e26c421d,0,Barking
54e1c8b7-0166-4c71-9c7f-81d937168ca5,0,Barking
dbfa0a1a-7311-4646-aef3-9bd11215780f,3,COSmoke
902ff03a-65a1-4f73-8094-c0e4dff13bb1,2,Crying
b61704b7-2d18-4fe9-90a1-6ed6480b0d15,2,Crying
51b07128-0c80-4ed1-9210-ad2ed7edf5b3,5,Other
ceb20ef1-f03b-4033-a3d9-b4a113f83dc7,4,GlassBreaking
12c51f7c-244c-406d-b61f-7ae89fdaec21,9,Laugh_Shout_Scream
ca225ed6-1fe6-421c-b7a1-9eeaa3b243bc,3,COSmoke
6fbb7391-0444-4400-abc5-cb396ba5d70a,3,COSmoke
1a39a8a1-354a-4bf5-a018-e27e7b10e17c,5,Other
4a90438d-ae32-42ae-ab1a-9c0142d80a56,5,Other
3a71ffbc-01b1-43c1-a9e2-771e6fab8f16,5,

In [112]:
# upload the manifest file to S3
import time;
ts = time.time()


train_path = f"{TRAIN_PATH}/{ts}/competition"

In [113]:
train_path

's3://sagemaker-us-west-2-355444812467/tomofun/1626494294.0506582/competition'

In [114]:
train_path

's3://sagemaker-us-west-2-355444812467/tomofun/1626494294.0506582/competition'

In [115]:
s3_paths

['s3://sagemaker-us-west-2-355444812467/a2i-demo/bb4f0a36-20d7-4003-b1af-a00239af85a4.wav',
 's3://sagemaker-us-west-2-355444812467/a2i-demo/086bef32-0281-4e1f-bca7-54ae3e0ceb8e.wav',
 's3://sagemaker-us-west-2-355444812467/a2i-demo/6c656b22-b15f-4bce-8bcf-766087042042.wav',
 's3://sagemaker-us-west-2-355444812467/a2i-demo/88697bd6-cf8e-4584-9cc8-b0c81473232c.wav',
 's3://sagemaker-us-west-2-355444812467/a2i-demo/ed608df8-9f26-4e42-ae69-c9eddcad84c3.wav',
 's3://sagemaker-us-west-2-355444812467/a2i-demo/92f1b24c-8b3f-417c-b9b4-a865d44fc0d3.wav',
 's3://sagemaker-us-west-2-355444812467/a2i-demo/6796bd83-908c-46e1-b7b3-b14b82e13014.wav',
 's3://sagemaker-us-west-2-355444812467/a2i-demo/e843396e-f7d2-4d2b-8967-9765e26c421d.wav',
 's3://sagemaker-us-west-2-355444812467/a2i-demo/54e1c8b7-0166-4c71-9c7f-81d937168ca5.wav',
 's3://sagemaker-us-west-2-355444812467/a2i-demo/dbfa0a1a-7311-4646-aef3-9bd11215780f.wav',
 's3://sagemaker-us-west-2-355444812467/a2i-demo/902ff03a-65a1-4f73-8094-c0e4dff

In [116]:
# !aws s3 cp augmented.manifest {train_path}/meta_train.csv
for s3_path in s3_paths: 
    filename = s3_path.split('/')[-1]
    !aws s3 cp {s3_path} {train_path}/train/{filename} 
!aws s3 cp augmented.manifest {train_path}/meta_train.csv
for s3_path in s3_paths: 
    filename = s3_path.split('/')[-1]
    !aws s3 cp {s3_path} {train_path}/train/{filename} 

upload: ./augmented.manifest to s3://sagemaker-us-west-2-355444812467/tomofun/1626494294.0506582/competition/meta_train.csv
copy: s3://sagemaker-us-west-2-355444812467/a2i-demo/bb4f0a36-20d7-4003-b1af-a00239af85a4.wav to s3://sagemaker-us-west-2-355444812467/tomofun/1626494294.0506582/competition/train/bb4f0a36-20d7-4003-b1af-a00239af85a4.wav
copy: s3://sagemaker-us-west-2-355444812467/a2i-demo/086bef32-0281-4e1f-bca7-54ae3e0ceb8e.wav to s3://sagemaker-us-west-2-355444812467/tomofun/1626494294.0506582/competition/train/086bef32-0281-4e1f-bca7-54ae3e0ceb8e.wav
copy: s3://sagemaker-us-west-2-355444812467/a2i-demo/6c656b22-b15f-4bce-8bcf-766087042042.wav to s3://sagemaker-us-west-2-355444812467/tomofun/1626494294.0506582/competition/train/6c656b22-b15f-4bce-8bcf-766087042042.wav
copy: s3://sagemaker-us-west-2-355444812467/a2i-demo/88697bd6-cf8e-4584-9cc8-b0c81473232c.wav to s3://sagemaker-us-west-2-355444812467/tomofun/1626494294.0506582/competition/train/88697bd6-cf8e-4584-9cc8-b0c814732

Completed 78.2 KiB/78.2 KiB (675.9 KiB/s) with 1 file(s) remainingcopy: s3://sagemaker-us-west-2-355444812467/a2i-demo/3de381f5-1720-4c1a-96df-246f4f34f937.wav to s3://sagemaker-us-west-2-355444812467/tomofun/1626494294.0506582/competition/train/3de381f5-1720-4c1a-96df-246f4f34f937.wav


## ---- 跑到這邊就好 -----

Similar to training with Ground Truth output augmented manifest file outlined in this [blog](https://aws.amazon.com/blogs/machine-learning/easily-train-models-using-datasets-labeled-by-amazon-sagemaker-ground-truth/), once we have collected enough data points, we can construct a new `Estimator` for incremental training. 

For incremental training, the choice of hyperparameters becomes critical. Since we are continue the learning and optimization from the last model, an appropriate starting `learning_rate`, for example, would again need to be determined. But as a rule of thumb, even with the introduction of new, unseen data, we should start out the incremental training with a smaller `learning_rate` and different learning rate schedule (`lr_scheduler_factor` and `lr_scheduler_step`) than that of the previous training job as the optimization has previously reached to a more stable state with reduced learning rate. We should see a similar mAP performance on the original validation dataset in the first epoch in the incremental training. 

We here will be using the hyperparameters exactly the same as how the first model was trained in the [training notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb), with the following exceptions

- smaller learning rate (`learning_rate` was 0.001, now 0.0001)
- using the weights from the trained model instead of pre-trained weights that comes with the algorithm (`use_pretrained_model=0`).

Note that the following working code snippet is meant to demonstrate how to set up the A2I output for training in SageMaker with object detection algorithm. Incremental training with merely 1 or 2 new samples and untuned hyperparameters, would not yield a meaning model, if not experiencing [catastrophic forgetting](https://en.wikipedia.org/wiki/Catastrophic_interference).

*The next cell would take about 5 minutes.*

In [154]:
%store -r model_s3_path

In [None]:
# path definition
s3_train_data = train_path
# Reusing the training data for validation here for demonstration purposes
# but in practice you should provide a set of data that you want to validate the training against
s3_validation_data = train_path 
s3_output_location = f'{OUTPUT_PATH}/incremental-training'

# num_training_samples = len(output)
num_training_samples = 3 

# Create a model object set to using "Pipe" mode because we are inputing augmented manifest files.
new_od_model = sagemaker.estimator.Estimator(image_uri, # same object detection image that we used for model hosting  
                                             role, 
                                             instance_count=1, 
                                             instance_type='ml.p3.2xlarge', 
                                             volume_size = 50, 
                                             max_run = 360000, 
                                             input_mode = 'File',
                                             output_path=s3_output_location, 
                                             sagemaker_session=sess) 

# same set of hyperparameters from the original training job
new_od_model.set_hyperparameters(batch_size = 1)

# setting the input data
train_data = sagemaker.inputs.TrainingInput(s3_train_data)
validation_data = sagemaker.inputs.TrainingInput(s3_validation_data)

# Use the output model from the original training job.  
model_data = sagemaker.inputs.TrainingInput(model_s3_path)

data_channels = {'competition': train_data, 
                 'model': model_data}
                 
new_od_model.fit(inputs=data_channels, logs=True, wait=False)

After training, you would get a new model in the `s3_output_location`, you can deploy it to a new endpoint or modify an endpoint without taking models that are already deployed into production out of service. For example, you can add new model variants, update the ML Compute instance configurations of existing model variants, or change the distribution of traffic among model variants. To modify an endpoint, you provide a new endpoint configuration. Amazon SageMaker implements the changes without any downtime. For more information, see [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpoint.html) and [UpdateEndpointWeightsAndCapacities](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateEndpointWeightsAndCapacities.html). 

In [None]:
new_od_model.model_data

In [None]:

incremented_model = sagemaker.model.Model(image_uri, 
                              model_data = new_od_model.model_data,
                              role = role,
                              predictor_cls = sagemaker.predictor.Predictor,
                              sagemaker_session = sess)

new_detector =  sagemaker.predictor.Predictor(endpoint_name = endpoint_name) 
new_detector.update_endpoint(model_name=incremented_model.name, initial_instance_count = 1,
                               instance_type = 'ml.p2.xlarge', wait=False)

### Create a Lambda function pass samples with low confidence to a2i 
<a id="lambda"></a>

In [47]:
%%bash -s "$BUCKET" 
cd invoke_endpoint_a2i 
zip -r invoke_endpoint_a2i.zip  .
aws s3 cp invoke_endpoint_a2i.zip s3://$1/lambda/

updating: lambda_function.py (deflated 58%)
Completed 1.1 KiB/1.1 KiB (13.0 KiB/s) with 1 file(s) remainingupload: ./invoke_endpoint_a2i.zip to s3://sagemaker-us-west-2-355444812467/lambda/invoke_endpoint_a2i.zip


In [48]:
%store -r lambda_role_arn

In [49]:
lambda_role_arn

'arn:aws:iam::355444812467:role/AmazonSageMaker-LambdaExecutionRole'

In [50]:
import os
cwd = os.getcwd()
!aws lambda create-function --function-name invoke_endpoint_a2i --zip-file fileb://$cwd/invoke_endpoint_a2i/invoke_endpoint_a2i.zip  --handler lambda_function.lambda_handler --runtime python3.7 --role $lambda_role_arn 


An error occurred (ResourceConflictException) when calling the CreateFunction operation: Function already exist: invoke_endpoint_a2i


#### Configure lambda function - invoke_image_object_detection 
* you can also do it by command line - 
```
aws lambda update-function-configuration --function-name invoke_image_object_detection \
    --environment "Variables={BUCKET=my-bucket,KEY=file.txt}"
```    
![configure environment variable](../03-lambda-api/content_image/setup_env_vars_for_lambda2.png)

In [1]:
bucket_key = "a2i-demo"
variables = f"A2IFLOW_DEF={flowDefinitionArn},BUCKET={BUCKET},ENDPOINT_NAME={endpoint_name},KEY={bucket_key}"
env = "Variables={"+variables+"}"

!aws lambda update-function-configuration --function-name invoke_endpoint_a2i --environment "$env"

NameError: name 'flowDefinitionArn' is not defined

In [267]:
!aws lambda add-permission \
    --function-name invoke_endpoint_a2i \
    --action lambda:InvokeFunction \
    --statement-id apigateway \
    --principal apigateway.amazonaws.com  

{
    "Statement": "{\"Sid\":\"apigateway\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"apigateway.amazonaws.com\"},\"Action\":\"lambda:InvokeFunction\",\"Resource\":\"arn:aws:lambda:us-west-2:355444812467:function:invoke_endpoint_a2i\"}"
}


### Integrate the Lambda with API Gateway 
* reference to the previous notebook 

### Advanced material - use sagemaker pipeline to manege the training / deployment process 

In [46]:
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
)


train_data = ParameterString(
    name="TrainData",
    default_value=s3_train_data,
)
validation_data = ParameterString(
    name="ValidationData",
    default_value=s3_validation_data,
)
model_data = ParameterString(
    name="ModelData",
    default_value=model_s3_path,
)
model_approval_status = ParameterString(
    name="ModelApprovalStatus",
    default_value="Approved"
)


NameError: name 's3_train_data' is not defined

In [None]:
from sagemaker.workflow.steps import TrainingStep


step_train = TrainingStep(
    name="AudioClassificationTraining",
    estimator=new_od_model,
    inputs={
        "competition": sagemaker.inputs.TrainingInput(train_data, 
                                            distribution='FullyReplicated'), 
        "validation":sagemaker.inputs.TrainingInput(validation_data, 
                                                 distribution='FullyReplicated'), 
        "model":sagemaker.inputs.TrainingInput(model_data, 
                                            distribution='FullyReplicated')
    },
)

In [None]:
import time 
from sagemaker.workflow.step_collections import CreateModelStep
model_name='audio-vgg16-'+str(int(time.time())) 

model = sagemaker.model.Model(
    name=model_name,
    image_uri=step_train.properties.AlgorithmSpecification.TrainingImage,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=sess,
    role=role
)

inputs = sagemaker.inputs.CreateModelInput(
    instance_type="ml.m4.xlarge"
)

create_model_step = CreateModelStep(
    name="ModelPreDeployment",
    model=model,
    inputs=inputs
)


In [None]:
from sagemaker.workflow.step_collections import RegisterModel
model_package_group_name = f"AudioClassificationGroupModel" 
step_register = RegisterModel(
    name="AudioClassificationModel",
    estimator=new_od_model,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["application/octet-stream"],
    response_types=["application/json"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
)

In [None]:
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep



deploy_model_processor = SKLearnProcessor(
    framework_version='0.23-1',
    role=role,
    instance_type="ml.m5.large",
    instance_count=1,
    sagemaker_session=sess)

deploy_step = ProcessingStep(
    name='DeployModel',
    processor=deploy_model_processor,
    job_arguments=[
        "--model-name", create_model_step.properties.ModelName,
        "--endpoint-name", endpoint_name, 
        "--region", region],
    code="./deploy_model.py")

In [None]:
endpoint_name

In [None]:
pipeline_name="AudioClassification"
from sagemaker.workflow.pipeline import Pipeline
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        train_data, validation_data, model_data, model_approval_status 
    ],
    steps=[ step_train, step_register, create_model_step, deploy_step],
)

In [None]:
json.loads(pipeline.definition())


In [None]:
pipeline.upsert(role_arn=role)

In [None]:
execution = pipeline.start()

### More on incremental training
It is recommended to perform a search over the hyperparameter space for your incremental training with [hyperparameter tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html) for an optimal set of hyperparameters, especially the ones related to learning rate: `learning_rate`, `lr_scheduler_factor` and `lr_scheduler_step` from the SageMaker object detection algorithm. We have an [example](https://github.com/aws/amazon-sagemaker-examples/blob/master/hyperparameter_tuning/image_classification_early_stopping/hpo_image_classification_early_stopping.ipynb) of running a hyperparameter tuning job using Amazon SageMaker Automatic Model Tuning feature. Please try it out!

## The End, but....!
This is the end of the example. Remember to execute the next cell to delete the endpoint otherwise it will continue to incur charges.

In [None]:
%store flowDefinitionArn 
%store endpoint_name
%store model_package_group_name 
%store pipeline_name
%store role
%store lambda_role_arn
#object_detector.delete_endpoint()