# Chapter 2: Data Labeling with SageMaker Ground Truth: Custom Labeling

In this notebook we'll perform the following steps:

* Create a private workforce backed by a Cognito user pool.
* Create a manifest file that lists the items we want to label.
* Define a custom Ground Truth labeling workflow, consisting of two Lambda functions and a UI template, and launch a labeling job.
* Add a second worker to our private workforce.
* Adjust the post-processing part of the workflow to handle input from multiple workers, and launch another labeling job.



## Create a private workforce

Before executing the code in this section, review and set the following variables:

* `PoolName`: The name of the Cognito user pool
* `ClientName`: The name of the Cognito user pool client
* `IdentityPoolName`: The name of the Cognito identity pool
* `Region`: The AWS region you're working in
* `IamRolePrefix`: A prefix to use when naming new IAM roles
* `GroupName`: The name of the Cognito user group
* `DomainName`: The domain prefix for the Cognito authentication page (Cognito accepts only lowercase letters, numbers, and hyphens)
* `WorkteamName`: The name of the private work team
* `UserEmail`: The user name for the first worker (use a fake email address)
* `Password`: A password with at least one uppercase character, one symbol, and one number

In [1]:
# Constants

PoolName = 'MyUserPool'
ClientName = 'MyUserPoolClient'
IdentityPoolName = 'MyIdentityPool'
Region = 'us-east-1'
IamRolePrefix = 'MyRole'
GroupName = 'MyGroup'
DomainName = 'mydomain'
WorkteamName = 'MyTeam'
UserEmail = "me@foo.com"
Password = 'PwTest123!'
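The password rule described above (at least one uppercase character, one symbol, and one number) can be checked locally before any Cognito call is made. The following is a minimal sketch; `meets_password_rule` is a hypothetical helper that is not part of the notebook's utilities, and treating "symbol" as any ASCII punctuation character is an assumption. Your user pool's actual policy, such as its minimum length, may be stricter.

```python
import string


def meets_password_rule(password: str) -> bool:
    """Return True if the password has an uppercase letter, a digit, and a symbol."""
    has_upper = any(c.isupper() for c in password)
    has_digit = any(c.isdigit() for c in password)
    # Assumption: a "symbol" is any ASCII punctuation character.
    has_symbol = any(c in string.punctuation for c in password)
    return has_upper and has_digit and has_symbol
```

Calling `meets_password_rule(Password)` before adding a user surfaces a weak password early, instead of as an error from the service.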

In [88]:
from utils.cognito_helper import CognitoHelper
cognito_helper = CognitoHelper(Region, IamRolePrefix)
cognito_helper.create_user_pool(PoolName)
cognito_helper.create_user_pool_client(ClientName)
cognito_helper.create_identity_pool(IdentityPoolName)
cognito_helper.create_group(GroupName)
cognito_helper.create_user_pool_domain(DomainName)
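The internals of `CognitoHelper` are not shown in this notebook. As a rough sketch of what the five calls above might translate to, the function below uses the corresponding boto3 operations. The function name, its parameters, and the returned dictionary are hypothetical; only the boto3 operation names and their keyword arguments are taken from the boto3 API. The clients are passed in as arguments so the logic can be exercised without AWS credentials.

```python
def create_cognito_resources(idp, identity, pool_name, client_name,
                             identity_pool_name, group_name, domain_prefix, region):
    """Create the Cognito resources that back a private workforce.

    `idp` is expected to be a boto3 "cognito-idp" client and `identity`
    a boto3 "cognito-identity" client (both are assumptions of this sketch).
    """
    # User pool that stores the workers.
    pool_id = idp.create_user_pool(PoolName=pool_name)["UserPool"]["Id"]

    # App client that the labeling portal uses to authenticate workers.
    client_id = idp.create_user_pool_client(
        UserPoolId=pool_id, ClientName=client_name
    )["UserPoolClient"]["ClientId"]

    # Identity pool linked to the user pool and its app client.
    identity_pool_id = identity.create_identity_pool(
        IdentityPoolName=identity_pool_name,
        AllowUnauthenticatedIdentities=False,
        CognitoIdentityProviders=[{
            "ProviderName": f"cognito-idp.{region}.amazonaws.com/{pool_id}",
            "ClientId": client_id,
        }],
    )["IdentityPoolId"]

    # Group whose members will form the work team.
    idp.create_group(GroupName=group_name, UserPoolId=pool_id)

    # Hosted sign-in domain for the user pool.
    idp.create_user_pool_domain(Domain=domain_prefix, UserPoolId=pool_id)

    return {
        "user_pool_id": pool_id,
        "client_id": client_id,
        "identity_pool_id": identity_pool_id,
    }
```

With real credentials, the clients would come from `boto3.client("cognito-idp", region_name=Region)` and `boto3.client("cognito-identity", region_name=Region)`.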

In [48]:
from util.sagemaker_helper import SagemakerHelper
sagemaker_helper = SagemakerHelper(Region, IamRolePrefix)
sagemaker_helper.create_workteam(WorkteamName, 
    cognito_helper.user_pool_id, 
    cognito_helper.group_name, 
    cognito_helper.user_pool_client_id)


arn:aws:sagemaker:us-east-1:102165494304:workteam/private-crowd/rdtest
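For reference, a work team backed by a Cognito group can be created with the SageMaker `create_workteam` operation, which returns the work team ARN like the one printed above. The sketch below is an assumption about what `SagemakerHelper.create_workteam` might do, not its actual implementation; the function name and the description text are hypothetical.

```python
def create_private_workteam(sm, workteam_name, user_pool_id, group_name, client_id):
    """Create a private work team from a Cognito group and return its ARN.

    `sm` is expected to be a boto3 "sagemaker" client (an assumption of this sketch).
    """
    response = sm.create_workteam(
        WorkteamName=workteam_name,
        # Description is a required parameter; this text is a placeholder.
        Description="Private work team for custom labeling",
        MemberDefinitions=[{
            "CognitoMemberDefinition": {
                "UserPool": user_pool_id,
                "UserGroup": group_name,
                "ClientId": client_id,
            }
        }],
    )
    return response["WorkteamArn"]
```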


In [None]:
cognito_helper.update_client(sagemaker_helper.get_workforce_domain())

In [76]:
cognito_helper.add_user(UserEmail, Password)

{'UserConfirmed': False,
 'UserSub': 'dac2455f-692c-4e38-b83b-6afbcd6a57ef',
 'ResponseMetadata': {'RequestId': '057a69f9-609b-4979-ad47-83475ec05eac',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Fri, 26 Mar 2021 00:02:48 GMT',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '72',
   'connection': 'keep-alive',
   'x-amzn-requestid': '057a69f9-609b-4979-ad47-83475ec05eac'},
  'RetryAttempts': 0}}
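The response above has the shape of a Cognito `sign_up` result (`UserConfirmed`, `UserSub`). A plausible sketch of the steps behind `add_user` is shown below. Whether the helper also confirms the user and adds them to the group is an assumption, as is the function name. The sketch further assumes that the app client has no client secret; with a secret, `sign_up` would also need a `SecretHash` argument.

```python
def add_worker(idp, user_pool_id, client_id, group_name, email, password):
    """Register a worker, confirm the account, and add it to the worker group.

    `idp` is expected to be a boto3 "cognito-idp" client (an assumption of this sketch).
    """
    # Self-service sign-up; the new user starts out unconfirmed.
    signup = idp.sign_up(
        ClientId=client_id,
        Username=email,
        Password=password,
        UserAttributes=[{"Name": "email", "Value": email}],
    )
    # Confirm as an administrator so no verification code is needed.
    idp.admin_confirm_sign_up(UserPoolId=user_pool_id, Username=email)
    # Membership in the group is what places the user on the work team.
    idp.admin_add_user_to_group(
        UserPoolId=user_pool_id, Username=email, GroupName=group_name
    )
    return signup
```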

## Create a manifest file

In this section, you'll need to define:

* The name of your S3 bucket.
* The folder (prefix) where you stored the _OpenAQ_ data set.
* The folder (prefix) where you want to store the manifest.

In [49]:
s3_bucket = 'my-s3-bucket'
s3_prefix = 'openaq/realtime/'
s3_prefix_manifest = 'inventory'

In [105]:
sagemaker_helper.create_manifest(s3_bucket, s3_prefix, s3_prefix_manifest)

Processing openaq/realtime/2013-11-27/2013-11-27.ndjson
Got 40 manifest entries
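A Ground Truth input manifest is a JSON Lines file in which each line is a JSON object with either a `source` key (inline data) or a `source-ref` key (an S3 URI). The sketch below shows one way the records of an `.ndjson` file could be turned into manifest lines using inline `source` values. It is an illustration only: `ndjson_to_manifest_lines` is a hypothetical function, and the real `create_manifest` helper may select or sample records differently.

```python
import json


def ndjson_to_manifest_lines(ndjson_text):
    """Convert newline-delimited JSON records into Ground Truth manifest lines."""
    manifest_lines = []
    for raw_line in ndjson_text.splitlines():
        raw_line = raw_line.strip()
        if not raw_line:
            continue  # skip blank lines
        record = json.loads(raw_line)  # fails fast on malformed input
        # Each manifest line carries the record as an inline "source" string.
        manifest_lines.append(json.dumps({"source": json.dumps(record)}))
    return manifest_lines
```

Joining the returned list with newline characters gives the manifest body, which can then be uploaded to the manifest prefix in S3.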


## Create a custom workflow

In this section, you must define:

* The folder (prefix) where you want to store the workflow files.
* The name prefix for your Lambda functions.
* The folder (prefix) where you want to store the labeling output.

In [None]:
s3_prefix_workflow = 'workflow'
fn_prefix = 'MyFn'
s3_prefix_labels = 'labels'

In [114]:
sagemaker_helper.create_workflow(s3_bucket, s3_prefix_workflow, fn_prefix, s3_prefix_labels)
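Of the two Lambda functions in a custom workflow, the pre-annotation function runs once per manifest entry and shapes the data handed to the UI template. The handler below is a minimal sketch of that contract, not the function deployed by `create_workflow`; the `taskObject` key is an assumed name that the UI template would have to reference.

```python
def lambda_handler(event, context):
    """Pre-annotation handler: pass one manifest entry to the worker UI."""
    data_object = event["dataObject"]
    # A manifest line carries either inline data ("source") or an S3 URI ("source-ref").
    source = data_object.get("source") or data_object.get("source-ref")
    return {
        # "taskObject" is an assumed key; the template reads it as task.input.taskObject.
        "taskInput": {"taskObject": source},
        # Ground Truth expects this flag as a string, not a boolean.
        "isHumanAnnotationRequired": "true",
    }
```

The second function, the post-annotation Lambda, receives the workers' answers and consolidates them into the final label for each item.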

## Add another worker

In this section, you'll need to define the following variables before running the cell below:

* `UserEmail2`: The user name for the second worker (use a fake email address)
* `Password2`: A password with at least one uppercase character, one symbol, and one number

In [None]:
cognito_helper.add_user(UserEmail2, Password2)

## Launch a labeling job for multiple workers

In [None]:
sagemaker_helper.create_workflow_multiple_workers(s3_bucket, s3_prefix_workflow, fn_prefix,
    s3_prefix_labels, s3_prefix_labels)
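When several workers label the same item, the post-annotation step has to reduce their answers to one label. The sketch below uses a simple majority vote, which is one possible strategy and not necessarily the one used by `create_workflow_multiple_workers`. It assumes each worker's answer is a JSON string with a single string field, here given the hypothetical name `quality`. In a real post-annotation Lambda, the annotation batch would first be read from the S3 URI in `event["payload"]["s3Uri"]`, and the label attribute name would come from `event["labelAttributeName"]`.

```python
import json
from collections import Counter


def consolidate_by_majority(annotation_batch, label_attribute_name, answer_key="quality"):
    """Consolidate multi-worker annotations into one label per item by majority vote."""
    results = []
    for item in annotation_batch:
        votes = Counter()
        for annotation in item["annotations"]:
            # Each worker's answer arrives as a JSON-encoded string.
            content = json.loads(annotation["annotationData"]["content"])
            votes[content[answer_key]] += 1
        winner, winner_votes = votes.most_common(1)[0]
        results.append({
            "datasetObjectId": item["datasetObjectId"],
            "consolidatedAnnotation": {
                "content": {
                    label_attribute_name: {
                        "label": winner,
                        "votes": winner_votes,
                        "workers": len(item["annotations"]),
                    }
                }
            },
        })
    return results
```

Majority voting keeps the logic transparent, but it ignores worker reliability; a weighted scheme would be a natural refinement when some workers are known to be more accurate.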