# Getting Started

In this tutorial, we show how to use Amazon Augmented AI (Amazon A2I) directly with your calls to Amazon Textract's Analyze Document API. For more in-depth instructions, visit https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-getting-started.html#a2i-getting-started-prerequisites (**note**: the prerequisites on that page must be completed before beginning this tutorial).

To incorporate Amazon A2I into your data labeling workflow for all task types, you need three resources:

* A **worker task template** to create a worker UI. The worker UI displays your input data, such as documents or images, and instructions to workers. It also provides interactive tools that the worker uses to complete your tasks. For more information, see Create a Worker UI.

* A **human review workflow**, also referred to as a flow definition. You use the flow definition to configure your human workforce and provide information about how to accomplish the labeling task. For built-in task types, you also use the flow definition to identify the conditions under which a review human loop is triggered. For example, Amazon Rekognition can perform image content moderation using machine learning. You can use the flow definition to specify that an image will be sent to a human for content moderation review if Amazon Rekognition's confidence is too low. You can create a flow definition in the Amazon SageMaker console or with the Amazon SageMaker API. To learn more about both of these options, see Create a Flow Definition.

* A **human loop** to start your human review workflow. When you use one of the built-in task types, the corresponding AWS service creates and starts a human loop on your behalf when the conditions specified in your flow definition are met or for each object if no conditions were specified. When a human loop is triggered, human review tasks are sent to the workers as specified in the flow definition.

When using a custom task type, you start a human loop using the Amazon Augmented AI Runtime API. When you call StartHumanLoop in your custom application, a task is sent to human reviewers.
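As a rough illustration of the custom task type path, the sketch below assembles the keyword arguments for `start_human_loop`. The loop name, flow definition ARN, and input payload are hypothetical placeholders, and the helper `build_start_human_loop_request` is not part of any AWS SDK.

```python
import json


def build_start_human_loop_request(loop_name, flow_definition_arn, input_content):
    """Assemble keyword arguments for the A2I runtime StartHumanLoop call."""
    return {
        "HumanLoopName": loop_name,
        "FlowDefinitionArn": flow_definition_arn,
        # InputContent is passed as a JSON-formatted string.
        "HumanLoopInput": {"InputContent": json.dumps(input_content)},
        # Declares that the content contains no personally identifiable information.
        "DataAttributes": {
            "ContentClassifiers": ["FreeOfPersonallyIdentifiableInformation"]
        },
    }


# Hypothetical values, for illustration only.
request = build_start_human_loop_request(
    "custom-loop-example-1",
    "arn:aws:sagemaker:us-east-1:111122223333:flow-definition/example",
    {"text": "Sample text to review", "modelConfidence": 0.42},
)

# With credentials configured, the request could be sent like this:
# boto3.client("sagemaker-a2i-runtime").start_human_loop(**request)
```

Keeping the request construction separate from the network call makes it easy to inspect or log the payload before any task is sent to reviewers.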

In [None]:
!pip install --upgrade pip
!pip install boto3 --upgrade
!pip install -U botocore

In [None]:
import io
from io import BytesIO
import sys
import boto3
from PIL import Image, ImageDraw, ImageFont
import json

In [None]:
sagemaker_client = boto3.client('sagemaker')

## Creating the Worker Task Template

Since we are integrating A2I with Textract, we can create the template in the console from one of the default templates that A2I provides (https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-instructions-overview.html).

Following these instructions: https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-create-flow-definition.html#create-human-review-console, we created our worker task template, which produced the Task UI ARN below. Replace it with the ARN from your own account.

In [None]:
taskUiArn = 'arn:aws:sagemaker:us-east-1:053520186210:human-task-ui/ronnie-test-template-textract-8'

## Creating the Flow Definition (Human Review Workflow)

In this section, we're going to create a flow definition. Flow definitions allow us to specify:

* For the Amazon Textract and Amazon Rekognition built-in task types, the conditions under which your human loop will be called.
* The workforce that your tasks will be sent to.
* The instructions that your workforce will receive. This is called a worker task template.
* The configuration of your worker tasks, including the number of workers that receive a task and time limits to complete tasks.
* Where your output data will be stored.

This demo uses the API, but you can also create the flow definition in the console.

For more details and instructions, see: https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-create-flow-definition.html.

#### Setup initial values

##### Let's specify our SageMaker Execution Role Arn:

In [None]:
executionRole = 'arn:aws:iam::053520186210:role/service-role/AmazonSageMaker-ExecutionRole-20191231T143745'

##### WorkTeam (WorkForce) Arn:

As part of the prerequisites, you will have created a work team. A workforce is the group of workers that you have selected to label your dataset. You can choose:
* the Amazon Mechanical Turk workforce,
* a vendor-managed workforce, or
* a private workforce that you create yourself

Whichever workforce type you choose, Amazon SageMaker takes care of sending tasks to workers.

If you have not yet created a work team, follow the instructions here: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management.html

In [None]:
workTeam = 'arn:aws:sagemaker:us-east-1:053520186210:workteam/private-crowd/textract-private-workteam'

##### The name we want for this Flow Definition

In [None]:
flowDefinitionName = 'textract-demo-99-percent-with-important-form-keys-2'

##### The S3 path that A2I will send results to

In [None]:
s3OutputPath = 's3://053520186210-aws-textract-testing/output'

#### Specify Human Loop Activation Conditions

Since we are using a built-in integration type for A2I (certain Textract and Rekognition APIs), we can use Human Loop Activation Conditions to provide conditions that trigger a human loop.

Here we specify that any key in our document whose Textract confidence is below 100 and above 0 should be sent to a human for review. The two checks are combined with `And`, so in practice nearly every detected key triggers a human loop.

In [None]:
# Visit https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-human-fallback-conditions-json-schema.html for more information on this schema.

humanLoopActivationConditions = json.dumps({
  "Conditions": [
    {
      "And": [
        {
          "ConditionType": "ImportantFormKeyConfidenceCheck",
          "ConditionParameters": {
            "ImportantFormKey": "*",
            "KeyValueBlockConfidenceLessThan": 100,
            "WordBlockConfidenceLessThan": 100
          }
        },
        {
          "ConditionType": "ImportantFormKeyConfidenceCheck",
          "ConditionParameters": {
            "ImportantFormKey": "*",
            "KeyValueBlockConfidenceGreaterThan": 0,
            "WordBlockConfidenceGreaterThan": 0
          }
        }
      ]
    }
  ]
})
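The conditions above send essentially everything to review, which is convenient for a demo. A more selective setup might look like the sketch below, which triggers a loop when one specific key falls below a confidence threshold, when that key is missing, or for a random sample of documents. The helper `build_activation_conditions`, the key name `Invoice Number`, and the threshold values are assumptions for illustration; the condition types follow the schema linked in the cell above.

```python
import json


def build_activation_conditions(form_key, threshold, sampling_percentage=None):
    """Return an activation-conditions JSON string for one important form key."""
    checks = [
        {
            # Trigger when Textract's confidence for this key is below the threshold.
            "ConditionType": "ImportantFormKeyConfidenceCheck",
            "ConditionParameters": {
                "ImportantFormKey": form_key,
                "KeyValueBlockConfidenceLessThan": threshold,
                "WordBlockConfidenceLessThan": threshold,
            },
        },
        {
            # Trigger when the key is not found in the document at all.
            "ConditionType": "MissingImportantFormKey",
            "ConditionParameters": {"ImportantFormKey": form_key},
        },
    ]
    if sampling_percentage is not None:
        # Optionally send a random percentage of documents for review.
        checks.append({
            "ConditionType": "Sampling",
            "ConditionParameters": {"RandomSamplingPercentage": sampling_percentage},
        })
    # "Or" means any single check is enough to start a human loop.
    return json.dumps({"Conditions": [{"Or": checks}]})


# Hypothetical key name and thresholds.
example_conditions = build_activation_conditions("Invoice Number", 90, sampling_percentage=5)
```

The returned string can be passed as `HumanLoopActivationConditions` in the same way as `humanLoopActivationConditions`.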

#### Now we are ready to create our Flow Definition!

In [None]:
create_workflow_definition_response = sagemaker_client.create_flow_definition(
        FlowDefinitionName= flowDefinitionName,
        RoleArn= executionRole,
        HumanLoopConfig= {
            "WorkteamArn": workTeam,
            "HumanTaskUiArn": taskUiArn,
            "TaskCount": 1,
            "TaskDescription": "Document analysis sample task description",
            "TaskTitle": "Document analysis sample task"
        },
        HumanLoopActivationConfig={
            "HumanLoopRequestSource": {
                "AwsManagedHumanLoopRequestSource": "AWS/Textract/AnalyzeDocument/Forms/V1"
            },
            "HumanLoopActivationConditionsConfig": {
                "HumanLoopActivationConditions": humanLoopActivationConditions
            }
        },
        OutputConfig={
            "S3OutputPath" : s3OutputPath
        }
    )
flowDefinitionArn = create_workflow_definition_response['FlowDefinitionArn'] # let's save this ARN for future use
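A new flow definition may take a short while to become usable. One way to handle this is to poll `describe_flow_definition` until the status is `Active`, as in the sketch below. The helper name `wait_for_flow_definition` and the polling interval and attempt limit are assumptions, not values from the A2I documentation.

```python
import time


def wait_for_flow_definition(client, name, poll_seconds=5, max_attempts=12):
    """Poll a SageMaker client until the named flow definition is Active."""
    for _ in range(max_attempts):
        response = client.describe_flow_definition(FlowDefinitionName=name)
        status = response["FlowDefinitionStatus"]
        if status == "Active":
            return status
        if status == "Failed":
            reason = response.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Flow definition {name} failed: {reason}")
        # Still initializing; wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(f"Flow definition {name} was not Active after {max_attempts} checks")


# Usage in this notebook (requires AWS credentials):
# wait_for_flow_definition(sagemaker_client, flowDefinitionName)
```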

## Calling Textract to Analyze Document with A2I

Now that we have set up our flow definition, all that's left is to call Textract's Analyze Document API and include our A2I parameters in the HumanLoopConfig.

#### Let's give our human loop a name

In [None]:
humanLoopName = 'textract-analyze-document-demo-a2i-2'
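Because a hard-coded name can collide with an earlier loop when the notebook is rerun, it may help to generate a fresh name each time. The sketch below appends a random suffix to a prefix. The helper `make_human_loop_name` is hypothetical, and the name rules it checks (lowercase letters, digits, and hyphens, at most 63 characters) are my understanding of the API constraints rather than something stated in this tutorial.

```python
import re
import uuid

# Assumed naming rule: lowercase alphanumerics separated by hyphens.
HUMAN_LOOP_NAME_PATTERN = re.compile(r"[a-z0-9](-*[a-z0-9])*")
MAX_NAME_LENGTH = 63  # assumed limit


def make_human_loop_name(prefix):
    """Return prefix plus an 8-character random suffix, validated against the assumed rules."""
    name = f"{prefix}-{uuid.uuid4().hex[:8]}"
    if len(name) > MAX_NAME_LENGTH or not HUMAN_LOOP_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Invalid human loop name: {name}")
    return name


example_loop_name = make_human_loop_name("textract-analyze-document-demo")
```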

In [None]:
textract_client = boto3.client('textract')

In [None]:
humanLoopConfig = {
    'FlowDefinitionArn':flowDefinitionArn,
    'HumanLoopName':humanLoopName, 
    'DataAttributes': { 'ContentClassifiers': [ 'FreeOfPersonallyIdentifiableInformation' ]}
}

In [None]:
analyze_document_response = textract_client.analyze_document(
    Document={'S3Object': {'Bucket': '053520186210-aws-textract-testing', 'Name': 'invoice-1.jpg'}},
    FeatureTypes=["TABLES", "FORMS"], 
    HumanLoopConfig=humanLoopConfig
)

In addition to the standard Textract response body, the response includes a field called `HumanLoopActivationOutput`, which gives us information about our human loop:

In [None]:
display(analyze_document_response['HumanLoopActivationOutput']['HumanLoopArn'])
display(analyze_document_response['HumanLoopActivationOutput']['HumanLoopActivationReasons'])
display(analyze_document_response['HumanLoopActivationOutput']['HumanLoopActivationConditionsEvaluationResults'])
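Direct indexing works when a loop was started, but some of these keys may be absent when no condition fired. The sketch below reads the same fields defensively and decodes the evaluation results when they arrive as a JSON string. The helper `summarize_activation_output` is hypothetical and the sample response is made up for illustration; it is not real Textract output.

```python
import json


def summarize_activation_output(response):
    """Extract the human loop details from an AnalyzeDocument response dict."""
    output = response.get("HumanLoopActivationOutput", {})
    raw_results = output.get("HumanLoopActivationConditionsEvaluationResults")
    # Decode the evaluation results only if they are still a JSON string.
    evaluation = json.loads(raw_results) if isinstance(raw_results, str) else raw_results
    return {
        "loop_started": "HumanLoopArn" in output,
        "human_loop_arn": output.get("HumanLoopArn"),
        "reasons": output.get("HumanLoopActivationReasons", []),
        "evaluation": evaluation,
    }


# Made-up response for illustration only.
sample_response = {
    "HumanLoopActivationOutput": {
        "HumanLoopArn": "arn:aws:sagemaker:us-east-1:111122223333:human-loop/example",
        "HumanLoopActivationReasons": ["ConditionsEvaluation"],
        "HumanLoopActivationConditionsEvaluationResults": '{"Conditions": []}',
    }
}
summary = summarize_activation_output(sample_response)
```

In the notebook, `summarize_activation_output(analyze_document_response)` would give the same summary for the real response.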

## Monitoring Human Loop for Completion

A2I lets us monitor the human loop until the selected workforce has completed all of its tasks. Using the A2I runtime client, we can check the status of our human loop as often as we need.

In [None]:
a2i_runtime_client = boto3.client('sagemaker-a2i-runtime')

In [None]:
describe_human_loop_response = a2i_runtime_client.describe_human_loop(
    HumanLoopName=humanLoopName
)

In [None]:
display(describe_human_loop_response['HumanLoopStatus'])
# HumanLoopOutput may be absent until the loop has produced output, so use .get()
display(describe_human_loop_response.get('HumanLoopOutput'))
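Since workers may take a while, a single `describe_human_loop` call often returns `InProgress`. The sketch below polls until the loop reaches a terminal status and splits the output S3 URI into bucket and key. The helper names, the polling interval, and the attempt limit are assumptions chosen for illustration.

```python
import time
from urllib.parse import urlparse

# Statuses after which the loop will not change any further.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_human_loop(client, loop_name, poll_seconds=30, max_attempts=20):
    """Poll the A2I runtime client until the human loop reaches a terminal status."""
    for _ in range(max_attempts):
        response = client.describe_human_loop(HumanLoopName=loop_name)
        if response["HumanLoopStatus"] in TERMINAL_STATUSES:
            return response
        time.sleep(poll_seconds)
    raise TimeoutError(f"Human loop {loop_name} did not finish after {max_attempts} checks")


def split_s3_uri(uri):
    """Split 's3://bucket/key/path' into ('bucket', 'key/path')."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


# Usage in this notebook (requires AWS credentials):
# result = wait_for_human_loop(a2i_runtime_client, humanLoopName)
# if result["HumanLoopStatus"] == "Completed":
#     bucket, key = split_s3_uri(result["HumanLoopOutput"]["OutputS3Uri"])
#     body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
#     review = json.loads(body)
```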