# Setting up Amazon Augmented AI (A2I)

This notebook will guide you through setting up your A2I human review flow and connecting it to the Textract demo solution.

You can read the instructions and follow through manually in the [AWS Console](https://console.aws.amazon.com/console/home), or load this file in an [Amazon SageMaker](https://console.aws.amazon.com/sagemaker/home) **notebook instance** or **SageMaker Studio** to benefit from the automations in the code cells.

For more resources, check out:

- The [Getting Started instructions](https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-getting-started.html) in the Amazon SageMaker developer guide
- https://github.com/aws-samples/amazon-a2i-sample-jupyter-notebooks for a list of official sample notebooks using Amazon A2I!
- https://github.com/aws-samples/amazon-a2i-sample-task-uis for official sample custom task UIs

## Introduction

[Amazon Augmented AI (Amazon A2I)](https://aws.amazon.com/augmented-ai/) makes it easy to build the workflows required for human review of ML predictions, removing the undifferentiated heavy lifting associated with building human review systems or managing large numbers of human reviewers.

Amazon A2I provides **built-in human review workflows** for common machine learning use cases, such as content moderation and text extraction from documents, which allows predictions from Amazon Rekognition and Amazon Textract to be reviewed easily. You can also create your own **custom workflows** for models built on Amazon SageMaker or any other tools.

In this tutorial, we'll set up a **custom** workflow because what we want to review is the **business level field extraction** after post-processing (vendor, date, total, etc.) - not the raw Textract results (every key-value pair detected).


Amazon A2I defines 3 key resource types we'll need to be aware of:

* **Worker task templates** (either pre-built or custom) define the annotation UI. The worker UI displays your input data, such as documents or images, and instructions to workers. It also provides interactive tools that the worker uses to complete your tasks. For more information, see https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-instructions-overview.html

* **Human review workflows** (or 'flow definitions') define the review process. You use the flow definition to configure your human workforce and provide information about how to accomplish the human review task. For built-in task types, you also use the flow definition to identify the conditions under which a human review loop is triggered. For example, Amazon Textract can analyze text in a document using machine learning. You can use the flow definition to specify that a document will be sent to a human for review if Amazon Textract's confidence score is low for any or all pieces of text it returns. You can create a flow definition in the Amazon Augmented AI console or with the Amazon A2I APIs. To learn more about both of these options, see https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-create-flow-definition.html

* **Human loops** are the individual executions of a review workflow. When you use one of the built-in task types, the corresponding AWS service creates and starts a human loop on your behalf when the conditions specified in your flow definition are met or for each object if no conditions were specified. When a human loop is triggered, human review tasks are sent to the workers as specified in the flow definition.

When using a custom task type, you start a human loop using the Amazon Augmented AI Runtime API. When you call StartHumanLoop in your custom application, a task is sent to human reviewers.

## Python Notebook Setup

<div class="alert alert-info">You can skip this section if you'll be following through in the AWS Console, rather than using the code cells</div>

To run the programmatic/code sections of the notebook, we'll need to first import required libraries:

In [None]:
# Python Built-Ins:
from datetime import datetime
import json
import uuid
import time

# External Dependencies:
import boto3  # The AWS SDK for Python
import sagemaker  # A high-level SDK specifically for Amazon SageMaker

We'll also connect to and configure the various AWS services we'll be using up-front:

In [None]:
smsess = sagemaker.session.Session()
smsess.boto_region_name

a2iruntime = boto3.client("sagemaker-a2i-runtime")
lambdaclient = boto3.client("lambda")
smclient = boto3.client("sagemaker")

notebook_role = sagemaker.get_execution_role()

## Human Workteam (or Workforce)

A **workforce** is the group of workers that you have selected to perform your task: Whether it's a review loop with Amazon A2I, or an offline annotation exercise with [Amazon SageMaker Ground Truth](https://aws.amazon.com/sagemaker/groundtruth/).

You can use either:

- The Amazon Mechanical Turk workforce (the 'public crowd')
- A vendor-managed service, through AWS Marketplace, or
- Your own defined **private teams** - which we'll use in this example

Whichever workforce type you choose, Amazon Augmented AI takes care of sending tasks to the workers.

▶️ In the [Amazon SageMaker Ground Truth Console](https://console.aws.amazon.com/sagemaker/groundtruth?#/labeling-workforces):
- Go to the *Ground Truth > Labelling Workforces* tab in the sidebar
- Select the *Private* tab
- Click **Invite new workers** to invite yourself (and/or your workers) to join the workforce by email
- You'll receive an email with a link to sign up, where you'll be asked to set a password: The private workforce portal uses SSO, so workers don't require access to your AWS Console.
- Once your workers are onboarded, click **Create private team**
- Choose a name for your team (maybe `just-me`, if it's individual for development?), and leave other settings as default
- Once the team is created, choose it from the list to enter the team detail page and then **add workers** to the team.

Note that team SNS notifications are supported for SageMaker Ground Truth annotation jobs, but not (at the time of writing!) for Amazon A2I review tasks.

You should now have a private team set up with you and/or your workers registered and able to log in to the **labelling portal**.

![Workforce management screenshot](images/workforce-management-screenshot.png "Workforce management screenshot")

For more help on workforce setup, see [this section of the Amazon SageMaker developer guide](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-workforce-management.html).

After you have created your workteam, replace the placeholder ARN below:

In [None]:
workteam_arn = "arn:aws:sagemaker:us-east-1:???"  # TODO: Replace with your workteam ARN
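
If you'd rather not copy the ARN from the console, it can probably be looked up by team name instead. The sketch below assumes the SageMaker `list_workteams` API; `find_workteam_arn` is a hypothetical helper (not part of the solution) and the team name `just-me` is just the example suggested above.

```python
def find_workteam_arn(sagemaker_client, team_name):
    """Return the ARN of the work team called `team_name`, or None if not found."""
    next_token = None
    while True:
        # Only pass NextToken once we actually have one from a previous page:
        kwargs = {"NextToken": next_token} if next_token else {}
        page = sagemaker_client.list_workteams(**kwargs)
        for team in page.get("Workteams", []):
            if team["WorkteamName"] == team_name:
                return team["WorkteamArn"]
        next_token = page.get("NextToken")
        if not next_token:
            return None
```

In this notebook you might call it as `workteam_arn = find_workteam_arn(smclient, "just-me")`, and check that the result is not `None` before continuing.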

## Worker Task Template

The [task template](https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-instructions-overview.html) defines the UI presented for the task at hand. For [built-in task types](https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-task-types-general.html), the task template is pre-defined with no extra hacking required.

For custom tasks like ours though, we can [create custom worker templates](https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-custom-templates.html) as HTML files based on the [Liquid template language](https://shopify.github.io/liquid/basics/introduction/) with [SageMaker Crowd HTML elements](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-ui-template-reference.html).

In this solution we've prepared a template for you ([a2i-text-with-checkboxes.liquid.html](a2i-text-with-checkboxes.liquid.html)), but it's worth diving in to how you can edit and test your own!

### Debugging Task Templates

Your template will be used *inside the A2I application* and *with reference to input data*, so opening the `.liquid.html` file directly in your browser/notebook won't render anything very interesting... You'll want a way to **see how it renders** without having to create a flow and trigger an A2I loop every time.

Fortunately, that's exactly what the [RenderUiTemplate](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_RenderUiTemplate.html) API (and equivalent Boto3 [render_ui_template()](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.render_ui_template) function) gives us!

The code sample below renders our sample template to `render.html`. You can find your raw input bucket through the [Amplify Console](https://console.aws.amazon.com/amplify/home) under *Backend Environments > (Your env) > Storage*, but can just leave a placeholder `s3://` URI if you struggle - the template will just render with a broken image tag.

In [None]:
with open("a2i-text-with-checkboxes.liquid.html", "r") as fin:
    with open("render.html", "w") as fout:
        result = smclient.render_ui_template(
            RoleArn=notebook_role,
            Task={
                "Input": json.dumps({
                    "taskObject": "s3://MY-INPUT-BUCKET/A-RECEIPT-IMAGE.jpg",  # TODO: Replace with a real image URI
                    "vendor": {
                        "confidence": 40,
                        "value": "Amazon Web Services",
                    },
                    "date": {
                        "confidence": 90,
                        "value": "2020-07-07",
                    },
                    "total": {
                        "confidence": 65,
                        "value": "64.30",
                    }
                })
            },
            UiTemplate={
                "Content": fin.read(),
            }
        )
        print(result.get("Errors", "No errors!"))
        fout.write(result["RenderedContent"])

If opening `render.html` from SageMaker JupyterLab, click the "Trust HTML" button in the toolbar to see the true output. If running through this notebook locally, you can just open it in your browser!
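
When iterating on a template, it can help to fail loudly on render errors rather than just printing them. A minimal sketch, assuming the `Errors` entries in the RenderUiTemplate response carry `Code` and `Message` keys; `check_render_result` is a hypothetical helper name:

```python
def check_render_result(result):
    """Raise if RenderUiTemplate reported errors; otherwise return the rendered HTML."""
    errors = result.get("Errors") or []
    if errors:
        details = "; ".join(
            f"{err.get('Code', '?')}: {err.get('Message', '')}" for err in errors
        )
        raise ValueError(f"Template failed to render: {details}")
    return result["RenderedContent"]
```

You could then write `fout.write(check_render_result(result))` in the cell above, so a broken template stops the cell instead of silently producing a partial `render.html`.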

![Screenshot of rendered template](images/a2i-render-screenshot.png "Screenshot of rendered template")

Note a few important design points:

1. In this example we're **collecting human annotation data** for a field we don't even have modelled/integrated yet (Tax)! A2I reviews get saved to S3, so we can use a flow like this to combine human reviews with new models for future use.
1. The task input data structure (apart from `taskObject`) is something **we control**
    * Nested objects and lists are fine, so long as our task template is expecting the same format of data our Lambda function is starting human loops with.
    * **We chose** to expose the model confidence scores to reviewers (which might be helpful or harmful!) via the bar charts to direct their attention.
    * Simplifications (like `"vendor": "Amazon Web Services"`) or extra complexity (like `"value": { "currency": "GBP", "amount": 64.30 }`) would both be fine.
1. **Extra and external JS and CSS** is supported
    * We've used [Bootstrap Grid](https://getbootstrap.com/docs/4.0/layout/grid/) to make a layout that responds to the browser screen size
    * We've added a custom `handleCheckBoxClick()` listener to link checkboxes to text inputs, so that reviewers must explicitly indicate when a field isn't present
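
Because the input structure is ours to define, it may be worth building it in one place so the render test, the flow test and the Lambda function all agree. The sketch below mirrors the `{"confidence": ..., "value": ...}` shape used in this notebook; `build_task_input` is a hypothetical helper, not something the solution ships.

```python
import json


def build_task_input(task_object_uri, fields):
    """Build the task input JSON string from {field_name: (value, confidence)} pairs."""
    payload = {"taskObject": task_object_uri}
    for name, (value, confidence) in fields.items():
        if name == "taskObject":
            # Keep this key for the document URI, as the template expects
            raise ValueError("'taskObject' is reserved for the document URI")
        payload[name] = {"confidence": confidence, "value": value}
    return json.dumps(payload)
```

For example, `build_task_input("s3://MY-INPUT-BUCKET/A-RECEIPT-IMAGE.jpg", {"vendor": ("Amazon Web Services", 40)})` produces a string that could be passed as `Task["Input"]` when rendering, or as `InputContent` when starting a human loop.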

See the [official samples repo](https://github.com/aws-samples/amazon-a2i-sample-task-uis) for many more examples of custom A2I task UIs extending on the pre-built components!

### Uploading the Task Template

Once we're happy our template presents a good UI for the task structure we expect, it's time to upload it to Amazon A2I.

▶️ **EITHER** In the [Amazon SageMaker console](https://console.aws.amazon.com/sagemaker/groundtruth?#/labeling-workforces):
- Select *Augmented AI > Worker task templates* from the sidebar
- Click **Create template**
- Give your template version a name (⚠️ bearing in mind that, at the time of writing, there's no easy way to delete it!)
- Select type *Custom*, and **paste the contents** of your template liquid.html (⚠️ **not** the `render.html`!)
- ...and click Create!

▶️ **OR INSTEAD**, you can create it programmatically with the code below:

In [None]:
template_name = "text-with-checkboxes-" + datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

with open("a2i-text-with-checkboxes.liquid.html", "r") as fin:
    create_template_response = smclient.create_human_task_ui(
        HumanTaskUiName=template_name,
        UiTemplate={ "Content": fin.read() }
    )
print(create_template_response)

In [None]:
# We'll use this ARN later:
task_ui_arn = create_template_response["HumanTaskUiArn"]
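
If you created the template in the console instead, `create_template_response` won't exist and `task_ui_arn` needs to come from somewhere else. One option, sketched here on the assumption that the SageMaker `describe_human_task_ui` API returns a `HumanTaskUiArn` field, is to look the ARN up by the name you chose; `lookup_task_ui_arn` is a hypothetical helper.

```python
def lookup_task_ui_arn(sagemaker_client, ui_name):
    """Return the ARN of an existing worker task template, looked up by name."""
    response = sagemaker_client.describe_human_task_ui(HumanTaskUiName=ui_name)
    return response["HumanTaskUiArn"]
```

In this notebook that would look something like `task_ui_arn = lookup_task_ui_arn(smclient, "my-template-name")`, where the name is whatever you entered in the console.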

## Creating the Flow

Now we have our work team and task UI template set up, it's time to define the **A2I Workflow** which ties them together.

**To locate your target review output bucket:**
- The OCR stack has already created your human review output bucket for you, with a name that should include `humanreviewsbucket`.
- If you can't locate the bucket in the [S3 Console](https://s3.console.aws.amazon.com/s3/home), you can search the [CloudFormation Console](https://console.aws.amazon.com/cloudformation/home) for your deployed *Stack*
- If you deployed the solution via AWS Amplify, your CloudFormation stack name will be something like `amplify-{APPNAME}-{SOME-STUFF}-processing-{BRANCH}`. If you deployed just the OCR components through AWS SAM, you will have named your stack yourself.
- `HumanReviewsBucket` will be listed in the **Resources** tab of the correct CloudFormation stack, with a hyperlink to view the bucket in the S3 Console.

> ⚠️ It's important to select the right bucket, because the OCR stack has already configured new objects in that bucket to trigger the HumanReviewCallback Lambda function. If you send reviews somewhere else, your pipeline will not continue once the review is completed!

Once you've found the **name** of your reviews bucket, record it below:

In [None]:
reviews_bucket_name = ""  # TODO: Fill in, e.g. "amplify-{APPNAME}-{SOME-STUFF}-humanreviewsbucket-{RANDOM}"
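
As an alternative to searching the consoles, you could filter your account's bucket list for the `humanreviewsbucket` fragment mentioned above. A minimal sketch, assuming your role is allowed to call the S3 `list_buckets` API; `find_buckets_containing` is a hypothetical helper.

```python
def find_buckets_containing(s3_client, fragment="humanreviewsbucket"):
    """List the names of buckets in this account whose name contains `fragment`."""
    buckets = s3_client.list_buckets().get("Buckets", [])
    return [bucket["Name"] for bucket in buckets if fragment in bucket["Name"].lower()]
```

For example, `find_buckets_containing(boto3.client("s3"))` should return a short list. If it has more than one entry (say, from several deployments), check the CloudFormation stack as described above before picking one.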

Now you should have all the information you need to create the workflow:

▶️ **EITHER** In the [Amazon SageMaker Console](https://console.aws.amazon.com/a2i/home?#/create-human-review-workflows):
- Select *Augmented AI > Human review workflows* from the sidebar and click the **Create** button
- Give your workflow a name (workflows can be edited/deleted later)
- **Enter your output S3 location** which should be an S3 URI e.g. `s3://MY-REVIEW-BUCKET/reviews`
- Select to **Create a new IAM Role**, unless you have one you know you can use already
- Choose **Task Type: Custom**
- Select the **Task Template** we created earlier, and enter a short description (e.g. "Review and amend the extracted fields")
- Choose **your Private work team** that we created earlier
- ...and click Create!

▶️ **OR INSTEAD**, you can create it programmatically with the code below:

In [None]:
flow_definition_name = ""  # TODO: Choose a name (lowercase letters, numbers and hyphens only)
flow_role_arn = "arn:aws:iam::???"  # TODO: A role with appropriate permissions (create one if needed)

create_flow_response = smclient.create_flow_definition(
    FlowDefinitionName=flow_definition_name,
    RoleArn=flow_role_arn,
    HumanLoopConfig={
        "WorkteamArn": workteam_arn,  # From earlier
        "HumanTaskUiArn": task_ui_arn,  # From earlier
        "TaskCount": 1,
        "TaskDescription": "Review and amend the extracted fields",
        "TaskTitle": "Receipt Extraction Review Task"
    },
    OutputConfig={
        "S3OutputPath": f"s3://{reviews_bucket_name}/reviews"
    }
)

print(create_flow_response)
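
If you need to create the flow's IAM role yourself, the policy documents might look roughly like the sketch below. This is an assumption-laden starting point rather than the definitive permission set: it assumes the role is assumed by `sagemaker.amazonaws.com`, needs to read task objects from your raw input bucket and write results to the reviews bucket. `build_flow_role_policies` is a hypothetical helper; check the permissions guide linked at the end of this notebook for the authoritative requirements.

```python
def build_flow_role_policies(reviews_bucket, input_bucket):
    """Return (trust_policy, s3_policy) dicts for a flow definition execution role."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    s3_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the documents/images referenced by `taskObject`
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{input_bucket}/*"],
            },
            {
                # Write review results to the output location
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{reviews_bucket}/*"],
            },
        ],
    }
    return trust_policy, s3_policy
```

The two documents could then be serialized with `json.dumps()` and passed to the IAM `create_role` (as `AssumeRolePolicyDocument`) and `put_role_policy` (as `PolicyDocument`) calls, assuming your notebook role is permitted to manage IAM roles.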

In [None]:
# We'll need this ARN later:
flow_arn = create_flow_response["FlowDefinitionArn"]
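
A newly created flow definition is not necessarily usable straight away, so it may be worth waiting until it reports as active before starting any human loops. The sketch below polls the SageMaker `describe_flow_definition` API and reads its `FlowDefinitionStatus` field; `wait_for_flow_active` and its default timings are hypothetical choices.

```python
import time


def wait_for_flow_active(sagemaker_client, flow_name, poll_seconds=2, max_attempts=60):
    """Poll a flow definition until it is Active; raise if it fails or takes too long."""
    for _ in range(max_attempts):
        desc = sagemaker_client.describe_flow_definition(FlowDefinitionName=flow_name)
        status = desc["FlowDefinitionStatus"]
        if status == "Active":
            return desc
        if status == "Failed":
            reason = desc.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Flow definition {flow_name} failed: {reason}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Flow definition {flow_name} was not Active after {max_attempts} checks")
```

In this notebook you might call `wait_for_flow_active(smclient, flow_definition_name)` right after creating the flow.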

### (Optional) Testing the Flow

To test whether your flow routes to reviewers and records output as expected, you may like to send a test request outside of the E2E OCR flow.

As you'd expect, the method is very similar to the one we used to render the template while debugging, but with a couple of extra parameters:

In [None]:
a2i_response = a2iruntime.start_human_loop(
    HumanLoopName=str(uuid.uuid4()),  # We've just used a random ID here
    FlowDefinitionArn=flow_arn,
    HumanLoopInput={
        'InputContent': json.dumps({
            "taskObject": "s3://MY-INPUT-BUCKET/A-RECEIPT-IMAGE.jpg",  # TODO: Replace with a real image URI
            "vendor": {
                "confidence": 40,
                "value": "Amazon Web Services",
            },
            "date": {
                "confidence": 90,
                "value": "2020-07-07",
            },
            "total": {
                "confidence": 65,
                "value": "64.30",
            }
        })
    },
    DataAttributes={
        'ContentClassifiers': [
            'FreeOfPersonallyIdentifiableInformation'
        ]
    }
)

print(a2i_response)
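
Once you've submitted the test task through the labelling portal, you can check whether the loop completed and where its output landed. A minimal sketch, assuming the A2I runtime `describe_human_loop` API and its `HumanLoopStatus` / `HumanLoopOutput` response fields; `summarize_human_loop` is a hypothetical helper.

```python
def summarize_human_loop(a2i_runtime_client, loop_name):
    """Return (status, output_s3_uri); the URI is None unless the loop has completed."""
    desc = a2i_runtime_client.describe_human_loop(HumanLoopName=loop_name)
    status = desc["HumanLoopStatus"]
    output_uri = desc.get("HumanLoopOutput", {}).get("OutputS3Uri")
    return status, (output_uri if status == "Completed" else None)
```

Since the cell above generates the loop name inline, you would need to recover it first, for instance from the final segment of `a2i_response["HumanLoopArn"]` (assuming the ARN ends in `/<loop name>`), and then call `summarize_human_loop(a2iruntime, loop_name)`.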

## Linking the Flow to the Deployed Solution

Now we have a defined A2I human review flow which will:
- Assign incoming tasks to the worker team we defined
- Render the tasks and collect annotation results through the UI template we built
- Output each review result to our `HumanReviewsBucket` as a JSON file.
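
To give a feel for what those JSON files contain, here is a rough sketch of pulling the reviewers' answers out of one. It assumes the output document has a top-level `humanAnswers` list whose entries each carry an `answerContent` object; the keys inside `answerContent` depend on the input names in your task template. `extract_human_answers` is a hypothetical helper, not the solution's callback code.

```python
import json


def extract_human_answers(output_json_text):
    """Return the list of answerContent dicts from one A2I output JSON document."""
    record = json.loads(output_json_text)
    return [answer.get("answerContent", {}) for answer in record.get("humanAnswers", [])]
```

With `TaskCount` set to 1, as in the flow created above, you would expect a single entry in the returned list.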

The Human Reviews component of our solution has already deployed:
- The Step Functions state machine which orchestrates the end-to-end OCR pipeline
- A `StartHumanReview` Lambda function which creates task objects (including a special token to identify the execution to the pipeline)
- A `HumanReviewCallback` Lambda triggered by new files arriving in our `HumanReviewsBucket`

...So to deploy our workflow, all we need to do is point the `StartHumanReview` Lambda at the review flow we want.

▶️ **EITHER** In the [AWS Lambda Functions Console](https://console.aws.amazon.com/lambda/home?#/functions):
- Find your `StartHumanReview` function (the full name will have some extra prefixes and a random suffix)
- Scroll down the function's *Configuration* tab (past the source code) until you see the **Environment variables** section
- Click **Edit**
- Add or update the `DEFAULT_FLOW_DEFINITION_ARN` variable to equal the **Workflow ARN** that we just created


▶️ **OR INSTEAD**, you can update it programmatically with the code below:

In [None]:
# TODO: Expose StartHumanReview Lambda ARN from stack deploy to this notebook

# By the time you've tracked down your Lambda ARN in the console, you may as well update the env var there
# too to be honest... But at least this might help if you're re-configuring it a lot!

lambda_arn = "arn:aws:lambda:???"  # TODO: Find your StartHumanReview Lambda's ARN from the AWS Console

def update_lambda_envvar(lambda_arn, variable, value):
    """Update a single environment variable on a single Lambda Function"""
    print(f"Configuring Lambda:\n{lambda_arn}")

    # Fetch the current configuration:
    lambda_details = lambdaclient.get_function(FunctionName=lambda_arn)

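    # Override the one environment variable we want to update. The function may have no variables
    # set yet, and we copy only "Variables" because that's all the update call accepts:
    Environment = {
        "Variables": dict(lambda_details["Configuration"].get("Environment", {}).get("Variables", {}))
    }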
    print(
        "Updating {}: {} -> {}".format(
            variable,
            Environment["Variables"].get(variable, '[Unset]'),
            value
        )
    )
    Environment["Variables"][variable] = value
    print(Environment)

    response = lambdaclient.update_function_configuration(
        FunctionName=lambda_arn,
        Environment=Environment,
    )
    print("Done!")
    return response

update_lambda_envvar(
    lambda_arn,
    "DEFAULT_FLOW_DEFINITION_ARN",
    flow_arn,
)
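
Rather than copying the function ARN from the console, you could search for it by name fragment. The sketch below assumes the Lambda `list_functions` paginator and that your deployed function's name contains `StartHumanReview`, as described above; `find_lambda_arns` is a hypothetical helper.

```python
def find_lambda_arns(lambda_client, name_fragment="StartHumanReview"):
    """Return ARNs of all Lambda functions whose name contains `name_fragment`."""
    paginator = lambda_client.get_paginator("list_functions")
    matches = []
    for page in paginator.paginate():
        for function in page.get("Functions", []):
            if name_fragment in function["FunctionName"]:
                matches.append(function["FunctionArn"])
    return matches
```

`find_lambda_arns(lambdaclient)` returns a list, so check that it contains exactly one ARN (you may have several deployments in the same region) before assigning it to `lambda_arn`.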

That's it!

Your human review component should now be ready to go. If you've completed the rest of your stack setup, you can try it out by uploading files from the web client!

## More Information

To see how we:

- **Take the decision to invoke human review or not** - see the `Check Confidence` state in the AWS Step Functions [state machine definition](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html), [../StateMachine.asl.json](../StateMachine.asl.json)
- **Invoke the A2I "human loop"** - see the source code in the [fn-start-human-review](fn-start-human-review) folder (or the deployed StartHumanReview Lambda function!)
- **Process the results from S3** - see the source code in the [fn-human-review-callback](fn-human-review-callback) folder (or the deployed HumanReviewCallback Lambda function!)

It's also possible to instead:

- **Have Amazon Textract call Amazon A2I directly** using the [HumanLoopConfig](https://docs.aws.amazon.com/textract/latest/dg/API_AnalyzeDocument.html#Textract-AnalyzeDocument-request-HumanLoopConfig) API parameter, and
- **Let Amazon A2I handle the decision logic** for when a review is required, when using the direct-to-service integrations.
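
As a rough illustration of the direct integration, the request to Textract's `AnalyzeDocument` API could be assembled as sketched below. Note the assumptions: the flow definition passed here would need to be one created for the built-in Textract task type (not the custom flow from this notebook), `FORMS` is just an example feature type, and `build_analyze_document_args` is a hypothetical helper.

```python
def build_analyze_document_args(bucket, key, flow_definition_arn, human_loop_name):
    """Build keyword arguments for Textract analyze_document with an A2I human loop."""
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["FORMS"],
        "HumanLoopConfig": {
            "HumanLoopName": human_loop_name,
            "FlowDefinitionArn": flow_definition_arn,
        },
    }
```

The result could be passed as `boto3.client("textract").analyze_document(**args)`; in that setup Textract, rather than your own code, decides whether to start the human loop based on the conditions in the flow definition.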

For a great worked example of both points, check out the [official sample A2I + Textract notebook](https://github.com/aws-samples/amazon-a2i-sample-jupyter-notebooks/blob/master/Amazon%20Augmented%20AI%20(A2I)%20and%20Textract%20AnalyzeDocument.ipynb) on GitHub!

If you encounter **permissions errors**, check the [Permissions & Security in Amazon A2I](https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-permissions-security.html) guide for more details on what's required.

## The End!