# **Data Labeling and Human-in-the-loop Pipelines**

## **Data Labeling**

### **Data Labeling**

![](2024-01-02-17-02-16.png)

![](2024-01-02-17-02-55.png)

![](2024-01-02-17-03-12.png)

Before you can build, train, and deploy machine learning models, you need data. Successful models are built on high-quality training data, but collecting and labeling training data sets takes a lot of time and effort. To build training data sets, you need to evaluate and label a large number of data samples. These labeling tasks are usually distributed across more than one person, which adds significant overhead and cost. If there are incorrect labels, the system will learn from the bad information and make inaccurate predictions. Let's discuss the concept of data labeling, common data labeling types, its challenges, and ways to label data efficiently in more detail.

A variety of studies have shown that the majority of a data scientist's time is spent on data preparation tasks such as finding, cleaning, and labeling data, and only about 20% of the time is actually spent on developing models and creating insight. Why is this? And how can you implement more efficient data labeling to break this 80/20 rule?

To understand the challenges, let's first define data labeling. In machine learning, data labeling is the process of identifying raw data such as images, text files, and videos, among others, and adding one or more meaningful and informative labels to the data. For example, labels might indicate whether a photo contains a dog or a cat, which words were uttered in an audio recording, or whether an X-ray image shows a tumor. Labels provide context so that the machine learning model can learn from the data. Today, most practical machine learning models utilize supervised learning. For supervised learning to work, you need a labeled set of data that the model can learn from, so it can make the correct decisions. In machine learning, a properly labeled data set that you use as the objective standard to train and assess a given model is often called the ground truth. The accuracy of your trained model will depend on the accuracy of your ground truth.
So spending the time and resources to ensure highly accurate data labeling is essential.

Let's have a look at common types of data labeling. When building a computer vision system, you need to label images or pixels, or create a border that fully encloses an object in a digital image, known as a bounding box, to generate your training data set. You can classify images by type, such as an image showing either a scene from a basketball or a soccer game. Or you can classify images by content, defining what's actually in the image itself, such as a human and a vehicle in the example shown here. These are examples of single-label and multi-label classification. You can also segment an image at the pixel level. This process, known as semantic segmentation, identifies all pixels that fall under a given label and usually involves applying a colored filler or mask over those pixels. You can use labeled image data to build a computer vision model to automatically categorize images, detect the location of objects, or segment an image.

If you need to label video data, you can choose between video classification and video object detection tasks. In a video classification task, you categorize your video clips into specific classes, such as whether the video shows a scene from a concert or sports. In video object detection tasks, you can choose between bounding box, where workers draw bounding boxes around specified objects in your video; polygon, where you draw polygons around specified objects, as shown here in the cars example; polyline, where you draw polylines around specified objects, as shown here in the running track video; or key point, where you draw key points around specified objects, as shown here in the volleyball game example. Instead of just detecting objects, you can also track objects in video data using the same data labeling techniques shown here.
The difference is that instead of looking at each video frame individually, you track the movement of objects across a sequence of video frames.

In natural language processing, you identify important sections of text or tag the text with specific labels to generate your training data set. For example, you may want to identify the sentiment or intent of a text. In a single-label classification task, this might be assigning the label positive or negative to a text. Or you might want to assign multiple labels, such as positive and inspiring, to the text. This would be an example of multi-label classification. With named entity recognition, you apply labels to words within a larger text, for example, to identify places and people. Natural language processing models are used for text classification, sentiment analysis, named entity recognition, and optical character recognition.

The biggest challenge in data labeling is the massive scale. Machine learning models need large labeled data sets. This could be tens of thousands of images to train a computer vision model or thousands of documents to fine-tune a natural language model. Another challenge is the need for high accuracy. Machine learning models depend on accurately labeled data. If there are incorrect labels, the system will learn from the bad information and make inaccurate predictions. A third challenge is time. Data labeling is time-consuming. As discussed, building a training data set can take up to 80% of a data scientist's time.

So how can you label data more efficiently? To address the previously mentioned challenges, you can combine human labelers with managed data labeling services. These data labeling services provide additional tools to scale the labeling efforts and access additional human workforces, to train a model based on human feedback so it can perform automated data labeling, and to increase the labeling quality by offering additional features that assist the human labelers. In the next video, I will introduce you to one of those managed data labeling services, Amazon SageMaker Ground Truth.

![](2024-01-03-11-44-40.png)

![](2024-01-03-11-46-43.png)

![](2024-01-03-11-47-59.png)

![](2024-01-03-11-50-18.png)

![](2024-01-03-11-52-05.png)

![](2024-01-03-11-52-56.png)

![](2024-01-03-11-54-10.png)

### **Data Labeling with Amazon SageMaker Ground Truth**

![](2024-01-03-12-02-09.png)

![](2024-01-03-12-04-06.png)

Ground Truth provides a managed experience where you can set up an entire data labeling job with only a few steps. Ground Truth helps you efficiently perform highly accurate data labeling on data stored in Amazon Simple Storage Service (Amazon S3), using a combination of automated data labeling and human-performed labeling. You can choose to use a crowdsourced Amazon Mechanical Turk workforce of over 500,000 labelers, a private team of your co-workers, or one of the third-party data labeling service providers listed on the AWS Marketplace, which are pre-screened by Amazon. Let's dive deeper into how you can use Ground Truth to perform data labeling more efficiently and at scale.

Creating a data labeling job takes only minutes, either via the AWS Management Console or through a call to an application programming interface (API). First, as part of the data labeling job creation, you provide a pointer to the S3 bucket that contains the input dataset to be labeled. Then, you define the labeling task and provide relevant labeling instructions. Ground Truth offers templates for common labeling tasks, where you need to click only a few choices and provide minimal instructions on how to get your data labeled. Alternatively, you can create your own custom template. As the last step of creating a labeling job, you select a human workforce. You can choose between a public crowdsourced workforce, a curated set of third-party data labeling service providers, or your own workers through a private workforce team. Once the labeling job is completed, you can find the labeled data set in Amazon S3.

Let's walk through the individual steps one by one. The first step in creating the data labeling job is to set up the input data that you want to label. You store your input dataset in Amazon S3 buckets.
In the automated data setup, you only need to provide the S3 location of the dataset you want to label, and Ground Truth will identify the dataset and connect the data to your labeling job. As part of this automated setup, Ground Truth creates an input manifest file, which identifies the objects you want to label. If you choose manual data setup, you have to provide your own manifest file. An input manifest file contains the list of objects to label. Each line in the manifest file is a complete and valid JSON object and contains either a source-ref or a source JSON key. The source-ref value points to the S3 object to label. This approach is commonly used for binary objects, such as images in your S3 bucket. The source key is used if the object to label is text. In this case, you can store the text directly as the value in the manifest file. Once the labeling job is complete, Ground Truth will create an output manifest file in the S3 bucket as well. The output file will contain the results of the labeling job.

Next, you need to select the labeling task. This defines the type of data that needs to be labeled. Ground Truth has several built-in task types, which also come with corresponding pre-built worker task templates. If you need to label image data, you can choose between single-label or multi-label image classification tasks, bounding boxes, semantic segmentation, or label verification, where workers verify existing labels in your dataset. If you need to label video data, you can choose between video clip classification, video object detection, and video object tracking tasks. In video object detection and tracking tasks, you can choose between bounding box, polygon, polyline, or key point. If you need to label text data, you can choose between single-label or multi-label text classification or named entity recognition. You can also define a custom labeling task when the predefined tasks don't meet your needs.
Ground Truth provides templates that you can use as a starting point. You can also create custom HTML if necessary. Let's have a quick look at how you can build a custom labeling task. 
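
To make the input manifest format described above more concrete, here is a minimal Python sketch that writes one JSON object per line, using the source-ref key for images in S3 and the source key for inline text. The bucket name, file names, and sample reviews are hypothetical placeholders, and the helper function names are illustrative only.

```python
import json


def image_manifest_lines(s3_uris):
    """Return one JSON line per image; 'source-ref' points to the S3 object."""
    return [json.dumps({"source-ref": uri}) for uri in s3_uris]


def text_manifest_lines(texts):
    """Return one JSON line per text; 'source' holds the text itself."""
    return [json.dumps({"source": text}) for text in texts]


# Hypothetical sample data, for illustration only.
image_lines = image_manifest_lines([
    "s3://example-bucket/images/dog-001.jpg",
    "s3://example-bucket/images/cat-001.jpg",
])
text_lines = text_manifest_lines(["I enjoyed this product", "It's okay"])

# A manifest file is the lines joined by newlines (JSON Lines format).
image_manifest = "\n".join(image_lines) + "\n"
text_manifest = "\n".join(text_lines) + "\n"
print(image_manifest)
print(text_manifest)
```

Each resulting string could then be uploaded to S3 as the manifest file for a manual data setup.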

![](2024-01-03-12-05-42.png)

![](2024-01-03-12-06-43.png)

![](2024-01-03-12-07-17.png)

![](2024-01-03-12-07-55.png)

![](2024-01-03-12-08-13.png)

![](2024-01-03-12-08-38.png)

![](2024-01-03-12-09-31.png)

As part of the custom labeling task, you create custom AWS Lambda functions that run before and after each data object is sent to the worker. The pre-annotation Lambda function is initiated for each data object sent to your labeling job and pre-processes it before sending it to the workers. The post-annotation Lambda function processes the results once workers submit a task. If you specify multiple workers per data object, this function may include logic to consolidate the annotations.

The next step in defining the data labeling job is to select the human workforce. Use the workforce of your choice to label your dataset. You can choose your workforce from the following options. You can use the Amazon Mechanical Turk workforce of over 500,000 independent contractors worldwide. If your data requires domain knowledge or contains sensitive data, you can use a private workforce based on your employees or co-workers. You can also choose a vendor company in the AWS Marketplace that specializes in data labeling services. These vendor companies are pre-screened by AWS and have gone through SOC 2 compliance and an AWS security review. In this week's lab assignment, you will work with a private workforce, consisting of yourself, to perform data labeling.

As a best practice, you can restrict worker access to tasks by defining allowable IP addresses. By default, a workforce isn't restricted to specific IP addresses. You can use the UpdateWorkforce operation to require that workers use a specific range of IP addresses to access tasks. Workers who attempt to access tasks using any IP address outside the specified ranges are denied and get a Not Found error message on the worker portal.

Here's an example of how you can set up a private workforce using the AWS Management Console. First, navigate to the Amazon SageMaker UI and select "Labeling workforces" on the Ground Truth menu. Then select "Private" and click "Create private team."
The next dialog box will present you with the available options for creating a private team: Amazon Cognito or OpenID Connect. Amazon Cognito provides authentication, authorization, and user management for apps. This enables your workers to sign in directly to the labeling UI with a username and a password. You can use Amazon Cognito to connect to your enterprise identity provider as well. The second option for worker authentication is OpenID Connect, or OIDC, which is an identity layer built on top of the OAuth 2.0 framework. You can use this option to set up a private work team with your own identity provider. To finish the workforce setup, you provide a team name and invite new workers by email, or you can import workers from an existing Amazon Cognito user group. You can also enable notifications to notify your workers about available work.

The last step in defining the data labeling job is to create the labeling task UI, which presents the human labeler with the data to label and the labeling instructions. You can think of it as the instructions page. The human task UI is defined as an HTML template. If you have chosen one of the built-in labeling tasks, you can use a predefined task template and customize it to your needs. You can also write a custom task template. You can create the task template using HTML, CSS, JavaScript, the Liquid template language, and Crowd HTML Elements. Liquid is used to automate the template. Crowd HTML Elements can be used to include common annotation tools and to provide the logic to submit results to Ground Truth. The easiest way to get started is to use one of the available samples or built-in templates and customize it to your needs. Here is a sample task UI template for text classification. You can customize the instructions to match the use case of classifying product reviews, for example. In the full instructions section, you can provide more detailed instructions for how to label the data correctly.
In this example, you are instructing the labeler to classify the reviews into the sentiment classes minus one for negative, zero for neutral, and one for positive. Here you can see the resulting instructions page for the workers. You can see the data to be labeled, as well as the labeling options of minus one, zero, and one, and the labeling instructions in the title and on the left side of the screen.
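
The steps walked through in this section (input data, labeling task, workforce, and task UI) all end up as fields of a single job request. The following sketch assembles such a request as a plain Python dictionary, using field names from the SageMaker `CreateLabelingJob` API. All ARNs, S3 URIs, names, and the time limit are hypothetical placeholders, and the helper function itself is illustrative, not part of any SDK.

```python
def build_labeling_job_request(job_name, manifest_s3_uri, output_s3_uri,
                               role_arn, workteam_arn, ui_template_s3_uri,
                               pre_lambda_arn, consolidation_lambda_arn,
                               workers_per_object=3):
    """Assemble a CreateLabelingJob-style request (illustrative sketch)."""
    return {
        "LabelingJobName": job_name,
        # Name of the key that will hold the label in the output manifest.
        "LabelAttributeName": "sentiment",
        # Step 1: input data, referenced through the input manifest file.
        "InputConfig": {
            "DataSource": {"S3DataSource": {"ManifestS3Uri": manifest_s3_uri}}
        },
        "OutputConfig": {"S3OutputPath": output_s3_uri},
        "RoleArn": role_arn,
        "HumanTaskConfig": {
            # Step 3: the human workforce (here a private work team).
            "WorkteamArn": workteam_arn,
            # Step 4: the task UI template shown to the labelers.
            "UiConfig": {"UiTemplateS3Uri": ui_template_s3_uri},
            # Step 2: Lambda functions run before and after annotation.
            "PreHumanTaskLambdaArn": pre_lambda_arn,
            "AnnotationConsolidationConfig": {
                "AnnotationConsolidationLambdaArn": consolidation_lambda_arn
            },
            "TaskTitle": "Classify product reviews",
            "TaskDescription": "Assign -1, 0, or 1 to each review",
            "NumberOfHumanWorkersPerDataObject": workers_per_object,
            "TaskTimeLimitInSeconds": 300,  # assumed value
        },
    }


# Hypothetical identifiers, for illustration only.
request = build_labeling_job_request(
    job_name="product-review-sentiment",
    manifest_s3_uri="s3://example-bucket/input/reviews.manifest",
    output_s3_uri="s3://example-bucket/output/",
    role_arn="arn:aws:iam::111122223333:role/ExampleLabelingRole",
    workteam_arn="arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/example-team",
    ui_template_s3_uri="s3://example-bucket/templates/text-classification.liquid.html",
    pre_lambda_arn="arn:aws:lambda:us-east-1:111122223333:function:example-pre-annotation",
    consolidation_lambda_arn="arn:aws:lambda:us-east-1:111122223333:function:example-post-annotation",
)
print(request["HumanTaskConfig"]["TaskTitle"])
```

With boto3 installed and credentials configured, a dictionary like this would typically be passed as `boto3.client("sagemaker").create_labeling_job(**request)`.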

![](2024-01-03-12-09-54.png)

![](2024-01-03-12-11-21.png)

![](2024-01-03-12-12-42.png)

![](2024-01-03-12-13-09.png)

![](2024-01-03-12-14-14.png)

![](2024-01-03-12-15-08.png)

![](2024-01-03-12-15-36.png)

![](2024-01-03-12-16-15.png)

### **Data Labeling Best Practices**

![](2024-01-03-12-25-57.png)

![](2024-01-03-12-27-46.png)

There are many techniques to improve the efficiency and accuracy of data labeling. I'll summarize them here and then dive into the details of each. Communicate with the labelers through labeling instructions, and make them clear to help ensure high accuracy. Consolidate annotations to improve label quality. Labeler consensus helps to counteract the error of individual annotators. Labeler consensus involves sending each data set object to multiple annotators and then consolidating the annotations into a single label. Audit labels to verify the accuracy of labels and adjust them as necessary. Use automated data labeling on large data sets. Active learning is a machine learning technique that identifies data that should be labeled by workers. In Ground Truth, this functionality is called automated data labeling. Automated data labeling helps to reduce the cost and time that it takes to label your data set compared to using only humans. Automated data labeling is most appropriate when you have thousands of data objects. The minimum number of objects allowed for automated data labeling is 1,250, but it's better to start with 5,000 objects. Another technique to improve the efficiency of data labeling is to reuse prior labeling jobs to create hierarchical labels. This chaining of labeling jobs allows you to use the prior job's output manifest file as the new job's input manifest. For example, a first data labeling job could classify objects in an image into cats and dogs. In a second labeling job, you filter the images that contain a dog and add additional labels for the specific dog breed.

Now, let's have a look at these techniques in a little bit more detail. First, communicate with the labelers through labeling instructions, and make them clear to help ensure high accuracy. For example, provide samples of good and bad annotations. Also, minimize the choice of labels and show only the relevant labels when you assign them to workers.
When you define the labeling job, you can also specify how many workers receive the same object to label. This will help to counteract the error in individual annotations. If you send the same task to more than one worker, you need to implement a method for consolidating the annotations. Ground Truth provides built-in annotation consolidation logic using consensus-based algorithms. Ground Truth provides an annotation consolidation function for each of its predefined labeling tasks: bounding box, image classification, named entity recognition, semantic segmentation, and text classification. You can also create custom annotation consolidation functions. In general, these functions first assess the similarity between annotations and then assess the most likely label. In the example shown here, the same text classification task can result in different labels from labelers A, B, and C. In the case of discrete, mutually exclusive categories, such as the example shown here, this can be straightforward. One of the most common ways to do this is to take the result of a majority vote between the annotations. This weighs the annotations equally.
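
As a rough illustration of the majority-vote consolidation described above, the following Python sketch weighs every annotation equally and returns the winning label together with the share of workers who chose it. This is a simplified stand-in written for these notes, not Ground Truth's built-in consolidation logic; the function name and the choice to return `None` on a tie are assumptions.

```python
from collections import Counter


def consolidate_by_majority_vote(annotations):
    """Return (label, agreement) for equally weighted annotations.

    If two or more labels tie for first place, label is None so the
    object can be sent out for additional review.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(annotations)
    top_label, top_votes = counts.most_common(1)[0]
    tied = [label for label, votes in counts.items() if votes == top_votes]
    agreement = top_votes / len(annotations)
    if len(tied) > 1:
        return None, agreement
    return top_label, agreement


# Labelers A, B, and C classify the same review as 1, 1, and 0.
label, agreement = consolidate_by_majority_vote([1, 1, 0])
print(label, round(agreement, 2))
```

Returning the agreement share alongside the label makes it possible to flag low-consensus objects for the audit step, which is one reason to prefer it over returning the label alone.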

![](2024-01-03-12-28-27.png)

![](2024-01-03-12-29-36.png)

![](2024-01-03-12-31-20.png)

Another best practice for data labeling is to audit labels to verify their accuracy and adjust them as necessary. A sample data labeling pipeline for achieving this could look like this. In step 1, unlabeled data is labeled via your choice of work team to create the initial labels. In step 2, an audit and adjustment workflow for improving quality can be set up to review and adjust labels. Note that by default, Ground Truth processes all objects in your input manifest file again in this setup. You can also filter out objects you don't want to verify to reduce time and costs. New adjusted labels are appended to the initial labels from step 1. This makes it easier to calculate the delta and measure key performance indicators (KPIs). In step 3, you can then use the random selection capability in Ground Truth to perform an audit that will calculate an error rate based on the sample data.

Automated data labeling is optional. When you choose automated data labeling, part of your data set is labeled using active learning. Again, active learning is a technique that identifies data objects that need to be labeled by humans and data objects that can be labeled by a machine learning model. In this process, a machine learning model for labeling data is first trained on a subset of your raw data that humans have labeled. Where the labeling model has high confidence in its results based on what it has learned so far, it will automatically apply labels to the raw data. Where the labeling model has lower confidence in its results, it will pass the data to humans to do the labeling. The human-generated labels are then provided back to the labeling model for it to learn and improve its ability to automatically label the next set of raw data. Over time, the model can label more and more data automatically and substantially speed up the creation of training datasets.
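
One round of the active learning loop described above can be sketched as a simple split: objects the labeling model is confident about are labeled automatically, and the rest are queued for humans. The toy model, the sample reviews, and the 0.9 threshold below are all assumptions for illustration; Ground Truth manages its own model and confidence criteria internally.

```python
def toy_labeling_model(text):
    """Stand-in for a trained labeling model: returns (label, confidence)."""
    lowered = text.lower()
    if "love" in lowered:
        return 1, 0.97
    if "terrible" in lowered:
        return -1, 0.95
    return 0, 0.54  # the toy model is unsure about everything else


def active_learning_round(unlabeled, predict, threshold=0.9):
    """Split objects into auto-labeled results and a human labeling queue."""
    auto_labeled, human_queue = {}, []
    for item in unlabeled:
        label, confidence = predict(item)
        if confidence >= threshold:
            auto_labeled[item] = label   # high confidence: label automatically
        else:
            human_queue.append(item)     # low confidence: send to humans
    return auto_labeled, human_queue


reviews = ["I love this product", "Terrible quality", "It is okay"]
auto_labeled, human_queue = active_learning_round(reviews, toy_labeling_model)
print(auto_labeled)
print(human_queue)
```

In a full loop, the human labels collected for `human_queue` would be added to the training data and the labeling model retrained before the next round.
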
Note that the minimum number of objects allowed for automated data labeling is 1,250, but it is better to have 5,000 objects. The machine learning model for performing the automated data labeling is trained using a SageMaker built-in algorithm that depends on the use case. For example, the model uses the BlazingText algorithm for text classification tasks or the built-in object detection algorithm for bounding box tasks.

Another technique for improving the efficiency of data labeling is to reuse prior labeling jobs to create hierarchical labels. This chaining of labeling jobs allows you to use the prior job's output manifest file as the new job's input manifest. For example, as shown here, a first data labeling job could classify objects in an image into cats and dogs, and in a second labeling job, you could filter the images that contain a dog and add an additional label for the specific dog breed. The result of the second labeling job is an augmented manifest file. Augmented refers to the fact that the manifest file contains the labels from both labeling jobs.
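
The filtering step between two chained jobs can be sketched in a few lines of Python. The sketch assumes that each output manifest line stores the class name under a `<label attribute name>-metadata` key with a `class-name` field; the label attribute name `species`, the bucket, and the sample records are hypothetical, so check the layout of your own output manifest before relying on it.

```python
import json


def filter_manifest(lines, label_attribute, class_name):
    """Keep only manifest lines whose first-job label equals class_name."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        metadata = record.get(f"{label_attribute}-metadata", {})
        if metadata.get("class-name") == class_name:
            kept.append(json.dumps(record))
    return kept


# Hypothetical output of a first job that labeled images as cat or dog.
first_job_output = [
    json.dumps({"source-ref": "s3://example-bucket/img/001.jpg",
                "species": 0, "species-metadata": {"class-name": "dog"}}),
    json.dumps({"source-ref": "s3://example-bucket/img/002.jpg",
                "species": 1, "species-metadata": {"class-name": "cat"}}),
]

# Input manifest for the second (dog breed) job: dog images only.
second_job_input = filter_manifest(first_job_output, "species", "dog")
print(second_job_input)
```

Because the filtered lines keep the first job's label, the second job's output contains the labels from both jobs, which matches the augmented manifest described above.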

![](2024-01-03-12-32-17.png)

![](2024-01-03-12-32-41.png)

![](2024-01-03-12-33-05.png)

![](2024-01-03-12-33-37.png)

## **Human-in-the-loop Pipelines**

### **Human-In-The-Loop Pipelines**

![](2024-01-02-21-38-51.png)

Some machine learning applications need human oversight to ensure accuracy with sensitive data, to help provide continuous improvements, and to retrain models with updated predictions. However, in these situations, you're often forced to choose between a machine-learning-only or a human-only system. But what if you want the best of both worlds: integrating machine learning systems into your workflow while keeping a human eye on the results to provide a required level of precision? This concept is called human-in-the-loop pipelines. You can allow human reviewers to step in when a model is unable to make a high-confidence prediction, or to audit its predictions on an ongoing basis. For example, in image classification tasks, you can allow human reviewers to verify that the appropriate class has been selected. In the example shown here, a human reviewer would verify that the appropriate dog breed has been assigned by the machine learning model. Another common use case is to review form extraction tasks. Extracting information from scanned employment or mortgage application forms can require human review in some cases due to low-quality scans or poor handwriting.

Now, let's have a look at how to implement human reviews of model predictions. First, a client application sends input data to your machine learning model, and the model makes a prediction. If the prediction comes back with a high confidence score, the prediction result is returned directly to the client application. If the prediction comes back with a low confidence score, the result is sent for human review. The human reviewers correct the result if needed, and the consolidated result across all human reviewers is returned to the client application. You should also store the reviewed prediction result and make it available to retrain the model, so it can improve over time. Note that the confidence threshold value that decides whether to start a human loop for review will depend on your use case and requirements.
Instead of starting a human loop based on the model's prediction confidence, you could also decide to review a random sampling percentage of the model predictions. Now, if you were to implement this human review of ML predictions manually, you would need to coordinate across ML scientists, engineering, and operations teams. You would need to manage a large number of reviewers and write custom software to manage the review tasks. Also, with manual processes, it's difficult to achieve high review accuracy. This is another scenario in which you can benefit from managed services to orchestrate the human-in-the-loop pipelines for you. One of those managed services is Amazon Augmented AI, or Amazon A2I.
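
The two triggers for a human review described above, a confidence threshold and a random sampling percentage, can be combined in one small decision function. This is a minimal sketch: the function name, the default threshold of 0.9, and the `rng` parameter (injected so the behavior can be tested) are assumptions for illustration.

```python
import random


def needs_human_review(confidence, threshold=0.9, sample_rate=0.0, rng=random):
    """Decide whether a prediction should be routed to human reviewers.

    A review is triggered when the model's confidence is below the
    threshold, or otherwise for a random fraction (sample_rate) of
    predictions as an ongoing audit.
    """
    if confidence < threshold:
        return True
    return rng.random() < sample_rate


# Low-confidence predictions always go to review.
print(needs_human_review(0.54))
# High-confidence predictions skip review when no audit sampling is set.
print(needs_human_review(0.97))
# With sample_rate=0.05, roughly 5% of confident predictions are audited.
audited = sum(needs_human_review(0.97, sample_rate=0.05) for _ in range(1000))
print(audited)
```

Predictions that do not need review are returned to the client directly; the others are held until the consolidated human result is available.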

![](2024-01-03-10-46-02.png)

![](2024-01-03-10-46-29.png)

![](2024-01-03-10-47-07.png)

![](2024-01-03-10-48-00.png)

![](2024-01-03-10-48-35.png)

### **Human-In-The-Loop Pipelines with Amazon Augmented AI (Amazon A2I)**

![](2024-01-03-11-12-19.png)

![](2024-01-03-11-13-20.png)

Let's discuss how you build human-in-the-loop pipelines with Amazon Augmented AI, or Amazon A2I. Here is the human-in-the-loop workflow again, to implement human review of model predictions. Amazon A2I makes it easy to build and manage human reviews for machine learning applications. Amazon A2I provides built-in human review workflows for common machine learning use cases such as text extraction from documents, which allows predictions from, for example, Amazon Textract to be reviewed easily. You can also create your own workflows for machine learning models built on SageMaker or any other tools. Using Amazon A2I, you can allow human reviewers to step in when a model is unable to make a high-confidence prediction or to audit its predictions on an ongoing basis.

Let's dive deeper into the individual steps for building a human-in-the-loop review system for model predictions. First, you need to define the human workforce again. Similar to Ground Truth, which was shown earlier this week, you can choose between the Amazon Mechanical Turk workforce of over 500,000 independent contractors worldwide, a private workforce consisting of your employees or co-workers, or a vendor company listed on the AWS Marketplace. The workforce setup steps are exactly the same, as Amazon A2I leverages the workforce teams set up by Ground Truth. In this example, let's reuse the private workforce setup I demonstrated earlier this week.

Next, you need to define the human task UI, which is the instructions page for your human reviewers. Again, this is done exactly the same way you set up the task UI for the data labelers in the Ground Truth example. I will reuse the text classification UI to classify product reviews into the three sentiment classes.

In a third step, you need to define the human review workflow. The human review workflow is defined in a flow definition. The flow definition specifies the workforce that your tasks are sent to.
It also specifies the set of instructions that your workforce receives, which is the task UI template you created in the previous step, and the configuration of your worker tasks, including the number of workers that receive a task. Finally, the flow definition specifies where your output data is stored.

Amazon A2I provides built-in human review workflows for common ML use cases such as content moderation and text extraction from documents. For this purpose, Amazon A2I is integrated with AWS AI services, including Amazon Textract. You can also create your custom workflows to integrate with additional AI services or ML models built on SageMaker or any other tools. Let's have a look at those options. Amazon Textract is a document analysis service that detects and extracts printed text, handwriting, and structured data, such as fields of interest and their values, and tables, from images and scans of documents. Thanks to the built-in integration, you can start a human review workflow within an application programming interface (API) call to Amazon Textract by providing the conditions that cause a human loop to be called. You can also build your custom human review workflow with other AWS AI services by providing some lines of code to define the conditions that cause the human loop to be created.

Let's have a look at how you could start a human loop on low-confidence model predictions from Amazon Comprehend. Amazon Comprehend is a natural language processing service that uses machine learning to uncover information in unstructured data. Let's use Amazon Comprehend to classify our product reviews. First, let's define a few sample product reviews, such as "I enjoyed this product" and "It's okay." You also need to define a condition for when to start the human loop. For this example, let's say you want to review all predictions that are returned with a confidence score lower than 90 percent.
Next, you call the Amazon Comprehend API and provide the sample reviews and an Amazon Comprehend model endpoint. Note that in this example, I've already trained a custom Amazon Comprehend model on the product reviews data and the three sentiment classes. This custom Amazon Comprehend model is available via the provided endpoint Amazon Resource Name, or ARN. Then I parse the sentiment class and confidence score from the Amazon Comprehend API response. In a simple if-clause, I can check whether the returned confidence score is under the defined threshold of 90 percent, which means I want to start the human loop with the predicted sentiment class and the actual sample review as inputs.

Lastly, let's discuss how you can create your human review workflow for custom ML models built on SageMaker. Similar to the previous example of how Amazon A2I can work with AWS AI services, you can also use Amazon A2I to review real-time low-confidence predictions made by a model deployed to a SageMaker hosted endpoint and incrementally train your model using Amazon A2I output data. From your notebook environment, let's define a custom SageMaker Predictor class to specify how to process the model inputs and how to process the model outputs. You can do this via the serializer and deserializer settings. Then let's deploy the fine-tuned PyTorch RoBERTa model as a SageMaker model endpoint. You might recall that RoBERTa stands for Robustly Optimized BERT Pretraining Approach. You can use the PyTorchModel class, provide a model name, specify the custom predictor I created in the previous step, and provide the S3 location of the model artifact, the model.tar.gz file.
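
The Amazon Comprehend flow described above (classify, parse the top class and score, start a human loop below 90 percent) might look roughly like the following Python sketch. The clients are passed in as parameters so the logic can be exercised without AWS access. The `classify_document` and `start_human_loop` calls are existing boto3 operations, but the function name, the ARNs you pass in, and the `initialValue` and `taskObject` input keys are assumptions that must match your own endpoint, flow definition, and task UI template.

```python
import json
import uuid

CONFIDENCE_THRESHOLD = 0.90  # review everything below 90 percent


def classify_with_human_review(review, comprehend, a2i,
                               endpoint_arn, flow_definition_arn):
    """Classify a review and start a human loop on low confidence.

    Returns (sentiment, score, human_loop_name); human_loop_name is
    None when no human loop was started.
    """
    response = comprehend.classify_document(Text=review,
                                            EndpointArn=endpoint_arn)
    # Pick the class with the highest confidence score.
    top = max(response["Classes"], key=lambda c: c["Score"])
    sentiment, score = top["Name"], top["Score"]

    human_loop_name = None
    if score < CONFIDENCE_THRESHOLD:
        human_loop_name = str(uuid.uuid4())
        # The keys inside InputContent are assumed to match the task UI.
        input_content = {"initialValue": sentiment, "taskObject": review}
        a2i.start_human_loop(
            HumanLoopName=human_loop_name,
            FlowDefinitionArn=flow_definition_arn,
            HumanLoopInput={"InputContent": json.dumps(input_content)},
        )
        print(f"Started human loop {human_loop_name} "
              f"(confidence {score:.2f}) for review: {review}")
    return sentiment, score, human_loop_name
```

With boto3 available, the two clients would typically be created as `boto3.client("comprehend")` and `boto3.client("sagemaker-a2i-runtime")`.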

![](2024-01-03-11-14-24.png)

![](2024-01-03-11-15-04.png)

![](2024-01-03-11-15-32.png)

Then you can deploy the model by calling model.deploy and providing a model endpoint name. Once the model endpoint is ready, you can send the sample reviews to the model again via the predictor.predict API call. Note that you need to pass the reviews in the JSON format the model expects as input. Then you parse the model response again to obtain the predicted label and the confidence score. Now, you are ready to define the human loop condition again. In an if-clause similar to the previous one, you can check whether the returned confidence score is under the defined threshold of 90 percent, which again means you want to start the human loop with the predicted label and the review as inputs. You can also add print statements to document when the human loops get started. Here you can see a sample result showing that a human loop was started for a low-confidence prediction of 54 percent for the sample review "It is okay." To verify the human loop results, you can capture the completed human loops' API responses. A sample result of the human loop input content and the human answer is shown here.
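
To illustrate the verification step, the sketch below extracts the input content and the human answers from the output document of a completed human loop. It assumes the output JSON contains `humanLoopName`, `inputContent`, and a `humanAnswers` list with an `answerContent` entry per reviewer; the sample document, including the shape of the answer, is hypothetical and depends on your task UI template.

```python
import json


def summarize_human_loop_output(output_json):
    """Extract input content and human answers from a human loop output."""
    record = json.loads(output_json)
    answers = [answer.get("answerContent")
               for answer in record.get("humanAnswers", [])]
    return {
        "human_loop_name": record.get("humanLoopName"),
        "input_content": record.get("inputContent"),
        "answers": answers,
    }


# Hypothetical output document, for illustration only.
sample_output = json.dumps({
    "humanLoopName": "example-loop-1",
    "inputContent": {"initialValue": "0", "taskObject": "It is okay"},
    "humanAnswers": [{"answerContent": {"sentiment": {"label": "0"}}}],
})

summary = summarize_human_loop_output(sample_output)
print(summary["input_content"])
print(summary["answers"])
```

In practice, the output document would be read from the S3 location reported for the completed human loop, and the reviewed labels could then be stored for retraining the model.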
Let's dive deeper into the individual steps for building a human-in-the-loop review system for model predictions.

First, you need to define the human workforce. Similar to Ground Truth, which was shown earlier this week, you can choose between the Amazon Mechanical Turk workforce of over 500,000 independent contractors worldwide, a private workforce consisting of your employees or co-workers, or a vendor company listed on the AWS Marketplace. The workforce setup steps are exactly the same, because Amazon A2I uses the work teams set up through Ground Truth. In this example, let's reuse the private workforce I demonstrated earlier this week.

Next, you need to define the human task UI, which is the instructions page for your human reviewers. Again, this is done exactly the same way you set up the task UI for the data labelers in the Ground Truth example. I will reuse the text classification UI to classify product reviews into the three sentiment classes.

In a third step, you need to define the human review workflow. The human review workflow is defined in a flow definition. The flow definition specifies the workforce your tasks are sent to, the set of instructions that your workforce receives (the task UI template you created in the previous step), and the configuration of your worker tasks, including the number of workers that receive a task. The flow definition also specifies where your output data is stored.

Amazon A2I provides built-in human review workflows for common ML use cases such as content moderation and text extraction from documents. For this purpose, Amazon A2I is integrated with AWS AI services, including Amazon Textract. You can also create custom workflows to integrate with additional AI services or with ML models built on SageMaker or any other tools. Let's have a look at those options.

Amazon Textract is a document analysis service that detects and extracts printed text, handwriting, and structured data, such as fields of interest with their values and tables, from images and scans of documents. Thanks to the built-in integration, you can start a human review workflow within an application programming interface (API) call to Amazon Textract by providing the conditions that cause a human loop to be started. You can also build a custom human review workflow with other AWS AI services by writing a few lines of code that define the conditions under which the human loop is created. Let's have a look at how you could start a human loop on low-confidence model predictions from Amazon Comprehend.
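Before moving on to the Amazon Comprehend example, here is a minimal sketch of what the flow definition from the third step might look like as a request to the SageMaker `create_flow_definition` API. The helper name `build_flow_definition_request`, the flow definition name, the task title and description, and every ARN and S3 path below are hypothetical placeholders, not values from the course. The sketch only assembles the request, so it runs without AWS credentials.

```python
def build_flow_definition_request(name, role_arn, workteam_arn,
                                  task_ui_arn, output_s3_uri, task_count=1):
    """Assemble keyword arguments for SageMaker's create_flow_definition call."""
    return {
        "FlowDefinitionName": name,
        "RoleArn": role_arn,
        "HumanLoopConfig": {
            # The workforce that receives the review tasks
            "WorkteamArn": workteam_arn,
            # The task UI template (instructions page) shown to reviewers
            "HumanTaskUiArn": task_ui_arn,
            # Number of workers that receive each task
            "TaskCount": task_count,
            # Hypothetical wording for the task shown to reviewers
            "TaskDescription": "Classify product reviews into sentiment classes",
            "TaskTitle": "Classify review sentiment",
        },
        # Where Amazon A2I stores the human review output
        "OutputConfig": {"S3OutputPath": output_s3_uri},
    }


# All identifiers below are placeholders for illustration only.
request = build_flow_definition_request(
    name="fd-product-reviews",
    role_arn="arn:aws:iam::111122223333:role/ExampleA2IRole",
    workteam_arn="arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/example-team",
    task_ui_arn="arn:aws:sagemaker:us-east-1:111122223333:human-task-ui/example-ui",
    output_s3_uri="s3://example-bucket/a2i-results",
)

# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("sagemaker").create_flow_definition(**request)
```

Keeping the request as a plain dictionary makes it easy to inspect the three parts the flow definition brings together (workforce, task UI, output location) before anything is created in your account.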

![](2024-01-03-11-16-33.png)

![](2024-01-03-11-16-54.png)

![](2024-01-03-11-17-25.png)

![](2024-01-03-11-18-15.png)

![](2024-01-03-11-19-22.png)

![](2024-01-03-11-20-42.png)

![](2024-01-03-11-21-31.png)

Amazon Comprehend is a natural language processing service that uses machine learning to uncover information in unstructured data. Let's use Amazon Comprehend to classify our product reviews. First, let's define a few sample product reviews, such as "I enjoyed this product" and "it's okay". You also need to define a condition for when to start the human loop. For this example, let's say you want to review all predictions that are returned with a confidence score lower than 90 percent. Next, you call the Amazon Comprehend API and provide the sample reviews and an Amazon Comprehend model endpoint. Note that in this example I've already trained a custom Amazon Comprehend model on the product reviews data and the three sentiment classes. This custom Amazon Comprehend model is available via the provided endpoint Amazon Resource Name, or ARN. Then I parse the sentiment class and confidence score from the Amazon Comprehend API response. In a simple if-clause, I check whether the returned confidence score is under the defined threshold of 90 percent. If it is, I start the human loop with the predicted sentiment class and the actual sample review as inputs.

Lastly, let's discuss how you can create your human review workflow for custom ML models built on SageMaker. As in the previous example of how Amazon A2I can work with AWS AI services, you can also use Amazon A2I to review real-time low-confidence predictions made by a model deployed to a SageMaker hosted endpoint, and incrementally train your model using the Amazon A2I output data. From your notebook environment, let's define a custom SageMaker Predictor class to specify how to process the model inputs and how to process the model outputs. You can do this via the serializer and deserializer settings. Then let's deploy the fine-tuned PyTorch RoBERTa model as a SageMaker model endpoint. You might recall that RoBERTa stands for Robustly Optimized BERT Pretraining Approach.
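To make the Amazon Comprehend check described above concrete before continuing with the SageMaker endpoint, here is a minimal sketch. It assumes `comprehend` behaves like `boto3.client("comprehend")` and `a2i` like `boto3.client("sagemaker-a2i-runtime")`. The function name `review_with_comprehend` and the `initialValue` / `taskObject` input keys are assumptions. The keys have to match whatever variables your task UI template reads.

```python
import json
import uuid

CONFIDENCE_THRESHOLD = 0.90  # review every prediction scored below 90 percent


def review_with_comprehend(review, comprehend, a2i, endpoint_arn,
                           flow_definition_arn,
                           threshold=CONFIDENCE_THRESHOLD):
    """Classify one review and start a human loop if confidence is low.

    Returns (sentiment, confidence, human_loop_name); the loop name is
    None when no human review was requested.
    """
    # Call the custom classifier behind the given Comprehend endpoint
    response = comprehend.classify_document(Text=review,
                                            EndpointArn=endpoint_arn)

    # Pick the class with the highest score from the response
    top = max(response["Classes"], key=lambda c: c["Score"])
    sentiment, confidence = top["Name"], top["Score"]

    human_loop_name = None
    if confidence < threshold:
        human_loop_name = str(uuid.uuid4())
        # Assumed input keys; they must match the task UI template
        input_content = {"initialValue": sentiment, "taskObject": review}
        a2i.start_human_loop(
            HumanLoopName=human_loop_name,
            FlowDefinitionArn=flow_definition_arn,
            HumanLoopInput={"InputContent": json.dumps(input_content)},
        )
    return sentiment, confidence, human_loop_name


# Typical wiring, assuming AWS credentials and existing resources:
#   import boto3
#   comprehend = boto3.client("comprehend")
#   a2i = boto3.client("sagemaker-a2i-runtime")
#   review_with_comprehend("it's okay", comprehend, a2i,
#                          endpoint_arn, flow_definition_arn)
```

Passing the clients in as arguments keeps the threshold logic separate from the AWS calls, so the routing decision can be exercised with stand-in objects.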
You can use the PyTorchModel class, provide a model name, specify the custom predictor I created in the previous step, and provide the S3 location of the model artifact, the model.tar.gz file. Then you can deploy the model by calling model.deploy and providing a model endpoint name. Once the model endpoint is ready, you can send the sample reviews to the model via the predictor.predict API call. Note that you need to pass the reviews in the JSON format the model expects as input. Then you parse the model response to obtain the predicted label and the confidence score.

Now you are ready to define the human loop condition again. In an if-clause similar to the previous one, you check whether the returned confidence score is under the defined threshold of 90 percent. If it is, you start the human loop with the predicted label and the review as inputs. You can also add print statements to document when the human loops get started. Here you can see a sample result showing that a human loop was started for a low-confidence prediction of 54 percent for the sample review "it is okay". To verify the human loop results, you can capture the API responses of the completed human loops. A sample result of the human loop input content and the human answer is shown here.
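The same pattern for the SageMaker endpoint could be sketched as follows. This is an illustration under stated assumptions, not the course notebook: `predictor` stands for the deployed custom predictor, the request shape `[{"features": [review]}]` and the response keys `predicted_label` and `probability` are assumed, and `route_prediction` and `completed_human_loops` are hypothetical helper names. `a2i` is assumed to behave like `boto3.client("sagemaker-a2i-runtime")`.

```python
import json
import uuid


def route_prediction(review, predictor, a2i, flow_definition_arn,
                     threshold=0.90):
    """Send one review to the endpoint; start a human loop if confidence is low.

    Returns the human loop name, or None if no review was requested.
    """
    # Assumed request/response format of the deployed model
    response = predictor.predict([{"features": [review]}])
    label = response[0]["predicted_label"]
    confidence = response[0]["probability"]

    if confidence >= threshold:
        return None

    human_loop_name = str(uuid.uuid4())
    # Assumed input keys; they must match the task UI template
    input_content = {"initialValue": label, "taskObject": review}
    a2i.start_human_loop(
        HumanLoopName=human_loop_name,
        FlowDefinitionArn=flow_definition_arn,
        HumanLoopInput={"InputContent": json.dumps(input_content)},
    )
    # Document when a human loop gets started
    print(f'Confidence {confidence:.0%} is below {threshold:.0%} for '
          f'"{review}"; started human loop {human_loop_name}')
    return human_loop_name


def completed_human_loops(a2i, human_loop_names):
    """Return the describe_human_loop responses of all completed loops."""
    completed = []
    for name in human_loop_names:
        description = a2i.describe_human_loop(HumanLoopName=name)
        if description["HumanLoopStatus"] == "Completed":
            completed.append(description)
    return completed
```

Each completed description points to the location of the review output in S3, which is where the human answers can be read back and later used as additional training data.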

![](2024-01-03-11-22-09.png)

![](2024-01-03-11-22-40.png)

![](2024-01-03-11-23-36.png)

![](2024-01-03-11-26-33.png)

![](2024-01-03-11-26-59.png)

![](2024-01-03-11-27-41.png)

![](2024-01-03-11-28-17.png)

![](2024-01-03-11-28-37.png)

![](2024-01-03-11-29-00.png)

![](2024-01-03-11-30-03.png)

![](2024-01-03-11-30-52.png)

![](2024-01-03-11-31-12.png)

![](2024-01-03-11-31-57.png)