# Custom Human-in-the-Loop (HITL) Review for JSON

- Author: docai-incubator@google.com

## Purpose and Description

This guide shows how to send a processed Document AI JSON file for Human-in-the-Loop (HITL) review when it meets specified conditions.

Two criteria are provided, both based on entity confidence scores: a document-level threshold applied to every entity, and entity-level thresholds applied to specific entity types. You can modify the script to add custom criteria that meet your needs.

For more information about HITL, refer to [Human-in-the-Loop Overview](https://cloud.google.com/document-ai/docs/hitl) or the following video:

[What is Human-in-the-Loop?](https://www.youtube.com/watch?v=qeuRQAityB8&list=PLIivdWyY5sqIR88BxIK-3w14Vm-jTH1id&index=6)

## Prerequisites
1. Access to a Google Cloud project, with permission to create and use Document AI processors.
1. A Python notebook environment: Jupyter (Vertex AI) or Google Colab.

## Configure HITL in the Processor

Create a processor for HITL that matches your document type. Set the output path for results and set the filter method to **No Filter (self-validation)**, as shown in the screenshot below.

![HITL Setup](hitl-setup.png)

## Tool Operation Procedure

### 1. Install required libraries

In [11]:
%pip install google-cloud-documentai
%pip install google-cloud-storage
%pip install google-api-core

Note: you may need to restart the kernel to use updated packages.

### 2. Import Packages

In [12]:
from typing import Dict, Optional
from google.cloud import storage
from google.cloud import documentai
from google.api_core.client_options import ClientOptions

### 3. Input Details

In [13]:
# Specify the bucket and JSON file name
bucket_name = "your-bucket-name"  # Name of the Cloud Storage bucket
file_path = "folder/subfolder/json_example_1.json"  # Path of the JSON file within the bucket

# Processor inputs
project_id = "xxxxx-xxxxx-xxxx"  # Project ID
location = "us"  # Processor location, for example 'us' or 'eu'
processor_id = "xxxxxxxxxxxx"  # Processor ID

### 4. Run the functions

In [17]:
def review_document(
    project_id: str, location: str, processor_id: str, document: documentai.Document
) -> str:
    # You must set the api_endpoint if you use a location other than 'us'.
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    # Create a client
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # Gets the full resource name of the human review config
    human_review_config = client.human_review_config_path(
        project_id, location, processor_id
    )

    # Options are DEFAULT, URGENT
    priority = documentai.ReviewDocumentRequest.Priority.DEFAULT

    # Creates the human review request
    request = documentai.ReviewDocumentRequest(
        inline_document=document,
        human_review_config=human_review_config,
        enable_schema_validation=False,
        priority=priority,
    )

    # Send a request for human review of the processed document
    operation = client.review_document(request=request)

    # Return operation name, can be used to check status of the request
    return operation.operation.name


def check_entities(
    document_threshold: Optional[float] = None,
    expected_entities: Optional[Dict[str, float]] = None,
) -> bool:
    # Get Document from Google Cloud Storage
    storage_client = storage.Client()
    blob = storage_client.bucket(bucket_name).blob(file_path)
    document = documentai.Document.from_json(
        blob.download_as_bytes(), ignore_unknown_fields=True
    )

    for entity in document.entities:
        if (document_threshold and entity.confidence < document_threshold) or (
            expected_entities
            and entity.confidence < expected_entities.get(entity.type_, 0)
        ):
            print(entity.type_, entity.confidence)
            operation = review_document(
                project_id=project_id,
                location=location,
                processor_id=processor_id,
                document=document,
            )
            print(f"HITL Triggered - Operation: {operation}")
            return True

    print("No Entities have Confidence less than Threshold")
    return False


def document_level_confidence(document_threshold: float) -> bool:
    return check_entities(document_threshold=document_threshold)


def entity_level_confidence(expected_entities: Dict[str, float]) -> bool:
    return check_entities(expected_entities=expected_entities)

### 5. Trigger the HITL Review

Depending on your requirements, call one of two functions: `document_level_confidence()` or `entity_level_confidence()`. Each triggers a HITL review based on the threshold values you provide.

- `document_level_confidence()` checks the confidence of every entity in the document against a single threshold.
  - If any entity falls below that threshold, a HITL review is triggered.
- `entity_level_confidence()` checks only the entity types you specify, each against its own threshold.
  - If any of those entities falls below its threshold, a HITL review is triggered. Entity types that are not listed are ignored.
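
To make the difference between the two criteria concrete, the sketch below applies the same comparison used in `check_entities()` to a small, made-up entity list, without calling any Google Cloud API. The helper name `entities_below_threshold` and the sample values are illustrative assumptions and are not part of the notebook. Unlike `check_entities()`, it collects every failing entity instead of stopping at the first one.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sample data: (entity type, confidence) pairs standing in for
# the `type_` and `confidence` fields of Document AI entities.
SAMPLE_ENTITIES: List[Tuple[str, float]] = [
    ("total_amount", 0.95),
    ("payment_type", 0.85),
    ("supplier_name", 0.60),
]


def entities_below_threshold(
    entities: List[Tuple[str, float]],
    document_threshold: Optional[float] = None,
    expected_entities: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, float]]:
    """Return every entity that would trigger a review (illustrative helper)."""
    flagged = []
    for entity_type, confidence in entities:
        # Document level: one threshold shared by all entities.
        below_document = (
            document_threshold is not None and confidence < document_threshold
        )
        # Entity level: types missing from expected_entities default to 0,
        # so they can never trigger a review.
        below_entity = (
            expected_entities is not None
            and confidence < expected_entities.get(entity_type, 0)
        )
        if below_document or below_entity:
            flagged.append((entity_type, confidence))
    return flagged


# Document level: every entity is compared with the same threshold.
print(entities_below_threshold(SAMPLE_ENTITIES, document_threshold=0.8))

# Entity level: only the listed types are checked, each against its own threshold.
print(
    entities_below_threshold(
        SAMPLE_ENTITIES, expected_entities={"payment_type": 0.9}
    )
)
```

With this sample data, the document-level check flags only `supplier_name`, while the entity-level check flags only `payment_type`, because `supplier_name` is not listed in `expected_entities`.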

You can run either function as is, or write a custom function that applies your own criteria to the information in the JSON and then triggers the HITL review.
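
As one possible custom criterion, the sketch below flags a document when a required entity type is missing altogether, a case that confidence thresholds alone cannot detect. It reads the `entities` array of a Document AI JSON file, where each entity carries `type` and `confidence` fields. The function name `missing_required_entities`, the required set, and the sample JSON are hypothetical and only serve as an illustration.

```python
import json
from typing import List, Set


def missing_required_entities(document_json: str, required: Set[str]) -> List[str]:
    """Return the required entity types absent from the document (illustrative)."""
    document = json.loads(document_json)
    present = {entity.get("type") for entity in document.get("entities", [])}
    return sorted(required - present)


# Hypothetical, heavily trimmed Document AI output.
sample_json = json.dumps(
    {
        "entities": [
            {"type": "total_amount", "mentionText": "42.00", "confidence": 0.97},
            {"type": "payment_type", "mentionText": "VISA", "confidence": 0.91},
        ]
    }
)

missing = missing_required_entities(
    sample_json, required={"total_amount", "credit_card_last_four_digits"}
)
if missing:
    # In the notebook, this is where review_document(...) would be called.
    print(f"Review needed, missing entities: {missing}")
```

If `missing` is non-empty, passing the parsed `documentai.Document` to `review_document()` would trigger the review in the same way as the two built-in criteria.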

In [20]:
# Select one of the following criteria for the HITL review: document level or entity level.

# Overall document threshold
document_threshold = 0.8

# Entity type and its confidence threshold
expected_entities = {
    "credit_card_last_four_digits": 0.5,
    "payment_type": 0.9,
    "total_amount": 0.8,
}

# Option 1: document-level check
document_level_confidence(document_threshold)

# Option 2: entity-level check (uncomment to use instead of Option 1)
# entity_level_confidence(expected_entities)

No Entities have Confidence less than Threshold


False