<h1><strong>Custom Named Entity Recognition (NER) from Mortgage Documents</strong></h1>

A powerful feature of Amazon Comprehend is custom NER, which lets us find specific key values in documents without building custom matching scripts or manually parsing large quantities of documents. There are multiple ways to build a custom NER model on AWS. The first uses plain-text (.txt) or CSV files that list the entity names and values; this method supports <strong>synchronous and asynchronous</strong> calls. The second, which we use in this module, trains custom NER on native document types; this method currently supports only ***asynchronous*** calls. It is especially powerful because it takes the spatial information in a document into account to improve inference.
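To make the asynchronous path more concrete, here is a minimal sketch that assembles the arguments for Comprehend's `start_entities_detection_job` call. The helper name `build_detection_job_request` is hypothetical, and every bucket, role ARN, and recognizer ARN shown is a placeholder, not a value from this lab.

```python
def build_detection_job_request(input_s3_uri, output_s3_uri, role_arn,
                                recognizer_arn, job_name="custom-ner-demo"):
    """Assemble keyword arguments for an asynchronous custom NER job.

    Hypothetical helper: all values are placeholders supplied by the caller.
    """
    return {
        "JobName": job_name,
        "EntityRecognizerArn": recognizer_arn,
        "LanguageCode": "en",
        "DataAccessRoleArn": role_arn,
        "InputDataConfig": {
            "S3Uri": input_s3_uri,
            # Native documents such as PDFs are submitted one document per file.
            "InputFormat": "ONE_DOC_PER_FILE",
        },
        "OutputDataConfig": {"S3Uri": output_s3_uri},
    }


# Placeholder values for illustration only.
request = build_detection_job_request(
    input_s3_uri="s3://my-example-bucket/inference-input/",
    output_s3_uri="s3://my-example-bucket/inference-output/",
    role_arn="arn:aws:iam::111122223333:role/ExampleComprehendRole",
    recognizer_arn="arn:aws:comprehend:us-east-1:111122223333:entity-recognizer/example",
)

# With AWS credentials configured, the job could then be started with:
#   boto3.client("comprehend").start_entities_detection_job(**request)
```

Because the call is asynchronous, it returns a job ID right away, and the results are written to the output S3 location once the job completes.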

<h4>Please follow the instructions in the lab guide before proceeding</h4> 

In [None]:
import boto3
import json
import requests

In [None]:
# Update the manifest file to point at the S3 bucket you created in the earlier step.
# Replace {enter_bucket_name} below with the name of your bucket.

inputted_s3_uri = "s3://{enter_bucket_name}"
replace_with_input_source = "s3://comprehend-semi-structured-documents-us-east-1-737050353456/source-semi-structured-documents/"
input_source = f"{inputted_s3_uri}/source/"

replace_with_input_annotations = "s3://comprehend-semi-structured-documents-us-east-1-737050353456/output/labeling-job-labeling-job-20211110T120317/annotations/consolidated-annotation/consolidation-response/iteration-1/"
input_annotations = f"{inputted_s3_uri}/"

# Read the manifest and swap the original S3 prefixes for your bucket
with open("output.manifest") as manifest_file:
    data = manifest_file.read()

data = data.replace(replace_with_input_source, input_source)
data = data.replace(replace_with_input_annotations, input_annotations)

# Write the updated manifest back to the same file
with open("output.manifest", "w") as manifest_file:
    manifest_file.write(data)

print("The manifest file is updated with the correct bucket")
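
Before uploading, it can help to confirm that no record still points at the original bucket. The sketch below is one possible check, assuming the manifest is in JSON Lines format (one JSON record per line). The helper name `find_stale_lines` and the sample records are made up for illustration.

```python
import json


def find_stale_lines(manifest_text, stale_prefixes):
    """Return 1-based line numbers of records that still mention a stale prefix."""
    stale = []
    for line_number, line in enumerate(manifest_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        json.loads(line)  # raises ValueError if the record is not valid JSON
        if any(prefix in line for prefix in stale_prefixes):
            stale.append(line_number)
    return stale


# Tiny made-up manifest: the second record still points at the old bucket.
sample_manifest = "\n".join([
    json.dumps({"source-ref": "s3://my-example-bucket/source/doc1.pdf"}),
    json.dumps({"source-ref": "s3://old-example-bucket/source/doc2.pdf"}),
])
print(find_stale_lines(sample_manifest, ["s3://old-example-bucket/"]))  # prints [2]
```

In this notebook, calling `find_stale_lines(data, [replace_with_input_source, replace_with_input_annotations])` should return an empty list once the replacement has worked.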

In [None]:
# Upload the updated manifest to S3.
# Replace {enter_bucket_name} with the same bucket name used in the previous cell.
s3 = boto3.resource('s3')
s3.meta.client.upload_file(Filename='output.manifest', Bucket='{enter_bucket_name}', Key='output.manifest')
print("Manifest file successfully uploaded")
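
The bucket name is entered twice in this notebook: once as an `s3://` URI and once as a bare bucket name for `upload_file`. As a hedged sketch, a small helper (the name `split_s3_uri` is hypothetical) could derive the bucket and key from a single URI so that the two values cannot drift apart. The bucket shown is a placeholder.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key' into (bucket, key); the key may be empty."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Placeholder bucket name for illustration only.
bucket, key = split_s3_uri("s3://my-example-bucket/output.manifest")
print(bucket, key)  # prints: my-example-bucket output.manifest
```

The resulting `bucket` and `key` could then be passed to `upload_file` as `Bucket=bucket` and `Key=key`.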

<h4> Now that we have successfully updated our manifest file, go back to the lab guide to train your model! </h4>

<h2> Pre-Built Custom NER Model </h2>

While we wait for our model to train, let's use a pre-built NER model to detect entities in our "sample.pdf" document.

In [None]:
# Send sample.pdf to the pre-built custom NER model
input_file_path = "sample.pdf"
url = "https://uctwj7mbwu.us-east-1.awsapprunner.com/customner"

with open(input_file_path, "rb") as file:
    files = {"input_file": file}
    response = requests.post(url, files=files)

# Fail early with a clear HTTP error instead of a JSON decoding error
response.raise_for_status()
responsecontent = response.json()
print(responsecontent)
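
The raw response can be hard to read at a glance. The sketch below groups detected entities by type, assuming the endpoint returns a Comprehend-style payload with an `Entities` list whose items carry `Type`, `Text`, and `Score` fields. That shape is an assumption about this endpoint, so inspect `responsecontent` first. The helper name, entity types, and sample values are all made up for illustration.

```python
from collections import defaultdict


def group_entities_by_type(response_content, min_score=0.0):
    """Group entity text by entity type, keeping entities at or above min_score.

    Assumes a Comprehend-style payload: {"Entities": [{"Type", "Text", "Score"}, ...]}.
    """
    grouped = defaultdict(list)
    for entity in response_content.get("Entities", []):
        if entity.get("Score", 0.0) >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


# Made-up payload in the assumed shape; not real model output.
sample_response = {
    "Entities": [
        {"Type": "EXAMPLE_NAME", "Text": "Jane Doe", "Score": 0.98},
        {"Type": "EXAMPLE_AMOUNT", "Text": "$250,000", "Score": 0.91},
        {"Type": "EXAMPLE_NAME", "Text": "J. Doe", "Score": 0.42},
    ]
}
print(group_entities_by_type(sample_response, min_score=0.5))
```

If the payload does have this shape, `group_entities_by_type(responsecontent, min_score=0.5)` would give a compact per-type summary that hides low-confidence matches.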