# 1. Copy pre-annotated check samples to your adapter

### Step 1: Create an adapter in the Amazon Textract console
<img src="./screenshots/checks-notebook-step1.png"/>

### Step 2: Copy the adapter ID and dataset S3 bucket location from the Adapter details page
<img src="./screenshots/checks-notebook-step2.png"/>

In [None]:
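# Replace these placeholders with the values from Step 2 (bucket name only, without the s3:// prefix)
ADAPTERID="111111111111"
S3BUCKET="textract-adapters-us-east-1-1111"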

### Step 3: Update the manifest file with the adapter details
Run the following cell to extract the pre-annotations and update the manifest file with your adapter ID and dataset S3 bucket name.

In [None]:
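import shutil

# Extract the pre-annotated check samples into a local folder named after the adapter ID
shutil.unpack_archive("./samples/checks-annotations.zip", extract_dir=ADAPTERID)
print("Check samples archive extracted successfully...")

# Replace every placeholder in the manifest (the g flag handles multiple occurrences per line)
!sed -i -e "s/<s3-bucket-name>/$S3BUCKET/g;s/<adapter-id>/$ADAPTERID/g" "./$ADAPTERID/checks-annotations/manifest.jsonl"
print(f"Replaced all adapter ID placeholders with {ADAPTERID} and S3 bucket placeholders with {S3BUCKET}")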
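
If `sed` is not available, or you are on macOS, where BSD `sed` treats `-i` differently and may leave a backup file next to the manifest, the same substitution can be done in plain Python. This is a minimal sketch, assuming the manifest contains the literal placeholders `<s3-bucket-name>` and `<adapter-id>` that the `sed` command targets. The function name `fill_manifest_placeholders` is hypothetical; it replaces every occurrence of both placeholders and then checks that each line is still valid JSON.

```python
import json
from pathlib import Path


def fill_manifest_placeholders(manifest_path, s3_bucket, adapter_id):
    """Replace every <s3-bucket-name> and <adapter-id> placeholder in a JSON Lines manifest.

    Hypothetical helper: after substituting, it verifies that each non-empty
    line still parses as JSON, then writes the file back and returns the
    number of records.
    """
    path = Path(manifest_path)
    text = path.read_text(encoding="utf-8")
    # str.replace substitutes all occurrences, not just the first per line
    text = text.replace("<s3-bucket-name>", s3_bucket).replace("<adapter-id>", adapter_id)

    records = 0
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"Line {line_no} is not valid JSON: {err}") from err
        records += 1

    path.write_text(text, encoding="utf-8")
    return records
```

For this notebook the call would be `fill_manifest_placeholders(f"./{ADAPTERID}/checks-annotations/manifest.jsonl", S3BUCKET, ADAPTERID)`.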

### Step 4: Copy the pre-annotations to the dataset location

In [None]:
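# Upload the extracted samples, pre-annotations, and updated manifest to the adapter's dataset location
!aws s3 cp "./$ADAPTERID/checks-annotations" "s3://$S3BUCKET/adapters/$ADAPTERID" --recursive
print("\nSuccessfully copied all files")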
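
If the AWS CLI is not installed in your notebook environment, a rough boto3 equivalent of `aws s3 cp --recursive` is to walk the local folder and call the S3 client's `upload_file` for each file. This is a sketch, not the CLI's exact behavior (no multipart tuning, no skipping of unchanged files); the helper name `upload_directory` is hypothetical. It keeps each file's path relative to the local folder as the S3 key suffix.

```python
from pathlib import Path


def upload_directory(s3_client, local_dir, bucket, prefix):
    """Upload every file under local_dir to s3://bucket/prefix/, preserving relative paths.

    Hypothetical helper. s3_client is expected to be boto3.client("s3") (or any
    object with a compatible upload_file(Filename, Bucket, Key) method).
    Returns the list of object keys that were uploaded.
    """
    root = Path(local_dir)
    keys = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # directories are implicit in S3 keys
        # S3 keys always use forward slashes, regardless of the local OS
        key = f"{prefix.rstrip('/')}/{path.relative_to(root).as_posix()}"
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

In this notebook the call would be `upload_directory(boto3.client("s3"), f"./{ADAPTERID}/checks-annotations", S3BUCKET, f"adapters/{ADAPTERID}")`, which targets the same prefix as the `aws s3 cp` command.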

### Step 5: Refresh the adapter details page
Return to the Textract console and refresh the adapter details page. You should see the following:
1. The dataset is created successfully
2. Queries have been created
3. Documents have been verified

<img src="./screenshots/checks-notebook-step5_1.png"/>

### Step 6: View the pre-annotated samples 
Click the Verify Documents button to open the dataset page. Then select the files and click Review annotations.
<img src="./screenshots/checks-notebook-step6.png"/>

### Step 7: Train the Adapter
Click the Train Adapter button to start training. Training can take 1 to 30 hours to complete; because this dataset is small, it should finish in about an hour.
<img src="./screenshots/checks-notebook-step7.png"/>
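
Instead of refreshing the console while training runs, you can poll the adapter version status from the notebook. This is a minimal sketch, assuming the boto3 Textract client's `get_adapter_version(AdapterId=..., AdapterVersion=...)` call and its `Status` field, where `CREATION_IN_PROGRESS` means training is still running. The helper name `wait_for_adapter_version` and the polling defaults are assumptions; the default timeout simply matches the 30-hour upper bound mentioned in this step.

```python
import time


def wait_for_adapter_version(textract_client, adapter_id, version="1",
                             poll_seconds=300, timeout_seconds=30 * 3600):
    """Poll GetAdapterVersion until the version leaves CREATION_IN_PROGRESS.

    Hypothetical helper. Returns the final response dict (check its "Status",
    e.g. ACTIVE or CREATION_ERROR); raises TimeoutError if training is still
    in progress after timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = textract_client.get_adapter_version(
            AdapterId=adapter_id, AdapterVersion=version
        )
        if response["Status"] != "CREATION_IN_PROGRESS":
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Adapter version {version} still training after {timeout_seconds} s"
            )
        time.sleep(poll_seconds)
```

For example, `result = wait_for_adapter_version(boto3.client("textract"), ADAPTERID)` followed by `print(result["Status"], result.get("StatusMessage"))`.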

### Step 8: Evaluate the adapter (console)
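
Textract reports evaluation metrics for each trained adapter version, comparing the base model ("baseline") with the adapter. To read the same numbers programmatically, the sketch below summarizes a `get_adapter_version` response. It assumes the response carries an `EvaluationMetrics` list whose entries have `FeatureType`, `Baseline`, and `AdapterVersion` keys, each metric block holding `F1Score`, `Precision`, and `Recall`; the helper name `summarize_evaluation` is hypothetical.

```python
def summarize_evaluation(adapter_version_response):
    """Return {feature_type: {metric: (baseline, adapter, adapter - baseline)}}.

    Hypothetical helper operating on a GetAdapterVersion response dict.
    Metrics missing from either side are skipped.
    """
    summary = {}
    for entry in adapter_version_response.get("EvaluationMetrics", []):
        baseline = entry.get("Baseline", {})
        adapted = entry.get("AdapterVersion", {})
        metrics = {}
        for name in ("F1Score", "Precision", "Recall"):
            if name in baseline and name in adapted:
                metrics[name] = (baseline[name], adapted[name], adapted[name] - baseline[name])
        summary[entry["FeatureType"]] = metrics
    return summary
```

For example, `summarize_evaluation(boto3.client("textract").get_adapter_version(AdapterId=ADAPTERID, AdapterVersion="1"))`. A positive difference means the adapter scored higher than the base model on that metric.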

### Step 9: Test the adapter programmatically (API)

In [None]:
from IPython.display import Image

# Display one of the sample checks that the following cells send to Textract
document_name = f"{ADAPTERID}/checks-annotations/original_assets/2e7e80de-4c39-41dd-b61f-1a9e38812d86.jpg"
Image(filename=document_name)


In [None]:
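# Install the Textract helper libraries and tabulate (used to print the query answers)
!python -m pip install amazon-textract-caller --upgrade
!python -m pip install amazon-textract-response-parser --upgrade
!python -m pip install tabulate --upgrade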

In [None]:
import boto3
from textractcaller.t_call import call_textract, Textract_Features, Query, QueriesConfig, Adapter, AdaptersConfig
import trp.trp2 as t2
from tabulate import tabulate

# The client's Region must match the Region where you created the adapter
textract_client = boto3.client('textract')

def tabulate_query_answers(textract_json):
    """Print the query answers found on each page as a GitHub-style table."""
    d = t2.TDocumentSchema().load(textract_json)
    for page in d.pages:
        query_answers = d.get_query_answers(page=page)
        print(tabulate(query_answers, tablefmt="github"))

queries = []
queries.append(Query(text="What is the check#?", alias="CHECK_NUMBER", pages=["*"]))
queries.append(Query(text="What is the date?", alias="DATE", pages=["*"]))
queries.append(Query(text="What is the check amount in words?", alias="CHECK_AMOUNT_WORDS", pages=["*"]))
queries.append(Query(text="What is the dollar amount?", alias="DOLLAR_AMOUNT", pages=["*"]))
queries.append(Query(text="Who is the payee?", alias="PAYEE_NAME", pages=["*"]))
queries.append(Query(text="What is the customer account#", alias="ACCOUNT_NUMBER", pages=["*"]))
queries.append(Query(text="what is the payee address?", alias="PAYEE_ADDRESS", pages=["*"]))
queries.append(Query(text="What is the bank routing number?", alias="BANK_ROUTING_NUMBER", pages=["*"]))
queries.append(Query(text="What is the memo", alias="MEMO", pages=["*"]))
queries.append(Query(text="What is the account name/payer/drawer name?", alias="ACCOUNT_NAME", pages=["*"]))
queries.append(Query(text="What is the bank name/drawee name?", alias="BANK_NAME", pages=["*"]))

queries_config = QueriesConfig(queries=queries)
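
For reference, `call_textract` wraps the AnalyzeDocument API, so the same request can be built for boto3's `analyze_document` directly. The sketch below assembles the keyword arguments by hand, assuming the `FeatureTypes`, `QueriesConfig`, and `AdaptersConfig` request fields of AnalyzeDocument; the helper name `build_analyze_document_request` is hypothetical, and every query and adapter is applied to all pages (`"*"`) as in the cell above.

```python
def build_analyze_document_request(document_bytes, queries, adapter_id=None, adapter_version="1"):
    """Build keyword arguments for boto3's textract.analyze_document.

    Hypothetical helper. queries is an iterable of (text, alias) pairs; when
    adapter_id is given, an AdaptersConfig for that adapter version is added.
    """
    request = {
        "Document": {"Bytes": document_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [
                {"Text": text, "Alias": alias, "Pages": ["*"]} for text, alias in queries
            ]
        },
    }
    if adapter_id is not None:
        request["AdaptersConfig"] = {
            "Adapters": [{"AdapterId": adapter_id, "Version": adapter_version, "Pages": ["*"]}]
        }
    return request
```

Usage would look like reading the image with `open(document_name, "rb")` and passing `**build_analyze_document_request(image_bytes, [("What is the date?", "DATE")], adapter_id=ADAPTERID)` to `textract_client.analyze_document`.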



In [None]:
# Baseline: run the queries against the base model, without the adapter
textract_json_prebuilt = call_textract(
    input_document=document_name,
    boto3_textract_client=textract_client,
    features=[Textract_Features.QUERIES],
    queries_config=queries_config,
)

tabulate_query_answers(textract_json_prebuilt)

In [None]:
# Apply version 1 of the adapter (created by the training run in Step 7) to all pages
adapter1 = Adapter(adapter_id=ADAPTERID, version="1", pages=["*"])
adapters_config = AdaptersConfig(adapters=[adapter1])
print(f"Using adapter {ADAPTERID}, version 1")

# Same document and queries as the baseline, now with the adapter
textract_json_with_adapter = call_textract(
    input_document=document_name,
    boto3_textract_client=textract_client,
    features=[Textract_Features.QUERIES],
    queries_config=queries_config,
    adapters_config=adapters_config,
)

tabulate_query_answers(textract_json_with_adapter)
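
To compare the two runs field by field, the query answers can also be read straight from the raw Textract JSON. This sketch assumes the AnalyzeDocument response layout in which each `QUERY` block carries a `Query` object (`Text`, `Alias`) and an `ANSWER` relationship pointing to `QUERY_RESULT` blocks with `Text` and `Confidence`. The helper names `query_answers_by_alias` and `compare_answers` are hypothetical, and answers for the same alias on different pages are merged, which is fine for this single-page check.

```python
def query_answers_by_alias(textract_json):
    """Map each query alias (or its text, if no alias) to a list of (answer text, confidence)."""
    blocks = {block["Id"]: block for block in textract_json.get("Blocks", [])}
    answers = {}
    for block in blocks.values():
        if block.get("BlockType") != "QUERY":
            continue
        query = block.get("Query", {})
        key = query.get("Alias") or query.get("Text")
        results = answers.setdefault(key, [])  # queries without an answer map to []
        for rel in block.get("Relationships", []):
            if rel.get("Type") != "ANSWER":
                continue
            for result_id in rel.get("Ids", []):
                result = blocks.get(result_id)
                if result and result.get("BlockType") == "QUERY_RESULT":
                    results.append((result.get("Text", ""), result.get("Confidence")))
    return answers


def _best(answer_list):
    # Highest-confidence answer text, or "" when Textract returned no answer
    if not answer_list:
        return ""
    return max(answer_list, key=lambda pair: pair[1] or 0.0)[0]


def compare_answers(prebuilt_json, adapter_json):
    """Return sorted rows of (alias, answer without adapter, answer with adapter)."""
    prebuilt = query_answers_by_alias(prebuilt_json)
    adapted = query_answers_by_alias(adapter_json)
    return [
        (alias, _best(prebuilt.get(alias, [])), _best(adapted.get(alias, [])))
        for alias in sorted(set(prebuilt) | set(adapted))
    ]
```

The rows can be printed with the already imported `tabulate`, for example `print(tabulate(compare_answers(textract_json_prebuilt, textract_json_with_adapter), headers=["Alias", "Without adapter", "With adapter"], tablefmt="github"))`.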