## Azure Python SDK

https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/formrecognizer/azure-ai-formrecognizer

Form Recognizer Sample Tool
- https://fott-2-1.azurewebsites.net/ (Test an Existing model - Invoice, Receipt, Business Card, ID)
- https://fott.azurewebsites.net/ (GitHub Source) -> Start a new Training Process


V2 API Details
- https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2/operations/AnalyzeWithCustomForm

Master Document to Follow:
- https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/client-library?tabs=windows&pivots=programming-language-python&WT.mc_id=Portal-Microsoft_Azure_ProjectOxford

In [1]:
!pip install azure-ai-formrecognizer --pre



In [2]:
import os
from azure.core.exceptions import ResourceNotFoundError
from azure.ai.formrecognizer import FormRecognizerClient
from azure.ai.formrecognizer import FormTrainingClient
from azure.core.credentials import AzureKeyCredential

In [3]:
from PIL import Image
import requests
import io

from matplotlib.pyplot import imshow
%matplotlib inline

In [4]:
endpoint = "https://avkash-form-rec.cognitiveservices.azure.com/"
key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE"  # placeholder; do not commit a real key
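
A key pasted into a notebook is easy to leak. The following is a minimal sketch that reads both settings from environment variables instead. The variable names `FORM_RECOGNIZER_ENDPOINT` and `FORM_RECOGNIZER_KEY` are placeholders chosen for this example; the SDK does not read them on its own.

```python
import os


def load_form_recognizer_settings(environ=os.environ):
    """Return (endpoint, key) read from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    try:
        endpoint = environ["FORM_RECOGNIZER_ENDPOINT"]
        key = environ["FORM_RECOGNIZER_KEY"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None
    # Normalize the endpoint so it always ends with exactly one slash.
    return endpoint.rstrip("/") + "/", key
```

The returned pair can then be passed to `FormRecognizerClient(endpoint, AzureKeyCredential(key))` in the same way as the hard-coded values.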

In [10]:
#form_recognizer_client = FormRecognizerClient(endpoint, AzureKeyCredential(key))
form_training_client = FormTrainingClient(endpoint=endpoint, credential=AzureKeyCredential(key))

In [12]:
custom_models = form_training_client.list_custom_models()

In [17]:
print("Models with the following IDs:")
for model in custom_models:
    print(model.model_id)

Models with the following IDs:


https://github.com/Azure-Samples/cognitive-services-quickstart-code/blob/master/python/FormRecognizer/FormRecognizerQuickstart.py

In [19]:
# <snippet_imports>
import os
from azure.core.exceptions import ResourceNotFoundError
from azure.ai.formrecognizer import FormRecognizerClient
from azure.ai.formrecognizer import FormTrainingClient
from azure.core.credentials import AzureKeyCredential
# </snippet_imports>

# <snippet_creds>
endpoint = "https://avkash-form-rec.cognitiveservices.azure.com/"
key = "PASTE_YOUR_FORM_RECOGNIZER_SUBSCRIPTION_KEY_HERE"  # placeholder; do not commit a real key
# </snippet_creds>

# <snippet_auth>
form_recognizer_client = FormRecognizerClient(endpoint, AzureKeyCredential(key))
form_training_client = FormTrainingClient(endpoint, AzureKeyCredential(key))
# </snippet_auth>


In [20]:
# <snippet_getcontent>
formUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/forms/Form_1.jpg"

poller = form_recognizer_client.begin_recognize_content_from_url(formUrl)
page = poller.result()

table = page[0].tables[0] # page 1, table 1
print("Table found on page {}:".format(table.page_number))
for cell in table.cells:
    print("Cell text: {}".format(cell.text))
    print("Location: {}".format(cell.bounding_box))
    print("Confidence score: {}\n".format(cell.confidence))
# </snippet_getcontent>


Table found on page 1:
Cell text: Details
Location: [Point(x=156.0, y=1037.0), Point(x=847.0, y=1037.0), Point(x=847.0, y=1086.0), Point(x=156.0, y=1086.0)]
Confidence score: 1.0

Cell text: Quantity
Location: [Point(x=847.0, y=1037.0), Point(x=1071.0, y=1038.0), Point(x=1071.0, y=1086.0), Point(x=847.0, y=1086.0)]
Confidence score: 1.0

Cell text: Unit Price
Location: [Point(x=1071.0, y=1038.0), Point(x=1309.0, y=1038.0), Point(x=1309.0, y=1086.0), Point(x=1071.0, y=1086.0)]
Confidence score: 1.0

Cell text: Total
Location: [Point(x=1309.0, y=1038.0), Point(x=1543.0, y=1038.0), Point(x=1543.0, y=1086.0), Point(x=1309.0, y=1086.0)]
Confidence score: 1.0

Cell text: Bindings
Location: [Point(x=156.0, y=1086.0), Point(x=847.0, y=1086.0), Point(x=847.0, y=1127.0), Point(x=156.0, y=1127.0)]
Confidence score: 1.0

Cell text: 20
Location: [Point(x=847.0, y=1086.0), Point(x=1071.0, y=1086.0), Point(x=1071.0, y=1127.0), Point(x=847.0, y=1127.0)]
Confidence score: 1.0

Cell text: 1.00
Location:
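
The cell list above is flat, which makes the table hard to read. The sketch below rebuilds a row-by-column grid from each cell's `row_index` and `column_index`. The `Cell` tuple is a stand-in for the SDK's `FormTableCell`, and the indices in the sample data are assumptions inferred from the bounding boxes printed above, not values returned by the service.

```python
from collections import namedtuple

# Stand-in for the SDK's FormTableCell; only the attributes used here.
Cell = namedtuple("Cell", ["row_index", "column_index", "text"])


def table_to_rows(cells, row_count, column_count):
    """Place each cell's text at [row_index][column_index] in a 2-D list."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell.row_index][cell.column_index] = cell.text
    return grid


# Sample cells taken from the output above; indices are assumed.
cells = [
    Cell(0, 0, "Details"), Cell(0, 1, "Quantity"),
    Cell(0, 2, "Unit Price"), Cell(0, 3, "Total"),
    Cell(1, 0, "Bindings"), Cell(1, 1, "20"), Cell(1, 2, "1.00"),
]
rows = table_to_rows(cells, row_count=2, column_count=4)
for row in rows:
    print(" | ".join(row))
```

With a real result, the same helper would be called as `table_to_rows(table.cells, table.row_count, table.column_count)`.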

In [21]:
# <snippet_receipts>
receiptUrl = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/master/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"

poller = form_recognizer_client.begin_recognize_receipts_from_url(receiptUrl)
result = poller.result()

for receipt in result:
    for name, field in receipt.fields.items():
        if name == "Items":
            print("Receipt Items:")
            for idx, items in enumerate(field.value):
                print("...Item #{}".format(idx + 1))
                for item_name, item in items.value.items():
                    print("......{}: {} has confidence {}".format(item_name, item.value, item.confidence))
        else:
            print("{}: {} has confidence {}".format(name, field.value, field.confidence))
# </snippet_receipts>


Receipt Items:
...Item #1
......Name: Surface Pro 6 has confidence 0.914
......Quantity: 1.0 has confidence 0.971
......TotalPrice: 999.0 has confidence 0.983
...Item #2
......Name: SurfacePen has confidence 0.718
......Quantity: 1.0 has confidence 0.976
......TotalPrice: 99.99 has confidence 0.967
MerchantAddress: 123 Main Street Redmond, WA 98052 has confidence 0.975
MerchantName: Contoso has confidence 0.974
MerchantPhoneNumber: None has confidence 0.988
ReceiptType: Itemized has confidence 0.99
Subtotal: 1098.99 has confidence 0.982
Tax: 104.4 has confidence 0.985
Total: 1203.39 has confidence 0.957
TransactionDate: 2019-06-10 has confidence 0.987
TransactionTime: 13:59:00 has confidence 0.985
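
Every field comes back with a confidence score, so a common follow-up is to flag the weak ones for manual review. This is a minimal sketch over plain `(value, confidence)` pairs copied from the output above. The 0.95 threshold and the key `"Item 2 Name"` are arbitrary choices for this example.

```python
def low_confidence_fields(fields, threshold=0.95):
    """Return {name: confidence} for fields whose confidence is below threshold."""
    return {
        name: confidence
        for name, (value, confidence) in fields.items()
        if confidence < threshold
    }


# Values copied from the receipt output above, as (value, confidence) pairs.
receipt_fields = {
    "MerchantName": ("Contoso", 0.974),
    "Subtotal": (1098.99, 0.982),
    "Tax": (104.4, 0.985),
    "Total": (1203.39, 0.957),
    "Item 2 Name": ("SurfacePen", 0.718),
}
flagged = low_confidence_fields(receipt_fields)
print(flagged)
```

With real results, the pairs would come from `field.value` and `field.confidence` on each entry of `receipt.fields`.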


In [None]:
# Earlier SAS URL with permissions sp=racwdl, kept for reference (not used below):
# https://avkashsample11.blob.core.windows.net/avkashimages?sp=racwdl&st=2021-06-19T23:51:37Z&se=2021-06-20T23:51:37Z&sv=2020-02-10&sr=c&sig=1EEWW6NKFGZUeOgpXmFkEFfAwCpCv2ATtEBe%2BtIs0kg%3D

In [35]:
# <snippet_train>
# To train a model you need an Azure Storage account.
# Use the SAS URL to access your training files.

## The SAS token must carry exactly 4 permissions: Read, Write, Delete, and List (sp=rwdl); otherwise training will not work
trainingDataUrl = "https://avkashsample11.blob.core.windows.net/avkashimages?sp=rwdl&st=2021-06-19T23:56:05Z&se=2021-06-25T23:56:00Z&sv=2020-02-10&sr=c&sig=wd33pdDeszspV%2F7SYdTq4LcsWUJUyPIzJZhSghB61jA%3D"
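
The permission letters sit in the `sp` query parameter of the SAS URL, so they can be checked before training starts. The sketch below uses only the standard library and treats the read, write, delete, list set from the note above as the requirement. The helper names and the example URLs are made up for illustration.

```python
from urllib.parse import parse_qs, urlparse

# Read, write, delete, list, as stated in the note above.
REQUIRED_PERMISSIONS = set("rwdl")


def sas_permissions(sas_url):
    """Return the set of permission letters in the SAS token's `sp` parameter."""
    query = parse_qs(urlparse(sas_url).query)
    return set(query.get("sp", [""])[0])


def has_exact_permissions(sas_url, required=REQUIRED_PERMISSIONS):
    """True when the SAS token grants exactly the required permissions."""
    return sas_permissions(sas_url) == required


# Hypothetical URLs with the signature omitted.
good_url = "https://example.blob.core.windows.net/forms?sp=rwdl&sr=c&sv=2020-02-10"
wide_url = "https://example.blob.core.windows.net/forms?sp=racwdl&sr=c&sv=2020-02-10"
print(has_exact_permissions(good_url), has_exact_permissions(wide_url))
```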



In [52]:
## All files in the given folder will be used for training
## Only 1 model will be created from all of the training documents
## Keep only the files that are needed in the folder

poller = form_training_client.begin_training(trainingDataUrl, use_training_labels=False)
model = poller.result()
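
Because every file in the container is picked up, one way to limit a training run without deleting blobs is the `prefix` keyword of `begin_training`, which filters documents by a case-sensitive blob-name prefix. The wrapper below is a sketch under that assumption. The function name is hypothetical, and a prefix such as `"invoices/"` only makes sense if the container is laid out that way.

```python
def train_on_prefix(training_client, sas_url, prefix, use_labels=False):
    """Train on blobs whose names begin with `prefix` and wait for the model."""
    poller = training_client.begin_training(
        sas_url,
        use_training_labels=use_labels,
        prefix=prefix,  # case-sensitive blob-name prefix
    )
    # result() blocks until the long-running training operation finishes.
    return poller.result()
```

It would be called as `train_on_prefix(form_training_client, trainingDataUrl, "invoices/")`, returning the same kind of model object as the cell above.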

In [53]:

print("Model ID: {}".format(model.model_id))
print("Status: {}".format(model.status))
print("Training started on: {}".format(model.training_started_on))
print("Training completed on: {}".format(model.training_completed_on))

print("\nRecognized fields:")
for submodel in model.submodels:
    print(
        "The submodel with form type '{}' has recognized the following fields: {}".format(
            submodel.form_type,
            ", ".join(
                [
                    field.label if field.label else name
                    for name, field in submodel.fields.items()
                ]
            ),
        )
    )


Model ID: d286df3b-731c-4840-891d-fe50475a4f47
Status: ready
Training started on: 2021-06-20 00:55:57+00:00
Training completed on: 2021-06-20 00:56:10+00:00

Recognized fields:
The submodel with form type 'form-0' has recognized the following fields: Contoso, Ltd., Item, Price, Tax, Total
The submodel with form type 'form-1' has recognized the following fields: 
The submodel with form type 'form-2' has recognized the following fields: Charges, Invoice Date, Invoice Due Date, Invoice For:, Invoice Number, VAT ID
The submodel with form type 'form-3' has recognized the following fields: 09 / 21 in the amount of:, 650-768-2322 or e-mail to:, Card Type:, City:, Date:, Email Address:, Mailing Address:, Name of Cardholder:, Purpose of Payment:, STATE OF CALIFORNIA:, Signature:, State:, Zip Code:, form. Telephone #:, to charge my:
The submodel with form type 'form-4' has recognized the following fields: Address:, Company Name:, Company Phone:, Dated As:, Details, Name:, Phone:, Purchase Order 

In [54]:
# Training result information
for doc in model.training_documents:
    print("Document name: {}".format(doc.name))
    print("Document status: {}".format(doc.status))
    print("Document page count: {}".format(doc.page_count))
    print("Document errors: {}".format(doc.errors))
# </snippet_train>


Document name: Form_1.jpg
Document status: succeeded
Document page count: 1
Document errors: []
Document name: Invoice_1.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document name: form_selection_mark.png
Document status: succeeded
Document page count: 1
Document errors: []
Document name: label_table_dynamic_rows1.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document name: label_table_dynamic_rows2.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document name: label_table_dynamic_rows3.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document name: label_table_dynamic_rows4.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document name: label_table_dynamic_rows5.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document name: label_table_fixed_rows1.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document

In [57]:
new_test_url = 'https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/forms/sample_invoice.jpg'


In [59]:
poller = form_recognizer_client.begin_recognize_custom_forms_from_url(
    model_id=model.model_id, form_url=new_test_url)
result = poller.result()

for recognized_form in result:
    print("Form type: {}".format(recognized_form.form_type))
    for name, field in recognized_form.fields.items():
        print("Field '{}' has label '{}' with value '{}' and a confidence score of {}".format(
            name,
            field.label_data.text if field.label_data else name,
            field.value,
            field.confidence
        ))
# </snippet_analyze>

Form type: None
Field 'field-0' has label 'INVOICE:' with value 'INV-100' and a confidence score of 1.0
Field 'field-1' has label 'INVOICE DATE:' with value '11/15/2019' and a confidence score of 1.0
Field 'field-2' has label 'DUE DATE:' with value '12/15/2019' and a confidence score of 1.0
Field 'field-3' has label 'CUSTOMER NAME:' with value 'MICROSOFT CORPORATION' and a confidence score of 1.0
Field 'field-4' has label 'SERVICE PERIOD:' with value '10/14/2019 - 11/14/2019' and a confidence score of 1.0
Field 'field-5' has label 'CUSTOMER ID:' with value 'CID-12345' and a confidence score of 1.0
Field 'field-6' has label 'BILL TO:' with value 'Microsoft Finance 123 Bill St, Redmond WA, 98052' and a confidence score of 1.0
Field 'field-7' has label 'SHIP TO:' with value 'Microsoft Delivery 123 Ship St, Redmond WA, 98052' and a confidence score of 1.0
Field 'field-8' has label 'SERVICE ADDRESS:' with value 'Microsoft Services 123 Service St, Redmond WA, 98052' and a confidence score of

In [42]:
# Upload the labeled sample files from the SDK repo to the blob container before training with labels:
# /azure-sdk-for-python/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/labeled_tables/dynamic

In [43]:
# <snippet_trainlabels>
# To train a model you need an Azure Storage account.
# Use the SAS URL to access your training files.


# trainingDataUrl = "PASTE_YOUR_SAS_URL_OF_YOUR_FORM_FOLDER_IN_BLOB_STORAGE_HERE"

poller = form_training_client.begin_training(trainingDataUrl, use_training_labels=True)
model = poller.result()
trained_model_id = model.model_id

print("Model ID: {}".format(trained_model_id))
print("Status: {}".format(model.status))
print("Training started on: {}".format(model.training_started_on))
print("Training completed on: {}".format(model.training_completed_on))

print("\nRecognized fields:")
for submodel in model.submodels:
    print(
        "The submodel with form type '{}' has recognized the following fields: {}".format(
            submodel.form_type,
            ", ".join(
                [
                    field.label if field.label else name
                    for name, field in submodel.fields.items()
                ]
            ),
        )
    )



Model ID: d47ec1ad-c41b-4a09-8c8a-c82e5c29d4f0
Status: ready
Training started on: 2021-06-20 00:40:06+00:00
Training completed on: 2021-06-20 00:40:09+00:00

Recognized fields:
The submodel with form type 'custom:d47ec1ad-c41b-4a09-8c8a-c82e5c29d4f0' has recognized the following fields: table, table: Item, table: Price, table: Tax, table: Total


In [None]:
# Labeled training expects a JSON labels file for each PDF, named to match that PDF exactly

In [44]:
# Training result information
for doc in model.training_documents:
    print("Document name: {}".format(doc.name))
    print("Document status: {}".format(doc.status))
    print("Document page count: {}".format(doc.page_count))
    print("Document errors: {}".format(doc.errors))
# </snippet_trainlabels>

# <snippet_analyze>


Document name: label_table_dynamic_rows1.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document name: label_table_dynamic_rows2.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document name: label_table_dynamic_rows3.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document name: label_table_dynamic_rows4.pdf
Document status: succeeded
Document page count: 1
Document errors: []
Document name: label_table_dynamic_rows5.pdf
Document status: succeeded
Document page count: 1
Document errors: []


In [56]:
poller = form_recognizer_client.begin_recognize_custom_forms_from_url(
    model_id=trained_model_id, form_url=new_test_url)
result = poller.result()

for recognized_form in result:
    print("Form type: {}".format(recognized_form.form_type))
    for name, field in recognized_form.fields.items():
        print("Field '{}' has label '{}' with value '{}' and a confidence score of {}".format(
            name,
            field.label_data.text if field.label_data else name,
            field.value,
            field.confidence
        ))
# </snippet_analyze>

Form type: custom:d47ec1ad-c41b-4a09-8c8a-c82e5c29d4f0
Field 'table' has label 'table' with value '[]' and a confidence score of 1.0


In [51]:
trained_model_id

'd47ec1ad-c41b-4a09-8c8a-c82e5c29d4f0'

In [49]:
new_test_url = 'https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/samples/sample_forms/forms/sample_invoice.jpg'

poller = form_recognizer_client.begin_recognize_custom_forms_from_url(
    model_id=trained_model_id, form_url=new_test_url)
result = poller.result()

In [50]:
for recognized_form in result:
    print("Form type: {}".format(recognized_form.form_type))
    for name, field in recognized_form.fields.items():
        print("Field '{}' has label '{}' with value '{}' and a confidence score of {}".format(
            name,
            field.label_data.text if field.label_data else name,
            field.value,
            field.confidence
        ))

Form type: custom:d47ec1ad-c41b-4a09-8c8a-c82e5c29d4f0
Field 'table' has label 'table' with value '[]' and a confidence score of 1.0


In [63]:
# <snippet_manage_count>
account_properties = form_training_client.get_account_properties()
print("Our account has {} custom models, and we can have at most {} custom models".format(
    account_properties.custom_model_count, account_properties.custom_model_limit
))
# </snippet_manage_count>

# <snippet_manage_list>
# Next, we get a paged list of all of our custom models
custom_models = form_training_client.list_custom_models()

print("We have models with the following ids:")

# Let's pull out the first model
first_model = next(custom_models)
print(first_model.model_id)
for model in custom_models:
    print(model.model_id)
    print(model.status)
# </snippet_manage_list>



Our account has 7 custom models, and we can have at most 250 custom models
We have models with the following ids:
067dc863-ef69-4895-8c0a-19a6a444a00c
605d630c-2698-4b8a-b3bd-4a69cca35fce
invalid
7e272103-8cbe-42f6-b0f9-6506341a552a
ready
d286df3b-731c-4840-891d-fe50475a4f47
ready
d47ec1ad-c41b-4a09-8c8a-c82e5c29d4f0
ready
e1d903ea-1d7e-4831-aec2-3b57dfbff641
invalid
ff8c0233-a727-469c-9672-e6c38952dbc1
invalid
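
The listing above mixes `ready` and `invalid` models, and the first ID has no status because it was taken with `next()` before the loop. A small sketch that groups IDs by status makes the picture clearer. `ModelInfo` is a stand-in for the SDK's model info objects, and the data is copied from the output above, leaving out the first model because its status was never printed.

```python
from collections import defaultdict, namedtuple

# Stand-in for the objects yielded by list_custom_models().
ModelInfo = namedtuple("ModelInfo", ["model_id", "status"])


def group_ids_by_status(models):
    """Return {status: [model_id, ...]} for an iterable of model infos."""
    grouped = defaultdict(list)
    for model in models:
        grouped[model.status].append(model.model_id)
    return dict(grouped)


listed = [
    ModelInfo("605d630c-2698-4b8a-b3bd-4a69cca35fce", "invalid"),
    ModelInfo("7e272103-8cbe-42f6-b0f9-6506341a552a", "ready"),
    ModelInfo("d286df3b-731c-4840-891d-fe50475a4f47", "ready"),
    ModelInfo("d47ec1ad-c41b-4a09-8c8a-c82e5c29d4f0", "ready"),
    ModelInfo("e1d903ea-1d7e-4831-aec2-3b57dfbff641", "invalid"),
    ModelInfo("ff8c0233-a727-469c-9672-e6c38952dbc1", "invalid"),
]
by_status = group_ids_by_status(listed)
print({status: len(ids) for status, ids in by_status.items()})
```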


In [48]:
## ^^^ Each training run produces one model ID; the documents in a run do not each get their own model

In [47]:
# <snippet_manage_getmodel>
custom_model = form_training_client.get_custom_model(model_id=trained_model_id)
print("Model ID: {}".format(custom_model.model_id))
print("Status: {}".format(custom_model.status))
print("Training started on: {}".format(custom_model.training_started_on))
print("Training completed on: {}".format(custom_model.training_completed_on))
# </snippet_manage_getmodel>


Model ID: d47ec1ad-c41b-4a09-8c8a-c82e5c29d4f0
Status: ready
Training started on: 2021-06-20 00:40:06+00:00
Training completed on: 2021-06-20 00:40:09+00:00


In [60]:
# Backing Up Model
# https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/disaster-recovery
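
The backup approach sketched here copies a trained model into a second Form Recognizer resource using `get_copy_authorization` on the target client and `begin_copy_model` on the source client. The wrapper name and its parameters are hypothetical, and the target resource ID and region must come from your own Azure subscription.

```python
def copy_model(source_client, target_client, model_id,
               target_resource_id, target_region):
    """Copy a custom model from the source resource into the target resource."""
    # The target resource authorizes the copy and reserves a new model ID.
    target = target_client.get_copy_authorization(
        resource_id=target_resource_id,
        resource_region=target_region,
    )
    # The source resource performs the copy using that authorization.
    poller = source_client.begin_copy_model(model_id=model_id, target=target)
    return poller.result()
```

Both clients are `FormTrainingClient` instances, each created with the endpoint and key of its own resource.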


In [67]:
my_model = form_training_client.get_custom_model(model_id='d286df3b-731c-4840-891d-fe50475a4f47')
print(my_model)

CustomFormModel(model_id=d286df3b-731c-4840-891d-fe50475a4f47, status=ready, training_started_on=2021-06-20 00:55:57+00:00, training_completed_on=2021-06-20 00:56:10+00:00, submodels=[CustomFormSubmodel(accuracy=None, model_id=d286df3b-731c-4840-891d-fe50475a4f47, fields={'field-0': CustomFormModelField(label=Contoso, Ltd., name=field-0, accuracy=None), 'field-1': CustomFormModelField(label=Item, name=field-1, accuracy=None), 'field-2': CustomFormModelField(label=Price, name=field-2, accuracy=None), 'field-3': CustomFormModelField(label=Tax, name=field-3, accuracy=None), 'field-4': CustomFormModelField(label=Total, name=field-4, accuracy=None)}, form_type=form-0), CustomFormSubmodel(accuracy=None, model_id=d286df3b-731c-4840-891d-fe50475a4f47, fields={}, form_type=form-1), CustomFormSubmodel(accuracy=None, model_id=d286df3b-731c-4840-891d-fe50475a4f47, fields={'field-0': CustomFormModelField(label=Charges, name=field-0, accuracy=None), 'field-1': CustomFormModelField(label=Invoice Date

In [68]:
print("\nRecognized fields:")
for submodel in my_model.submodels:
    print(
        "The submodel with form type '{}' has recognized the following fields: {}".format(
            submodel.form_type,
            ", ".join(
                [
                    field.label if field.label else name
                    for name, field in submodel.fields.items()
                ]
            ),
        )
    )



Recognized fields:
The submodel with form type 'form-0' has recognized the following fields: Contoso, Ltd., Item, Price, Tax, Total
The submodel with form type 'form-1' has recognized the following fields: 
The submodel with form type 'form-2' has recognized the following fields: Charges, Invoice Date, Invoice Due Date, Invoice For:, Invoice Number, VAT ID
The submodel with form type 'form-3' has recognized the following fields: 09 / 21 in the amount of:, 650-768-2322 or e-mail to:, Card Type:, City:, Date:, Email Address:, Mailing Address:, Name of Cardholder:, Purpose of Payment:, STATE OF CALIFORNIA:, Signature:, State:, Zip Code:, form. Telephone #:, to charge my:
The submodel with form type 'form-4' has recognized the following fields: Address:, Company Name:, Company Phone:, Dated As:, Details, Name:, Phone:, Purchase Order #:, Quantity, Total, Unit Price, Vendor Name:, Website:


In [None]:
# DELETE MODEL
# <snippet_manage_delete>
form_training_client.delete_model(model_id=custom_model.model_id)

try:
    form_training_client.get_custom_model(model_id=custom_model.model_id)
except ResourceNotFoundError:
    print("Successfully deleted model with id {}".format(custom_model.model_id))
# </snippet_manage_delete>
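
Several models in the earlier listing had the status `invalid`. A sketch of a cleanup helper built on the same `list_custom_models` and `delete_model` calls follows. The function name is made up for this example, and deletion is permanent, so it is worth printing the IDs and checking them before running it against a real account.

```python
def delete_invalid_models(training_client):
    """Delete every custom model whose status is 'invalid'; return the deleted IDs."""
    deleted = []
    # Materialize the listing first so deletions do not interfere with paging.
    for info in list(training_client.list_custom_models()):
        if info.status == "invalid":
            training_client.delete_model(model_id=info.model_id)
            deleted.append(info.model_id)
    return deleted
```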