In this notebook, you'll use the Azure Form Recognizer v3.0 REST API with Python to extract and identify relevant information in health insurance cards.


## Prerequisites
- Azure subscription - [Create one for free](https://azure.microsoft.com/en-us/free/cognitive-services/)
- [Python 3.x](https://www.python.org/) - Your Python installation should include [pip](https://pip.pypa.io/en/stable/). You can check if you have pip installed by running `pip --version` on the command line. Get pip by installing the latest version of Python.
- Once you have your Azure subscription, [create a Form Recognizer resource](https://ms.portal.azure.com/#create/Microsoft.CognitiveServicesFormRecognizer) in the Azure portal. After it deploys, click **Go to resource** to find your **key** and **endpoint**. You need both to connect your application to the Form Recognizer API, and you will paste them into the code later in this notebook. You can use the free pricing tier (`F0`) to try the service and upgrade later to a paid tier (`S0`) for production.

In [10]:
import json
import time
import base64
from requests import get, post

## Get the key and endpoint
The screenshot below shows where to find the key and endpoint of your Form Recognizer resource.
![How to find endpoint and key](./images/how-to-find-endpoint-and-key.png)

In [11]:
endpoint = r"<your endpoint>"
apim_key = "<your key>"
post_url = endpoint + "formrecognizer/documentModels/prebuilt-healthInsuranceCard.us:analyze" # Assumes the endpoint ends with "/". Refer to https://docs.microsoft.com/azure/applied-ai-services/form-recognizer/v3-migration-guide#analyze-operation for the full list of model IDs supported in v3.0
source = r"sample-insurance-card.png" # You can replace it with your own image or pdf file, see https://docs.microsoft.com/azure/applied-ai-services/form-recognizer/concept-model-overview#input-requirements for input requirements
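The concatenation above only yields a valid URL when the endpoint ends with a slash. As a minimal sketch, a small helper can normalize the endpoint before joining. The name `build_analyze_url` and the placeholder endpoint are assumptions for illustration and are not part of the Form Recognizer API.

```python
def build_analyze_url(endpoint: str, model_id: str) -> str:
    """Join the resource endpoint and the analyze route with exactly one slash."""
    base = endpoint.rstrip("/")  # tolerate endpoints with or without a trailing "/"
    return f"{base}/formrecognizer/documentModels/{model_id}:analyze"


# Placeholder endpoint for illustration only; use your own resource endpoint.
example_url = build_analyze_url(
    "https://example-resource.cognitiveservices.azure.com",
    "prebuilt-healthInsuranceCard.us",
)
print(example_url)
```

With this helper, `post_url = build_analyze_url(endpoint, "prebuilt-healthInsuranceCard.us")` would behave the same whether or not the endpoint copied from the portal ends with `/`.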

## Compose the headers, body and parameters

In [12]:
# Use this header if your input is a publicly accessible URL or base64 encoded data
headers = {
    # Request headers
    'Content-Type': 'application/json',
    'Ocp-Apim-Subscription-Key': apim_key
}

In [None]:
# # Alternatively, you can use this header if your input is in bytes
# headers_octet_stream = {
#     # Request headers
#     'Content-Type': 'application/octet-stream',
#     'Ocp-Apim-Subscription-Key': apim_key
# }

In [13]:
# Read a local file and convert to base64 format
with open(source, "rb") as f:
    data_bytes = f.read()
base64_bytes = base64.b64encode(data_bytes).decode()

In [15]:
# Compose the request body for base64 encoded data
body = json.dumps({"base64Source": base64_bytes})

In [None]:
# # Use this body if your input can be publicly accessed via a URL
# body_url = json.dumps({"urlSource": "https://formrecognizer.appliedai.azure.com/documents/samples/prebuilt/insurance.jpg"})
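The cells above describe three input styles: a base64 JSON body, a URL JSON body, and raw bytes. The sketch below gathers them into one helper that returns the matching headers and payload. `build_request_parts` is a hypothetical name, and treating any string input as a public URL is an assumption made for this illustration.

```python
import base64
import json
from typing import Tuple, Union


def build_request_parts(
    source: Union[str, bytes], key: str, as_json: bool = True
) -> Tuple[dict, Union[str, bytes]]:
    """Return (headers, data) for one of the three input styles in this notebook."""
    if isinstance(source, str):
        # Assumption: a string is a publicly accessible URL.
        content_type = "application/json"
        data = json.dumps({"urlSource": source})
    elif as_json:
        # Bytes sent inside a JSON body must be base64 encoded.
        encoded = base64.b64encode(source).decode("ascii")
        content_type = "application/json"
        data = json.dumps({"base64Source": encoded})
    else:
        # Bytes sent as-is use the octet-stream content type.
        content_type = "application/octet-stream"
        data = source
    headers = {"Content-Type": content_type, "Ocp-Apim-Subscription-Key": key}
    return headers, data


headers_demo, body_demo = build_request_parts(b"hello", "<your key>")
print(headers_demo["Content-Type"], body_demo)
```

Building the JSON with `json.dumps` rather than by hand guarantees double-quoted keys and correct escaping, which hand-written strings can easily get wrong.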

In [16]:
# Specify the API version in params.
params = {
    "api-version": "<api version>" # replace with the API version you target, e.g. "2022-08-31" (v3.0 GA)
}

## POST to analyze the document

In [None]:
# POST using "application/json" header
try:
    resp = post(url = post_url, headers = headers, params = params, data = body)
    if resp.status_code != 202:
        print("POST analyze failed:\n%s" % resp.text)
    else:
        print("POST analyze succeeded:\n%s" % resp.headers)
        get_url = resp.headers["operation-location"]
        print("GET URL:")
        print(get_url)
except Exception as e:
    print("POST analyze failed:\n%s" % str(e))

A successful request returns a 202 (Accepted) response that includes an `Operation-Location` header, which the script prints to the console. The header value is a URL that contains the operation ID; you use it to query the status of the asynchronous operation and get the results.
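If you want to log or store the operation ID separately, it can be pulled out of the header value. The sketch below assumes the ID is the last path segment of the `Operation-Location` URL; the sample URL is made up for illustration, and `operation_id_from_location` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def operation_id_from_location(operation_location: str) -> str:
    """Return the last path segment of the Operation-Location URL."""
    path = urlparse(operation_location).path  # drops the query string
    return path.rstrip("/").rsplit("/", 1)[-1]


# Made-up example value; a real one comes from resp.headers["operation-location"].
sample_location = (
    "https://example-resource.cognitiveservices.azure.com/formrecognizer/"
    "documentModels/prebuilt-healthInsuranceCard.us/analyzeResults/"
    "00000000-0000-0000-0000-000000000000?api-version=2022-08-31"
)
print(operation_id_from_location(sample_location))
```

The lowercase lookup `resp.headers["operation-location"]` in the script works because `requests` exposes response headers through a case-insensitive dictionary.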

In [None]:
# # POST using "application/octet-stream" header 
# try:
#     resp = post(url = post_url, headers = headers_octet_stream, params = params, data = data_bytes)
#     if resp.status_code != 202:
#         print("POST analyze failed:\n%s" % resp.text)
#     else:
#         print("POST analyze succeeded:\n%s" % resp.headers)
#         get_url = resp.headers["operation-location"]
#         print(get_url)
# except Exception as e:
#     print("POST analyze failed:\n%s" % str(e))

## GET analyzed result
After calling the Analyze API, call the Get Analyze Result API to get the status of the operation and the extracted data, using the `Operation-Location` value as the request URL. The script below polls the API at regular intervals until the results are available. We recommend an interval of one second or more.

In [None]:
n_tries = 10
n_try = 0
wait_sec = 6
while n_try < n_tries:
    try:
        resp = get(url = get_url, headers = {"Ocp-Apim-Subscription-Key": apim_key})
        resp_json = json.loads(resp.text)
        if resp.status_code != 200:
            print("GET insurance card results failed:\n%s" % resp_json)
            break
        status = resp_json["status"]
        if status == "succeeded":
            print("Insurance card analysis succeeded:\n%s" % resp_json)
            break
        if status == "failed":
            print("Insurance card analysis failed:\n%s" % resp_json)
            break
        # Analysis still running. Wait and retry.
        time.sleep(wait_sec)
        n_try += 1
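    except Exception as e:
        msg = "GET analyze results failed:\n%s" % str(e)
        print(msg)
        break
else:
    # The loop ran out of tries without reaching a terminal status.
    print("Insurance card analysis did not finish after %d tries." % n_tries)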
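The polling logic can also be factored into a reusable function. The following is a sketch, not the notebook's own method: `poll_until_done` and its parameters are hypothetical, and the status fetcher is passed in as a callable so the loop can be exercised with stub data instead of a live request.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch_status: Callable[[], dict],
    max_tries: int = 10,
    wait_sec: float = 6.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch_status until it reports a terminal status or tries run out."""
    for _ in range(max_tries):
        result = fetch_status()
        if result.get("status") in ("succeeded", "failed"):
            return result  # terminal status reached
        sleep(wait_sec)  # analysis still running; wait before retrying
    return None  # gave up without a terminal status


# Stubbed responses stand in for the real GET call in this demonstration.
stub_responses = iter([{"status": "running"}, {"status": "succeeded"}])
final = poll_until_done(lambda: next(stub_responses), wait_sec=0)
print(final["status"])
```

In the notebook, `fetch_status` could be something like `lambda: get(url=get_url, headers={"Ocp-Apim-Subscription-Key": apim_key}).json()`; returning `None` on timeout lets the caller tell "still running" apart from "failed".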

## Examine the response
The script prints responses to the console until the Analyze operation completes, then prints the extracted data in JSON format. The `words` and `lines` entries in the `pages` section contain every word and line of text extracted from the insurance card. The `fields` entry in the `documents` section contains key/value information extracted from the card, such as member name, ID number, and prescription information.
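To read the key/value fields without scanning the full JSON, you can walk the `documents` section. The sketch below assumes the result nests documents under `analyzeResult` and that each field carries `content` and `confidence` keys; `summarize_fields` is a hypothetical helper, and the sample dictionary is hand-written illustrative data, not real service output.

```python
def summarize_fields(result: dict) -> dict:
    """Map each field name to a (content, confidence) pair."""
    summary = {}
    documents = result.get("analyzeResult", {}).get("documents", [])
    for doc in documents:
        for name, field in doc.get("fields", {}).items():
            # Fields lacking either key map to None for that position.
            summary[name] = (field.get("content"), field.get("confidence"))
    return summary


# Hand-written sample; field names and values are illustrative only.
sample_result = {
    "status": "succeeded",
    "analyzeResult": {
        "documents": [
            {
                "fields": {
                    "Insurer": {"type": "string", "content": "Contoso Health", "confidence": 0.99},
                    "GroupNumber": {"type": "string", "content": "1000000", "confidence": 0.95},
                }
            }
        ]
    },
}

for name, (text, confidence) in summarize_fields(sample_result).items():
    print(f"{name}: {text} (confidence {confidence})")
```

Calling `summarize_fields(resp_json)` after a successful poll gives a compact view of the card; nested or list-valued fields would need additional handling that this sketch leaves out.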