# Enhance your analyzer with labeled data


> Note: This feature is currently available only when the analyzer scenario is `document`.

Labeled data is a set of samples that have been tagged with one or more labels to add context or meaning. It is used to improve an analyzer's performance.

Please go to [Azure AI Foundry]() and use the labeling tool to annotate your data.

This notebook demonstrates how, once you have labeled data, to create an analyzer from it and analyze your files.



## Prerequisites
1. Ensure the Azure AI service is configured by following these [steps](../README.md#configure-azure-ai-service-resource).
1. Follow the steps in [Set labeled data](../docs/set_env_for_labeled_data.md) to add the training-data environment variables to `.env`.
1. Install the packages needed to run the sample.




In [1]:
%pip install -r ../requirements.txt

Defaulting to user installation because normal site-packages is not writeable

[notice] A new release of pip is available: 24.0 -> 25.1.1
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.



## Analyzer template
In this sample we define a template for [purchase order](../analyzer_templates/purchase_order.json). We labeled the fields in the training data.

In [2]:
analyzer_template = '../analyzer_templates/receipt.json'
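
Before creating an analyzer, it can help to check which fields a template declares. The following is a minimal sketch, not part of the sample's utility code. It assumes the template JSON keeps its field definitions under `fieldSchema` → `fields`, and the helper name `template_field_names` and the field names in the sample dictionary are hypothetical.

```python
def template_field_names(template):
    """Return the sorted field names declared in an analyzer template dict.

    Assumption: fields live under template["fieldSchema"]["fields"].
    """
    fields = template.get("fieldSchema", {}).get("fields", {})
    return sorted(fields)


# Hypothetical, shortened template used only to illustrate the assumed layout
sample_template = {
    "scenario": "document",
    "fieldSchema": {
        "fields": {
            "Total": {"type": "number"},
            "MerchantName": {"type": "string"},
        }
    },
}

print(template_field_names(sample_template))
```

To apply the same check to the file referenced by `analyzer_template`, load it with `json.load` and pass the resulting dictionary to the helper.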

## Create Azure content understanding client
>The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class that contains the functions for interacting with the Content Understanding service. Until the Content Understanding SDK is released, it can be regarded as a lightweight SDK. Fill in the constants **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, and **AZURE_AI_API_KEY** with the information from your Azure AI Service.

In [2]:
import logging
import json
import os
import sys
from pathlib import Path
from dotenv import find_dotenv, load_dotenv
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# import utility package from python samples root directory
parent_dir = Path(Path.cwd()).parent
sys.path.append(str(parent_dir))
from python.content_understanding_client import AzureContentUnderstandingClient

load_dotenv(find_dotenv())
logging.basicConfig(level=logging.INFO)

credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")

client = AzureContentUnderstandingClient(
    endpoint=os.getenv("AZURE_AI_ENDPOINT"),
    api_version=os.getenv("AZURE_AI_API_VERSION", "2025-05-01-preview"),
    token_provider=token_provider,
    x_ms_useragent="azure-ai-content-understanding-python/analyzer_training", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.
)

INFO:azure.identity._credentials.environment:No environment configuration found.
INFO:azure.identity._credentials.managed_identity:ManagedIdentityCredential will use IMDS
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=REDACTED&resource=REDACTED'
Request method: 'GET'
Request headers:
    'User-Agent': 'azsdk-python-identity/1.23.0 Python/3.11.12 (Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.36)'
No body was attached to the request
INFO:azure.identity._credentials.chained:DefaultAzureCredential acquired a token from AzureDeveloperCliCredential
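
The client cell above reads its settings with `os.getenv`, which returns `None` without complaint when a variable is missing. A minimal sketch of a check that could be run after `load_dotenv` is shown here. The helper name `missing_env_vars` is hypothetical, and only `AZURE_AI_ENDPOINT` is checked because `AZURE_AI_API_VERSION` has a default in the cell above.

```python
import os


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Extend this list with any other variables your own .env is expected to provide
missing = missing_env_vars(["AZURE_AI_ENDPOINT"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```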


## Use analyzer to extract document content
Once the analyzer is set up, we can use it to analyze our input files.

In [5]:
ANALYZER_ID = 'prebuilt-imageAnalyzer'
response = client.begin_analyze(ANALYZER_ID, file_location='../data/receipt.png')
result_json = client.poll_result(response)

logging.info(json.dumps(result_json, indent=2))

INFO:python.content_understanding_client:Analyzing file ../data/receipt.png with analyzer: prebuilt-imageAnalyzer
INFO:python.content_understanding_client:Request 397f0ac8-8975-4346-abc2-ea27c7c450aa in progress ...
INFO:python.content_understanding_client:Request 397f0ac8-8975-4346-abc2-ea27c7c450aa in progress ...
INFO:python.content_understanding_client:Request result is ready after 7.20 seconds.
INFO:root:{
  "id": "397f0ac8-8975-4346-abc2-ea27c7c450aa",
  "status": "Succeeded",
  "result": {
    "analyzerId": "prebuilt-imageAnalyzer",
    "apiVersion": "2025-05-01-preview",
    "createdAt": "2025-06-05T08:48:02Z",
    "contents": [
      {
        "markdown": "![image](image)\n",
        "fields": {
          "Summary": {
            "type": "string",
            "valueString": "The image is a receipt from a store named Contoso, located at 123 Main Street, Redmond, WA 98052. The receipt is dated June 10, 2019, at 12:59, and the sales associate is named Paul. The purchase includes 

> The markdown output contains layout information, which is very useful for Retrieval-Augmented Generation (RAG) scenarios. You can paste the markdown into a viewer such as Visual Studio Code and preview the layout structure.

In [9]:
print(result_json["result"]["contents"][0]["markdown"])

![image](image)



> You can get the layout information, including words and lines in the `pages` node, paragraph information in `paragraphs`, and tables in `tables`.

In [12]:
print(json.dumps(result_json["result"]["contents"][0]))

{"markdown": "![image](image)\n", "fields": {"Summary": {"type": "string", "valueString": "The image is a receipt from a store named Contoso, located at 123 Main Street, Redmond, WA 98052. The receipt is dated June 10, 2019, at 12:59, and the sales associate is named Paul. The purchase includes two Surface Pro 6 devices priced at $1,999.00 each and three Surface Pens priced at $299.97 each. The subtotal for the purchase is $2,299.97, with an additional tax of $258.31, bringing the total amount to $2,558.28."}}, "kind": "document", "startPageNumber": 1, "endPageNumber": 1, "unit": "pixel", "pages": [{"pageNumber": 1, "spans": []}]}
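
The extracted fields in the output above sit under `fields`, each with a `type` and a matching value key (`valueString` for type `string`). The sketch below flattens that structure into a plain name-to-value dictionary. It assumes other field types follow the same `value<Type>` naming, which the output above only confirms for `string`. The helper name `field_values` is hypothetical.

```python
def field_values(content):
    """Map each field name to its value for one entry of result["contents"].

    Assumption: a field of type "xyz" stores its value under "valueXyz",
    as seen for "string" -> "valueString" in the output above.
    """
    values = {}
    for name, field in content.get("fields", {}).items():
        field_type = field.get("type", "")
        key = "value" + field_type[:1].upper() + field_type[1:]
        values[name] = field.get(key)  # None if the expected key is absent
    return values


# Shortened sample shaped like the content entry printed above
sample_content = {
    "markdown": "![image](image)\n",
    "fields": {
        "Summary": {
            "type": "string",
            "valueString": "The image is a receipt from a store named Contoso.",
        }
    },
    "kind": "document",
}

print(field_values(sample_content))
```

With a real response, the same helper could be called as `field_values(result_json["result"]["contents"][0])`.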
