# Enhance your analyzer with labeled data


> **Note:** This feature is currently available only for analyzers whose scenario is `document`.

Labeled data is a set of samples that have been tagged with one or more labels to add context or meaning. It is used to improve an analyzer's performance.

Use the labeling tool in [Azure AI Foundry]() to annotate your data.

This notebook demonstrates how, once you have labeled data, to create an analyzer from it and analyze your files.



## Prerequisites
1. Ensure the Azure AI service is configured by following these [steps](../README.md#configure-azure-ai-service-resource).
1. Follow the steps in [Set labeled data](../docs/set_env_for_labeled_data.md) to add the training-data environment variables to `.env`.
1. Install the packages needed to run the sample.




In [None]:
%pip install -r ../requirements.txt

## Create the Azure Content Understanding client
> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class that contains the functions for interacting with the Content Understanding service. Until the Content Understanding SDK is released, it can be regarded as a lightweight SDK. Fill in the constants **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, and **AZURE_AI_API_KEY** with the information from your Azure AI service.

In [None]:
import logging
import json
import os
import sys
from pathlib import Path
from dotenv import find_dotenv, load_dotenv
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# import utility package from python samples root directory
parent_dir = Path(Path.cwd()).parent
sys.path.append(str(parent_dir))
from python.content_understanding_client import AzureContentUnderstandingClient

load_dotenv(find_dotenv())
logging.basicConfig(level=logging.INFO)

credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")

client = AzureContentUnderstandingClient(
    endpoint=os.getenv("AZURE_AI_ENDPOINT"),
    api_version=os.getenv("AZURE_AI_API_VERSION", "2025-05-01-preview"),
    token_provider=token_provider,
    # This header is used for sample usage telemetry; comment out the next line to opt out.
    x_ms_useragent="azure-ai-content-understanding-python/analyzer_training",
)
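
Because `os.getenv` returns `None` when a variable is unset, a quick check of the `.env` settings can catch configuration mistakes early. The following is a minimal sketch, not part of the sample's client: `missing_settings` is a hypothetical helper introduced here for illustration.

```python
import os


def missing_settings(required, env=None):
    """Return the names in `required` that are unset or empty in `env`.

    `missing_settings` is a hypothetical helper for this notebook, not part
    of AzureContentUnderstandingClient.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# AZURE_AI_ENDPOINT is the only variable the cell above reads without a default.
missing = missing_settings(["AZURE_AI_ENDPOINT"])
if missing:
    print(f"Missing settings in .env: {', '.join(missing)}")
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.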

## Use analyzer to extract document content
Once the analyzer is set up successfully, we can use it to analyze our input files.

In [None]:
ANALYZER_ID = 'prebuilt-imageAnalyzer'
response = client.begin_analyze(ANALYZER_ID, file_location='../data/receipt.png')
result_json = client.poll_result(response)

logging.info(json.dumps(result_json, indent=2))

> The markdown output contains layout information, which is very useful for Retrieval-Augmented Generation (RAG) scenarios. You can paste the markdown into a viewer such as Visual Studio Code and preview the layout structure.

In [None]:
print(result_json["result"]["contents"][0]["markdown"])
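
To preview the markdown in a viewer, it can be written to a file. The sketch below is an illustration under stated assumptions: `first_markdown` is a hypothetical helper, the `sample` dictionary is a stand-in that mirrors only the keys used in the cell above, and the output goes to the system temporary directory so nothing in the repository is modified.

```python
import tempfile
from pathlib import Path


def first_markdown(result_json):
    """Return the markdown of the first content entry, or None if it is absent."""
    contents = result_json.get("result", {}).get("contents") or []
    if not contents:
        return None
    return contents[0].get("markdown")


# Illustrative stand-in for a real response; only the keys used above are assumed.
sample = {"result": {"contents": [{"markdown": "# Receipt\n\nTotal: 12.50"}]}}

markdown = first_markdown(sample)
if markdown is not None:
    # Hypothetical output location; change the path as needed.
    output_path = Path(tempfile.gettempdir()) / "receipt_layout.md"
    output_path.write_text(markdown, encoding="utf-8")
    print(f"Markdown written to {output_path}")
```

In the notebook, `result_json` would be passed in place of `sample`; the guarded lookups avoid a `KeyError` or `IndexError` if the response carries no contents.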

> The layout information includes words and lines in the `pages` node, paragraph information in `paragraphs`, and tables in `tables`.

In [None]:
print(json.dumps(result_json["result"]["contents"][0], indent=2))
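
The full JSON dump can be long, so a short summary of the layout nodes may be easier to read. The following sketch assumes a response shape that is not confirmed by this notebook: `pages` entries with a `lines` list, and `tables` entries with `rowCount`, `columnCount`, and `cells` carrying `rowIndex`, `columnIndex`, and `content`. Both helpers and the sample data are hypothetical; check the field names against your own output before relying on them.

```python
def summarize_layout(content):
    """Count the pages, lines, paragraphs, and tables in one content entry."""
    pages = content.get("pages", [])
    return {
        "pages": len(pages),
        "lines": sum(len(page.get("lines", [])) for page in pages),
        "paragraphs": len(content.get("paragraphs", [])),
        "tables": len(content.get("tables", [])),
    }


def table_to_grid(table):
    """Arrange a table's cells into a list of rows (assumed field names)."""
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table.get("cells", []):
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("content", "")
    return grid


# Illustrative stand-in for result_json["result"]["contents"][0].
sample_content = {
    "pages": [{"lines": [{"content": "Contoso Cafe"}, {"content": "Total 12.50"}]}],
    "paragraphs": [{"content": "Contoso Cafe"}, {"content": "Total 12.50"}],
    "tables": [{
        "rowCount": 2,
        "columnCount": 2,
        "cells": [
            {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
            {"rowIndex": 0, "columnIndex": 1, "content": "Price"},
            {"rowIndex": 1, "columnIndex": 0, "content": "Coffee"},
            {"rowIndex": 1, "columnIndex": 1, "content": "3.50"},
        ],
    }],
}

print(summarize_layout(sample_content))
for row in table_to_grid(sample_content["tables"][0]):
    print(" | ".join(row))
```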