# Enhance your analyzer with labeled data

Labeled data is a set of samples tagged with one or more labels that add context or meaning. It is used to improve an analyzer's performance.

Go to [Azure AI Foundry]() and use the labeling tool to annotate your data.

This notebook demonstrates how to create an analyzer from your labeled data and then use it to analyze your files.

> Note: Currently, this feature is only available for analyzers whose scenario is `document`.


## Prerequisites
1. Follow the steps in [README](../README.md#Configure-Azure-AI-Service-resource) to create a `.env` file that configures your Azure AI Service.
1. Follow the steps in [Set labeled data](../docs/set_env_for_labeled_data.md) to add the training data environment variables to `.env`.
1. Install the packages needed to run the sample.




In [None]:
%pip install -r ../requirements.txt
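
Before going further, it can help to confirm that the environment variables from the prerequisites are actually set. The following is a minimal sketch, not part of the sample: `find_missing_env_vars` is a hypothetical helper, and the variable names are the ones this notebook reads later. It inspects the process environment only, so values from `.env` appear only after `load_dotenv(find_dotenv())` has run, as in the client creation cell.

```python
import os

# Variable names used later in this notebook; the helper itself is hypothetical.
REQUIRED_ENV_VARS = (
    "AZURE_AI_ENDPOINT",
    "TRAINING_DATA_SAS_URL",
    "TRAINING_DATA_PATH",
)


def find_missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = find_missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

`AZURE_AI_API_VERSION` is left out of the check because the client cell falls back to a default value when it is not set.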


## Analyzer template
In this sample, we define a template for a [purchase order](../analyzer_templates/purchase_order.json). The fields in this template are the ones we labeled in the training data.

In [2]:
analyzer_template = '../analyzer_templates/purchase_order.json'
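
To see which fields a template defines before training, you could list them from the JSON. The sketch below assumes the template keeps its field definitions under a `fieldSchema.fields` mapping. That layout, the helper `list_template_fields`, and the field names in the sample are assumptions for illustration, not taken from the real purchase order template.

```python
import json


def list_template_fields(template: dict) -> list:
    """Return sorted field names, assuming a `fieldSchema.fields` mapping."""
    return sorted(template.get("fieldSchema", {}).get("fields", {}))


# Illustrative stand-in, not the contents of purchase_order.json.
sample_template = json.loads(
    """
    {
      "scenario": "document",
      "fieldSchema": {
        "fields": {
          "PurchaseOrderNumber": {"type": "string"},
          "TotalAmount": {"type": "number"}
        }
      }
    }
    """
)

print(list_template_fields(sample_template))
```

To inspect the real file instead, load it with `json.load(open(analyzer_template))` and pass the resulting dictionary to the same helper.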

## Create the Azure Content Understanding client
>The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class that contains the functions for interacting with the Content Understanding service. Until the Content Understanding SDK is released, you can treat it as a lightweight SDK. Fill in the constants **AZURE_AI_ENDPOINT**, **AZURE_AI_API_VERSION**, and **AZURE_AI_API_KEY** with the information from your Azure AI Service.

In [None]:
import logging
import json
import os
import sys
from dotenv import find_dotenv, load_dotenv

# import utility package from python samples root directory
py_samples_root_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(py_samples_root_dir)
from python.content_understanding_client import AzureContentUnderstandingClient

load_dotenv(find_dotenv())
logging.basicConfig(level=logging.INFO)

client = AzureContentUnderstandingClient(
    endpoint=os.getenv("AZURE_AI_ENDPOINT"),
    api_version=os.getenv("AZURE_AI_API_VERSION", "2024-12-01-preview"),
    x_ms_useragent="azure-ai-content-understanding-python/analyzer_training",
)

## Create analyzer with defined schema
Before creating the custom fields analyzer, set the constant ANALYZER_ID to a business-related name. Here we generate a random name for demo purposes.

We use the **TRAINING_DATA_SAS_URL** and **TRAINING_DATA_PATH** values set in the prerequisite steps.

In [None]:
import uuid
ANALYZER_ID = "train-sample-" + str(uuid.uuid4())

response = client.begin_create_analyzer(
    ANALYZER_ID,
    analyzer_schema_path=analyzer_template,
    training_storage_container_sas_url=os.getenv("TRAINING_DATA_SAS_URL"),
    training_storage_container_path_prefix=os.getenv("TRAINING_DATA_PATH"),
)
result = client.poll_result(response)
if result is not None and "status" in result and result["status"] == "Succeeded":
    logging.info(f"Here is the analyzer detail for {result['result']['analyzerId']}")
    logging.info(json.dumps(result, indent=2))
else:
    # Log at warning level so a failed creation stands out from routine output.
    logging.warning(
        "Analyzer creation did not succeed. Check your service configuration and deployment."
    )

## Use the created analyzer to extract document content
After the analyzer is successfully created, we can use it to analyze our input files.

In [5]:
response = client.begin_analyze(ANALYZER_ID, file_location='../data/purchase_order.jpg')
result = client.poll_result(response)

logging.info(json.dumps(result, indent=2))

INFO:python.content_understanding_client:Analyzing file ../data/purchase_order.jpg with analyzer: train-sample-3292ff56-bc75-4bf0-8a09-8aa866d8553f
INFO:python.content_understanding_client:Request 9ed825c9-551e-45e2-8ec0-1ae555bcd56f in progress ...
INFO:python.content_understanding_client:Request 9ed825c9-551e-45e2-8ec0-1ae555bcd56f in progress ...
INFO:python.content_understanding_client:Request 9ed825c9-551e-45e2-8ec0-1ae555bcd56f in progress ...
INFO:python.content_understanding_client:Request 9ed825c9-551e-45e2-8ec0-1ae555bcd56f in progress ...
INFO:python.content_understanding_client:Request result is ready after 11.27 seconds.
INFO:root:{
  "id": "9ed825c9-551e-45e2-8ec0-1ae555bcd56f",
  "status": "Succeeded",
  "result": {
    "analyzerId": "train-sample-3292ff56-bc75-4bf0-8a09-8aa866d8553f",
    "apiVersion": "2024-12-01-preview",
    "createdAt": "2024-12-09T19:42:58Z",
    "contents": [
      {
        "markdown": "Purchase Order\n\n\n# Hero Limited\n\nCompany Phone: 555-348

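The logged JSON is long, so it can be convenient to pull out only the extracted markdown. The sketch below relies solely on the keys visible in the output above (`status`, `result`, `contents`, `markdown`). `extract_markdown` is a hypothetical helper, and `sample_result` is an abbreviated stand-in for a real response.

```python
def extract_markdown(result):
    """Return the markdown of each content entry, or [] if the request did not succeed."""
    if not result or result.get("status") != "Succeeded":
        return []
    contents = result.get("result", {}).get("contents", [])
    return [entry["markdown"] for entry in contents if "markdown" in entry]


# Abbreviated stand-in for the response logged above.
sample_result = {
    "status": "Succeeded",
    "result": {"contents": [{"markdown": "Purchase Order\n\n\n# Hero Limited"}]},
}

for markdown in extract_markdown(sample_result):
    print(markdown)
```

With a live service, you would pass the `result` returned by `client.poll_result(response)` instead of `sample_result`.
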
## Delete the existing analyzer in the Content Understanding service
This step is optional. It only prevents the test analyzer from lingering in your service. In real usage scenarios, a custom fields analyzer can stay in your service so that later workloads can reuse it.


In [6]:
client.delete_analyzer(ANALYZER_ID)

INFO:python.content_understanding_client:Analyzer train-sample-3292ff56-bc75-4bf0-8a09-8aa866d8553f deleted.


<Response [204]>