# Extract Custom Fields from Your File

This notebook demonstrates how to use analyzers to extract custom fields from your input files.

## Prerequisites
1. Ensure the Azure AI service is configured by following the [configuration steps](../README.md#configure-azure-ai-service-resource).
2. Install the required packages to run this sample.

In [None]:
%pip install -r ../requirements.txt

## Analyzer Templates

The following is a collection of analyzer templates designed to extract fields from different types of input files.

You can customize any of these templates to meet your specific requirements. For additional templates verified by Microsoft, see [this directory](../analyzer_templates/README.md).
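
One way to customize a template without editing the original file is to write a modified copy and pass the copy's path to the analyzer creation step. The sketch below is illustrative only: `customize_template` is a hypothetical helper (not part of the sample repository), it assumes the template's root is a JSON object, and it replaces top-level keys wholesale rather than merging nested values.

```python
import json
from pathlib import Path
from typing import Any, Dict


def customize_template(source_path: str, target_path: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a template JSON file, replacing top-level keys with `overrides`.

    Hypothetical helper for illustration; the source file is left untouched.
    """
    template = json.loads(Path(source_path).read_text(encoding="utf-8"))
    # Shallow merge: each key in `overrides` replaces the whole top-level entry.
    template.update(overrides)
    Path(target_path).write_text(json.dumps(template, indent=2), encoding="utf-8")
    return template
```

Which keys are worth overriding depends on the template you start from, so inspect the JSON file before choosing them.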

In [None]:
extraction_templates = {
    # Extract fields from invoices without grounding sources or confidence scores.
    "invoice": ('../analyzer_templates/invoice.json', '../data/invoice.pdf'),

    # Extract fields from invoices including grounding sources and confidence scores (optional).
    "invoice_field_source": ('../analyzer_templates/invoice_field_source.json', '../data/invoice.pdf'),

    # Extract insights from call recordings, such as summaries, topics, companies, and people mentioned.
    "call_recording": ('../analyzer_templates/call_recording_analytics.json', '../data/callCenterRecording.mp3'),

    # Extract summary and sentiment from conversational audio (e.g., customer service calls).
    "conversation_audio": ('../analyzer_templates/conversational_audio_analytics.json', '../data/callCenterRecording.mp3'),

    # Extract descriptions and perform sentiment analysis on marketing videos.
    "marketing_video": ('../analyzer_templates/marketing_video.json', '../data/FlightSimulator.mp4'),
}

Select the analyzer template to use by setting `ANALYZER_TEMPLATE` to one of the keys in `extraction_templates`. A unique ID for the analyzer is generated later, when the analyzer is created from this template.

In [None]:
import uuid

ANALYZER_TEMPLATE = "invoice"

(analyzer_template_path, analyzer_sample_file_path) = extraction_templates[ANALYZER_TEMPLATE]
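
A mistyped template name or a missing file otherwise only surfaces later, when the analyzer is created or the sample is analyzed. As a minimal sketch, the lookup can be wrapped in a check that fails early with a readable message. `select_template` is a hypothetical helper introduced here for illustration; it is not part of the sample code.

```python
from pathlib import Path
from typing import Dict, Tuple


def select_template(templates: Dict[str, Tuple[str, str]], name: str) -> Tuple[str, str]:
    """Return (template_path, sample_path) for `name`, verifying both files exist."""
    if name not in templates:
        raise KeyError(f"Unknown template {name!r}; choose one of {sorted(templates)}")
    template_path, sample_path = templates[name]
    # Report every missing file at once rather than failing on the first one.
    missing = [p for p in (template_path, sample_path) if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing file(s) for template {name!r}: {missing}")
    return template_path, sample_path
```

In this notebook it could be called as `select_template(extraction_templates, ANALYZER_TEMPLATE)` in place of the direct dictionary lookup.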

## Create Azure AI Content Understanding Client

> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class that provides functions for interacting with the Content Understanding API. Until the official Content Understanding SDK is released, it serves as a lightweight stand-in for one.

In [None]:
import logging
import json
import os
import sys
from pathlib import Path
from dotenv import find_dotenv, load_dotenv
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Load environment variables from a .env file if present
load_dotenv(find_dotenv())

logging.basicConfig(level=logging.INFO)

AZURE_AI_ENDPOINT = os.getenv("AZURE_AI_ENDPOINT")
AZURE_AI_API_VERSION = os.getenv("AZURE_AI_API_VERSION", "2025-05-01-preview")

# Add the parent directory to the system path to import shared modules
parent_dir = Path.cwd().parent
sys.path.append(str(parent_dir))

from python.content_understanding_client import AzureContentUnderstandingClient

# Authenticate using DefaultAzureCredential
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")

# Initialize the Content Understanding client
client = AzureContentUnderstandingClient(
    endpoint=AZURE_AI_ENDPOINT,
    api_version=AZURE_AI_API_VERSION,
    token_provider=token_provider,
    # Uncomment the following line to enable usage telemetry for this sample
    # x_ms_useragent="azure-ai-content-understanding-python/field_extraction",
)
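
Because `os.getenv` returns `None` when a variable is unset, a missing `AZURE_AI_ENDPOINT` is passed to the client silently and only fails on the first request. The following sketch shows one way to fail earlier with a clear message. `require_env` is a hypothetical helper, not something the sample repository provides.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.getenv(name)
    if value is None or not value.strip():
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Define it in your shell or in a .env file before running this notebook."
        )
    # Strip stray whitespace that often sneaks in from .env files.
    return value.strip()
```

With this helper, the endpoint could be read as `require_env("AZURE_AI_ENDPOINT")` before the client is constructed.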

## Create Analyzer from the Template

In [None]:
# Generate a unique analyzer ID
CUSTOM_ANALYZER_ID = "field-extraction-sample-" + str(uuid.uuid4())

# Begin creation of the analyzer using the selected template
response = client.begin_create_analyzer(CUSTOM_ANALYZER_ID, analyzer_template_path=analyzer_template_path)

# Poll the result until the creation operation completes
result = client.poll_result(response)

print(json.dumps(result, indent=2))
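
Analyzer creation is a long-running operation, which is why the cell above polls for its result. The sketch below illustrates the general polling pattern with a timeout. It is a generic illustration and not the implementation of `client.poll_result`; the function name `poll_until` and its parameters are assumptions made for this example.

```python
import time
from typing import Any, Callable


def poll_until(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval_seconds: float = 2.0,
    timeout_seconds: float = 120.0,
) -> Any:
    """Call `fetch` repeatedly until `is_done(state)` is true or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = fetch()
        if is_done(state):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Operation did not complete within {timeout_seconds} seconds")
        # Wait before asking again to avoid hammering the service.
        time.sleep(interval_seconds)
```

The caller supplies both the status lookup and the completion test, so the loop itself makes no assumption about how the service reports progress.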

## Extract Fields Using the Analyzer

Once the analyzer is successfully created, you can use it to analyze your input files.

In [None]:
# Begin analysis of the sample file using the created analyzer
response = client.begin_analyze(CUSTOM_ANALYZER_ID, file_location=analyzer_sample_file_path)

# Poll the result until the analysis operation completes
result_json = client.poll_result(response)

print(json.dumps(result_json, indent=2))
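
The printed JSON can be long, so it is often handy to reduce it to a flat mapping of field names to values. The sketch below assumes the response nests extracted fields under `result` → `contents` → `fields`, and that each field stores its value under a key beginning with `value` (for example `valueString`). Both are assumptions about the response shape, so compare them against the output printed above before relying on this. `collect_field_values` is a hypothetical helper.

```python
from typing import Any, Dict


def collect_field_values(result_json: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten extracted fields into {field_name: value}.

    Assumes the shape result -> contents -> fields; missing levels yield an empty dict.
    """
    values: Dict[str, Any] = {}
    contents = result_json.get("result", {}).get("contents", [])
    for content in contents:
        for name, field in content.get("fields", {}).items():
            # Use the first key that starts with "value"; None if the field has no value.
            value_key = next((k for k in field if k.startswith("value")), None)
            values[name] = field[value_key] if value_key else None
    return values
```

In this notebook it could be called as `collect_field_values(result_json)`. If the same field name appears in more than one content item, later items overwrite earlier ones.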

## Clean Up
Optionally, delete the sample analyzer from your resource. Typically, you would analyze multiple files using the same analyzer without deleting it each time.

In [None]:
client.delete_analyzer(CUSTOM_ANALYZER_ID)
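
If an earlier cell raises an exception, the clean-up cell may never run and the sample analyzer stays in your resource. For short-lived experiments, one option is to tie creation and deletion together in a context manager. The sketch below uses only the client methods shown in this notebook (`begin_create_analyzer`, `poll_result`, `delete_analyzer`); `temporary_analyzer` itself is a hypothetical helper.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_analyzer(client: Any, analyzer_id: str, template_path: str) -> Iterator[str]:
    """Create an analyzer, yield its ID, and delete it when the block exits."""
    response = client.begin_create_analyzer(analyzer_id, analyzer_template_path=template_path)
    client.poll_result(response)
    try:
        yield analyzer_id
    finally:
        # Runs even if the body of the `with` block raises.
        client.delete_analyzer(analyzer_id)
```

A typical use would be `with temporary_analyzer(client, CUSTOM_ANALYZER_ID, analyzer_template_path) as analyzer_id:` followed by the analysis calls. This pattern suits samples and tests; when you analyze many files with the same analyzer, keep the analyzer and delete it explicitly once you are finished.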