# Extract fields with a custom fields analyzer

These code snippets demonstrate how to use a custom fields analyzer to extract document content, with the steps in a suitable order. They focus on the sequential process, so that you can quickly get an overview of how document content is extracted through a custom fields analyzer.

## Prerequisites
Link to environment creation

In [38]:
%pip install -r ../requirements.txt

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


## Analyzer template examples

Here we provide a list of analyzer template examples that can be used to extract fields from different input file types.

In [39]:
extraction_samples = {
    "sample_invoice": ('../analyzer_templates/sample_invoice_analyzer.json', '../data/invoice.pdf'),
    "sample_chart": ('../analyzer_templates/sample_chart_analyzer.json', '../data/pieChart.jpg'),
    "sample_call_transcript": ('../analyzer_templates/sample_call_transcript_analyzer.json', '../data/callCenterRecording.mp3'),
    "sample_marketing_video": ('../analyzer_templates/sample_marketing_video_analyzer.json', '../data/video.mp4')
}

Set the target to the sample analyzer that you want to try.

In [40]:
target_sample = "sample_marketing_video"
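A typo in the sample name or a missing data file would otherwise only surface later, when the analyzer is created or the file is analyzed. The following is a minimal sketch of an early check; `resolve_sample` is a hypothetical helper introduced here for illustration and is not part of the sample repository.

```python
import os


# Hypothetical helper, not part of the sample repository.
def resolve_sample(samples, target):
    """Return (template_path, data_path) for `target`, or raise a clear error."""
    if target not in samples:
        raise KeyError(f"Unknown sample {target!r}; choose one of {sorted(samples)}")
    template_path, data_path = samples[target]
    # Both the analyzer template and the input file must exist on disk.
    missing = [p for p in (template_path, data_path) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError(f"Missing files for {target!r}: {missing}")
    return template_path, data_path
```

In this notebook it could be called as `resolve_sample(extraction_samples, target_sample)` before any request is sent.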

## Create Azure Content Understanding client
>The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class that contains the functions for interacting with the Content Understanding service. Until the Content Understanding SDK is released, you can treat it as a lightweight SDK. Set **AZURE_CU_ENDPOINT**, **AZURE_CU_API_VERSION**, and **AZURE_CU_API_KEY** to the values from your Azure AI Content Understanding service; the cell below reads them from the environment, including any `.env` file it finds.

In [41]:
import logging
import json
import os
import sys
from dotenv import find_dotenv, load_dotenv

# import utility package from python samples root directory
py_samples_root_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(py_samples_root_dir)
from python.content_understanding_client import AzureContentUnderstandingClient

load_dotenv(find_dotenv())
logging.basicConfig(level=logging.INFO)

client = AzureContentUnderstandingClient(
    endpoint=os.getenv("AZURE_CU_ENDPOINT"),
    api_version=os.getenv("AZURE_CU_API_VERSION", "2024-12-01-preview"),
    subscription_key=os.getenv("AZURE_CU_API_KEY"),
    api_token=os.getenv("AZURE_CU_API_TOKEN"),
)
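Because `os.getenv` returns `None` for an unset variable, a missing setting is easy to overlook until the first request fails. The sketch below checks the settings up front. `missing_settings` is a hypothetical helper, and the rule that either a key or a token is sufficient is an assumption based on the client accepting both arguments, not something this notebook states.

```python
import os


# Hypothetical helper; the variable names match those read by the client cell.
def missing_settings(env=os.environ):
    """Return a list of problems found in the Content Understanding settings."""
    problems = []
    if not env.get("AZURE_CU_ENDPOINT"):
        problems.append("AZURE_CU_ENDPOINT is not set")
    # Assumption: one of the two credentials is enough to authenticate.
    if not (env.get("AZURE_CU_API_KEY") or env.get("AZURE_CU_API_TOKEN")):
        problems.append("set AZURE_CU_API_KEY or AZURE_CU_API_TOKEN")
    return problems
```

Calling `missing_settings()` after `load_dotenv(find_dotenv())` returns an empty list when the endpoint and one credential are present. `AZURE_CU_API_VERSION` is not checked because the client cell already supplies a default for it.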

## Create analyzer with defined schema
Before creating the custom fields analyzer, set the constant ANALYZER_ID to a business-related name. Here we generate a random name for demo purposes.

In [42]:
import uuid
ANALYZER_ID = "extraction-sample-" + str(uuid.uuid4())

response = client.begin_create_analyzer(ANALYZER_ID, analyzer_schema_path=extraction_samples[target_sample][0])
result = client.poll_result(response)

logging.info(json.dumps(result, indent=2))

INFO:python.content_understanding_client:Analyzer extraction-sample-68d07283-1e36-47bc-9460-a9cff6a28030 create request accepted.
INFO:python.content_understanding_client:Request result is ready after 0.00 seconds.
INFO:root:{
  "id": "9ca25b92-ebc1-4d84-a4cc-adca4f4a9fdf",
  "status": "Succeeded",
  "result": {
    "analyzerId": "extraction-sample-68d07283-1e36-47bc-9460-a9cff6a28030",
    "description": "Sample marketing video analyzer",
    "createdAt": "2024-12-06T22:50:58Z",
    "lastModifiedAt": "2024-12-06T22:50:58Z",
    "config": {
      "locales": [
        "en-US",
        "es-ES",
        "es-MX",
        "fr-FR",
        "hi-IN",
        "it-IT",
        "ja-JP",
        "ko-KR",
        "pt-BR",
        "zh-CN"
      ],
      "returnDetails": true,
      "enableFace": false
    },
    "fieldSchema": {
      "fields": {
        "Description": {
          "type": "string",
          "description": "Detailed summary of the video segment, focusing on product characteristics, 

## Use created analyzer to extract document content


After the analyzer is successfully created, we can use it to analyze our input files.

In [43]:
response = client.begin_analyze(ANALYZER_ID, file_location=extraction_samples[target_sample][1])
result = client.poll_result(response)

logging.info(json.dumps(result, indent=2))

INFO:python.content_understanding_client:Analyzing file ../data/video.mp4 with analyzer: extraction-sample-68d07283-1e36-47bc-9460-a9cff6a28030
INFO:python.content_understanding_client:Request https://chetho-swe-ai-2.openai.azure.com/contentunderstanding/analyzers/extraction-sample-68d07283-1e36-47bc-9460-a9cff6a28030/results/5f26f11c-a2f7-4178-a93c-b1cf94740ed5?api-version=2024-12-01-preview in progress ...
INFO:python.content_understanding_client:Request https://chetho-swe-ai-2.openai.azure.com/contentunderstanding/analyzers/extraction-sample-68d07283-1e36-47bc-9460-a9cff6a28030/results/5f26f11c-a2f7-4178-a93c-b1cf94740ed5?api-version=2024-12-01-preview in progress ...
INFO:python.content_understanding_client:Request https://chetho-swe-ai-2.openai.azure.com/contentunderstanding/analyzers/extraction-sample-68d07283-1e36-47bc-9460-a9cff6a28030/results/5f26f11c-a2f7-4178-a93c-b1cf94740ed5?api-version=2024-12-01-preview in progress ...
INFO:python.content_understanding_client:Request htt
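The repeated "in progress" lines in this log come from the client polling the operation until it finishes. The internals of `poll_result` are not shown in this notebook, so the following is only a generic sketch of such a polling loop. `get_status` is a hypothetical callable standing in for one request to the operation URL. Only the `"Succeeded"` status appears in the logs above; treating `"Failed"` as the terminal error status is an assumption.

```python
import time


# Generic polling sketch, not the actual implementation of poll_result.
def poll_until_done(get_status, timeout_seconds=120.0, interval_seconds=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it reports success, failure, or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        payload = get_status()  # expected to be a dict with a "status" key
        status = str(payload.get("status", "")).lower()
        if status == "succeeded":
            return payload
        if status == "failed":  # assumed name of the terminal error status
            raise RuntimeError(f"Operation failed: {payload}")
        if clock() >= deadline:
            raise TimeoutError(f"Operation still {status!r} after {timeout_seconds} s")
        sleep(interval_seconds)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting. A fixed interval is the simplest choice; for long-running video analysis, a growing interval would send fewer requests.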

## Delete the existing analyzer from the Azure AI Content Understanding service
This step is optional; it only prevents the test analyzer from lingering in your service. In real usage scenarios, the custom fields analyzer can stay in your service so that later workloads can reuse it.
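For throwaway runs, one way to make sure the analyzer is removed even when analysis raises an exception is to tie the deletion to a context manager. This is a minimal sketch; `temporary_analyzer` is a hypothetical helper, and it assumes only that the client exposes `delete_analyzer(analyzer_id)` as used in the cell below.

```python
from contextlib import contextmanager


# Hypothetical helper, not part of the sample repository.
@contextmanager
def temporary_analyzer(client, analyzer_id):
    """Yield analyzer_id and delete the analyzer when the block exits."""
    try:
        yield analyzer_id
    finally:
        # Runs on normal exit and when the with-block raises.
        client.delete_analyzer(analyzer_id)
```

The analysis calls would then go inside `with temporary_analyzer(client, ANALYZER_ID):`, after the analyzer has been created. For an analyzer that is meant to be reused, skip this pattern and leave the analyzer in place.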



In [44]:
client.delete_analyzer(ANALYZER_ID)

INFO:python.content_understanding_client:Analyzer extraction-sample-68d07283-1e36-47bc-9460-a9cff6a28030 deleted.


<Response [204]>