# Manage Analyzers in Your Resource

This notebook demonstrates how to create a simple analyzer and manage its lifecycle.

Source: https://github.com/Azure-Samples/azure-ai-content-understanding-python.git, further expanded for training purposes.

## Create Azure AI Content Understanding Client

> The [AzureContentUnderstandingClient](python/content_understanding_client.py) is a utility class containing functions to interact with the Content Understanding API. 


Until the official Content Understanding SDK is released, this class can be regarded as a lightweight SDK.


In [1]:
import logging
import json
import os
import sys
from pathlib import Path
from dotenv import find_dotenv, load_dotenv
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

In [2]:
load_dotenv(override=True, dotenv_path=find_dotenv())

True

In [3]:
logging.basicConfig(level=logging.INFO)

In [4]:
AZURE_AI_ENDPOINT = os.getenv("AZURE_CU_ENDPOINT")
AZURE_AI_API_VERSION = os.getenv("AZURE_CU_API_VERSION", "2024-12-01-preview")
print(f"Current Azure Content Understanding endpoint: {AZURE_AI_ENDPOINT}")
print(f"Current Azure Content Understanding API version: {AZURE_AI_API_VERSION}")

Current Azure Content Understanding endpoint: https://ep-ai-services.services.ai.azure.com/
Current Azure Content Understanding API version: 2024-12-01-preview
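If `AZURE_CU_ENDPOINT` is missing from the environment, `os.getenv` returns `None` and the problem only surfaces later, when the client makes its first request. The following is a minimal sketch of failing early instead. `require_env` is a hypothetical helper, not part of the sample repository, and the `environ` parameter exists only so the sketch can be exercised without touching real settings.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, default: Optional[str] = None,
                environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = environ.get(name, default)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; check your .env file."
        )
    return value


# Demo with an in-memory mapping instead of the real environment.
demo_env = {"AZURE_CU_ENDPOINT": "https://example.invalid/"}
endpoint = require_env("AZURE_CU_ENDPOINT", environ=demo_env)
api_version = require_env("AZURE_CU_API_VERSION", "2024-12-01-preview", environ=demo_env)
print(endpoint, api_version)
```

In the notebook itself the same helper could be called without the `environ` argument, so that it reads the variables loaded by `load_dotenv`.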


In [5]:
# only if necessary, add the parent directory to the path to use shared modules
# parent_dir = Path(Path.cwd()).parent
# sys.path.append(str(parent_dir))

# import the utility class AzureContentUnderstandingClient, which is a wrapper around the Azure Content Understanding REST API client
from python.content_understanding_client import AzureContentUnderstandingClient

In [6]:
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")

INFO:azure.identity._credentials.environment:No environment configuration found.
INFO:azure.identity._credentials.managed_identity:ManagedIdentityCredential will use IMDS


In [None]:
# As an alternative to DefaultAzureCredential, you can register an app in Entra ID and use its client secret
# from azure.identity import ClientSecretCredential

# credential = ClientSecretCredential(
#     tenant_id=os.getenv("TENANT_ID"), 
#     client_id=os.getenv("CLIENT_ID"), 
#     client_secret=os.getenv("CLIENT_SECRET") 
# )

# # Compatible token provider
# def token_provider(scopes=None, **kwargs):
#     if scopes is None:
#         scopes = ["https://cognitiveservices.azure.com/.default"] # original value
#     token = credential.get_token(*scopes)
#     return token.token

# # scopes = ["https://cognitiveservices.azure.com/.default"] # original value
# # token = credential.get_token(*scopes)
# # print(token.token)
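Both variants above end up with the same thing: a callable that returns a bearer token string when invoked. The sketch below shows that shape in isolation. `FakeCredential` and `make_token_provider` are hypothetical names used only for illustration, and the local `AccessToken` tuple stands in for `azure.core.credentials.AccessToken`, which carries the same `token` and `expires_on` fields.

```python
import time
from collections import namedtuple
from typing import Callable

# Stand-in for azure.core.credentials.AccessToken (same two fields).
AccessToken = namedtuple("AccessToken", ["token", "expires_on"])


class FakeCredential:
    """Hypothetical stub mimicking the get_token(*scopes) method of azure-identity credentials."""

    def __init__(self) -> None:
        self.calls = 0

    def get_token(self, *scopes: str, **kwargs) -> AccessToken:
        self.calls += 1
        return AccessToken(f"fake-token-{self.calls}", int(time.time()) + 3600)


def make_token_provider(credential, scope: str) -> Callable[[], str]:
    """Wrap a credential in a zero-argument callable that returns the token string."""

    def provider() -> str:
        return credential.get_token(scope).token

    return provider


demo_provider = make_token_provider(
    FakeCredential(), "https://cognitiveservices.azure.com/.default"
)
print(demo_provider())  # fake-token-1
```

With a real credential, `get_bearer_token_provider(credential, scope)` from `azure.identity` returns a callable of this same zero-argument form.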

In [None]:
# If authentication fails with DefaultAzureCredential, consider running `az login` first
client = AzureContentUnderstandingClient(
    endpoint=AZURE_AI_ENDPOINT,
    api_version=AZURE_AI_API_VERSION,
    token_provider=token_provider,
    x_ms_useragent="azure-ai-content-understanding-python/analyzer_management", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.
)

# This cell prints INFO-level logs about the token acquisition process (e.g. AzureCliCredential vs. ClientSecretCredential)
# INFO:azure.identity._credentials.chained:DefaultAzureCredential acquired a token from AzureCliCredential
# INFO:azure.identity._internal.get_token_mixin:ClientSecretCredential.get_token succeeded

## Create a simple analyzer
We first create an analyzer from a template that extracts a summary, topics, and other fields from call recordings.
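Analyzer creation is asynchronous: `begin_create_analyzer` submits the request and `poll_result` waits until the operation reaches a final status. The loop below is a minimal sketch of that polling pattern under stated assumptions, not the sample client's actual implementation. `poll_until_done`, `get_status`, and the `Running` and `Failed` status strings are assumptions for illustration.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    get_status: Callable[[], Dict],
    timeout_seconds: float = 120.0,
    interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call get_status repeatedly until it reports a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        body = get_status()
        status = str(body.get("status", "")).lower()
        if status == "succeeded":
            return body
        if status == "failed":
            raise RuntimeError(f"Operation failed: {body}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Operation not finished after {timeout_seconds} seconds")
        sleep(interval_seconds)


# Simulated operation: two in-progress responses, then success.
responses = iter([
    {"status": "Running"},
    {"status": "Running"},
    {"status": "Succeeded", "result": {"analyzerId": "demo"}},
])
final = poll_until_done(lambda: next(responses), sleep=lambda _: None)
print(final["status"])  # Succeeded
```

Passing `sleep` as a parameter keeps the sketch fast to run in a demo; real code would leave the default in place.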

In [8]:
import uuid

ANALYZER_TEMPLATE = "analyzer_templates/call_recording_analytics.json"
ANALYZER_ID = "analyzer-management-sample-" + str(uuid.uuid4())

response = client.begin_create_analyzer(ANALYZER_ID, analyzer_template_path=ANALYZER_TEMPLATE)
result = client.poll_result(response)

print(json.dumps(result, indent=2))

INFO:python.content_understanding_client:Analyzer analyzer-management-sample-29ff2d49-94ac-49fb-b925-abca24f50e05 create request accepted.
INFO:python.content_understanding_client:Request 80bfaddc-4519-4a77-87f8-48503919321f in progress ...
INFO:python.content_understanding_client:Request result is ready after 2.57 seconds.


{
  "id": "80bfaddc-4519-4a77-87f8-48503919321f",
  "status": "Succeeded",
  "result": {
    "analyzerId": "analyzer-management-sample-29ff2d49-94ac-49fb-b925-abca24f50e05",
    "description": "Sample call recording analytics",
    "createdAt": "2025-04-14T08:24:00Z",
    "lastModifiedAt": "2025-04-14T08:24:03Z",
    "config": {
      "locales": [
        "en-US"
      ],
      "returnDetails": true,
      "disableContentFiltering": false
    },
    "fieldSchema": {
      "fields": {
        "Summary": {
          "type": "string",
          "method": "generate",
          "description": "A one-paragraph summary"
        },
        "Topics": {
          "type": "array",
          "method": "generate",
          "description": "Top 5 topics mentioned",
          "items": {
            "type": "string"
          }
        },
        "Companies": {
          "type": "array",
          "method": "generate",
          "description": "List of companies mentioned",
          "items": {
      

## List all analyzers created in your resource

After the analyzer is successfully created, it appears in the list of analyzers in your resource, alongside the prebuilt ones.

In [8]:
all_analyzers = client.get_all_analyzers()
print(f"Number of analyzers in your resource: {len(all_analyzers['value'])}")
print(f"First 3 analyzer details: {json.dumps(all_analyzers['value'][:3], indent=2)}")

Number of analyzers in your resource: 23
First 3 analyzer details: [
  {
    "analyzerId": "prebuilt-read",
    "description": "Extract content elements such as words, barcodes, and formulas from documents.",
    "config": {
      "returnDetails": true,
      "enableOcr": true,
      "enableLayout": false,
      "enableBarcode": false,
      "enableFormula": false
    },
    "status": "undefined",
    "scenario": "document"
  },
  {
    "analyzerId": "prebuilt-layout",
    "description": "Extract various content and layout elements such as words, paragraphs, and tables from documents.",
    "config": {
      "returnDetails": true,
      "enableOcr": true,
      "enableLayout": true,
      "enableBarcode": false,
      "enableFormula": false
    },
    "status": "undefined",
    "scenario": "document"
  },
  {
    "analyzerId": "ai-call-analytics-analyzer",
    "description": "Analyzer to get summary and sentiment from an audio file",
    "tags": {
      "projectId": "c2ccd4f0-2b3a-43dd

In [15]:
import pandas as pd
df_analyzers = pd.json_normalize(all_analyzers['value'])
df_analyzers_sorted = df_analyzers.sort_values(by='createdAt', ascending=False)
df_analyzers_sorted

Unnamed: 0,analyzerId,description,warnings,status,scenario,config.returnDetails,config.enableOcr,config.enableLayout,config.enableBarcode,config.enableFormula,...,fieldSchema.fields.TripDetails.items.properties.DestinationCity.description,fieldSchema.fields.TripDetails.items.properties.ReturnDate.type,fieldSchema.fields.TripDetails.items.properties.ReturnDate.method,fieldSchema.fields.TripDetails.items.properties.ReturnDate.description,fieldSchema.fields.Signature.type,fieldSchema.fields.Signature.method,fieldSchema.fields.Signature.description,fieldSchema.fields.InsuranceCorp.type,fieldSchema.fields.InsuranceCorp.method,fieldSchema.fields.InsuranceCorp.description
3,analyzer-management-sample-eccab73a-1aa3-4ab0-...,Sample call recording analytics,[],ready,callCenter,True,,,,,...,,,,,,,,,,
2,ai-call-analytics-analyzer,Analyzer to get summary and sentiment from an ...,[],ready,callCenter,False,,,,,...,,,,,,,,,,
12,auto-labeling-model-1743663137774-273,,[],ready,callCenter,False,,,,,...,,,,,,,,,,
11,auto-labeling-model-1743662883751-756,,[],ready,callCenter,False,,,,,...,,,,,,,,,,
10,auto-labeling-model-1743612274442-504,,[],ready,document,True,True,True,False,False,...,,,,,,,,,,
14,cu-chart-analyzer-v1,,[],ready,image,False,,,,,...,,,,,,,,,,
9,auto-labeling-model-1743611755799-918,,[],ready,image,False,,,,,...,,,,,,,,,,
13,cu-chart-analyzer-v0,,[],ready,image,False,,,,,...,,,,,,,,,,
20,test-build,my test,[],ready,document,True,True,True,False,False,...,,,,,,,,,,
8,auto-labeling-model-1743611145819-773,,[],ready,document,True,True,True,False,False,...,,,,,,,,,,
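Prebuilt analyzers such as `prebuilt-read` carry no `createdAt` value in the listing, so any sort on that key has to tolerate gaps (pandas places missing values last by default). A dependency-free sketch of the same newest-first ordering follows. `newest_first` is a hypothetical helper and the sample records are made up for illustration.

```python
from typing import Dict, List


def newest_first(analyzers: List[Dict]) -> List[Dict]:
    """Sort analyzer records by createdAt descending; records without it go last."""
    dated = [a for a in analyzers if a.get("createdAt")]
    undated = [a for a in analyzers if not a.get("createdAt")]
    # UTC timestamps in one fixed ISO 8601 format sort correctly as plain strings.
    dated.sort(key=lambda a: a["createdAt"], reverse=True)
    return dated + undated


sample = [
    {"analyzerId": "prebuilt-read"},
    {"analyzerId": "older", "createdAt": "2025-04-02T15:54:50Z"},
    {"analyzerId": "newer", "createdAt": "2025-04-14T08:24:00Z"},
]
print([a["analyzerId"] for a in newest_first(sample)])
# ['newer', 'older', 'prebuilt-read']
```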


## Get analyzer details with id

Keep track of the analyzer ID you chose at creation time. You can use it later to look up the full analyzer definition.
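The definition comes back as a plain dictionary, so its field schema can be summarized with ordinary dictionary access. The following is a small sketch that uses a hand-written excerpt shaped like the definition printed when the analyzer was created. `summarize_fields` is a hypothetical helper, not part of the sample client.

```python
from typing import Dict, List


def summarize_fields(analyzer: Dict) -> List[str]:
    """Return one 'name: type (method)' line per field in the analyzer's schema."""
    lines = []
    fields = analyzer.get("fieldSchema", {}).get("fields", {})
    for name, spec in fields.items():
        field_type = spec.get("type", "?")
        if field_type == "array":
            item_type = spec.get("items", {}).get("type", "?")
            field_type = f"array<{item_type}>"
        lines.append(f"{name}: {field_type} ({spec.get('method', '?')})")
    return lines


excerpt = {
    "analyzerId": "demo",
    "fieldSchema": {
        "fields": {
            "Summary": {"type": "string", "method": "generate"},
            "Topics": {"type": "array", "method": "generate", "items": {"type": "string"}},
        }
    },
}
for line in summarize_fields(excerpt):
    print(line)
# Summary: string (generate)
# Topics: array<string> (generate)
```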

In [15]:
result = client.get_analyzer_detail_by_id(ANALYZER_ID)
print(json.dumps(result, indent=2))

{
  "analyzerId": "analyzer-management-sample-bff712bc-bcf4-4223-b47b-51318af4dba7",
  "description": "Sample call recording analytics",
  "createdAt": "2025-04-02T15:54:50Z",
  "lastModifiedAt": "2025-04-02T15:54:53Z",
  "config": {
    "locales": [
      "en-US"
    ],
    "returnDetails": true,
    "disableContentFiltering": false
  },
  "fieldSchema": {
    "fields": {
      "Summary": {
        "type": "string",
        "method": "generate",
        "description": "A one-paragraph summary"
      },
      "Topics": {
        "type": "array",
        "method": "generate",
        "description": "Top 5 topics mentioned",
        "items": {
          "type": "string"
        }
      },
      "Companies": {
        "type": "array",
        "method": "generate",
        "description": "List of companies mentioned",
        "items": {
          "type": "string"
        }
      },
      "People": {
        "type": "array",
        "method": "generate",
        "description": "List of peop

## Delete Analyzer
If you no longer need an analyzer, delete it by its ID.

### Delete Analyzer by ID

In [16]:
client.delete_analyzer(ANALYZER_ID)

INFO:python.content_understanding_client:Analyzer analyzer-management-sample-bff712bc-bcf4-4223-b47b-51318af4dba7 deleted.


<Response [204]>

### Delete Analyzer by prefix

In [None]:
# Delete all analyzers whose ID starts with a specific prefix
prefix = "content-understanding-search-sample"

for analyzer in all_analyzers['value']:
    if analyzer['analyzerId'].startswith(prefix):
        print(f"Deleting analyzer: {analyzer['analyzerId']}")
        client.delete_analyzer(analyzer['analyzerId'])
        print(f"Deleted analyzer: {analyzer['analyzerId']}")
    else:
        print(f"Skipping analyzer: {analyzer['analyzerId']}")

Skipping analyzer: prebuilt-read
Skipping analyzer: prebuilt-layout
Skipping analyzer: ai-call-analytics-analyzer
Skipping analyzer: analyzer-management-sample-eccab73a-1aa3-4ab0-a9b6-65e7c995762f
Skipping analyzer: auto-labeling-model-1743436442370-230
Skipping analyzer: auto-labeling-model-1743437717122-749
Skipping analyzer: auto-labeling-model-1743441022830-535
Skipping analyzer: auto-labeling-model-1743610364191-908
Skipping analyzer: auto-labeling-model-1743611145819-773
Skipping analyzer: auto-labeling-model-1743611755799-918
Skipping analyzer: auto-labeling-model-1743612274442-504
Skipping analyzer: auto-labeling-model-1743662883751-756
Skipping analyzer: auto-labeling-model-1743663137774-273
Deleting analyzer: content-understanding-search-sample-220c670f-e8d7-4f3f-9efc-497e89e9da75


INFO:python.content_understanding_client:Analyzer content-understanding-search-sample-220c670f-e8d7-4f3f-9efc-497e89e9da75 deleted.


Deleted analyzer: content-understanding-search-sample-220c670f-e8d7-4f3f-9efc-497e89e9da75
Deleting analyzer: content-understanding-search-sample-6e2d4c76-f2b4-4c60-9a31-7750a9f158d8


INFO:python.content_understanding_client:Analyzer content-understanding-search-sample-6e2d4c76-f2b4-4c60-9a31-7750a9f158d8 deleted.


Deleted analyzer: content-understanding-search-sample-6e2d4c76-f2b4-4c60-9a31-7750a9f158d8
Deleting analyzer: content-understanding-search-sample-9405e723-3b1e-4b23-a4f5-7f2056db564b


INFO:python.content_understanding_client:Analyzer content-understanding-search-sample-9405e723-3b1e-4b23-a4f5-7f2056db564b deleted.


Deleted analyzer: content-understanding-search-sample-9405e723-3b1e-4b23-a4f5-7f2056db564b
Deleting analyzer: content-understanding-search-sample-97062efd-e55f-4a1a-b276-1dd504e5d6da


INFO:python.content_understanding_client:Analyzer content-understanding-search-sample-97062efd-e55f-4a1a-b276-1dd504e5d6da deleted.


Deleted analyzer: content-understanding-search-sample-97062efd-e55f-4a1a-b276-1dd504e5d6da
Deleting analyzer: content-understanding-search-sample-a6bb3814-da7a-441b-98d4-be772418a29a


INFO:python.content_understanding_client:Analyzer content-understanding-search-sample-a6bb3814-da7a-441b-98d4-be772418a29a deleted.


Deleted analyzer: content-understanding-search-sample-a6bb3814-da7a-441b-98d4-be772418a29a
Deleting analyzer: content-understanding-search-sample-ba6245bb-e712-442e-b4a5-b1623e61c54a


INFO:python.content_understanding_client:Analyzer content-understanding-search-sample-ba6245bb-e712-442e-b4a5-b1623e61c54a deleted.


Deleted analyzer: content-understanding-search-sample-ba6245bb-e712-442e-b4a5-b1623e61c54a
Skipping analyzer: cu-chart-analyzer-v0
Skipping analyzer: cu-chart-analyzer-v1
Skipping analyzer: invoice-analyzer-v0
Skipping analyzer: lab-receipt-analyzer-v2
Skipping analyzer: lab-receipt-analyzer
Skipping analyzer: my-invoice-analyzer
Skipping analyzer: people-detection-analyzer
Skipping analyzer: test-build
Skipping analyzer: travel-insurance-analyzer-v2
Skipping analyzer: travel-insurance-analyzer
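Because bulk deletion is destructive, it can help to separate selecting the targets from deleting them. The sketch below adds a dry-run switch that is on by default. `delete_by_prefix` and the `FakeClient` stub are hypothetical; only the `delete_analyzer` method name mirrors the client used in this notebook.

```python
from typing import Dict, Iterable, List


def delete_by_prefix(client, analyzers: Iterable[Dict], prefix: str,
                     dry_run: bool = True) -> List[str]:
    """Return the IDs that match prefix; delete them only when dry_run is False."""
    if not prefix:
        raise ValueError("Empty prefix refused: it would match every analyzer.")
    targets = [a["analyzerId"] for a in analyzers
               if a["analyzerId"].startswith(prefix)]
    if not dry_run:
        for analyzer_id in targets:
            client.delete_analyzer(analyzer_id)
    return targets


class FakeClient:
    """Hypothetical stub that records which IDs would have been deleted."""

    def __init__(self) -> None:
        self.deleted: List[str] = []

    def delete_analyzer(self, analyzer_id: str) -> None:
        self.deleted.append(analyzer_id)


listing = [
    {"analyzerId": "prebuilt-read"},
    {"analyzerId": "content-understanding-search-sample-1"},
    {"analyzerId": "test-build"},
]
fake = FakeClient()
print(delete_by_prefix(fake, listing, "content-understanding-search-sample"))
# ['content-understanding-search-sample-1']
print(fake.deleted)  # [] because dry_run defaults to True
```

Reviewing the returned list first and then calling the function again with `dry_run=False` keeps an accidental prefix from removing more than intended.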


### Delete Analyzer by datetime
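The listing reports `createdAt` as ISO 8601 strings ending in `Z`. `datetime.fromisoformat` accepts that suffix only from Python 3.11 onward, which is why replacing it with `+00:00` is a common workaround on older versions. The sketch below isolates that parsing and the cutoff comparison. `parse_utc` and `created_after` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import Dict


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' into an aware UTC datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00")).astimezone(timezone.utc)


def created_after(analyzer: Dict, cutoff: datetime) -> bool:
    """True only if the record has a createdAt value later than cutoff."""
    created_at = analyzer.get("createdAt")
    return created_at is not None and parse_utc(created_at) > cutoff


cutoff = parse_utc("2025-04-14T00:00:00Z")
print(created_after({"analyzerId": "a", "createdAt": "2025-04-14T08:24:00Z"}, cutoff))  # True
print(created_after({"analyzerId": "b", "createdAt": "2025-04-02T15:54:50Z"}, cutoff))  # False
print(created_after({"analyzerId": "prebuilt-read"}, cutoff))  # False (no createdAt)
```

Using `.get` instead of catching `KeyError` avoids treating an unrelated missing key, such as `analyzerId`, as a missing timestamp.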

In [16]:
# Delete all analyzers created after a specific date and time (UTC)
from datetime import datetime, timezone
datetime_str = "2025-04-14T00:00:00Z"

specific_datetime = datetime.fromisoformat(datetime_str.replace("Z", "+00:00")).astimezone(timezone.utc)

# print current datetime in UTC
current_datetime = datetime.now(timezone.utc)
print(f"Current datetime in UTC: {current_datetime}")

for analyzer in all_analyzers['value']:
    try:
        created_at = datetime.fromisoformat(analyzer['createdAt'].replace("Z", "+00:00")).astimezone(timezone.utc)
        if created_at > specific_datetime:
            print(f"Deleting analyzer: {analyzer['analyzerId']}")
            client.delete_analyzer(analyzer['analyzerId'])
            print(f"Deleted analyzer: {analyzer['analyzerId']}")
        else:
            print(f"Skipping analyzer: {analyzer['analyzerId']}")
    except KeyError:
        print(f"Skipping analyzer: {analyzer['analyzerId']} (createdAt not found)")

Current datetime in UTC: 2025-04-14 08:38:09.292263+00:00
Skipping analyzer: prebuilt-read (createdAt not found)
Skipping analyzer: prebuilt-layout (createdAt not found)
Skipping analyzer: ai-call-analytics-analyzer
Skipping analyzer: analyzer-management-sample-eccab73a-1aa3-4ab0-a9b6-65e7c995762f
Skipping analyzer: auto-labeling-model-1743436442370-230
Skipping analyzer: auto-labeling-model-1743437717122-749
Skipping analyzer: auto-labeling-model-1743441022830-535
Skipping analyzer: auto-labeling-model-1743610364191-908
Skipping analyzer: auto-labeling-model-1743611145819-773
Skipping analyzer: auto-labeling-model-1743611755799-918
Skipping analyzer: auto-labeling-model-1743612274442-504
Skipping analyzer: auto-labeling-model-1743662883751-756
Skipping analyzer: auto-labeling-model-1743663137774-273
Skipping analyzer: cu-chart-analyzer-v0
Skipping analyzer: cu-chart-analyzer-v1
Skipping analyzer: invoice-analyzer-v0
Skipping analyzer: lab-receipt-analyzer-v2
Skipping analyzer: lab-rec