# Manage Analyzers in Your Resource

This notebook demo how to create a simple analyzer and manage its lifecycle.

Source: https://github.com/Azure-Samples/azure-ai-content-understanding-python.git
further expanded for training purposes

## Create Azure AI Content Understanding Client

> The AzureContentUnderstandingClient is a utility class containing functions to interact with the Content Understanding API. Before the official release of the Content Understanding SDK, it can be regarded as a lightweight SDK.


In [1]:
import logging
import json
import os
import sys
from pathlib import Path
from dotenv import find_dotenv, load_dotenv
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

In [2]:
load_dotenv(override=True, dotenv_path=find_dotenv())

True

In [3]:
logging.basicConfig(level=logging.INFO)

In [4]:
AZURE_AI_ENDPOINT = os.getenv("AZURE_CU_ENDPOINT_NEW")
AZURE_AI_API_VERSION = os.getenv("AZURE_CU_API_VERSION_NEW", "2025-05-01-preview")
print(f"Current Azure Content Understanding endpoint: {AZURE_AI_ENDPOINT}")
print(f"Current Azure Content Understanding API version: {AZURE_AI_API_VERSION}")

Current Azure Content Understanding endpoint: https://epaifhub6672084982.cognitiveservices.azure.com/
Current Azure Content Understanding API version: 2025-05-01-preview


In [5]:
# only if necessary, add the parent directory to the path to use shared modules
# parent_dir = Path(Path.cwd()).parent
# sys.path.append(str(parent_dir))

# import the utility class AzureContentUnderstandingClient, which is a wrapper around the Azure Content Understanding REST API client
from python.content_understanding_client_NEW import AzureContentUnderstandingClient

In [6]:
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(credential, "https://cognitiveservices.azure.com/.default")

INFO:azure.identity._credentials.environment:No environment configuration found.
INFO:azure.identity._credentials.managed_identity:ManagedIdentityCredential will use IMDS


In [7]:
# As an alternative to the DefaultAzureCredential, you can register an App in Entra ID and use it client secret
# from azure.identity import ClientSecretCredential

# credential = ClientSecretCredential(
#     tenant_id=os.getenv("TENANT_ID"), 
#     client_id=os.getenv("CLIENT_ID"), 
#     client_secret=os.getenv("CLIENT_SECRET") 
# )

# # Token provider compatibile
# def token_provider(scopes=None, **kwargs):
#     if scopes is None:
#         scopes = ["https://cognitiveservices.azure.com/.default"] # original value
#     token = credential.get_token(*scopes)
#     return token.token

# # scopes = ["https://cognitiveservices.azure.com/.default"] # original value
# # token = credential.get_token(*scopes)
# # print(token.token)

In [8]:
# consider running az login
client = AzureContentUnderstandingClient(
    endpoint=AZURE_AI_ENDPOINT,
    api_version=AZURE_AI_API_VERSION,
    token_provider=token_provider,
    x_ms_useragent="azure-ai-content-understanding-python/analyzer_management", # This header is used for sample usage telemetry, please comment out this line if you want to opt out.
)


# This cell printout INFO level logs about the token acquisition process (e.g. AzCli vs ClientSecretCredential)
# INFO:azure.identity._credentials.chained:DefaultAzureCredential acquired a token from AzureCliCredential
# INFO:azure.identity._internal.get_token_mixin:ClientSecretCredential.get_token succeeded

INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=REDACTED&resource=REDACTED'
Request method: 'GET'
Request headers:
    'User-Agent': 'azsdk-python-identity/1.23.0 Python/3.13.4 (Windows-11-10.0.26100-SP0)'
No body was attached to the request
INFO:azure.identity._credentials.chained:DefaultAzureCredential acquired a token from AzureCliCredential


## Create a simple analyzer
We first create an analyzer from a template to extract invoice fields.

In [31]:
import uuid

ANALYZER_TEMPLATE = "analyzer_templates_NEw/call_recording_analytics.json"
CUSTOM_ANALYZER_ID = "analyzer-management-sample-" + str(uuid.uuid4())

response = client.begin_create_analyzer(CUSTOM_ANALYZER_ID, analyzer_template_path=ANALYZER_TEMPLATE)
result = client.poll_result(response)

print(json.dumps(result, indent=2))

INFO:python.content_understanding_client_NEW:Analyzer analyzer-management-sample-b5439aca-f0a7-49ff-b504-e0750f339214 create request accepted.
INFO:python.content_understanding_client_NEW:Request result is ready after 0.00 seconds.


{
  "id": "1666cdb7-63f8-4067-a705-448ddedfeb61",
  "status": "Succeeded",
  "result": {
    "analyzerId": "analyzer-management-sample-b5439aca-f0a7-49ff-b504-e0750f339214",
    "description": "Sample call recording analytics",
    "createdAt": "2025-06-14T09:01:33Z",
    "lastModifiedAt": "2025-06-14T09:01:33Z",
    "baseAnalyzerId": "prebuilt-callCenter",
    "config": {
      "locales": [
        "en-US"
      ],
      "returnDetails": true,
      "disableContentFiltering": false
    },
    "fieldSchema": {
      "fields": {
        "Summary": {
          "type": "string",
          "method": "generate",
          "description": "A one-paragraph summary"
        },
        "Topics": {
          "type": "array",
          "method": "generate",
          "description": "Top 5 topics mentioned",
          "items": {
            "type": "string"
          }
        },
        "Companies": {
          "type": "array",
          "method": "generate",
          "description": "List of comp

## List all analyzers created in your resource

After the analyzer is successfully created, we can use it to analyze our input files.

In [32]:
all_analyzers = client.get_all_analyzers()
print(f"Number of analyzers in your resource: {len(all_analyzers['value'])}")
# print(f"The first 3 analyzer details: {json.dumps(all_analyzers['value'][:3], indent=2)}")
print(f"The last analyzer details: {json.dumps(all_analyzers['value'][:-1], indent=2)}")

Number of analyzers in your resource: 22
The last analyzer details: [
  {
    "analyzerId": "prebuilt-callCenter",
    "description": "Analyze call center conversations to extract transcripts, summaries, sentiment, and more.",
    "createdAt": "2025-05-01T00:00:00Z",
    "config": {
      "returnDetails": true,
      "disableContentFiltering": false
    },
    "fieldSchema": {
      "name": "PostCallAnalytics",
      "fields": {
        "Summary": {
          "type": "string",
          "description": "A one-paragraph summary"
        },
        "Topics": {
          "type": "array",
          "description": "Top 5 topics mentioned",
          "items": {
            "type": "string"
          }
        },
        "Companies": {
          "type": "array",
          "description": "List of companies mentioned",
          "items": {
            "type": "string"
          }
        },
        "People": {
          "type": "array",
          "description": "List of people mentioned",
      

In [33]:
import pandas as pd
df_analyzers = pd.json_normalize(all_analyzers['value'])
df_analyzers_sorted = df_analyzers.sort_values(by='createdAt', ascending=False)
df_analyzers_sorted

Unnamed: 0,analyzerId,description,createdAt,warnings,status,processingLocation,mode,config.returnDetails,config.disableContentFiltering,fieldSchema.name,...,fieldSchema.fields.PONumber.type,fieldSchema.fields.PONumber.method,fieldSchema.fields.PONumber.description,fieldSchema.fields.TotalAmount.type,fieldSchema.fields.TotalAmount.method,fieldSchema.fields.TotalAmount.description,fieldSchema.fields.InvoiceTotal.description,fieldSchema.fields.RemittanceAddressRecipient.description,fieldSchema.fields.ServiceAddressRecipient.description,fieldSchema.fields.ShippingAddressRecipient.description
10,analyzer-management-sample-b5439aca-f0a7-49ff-...,Sample call recording analytics,2025-06-14T09:01:33Z,[],ready,geography,standard,True,False,,...,,,,,,,,,,
20,po-simple-analyzer,,2025-06-13T17:15:01Z,[],ready,geography,standard,True,False,,...,,,,number,extract,"The total order amount to be paid, tax included",,,,
16,invoice-analyzer,Invoice analyzer,2025-06-13T15:45:49Z,[],ready,geography,standard,True,False,,...,,,,,,,,,,
18,invoice-po-wreference-files,,2025-06-13T14:24:12Z,[],ready,global,pro,True,False,,...,string,generate,,,,,,,,
17,invoice-po-matching-analyzer,,2025-06-13T14:08:15Z,[],ready,global,pro,True,False,,...,string,generate,,,,,,,,
8,analyzer-invoice-sample-2025-new,,2025-06-12T16:42:30Z,[],ready,geography,standard,True,False,,...,,,,,,,,,,
11,analyzer-prebuilt-invoice-sample-2025,,2025-06-12T16:00:14Z,[],ready,geography,standard,True,False,,...,,,,,,,,,,
9,analyzer-invoice-sample-2025,,2025-06-12T15:47:21Z,[],ready,geography,standard,True,False,,...,,,,,,,,,,
7,analyzer-002-my-demo-invoice,,2025-06-12T14:31:19Z,[],ready,geography,standard,True,False,,...,,,,,,,,,,
6,analyzer-001-build-demo-invoice,,2025-06-12T14:26:18Z,[],ready,geography,standard,True,False,,...,,,,,,,,,,


## Get analyzer details with id

Remember the analyzer id when you create it. You can use the id to look up detail analyzer definitions afterwards.

In [34]:
result_json = client.get_analyzer_detail_by_id(CUSTOM_ANALYZER_ID)
print(json.dumps(result_json, indent=2))

{
  "analyzerId": "analyzer-management-sample-b5439aca-f0a7-49ff-b504-e0750f339214",
  "description": "Sample call recording analytics",
  "createdAt": "2025-06-14T09:01:33Z",
  "lastModifiedAt": "2025-06-14T09:01:33Z",
  "baseAnalyzerId": "prebuilt-callCenter",
  "config": {
    "locales": [
      "en-US"
    ],
    "returnDetails": true,
    "disableContentFiltering": false
  },
  "fieldSchema": {
    "fields": {
      "Summary": {
        "type": "string",
        "method": "generate",
        "description": "A one-paragraph summary"
      },
      "Topics": {
        "type": "array",
        "method": "generate",
        "description": "Top 5 topics mentioned",
        "items": {
          "type": "string"
        }
      },
      "Companies": {
        "type": "array",
        "method": "generate",
        "description": "List of companies mentioned",
        "items": {
          "type": "string"
        }
      },
      "People": {
        "type": "array",
        "method": "gene

## Delete Analyzer
If you don't need an analyzer anymore, delete it with its id.

In [35]:
client.delete_analyzer(CUSTOM_ANALYZER_ID)

INFO:python.content_understanding_client_NEW:Analyzer analyzer-management-sample-b5439aca-f0a7-49ff-b504-e0750f339214 deleted.


<Response [204]>

### Delete Analyzer by prefix

In [None]:
# delete all analyzers which ANALYZER_ID start with a specific prefix
prefix = "auto-labeling-model"

for analyzer in all_analyzers['value']:
    if analyzer['analyzerId'].startswith(prefix):
        print(f"Deleting analyzer: {analyzer['analyzerId']}")
        client.delete_analyzer(analyzer['analyzerId'])
        print(f"Deleted analyzer: {analyzer['analyzerId']}")
    else:
        print(f"Skipping analyzer: {analyzer['analyzerId']}")

### Delete Analyzer by datetime

In [27]:
# delete all analyzers which ANALYZER_ID created between start and end datetime
from datetime import datetime, timezone

#2025-06-12T13:10:41Z
start_datetime_str = "2025-06-12T13:00:00Z"
end_datetime_str = "2025-06-12T14:00:00Z"

start_datetime = datetime.fromisoformat(start_datetime_str.replace("Z", "+00:00")).astimezone(timezone.utc)
end_datetime = datetime.fromisoformat(end_datetime_str.replace("Z", "+00:00")).astimezone(timezone.utc)

print(f"Start datetime: {start_datetime}")
print(f"End datetime: {end_datetime}")

Start datetime: 2025-06-12 13:00:00+00:00
End datetime: 2025-06-12 14:00:00+00:00


In [28]:
for analyzer in all_analyzers['value']:
    deleted_analyzers = 0
    try:
        created_at = datetime.fromisoformat(analyzer['createdAt'].replace("Z", "+00:00")).astimezone(timezone.utc)
        if start_datetime <= created_at <= end_datetime:
            print(f"Deleting analyzer: {analyzer['analyzerId']}")
            client.delete_analyzer(analyzer['analyzerId'])
            print(f"Deleted analyzer: {analyzer['analyzerId']}")
            deleted_analyzers += 1
        else:
            # print(f"Skipping analyzer: {analyzer['analyzerId']}")
            print("...")
    except KeyError:
        print(f"Skipping analyzer: {analyzer['analyzerId']} (createdAt not found)")

print(f"\nTotal deleted analyzers: {deleted_analyzers}")


...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Deleting analyzer: test


INFO:python.content_understanding_client_NEW:Analyzer test deleted.


Deleted analyzer: test

Total deleted analyzers: 1
