# Extract Content from Your File

This notebook demonstrates how to use the Content Understanding API to extract semantic content from multimodal files.

## Prerequisites
1. Ensure your Azure AI service is configured by following the [configuration steps](../README.md#configure-azure-ai-service-resource).
2. Install the required packages to run this sample.

In [None]:
%pip install -r ../requirements.txt

## Create Azure AI Content Understanding Client

> This notebook uses the asynchronous `ContentUnderstandingClient` from `azure.ai.contentunderstanding.aio` to call the Content Understanding API. Helper functions for extracting operation IDs and saving output files are imported from `extension.sample_helper` in the [`python`](../python/) directory.
>
> Set the environment variable **AZURE_CONTENT_UNDERSTANDING_ENDPOINT** and, optionally, **AZURE_CONTENT_UNDERSTANDING_KEY** to the values from your Azure AI service. You can put them in a `.env` file, which `load_dotenv()` reads.

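> ⚠️ Important: Choose an authentication method before running the code below.
> If `AZURE_CONTENT_UNDERSTANDING_KEY` is set, the client authenticates with that key. Otherwise it falls back to `DefaultAzureCredential`, which needs an available Microsoft Entra ID credential (for example, from signing in with `az login`). If neither is configured, the sample will not run.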

> ⚠️ Note: Key-based authentication is supported, but token-based authentication with Microsoft Entra ID (formerly Azure Active Directory) is strongly recommended for production environments because it avoids storing long-lived secrets.

In [None]:
import logging
import json
import os
from pathlib import Path
import sys
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.identity.aio import DefaultAzureCredential
from azure.ai.contentunderstanding.aio import ContentUnderstandingClient
from azure.ai.contentunderstanding.models import (
    AnalyzeResult,
    ContentAnalyzer,
    ContentAnalyzerConfig,
    AnalysisMode,
    ProcessingLocation,
    AudioVisualContent,
)
from datetime import datetime
from typing import Any
import uuid

# Add the parent directory to the Python path to import the sample_helper module
sys.path.append(os.path.join(os.path.dirname(os.getcwd()), 'python'))
from extension.sample_helper import (
    extract_operation_id_from_poller,
    PollerType,
    save_json_to_file,
    save_keyframe_image_to_file,
)

load_dotenv()
logging.basicConfig(level=logging.INFO)

endpoint = os.environ.get("AZURE_CONTENT_UNDERSTANDING_ENDPOINT")
if not endpoint:
    raise ValueError("Set AZURE_CONTENT_UNDERSTANDING_ENDPOINT before running this notebook.")
# Use AzureKeyCredential if AZURE_CONTENT_UNDERSTANDING_KEY is set; otherwise fall back to DefaultAzureCredential
key = os.getenv("AZURE_CONTENT_UNDERSTANDING_KEY")
credential = AzureKeyCredential(key) if key else DefaultAzureCredential()
# Create the asynchronous ContentUnderstandingClient
client = ContentUnderstandingClient(endpoint=endpoint, credential=credential)

## Document Content

The Content Understanding API extracts all text from a document file. It also runs layout analysis to identify and categorize the tables and figures in the document. The result is returned as structured Markdown that preserves this layout.
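`begin_analyze_binary` takes an explicit `content_type`. The document cell hardcodes `application/pdf`, while the audio and video cells send the generic `application/octet-stream`. If you want to reuse the same code for other files, a minimal sketch could pick the type from the file extension with Python's standard `mimetypes` module. The helper name `guess_content_type` is hypothetical, and whether the service accepts every type that `mimetypes` returns is an assumption you should verify.

```python
import mimetypes


def guess_content_type(file_path: str) -> str:
    """Guess a MIME type from the file extension (hypothetical helper).

    Falls back to "application/octet-stream", the generic type this
    notebook already uses for its audio and video uploads.
    """
    content_type, _encoding = mimetypes.guess_type(file_path)
    return content_type or "application/octet-stream"
```

For example, `guess_content_type(analyzer_sample_file)` could replace the hardcoded `"application/pdf"` argument.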

In [None]:
analyzer_sample_file = '../data/invoice.pdf'
analyzer_id = 'prebuilt-documentAnalyzer'

with open(analyzer_sample_file, "rb") as f:
    pdf_bytes = f.read()

print(f"🔍 Analyzing {analyzer_sample_file} with {analyzer_id}...")
poller = await client.content_analyzers.begin_analyze_binary(
    analyzer_id=analyzer_id,
    input=pdf_bytes,
    content_type="application/pdf"
)
result: AnalyzeResult = await poller.result()

> The markdown output contains detailed layout information, which is especially useful for Retrieval-Augmented Generation (RAG) scenarios. You can paste the markdown into a viewer such as Visual Studio Code to preview the layout structure.
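For RAG, the Markdown is usually split into chunks before indexing. Heading-based chunking is one common, simple strategy; it is not something the API performs for you. The sketch below, with the hypothetical function `split_markdown_by_headings`, starts a new chunk at every heading up to a chosen level and keeps the heading text with each chunk so a retriever can return it with context. It does not special-case `#` lines inside fenced code blocks.

```python
import re


def split_markdown_by_headings(markdown: str, max_level: int = 2) -> list[dict[str, str]]:
    """Split Markdown into chunks at headings of level 1..max_level.

    Each chunk keeps its heading text so a retriever can return it with context.
    Text before the first heading becomes a chunk with an empty heading.
    """
    # With max_level=2 this pattern is ^(#{1,2})\s+(.*)$
    heading_re = re.compile(rf"^(#{{1,{max_level}}})\s+(.*)$")
    chunks: list[dict[str, str]] = []
    heading, body = "", []

    def flush() -> None:
        text = "\n".join(body).strip()
        # Skip sections that have neither a heading nor any text
        if heading or text:
            chunks.append({"heading": heading, "text": text})

    for line in markdown.splitlines():
        match = heading_re.match(line)
        if match:
            flush()
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

After running the next cell, `split_markdown_by_headings(content.markdown)` would return the chunks for the sample invoice. Deeper headings (for example `###`) stay inside their parent chunk, which keeps chunks from becoming too small.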

In [None]:
print("\n📄 Markdown Content:")
print("=" * 50)
content = result.contents[0]
print(content.markdown)
print("=" * 50)

> You can access layout information including `words` and `lines` within the `pages` node, paragraph details under `paragraphs`, and tables listed in the `tables` section.
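To get a quick overview of that layout information without reading the full JSON, a minimal sketch could count the elements in one content entry. It assumes the entry is a plain dict (for example, `result.as_dict()["contents"][0]`) with the `pages`, `paragraphs`, and `tables` keys named above, and with `words` and `lines` nested under each page; the exact nesting may differ between API versions. The function name `summarize_layout` is hypothetical.

```python
from typing import Any


def summarize_layout(content: dict[str, Any]) -> dict[str, Any]:
    """Count layout elements in one document content entry (a plain dict)."""
    pages = content.get("pages") or []
    return {
        "page_count": len(pages),
        # Words and lines are assumed to be nested under each page
        "words_per_page": [len(page.get("words") or []) for page in pages],
        "lines_per_page": [len(page.get("lines") or []) for page in pages],
        "paragraph_count": len(content.get("paragraphs") or []),
        "table_count": len(content.get("tables") or []),
    }
```

Missing keys are treated as empty lists, so the helper also works on content entries that have no tables or paragraphs.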

In [None]:
print(json.dumps(result.as_dict(), indent=2))

## Audio Content
The API provides detailed analysis of spoken language, enabling applications such as voice recognition, customer service analytics, and conversational AI. The structured output makes it straightforward to extract individual conversation components (speakers, phrases, and words) for further processing.

Key features include:
1. **Speaker Identification:** Each phrase is linked to a specific speaker (e.g., "Speaker 2"), enabling clear differentiation in multi-participant conversations.
2. **Timing Information:** Each phrase includes precise timing data, which is crucial for applications such as video subtitles and audio-text synchronization.
    - `startTimeMs`: The time (in milliseconds) when the phrase begins.
    - `endTimeMs`: The time (in milliseconds) when the phrase ends.
3. **Text Content:** The actual spoken text, such as "Thank you for calling Woodgrove Travel," representing the main transcription.
4. **Confidence Score:** Each phrase has a confidence score (e.g., 0.933) indicating transcription reliability.
5. **Word-Level Breakdown:** Per-word timing supports fine-grained speech analysis, such as aligning individual words with the audio.
6. **Locale Specification:** The locale (e.g., "en-US") informs the transcription process of regional dialects and pronunciation nuances.
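The phrase fields listed above can be turned into a readable, timestamped transcript. The sketch below assumes each phrase is a plain dict with `speaker`, `startTimeMs`, `endTimeMs`, `text`, and `confidence` keys; where the phrase list sits in the result JSON (for example, under a `transcriptPhrases` key) is an assumption to check against your output. The function names and the sample phrases in the test are hypothetical.

```python
from typing import Any


def format_timestamp(ms: int) -> str:
    """Format milliseconds as M:SS.mmm."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes}:{seconds:02d}.{millis:03d}"


def format_phrases(phrases: list[dict[str, Any]], min_confidence: float = 0.0) -> list[str]:
    """Render transcript phrases as '[start - end] speaker: text' lines.

    Phrases whose confidence is below min_confidence are skipped.
    """
    lines = []
    for phrase in phrases:
        if phrase.get("confidence", 1.0) < min_confidence:
            continue
        start = format_timestamp(phrase["startTimeMs"])
        end = format_timestamp(phrase["endTimeMs"])
        speaker = phrase.get("speaker", "Unknown")
        lines.append(f"[{start} - {end}] {speaker}: {phrase['text']}")
    return lines
```

Filtering on confidence is optional; raising `min_confidence` trades recall for fewer misrecognized phrases.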

In [None]:
analyzer_id = f"audio-sample-{datetime.now().strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}"

# Create a custom audio analyzer based on the prebuilt audio analyzer
print(f"🔧 Creating audio analyzer '{analyzer_id}'...")

audio_analyzer = ContentAnalyzer(
    base_analyzer_id="prebuilt-audioAnalyzer",
    config=ContentAnalyzerConfig(return_details=True),
    description="Marketing audio analyzer for result file demo",
    mode=AnalysisMode.STANDARD,
    processing_location=ProcessingLocation.GLOBAL,
    tags={"demo_type": "audio_analysis"},
)

# Start the analyzer creation operation
poller = await client.content_analyzers.begin_create_or_replace(
    analyzer_id=analyzer_id,
    resource=audio_analyzer,
)

# Extract operation ID from the poller
operation_id = extract_operation_id_from_poller(
    poller, PollerType.ANALYZER_CREATION
)
print(f"📋 Extracted creation operation ID: {operation_id}")

# Wait for the analyzer to be created
print(f"⏳ Waiting for analyzer creation to complete...")
await poller.result()
print(f"✅ Analyzer '{analyzer_id}' created successfully!")

# Analyze audio file with the created analyzer
audio_file_path = "../data/audio.wav"
print(f"🔍 Analyzing audio file from path: {audio_file_path} with analyzer '{analyzer_id}'...")

with open(audio_file_path, "rb") as f:
    audio_data = f.read()

# Begin audio analysis operation
print(f"🎬 Starting audio analysis with analyzer '{analyzer_id}'...")
analysis_poller = await client.content_analyzers.begin_analyze_binary(
    analyzer_id=analyzer_id,
    input=audio_data,
    content_type="application/octet-stream",
)

# Wait for analysis completion
print(f"⏳ Waiting for audio analysis to complete...")
analysis_result = await analysis_poller.result()
print(f"✅ Audio analysis completed successfully!")
print(f"📊 Analysis Results: {json.dumps(analysis_result.as_dict(), indent=2)}")

# Clean up the created analyzer (demo cleanup)
print(f"🗑️  Deleting analyzer '{analyzer_id}' (demo cleanup)...")
await client.content_analyzers.delete(analyzer_id=analyzer_id)
print(f"✅ Analyzer '{analyzer_id}' deleted successfully!")
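
The cell above deletes its analyzer only if every earlier step succeeds, so a failed analysis leaves the demo analyzer behind. A minimal sketch of one way to guarantee cleanup is an async context manager built on `contextlib.asynccontextmanager`. It uses only the `begin_create_or_replace` and `delete` calls already shown in this notebook; the helper names `make_analyzer_id` and `temporary_analyzer` are hypothetical.

```python
import uuid
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from datetime import datetime
from typing import Any


def make_analyzer_id(prefix: str) -> str:
    """Build a unique analyzer ID from one timestamp plus a random suffix."""
    return f"{prefix}-{datetime.now().strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}"


@asynccontextmanager
async def temporary_analyzer(client: Any, prefix: str, analyzer: Any) -> AsyncIterator[str]:
    """Create an analyzer, yield its ID, and always delete it afterwards."""
    analyzer_id = make_analyzer_id(prefix)
    poller = await client.content_analyzers.begin_create_or_replace(
        analyzer_id=analyzer_id,
        resource=analyzer,
    )
    await poller.result()
    try:
        yield analyzer_id
    finally:
        # Runs even if the analysis inside the `async with` block raises
        await client.content_analyzers.delete(analyzer_id=analyzer_id)
```

In a notebook cell you could then write `async with temporary_analyzer(client, "audio-sample", audio_analyzer) as analyzer_id:` and run `begin_analyze_binary` inside the block. Creation stays outside the `try`, so nothing is deleted if the analyzer was never created.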

## Video Content
The video output provides detailed metadata about audiovisual content, specifically video shots. Key features include:

1. **Shot Information:** Each shot has a start and end time with a unique identifier. For example, 'Shot 0:0.0 to 0:2.800' includes a transcript and key frames.
2. **Transcript:** Audio transcripts in WebVTT format capture the spoken content and the timing of the dialogue, which makes them easy to synchronize with video playback.
3. **Key Frames:** A collection of key frames (images) represents important moments in the video, allowing you to visualize specific timestamps.
4. **Description:** Each shot includes a descriptive summary, providing context about the visuals presented. This helps in understanding the scene or subject matter without watching the video.
5. **Audio Visual Metadata:** Information such as video dimensions (width, height), type (audiovisual), and key frame timestamps.
6. **Transcript Phrases:** Specific dialog phrases with timing and speaker attribution enhance usability for applications like closed captioning and search.
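The transcript is already returned in WebVTT, but if you build your own captions from phrase timings (for example, `startTimeMs` and `endTimeMs`), the millisecond values need converting to WebVTT timestamps. The sketch below covers only the basic cue syntax (`HH:MM:SS.mmm --> HH:MM:SS.mmm` followed by the cue text, with cues separated by blank lines) and omits cue identifiers and cue settings. The function names are hypothetical.

```python
def to_webvtt_timestamp(ms: int) -> str:
    """Convert milliseconds to a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def build_webvtt(cues: list[tuple[int, int, str]]) -> str:
    """Build a WebVTT document from (start_ms, end_ms, text) tuples."""
    blocks = ["WEBVTT"]
    for start_ms, end_ms, text in cues:
        timing = f"{to_webvtt_timestamp(start_ms)} --> {to_webvtt_timestamp(end_ms)}"
        blocks.append(f"{timing}\n{text}")
    # The header and each cue are separated by a blank line
    return "\n\n".join(blocks) + "\n"
```

For example, the shot from 0:0.0 to 0:2.800 corresponds to the timing line `00:00:00.000 --> 00:00:02.800`.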

In [None]:
analyzer_id = f"video-sample-{datetime.now().strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}"

video_analyzer = ContentAnalyzer(
    base_analyzer_id='prebuilt-videoAnalyzer',
    config=ContentAnalyzerConfig(return_details=True),
    description="Marketing video analyzer for result file demo",
    mode=AnalysisMode.STANDARD,
    processing_location=ProcessingLocation.GLOBAL,
    tags={"demo_type": "video_analysis"}
)

# Start the analyzer creation operation
poller = await client.content_analyzers.begin_create_or_replace(
    analyzer_id=analyzer_id,
    resource=video_analyzer,
)

# Extract operation ID from the poller
operation_id = extract_operation_id_from_poller(
    poller, PollerType.ANALYZER_CREATION
)
print(f"📋 Extracted creation operation ID: {operation_id}")

# Wait for the analyzer to be created
print(f"⏳ Waiting for analyzer creation to complete...")
await poller.result()
print(f"✅ Analyzer '{analyzer_id}' created successfully!")

# Read the local FlightSimulator.mp4 sample video
video_file_path = "../data/FlightSimulator.mp4"
print(f"📹 Using local video file: {video_file_path}")

with open(video_file_path, "rb") as f:
    video_data = f.read()

# Begin video analysis operation
print(f"🎬 Starting video analysis with analyzer '{analyzer_id}'...")
analysis_poller = await client.content_analyzers.begin_analyze_binary(
    analyzer_id=analyzer_id,
    input=video_data,
    content_type="application/octet-stream"
)

# Wait for analysis completion
print(f"⏳ Waiting for video analysis to complete...")
analysis_result = await analysis_poller.result()
print(json.dumps(analysis_result.as_dict(), indent=2))
print(f"✅ Video analysis completed successfully!")

# Extract operation ID for get_result_file
analysis_operation_id = extract_operation_id_from_poller(
    analysis_poller, PollerType.ANALYZE_CALL
)
print(f"📋 Extracted analysis operation ID: {analysis_operation_id}")

# Get the result to see what files are available
print(f"🔍 Getting analysis result to find available files...")
operation_status = await client.content_analyzers.get_result(
    operation_id=analysis_operation_id,
)

# The actual analysis result is in operation_status.result
operation_result: Any = operation_status.result
if operation_result is None:
    print("⚠️  No analysis result available")
else:
    print(f"✅ Analysis result contains {len(operation_result.contents)} contents")

# Look for keyframe times in the analysis result
keyframe_times_ms: list[int] = []
for content in (operation_result.contents if operation_result is not None else []):
    if isinstance(content, AudioVisualContent):
        video_content: AudioVisualContent = content
        print(f"KeyFrameTimesMs: {video_content.key_frame_times_ms}")
        print(video_content)
        keyframe_times_ms.extend(video_content.key_frame_times_ms or [])
        print(f"📹 Found {len(keyframe_times_ms)} keyframes in video content")
        break
    else:
        print(f"Content is not an AudioVisualContent: {content}")

if not keyframe_times_ms:
    print("⚠️  No keyframe times found in the analysis result")
else:
    print(f"🖼️  Found {len(keyframe_times_ms)} keyframe times in milliseconds")

# Build keyframe filenames using the time values
keyframe_files = [f"keyFrame.{time_ms}" for time_ms in keyframe_times_ms]

# Download and save a few keyframe images as examples (first, middle, last)
if len(keyframe_files) >= 3:
    files_to_download = [
        keyframe_files[0],
        keyframe_files[len(keyframe_files) // 2],
        keyframe_files[-1],
    ]
else:
    files_to_download = list(keyframe_files)
print(
    f"📥 Downloading {len(files_to_download)} keyframe images as examples: {files_to_download}"
)

for keyframe_id in files_to_download:
    print(f"📥 Getting result file: {keyframe_id}")

    # Get the result file (keyframe image)
    response: Any = await client.content_analyzers.get_result_file(
        operation_id=analysis_operation_id,
        path=keyframe_id,
    )

    # Handle the response which may be bytes or an async iterator of bytes
    if isinstance(response, (bytes, bytearray)):
        image_content = bytes(response)
    else:
        chunks: list[bytes] = []
        async for chunk in response:
            chunks.append(chunk)
        image_content = b"".join(chunks)

    print(
        f"✅ Retrieved image file for {keyframe_id} ({len(image_content)} bytes)"
    )

    # Save the image file
    saved_file_path = save_keyframe_image_to_file(
        image_content=image_content,
        keyframe_id=keyframe_id,
        test_name="content_analyzers_get_result_file",
        test_py_file_dir=os.getcwd(),
        identifier=analyzer_id,
    )
    print(f"💾 Keyframe image saved to: {saved_file_path}")

# Clean up the created analyzer (demo cleanup)
print(f"🗑️  Deleting analyzer '{analyzer_id}' (demo cleanup)...")
await client.content_analyzers.delete(analyzer_id=analyzer_id)
print(f"✅ Analyzer '{analyzer_id}' deleted successfully!")

## Video Content with Face Recognition
This is a gated feature. To enable it, follow the registration process outlined in [Azure AI Resource Face Gating](https://learn.microsoft.com/en-us/legal/cognitive-services/computer-vision/limited-access-identity?context=%2Fazure%2Fai-services%2Fcomputer-vision%2Fcontext%2Fcontext#registration-process).
In the registration form, select `[Video Indexer] Facial Identification (1:N or 1:1 matching)`, which covers searching for faces within media or entertainment video archives and generating metadata for these use cases.

In [None]:
analyzer_id = f"video-sample-{datetime.now().strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}"

# Create a marketing video analyzer using object model
print(f"🔧 Creating marketing video analyzer '{analyzer_id}'...")

video_analyzer = ContentAnalyzer(
    base_analyzer_id='prebuilt-videoAnalyzer',
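    # NOTE: face output may require enabling face-related options in
    # ContentAnalyzerConfig; check the SDK reference for the exact setting.
    config=ContentAnalyzerConfig(
        return_details=True,
    ),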
    description="Marketing video analyzer for result file demo",
    mode=AnalysisMode.STANDARD,
    processing_location=ProcessingLocation.GLOBAL,
    tags={"demo_type": "video_analysis"},
)

# Start the analyzer creation operation
poller = await client.content_analyzers.begin_create_or_replace(
    analyzer_id=analyzer_id,
    resource=video_analyzer,
)

# Extract operation ID from the poller
operation_id = extract_operation_id_from_poller(
    poller, PollerType.ANALYZER_CREATION
)
print(f"📋 Extracted creation operation ID: {operation_id}")

# Wait for the analyzer to be created
print(f"⏳ Waiting for analyzer creation to complete...")
await poller.result()
print(f"✅ Analyzer '{analyzer_id}' created successfully!")

# Read the local FlightSimulator.mp4 sample video
video_file_path = "../data/FlightSimulator.mp4"
print(f"📹 Using local video file: {video_file_path}")

with open(video_file_path, "rb") as f:
    video_data = f.read()

# Begin video analysis operation
print(f"🎬 Starting video analysis with analyzer '{analyzer_id}'...")
analysis_poller = await client.content_analyzers.begin_analyze_binary(
    analyzer_id=analyzer_id,
    input=video_data,
    content_type="application/octet-stream"
)

# Wait for analysis completion
print(f"⏳ Waiting for video analysis to complete...")
analysis_result = await analysis_poller.result()
print("result: ", json.dumps(analysis_result.as_dict(), indent=2))
print(f"✅ Video analysis completed successfully!")

# Extract operation ID for get_result_file
analysis_operation_id = extract_operation_id_from_poller(
    analysis_poller, PollerType.ANALYZE_CALL
)
print(f"📋 Extracted analysis operation ID: {analysis_operation_id}")

# Get the result to see what files are available
print(f"🔍 Getting analysis result to find available files...")
operation_status = await client.content_analyzers.get_result(
    operation_id=analysis_operation_id,
)

# The actual analysis result is in operation_status.result
operation_result: Any = operation_status.result
if operation_result is None:
    print("⚠️  No analysis result available")
else:
    print(f"✅ Analysis result contains {len(operation_result.contents)} contents")

### Retrieve and Save Key Frames and Face Thumbnails
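The cell below requests result files by path. Following the notebook's own conventions, keyframes are named `keyFrame.<milliseconds>` and face thumbnails `face.<faceId>`. A minimal sketch of building those paths from plain data, so the selection logic can be checked without calling the service, might look like this. It assumes faces arrive as a list of dicts with a `faceId` key under `faces`; the function names are hypothetical.

```python
from typing import Any


def keyframe_paths(keyframe_times_ms: list[int], sample: bool = True) -> list[str]:
    """Build keyFrame.<ms> paths, optionally keeping only first, middle, and last."""
    paths = [f"keyFrame.{ms}" for ms in keyframe_times_ms]
    if sample and len(paths) >= 3:
        return [paths[0], paths[len(paths) // 2], paths[-1]]
    return paths


def face_paths(content: dict[str, Any]) -> list[str]:
    """Build sorted, de-duplicated face.<faceId> paths from a raw content dict."""
    face_ids = {
        face["faceId"]
        for face in content.get("faces") or []
        if isinstance(face, dict) and face.get("faceId")
    }
    return [f"face.{face_id}" for face_id in sorted(face_ids)]
```

Sorting the face IDs makes the download order reproducible between runs.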

In [None]:
# Initialize a set to store unique face result-file paths (face.<faceId>)
face_ids: set[str] = set()

# Look for keyframe times in the analysis result
keyframe_times_ms: list[int] = []
for content in (operation_result.contents if operation_result is not None else []):
    if isinstance(content, AudioVisualContent):
        video_content: AudioVisualContent = content
        print(f"KeyFrameTimesMs: {video_content.key_frame_times_ms}")
        print(video_content)
        keyframe_times_ms.extend(video_content.key_frame_times_ms or [])
        print(f"📹 Found {len(keyframe_times_ms)} keyframes in video content")
        faces = content.get("faces", [])
        if isinstance(faces, list):
            for face in faces:
                face_id = face.get("faceId")
                if face_id:
                    face_ids.add(f"face.{face_id}")
        break
    else:
        print(f"Content is not an AudioVisualContent: {content}")

if not keyframe_times_ms:
    print("⚠️  No keyframe times found in the analysis result")
else:
    print(f"🖼️  Found {len(keyframe_times_ms)} keyframe times in milliseconds")

# Build keyframe filenames using the time values
keyframe_files = [f"keyFrame.{time_ms}" for time_ms in keyframe_times_ms]

# Download and save a few keyframe images as examples (first, middle, last)
if len(keyframe_files) >= 3:
    files_to_download = [
        keyframe_files[0],
        keyframe_files[len(keyframe_files) // 2],
        keyframe_files[-1],
    ]
else:
    files_to_download = list(keyframe_files)
print(
    f"📥 Downloading {len(files_to_download)} keyframe images as examples: {files_to_download}"
)
print(f"👤 Found {len(face_ids)} face thumbnails: {sorted(face_ids)}")


async def download_result_file(path: str) -> bytes:
    """Fetch one result file (keyframe or face thumbnail) for the analysis operation."""
    response: Any = await client.content_analyzers.get_result_file(
        operation_id=analysis_operation_id,
        path=path,
    )
    # Handle the response, which may be bytes or an async iterator of bytes
    if isinstance(response, (bytes, bytearray)):
        return bytes(response)
    chunks: list[bytes] = []
    async for chunk in response:
        chunks.append(chunk)
    return b"".join(chunks)


# Download the selected keyframes and every detected face thumbnail
for file_id in list(files_to_download) + sorted(face_ids):
    print(f"📥 Getting result file: {file_id}")
    image_content = await download_result_file(file_id)
    print(f"✅ Retrieved image file for {file_id} ({len(image_content)} bytes)")

    # Save the image file (the keyframe helper is reused for face thumbnails)
    saved_file_path = save_keyframe_image_to_file(
        image_content=image_content,
        keyframe_id=file_id,
        test_name="content_analyzers_get_result_file",
        test_py_file_dir=os.getcwd(),
        identifier=analyzer_id,
    )
    print(f"💾 Image saved to: {saved_file_path}")

# Clean up the created analyzer (demo cleanup)
print(f"🗑️  Deleting analyzer '{analyzer_id}' (demo cleanup)...")
await client.content_analyzers.delete(analyzer_id=analyzer_id)
print(f"✅ Analyzer '{analyzer_id}' deleted successfully!")