# Lab 9: Content Understanding

**Azure AI Content Understanding** extracts structured data from multimodal content using prebuilt analyzers:

| Content Type | Analyzer | Output |
|-------------|----------|--------|
| Documents | `prebuilt-layout` | Text, tables, structure as markdown |
| Images | `prebuilt-imageSearch` | Descriptions, object detection |
| Video | `prebuilt-videoSearch` | Keyframes, transcripts, segments |

> ⚠️ Requires GPT-4.1, GPT-4.1-mini, and text-embedding-3-large deployed in the same resource.

## Step 1: Install Dependencies

In [None]:
!pip install requests azure-identity azure-ai-projects pandas ijson pypdf -q

## Step 2: Configuration

In [None]:
import subprocess, os, json, time, requests, base64, re
from pathlib import Path
from IPython.display import display, Markdown, HTML
from azure.identity import DefaultAzureCredential

RG = "content-understanding-lab-rg"
LOCATION = "swedencentral"
HUB_RG = "foundry-lz-parent"
CU_API_VERSION = "2025-11-01"

# Load .env files
for env_path in [Path("../.env"), Path(".env")]:
    if env_path.exists():
        for line in env_path.read_text().splitlines():
            if '=' in line and not line.startswith('#'):
                k, v = line.split('=', 1)
                os.environ[k.strip()] = v.strip()

APIM_URL = os.environ.get("APIM_URL", "")
APIM_KEY = os.environ.get("APIM_KEY", "")
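# Guard against a URL that does not match (re.match would return None)
_apim_match = re.match(r'https://([^.]+)\.', APIM_URL)
APIM_NAME = _apim_match.group(1) if _apim_match else ""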
PRINCIPAL_ID = subprocess.run('az ad signed-in-user show --query id -o tsv', shell=True, capture_output=True, text=True).stdout.strip()

print(f"Resource Group: {RG} | Location: {LOCATION}")
print(f"Principal ID: {PRINCIPAL_ID[:20]}...")
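The inline `.env` loop above keeps any quotes around a value and does not skip indented comment lines. A minimal sketch of a stricter parser follows; `parse_env_text` is a hypothetical helper name, not part of the lab, and the sample values are made up.

```python
def parse_env_text(text):
    """Parse KEY=VALUE lines; skip blanks and comments, strip matching quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if key:
            values[key] = value
    return values


# Illustrative input only; real values come from your own .env file.
sample = 'APIM_URL="https://example-apim.azure-api.net"\n  # note=ignored\nAPIM_KEY=abc123\n'
print(parse_env_text(sample))
```

To use it in the cell above, one option is `os.environ.update(parse_env_text(env_path.read_text()))`.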

## Step 3: Deploy Infrastructure (~5-7 min)
Deploys an Azure AI Services resource with Content Understanding (CU) enabled, plus the required model deployments.

In [None]:
!az group create -n "{RG}" -l "{LOCATION}" -o table

In [None]:
# Deploy spoke infrastructure with Content Understanding models
deploy_cmd = f'''az deployment group create -g "{RG}" --template-file spoke.bicep \
    -p deployerPrincipalId="{PRINCIPAL_ID}" \
    -p hubResourceGroup="{HUB_RG}" \
    -p apimName="{APIM_NAME}" \
    -p apimSubscriptionKey="{APIM_KEY}" \
    -o table'''

!{deploy_cmd}

In [None]:
result = subprocess.run(f'az deployment group show -g "{RG}" -n spoke --query properties.outputs -o json', shell=True, capture_output=True, text=True)
if result.returncode != 0: raise Exception(f"Deployment failed: {result.stderr}")

outputs = json.loads(result.stdout)
CU_ENDPOINT = outputs['contentUnderstandingEndpoint']['value']
GPT41_DEPLOYMENT = outputs['gpt41Deployment']['value']
GPT41_MINI_DEPLOYMENT = outputs['gpt41MiniDeployment']['value']
EMBEDDING_DEPLOYMENT = outputs['embeddingDeployment']['value']

print(f"✅ CU Endpoint: {CU_ENDPOINT}")
print(f"   Models: {GPT41_DEPLOYMENT}, {GPT41_MINI_DEPLOYMENT}, {EMBEDDING_DEPLOYMENT}")
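If the Bicep template omits one of the outputs, the dictionary lookups above fail with a bare `KeyError` on the first missing name. A small sketch that reports every missing output at once is shown here; `read_outputs` and `REQUIRED_OUTPUTS` are hypothetical names, and the output keys are the ones used in the cell above.

```python
REQUIRED_OUTPUTS = (
    "contentUnderstandingEndpoint",
    "gpt41Deployment",
    "gpt41MiniDeployment",
    "embeddingDeployment",
)


def read_outputs(outputs, required=REQUIRED_OUTPUTS):
    """Return {name: value} for the required outputs, or raise listing all that are missing."""
    missing = [name for name in required if "value" not in (outputs.get(name) or {})]
    if missing:
        raise KeyError(f"Deployment outputs missing: {', '.join(missing)}")
    return {name: outputs[name]["value"] for name in required}
```

With the notebook's `outputs` variable, `read_outputs(outputs)["contentUnderstandingEndpoint"]` would give the same value as `CU_ENDPOINT`.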

## Step 4: Configure CU Model Defaults

In [None]:
print("⏳ Waiting for RBAC propagation (30s)...")
time.sleep(30)

credential = DefaultAzureCredential()
get_cu_token = lambda: credential.get_token("https://cognitiveservices.azure.com/.default").token
print(f"✅ Ready! Token: {get_cu_token()[:20]}...")
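The fixed 30-second sleep is a guess at how long role assignments take to propagate. As an alternative, here is a minimal retry-with-backoff sketch; `call_with_retry` is a hypothetical helper, and treating HTTP 401/403 as "not propagated yet" is an assumption rather than documented behavior.

```python
import time


def call_with_retry(fn, should_retry, attempts=6, base_delay=5.0, sleep=time.sleep):
    """Call fn() until should_retry(result) is False or attempts run out."""
    result = None
    for attempt in range(attempts):
        result = fn()
        if not should_retry(result):
            return result
        if attempt < attempts - 1:
            # Exponential backoff: 5s, 10s, 20s, ... with the default base_delay.
            sleep(base_delay * (2 ** attempt))
    return result
```

In this lab, `fn` could wrap the `requests.patch(...)` call from the next cell, with `should_retry=lambda r: r.status_code in (401, 403)`.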

In [99]:
defaults = {"modelDeployments": {"gpt-4.1": GPT41_DEPLOYMENT, "gpt-4.1-mini": GPT41_MINI_DEPLOYMENT, "text-embedding-3-large": EMBEDDING_DEPLOYMENT}}
resp = requests.patch(f"{CU_ENDPOINT}/contentunderstanding/defaults?api-version={CU_API_VERSION}",
    headers={"Authorization": f"Bearer {get_cu_token()}", "Content-Type": "application/json"}, json=defaults)
print("✅ Model defaults configured!" if resp.ok else f"❌ Failed: {resp.text}")

✅ Model defaults configured!


---
# Part A: Document Analysis
Use `prebuilt-layout` to extract text/tables from PDFs.

In [100]:
class CUClient:
    """Simple Content Understanding client with AAD auth."""
    def __init__(self, endpoint, credential, api_version="2025-11-01"):
        self.endpoint, self.credential, self.api_version = endpoint.rstrip('/'), credential, api_version
    
    def _headers(self):
        return {"Authorization": f"Bearer {self.credential.get_token('https://cognitiveservices.azure.com/.default').token}", "Content-Type": "application/json"}
    
    def analyze(self, analyzer, inputs, poll_interval=2, max_wait=600):
        url = f"{self.endpoint}/contentunderstanding/analyzers/{analyzer}:analyze?api-version={self.api_version}"
        resp = requests.post(url, headers=self._headers(), json={"inputs": inputs})
        if not resp.ok: return {"error": resp.text}
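        # The service returns the URL to poll in the Operation-Location header.
        op_url = resp.headers.get('Operation-Location')
        if not op_url: return {"error": "Missing Operation-Location header"}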
        start = time.time()
        while time.time() - start < max_wait:
            r = requests.get(op_url, headers=self._headers()).json()
            if r.get('status') == 'Succeeded': return r
            if r.get('status') in ['Failed', 'Cancelled']: return {"error": r}
            time.sleep(poll_interval)
        return {"error": "Timeout"}

cu = CUClient(CU_ENDPOINT, credential, CU_API_VERSION)
print("✅ CU client ready")

✅ CU client ready
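The polling loop inside `CUClient.analyze` can also be written as a standalone function, which makes it testable without network access. The sketch below assumes the same status strings the client checks (`Succeeded`, `Failed`, `Cancelled`); `poll_until_done` and its `fetch` parameter are hypothetical, and it raises exceptions where the client returns an error dictionary.

```python
import time

TERMINAL_FAILURES = {"Failed", "Cancelled"}


def poll_until_done(fetch, poll_interval=2.0, max_wait=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it reports a terminal status or max_wait elapses."""
    deadline = clock() + max_wait
    while True:
        body = fetch()  # expected to return the parsed JSON operation body
        status = body.get("status")
        if status == "Succeeded":
            return body
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Analysis ended with status {status}: {body}")
        if clock() >= deadline:
            raise TimeoutError(f"No terminal status after {max_wait}s (last: {status})")
        sleep(poll_interval)
```

With the client above, `fetch` could be `lambda: requests.get(op_url, headers=cu._headers()).json()`.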


## Step 5: Analyze a Sample Document

In [None]:
# Sample NASA technical report (publicly accessible)
SAMPLE_PDF_URL = "https://ntrs.nasa.gov/api/citations/19720018364/downloads/19720018364.pdf"
SAMPLE_TITLE = "Design of a TF34 Turbofan Mixer for Reduction of Flap Impingement Noise"

print(f"📄 Sample Document: {SAMPLE_TITLE}")
print(f"   URL: {SAMPLE_PDF_URL[:60]}...")

In [102]:
# Analyze document with prebuilt-layout (extracts text, tables, structure)
print(f"📄 Analyzing: {SAMPLE_TITLE}...")
doc_result = cu.analyze("prebuilt-layout", [{"url": SAMPLE_PDF_URL}])

if 'error' in doc_result:
    print(f"❌ Error: {doc_result['error']}")
else:
    contents = doc_result.get('result', {}).get('contents', [])
    markdown = contents[0].get('markdown', '') if contents else ''
    print(f"✅ Extracted {len(markdown):,} characters from {len(contents)} content block(s)")

📄 Analyzing: Design of a TF34 Turbofan Mixer for Reduction of Flap Impingement Noise...
✅ Extracted 132,629 characters from 1 content block(s)
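The cell above reads only `contents[0]`. That is enough here because the result has a single content block, but a result with several blocks would lose text. A minimal sketch that joins every block's markdown is shown here; `extract_markdown` is a hypothetical helper, and it assumes each block stores its text under the `markdown` key as in the cell above.

```python
def extract_markdown(result):
    """Concatenate the markdown of every content block in an analyze result."""
    contents = result.get("result", {}).get("contents", [])
    parts = [block.get("markdown", "") for block in contents]
    # Skip empty blocks so the separator does not produce blank runs.
    return "\n\n".join(part for part in parts if part)
```

Calling `extract_markdown(doc_result)` on the result above should give the same text as `markdown`, since there is only one block.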


In [103]:
# Display extracted content preview
if 'error' not in doc_result and markdown:
    display(Markdown(f"### 📝 Extracted Content Preview\n\n{markdown[:2000]}" + ("\n\n*... (truncated)*" if len(markdown) > 2000 else "")))

### 📝 Extracted Content Preview

2 (mix)

NASA CR-120916

(NASA-CR-120916) - DESIGN OF A TF34 TURBOFAN
MIXER FOR REDUCTION OF FLAP IMPINGEMENT
NOISE Final Report A. Chamay, et al
(General Electric Co.) 2 Feb. 1972 131 p

N72-26014

Unclas
CSCL 21E G3/02 32002


![NASA](figures/1.1)


DESIGN OF A TF34 TURBOFAN MIXER FOR
REDUCTION OF FLAP IMPINGEMENT NOISE

FINAL REPORT

by A. Chamay, D.P. Edkins, R.B. Mishler and W.S. Clapper
Reproduced by
NATIONAL TECHNICAL
INFORMATION SERVICE
U S Department of Commerce
Springfield VA 22151

GENERAL ELECTRIC COMPANY
AIRCRAFT ENGINE GROUP
LYNN, MASSACHUSETTS/CINCINNATI OHIO

Prepared for
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION
February 2, 1972

NASA Lewis Research Center
Cleveland, Ohio
N.E. Samanich
Project Manager

CONTRACT NAS 3-14338 Modification 2

RECEIVED
JUN 1972
GISA STI FACILITY
INZUT BRAMEN
67 8 9 101112 13 14 1J

/3/2

<!-- PageBreak -->

<!-- PageHeader: NASA CR-120916 -->


# FINAL REPORT

DESIGN OF A TF34 TURBOFAN MIXER FOR REDUCTION
OF FLAP IMPINGEMENT NOISE

by

A. Chamay, D. P. Edkins, R. B. Mishler and W. S. Clapper

General Electric Company
Aircraft Engine Group
Lynn, Massachusetts/Cincinnati, Ohio

prepared for
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION
February 2, 1972

CONTRACT NAS3-14338 Modification 2

NASA Lewis Research Center
Cleveland, Ohio
N. E. Samanich - Project Manager

<!-- PageBreak -->

<!-- PageHeader: PRECEDING PAGE BLANK NOT FILMED -->


## TABLE OF CONTENTS


<table>
<tr>
<th></th>
<th></th>
<th>Page</th>
</tr>
<tr>
<td>ABSTRACT</td>
<td></td>
<td>iv</td>
</tr>
<tr>
<td>SUMMARY</td>
<td></td>
<td></td>
</tr>
<tr>
<td>INTRODUCTION</td>
<td></td>
<td>5</td>
</tr>
<tr>
<td>TECHNICAL DISCUSSION</td>
<td></td>
<td>7</td>
</tr>
<tr>
<td>Program Objectives</td>
<td></td>
<td>7</td>
</tr>
<tr>
<td>Program Description and Schedule</td>
<td></td>
<td>7</td>
</tr>
<tr>
<td>Task I Study</td>
<td></td>
<td>7</td>
</tr>
<tr>
<td>Review of Design Data</td>
<td></td>
<td>11</td>
</tr>
<tr>
<td>Engine Cycle Data</td>
<td></td>
<td>16<

*... (truncated)*
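The preview shows two conventions in the `prebuilt-layout` output: pages are separated by `<!-- PageBreak -->` comments, and tables are emitted as HTML. The sketch below splits pages and pulls table rows out using only the standard library; `split_pages`, `table_rows`, and `TableRows` are hypothetical names, and the sample string is a shortened imitation of the output above.

```python
from html.parser import HTMLParser

PAGE_BREAK = "<!-- PageBreak -->"


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None


def table_rows(markdown):
    """Return all table rows found in the markdown as lists of cell strings."""
    parser = TableRows()
    parser.feed(markdown)
    parser.close()
    return parser.rows


def split_pages(markdown):
    """Split the markdown on page-break comments."""
    return [page.strip() for page in markdown.split(PAGE_BREAK)]


sample = """## TABLE OF CONTENTS
<table>
<tr>
<th></th>
<th>Page</th>
</tr>
<tr>
<td>ABSTRACT</td>
<td>iv</td>
</tr>
</table>
<!-- PageBreak -->
next page"""
print(table_rows(sample))
print(len(split_pages(sample)))
```

On the real result, `pd.DataFrame(table_rows(markdown))` would be one way to inspect the rows, since `pandas` is already installed in Step 1.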

---
# Part B: Video Analysis
Use `prebuilt-videoSearch` to extract keyframes, transcripts, and segments.

## Step 6: Analyze NASA Video

In [104]:
NASA_VIDEO_URL = "https://images-assets.nasa.gov/video/KSC-19640101-MH-NAS01-0001-The_Crawler_Transporter_The_Beginning_Historical_Footage-B_2309/KSC-19640101-MH-NAS01-0001-The_Crawler_Transporter_The_Beginning_Historical_Footage-B_2309~orig.mp4"

print("🎬 Analyzing NASA Crawler-Transporter footage (may take a few minutes)...")
video_result = cu.analyze("prebuilt-videoSearch", [{"url": NASA_VIDEO_URL}], poll_interval=5, max_wait=600)

if 'error' in video_result:
    print(f"❌ Error: {video_result['error']}")
else:
    contents = video_result.get('result', {}).get('contents', [])
    keyframes = sum(len(c.get('KeyFrameTimesMs', [])) for c in contents)
    phrases = sum(len(c.get('transcriptPhrases', [])) for c in contents)
    print(f"✅ Found {len(contents)} segments, {keyframes} keyframes, {phrases} transcript phrases")

🎬 Analyzing NASA Crawler-Transporter footage (may take a few minutes)...
✅ Found 4 segments, 626 keyframes, 1 transcript phrases


In [105]:
# Display video segment summaries
if 'error' not in video_result:
    for i, seg in enumerate(contents):
        start_s, end_s = seg.get('startTimeMs', 0)//1000, seg.get('endTimeMs', 0)//1000
        summary = seg.get('fields', {}).get('Summary', {}).get('valueString', 'N/A')
        print(f"\n🎬 Segment {i+1} ({start_s}s - {end_s}s):")
        print(f"   {summary[:200]}{'...' if len(summary) > 200 else ''}")


🎬 Segment 1 (0s - 26s):
   The video segment begins with a color test pattern screen labeled 'KSC-TV' and no visible action or people. This static test pattern continues for the first 26 seconds, indicating a broadcast or recor...

🎬 Segment 2 (28s - 109s):
   The video shows detailed footage of a large tracked vehicle, likely a heavy construction or military machine, focusing on its massive tracks and mechanical components. The camera captures close-up vie...

🎬 Segment 3 (111s - 330s):
   The video shifts to the interior control cabin of the tracked vehicle, showing the operator's console with various gauges, levers, and controls. The camera pans around the cabin, highlighting the cont...

🎬 Segment 4 (330s - 627s):
   The video continues with detailed close-up shots of the tracked vehicle's large metal treads moving over gravel, showing the individual tread plates and their articulation. The camera captures various...
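Raw second counts such as `330s - 627s` get hard to read for longer videos. A small formatting sketch follows; it assumes segments carry `startTimeMs` and `endTimeMs` as in the cell above, and `format_ms` and `segment_label` are hypothetical helper names.

```python
def format_ms(ms):
    """Format a millisecond offset as MM:SS (minutes may exceed 59)."""
    minutes, seconds = divmod(ms // 1000, 60)
    return f"{minutes:02d}:{seconds:02d}"


def segment_label(segment):
    """Build a 'start - end' label from a segment's millisecond bounds."""
    start = format_ms(segment.get("startTimeMs", 0))
    end = format_ms(segment.get("endTimeMs", 0))
    return f"{start} - {end}"


print(segment_label({"startTimeMs": 330000, "endTimeMs": 627000}))
```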


---
## Summary

**Content Understanding** extracts structured data from multimodal content:

| Analyzer | Input | Output |
|----------|-------|--------|
| `prebuilt-layout` | Documents (PDF, images) | Markdown text, tables, structure |
| `prebuilt-videoSearch` | Video files | Segments, keyframes, transcripts, summaries |
| `prebuilt-audioSearch` | Audio files | Transcription, speaker diarization |
| `prebuilt-imageSearch` | Images | Descriptions, object detection |

**Key concepts:**
- Requires GPT-4.1, GPT-4.1-mini, and text-embedding-3-large in the same resource
- Uses an async polling pattern (submit → poll Operation-Location → get result)
- Supports both URL and base64-encoded content
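The lab only exercises the URL form (`[{"url": ...}]`). A sketch of building either kind of input item is shown below. The `data` key for base64 content is an assumption and should be checked against the API reference for your API version; `build_input` is a hypothetical helper.

```python
import base64


def build_input(source, data_key="data"):
    """Return an input item for a URL string or for raw bytes.

    data_key is an assumed field name for base64 content; confirm it
    against the Content Understanding REST reference before relying on it.
    """
    if isinstance(source, (bytes, bytearray)):
        return {data_key: base64.b64encode(bytes(source)).decode("ascii")}
    return {"url": source}


print(build_input("https://example.com/sample.pdf"))
print(build_input(b"hi"))
```

A local file could then be submitted with `cu.analyze("prebuilt-layout", [build_input(Path("sample.pdf").read_bytes())])`, where the file name is illustrative.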

In [106]:
# Cleanup (uncomment to delete resources)
# !az group delete -n "{RG}" --yes --no-wait