# Modality Fused Understanding
## Overview

This notebook implements the **third and final layer of live video understanding** by combining visual and audio insights using Amazon Bedrock with Anthropic Claude for comprehensive multi-modal content understanding.

The multi-modal fused understanding layer combines visual filmstrips and audio transcripts to detect topic boundaries, identify chapters, and create comprehensive content understanding. This fusion creates a complete picture of what's happening in your live broadcasts by analyzing both what's seen and what's heard together.

**This module leverages the components developed in the Visual Understanding and Audio Understanding modules for continuous processing of the live stream**, integrating their capabilities into a unified multi-modal analysis pipeline.

## Architecture

![Multi-modal Fused Understanding Architecture](images/acr_visual-audio-understanding.png)

The architecture shows how visual filmstrips and audio transcripts are synchronized and analyzed together using Amazon Bedrock with Anthropic Claude to create comprehensive multi-modal understanding, which is then pushed to AgentCore Memory for downstream agentic AI applications.
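
The synchronization step can be pictured as selecting the transcript sentences whose timestamps overlap each chunk's time window. The sketch below is illustrative only: the `Sentence` record and the `sentences_for_chunk` helper are hypothetical names, and the real sentence buffer format used by the shared components may differ.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    """Hypothetical transcript record; times are seconds from stream start."""
    start: float
    end: float
    text: str


def sentences_for_chunk(sentences, chunk_index, chunk_duration=20.0):
    """Return the sentences whose time span overlaps the chunk's window."""
    window_start = chunk_index * chunk_duration
    window_end = window_start + chunk_duration
    return [s for s in sentences if s.start < window_end and s.end > window_start]


buffer = [
    Sentence(2.0, 6.5, "Welcome back to the show."),
    Sentence(18.0, 23.0, "Let's look at the next story."),
    Sentence(31.0, 35.0, "Here is the weather."),
]

chunk0 = sentences_for_chunk(buffer, 0)  # window 0s-20s
chunk1 = sentences_for_chunk(buffer, 1)  # window 20s-40s
```

A sentence that straddles a chunk boundary (the second one here) is deliberately returned for both chunks, so neither analysis call loses the words spoken across the cut.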


## Key Technologies Used

- **Amazon Bedrock with Anthropic Claude Sonnet 4**: Multi-modal fusion and comprehensive content understanding
- **Amazon Transcribe**: Real-time audio transcription with precise timestamps
- **OpenCV**: Video processing and filmstrip generation
- **FFmpeg**: Multi-stream video and audio ingestion
- **Prompt Caching**: Reduces cost and latency for repeated analysis
- **Rolling Context Window**: Intelligent context management that maintains only recent chapters (n-1) to optimize latency and cost while preserving accuracy

**📝 System Prompt**: The multi-modal analysis is powered by a comprehensive system prompt that guides Claude's understanding of video content. You can review the complete prompt at: [`prompts/video_analysis_system_prompt.txt`](prompts/video_analysis_system_prompt.txt)

## Cost & Latency Optimization Techniques

This notebook implements two powerful optimization techniques that dramatically reduce costs and improve performance:

### 1. Prompt Caching - Dual Cache Breakpoint Strategy

**Prompt Caching** implements a **dual cache breakpoint strategy** that partitions content into two distinct memory regions:

- **Static Memory (First Breakpoint)**: Stores system prompts and instructions that remain constant across all API calls. This content is cached on the first call and reused by later calls while the cache entry remains live, eliminating redundant token costs.

- **Growing Memory (Second Breakpoint)**: Contains the incremental conversation between user and agent, including transcripts and analysis results. The cache breakpoint moves forward with each new interaction, so the growing context is cached and only the new incremental content is paid for at the full rate.

This dual breakpoint approach enables up to 60% cost reduction on cached content while maintaining full conversational context. The static memory never changes, and the growing memory expands incrementally, with each new chunk only paying for the delta rather than reprocessing the entire history.
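
A minimal sketch of how such a request body might be assembled is shown below. It is not the notebook's `FusionAnalyzer` code: `build_request` is a hypothetical helper, the `image/jpeg` media type is an assumption about the filmstrip format, and the body follows the Anthropic Messages format used with Amazon Bedrock. The sketch only builds the payload; it makes no API call.

```python
import copy


def build_request(system_prompt, history, transcript_text, filmstrip_b64, max_tokens=1024):
    """Build a Messages-style body with two cache breakpoints (hypothetical helper)."""
    # Breakpoint #1: the static system prompt.
    system = [{
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral"},
    }]

    # Copy the history and strip old markers so that only the newest
    # user turn carries breakpoint #2.
    messages = copy.deepcopy(history)
    for message in messages:
        if isinstance(message["content"], list):
            for block in message["content"]:
                block.pop("cache_control", None)

    # Breakpoint #2: moves forward to the latest user turn on every call.
    messages.append({
        "role": "user",
        "content": [
            {"type": "text", "text": transcript_text,
             "cache_control": {"type": "ephemeral"}},
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/jpeg",  # assumed format
                        "data": filmstrip_b64}},
        ],
    })
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system,
        "messages": messages,
    }


first = build_request("You are an expert in video analysis...", [],
                      "Chunk 0 (0s-20s) transcript...", "base64...")
history = first["messages"] + [
    {"role": "assistant", "content": "Fused understanding for chunk 0"},
]
second = build_request("You are an expert in video analysis...", history,
                       "Chunk 1 (20s-40s) transcript...", "base64...")
```

In this sketch the system breakpoint is re-sent on every call and the message breakpoint is re-attached to the newest user turn, so each request carries exactly two markers regardless of how long the conversation grows.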

### 2. Smart Context Windowing

**Smart Context Windowing** employs chapter-based pruning that maintains recent context (n-1 chapters) while preventing context overflow. This novel approach to memory management enables unlimited video length processing with bounded memory, ensuring quality is maintained while optimizing both cost and latency.

## Real-time Integration with AgentCore Memory

As the notebook processes live video and creates comprehensive multi-modal understanding, the fused insights are automatically pushed to **Amazon Bedrock AgentCore Memory** in real time. Downstream agentic AI applications can then query and use these insights for intelligent decision-making, creating business value from the three-layer understanding you've built.

**Let's see how this works in practice!** 

## 1. Import Required Libraries

**Load required modules** for multi-modal fusion processing, real-time streaming, and AI analysis.

In [None]:
# Import required libraries
import asyncio
import subprocess
import time
import os
import queue
import threading
import json
import cv2
import numpy as np
import boto3
import base64
from datetime import datetime
from pathlib import Path
from PIL import Image
from io import BytesIO

# Audio processing imports
from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent

print("✅ Libraries imported successfully!")

## 2. Configuration Setup

**Load shared configuration** from prerequisites notebook and set up processing parameters.

### Configuration Components

- **Model Settings** - Claude model ID and AWS region
- **Memory Integration** - AgentCore Memory IDs for real-time storage
- **Output Organization** - structured folders for recordings and analysis
- **Processing Parameters** - chunk duration, FPS, and streaming ports
- **Optimization Settings** - prompt caching and context windowing

This ensures all fusion components work together seamlessly.

In [None]:
# Load shared configuration from prerequisites notebook
%store -r AUDIOVISUAL_MODEL_ID
%store -r AWS_REGION

# Fall back to defaults when the prerequisites notebook has not stored these values
if 'AUDIOVISUAL_MODEL_ID' not in globals():
    AUDIOVISUAL_MODEL_ID = "global.anthropic.claude-sonnet-4-20250514-v1:0"
    print("⚠️  Using default AUDIOVISUAL_MODEL_ID (run prerequisites notebook to configure)")
else:
    print(f"✅ Loaded Audiovisual Model ID: {AUDIOVISUAL_MODEL_ID}")

if 'AWS_REGION' not in globals():
    AWS_REGION = 'us-east-1'
    print("⚠️  Using default AWS_REGION (run prerequisites notebook to configure)")
else:
    print(f"✅ Loaded AWS Region: {AWS_REGION}")

# Configuration
SOURCE_VIDEO = "Netflix_Open_Content_Meridian.mp4"
OUTPUT_DIR = "output"
CHUNK_DURATION = 20  # seconds
SOURCE_FPS = 30

# UDP Ports for three streams
UDP_PORT_RECORDING = "1234"    # Stream 1: Recording
UDP_PORT_PROCESSING = "1235"   # Stream 2: Video processing
UDP_PORT_TRANSCRIPTION = "1236" # Stream 3: Transcription

TRANSCRIBE_LANGUAGE_CODE = 'en-US'
TRANSCRIBE_SAMPLE_RATE = 16000

# Global buffers
SENTENCE_JSON_BUFFER = []
CHUNK_ANALYSIS_RESULTS = {}
FUSION_RESULTS = []

# Output directory layout; directory cleanup is handled by the cleanup utilities in the next cell
output_subdirs = {
    "chunks": f"{OUTPUT_DIR}/chunks",
    "filmstrips": f"{OUTPUT_DIR}/filmstrips", 
    "transcripts": f"{OUTPUT_DIR}/transcripts",
    "analysis": f"{OUTPUT_DIR}/analysis",
    "recording": f"{OUTPUT_DIR}/recording",
    "clips": f"{OUTPUT_DIR}/clips"
}

print("🚀 Configuration complete!")

## 3. Import Components

**Load processing components** for multi-modal fusion pipeline.

**Shared Components** (reusable across modules):
- **RecordingManager**: Handles continuous video recording to MXF format
- **TranscriptionProcessor**: Manages real-time audio transcription via Amazon Transcribe
- **TranscriptionHandler**: Processes transcription results and detects sentence boundaries
- **ComponentMonitor**: Provides organized logging and activity tracking
- **FilmstripProcessor**: Creates enhanced filmstrip grids with shot detection

**Module-Specific Components** (Modality Fusion):
- **ChunkProcessor**: Creates 20-second video chunks and triggers filmstrip creation
- **FusionAnalyzer**: Performs multi-modal analysis using Amazon Bedrock with Anthropic Claude
- **StreamMonitor**: Monitors component activity and detects stream end

**AgentCore Memory**: Enables real-time knowledge storage for downstream agentic AI

These components work together to create comprehensive multi-modal understanding.

In [None]:
# Add project root to Python path to enable imports
import sys
import boto3
from pathlib import Path

# Get project root (parent of current directory)
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))
    print(f"✅ Added project root to Python path: {project_root}")
else:
    print(f"✅ Project root already in Python path: {project_root}")

# Import shared components (reusable across modules)
from src.shared import (
    RecordingManager,
    TranscriptionProcessor,
    TranscriptionHandler,
    ComponentMonitor,
    log_component,
    set_debug_logging,
    show_component_table,
    FilmstripProcessor,
    create_fusion_detector
)

# Import module-specific components (Modality Fusion)
from components import (
    ChunkProcessor,
    ChunkMonitor,
    FusionAnalyzer,
    StreamMonitor,
    CleanupUtils,
    cleanup_directory,
    cleanup_ffmpeg_processes,
    cleanup_all,
    ProcessingUtils,
    start_fusion_processing
)

# Initialize component monitor
component_monitor = ComponentMonitor()

memory_client = boto3.client('bedrock-agentcore')

# Load memory configuration from prerequisites notebook
%store -r video_analysis_mem_id
%store -r video_analysis_session_id
%store -r transcript_mem_id
%store -r trans_session_id
%store -r actor_id

# Clean up output directories using the cleanup utility
cleanup_directory(OUTPUT_DIR, output_subdirs)

# Disable debug logging
set_debug_logging("DISABLED")

print("\n✅ Fusion components and Component Activity Monitor initialized!")
print("📊 Monitor will display component activities in organized table format")

## 4. Understanding Prompt Caching: Payload Comparison & Token Savings

**🎯 Purpose**: Understand how dual cache breakpoints optimize costs through payload structure and token savings.

**Key Concept**: Dual Cache Breakpoint Strategy
- **Breakpoint #1**: System prompt (static, always cached after first call)
- **Breakpoint #2**: Growing conversation (incremental caching)

### 📤 Payload Structure Comparison

**Color Legend:**
- 🟡 **Token counts** - Yellow highlighting
- 🟢 **CACHED content** - Green highlighting  
- 🔴 **BREAKPOINT markers** - Red highlighting
- 🔵 **NEW BREAKPOINT** - Blue highlighting

<div style='display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 20px; margin: 20px 0;'>

<div>
<h4 style='color: #ff9800; margin-bottom: 10px;'>🔥 Call #1 (Cold Start)</h4>
<div style='border: 2px solid #ff9800; border-radius: 8px; padding: 15px;'>

```json
{
  "system": [{
    "type": "text",
    "text": "You are an expert in video analysis...", // 🟡 ~1,200 tokens
    "cache_control": { "type": "ephemeral" } // 🔴 ⚡ BREAKPOINT #1
  }],
  "messages": [{
    "role": "user",
    "content": [{
      "type": "text",
      "text": "Chunk 0 (0s-20s) Transcript information with precise timestamp...", // 🟡 ~1,600 tokens
      "cache_control": { "type": "ephemeral" } // 🔴 ⚡ BREAKPOINT #2
    }, {
      "type": "image",
      "source": {"data": "base64..."} // 🟡 ~1,000 tokens
    }]
  }]
}
```
</div>
</div>

<div>
<h4 style='color: #2196f3; margin-bottom: 10px;'>⚡ Call #2 (Payload)</h4>
<div style='border: 2px solid #2196f3; border-radius: 8px; padding: 15px;'>

```json
{
  "system": [{
    "type": "text",
    "text": "You are an expert in video analysis..." // 🟡 ~1,200 tokens 🟢 CACHED ⚡
  }],
  "messages": [
    // Previous conversation 🟢 CACHED ⚡
    {"role": "user", "content": "Chunk 0 (0s-20s) Transcript....."}, // 🟡 ~1,600 tokens 🟢 CACHED ⚡
    {"role": "assistant", "content": "Fused understanding response from model..."}, // 🟡 ~200 tokens 🟢 CACHED ⚡

    {
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Chunk 1 (20s-40s) Transcript information with precise timestamp...",
        "cache_control": { "type": "ephemeral" } // 🔵 ⚡ NEW BREAKPOINT [ Moved to Call 2 User Input ]
      }, {
        "type": "image",
        "source": {"data": "base64..."} // 🟡 ~1,000 tokens
      }]
    }
  ]
}
```
</div>
</div>

<div>
<h4 style='color: #4caf50; margin-bottom: 10px;'>🚀 Call #3 (Payload)</h4>
<div style='border: 2px solid #4caf50; border-radius: 8px; padding: 15px;'>

```json
{
  "system": [{
    "type": "text",
    "text": "You are an expert in video analysis..."   // 🟢 CACHED ⚡
  }],
  "messages": [
    // Extended history 🟢 ALL CACHED ⚡
    {"role": "user", "content": "Chunk 0..."}, // 🟡 ~1,600 tokens 🟢 CACHED ⚡
    {"role": "assistant", "content": "..."}, // 🟡 ~200 tokens 🟢 CACHED ⚡
    {"role": "user", "content": "Chunk 1..."}, // 🟡 ~1,600 tokens 🟢 CACHED ⚡
    {"role": "assistant", "content": "..."}, // 🟡 ~200 tokens 🟢 CACHED ⚡
    {
      "role": "user",
      "content": [{
        "type": "text",
        "text": "Chunk 2 (40s-60s)...",
        "cache_control": { "type": "ephemeral" } // 🔵 ⚡ NEW BREAKPOINT [ Moved to Call 3 User Input and so on.. ]
      }, {
        "type": "image",
        "source": {"data": "base64..."} // 🟡 ~1,000 tokens
      }]
    }
  ]
}
```
</div>
</div>

</div>

### 🔍 Token Breakdown by Call with Caching

<div style='display: flex; gap: 20px; margin: 20px 0; overflow-x: auto;'>

<div style='background-color: #fff3e0; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0;'>
<h4 style='color: #ff9800; margin-top: 0;'>🔥 Call #1 Tokens</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>System Prompt:</strong> ~1,200 tokens</li>
<li><strong>Chunk 0 Text:</strong> ~1,600 tokens</li>
<li><strong>Image:</strong> ~1,000 tokens</li>
<li><strong>Total Input:</strong> 3,800 tokens</li>
<li><strong>Cache Write:</strong> 2,800 tokens</li>
<li><strong>Cache Read:</strong> 0 tokens</li>
<li><strong>Hit Ratio:</strong> 0%</li>
<li><strong>Token Savings:</strong> 0%</li>
<li><strong>Duration:</strong> 3.2s</li>
</ul>
</div>

<div style='background-color: #e3f2fd; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0;'>
<h4 style='color: #2196f3; margin-top: 0;'>⚡ Call #2 Tokens</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>System Prompt:</strong> CACHED ⚡</li>
<li><strong>Old Conversations:</strong> CACHED ⚡</li>
<li><strong>Chunk 1 Text:</strong> ~1,600 tokens</li>
<li><strong>Image:</strong> ~1,000 tokens</li>
<li><strong>Total Input:</strong> 6,800 tokens</li>
<li><strong>Cache Write:</strong> 2,600 tokens</li>
<li><strong>Cache Read:</strong> 4,200 tokens</li>
<li><strong>Hit Ratio:</strong> 62%</li>
<li><strong>Token Savings:</strong> 56%</li>
<li><strong>Duration:</strong> 2.8s</li>
</ul>
</div>

<div style='background-color: #e8f5e9; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0;'>
<h4 style='color: #4caf50; margin-top: 0;'>🚀 Call #3 Tokens</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>System Prompt:</strong> CACHED ⚡</li>
<li><strong>Old Conversations:</strong> CACHED ⚡</li>
<li><strong>Chunk 2 Text:</strong> ~1,600 tokens</li>
<li><strong>Image:</strong> ~1,000 tokens</li>
<li><strong>Total Input:</strong> 9,400 tokens</li>
<li><strong>Cache Write:</strong> 2,600 tokens</li>
<li><strong>Cache Read:</strong> 6,800 tokens</li>
<li><strong>Hit Ratio:</strong> 72%</li>
<li><strong>Token Savings:</strong> 65%</li>
<li><strong>Duration:</strong> 2.5s</li>
</ul>
</div>

</div>
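
The hit ratios in the cards above follow directly from the token counts: cache-read tokens divided by total input tokens. The "Token Savings" figures are reproduced if each cached token is assumed to save about 90% of its normal input cost. That discount is an assumption used here to match the numbers, not a value stated in this notebook, and `cache_stats` is a hypothetical helper.

```python
def cache_stats(total_input, cache_read, read_discount=0.9):
    """Return (hit ratio %, estimated savings %) for one call.

    read_discount is an assumed saving per cached token, not a quoted price.
    """
    hit_ratio = cache_read / total_input if total_input else 0.0
    return round(hit_ratio * 100), round(hit_ratio * read_discount * 100)


call1 = cache_stats(total_input=3800, cache_read=0)
call2 = cache_stats(total_input=6800, cache_read=4200)
call3 = cache_stats(total_input=9400, cache_read=6800)

for name, (hit, saved) in (("Call #1", call1), ("Call #2", call2), ("Call #3", call3)):
    print(f"{name}: hit ratio {hit}%, estimated savings {saved}%")
```

The hit ratio climbs from 0% to 62% to 72% because the cached prefix grows with every call while the new content per call stays roughly constant.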

### 📊 Token Breakdown by Call without Caching

<div style='display: flex; gap: 20px; margin: 20px 0; overflow-x: auto;'>

<div style='background-color: #ffebee; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0; border: 2px solid #f44336;'>
<h4 style='color: #d32f2f; margin-top: 0;'>🔥 Call #1 (No Cache)</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>System Prompt:</strong> ~1,200 tokens</li>
<li><strong>Chunk 0 Text:</strong> ~1,600 tokens</li>
<li><strong>Image:</strong> ~1,000 tokens</li>
<li><strong>Total Input:</strong> 3,800 tokens</li>
<li><strong>Cache Write:</strong> 0 tokens</li>
<li><strong>Cache Read:</strong> 0 tokens</li>
<li><strong>Hit Ratio:</strong> 0%</li>
<li><strong>Token Savings:</strong> 0%</li>
<li><strong>Duration:</strong> 3.2s</li>
</ul>
</div>

<div style='background-color: #ffebee; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0; border: 2px solid #f44336;'>
<h4 style='color: #d32f2f; margin-top: 0;'>🔥 Call #2 (No Cache)</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>System Prompt:</strong> ~1,200 tokens</li>
<li><strong>Previous Conv:</strong> ~1,800 tokens</li>
<li><strong>Chunk 1 Text:</strong> ~1,600 tokens</li>
<li><strong>Image:</strong> ~1,000 tokens</li>
<li><strong>Total Input:</strong> 5,600 tokens</li>
<li><strong>Cache Write:</strong> 0 tokens</li>
<li><strong>Cache Read:</strong> 0 tokens</li>
<li><strong>Hit Ratio:</strong> 0%</li>
<li><strong>Token Savings:</strong> 0%</li>
<li><strong>Duration:</strong> 4.1s</li>
</ul>
</div>

<div style='background-color: #ffebee; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0; border: 2px solid #f44336;'>
<h4 style='color: #d32f2f; margin-top: 0;'>🔥 Call #3 (No Cache)</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>System Prompt:</strong> ~1,200 tokens</li>
<li><strong>Full History:</strong> ~3,400 tokens</li>
<li><strong>Chunk 2 Text:</strong> ~1,600 tokens</li>
<li><strong>Image:</strong> ~1,000 tokens</li>
<li><strong>Total Input:</strong> 7,200 tokens</li>
<li><strong>Cache Write:</strong> 0 tokens</li>
<li><strong>Cache Read:</strong> 0 tokens</li>
<li><strong>Hit Ratio:</strong> 0%</li>
<li><strong>Token Savings:</strong> 0%</li>
<li><strong>Duration:</strong> 4.8s</li>
</ul>
</div>

</div>

### 💡 Caching vs No Caching Comparison

| Metric | Call #1 | Call #2 | Call #3 | Total |
|--------|---------|---------|---------|-------|
| **With Caching (Input)** | 3,800 tokens | 2,600 tokens | 2,600 tokens | **9,000 tokens** |
| **Without Caching (Input)** | 3,800 tokens | 5,600 tokens | 7,200 tokens | **16,600 tokens** |
| **Cache Efficiency** | Same | 54% fewer tokens | 64% fewer tokens | **46% total savings** |
| **Duration Savings** | Same | 32% faster | 48% faster | **Average 27% faster** |

**Key Insights:**
- 🎯 **Caching dramatically reduces input tokens** (2,600 vs 5,600 in Call #2)
- 📈 **Benefits compound with each call** (54% → 64% token reduction)
- ⚡ **Significant latency improvements** (up to 48% faster)
- 💰 **46% total token savings** across all calls with caching
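
The percentages in the comparison table can be rechecked from the per-call figures. The short calculation below uses only the illustrative numbers shown in the cards and the table; `pct_less` is a hypothetical helper name.

```python
with_cache = [3800, 2600, 2600]       # full-price input tokens per call
without_cache = [3800, 5600, 7200]
duration_with = [3.2, 2.8, 2.5]       # seconds per call
duration_without = [3.2, 4.1, 4.8]


def pct_less(value, baseline):
    """Percentage by which value is lower than baseline, rounded."""
    return round((1 - value / baseline) * 100)


token_savings = [pct_less(a, b) for a, b in zip(with_cache, without_cache)]
total_savings = pct_less(sum(with_cache), sum(without_cache))
speedups = [pct_less(a, b) for a, b in zip(duration_with, duration_without)]
average_speedup = round(sum(speedups) / len(speedups))

print(token_savings, total_savings)   # per-call and overall token savings
print(speedups, average_speedup)      # per-call and average latency gain
```

The average latency figure includes Call #1, where caching cannot help, which is why it sits well below the best-case value for Call #3.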


## 5. Smart Context Windowing: Chapter Management & Token Optimization

**🎯 Purpose**: Optimize token usage by intelligently managing finalized chapters in the context window while maintaining analysis accuracy.

**Key Concept**: Chapter-Based Rolling Window Strategy
- **Continuous Processing**: Each 20-second chunk is analyzed and mapped to chapters
- **Chapter Finalization**: When chapters are completed, windowing is triggered
- **Intelligent Retention**: Keep only the configured number of finalized chapters (n=1 recommended)
- **Context Cleanup**: Remove old chapters to prevent context overflow

### 📊 Chapter Context Comparison

**Processing Flow with Chunk-Chapter Mapping:**
```
Chunk 0 → Chapter 1 (incomplete) [Chunks: 0]
Chunk 1 → Chapter 1 (finalized) [Chunks: 0,1] + Chapter 2 (incomplete) [Chunks: 1]
Chunk 2 → Chapter 2 (finalized) [Chunks: 1,2] + Chapter 3 (incomplete) [Chunks: 2]
Chunk 3 → Chapter 3 (finalized) [Chunks: 2,3] + Chapter 4 (incomplete) [Chunks: 3]

Windowing Logic:
- When Chapter 1 is either being generated or finalized: No windowing needed as there are no prior chapters available.

- When Chapter 2 is being generated: Keep Chapter 1 (last finalized) chunks + Chapter 2 (current) chunks
- When Chapter 2 is finalized:
    - Find all the non-overlapping chunks in Chapter 1: Chunk 0 (not in Chapter 2)
    - Delete Chunk 0 messages from context

- When Chapter 3 is being generated: Keep Chapter 2 (last finalized) chunks + Chapter 3 (current) chunks
- When Chapter 3 is finalized:
    - Find all the non-overlapping chunks in Chapter 2: Chunk 1 (not in Chapter 3)
    - Delete Chunk 1 messages from context
```
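
The windowing logic above can be sketched as a small pruning function. This is an illustration under stated assumptions rather than the `FusionAnalyzer` implementation: `prune_on_finalize`, the `chapters` mapping, and the `chunk` tag on each message are hypothetical structures chosen to mirror the example.

```python
def prune_on_finalize(context, chapters, finalized_id):
    """Drop messages for chunks that belong only to the chapter before
    the one that was just finalized (keep_n_chapters = 1)."""
    previous_chunks = chapters.get(finalized_id - 1)
    if previous_chunks is None:
        return list(context)  # Chapter 1: no earlier chapter to prune
    stale = set(previous_chunks) - set(chapters[finalized_id])
    return [message for message in context if message["chunk"] not in stale]


# State after Chunk 2 has been analyzed (matches the processing flow above).
chapters = {1: [0, 1], 2: [1, 2], 3: [2]}
context = [
    {"chunk": chunk, "role": role}
    for chunk in (0, 1, 2)
    for role in ("user", "assistant")
]

pruned = prune_on_finalize(context, chapters, finalized_id=2)
print(len(context), "->", len(pruned), "messages")
```

Chunk 1 survives the pruning because it is shared between Chapter 1 and Chapter 2, which keeps the boundary context that the next analysis call may still need.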

### 🔍 Chapter Context with Rolling (keep_n_chapters = 1)

<div style='display: flex; gap: 20px; margin: 20px 0; overflow-x: auto;'>

<div style='background-color: #e8f5e9; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0; border: 2px solid #4caf50;'>
<h4 style='color: #2e7d32; margin-top: 0;'>📝 After Chunk 0</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>Chapter 1:</strong> Incomplete [Chunk 0]</li>
<li><strong>Chapters in Context:</strong> 1</li>
<li><strong>Context Messages:</strong> 2</li>
<li><strong>Context Tokens:</strong> ~1,200</li>
<li><strong>Windowing Action:</strong> None (no finalized chapters)</li>
<li><strong>Memory Status:</strong> Growing</li>
</ul>
</div>

<div style='background-color: #e8f5e9; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0; border: 2px solid #4caf50;'>
<h4 style='color: #2e7d32; margin-top: 0;'>📝 After Chunk 1</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>Chapter 1:</strong> ✅ Finalized [Chunks 0,1]</li>
<li><strong>Chapter 2:</strong> Incomplete [Chunk 1]</li>
<li><strong>Chapters in Context:</strong> 2</li>
<li><strong>Context Messages:</strong> 4</li>
<li><strong>Context Tokens:</strong> ~3,800</li>
<li><strong>Windowing Action:</strong> Keep Chapter 1 (within limit)</li>
<li><strong>Memory Status:</strong> Controlled</li>
</ul>
</div>

<div style='background-color: #e8f5e9; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0; border: 2px solid #4caf50;'>
<h4 style='color: #2e7d32; margin-top: 0;'>📝 After Chunk 2</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>Chapter 1:</strong> 🗑️ Remove non-overlapping Chunk 0</li>
<li><strong>Chapter 2:</strong> ✅ Finalized [Chunks 1,2] (kept)</li>
<li><strong>Chapter 3:</strong> Incomplete [Chunk 2]</li>
<li><strong>Chapters in Context:</strong> 2</li>
<li><strong>Context Messages:</strong> 4 ⚡ Cleanup</li>
<li><strong>Context Tokens:</strong> ~3,200</li>
<li><strong>Windowing Action:</strong> Deleted Chunk 0 (non-overlapping)</li>
<li><strong>Memory Status:</strong> Bounded</li>
</ul>
</div>

</div>

### 📈 Chapter Context without Rolling (keep_n_chapters = None)

<div style='display: flex; gap: 20px; margin: 20px 0; overflow-x: auto;'>

<div style='background-color: #ffebee; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0; border: 2px solid #f44336;'>
<h4 style='color: #d32f2f; margin-top: 0;'>📝 After Chunk 0</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>Chapters in Context:</strong> 1</li>
<li><strong>Chapter 1:</strong> Incomplete (Chunk 0)</li>
<li><strong>Context Messages:</strong> 2</li>
<li><strong>Context Tokens:</strong> ~1,200</li>
<li><strong>Windowing Action:</strong> None</li>
<li><strong>Memory Status:</strong> Growing</li>
</ul>
</div>

<div style='background-color: #ffebee; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0; border: 2px solid #f44336;'>
<h4 style='color: #d32f2f; margin-top: 0;'>📝 After Chunk 1</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>Chapters in Context:</strong> 2</li>
<li><strong>Chapter 1:</strong> ✅ Finalized (kept)</li>
<li><strong>Chapter 2:</strong> Incomplete (Chunk 1)</li>
<li><strong>Context Messages:</strong> 4</li>
<li><strong>Context Tokens:</strong> ~3,800</li>
<li><strong>Windowing Action:</strong> None (keep all)</li>
<li><strong>Memory Status:</strong> Growing</li>
</ul>
</div>

<div style='background-color: #ffebee; border-radius: 8px; padding: 15px; min-width: 300px; flex-shrink: 0; border: 2px solid #f44336;'>
<h4 style='color: #d32f2f; margin-top: 0;'>📝 After Chunk 2</h4>
<ul style='margin: 0; padding-left: 20px;'>
<li><strong>Chapters in Context:</strong> 3</li>
<li><strong>Chapter 1:</strong> ✅ Finalized (kept)</li>
<li><strong>Chapter 2:</strong> ✅ Finalized (kept)</li>
<li><strong>Chapter 3:</strong> Incomplete (Chunk 2)</li>
<li><strong>Context Messages:</strong> 6</li>
<li><strong>Context Tokens:</strong> ~6,400</li>
<li><strong>Windowing Action:</strong> None (unbounded growth)</li>
<li><strong>Memory Status:</strong> Unbounded</li>
</ul>
</div>

</div>

### 💡 Rolling vs No Rolling Comparison

| Metric | After Chunk 0 | After Chunk 1 | After Chunk 2 | Total Growth |
|--------|---------------|---------------|---------------|---------------|
| **With Rolling (Messages)** | 2 | 4 | 4 ⚡ | **Bounded (4 max)** |
| **Without Rolling (Messages)** | 2 | 4 | 6 | **Unbounded (grows linearly)** |
| **With Rolling (Tokens)** | 1,200 | 3,800 | 3,200 ⚡ | **28% cumulative token reduction** |
| **Without Rolling (Tokens)** | 1,200 | 3,800 | 6,400 | **Unlimited growth** |
| **Memory Efficiency** | Same | Same | 50% fewer tokens | **Scalable to unlimited length** |


## 6. Initialize Processing Components

**Set up the core processors** for multi-modal fusion pipeline.

### Processing Components

- **RecordingManager** - captures video stream to MXF format for archival
- **ChunkProcessor** - creates video chunks and filmstrips with shot detection
- **TranscriptionProcessor** - real-time audio transcription with memory integration
- **FusionAnalyzer** - multi-modal analysis using Claude with memory storage

Each processor is optimized for real-time processing and cost efficiency.

In [None]:
# Initialize all processors
recording_manager = RecordingManager(UDP_PORT_RECORDING, OUTPUT_DIR)
transcription_processor = TranscriptionProcessor(UDP_PORT_TRANSCRIPTION, AWS_REGION, SENTENCE_JSON_BUFFER, memory_client=memory_client, memory_id=transcript_mem_id, actor_id=actor_id, session_id=trans_session_id, output_dir=OUTPUT_DIR)
fusion_analyzer = FusionAnalyzer(AWS_REGION, SENTENCE_JSON_BUFFER, CHUNK_ANALYSIS_RESULTS, OUTPUT_DIR, memory_client=memory_client, memory_id=video_analysis_mem_id, actor_id=actor_id, session_id=video_analysis_session_id)

# Initialize chunk processor (FFmpeg only - no monitoring)
chunk_processor = ChunkProcessor(UDP_PORT_PROCESSING, OUTPUT_DIR, CHUNK_DURATION)

# Initialize chunk monitor (separate process for monitoring and processing chunks)
chunk_monitor = ChunkMonitor(
    output_dir=OUTPUT_DIR,
    chunk_duration=CHUNK_DURATION,
    fusion_analyzer=fusion_analyzer,
    check_interval=0.5  # Check for new chunks every 0.5 seconds
)

# Initialize chapters table
fusion_analyzer.initialize_display()

print("✅ All processors initialized with fusion integration!")

## 7. Start Multi-modal Fusion Processing

**Execute the complete multi-modal fusion pipeline** with real-time processing and analysis.

**What Happens:**
1. **Component Monitor**: Displays organized activity log for all processors
2. **Parallel Processing**: Starts recording, chunk processing, and transcription simultaneously
3. **Multi-modal Fusion**: Analyzes visual filmstrips + audio transcripts every 20 seconds
4. **Chapter Detection**: Identifies topic boundaries and creates chapters
5. **Real-time Display**: Updates chapter table every 3 seconds
6. **AgentCore Memory**: Pushes fused understanding to memory in real-time
7. **Cost Optimization**: Uses prompt caching to reduce token costs
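
The parallel-processing step can be pictured with a small `asyncio` sketch in which three stand-in coroutines run concurrently. It does not reproduce the internals of `start_fusion_processing`, which are not shown in this notebook; `run_component` and `run_pipeline` are hypothetical names, and the sleep calls merely simulate work.

```python
import asyncio


async def run_component(name, interval, ticks, log):
    """Stand-in for a long-running processor: records one entry per tick."""
    for tick in range(ticks):
        await asyncio.sleep(interval)  # simulated work
        log.append((name, tick))
    return name


async def run_pipeline():
    """Run the three stand-in components concurrently and wait for all."""
    log = []
    finished = await asyncio.gather(
        run_component("recording", 0.01, 3, log),
        run_component("chunking", 0.01, 3, log),
        run_component("transcription", 0.01, 3, log),
    )
    return finished, log


# In a plain Python script; inside a notebook cell use `await run_pipeline()`.
finished, log = asyncio.run(run_pipeline())
print(finished, len(log))
```

`asyncio.gather` returns results in the order the coroutines were passed, while the interleaved entries in `log` show that the components made progress side by side rather than one after another.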

### Stream Source Video With FFmpeg

In [None]:
import subprocess
import threading
import os
import time

# Locate the sample video relative to this notebook
video_dir = "../sample_videos"
video_file = SOURCE_VIDEO
video_path = os.path.join(video_dir, video_file)

if os.path.exists(video_path):
    print(f"🎬 Starting FFmpeg stream from: {video_path}")
    print("⏳ Waiting 5 seconds before starting stream...")
    time.sleep(5)
    
    # FFmpeg command: copy the video and audio streams and fan them out to the
    # three UDP ports (recording, processing, transcription) via the tee muxer
    tee_outputs = "|".join(
        f"[f=mpegts]udp://127.0.0.1:{port}"
        for port in (UDP_PORT_RECORDING, UDP_PORT_PROCESSING, UDP_PORT_TRANSCRIPTION)
    )
    ffmpeg_cmd = [
        "ffmpeg", "-re", "-i", video_path,
        "-c:v", "copy", "-c:a", "copy", "-f", "tee", 
        "-map", "0:v", "-map", "0:a",
        tee_outputs
    ]
    
    def run_ffmpeg():
        try:
            # Redirect stdout and stderr to DEVNULL to suppress FFmpeg logs
            subprocess.run(
                ffmpeg_cmd, 
                check=True,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL
            )
        except subprocess.CalledProcessError as e:
            print(f"❌ FFmpeg error: {e}")
        except KeyboardInterrupt:
            print("🛑 FFmpeg stopped by user")
    
    # Start FFmpeg in background thread
    ffmpeg_thread = threading.Thread(target=run_ffmpeg, daemon=True)
    ffmpeg_thread.start()
    print("✅ FFmpeg started in background thread")
    print("📺 Stream should now be available on UDP ports 1234, 1235, 1236")
else:
    print(f"❌ Video file not found: {video_path}")
    print("Please ensure the sample video is in the correct location")

<div style='background-color: #ffebee; border: 3px solid #c62828; border-radius: 5px; padding: 15px; margin: 10px 0;'>
<h3 style='color: #c62828; margin-top: 0;'>⚠️ IMPORTANT: Before executing the next cell...</h3>
<p style='margin: 10px 0;'><strong>Duration:</strong> In the cell below, the processing duration is set to <b>5 minutes</b> by default through the <code>duration_minutes</code> parameter.</p>
<p style='margin: 10px 0; color: #2e7d32; font-weight: bold;'>📺 The source video will be processed in real time, and you will see chapters and topics appear in the chapters table as the live content is ingested and analyzed.</p>
<p style='margin: 10px 0; color: #f57c00; font-weight: bold;'>⏳ Note: The chapters table will continuously refresh during processing. Please wait for processing to complete and allow some time for the table to settle before interacting with it to analyze the final output.</p>
</div>

In [None]:
# Set up chapter table display
from IPython.display import display, HTML, clear_output
from ipywidgets import Output

# Create separate output widgets for logs and chapter table
chapter_table_output = Output()
logs_output = Output()

# Display chapter table widget first (refreshable)
display(chapter_table_output)

# Display logs widget second (persistent)
display(logs_output)

# Redirect component monitor logs to the logs widget
with logs_output:
    component_monitor.show_table()

def refresh_chapter_table():
    """Refresh only the chapter table, not the logs"""
    try:
        with chapter_table_output:
            clear_output(wait=True)
            html = fusion_analyzer.get_chapter_table_html()
            display(HTML(html))
    except Exception:
        pass  # Silently ignore display errors so the processing loop keeps running

# Execute the fusion processing using utility function
await start_fusion_processing(
    duration_minutes=5,
    recording_manager=recording_manager,
    chunk_processor=chunk_processor,
    chunk_monitor=chunk_monitor,
    fusion_analyzer=fusion_analyzer,
    transcription_processor=transcription_processor,
    stream_monitor_class=StreamMonitor,
    log_component=log_component,
    refresh_chapter_table=refresh_chapter_table
)

## 8. Cleanup - Stop FFmpeg Processes

**Clean up FFmpeg processes** to prevent conflicts with other modules and free system resources.

<div style='background-color: #fff3e0; border: 2px solid #f57c00; border-radius: 5px; padding: 15px; margin: 10px 0;'>
<h3 style='color: #f57c00; margin-top: 0;'>🧹 Cleanup Recommendation</h3>
<p style='margin: 10px 0;'><strong>Run this cell after processing is complete to ensure clean shutdown.</strong></p>
<p style='margin: 10px 0;'>This will:</p>
<ul style='margin: 5px 0;'>
<li>Kill any running FFmpeg processes</li>
<li>Free up UDP ports (1234, 1235, 1236)</li>
<li>Prevent conflicts with other notebook modules</li>
</ul>
</div>

In [None]:
# Set to False to skip this cleanup step
SKIP_CLEANUP = False

# Use cleanup utility function
cleanup_ffmpeg_processes(skip_cleanup=SKIP_CLEANUP)