# Dynamic Content Understanding: Advanced Video Analysis with Amazon Bedrock

Welcome to the video analysis module of our multimodal data processing journey with Amazon Bedrock Data Automation (BDA). In previous modules, we built a foundation with document analysis, saw beyond text with image analysis, and unlocked the voice of information with audio analysis. Now we're diving into perhaps the most complex and information-rich modality: video.

## Why Video Analysis Matters

Video represents the most information-dense form of content, combining visual elements, motion, audio, and text into a temporal flow. This richness makes video exceptionally valuable but also challenging to process with traditional methods.

Consider that:
- A single minute of video contains approximately 1,800 frames (at 30 fps)
- The average enterprise has thousands of hours of video content that remains largely unsearchable
- Manual video analysis costs $15-25 per minute of processed content
- Only 1-2% of video content is typically leveraged in business intelligence systems

Amazon Bedrock Data Automation transforms this landscape by enabling us to automatically:
- Detect and analyze distinct scenes and shots
- Generate comprehensive video summaries and chapter breakdowns
- Extract text visible within the video frames
- Identify logos and brands
- Apply content moderation across visual and audio components
- Classify content using standardized IAB categories

This capability fundamentally changes how we interact with video content, unlocking insights that were previously trapped in the visual medium.

## Setting Up Our Environment

Let's begin by installing the required libraries and importing dependencies. We'll use the workshop's utility module (`utils.utils`), which provides the `BDAVideoUtils` helper class along with the `show_business_context` and `ensure_bda_results_dir` functions.

In [None]:
# Install required packages
%pip install "boto3>=1.37.4" "matplotlib" "moviepy" --upgrade -qq

# Import necessary libraries
import boto3
import json
import uuid
import time
import os
import matplotlib.pyplot as plt
from datetime import datetime
from IPython.display import Video, clear_output, HTML, display, Markdown
import warnings
warnings.filterwarnings('ignore')

# Import our video utilities from the consolidated utils module
from utils.utils import BDAVideoUtils, show_business_context, ensure_bda_results_dir

# Initialize our utility class
bda_utils = BDAVideoUtils()
print(f"Setup complete. BDA utilities initialized for region: {bda_utils.current_region}")
print(f"Using S3 bucket: {bda_utils.default_bucket}")

# Display business context for video analysis
show_business_context("video_complete")

## 1. Prepare Sample Video

First, we'll download a sample video and upload it to S3 for processing with BDA. We'll use a short video that contains various elements that BDA can analyze, including different scenes, spoken content, and visual elements.

The video will be stored in an S3 bucket that BDA can access. This step follows the same pattern we used for document, image, and audio processing, where we first need to have the content accessible in S3.

In [None]:
# Download sample video using our enhanced utility function
sample_video = 'content-moderation-demo.mp4'
source_url = 'https://ws-assets-prod-iad-r-pdx-f3b3f9f1a7d6a3d0.s3.us-west-2.amazonaws.com/335119c4-e170-43ad-b55c-76fa6bc33719/NetflixMeridian.mp4'

# Download the video; stop here if it fails, since every later step needs the file
try:
    bda_utils.download_video(source_url, sample_video)
    print(f"Successfully downloaded video to {sample_video}")
except Exception as e:
    print(f"Error downloading video: {e}")
    raise

# Display the video in the notebook for preview
display(Video(sample_video, width=800))

# Upload to S3 for BDA processing
s3_key = f'{bda_utils.data_prefix}/{sample_video}'
s3_uri = bda_utils.upload_to_s3(sample_video, s3_key)
print(f"Uploaded video to S3: {s3_uri}")
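
The `bda_utils.upload_to_s3` helper lives in the workshop's utils module and is not shown in this notebook. As a rough sketch of what such a helper might do, assuming it wraps the boto3 S3 client's `upload_file` method and returns an `s3://` URI, consider the following. The function name `upload_file_to_s3` and its signature are illustrative, not the module's actual code.

```python
def upload_file_to_s3(s3_client, local_path, bucket, key):
    """Upload a local file and return its s3:// URI.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    This is a hypothetical stand-in for bda_utils.upload_to_s3.
    """
    # upload_file(Filename, Bucket, Key) switches to multipart upload
    # automatically for large files such as videos
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

Passing the client in as an argument keeps the function easy to exercise with a stub object before pointing it at a real bucket.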

## 2. Define BDA Configuration and Create Project

Now we'll define the standard output configuration for video analysis and create a BDA project. This configuration determines what information BDA will extract from the video.

### Video Processing Capabilities

BDA offers specialized processing options for video content:

In [None]:
# Display business context for video chapters
show_business_context("chapter_detection")

# Define standard output configuration for video processing
standard_output_config = {
    'video': {
        'extraction': {
            'category': {
                'state': 'ENABLED',
                'types': [
                    'CONTENT_MODERATION',  # Detect inappropriate content
                    'TEXT_DETECTION',      # Extract text from the video
                    'TRANSCRIPT',          # Generate transcript of spoken content
                    'LOGOS'                # Identify brand logos
                ]
            },
            'boundingBox': {
                'state': 'ENABLED'         # Include bounding boxes for detected elements
            }
        },
        'generativeField': {
            'state': 'ENABLED',
            'types': [
                'VIDEO_SUMMARY',           # Generate overall video summary
                'CHAPTER_SUMMARY',         # Generate summaries for each chapter
                'IAB'                      # Classify into IAB categories
            ]
        }
    }
}

# Create a BDA project with our standard output configuration
print("Creating BDA project for video analysis...")
response = bda_utils.bda_client.create_data_automation_project(
    projectName=f'bda-workshop-video-project-{str(uuid.uuid4())[0:4]}',
    projectDescription='BDA workshop video sample project',
    projectStage='DEVELOPMENT',
    standardOutputConfiguration=standard_output_config
)

# Get the project ARN
video_project_arn = response.get("projectArn")
print(f"BDA project created with ARN: {video_project_arn}")
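
When experimenting with different feature combinations, it can help to generate the configuration dictionary instead of editing it by hand. The sketch below rebuilds the same structure used above from two lists of type names. The helper name `build_video_output_config` is hypothetical, and the use of `'DISABLED'` as the counterpart to `'ENABLED'` is an assumption to verify against the BDA API reference.

```python
def build_video_output_config(extraction_types, generative_types, bounding_boxes=True):
    """Assemble a standard output configuration for video (illustrative helper)."""
    def state(enabled):
        # Assumption: 'DISABLED' is the accepted counterpart of 'ENABLED'
        return 'ENABLED' if enabled else 'DISABLED'

    return {
        'video': {
            'extraction': {
                'category': {
                    'state': state(bool(extraction_types)),
                    'types': list(extraction_types),
                },
                'boundingBox': {'state': state(bounding_boxes)},
            },
            'generativeField': {
                'state': state(bool(generative_types)),
                'types': list(generative_types),
            },
        }
    }


# Reproduces the configuration defined in the cell above
config = build_video_output_config(
    ['CONTENT_MODERATION', 'TEXT_DETECTION', 'TRANSCRIPT', 'LOGOS'],
    ['VIDEO_SUMMARY', 'CHAPTER_SUMMARY', 'IAB'],
)
```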

## 3. Process Video with BDA

Now we'll use the `invoke_data_automation_async` API to process our video with BDA. As we've seen with document, image, and audio processing, BDA operates asynchronously due to the complexity and processing time required for rich media.

In [None]:
# Invoke BDA to process the video
print(f"Processing video: {s3_uri}")
print(f"Results will be stored at: s3://{bda_utils.default_bucket}/{bda_utils.output_prefix}")

# Call the invoke_data_automation_async API
response = bda_utils.bda_runtime_client.invoke_data_automation_async(
    inputConfiguration={
        's3Uri': s3_uri  # The S3 location of our video
    },
    outputConfiguration={
        's3Uri': f's3://{bda_utils.default_bucket}/{bda_utils.output_prefix}'  # Where to store results
    },
    dataAutomationConfiguration={
        'dataAutomationProjectArn': video_project_arn,  # The project we created
        'stage': 'DEVELOPMENT'                          # Must match the project stage
    },
    dataAutomationProfileArn=f'arn:aws:bedrock:{bda_utils.current_region}:{bda_utils.account_id}:data-automation-profile/us.data-automation-v1'
)

# Get the invocation ARN
invocation_arn = response.get("invocationArn")
print(f"Invocation ARN: {invocation_arn}")

# Wait for processing to complete using our enhanced pattern
# This uses the same flexible pattern we developed for audio processing
status_response = bda_utils.wait_for_completion(
    get_status_function=bda_utils.bda_runtime_client.get_data_automation_status,
    status_kwargs={'invocationArn': invocation_arn},
    completion_states=['Success'],
    error_states=['ClientError', 'ServiceError'],
    status_path_in_response='status',
    max_iterations=20,  # Video might take longer than other modalities
    delay=10
)

# Check if processing was successful
if status_response['status'] == 'Success':
    output_config_uri = status_response.get("outputConfiguration", {}).get("s3Uri")
    print("\nVideo processing completed successfully!")
    print(f"Output configuration: {output_config_uri}")
else:
    print(f"\nVideo processing failed with status: {status_response['status']}")
    # GetDataAutomationStatus reports failures under 'errorMessage'; also accept
    # 'error_message' in case the polling helper renames the key
    error_message = status_response.get('errorMessage') or status_response.get('error_message')
    if error_message:
        print(f"Error message: {error_message}")
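
The polling helper used above is defined in the workshop's utils module, which is not reproduced here. As a minimal sketch of how such a helper might work, the function below mirrors the keyword arguments from the call above. The name `poll_until_done` is hypothetical, and treating `status_path_in_response` as a single top-level key is an assumption.

```python
import time


def poll_until_done(get_status_function, status_kwargs, completion_states,
                    error_states, status_path_in_response='status',
                    max_iterations=20, delay=10):
    """Call a status function repeatedly until it reports a terminal state.

    Returns the last response received, whether or not a terminal state
    was reached, so the caller can inspect it.
    """
    response = {}
    for _ in range(max_iterations):
        response = get_status_function(**status_kwargs)
        # Assumption: the status lives under one top-level key
        status = response.get(status_path_in_response)
        if status in completion_states or status in error_states:
            return response
        time.sleep(delay)
    # Gave up without seeing a terminal state
    return response
```

With `max_iterations=20` and `delay=10`, a loop like this gives up after roughly 200 seconds, so longer videos may need a larger budget.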

## 4. Retrieve and Explore BDA Results

Now that the video has been processed, let's retrieve the results from S3 and explore the insights extracted by BDA.

In [None]:
# Load job metadata
config_data = bda_utils.read_json_from_s3(output_config_uri)

# Get standard output path
standard_output_path = config_data["output_metadata"][0]["segment_metadata"][0]["standard_output_path"]
result_data = bda_utils.read_json_from_s3(standard_output_path)

# Create bda-results directory if it doesn't exist
ensure_bda_results_dir()

# Save the result data to the bda-results directory
with open('../bda-results/video_result.json', 'w') as f:
    json.dump(result_data, f)

print("Saved video results to: ../bda-results/video_result.json")
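
Reading a JSON result from S3 comes down to splitting the `s3://` URI into a bucket and key and then fetching the object. The sketch below shows one way to do that with the boto3 S3 client's `get_object` call. The function names `split_s3_uri` and `load_json_from_s3` are illustrative and are not the implementation of `bda_utils.read_json_from_s3`.

```python
import json
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/some/key.json' into ('bucket', 'some/key.json')."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError(f"Not a valid S3 URI: {s3_uri}")
    return parsed.netloc, parsed.path.lstrip('/')


def load_json_from_s3(s3_client, s3_uri):
    """Fetch an object with a boto3 S3 client and parse it as JSON."""
    bucket, key = split_s3_uri(s3_uri)
    body = s3_client.get_object(Bucket=bucket, Key=key)['Body'].read()
    return json.loads(body)
```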

### Exploring Video Metadata and Summary

Let's first look at the basic metadata and the overall video summary generated by BDA.

In [None]:
# Display video metadata and summary
print("=== Video Metadata ===\n")
metadata = result_data["metadata"]
print(f"Duration: {metadata.get('duration_seconds', 'N/A')} seconds")
print(f"Format: {metadata.get('format', 'N/A')}")
print(f"Resolution: {metadata.get('width_pixels', 'N/A')} x {metadata.get('height_pixels', 'N/A')}")
print(f"Frame Rate: {metadata.get('frame_rate', 'N/A')} fps")

print("\n=== Video Summary ===\n")
if "summary" in result_data["video"]:
    print(result_data["video"]["summary"])
else:
    print("No summary available")

### Retrieving Chapter Information

One of the most powerful capabilities of BDA for video analysis is automatic chapter detection and summarization. Let's retrieve the chapter structure of our video.

In [None]:
# Display detailed chapter information
print("=== Chapter Information ===\n")
for i, chapter in enumerate(result_data["chapters"]):
    start_time = chapter.get("start_timecode_smpte", "N/A")
    end_time = chapter.get("end_timecode_smpte", "N/A")
    print(f"\nChapter {i+1}: [{start_time} - {end_time}]")
    
    if "summary" in chapter:
        print(f"Summary: {chapter['summary']}")
    
    if "iab_categories" in chapter:
        categories = [iab["category"] for iab in chapter["iab_categories"]]
        print(f"IAB Categories: {', '.join(categories)}")
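
To compute chapter durations or seek to a chapter in a player, the SMPTE timecodes need to be converted to seconds. The sketch below assumes the timecodes follow the common `HH:MM:SS:FF` layout (or `HH:MM:SS;FF`, where the semicolon conventionally marks drop-frame timecode) and that the frame rate is known, for example from the `frame_rate` metadata field. Drop-frame compensation is ignored, so results for drop-frame material are approximate. The function name is hypothetical.

```python
import re


def smpte_to_seconds(timecode, fps=30):
    """Convert 'HH:MM:SS:FF' or 'HH:MM:SS;FF' to seconds.

    The frame field is treated as a plain frame count at `fps`;
    drop-frame compensation is deliberately left out of this sketch.
    """
    parts = re.split(r'[:;]', timecode)
    if len(parts) != 4:
        raise ValueError(f"Unexpected timecode format: {timecode}")
    hours, minutes, seconds, frames = (int(p) for p in parts)
    return hours * 3600 + minutes * 60 + seconds + frames / fps


# Duration of a chapter running from 00:00:10:00 to 00:01:30:15 at 30 fps
duration = smpte_to_seconds("00:01:30:15") - smpte_to_seconds("00:00:10:00")
```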

### Analyzing Shot Transitions

BDA also breaks down videos into individual shots, which are continuous segments from a single camera perspective. Let's analyze the shots detected in our video.

In [None]:
# Display information about scene detection
show_business_context("scene_detection")

# Display video shots with enhanced visualization
print("=== Video Shot Analysis ===\n")
print("Generating images for each shot in the video...")
shot_images = bda_utils.generate_shot_images(sample_video, result_data)
bda_utils.plot_shots(shot_images)

### Content Moderation Analysis

BDA can detect potentially sensitive or inappropriate content in videos. Let's examine the content moderation results for our video.

In [None]:
# Show business context for content moderation
show_business_context("content_moderation")

# Display content moderation results with enhanced visualization
print("=== Content Moderation Analysis ===\n")
print("Displaying visual content moderation results for the first chapter:")
bda_utils.plot_content_moderation(sample_video, result_data, 0)

### Text Detection in Video

BDA can detect and extract text that appears in video frames. This is useful for capturing information displayed on screen, such as titles, captions, or other textual content.

In [None]:
# Show business context for text detection
show_business_context("video_text_detection")

# Function to extract text lines from a frame
def extract_text_lines(frame):
    text_lines = []
    
    # Check all possible locations where text might be stored
    if "features" in frame and "text_lines" in frame["features"]:
        text_lines = frame["features"]["text_lines"]
    elif "text_detection" in frame:
        text_lines = frame["text_detection"]
    elif "text_lines" in frame:
        text_lines = frame["text_lines"]
    
    return text_lines

# Display detected text lines
print("=== Detected Text in Video Frames ===\n")
text_lines_found = False

# Check for text in the frames
for chapter in result_data["chapters"]:
    for frame in chapter.get("frames", []):
        text_lines = extract_text_lines(frame)
        
        if text_lines:
            text_lines_found = True
            frame_time = frame["timestamp_millis"] / 1000
            print(f"\nText detected at {frame_time:.2f}s:")
            
            for text_line in text_lines:
                confidence = text_line.get("confidence", "N/A")
                detected_text = text_line.get("text", "")
                print(f"- \"{detected_text}\" (Confidence: {confidence})")

if not text_lines_found:
    print("No text detected in the video frames.")
else:
    print("\n🔤 Technical Win: BDA automatically extracted text from video frames!")
    print("This makes previously unsearchable text content in videos discoverable and analyzable.")
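
To make the detected text searchable, one option is to collapse the per-frame detections into an index from each text string to the times at which it appears. The sketch below assumes text lines sit under a `text_lines` key directly on each frame (one of the locations checked above) and that `confidence` is a number between 0 and 1. The function name `build_text_index` is hypothetical.

```python
from collections import defaultdict


def build_text_index(chapters, min_confidence=0.0):
    """Map each detected text string to the sorted timestamps (seconds) where it appears."""
    index = defaultdict(set)
    for chapter in chapters:
        for frame in chapter.get("frames", []):
            seconds = frame["timestamp_millis"] / 1000
            for line in frame.get("text_lines", []):
                confidence = line.get("confidence")
                # Skip low-confidence lines; lines without a numeric confidence are kept
                if isinstance(confidence, (int, float)) and confidence < min_confidence:
                    continue
                text = line.get("text", "").strip()
                if text:
                    index[text].add(seconds)
    return {text: sorted(times) for text, times in index.items()}
```

A call such as `build_text_index(result_data["chapters"], min_confidence=0.8)` would then return a dictionary that can be queried by exact string or fed into a search engine.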

### Logo Detection

BDA can identify logos and brand marks that appear in videos. This is valuable for brand monitoring, competitive analysis, and content monetization.

In [None]:
# Show business context for logo detection
show_business_context("logo_detection")

# Function to extract logos from a frame
def extract_logos(frame):
    logos = []
    
    # Check all possible locations where logos might be stored
    if "features" in frame and "logos" in frame["features"]:
        logos = frame["features"]["logos"]
    elif "logos" in frame:
        if isinstance(frame["logos"], list):
            logos = frame["logos"]
    
    return logos

# Display detected logos
print("=== Detected Logos ===\n")
logos_found = False

# Check for logos in the frames
for chapter in result_data["chapters"]:
    for frame in chapter.get("frames", []):
        logos = extract_logos(frame)
        
        if logos:
            logos_found = True
            frame_time = frame["timestamp_millis"] / 1000
            print(f"\nLogos detected at {frame_time:.2f}s:")
            
            for logo in logos:
                confidence = logo.get("confidence", "N/A")
                logo_name = logo.get("name", "Unknown logo")
                print(f"- \"{logo_name}\" (Confidence: {confidence})")

if not logos_found:
    print("No logos detected in the video.")
else:
    print("\n🏢 Technical Win: BDA automatically identified brand logos in the video!")
    print("This enables brand monitoring, competitive analysis, and content monetization opportunities.")
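
For brand monitoring, a per-logo summary is often more useful than a frame-by-frame listing. The sketch below counts how many sampled frames each logo appears in and records when it was first and last seen. It assumes logos are stored under a `logos` key directly on each frame and uses the same `name` and `timestamp_millis` fields as the cell above. The function name `summarize_logos` is hypothetical.

```python
def summarize_logos(chapters):
    """Return {logo_name: {"frames": n, "first_seen": s, "last_seen": s}}."""
    summary = {}
    for chapter in chapters:
        for frame in chapter.get("frames", []):
            seconds = frame["timestamp_millis"] / 1000
            for logo in frame.get("logos", []):
                name = logo.get("name", "Unknown logo")
                entry = summary.setdefault(
                    name, {"frames": 0, "first_seen": seconds, "last_seen": seconds}
                )
                entry["frames"] += 1
                entry["first_seen"] = min(entry["first_seen"], seconds)
                entry["last_seen"] = max(entry["last_seen"], seconds)
    return summary
```

The frame count reflects only the frames present in the output, so it indicates relative exposure, not exact screen time.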

### IAB Category Analysis

BDA can classify video content according to the Interactive Advertising Bureau (IAB) content taxonomy. This provides standardized categorization for content discovery, ad targeting, and organization.

In [None]:
# Show business context for IAB categorization
show_business_context("iab_categorization")

# Visualize IAB categories
bda_utils.visualize_iab_categories(result_data)
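
The visualization helper is part of the workshop's utils module and is not shown here. As a sketch of the kind of aggregation that could feed such a chart, the function below counts how many chapters each IAB category appears in, using the `iab_categories` and `category` fields from the chapter listing earlier. The function name `count_iab_categories` is hypothetical.

```python
from collections import Counter


def count_iab_categories(chapters):
    """Return (category, chapter_count) pairs, most frequent first."""
    counts = Counter()
    for chapter in chapters:
        # A set ensures each category is counted at most once per chapter
        names = {iab["category"] for iab in chapter.get("iab_categories", [])}
        counts.update(names)
    return counts.most_common()
```

The resulting pairs can be passed straight to a bar chart, for example with `plt.barh` after unzipping the labels and counts.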

## Connecting the Dots: From Documents to Video

Throughout this workshop series, we've explored how Amazon Bedrock Data Automation can extract structured insights from different content modalities:

1. **Document Analysis**: We began with document extraction, learning how to transform static PDFs into structured data.

2. **Image Analysis**: We moved beyond text to extract visual insights from images, detecting objects, text, and concepts.

3. **Audio Analysis**: We unlocked the voice of information by processing spoken content and identifying speakers.

4. **Video Analysis**: Now, we've seen how BDA can process video, the most complex modality, which combines visual, audio, and temporal elements into a rich information stream.

The power of BDA comes from its ability to handle these diverse modalities through a consistent API pattern:

```python
# Create a project with appropriate configuration
project_response = bda_client.create_data_automation_project(...)

# Process content asynchronously
invocation_response = bda_runtime_client.invoke_data_automation_async(...)

# Wait for completion using our flexible pattern
status_response = wait_for_completion(...)

# Analyze results
result_data = read_json_from_s3(output_path)
```

This consistent approach allows you to build applications that can extract insights from any content type, opening up new possibilities for content understanding, search, and analysis.

## Looking Forward: From Understanding to Intelligence

In the next module, we'll take the final step in our journey by combining all these modalities into a unified multimodal RAG (Retrieval-Augmented Generation) system. We'll see how the structured data extracted by BDA from documents, images, audio, and video can be integrated into a knowledge base for intelligent query answering.

You'll learn how to:
- Create a multimodal knowledge base that incorporates insights from all content types
- Build intelligent query capabilities that can reference cross-modal content
- Design applications that deliver comprehensive answers by synthesizing information from various sources
- Create truly intelligent systems that understand not just individual modalities, but their relationships and contexts

This final step will complete our journey from raw, unstructured data to actionable intelligence across the full spectrum of content types.