# Prerequisites

## Workshop overview

Welcome to the Video Understanding on AWS workshop!

The workshop is organized into two main parts: (1) Media Analysis using Bedrock Data Automation and (2) Media Analysis using Amazon Nova. Read on for an overview of each section. After you run this notebook, Part 1 and Part 2 can be run independently.

**Prerequisites**

Before running the main workshop, you'll set up the notebook environment using this notebook.

**Part 1: Media Analysis using Bedrock Data Automation (BDA):**

The notebooks in this section give an overview of BDA APIs and use cases.  They can be run in any order.  

1. [Extract and analyze a movie with BDA](1-media-analysis-using-bda/01-extract-analyze-a-movie.ipynb)
2. [Contextual Ad overlay](1-media-analysis-using-bda/02-contextual-ad-overlay.ipynb)

**Part 2: Media Analysis using Amazon Nova:**

In the foundation notebooks, you'll set up the notebook environment, prepare the sample video by breaking it down into clips, and experiment with using foundation models to generate insights about video clips. In the use case notebooks, you will build on the foundations to solve different video understanding use cases. The use cases are independent and can be run in any order.

**Foundation (required before running use cases)**
1. [Visual video segments: frames, shots and scenes](2-media-analysis-using-amazon-nova/01A-visual-segments-frames-shots-scenes.ipynb) (20 minutes)
2. [Audio segments](2-media-analysis-using-amazon-nova/01B-audio-segments.ipynb) (10 minutes)

**Use cases (optional, run in any order):**

After running the Foundation notebooks, you can choose any use case. If you are attending an AWS Workshop event, you should be able to complete the foundations plus one use case in a 2-hour session:

* [Ad break detection and contextual Ad targeting](2-media-analysis-using-amazon-nova/02-ad-breaks-and-contextual-ad-targeting.ipynb) (20 minutes) - identify opportunities for ad insertion.  Use a standard taxonomy to match video content to ad content.
* [Video summarization](2-media-analysis-using-amazon-nova/03-video-summarization.ipynb) (20 minutes) - generate short form videos from a longer video
* [Semantic video search](2-media-analysis-using-amazon-nova/04-semantic-video-search.ipynb) (20 minutes) - search video using images and natural language to find relevant clips

**Resources**

The activities in this workshop are based on AWS Solution Guidance.  The [Additional Resources](./09-resources.ipynb) lab contains links to relevant reference architectures, code samples and blog posts.

# Install ffmpeg and Python packages

- ffmpeg for video and image processing
- faiss for vector store
- webvtt-py for parsing subtitle files
- termcolor for formatting output

In [None]:
## install ffmpeg (Linux/SageMaker only)
# On macOS, install with: brew install ffmpeg
import platform
if platform.system() == 'Linux':
    !sudo apt update -y && sudo apt-get -y install ffmpeg
else:
    print(f"Skipping apt install on {platform.system()}. Install ffmpeg manually if needed.")

In [None]:
## Check if ffmpeg is installed
import shutil
if shutil.which('ffmpeg'):
    print(f"✓ ffmpeg found at: {shutil.which('ffmpeg')}")
else:
    print("✗ ffmpeg not found. Please install it:")
    print("  macOS: brew install ffmpeg")
    print("  Linux: sudo apt-get install ffmpeg")

In [None]:
%pip install -r requirements.txt

## Verify Package Imports

Verify that all required libraries can be imported successfully.

In [None]:
import sys
from importlib import import_module

# List of required packages to verify
required_packages = [
    ('boto3', 'AWS SDK'),
    ('websockets', 'Networking'),
    ('numpy', 'Data Processing'),
    ('pandas', 'Data Processing'),
    ('jupyter', 'Notebook Support'),
    ('IPython', 'Notebook Support'),
    ('ipywidgets', 'Notebook Widgets'),
    ('cv2', 'Video Processing (OpenCV)'),
    ('PIL', 'Image Processing (Pillow)'),
    ('dotenv', 'Environment Variables'),
    ('requests', 'HTTP Requests'),
]

print("Verifying core packages...")
failed_imports = []
for package, description in required_packages:
    try:
        mod = import_module(package)
        version = getattr(mod, '__version__', 'N/A')
        print(f"✓ {package:20s} (v{version})")
    except ImportError as e:
        failed_imports.append((package, description))
        print(f"✗ {package:20s} FAILED")

# Verify AWS Bedrock Agent packages
print("\nVerifying Bedrock Agent packages...")
bedrock_packages = ['strands', 'strands_tools', 'bedrock_agentcore']
bedrock_failed = []

for package in bedrock_packages:
    try:
        mod = import_module(package)
        version = getattr(mod, '__version__', 'N/A')
        print(f"✓ {package:30s} (v{version})")
    except ImportError as e:
        bedrock_failed.append(package)
        print(f"✗ {package:30s} FAILED")

# Final summary
total_packages = len(required_packages) + len(bedrock_packages)
total_failed = len(failed_imports) + len(bedrock_failed)

if total_failed == 0:
    print(f"\n✅ All {total_packages} packages imported successfully!")
else:
    print(f"\n⚠️ {total_failed} package(s) failed to import")
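
Some modules do not expose a `__version__` attribute, in which case the check above prints `vN/A`. As a hedged alternative, the installed version can usually be read from package metadata with the standard-library `importlib.metadata`. The helper name `installed_version` is hypothetical, and the distribution names in the loop are assumptions: pip distribution names can differ from import names (for example, `cv2` is typically installed as `opencv-python` and `dotenv` as `python-dotenv`).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not part of the workshop code.
    """
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Distribution names (as used by pip), not import names. These are assumptions.
for name in ["boto3", "opencv-python", "python-dotenv"]:
    print(f"{name:20s} {installed_version(name) or 'not installed'}")
```

This reads metadata only, so it does not import the package and will not surface import-time errors; it complements the import check rather than replacing it.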

## Get SageMaker default resources

In [None]:
import boto3

sagemaker_resources = {}

# Try to get SageMaker execution role, fallback to boto3 session if not in SageMaker
try:
    import sagemaker
    sagemaker_resources["role"] = sagemaker.get_execution_role()
    sagemaker_resources["region"] = sagemaker.Session().boto_region_name
    print("Running in SageMaker environment")
except Exception:
    # Not in SageMaker, use boto3 session
    session = boto3.Session()
    sagemaker_resources["role"] = None  # Not needed outside SageMaker
    sagemaker_resources["region"] = session.region_name
    print("Running in local/non-SageMaker environment")

print(sagemaker_resources)

# Setup session AWS resources

The cell below loads AWS resources from the CloudFormation stack outputs. This works for both:
- AWS hosted events (using the full `workshop.yaml` stack)
- Your own AWS account (using the minimal `workshop-customer.yaml` stack)

Both stacks use the same stack name (`workshop`) and provide the same outputs, so this notebook works in either environment.

### Deployment Options

**Automatic deployment from notebook (recommended)**

Run the code cell below. If the stack doesn't exist, it will automatically deploy `workshop-customer.yaml` with your current user/role ARN for OpenSearch access.

**Note:** The `UserOrRoleArn` parameter is optional but recommended if you're running outside of SageMaker or need OpenSearch access from a specific IAM identity.

## Get CloudFormation stack outputs

In [None]:
import boto3
from IPython.display import JSON
from botocore.exceptions import ClientError

cf = boto3.client(service_name="cloudformation")
sts = boto3.client(service_name="sts")

try:
    stack = cf.describe_stacks(StackName='workshop')
    print("✓ Stack found")
except ClientError:
    print("Stack not found. Deploying workshop-customer.yaml...")
    
    # Get current user/role ARN for OpenSearch access
    try:
        caller_identity = sts.get_caller_identity()
        current_arn = caller_identity['Arn']
        print(f"  Detected identity: {current_arn}")
    except Exception as e:
        print(f"  Warning: Could not detect current ARN: {e}")
        current_arn = ""
    
    with open('workshop-customer.yaml', 'r') as f:
        template_body = f.read()
    
    # Create stack with current user ARN for OpenSearch access
    stack_params = {
        'StackName': 'workshop',
        'TemplateBody': template_body,
        'Capabilities': ['CAPABILITY_NAMED_IAM']
    }
    
    if current_arn:
        stack_params['Parameters'] = [
            {'ParameterKey': 'UserOrRoleArn', 'ParameterValue': current_arn}
        ]
    
    cf.create_stack(**stack_params)
    
    print("  Waiting for stack creation (5-10 minutes)...")
    cf.get_waiter('stack_create_complete').wait(StackName='workshop')
    
    stack = cf.describe_stacks(StackName='workshop')
    print("✓ Stack deployed successfully")
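
The cell above treats any `ClientError` from `describe_stacks` as "stack not found", which also covers unrelated failures such as missing permissions. A minimal sketch of a narrower check follows, assuming CloudFormation reports a missing stack as a `ValidationError` whose message contains "does not exist". The helper name `is_missing_stack_error` is hypothetical, and the sample responses are made-up values for illustration.

```python
def is_missing_stack_error(error_response: dict) -> bool:
    """Return True if a botocore ClientError response looks like 'stack does not exist'.

    Hypothetical helper. Pass in `e.response` from a caught ClientError.
    """
    error = error_response.get("Error", {})
    return (
        error.get("Code") == "ValidationError"
        and "does not exist" in error.get("Message", "")
    )


# Made-up example responses for illustration.
missing = {"Error": {"Code": "ValidationError",
                     "Message": "Stack with id workshop does not exist"}}
denied = {"Error": {"Code": "AccessDenied",
                    "Message": "User is not authorized to perform this action"}}

print(is_missing_stack_error(missing))
print(is_missing_stack_error(denied))
```

In the `except ClientError as e:` branch, one option is to re-raise when `is_missing_stack_error(e.response)` is false, so that the notebook only attempts a deployment when the stack is really absent.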

In [None]:
JSON(stack)

In [None]:
session = {}
session['bucket'] = next(item["OutputValue"] for item in stack['Stacks'][0]['Outputs'] if item["OutputKey"] == "S3BucketName")
session['MediaConvertRole'] = next(item["OutputValue"] for item in stack['Stacks'][0]['Outputs'] if item["OutputKey"] == "MediaConvertRole")
session["AOSSCollectionEndpoint"] = next(item["OutputValue"] for item in stack['Stacks'][0]['Outputs'] if item["OutputKey"] == "AOSSCollectionEndpoint")

print("\nWorkshop resources loaded:")
print(f"  S3 Bucket: {session['bucket']}")
print(f"  MediaConvert Role: {session['MediaConvertRole']}")
print(f"  OpenSearch Collection: {session['AOSSCollectionEndpoint']}")
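
The output lookups above repeat the same `next(...)` expression for each key. As a sketch of an alternative, the `Outputs` list of a `describe_stacks` response can be flattened into a dictionary once and then read by key. The helper name `stack_outputs_to_dict` is hypothetical and the example response contains made-up values.

```python
def stack_outputs_to_dict(describe_stacks_response: dict) -> dict:
    """Map OutputKey -> OutputValue for the first stack in a describe_stacks response.

    Hypothetical helper. Returns an empty dict if there are no stacks or no outputs.
    """
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        return {}
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stacks[0].get("Outputs", [])
    }


# Made-up response for illustration; real values come from cf.describe_stacks(...).
example_response = {
    "Stacks": [{
        "Outputs": [
            {"OutputKey": "S3BucketName", "OutputValue": "example-bucket"},
            {"OutputKey": "MediaConvertRole", "OutputValue": "arn:aws:iam::111122223333:role/example"},
        ]
    }]
}

outputs = stack_outputs_to_dict(example_response)
print(outputs["S3BucketName"])
print(outputs.get("AOSSCollectionEndpoint"))  # None when the key is missing
```

Using `.get` returns `None` for a missing output, whereas `next(...)` without a default raises `StopIteration`; which behavior is preferable depends on whether a missing output should stop the notebook.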

## Configure Bedrock Models

Configure the Amazon Bedrock foundation model IDs for visual, audio, and audiovisual understanding.

### Subscribe to Claude Models

In [None]:
import json 

def subscribeTo3PBedrock(modelId):
    bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')
    bedrock.invoke_model(
        modelId=modelId,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 200,
            "messages": [{"role": "user", "content": "ping"}]
        })
    )

modelsUsed = ['global.anthropic.claude-sonnet-4-20250514-v1:0']

for model in modelsUsed: 
    subscribeTo3PBedrock(model)

In [None]:
# Model Configuration
print("Configuring Bedrock Model IDs...")

# Visual Understanding Model (for image analysis)
VISUAL_MODEL_ID = "global.anthropic.claude-sonnet-4-20250514-v1:0"

# Audio Understanding Model (for audio transcription analysis)
AUDIO_MODEL_ID = "global.anthropic.claude-sonnet-4-20250514-v1:0"

# Audiovisual Understanding Model (for multimodal fusion)
AUDIOVISUAL_MODEL_ID = "global.anthropic.claude-sonnet-4-20250514-v1:0"

AWS_REGION = sagemaker_resources["region"]

print(f"✅ Visual Model: {VISUAL_MODEL_ID}")
print(f"✅ Audio Model: {AUDIO_MODEL_ID}")
print(f"✅ Audiovisual Model: {AUDIOVISUAL_MODEL_ID}")
print(f"✅ Region: {AWS_REGION}")

# Store variables for use in other notebooks
%store VISUAL_MODEL_ID
%store AUDIO_MODEL_ID
%store AUDIOVISUAL_MODEL_ID
%store AWS_REGION

print("\n✅ Model configuration saved!")
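
To illustrate how a visual model ID such as `VISUAL_MODEL_ID` might be used later, here is a minimal sketch that builds a request body in the same Anthropic Messages format as the `ping` request above, with an image attached. The helper name `build_image_request_body`, the default `max_tokens`, and the JPEG media type are assumptions for illustration; the block only builds the JSON body and does not call Bedrock.

```python
import base64
import json


def build_image_request_body(image_bytes: bytes, prompt: str,
                             media_type: str = "image/jpeg",
                             max_tokens: int = 200) -> str:
    """Build a JSON body with one image and one text prompt (hypothetical helper)."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{
            "role": "user",
            "content": [
                # Images are sent inline as base64-encoded data
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode("ascii")}},
                {"type": "text", "text": prompt},
            ],
        }],
    })


# Placeholder bytes for illustration; a real call would read a video frame from disk.
body = build_image_request_body(b"abc", "Describe this frame.")
print(json.loads(body)["messages"][0]["content"][1]["text"])
```

The resulting string could be passed as the `body` argument of `invoke_model`, alongside `modelId=VISUAL_MODEL_ID`.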

## Configure Bedrock AgentCore Memory and Knowledge Base

Deploy the live-vu-lab CloudFormation stack to create AgentCore Memory and Knowledge Base resources.

In [None]:
import boto3
from botocore.exceptions import ClientError

print("Configuring Bedrock AgentCore Memory and Knowledge Base...")

cfn_client = boto3.client('cloudformation', region_name=AWS_REGION)
stack_name = 'ws-intelligent-mw-agentic-ai-lab'

try:
    # Check if stack exists
    stack = cfn_client.describe_stacks(StackName=stack_name)
    print(f"✓ Stack '{stack_name}' found")
except ClientError:
    print(f"Stack '{stack_name}' not found. Deploying live-vu-lab.yaml...")
    
    # Read the CloudFormation template
    with open('live-vu-lab.yaml', 'r') as f:
        template_body = f.read()
    
    # Create stack
    cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=['CAPABILITY_NAMED_IAM']
    )
    
    print("  Waiting for stack creation (5-10 minutes)...")
    cfn_client.get_waiter('stack_create_complete').wait(StackName=stack_name)
    
    stack = cfn_client.describe_stacks(StackName=stack_name)
    print("✓ Stack deployed successfully")

# Extract outputs from stack
if stack['Stacks']:
    outputs = stack['Stacks'][0].get('Outputs', [])
    
    # Initialize variables (None means the stack did not provide that output)
    video_analysis_mem_id = None
    transcript_mem_id = None
    chapter_mem_id = None
    kb_id = None
    ds_id = None
    data_bucket_name = None
    
    # Extract all outputs
    for output in outputs:
        if output['OutputKey'] == 'VideoAnalysisMemoryId':
            video_analysis_mem_id = output['OutputValue']
        elif output['OutputKey'] == 'LiveTranscriptionMemoryId':
            transcript_mem_id = output['OutputValue']
        elif output['OutputKey'] == 'ChapterMemoryId':
            chapter_mem_id = output['OutputValue']
        elif output['OutputKey'] == 'KnowledgeBaseId':
            kb_id = output['OutputValue']
        elif output['OutputKey'] == 'DataSourceID':
            ds_id = output['OutputValue']
        elif output['OutputKey'] == 'DataBucketName':
            data_bucket_name = output['OutputValue']
    
    # Session IDs and Actor ID
    video_analysis_session_id = "video-analysis"
    trans_session_id = "transcripts"
    chapter_session_id = "chapters"
    actor_id = "lvu"
    
    # Display results
    print(f"\n✅ Video Analysis Memory: {video_analysis_mem_id}")
    print(f"✅ Transcript Memory: {transcript_mem_id}")
    print(f"✅ Chapter Memory ID: {chapter_mem_id}")
    print(f"✅ Knowledge Base: {kb_id}")
    print(f"✅ Data Source: {ds_id}")
    print(f"✅ Data Bucket: {data_bucket_name}")
    
    # Store variables for use in other notebooks
    %store video_analysis_mem_id
    %store video_analysis_session_id
    %store transcript_mem_id
    %store trans_session_id
    %store chapter_mem_id
    %store chapter_session_id
    %store actor_id
    %store kb_id
    %store ds_id
    %store data_bucket_name
    
    print("\n✅ Configuration saved!")
else:
    print("⚠️ Could not retrieve stack outputs")

## Download & Process Sample Video Content

This section downloads the Netflix Open Content Meridian video file and creates a 2-minute sample clip for use in the workshop notebooks.

**Video Content:**
- **Full Video**: Netflix Open Content Meridian (complete film)
- **Sample Clip**: First 2 minutes for quick testing and demonstrations
- **Storage Location**: `4-live-media-analysis-agent/sample_videos/` directory, relative to this notebook

**Files Created:**
- `4-live-media-analysis-agent/sample_videos/Netflix_Open_Content_Meridian.mp4` - Full video file
- `4-live-media-analysis-agent/sample_videos/netflix-2mins.mp4` - 2-minute sample clip

The sample clip is particularly useful for rapid testing and development without processing the entire film.

In [None]:
import os
import requests
import subprocess
from pathlib import Path

print("Setting up sample video content...")

# Setup paths
sample_videos_dir = Path("4-live-media-analysis-agent/sample_videos")
# parents=True also creates the parent directory if it does not exist yet
sample_videos_dir.mkdir(parents=True, exist_ok=True)
full_video_path = sample_videos_dir / "Netflix_Open_Content_Meridian.mp4"
clip_video_path = sample_videos_dir / "netflix-2mins.mp4"
meridian_url = "https://ws-assets-prod-iad-r-pdx-f3b3f9f1a7d6a3d0.s3.us-west-2.amazonaws.com/7db2455e-0fa6-4f6d-9973-84daccd6421f/Netflix_Open_Content_Meridian.mp4"

# Download full video if needed
if not full_video_path.exists():
  print(f"Downloading {full_video_path.name}...")
  try:
      # Time out (in seconds) if the server stops responding
      response = requests.get(meridian_url, stream=True, timeout=60)
      response.raise_for_status()
      with open(full_video_path, 'wb') as f:
          for chunk in response.iter_content(chunk_size=8192):
              if chunk:
                  f.write(chunk)
      print(f"✓ Downloaded ({full_video_path.stat().st_size:,} bytes)")
  except Exception as e:
      # Remove any partial file so that a rerun retries the download
      full_video_path.unlink(missing_ok=True)
      print(f"✗ Download failed: {e}")
else:
  print(f"✓ Full video exists ({full_video_path.stat().st_size:,} bytes)")

# Create 2-minute clip if needed
if full_video_path.exists() and not clip_video_path.exists():
  print("Creating 2-minute clip...")
  try:
      # -t 120 keeps the first 120 seconds; -c copy copies streams without re-encoding
      cmd = ['ffmpeg', '-i', str(full_video_path), '-t', '120', '-c', 'copy',
             '-avoid_negative_ts', 'make_zero', str(clip_video_path)]
      result = subprocess.run(cmd, capture_output=True, text=True)
      if result.returncode == 0:
          print(f"✓ Clip created ({clip_video_path.stat().st_size:,} bytes)")
      else:
          # ffmpeg ran but returned an error; show the last lines of its output
          print("✗ Clip creation failed:")
          print("\n".join(result.stderr.strip().splitlines()[-5:]))
  except FileNotFoundError:
      print("✗ ffmpeg not found. Install with: brew install ffmpeg (macOS) or apt install ffmpeg (Ubuntu)")
  except Exception as e:
      print(f"✗ Error: {e}")
elif clip_video_path.exists():
  print(f"✓ 2-minute clip exists ({clip_video_path.stat().st_size:,} bytes)")

# Summary
print(f"\nStatus: Full video {'✓' if full_video_path.exists() else '✗'} | 2-min clip {'✓' if clip_video_path.exists() else '✗'}")
if full_video_path.exists() and clip_video_path.exists():
  print("Ready to proceed with video analysis!")
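
The clip above is produced with a fixed 120-second duration. As a minimal sketch, the same ffmpeg arguments can be generated by a small function so that the duration is a parameter. The helper name `build_clip_command` and the example paths are hypothetical; the ffmpeg options are the ones already used in the cell above.

```python
from pathlib import Path
from typing import List, Union


def build_clip_command(source: Union[str, Path], destination: Union[str, Path],
                       seconds: int = 120) -> List[str]:
    """Return an ffmpeg argument list that copies the first `seconds` of `source`.

    Hypothetical helper. Uses stream copy (-c copy), so nothing is re-encoded.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return [
        "ffmpeg",
        "-i", str(source),                  # input file
        "-t", str(seconds),                 # output duration in seconds
        "-c", "copy",                       # copy audio/video streams as-is
        "-avoid_negative_ts", "make_zero",  # shift timestamps to start at zero
        str(destination),
    ]


# Example paths are placeholders for illustration.
cmd = build_clip_command("input.mp4", "clip-30s.mp4", seconds=30)
print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)`, as in the cell above.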

# Save variables we will use in other notebooks

The next labs need the resources collected above. Store them with `%store` so that subsequent notebooks can load them with `%store -r`.

In [None]:
%store sagemaker_resources
%store session

# Find Amazon Q Developer

Jupyter notebooks in SageMaker Studio have Amazon Q Developer enabled.  

1. To use Q Developer click on the Q Developer chat icon in the left sidebar menu. The active side panel should now be Amazon Q Developer.
<br></br>
<img src="static/images/00-qdev-sidebar1.png" alt="Q Developer Sidebar" style="width: 600px;"/>
<br></br>
2. Try it out by asking a question.  For example, you could ask: `What kinds of questions can Q developer answer? Be brief.` You should get a response like this:
<br></br>
<img src="static/images/00-qdev-skills1.png" alt="Q Developer Skills" style="width: 600px;"/>
<br></br>

Throughout this workshop, you can use Q when you encounter errors or have questions about the code.  