# Building Agentic Video RAG with Strands Agents and Aurora PostgreSQL - Local Infrastructure

Build a video content analysis system using [Amazon Bedrock](https://aws.amazon.com/bedrock/?trk=4f1e9f0e-7b21-4369-8925-61f67341d27c&sc_channel=el) with the [Amazon Titan Multimodal Embeddings G1 model](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-multiemb-models.html) for embeddings, [Amazon Transcribe](https://aws.amazon.com/transcribe/?trk=4f1e9f0e-7b21-4369-8925-61f67341d27c&sc_channel=el) for speech-to-text conversion, and [Amazon Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html?trk=4f1e9f0e-7b21-4369-8925-61f67341d27c&sc_channel=el) with the pgvector extension for vector storage and similarity search.

![Diagram](data/video-embedding.png)

This notebook integrates with [Strands Agents](https://strandsagents.com/?trk=4f1e9f0e-7b21-4369-8925-61f67341d27c&sc_channel=el) to create intelligent agents with custom Python tools. You can build **two distinct agent types**:

## 🤖 Agent Architecture

### 1. **Video Analysis Agent** 
> **Prerequisites**: ⚠️⚠️⚠️⚠️ Create the Amazon Aurora PostgreSQL cluster with this [AWS CDK stack](https://github.com/build-on-aws/langchain-embeddings/tree/main/create-aurora-pgvector). Follow the steps in [05_create_audio_video_embeddings.ipynb](/05_create_audio_video_embeddings.ipynb) ⚠️⚠️⚠️⚠️

- **Purpose**: Processes and searches video content globally
- **Capabilities**: Analyzes visual frames, transcribed audio, technical content
- **Tools**: `video_embedding_local` for multimodal video search
- **Use Case**: Technical content analysis, finding specific moments in videos

![Diagram](data/agent_videoembedding_local.png)




### 2. **Memory-Enhanced Agent**
> **Prerequisites**: ⚠️⚠️⚠️⚠️ Create the Amazon Aurora PostgreSQL cluster with this [AWS CDK stack](https://github.com/build-on-aws/langchain-embeddings/tree/main/create-aurora-pgvector). Follow the steps in [05_create_audio_video_embeddings.ipynb](/05_create_audio_video_embeddings.ipynb) and create an [Amazon S3 vector bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-buckets-create.html) that will serve as the backend for your vector memory. ⚠️⚠️⚠️⚠️

- **Purpose**: Provides personalized, context-aware video analysis
- **Capabilities**: Remembers user preferences, learns from interactions, provides tailored responses
- **Tools**: `video_embedding_local` + `s3_vector_memory` for persistent user context
- **Use Case**: Personalized learning experiences, adaptive content recommendations

![Diagram](data/agent_videoembedding_local_memory.png)



In [None]:
import os
import boto3
from datetime import datetime

_region_name = "us-east-1"
ssm = boto3.client(service_name="ssm", region_name=_region_name)

def get_ssm_parameter(name):
    response = ssm.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]

os.environ['AURORA_CLUSTER_ARN'] = get_ssm_parameter("/videopgvector/cluster_arn")
os.environ['AURORA_SECRET_ARN'] = get_ssm_parameter("/videopgvector/secret_arn")
os.environ['AURORA_DATABASE_NAME'] = 'kbdata'
os.environ['AWS_S3_BUCKET'] = 'YOUR-S3-BUCKET'  # Replace with your S3 bucket name
os.environ['AWS_REGION'] = _region_name

print(f"✅ Configuration loaded from SSM at {datetime.now()}")
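Before running the later cells, it can help to confirm that every setting is populated and that no placeholder value was left in place. The helper below is a minimal sketch: `find_unset_settings` and `REQUIRED_SETTINGS` are hypothetical names, not part of the notebook's tools, and the placeholder prefixes are an assumption based on the sample values used here.

```python
import os

# Settings the cell above is expected to populate.
REQUIRED_SETTINGS = [
    "AURORA_CLUSTER_ARN",
    "AURORA_SECRET_ARN",
    "AURORA_DATABASE_NAME",
    "AWS_S3_BUCKET",
    "AWS_REGION",
]

def find_unset_settings(env, required=REQUIRED_SETTINGS,
                        placeholder_prefixes=("YOU-", "YOUR-")):
    """Return names that are missing, empty, or still hold a placeholder."""
    problems = []
    for name in required:
        value = env.get(name, "")
        if not value or value.startswith(placeholder_prefixes):
            problems.append(name)
    return problems

unset = find_unset_settings(os.environ)
if unset:
    print(f"Set these before continuing: {', '.join(unset)}")
```

The function takes any mapping, so it can be exercised with a plain dictionary instead of the real environment.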

## 🎯 System Architecture

This system provides comprehensive video analysis with these features:

- **🎬 Video Processing**: Extract frames using FFmpeg
- **🧠 Multimodal Embeddings**: Generate embeddings with Amazon Bedrock Titan
- **🎤 Audio Transcription**: Convert audio to text with Amazon Transcribe
- **📊 Vector Storage**: Store in Aurora PostgreSQL with pgvector
- **🔍 Semantic Search**: Perform similarity-based queries
- **🤖 Agent Integration**: Expose the pipeline as tools through the Strands Agents framework

The system extracts key frames, transcribes audio, generates embeddings for visual and text content, then stores everything in a searchable vector database.
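The frame-extraction step can be sketched with FFmpeg's `fps` video filter, which samples frames at a fixed rate. This is only an illustration under assumptions: `build_frame_extraction_command`, `extract_frames`, and the `frame_%05d.jpg` naming pattern are hypothetical, and the actual `video_embedding_local` tool may invoke FFmpeg differently.

```python
import subprocess
from pathlib import Path

def build_frame_extraction_command(video_path, output_dir, frames_per_second=1):
    """Build an FFmpeg command that samples frames at a fixed rate."""
    output_pattern = str(Path(output_dir) / "frame_%05d.jpg")
    return [
        "ffmpeg",
        "-i", str(video_path),              # input video
        "-vf", f"fps={frames_per_second}",  # keep N frames per second
        "-q:v", "2",                        # high JPEG quality
        output_pattern,
    ]

def extract_frames(video_path, output_dir, frames_per_second=1):
    """Run FFmpeg (must be installed) and return the extracted frame paths."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    command = build_frame_extraction_command(video_path, output_dir, frames_per_second)
    subprocess.run(command, check=True, capture_output=True)
    return sorted(Path(output_dir).glob("frame_*.jpg"))
```

Keeping the command construction separate from execution makes it easy to inspect the arguments before running FFmpeg on a long video.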

In [None]:
import sys
sys.path.append('tools')
from video_embedding_tool_local import video_embedding_local

print("✅ Local video embedding tool imported")

## 🔧 Tool Configuration

The `video_embedding_local` tool accepts these parameters:

| Parameter | Description | Default |
|-----------|-------------|----------|
| `video_path` | Path to video (local or S3) | Required |
| `user_id` | User identifier | Required |
| `action` | 'process', 'search', 'list' | 'process' |
| `similarity_threshold` | Similarity threshold (0.0-1.0) | 0.8 |
| `frames_per_second` | Frame extraction rate | 1 |
| `query` | Search query (for search action) | None |
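The table does not spell out how `similarity_threshold` is applied internally. As a rough illustration, the sketch below assumes it is a cutoff on cosine similarity between embedding vectors; the helper names and the two-dimensional toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def filter_by_similarity(query_vec, candidates, similarity_threshold=0.8):
    """Keep (label, score) pairs that meet the threshold, best match first."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in candidates]
    kept = [(label, score) for label, score in scored if score >= similarity_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings.
candidates = [
    ("frame_a", [1.0, 0.0]),
    ("frame_b", [0.6, 0.8]),
    ("frame_c", [0.0, 1.0]),
]
matches = filter_by_similarity([1.0, 0.0], candidates, similarity_threshold=0.5)
print([label for label, _ in matches])
```

Under this assumption, raising the threshold keeps fewer, closer matches, and lowering it admits more loosely related ones.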

### Performance Optimization:
- **High precision**: `frames_per_second: 2`, `similarity_threshold: 0.7`
- **Balanced**: `frames_per_second: 1`, `similarity_threshold: 0.8`
- **Fast processing**: `frames_per_second: 0.5`, `similarity_threshold: 0.9`
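One way to keep these presets consistent across runs is to store them in a small lookup and unpack the chosen one into the tool call. `PRESETS` and `processing_kwargs` are hypothetical helper names; the values are the ones listed above.

```python
# Presets copied from the list above.
PRESETS = {
    "high_precision": {"frames_per_second": 2, "similarity_threshold": 0.7},
    "balanced": {"frames_per_second": 1, "similarity_threshold": 0.8},
    "fast": {"frames_per_second": 0.5, "similarity_threshold": 0.9},
}

def processing_kwargs(preset="balanced"):
    """Return a copy of the keyword arguments for the named preset."""
    if preset not in PRESETS:
        raise ValueError(f"Unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    return dict(PRESETS[preset])

# Example (requires the tool and configuration from this notebook):
# result = video_embedding_local(
#     video_path=VIDEO_PATH,
#     user_id=USER_ID,
#     action="process",
#     **processing_kwargs("fast"),
# )
```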

In [None]:
USER_ID = "langchain_test_user_2"
VIDEO_PATH = "videos/video.mp4"

print(f"🎬 Processing video: {VIDEO_PATH}")
print(f"👤 User ID: {USER_ID}")

result = video_embedding_local(
    video_path=VIDEO_PATH,
    user_id=USER_ID,
    action="process",
    similarity_threshold=0.8,
    frames_per_second=1,
    region=_region_name,
)

print(f"\n📊 Processing Result: {result.get('status')}")
if result.get('status') == 'success':
    print(f"Video S3 URI: {result.get('video_s3_uri')}")
    print(f"Total frames: {result.get('total_frames')}")
    print(f"Embeddings stored: {result.get('total_stored')}")
else:
    print(f"Error: {result.get('message')}")

## 🎯 Model Configuration Options

Strands supports multiple model configuration approaches:

### Option 1: Default Configuration
```python
from strands import Agent
agent = Agent()  # Uses Claude 4 Sonnet by default
```

### Option 2: Specify Model ID
```python
agent = Agent(model="anthropic.claude-sonnet-4-20250514-v1:0")
```

### Option 3: BedrockModel (Recommended)
```python
from strands.models import BedrockModel

model = BedrockModel(
    model_id="anthropic.claude-sonnet-4-20250514-v1:0",
    temperature=0.3,
    top_p=0.8
)
agent = Agent(model=model)
```

### Option 4: Anthropic Direct
```python
from strands.models.anthropic import AnthropicModel

model = AnthropicModel(
    model_id="claude-sonnet-4-20250514",
    max_tokens=1028,
    params={"temperature": 0.7}
)
```

You can also use other model providers:
- [LiteLLM](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/litellm/)
- [llama.cpp](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/llamacpp/)
- [Llama API](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/llamaapi/)
- [Mistral AI](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/mistral/)
- [Ollama](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/ollama/)
- [OpenAI](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/openai/)
- [Cohere (through the OpenAI-compatible provider)](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/openai/)

**BedrockModel Benefits:**
- Native AWS integration
- Guardrails support
- Prompt caching capabilities
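As a hedged sketch of how these benefits might be switched on, the settings below add guardrail and prompt-caching options to the Option 3 configuration. The `guardrail_id`, `guardrail_version`, and `cache_prompt` names reflect my understanding of the `BedrockModel` configuration and should be checked against the current Strands documentation; the guardrail values are placeholders, and `build_bedrock_model` is a hypothetical helper.

```python
# Settings for BedrockModel; guardrail values are placeholders to replace.
bedrock_settings = {
    "model_id": "anthropic.claude-sonnet-4-20250514-v1:0",
    "temperature": 0.3,
    "top_p": 0.8,
    "guardrail_id": "YOUR-GUARDRAIL-ID",  # placeholder, assumed parameter name
    "guardrail_version": "1",             # placeholder, assumed parameter name
    "cache_prompt": "default",            # assumed parameter name for prompt caching
}

def build_bedrock_model(settings):
    """Create a BedrockModel from a settings dict (needs strands and AWS access)."""
    from strands.models import BedrockModel  # imported lazily on purpose
    return BedrockModel(**settings)

# model = build_bedrock_model(bedrock_settings)
# agent = Agent(model=model)
```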

In [None]:
from strands import Agent
from strands.models import BedrockModel
from s3_memory import s3_vector_memory
from video_image_display import display_video_images

# S3 Vectors Configuration
os.environ['VECTOR_BUCKET_NAME'] = 'YOUR-S3-BUCKET'  # Your S3 Vector bucket
os.environ['VECTOR_INDEX_NAME'] = 'YOUR-VECTOR-INDEX'        # Your vector index
os.environ['AWS_REGION'] = 'us-east-1'                       # AWS region
os.environ['EMBEDDING_MODEL'] = 'amazon.titan-embed-text-v2:0' # Bedrock embedding model

model = BedrockModel(model_id="us.anthropic.claude-3-5-sonnet-20241022-v2:0")

VIDEO_SYSTEM_PROMPT = """You are a video processing AI assistant.

Available actions:
- process: Upload and process videos 
- search: Search video content using semantic similarity
- list: List all processed videos

Use video_embedding_local for all video operations.
Use display_video_images to show search results.
"""

video_agent = Agent(
    model=model, 
    tools=[video_embedding_local, display_video_images],
    system_prompt=VIDEO_SYSTEM_PROMPT
)

memory_agent = Agent(
    model=model, 
    tools=[video_embedding_local, s3_vector_memory, display_video_images],
    system_prompt=VIDEO_SYSTEM_PROMPT
)

print("🤖 Agents created")

## 🎥 Test Video Content

This notebook uses the **AWS re:Invent 2024 session "AI self-service support with knowledge retrieval using PostgreSQL"** ([YouTube link](https://www.youtube.com/watch?v=fpi3awGakyg)) as its test video.

**Video Topics:**
- Vector databases and embeddings for AI applications
- Amazon Aurora PostgreSQL with pgvector for scalable vector storage
- RAG (Retrieval Augmented Generation) implementations
- Amazon Bedrock Agents for intelligent customer support
- Real-world use cases and technical demonstrations

The session combines visual slides with detailed spoken explanations, which makes it a good test case for both the frame embeddings and the audio transcription.

In [None]:
response = video_agent(f"""What is the video in {VIDEO_PATH} about?""")

print(response)

## 🧠 S3 Vector Memory Tool

The `s3_vector_memory` tool provides AWS-native memory management backed by an Amazon S3 vector bucket, with automatic per-user isolation. It complements the video embedding tool by storing user preferences and context.

### Key Features:
- **User Isolation**: Each user's memories are stored separately using `user_id`
- **Persistent Storage**: Memories persist across agent sessions
- **Semantic Search**: Find relevant memories using natural language queries
- **AWS Native**: Uses an Amazon S3 vector bucket for storage and Amazon Bedrock for embeddings

### Available Actions:
- `store`: Save new memory content for a user
- `retrieve`: Search and retrieve relevant memories for a user
- `list`: List all memories for a specific user

### Tool Parameters:
- `user_id`: **Required** - User identifier for memory isolation
- `action`: Operation to perform (store/retrieve/list)
- `content`: Content to store (for store action)
- `query`: Search query (for retrieve action)
- `top_k`: Maximum results to return (default: 20)

This creates personalized agent experiences where the agent remembers user preferences, previous conversations, and context across multiple interactions.
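To make the `store`, `retrieve`, and `list` actions and the per-user isolation concrete, here is a toy in-memory stand-in. It is not the real `s3_vector_memory` tool: the class and method names are hypothetical, and it ranks memories by shared words rather than by embedding similarity.

```python
from collections import defaultdict

class InMemoryUserMemory:
    """Toy per-user memory store; word overlap replaces embedding similarity."""

    def __init__(self):
        self._memories = defaultdict(list)  # user_id -> list of stored texts

    def store(self, user_id, content):
        self._memories[user_id].append(content)

    def list_all(self, user_id):
        return list(self._memories.get(user_id, []))

    def retrieve(self, user_id, query, top_k=20):
        """Return up to top_k of this user's memories that share words with the query."""
        query_words = set(query.lower().split())
        scored = []
        for content in self._memories.get(user_id, []):
            overlap = len(query_words & set(content.lower().split()))
            if overlap:
                scored.append((overlap, content))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [content for _, content in scored[:top_k]]

memory = InMemoryUserMemory()
memory.store("user_a", "Interested in vector databases and embeddings")
memory.store("user_b", "Prefers short summaries of vector search demos")
print(memory.retrieve("user_a", "vector databases"))
```

Every operation takes a `user_id`, so a query for one user never sees another user's memories, which mirrors the isolation described above.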

## 🔧 Tool Comparison

The two tools differ in search scope, use of `user_id`, and data source:

### `video_embedding_local` Tool:
- **Search Scope**: Searches across ALL video content in the database
- **User Context**: Uses `user_id` only for processing/listing operations
- **Search Method**: Semantic search across video transcriptions and visual content
- **Data Source**: Aurora PostgreSQL with pgvector

### `s3_vector_memory` Tool:
- **Search Scope**: Searches only within a specific user's memories
- **User Context**: **Requires** `user_id` for all operations (security isolation)
- **Search Method**: Semantic search across stored user memories
- **Data Source**: Amazon S3 vector bucket

This combination provides:
1. **Global video search** across all content
2. **Personal memory management** for individual users
3. **Context-aware responses** that combine both data sources
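The global video search can be pictured as a pgvector cosine-distance query sent through the RDS Data API, with no `user_id` filter in the statement. The sketch below is an assumption about the shape of such a query, not the tool's actual SQL: the `video_embeddings` table and its `id`, `content`, and `embedding` columns are hypothetical, as are the helper names.

```python
import json
import os

def build_similarity_query(query_embedding, top_k=5, table="video_embeddings"):
    """Return (sql, parameters) for a cosine-distance search with pgvector."""
    # `<=>` is pgvector's cosine-distance operator; 1 - distance gives similarity.
    sql = (
        "SELECT id, content, "
        "1 - (embedding <=> CAST(:query_embedding AS vector)) AS similarity "
        f"FROM {table} "
        "ORDER BY embedding <=> CAST(:query_embedding AS vector) "
        "LIMIT :top_k"
    )
    parameters = [
        {"name": "query_embedding",
         "value": {"stringValue": json.dumps(query_embedding, separators=(",", ":"))}},
        {"name": "top_k", "value": {"longValue": top_k}},
    ]
    return sql, parameters

def run_similarity_query(query_embedding, top_k=5):
    """Execute the query through the RDS Data API (needs AWS credentials)."""
    import boto3  # imported lazily so the builder works without AWS access
    sql, parameters = build_similarity_query(query_embedding, top_k)
    client = boto3.client("rds-data", region_name=os.environ["AWS_REGION"])
    return client.execute_statement(
        resourceArn=os.environ["AURORA_CLUSTER_ARN"],
        secretArn=os.environ["AURORA_SECRET_ARN"],
        database=os.environ["AURORA_DATABASE_NAME"],
        sql=sql,
        parameters=parameters,
    )
```

A user-scoped search would add a `WHERE` clause on a user column, which is the kind of restriction the memory tool enforces on every operation.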

In [None]:
response = memory_agent(f"""I'm interested in learning about AI and database technologies. 
Store this preference for user {USER_ID}, then search the video in {VIDEO_PATH} content for technical discussions 
about vector databases, embeddings, and how they're used in AI applications.""")

print("🧠 Memory Agent - Store Preferences & Search:")
print(response)

In [None]:
response = memory_agent(f"""What did the user {USER_ID} ask before?""")

print("🧠 Memory Agent - Recall Previous Requests:")
print(response)

In [None]:
USER_ID_1 = "langchain_test_user_1"

response = memory_agent(f"""I'm interested in learning about RAG and SaaS.
My user is {USER_ID_1}. Explain the video in {VIDEO_PATH}.""")

print(response)

In [None]:
response = memory_agent(f"""My name is Eli and I want to know whether the video in {VIDEO_PATH} talks about WhatsApp.
My user is {USER_ID_1}.""")