# From Insights to Intelligence: Multimodal RAG with Amazon Bedrock

This notebook demonstrates how to build a Multimodal Retrieval-Augmented Generation (RAG) application using Amazon Bedrock Data Automation (BDA) and Bedrock Knowledge Bases (KB). The application can analyze and generate insights from multiple data modalities, including documents, images, audio, and video.

## Setup and Configuration

Let's start by setting up the necessary dependencies and AWS clients.

In [None]:
%pip install "boto3>=1.37.4" s3fs tqdm retrying packaging --upgrade -qq

import boto3
import json
import uuid
import time
import os
import random
import sagemaker
import logging
import mimetypes
from botocore.exceptions import ClientError
import warnings
warnings.filterwarnings('ignore')

# Import utils and access the business context function
from utils.utils import BDARAGUtils

# Create utility instance to use its methods
rag_utils = BDARAGUtils()

# Display comprehensive business context for RAG
rag_utils.show_business_context("rag_complete")

# Configure logging
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize AWS clients and session
session = sagemaker.Session()
default_bucket = session.default_bucket()

sts_client = boto3.client('sts')
account_id = sts_client.get_caller_identity()["Account"]
region_name = boto3.session.Session().region_name

s3_client = boto3.client('s3')
bedrock_agent_client = boto3.client('bedrock-agent')
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')

print("Setup complete!")
print(f"Using AWS region: {region_name}")

## 1. Prepare Data for Multimodal Knowledge Base

In this step, we'll prepare our data sources for the knowledge base. We have two options:

1. **Use BDA Output Files** from previous notebooks in this workshop (document, image, audio, video analysis)
2. **Use Sample Files** as a fallback if no BDA outputs are available

In [None]:
# Import our BDARAGUtils class
from utils.utils import BDARAGUtils

# Create a directory for sample files
os.makedirs('examples', exist_ok=True)

# Define the S3 prefix for our dataset
s3_prefix = 'bda/dataset/'

# Check for and upload BDA outputs from previous notebooks
bda_outputs_exist, bucket_name_kb = BDARAGUtils.check_and_upload_bda_outputs(s3_client, region_name=region_name)

# If no BDA outputs found, download and use sample files instead
if not bda_outputs_exist:
    print("\nNo BDA output files found from previous modules. Downloading sample files instead...")
    BDARAGUtils.download_sample_files(output_dir='./examples')
    
    # Upload the sample files to S3
    print("\nUploading sample files to S3...")
    for file_name in os.listdir('./examples/'):
        local_path = os.path.join('./examples/', file_name)
        s3_key = s3_prefix + file_name
        s3_client.upload_file(local_path, bucket_name_kb, s3_key)
        print(f"Uploaded {file_name} to s3://{bucket_name_kb}/{s3_key}")
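
The upload loop above passes every directory entry to `upload_file` and lets S3 pick a default content type. As a minimal sketch, assuming you want to skip sub-directories and tag each object with a MIME type, the plan can be computed separately from the upload. `build_upload_plan` is a hypothetical helper and is not part of `BDARAGUtils`.

```python
import mimetypes
import os


def build_upload_plan(directory, prefix):
    """Return (local_path, s3_key, content_type) for each regular file in directory."""
    plan = []
    for file_name in sorted(os.listdir(directory)):
        local_path = os.path.join(directory, file_name)
        if not os.path.isfile(local_path):
            continue  # skip sub-directories such as .ipynb_checkpoints
        content_type, _ = mimetypes.guess_type(file_name)
        # Fall back to a generic binary type when the extension is unknown
        plan.append((local_path, prefix + file_name,
                     content_type or "application/octet-stream"))
    return plan


# Usage with the notebook's own names (requires AWS credentials):
# for local_path, s3_key, content_type in build_upload_plan("./examples", s3_prefix):
#     s3_client.upload_file(local_path, bucket_name_kb, s3_key,
#                           ExtraArgs={"ContentType": content_type})
```

Keeping the plan as plain tuples makes it easy to inspect what will be uploaded before any S3 call is made.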

## 3. Create Multimodal Knowledge Base

Now we'll create a Knowledge Base that can handle our multimodal data.

In [None]:
# Display business context for Knowledge Base creation
rag_utils.show_business_context("knowledge_base")

# Create a timestamp-based suffix for unique resource names
timestamp_str = time.strftime("%Y%m%d%H%M%S")[-7:]
kb_suffix = timestamp_str

# Define Knowledge Base parameters
knowledge_base_name = f"multimodal-rag-kb-{kb_suffix}"
knowledge_base_description = "Multimodal RAG Knowledge Base for the BDA Workshop"

# Define data sources
data_sources = [{
    "type": "S3", 
    "bucket_name": bucket_name_kb,
    "inclusionPrefixes": [s3_prefix]
}]

# Create the Knowledge Base
print(f"🏗️ Creating Knowledge Base: {knowledge_base_name}")
print("This may take several minutes to complete...")

try:
    knowledge_base = BDARAGUtils(
        kb_name=knowledge_base_name,
        kb_description=knowledge_base_description,
        data_sources=data_sources,
        multi_modal=True,
        # If using BDA output files, we don't need BDA as the parser
        # If using raw files, we need BDA as the parser
        parser=None if bda_outputs_exist else 'BEDROCK_DATA_AUTOMATION',
        chunking_strategy="FIXED_SIZE",
        suffix=kb_suffix
    )
    
    knowledge_base.setup_resources()
    
    kb_id = knowledge_base.get_knowledge_base_id()
    print(f"\nKnowledge Base created successfully!")
    print(f"Knowledge Base ID: {kb_id}")
except Exception as e:
    print(f"\nError creating Knowledge Base: {e}")
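
The internals of `BDARAGUtils.setup_resources()` are not shown in this notebook. For orientation, here is a rough sketch of how the `data_sources`, `parser`, and `chunking_strategy` arguments above could map onto a `create_data_source` request for the `bedrock-agent` client. The helper name, the data source name, and the `max_tokens` and `overlap_percentage` defaults are illustrative assumptions, not values taken from the utility.

```python
def build_s3_data_source_request(kb_id, bucket_name, prefix, use_bda_parser,
                                 max_tokens=300, overlap_percentage=20):
    """Build keyword arguments for bedrock-agent create_data_source (illustrative)."""
    ingestion_config = {
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": max_tokens,                  # assumed chunk size
                "overlapPercentage": overlap_percentage,  # assumed overlap
            },
        }
    }
    if use_bda_parser:
        # Only needed for raw files; BDA output files are already parsed
        ingestion_config["parsingConfiguration"] = {
            "parsingStrategy": "BEDROCK_DATA_AUTOMATION",
            "bedrockDataAutomationConfiguration": {"parsingModality": "MULTIMODAL"},
        }
    return {
        "knowledgeBaseId": kb_id,
        "name": f"{kb_id}-s3-source",  # hypothetical naming scheme
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {
                "bucketArn": f"arn:aws:s3:::{bucket_name}",
                "inclusionPrefixes": [prefix],
            },
        },
        "vectorIngestionConfiguration": ingestion_config,
    }


# Usage with the notebook's own names (requires AWS credentials):
# request = build_s3_data_source_request(kb_id, bucket_name_kb, s3_prefix,
#                                        use_bda_parser=not bda_outputs_exist)
# bedrock_agent_client.create_data_source(**request)
```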

## 4. Start Data Ingestion

Now that we've created our Knowledge Base, we need to ingest the multimodal data. This process transforms our files into vector embeddings that can be efficiently searched.

In [None]:

print("Starting data ingestion...")
print("This process may take several minutes depending on the amount and size of data.")

# Display business context for data ingestion process
rag_utils.show_business_context("data_ingestion")

try:
    # Start the ingestion job
    knowledge_base.start_ingestion_job()
    print("\nData ingestion completed successfully!")
except Exception as e:
    print(f"\nError during data ingestion: {e}")
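
Ingestion jobs run asynchronously on the service side. Whether `knowledge_base.start_ingestion_job()` waits for completion is not shown here, so the sketch below illustrates one way to poll a job until it reaches a terminal status using the `bedrock-agent` client's `get_ingestion_job` call. The function name, polling interval, and timeout are assumptions for illustration.

```python
import time

# Statuses after which the job will not change any further
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion(client, kb_id, data_source_id, job_id,
                       poll_seconds=10, timeout_seconds=1800):
    """Poll an ingestion job until it finishes; return the final job description."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=data_source_id,
            ingestionJobId=job_id,
        )["ingestionJob"]
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Ingestion job {job_id} still {job['status']}")
        time.sleep(poll_seconds)


# Usage (requires AWS credentials and real IDs):
# job = wait_for_ingestion(bedrock_agent_client, kb_id, data_source_id, job_id)
# print(job["status"], job.get("statistics"))
```

Checking the returned status matters because a job that ends in `FAILED` still returns normally from the polling loop.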

## 5. Query the Knowledge Base

Now that our data is ingested, we can query the Knowledge Base using natural language. We'll use Amazon Bedrock's RetrieveAndGenerate API.

In [None]:
# Display business context for semantic search and querying
rag_utils.show_business_context("semantic_search")

def query_kb(query, model_id="amazon.nova-micro-v1:0", num_results=5):
    """
    Query the knowledge base and return the raw response.
    
    Args:
        query: The query to send to the knowledge base
        model_id: The foundation model to use for generating the response
        num_results: Number of results to retrieve from the knowledge base
    
    Returns:
        The raw response from the knowledge base, or None if the query fails
    """
    print(f"🔍 Query: {query}")
    print("⏳ Processing...")
    
    try:
        # Use the real AWS API to query the knowledge base
        response = knowledge_base.query_knowledge_base(
            query=query,
            model_id=model_id,
            num_results=num_results
        )
            
        # Return the raw response
        return response
    
    except Exception as e:
        print(f"\nError querying Knowledge Base: {e}")
        return None
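
`query_kb` delegates to the workshop utility, whose implementation is not shown. As a hedged sketch of the request that a RetrieveAndGenerate call takes, the helper below builds the keyword arguments for `bedrock_agent_runtime_client.retrieve_and_generate`. The helper name and the way the model ARN is assembled in the usage comment are assumptions; some models and regions may require an inference profile ARN instead.

```python
def build_rag_request(query, kb_id, model_arn, num_results=5):
    """Build keyword arguments for bedrock-agent-runtime retrieve_and_generate."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    # How many chunks to retrieve before generating the answer
                    "vectorSearchConfiguration": {"numberOfResults": num_results}
                },
            },
        },
    }


# Usage with the notebook's own names (requires AWS credentials):
# model_arn = f"arn:aws:bedrock:{region_name}::foundation-model/amazon.nova-micro-v1:0"
# response = bedrock_agent_runtime_client.retrieve_and_generate(
#     **build_rag_request("What is in the dataset?", kb_id, model_arn)
# )
```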

### Query 1: Audio Content

Let's start by querying information from the audio content.

In [None]:
# Query about the audio content
audio_query = "What key topics were discussed in the AWS podcast?"

audio_response = query_kb(audio_query)
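
`query_kb` returns the raw response without printing it. Assuming the utility passes through the RetrieveAndGenerate response unchanged (an assumption, since its internals are not shown), the generated answer sits under `output.text` and the supporting sources under `citations`. A small hypothetical helper can pull both out:

```python
def summarize_rag_response(response):
    """Extract the answer text and unique S3 source URIs from a RAG response dict."""
    if not response:
        return {"answer": None, "sources": []}
    answer = response.get("output", {}).get("text")
    sources = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in sources:
                sources.append(uri)  # keep first-seen order, drop duplicates
    return {"answer": answer, "sources": sources}


# Usage with the notebook's own names:
# summary = summarize_rag_response(audio_response)
# print(summary["answer"])
# for uri in summary["sources"]:
#     print("Source:", uri)
```

The same helper applies to the responses of the other queries in this section.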

### Query 2: Visual Content

Now let's query information from the image.

In [None]:
# Query about visual content
visual_query = "What were the products shown at the airport?"

visual_response = query_kb(visual_query)

### Query 3: Document Content

Let's explore information from document content.

In [None]:
# Query about document content
document_query = "What are the key callouts from the treasury statement?"

document_response = query_kb(document_query)

### Query 4: Video Content

Now let's ask a question about the video content.

In [None]:
# Query requiring cross-modal integration
cross_modal_query = "What happened at El Matador Beach?"

cross_modal_response = query_kb(
    query=cross_modal_query,
    num_results=8  # Increase results to capture information from multiple modalities
)

## Summary

In this notebook, we demonstrated how to build a Multimodal RAG application using Amazon Bedrock Data Automation and Bedrock Knowledge Bases. We covered the key steps:

1. **Data Preparation**: We checked for existing BDA outputs and fell back to sample files when none were found
2. **Knowledge Base Creation**: We created a multimodal Knowledge Base with fixed-size chunking
3. **Data Ingestion**: We ingested our multimodal data into the Knowledge Base as vector embeddings
4. **Querying**: We queried the Knowledge Base in natural language about audio, image, document, and video content

This approach allows you to build powerful multimodal applications that can extract insights from different data types (documents, images, audio, and video) and provide unified access through natural language queries.