## Step 4: Putting everything together

### Introduction

In the previous notebooks, we worked through each stage of a pipeline that combines image and text data with generative AI. We began by describing images with a multimodal approach, which captures both visual and textual information about each image. We then ingested these images, along with their metadata, into OpenSearch, a search and analytics engine, to build a searchable collection of visual content. Next, we queried that collection to retrieve relevant images from a text description. Finally, we generated videos from text prompts, both with and without an accompanying image.

In this final notebook, we combine these components into a single end-to-end process, reusing the code and resources from the earlier notebooks. The goal is to take a short text prompt and return a generated video that matches it and starts from a relevant image retrieved from our OpenSearch index. The pipeline brings together natural language processing, image analysis, search, and AI video generation. By the end of this notebook, you will be able to turn a simple text input into a video grounded in your existing collection of images.

## Table of Contents

1. [Setup and Dependencies](#Setup-and-Dependencies)
2. [Configuration and Initialization](#Configuration-and-Initialization)
3. [Define Prompts](#Define-Prompts)
4. [Helper Function](#Helper-Function)
5. [VRAG Function](#VRAG-Function)
6. [Loading and Formatting Prompts](#Loading-and-Formatting-Prompts)
7. [Processing Multiple Videos](#Processing-Multiple-Videos)
8. [Generating and Displaying Videos](#Generating-and-Displaying-Videos)

***

## Setup and Dependencies

First, we'll import the necessary libraries and modules. We'll also import functions from our previous notebooks to avoid code duplication.

In [1]:
import nbimporter
import boto3
import os
from _00_image_processing import resize_and_encode 
from _01_oss_ingestion import query_open_search, get_oss_client
from _02_video_gen_text_only import download_and_display_video, start_video_generation, monitor_video_generation
from _03_video_gen_text_image import build_model_input
from IPython.display import HTML, display





## Configuration and Initialization

Next, we'll retrieve our stored variables from previous notebooks and initialize our AWS clients:

In [2]:
%store -r REGION
%store -r COLLECTION_ENDPOINT
%store -r INDEX_NAME
%store -r OUTPUT_DIR
%store -r BUCKET

bedrock_runtime = boto3.client("bedrock-runtime", region_name=REGION)
s3_client = boto3.client("s3", region_name=REGION)

## Define Prompts

Next, we define the two prompts that drive the video generation pipeline:

1. **OBJECT_PROMPT**:
   This prompt describes the main subject we want to feature in the video. It is used to search our OpenSearch index for a matching image and to fill the `<object_prompt>` placeholder in the prompt templates, so it should be specific enough to retrieve the right image.

2. **ACTION_PROMPT**:
   This prompt describes the camera movement or action we want to see in the generated video, such as particles drifting behind the subject. It fills the `<action_prompt>` placeholder in the prompt templates.


In [3]:
OBJECT_PROMPT = "red shoes"
ACTION_PROMPT = "Small glowing particles drift in the background behind the shoes."

<div class="alert alert-info">
<b>Note:</b> A descriptive, well-crafted prompt can strongly affect the quality of the generated video. Experiment with several versions of the prompt to find the one that produces the best result.
</div>

## Helper Function

The function `read_s3_text()`:
- Retrieves and decodes a **text file stored in S3**.
- Prints an error and returns `None` if the file is missing or cannot be accessed.

We use it to read the **Base64-encoded image** that was stored in S3 during ingestion, which is then passed to the video generation model.

In [4]:
from typing import Optional

def read_s3_text(bucket_name: str, file_key: str) -> Optional[str]:
    """
    Reads a text file from S3 and returns its content.

    Args:
        bucket_name (str): S3 bucket name.
        file_key (str): Key (path) of the file in S3.

    Returns:
        Optional[str]: The content of the file, or None if it could not be read.
    """
    try:
        response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
        return response['Body'].read().decode('utf-8')
    except Exception as e:
        print(f"Error reading file from S3: {e}")
        return None

## VRAG Function

The `vrag` (Video Retrieval-Augmented Generation) function runs the full pipeline for a single video, from retrieving a matching image to generating the final clip.

Here are the steps it follows:

1. **OpenSearch Query and Image Discovery**
   Using the provided `image_prompt`, the function searches our OpenSearch index and keeps the single best match (`top_k=1`).

2. **Image Data Extraction and Processing**
   From the search result, the function takes the S3 URI of the image's Base64-encoded copy and reads that text from S3 with `read_s3_text()`.

3. **Model Input Construction**
   The function combines the retrieved image with the camera prompt and the video settings (6 seconds at 24 fps, 1280x720, seed 0) into the model's request body.

4. **Amazon Bedrock Video Generation**
   The function submits this request to Amazon Bedrock as an asynchronous video generation job.

5. **Generation Monitoring and Status Tracking**
   The function polls the job until it finishes and returns the S3 URI of the generated video.

If OpenSearch returns no matching image, the function prints a warning and returns `None` without starting a generation job. Wrapping these steps in one function means a single call turns an image prompt and a camera prompt into a video.
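Step 3 is delegated to `build_model_input()` from an earlier notebook, whose body is not shown here. The settings passed to it (6 seconds, 24 fps, 1280x720, seed 0) match the request schema of Amazon Nova Reel, so, assuming Nova Reel is the model behind this pipeline (the notebook does not name it), the request body likely resembles the sketch below. `sketch_model_input` is a hypothetical name, and the PNG image format is an assumption.

```python
import base64
import json
from typing import Optional


def sketch_model_input(
    prompt: str,
    image_base64: Optional[str] = None,
    duration: int = 6,
    fps: int = 24,
    resolution: str = "1280x720",
    seed: int = 0,
) -> dict:
    """Hypothetical stand-in for build_model_input(), assuming Nova Reel's TEXT_VIDEO schema."""
    text_to_video_params = {"text": prompt}
    if image_base64 is not None:
        # Optional starting frame; the image format is assumed to be PNG here
        text_to_video_params["images"] = [
            {"format": "png", "source": {"bytes": image_base64}}
        ]
    return {
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": text_to_video_params,
        "videoGenerationConfig": {
            "durationSeconds": duration,
            "fps": fps,
            "dimension": resolution,
            "seed": seed,
        },
    }


# A placeholder Base64 string stands in for the image read from S3
fake_image = base64.b64encode(b"placeholder image bytes").decode("ascii")
payload = sketch_model_input("red shoes under falling snow", image_base64=fake_image)

config = payload["videoGenerationConfig"]
total_frames = config["durationSeconds"] * config["fps"]  # 6 s x 24 fps = 144 frames
print(json.dumps(config, indent=2))
print(f"Frames per clip: {total_frames}")
```

Without an image, the same function produces a text-only request, which corresponds to the text-only generation covered in an earlier notebook.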


In [5]:
from typing import Optional

def vrag(image_prompt: str, camera_prompt: str) -> Optional[str]:
    """
    Retrieves an image from OpenSearch and generates a video based on the given prompts.

    Args:
        image_prompt (str): The prompt used to search for an image in OpenSearch.
        camera_prompt (str): The video generation prompt (camera movement or action).

    Returns:
        Optional[str]: The generated video's S3 URI, or None if no image could be retrieved.
    """
    oss_client = get_oss_client(COLLECTION_ENDPOINT, REGION)
    results = query_open_search(
        bedrock_runtime=bedrock_runtime, 
        oss_client=oss_client, 
        index_name=INDEX_NAME, 
        prompt=image_prompt, 
        top_k=1
    )

    if not results:
        print(f"Warning: No images found for '{image_prompt}', skipping video generation.")
        return None

    # Read the Base64-encoded image from S3; the key is everything after s3://<bucket>/
    file_key = "/".join(results[0]['_source']['image_base64_s3_uri'].split('/')[3:])
    image_base64 = read_s3_text(BUCKET, file_key)
    if image_base64 is None:
        print(f"Warning: Could not read the image for '{image_prompt}', skipping video generation.")
        return None

    # Build model input for video generation
    model_input = build_model_input(
        camera_prompt, 
        image_base64=image_base64, 
        duration=6, 
        fps=24, 
        resolution="1280x720", 
        seed=0
    )

    # Start the asynchronous generation job and wait for it to finish
    invocation = start_video_generation(bedrock_runtime, BUCKET, model_input)
    return monitor_video_generation(bedrock_runtime, invocation["invocationArn"])
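The key extraction in `vrag` keeps everything after the third `/` of an `s3://bucket/key` URI and then reads from `BUCKET`, which assumes the Base64 file lives in the notebook's own bucket. If the index could point at other buckets, a small parser that returns both parts would avoid that assumption. `split_s3_uri` is a hypothetical helper, sketched here with plain string operations:

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The bucket is everything before the first '/' after the scheme; the rest is the key
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI needs both a bucket and a key: {uri!r}")
    return bucket, key


# Using one of the video URIs logged later in this notebook
uri = "s3://vragtest2-s3bucket-nydb69qplxtd/ezhmru1lm2b1/output.mp4"
bucket, key = split_s3_uri(uri)
print(bucket)  # vragtest2-s3bucket-nydb69qplxtd
print(key)     # ezhmru1lm2b1/output.mp4

# Same key as the notebook's split('/')[3:] approach
assert key == "/".join(uri.split("/")[3:])
```

The returned pair could be passed directly to `s3_client.get_object(Bucket=bucket, Key=key)`. Splitting on the first `/` after `s3://`, rather than using `urllib.parse`, keeps characters such as `?` and `#`, which are allowed in S3 keys, inside the key.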

## Loading and Formatting Prompts

The `load_and_format_prompts` function turns a file of prompt templates into a list of ready-to-use video prompts. It reads `prompts.txt`, where each line is one template describing a video scene. Templates can contain two placeholders, `<object_prompt>` and `<action_prompt>`, which the function replaces with the values passed in. For example, a template such as "A tight ground-level shot capturing <object_prompt> surrounded by low fog, while <action_prompt>." becomes a concrete prompt once an object and an action are substituted.

The function returns one formatted prompt per template, and each one becomes a separate video in the next step. Because the scene descriptions live in the templates, a single pair of object and action prompts produces several distinct videos without writing each prompt by hand, and trying a different subject or motion only requires changing `OBJECT_PROMPT` and `ACTION_PROMPT`.


In [16]:
from typing import List

def load_and_format_prompts(file_path: str = "prompts.txt", object_prompt: str = "red shoes", action_prompt: str = "Camera rotates counter clockwise in slow motion") -> List[str]:
    """
    Reads prompt templates from a file and replaces placeholders with provided object and action prompts.

    Args:
        file_path (str): Path to the text file containing prompt templates, one per line.
        object_prompt (str): Object to be placed in the video (e.g., 'red shoes').
        action_prompt (str): Action to be performed (e.g., 'Camera rotates counter clockwise in slow motion').

    Returns:
        List[str]: List of formatted video prompts.
    """
    formatted_prompts = []
    with open(file_path, "r", encoding="utf-8") as file:
        for line in file:
            prompt_template = line.strip()
            # Skip blank lines so they do not become empty video prompts
            if not prompt_template:
                continue
            formatted_prompt = (
                prompt_template.replace("<object_prompt>", object_prompt)
                               .replace("<action_prompt>", action_prompt)
            )
            formatted_prompts.append(formatted_prompt)
    return formatted_prompts

In [17]:
# Load prompts from file
formatted_prompts = load_and_format_prompts("prompts.txt", OBJECT_PROMPT, ACTION_PROMPT)
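In the first prompt logged further down, the template places `<action_prompt>` after "while", so an action prompt written as a full sentence produces "while Small glowing particles ... shoes.." with a capital letter and a doubled period. A hypothetical helper, `as_clause`, could adapt sentence-style action prompts for such templates. The template string in the sketch is reconstructed from that logged prompt and is an assumption about the contents of `prompts.txt`.

```python
def as_clause(action_prompt: str) -> str:
    """Adapt a sentence-style prompt so it can follow a word such as 'while'."""
    clause = action_prompt.strip().rstrip(".")
    # Lowercase only the first character; the rest of the text is left as written
    return clause[:1].lower() + clause[1:]


# Template shape inferred from the first logged prompt (an assumption about prompts.txt)
template = "A tight ground-level shot capturing <object_prompt> surrounded by low fog, while <action_prompt>."
action = "Small glowing particles drift in the background behind the shoes."

prompt = (
    template.replace("<object_prompt>", "red shoes")
            .replace("<action_prompt>", as_clause(action))
)
print(prompt)
```

This prints the prompt ending in "while small glowing particles drift in the background behind the shoes." with a single period. Lowercasing the first character would mangle prompts that begin with a proper noun or an acronym, so a simpler alternative is to write `ACTION_PROMPT` itself as a lowercase clause without a final period.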

## Processing Multiple Videos

The `process_videos_from_prompts` function generates one video for each prompt in a list, reusing the same image retrieval and generation steps as `vrag`.

Key steps in the function:

1. **Prompt Iteration**
   The function loops through the formatted prompts and generates one video per prompt.

2. **Image Search via OpenSearch**
   For each prompt, the function queries OpenSearch with the `object_prompt` argument (here `OBJECT_PROMPT`) rather than the full formatted prompt, so the search targets the video's main subject.

3. **Input Payload Construction**
   Once an image is found, the function builds the model input from the formatted prompt, the retrieved image, and the video settings (duration, frame rate, resolution, and seed).

4. **Video Generation Initiation**
   The function starts the video generation job through Amazon Bedrock.

5. **Progress Monitoring and URI Collection**
   The function waits for each job to finish before starting the next one and collects the S3 URI of each completed video.

If OpenSearch finds no matching image for a prompt, the function prints a warning, skips that prompt, and continues with the remaining prompts. Because every iteration searches for the same `object_prompt`, in practice either every prompt finds an image or none does.

Implementation notes:

- It creates one OpenSearch client and reuses it for every query instead of creating a new client per prompt.
- It reads the Base64-encoded image from S3 and builds the model input for each video.
- It reuses helper functions from earlier notebooks, namely `build_model_input()`, `start_video_generation()`, and `monitor_video_generation()`.
- It returns a list of `(video_uri, formatted_prompt)` tuples, which the display step uses to show each video next to its prompt.
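`monitor_video_generation()` comes from an earlier notebook and its body is not shown here. The sketch below shows one way such a poller could work, assuming it calls the `bedrock-runtime` client's `get_async_invoke` operation, whose response reports a `status` of `InProgress`, `Completed`, or `Failed`, and assuming the video is written as `output.mp4` under the job's output prefix (consistent with the URIs in the log below). The name `wait_for_video`, the polling interval, and the timeout are hypothetical. The client is passed in, so a fake client stands in for Bedrock here.

```python
import time
from typing import Any, Optional


def wait_for_video(client: Any, invocation_arn: str,
                   poll_seconds: float = 30.0, timeout_seconds: float = 900.0) -> Optional[str]:
    """Poll an async Bedrock invocation; return the video's S3 URI, or None on failure/timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        job = client.get_async_invoke(invocationArn=invocation_arn)
        status = job["status"]
        if status == "Completed":
            # Assumption: the video lands as output.mp4 under the job's output prefix
            output_prefix = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
            return f"{output_prefix.rstrip('/')}/output.mp4"
        if status == "Failed":
            print(f"Job {invocation_arn} failed: {job.get('failureMessage')}")
            return None
        print(f"Job {invocation_arn} is in progress.")
        time.sleep(poll_seconds)
    print(f"Timed out waiting for {invocation_arn}")
    return None


class FakeBedrockRuntime:
    """Stand-in client that reports InProgress twice, then Completed."""

    def __init__(self) -> None:
        self.calls = 0

    def get_async_invoke(self, invocationArn: str) -> dict:
        self.calls += 1
        if self.calls < 3:
            return {"status": "InProgress"}
        return {
            "status": "Completed",
            "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": "s3://example-bucket/job123"}},
        }


fake = FakeBedrockRuntime()
demo_uri = wait_for_video(fake, "arn:example", poll_seconds=0)
print(demo_uri)
```

The timeout matters when many prompts run back to back: without it, one stuck job would block every prompt after it.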


In [18]:
def process_videos_from_prompts(prompt_list, object_prompt):
    """
    Processes multiple video generations using formatted prompts.

    Args:
        prompt_list (list): List of formatted prompts.
        object_prompt (str): The object to search for in OpenSearch.

    Returns:
        list: List of tuples containing (video URI, formatted prompt).
    """
    video_uris = []
    oss_client = get_oss_client(COLLECTION_ENDPOINT, REGION)

    for i, formatted_prompt in enumerate(prompt_list):
        print(f"Processing video {i+1}/{len(prompt_list)}: {formatted_prompt}")

        # Query OpenSearch for the object only (not the full sentence)
        results = query_open_search(
            bedrock_runtime=bedrock_runtime, 
            oss_client=oss_client, 
            index_name=INDEX_NAME, 
            prompt=object_prompt, 
            top_k=1
        )

        if not results:
            print(f"Warning: No images found for '{object_prompt}', skipping this prompt.")
            continue

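        # Read the Base64-encoded image from S3; the key is everything after s3://<bucket>/
        file_key = "/".join(results[0]['_source']['image_base64_s3_uri'].split('/')[3:])
        image_base64 = read_s3_text(BUCKET, file_key)
        if image_base64 is None:
            print(f"Warning: Could not read the image for '{object_prompt}', skipping this prompt.")
            continue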

        # Build model input
        model_input = build_model_input(
            formatted_prompt, 
            image_base64=image_base64, 
            duration=6, 
            fps=24, 
            resolution="1280x720", 
            seed=0
        )

        # Start video generation
        invocation = start_video_generation(bedrock_runtime, BUCKET, model_input)
        video_uri = monitor_video_generation(bedrock_runtime, invocation["invocationArn"])
        video_uris.append((video_uri, formatted_prompt))

    return video_uris
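Because `object_prompt` does not change between iterations, the loop above sends the same OpenSearch query and reads the same Base64 file once per prompt. The sketch below retrieves the image once and reuses it for every prompt. The AWS calls are replaced by injected callables, `find_image` and `generate_video`, which are hypothetical names chosen so the control flow can run without AWS.

```python
from typing import Callable, List, Optional, Tuple


def process_prompts_with_shared_image(
    prompt_list: List[str],
    object_prompt: str,
    find_image: Callable[[str], Optional[str]],
    generate_video: Callable[[str, str], Optional[str]],
) -> List[Tuple[Optional[str], str]]:
    """Look up the reference image once, then generate one video per prompt."""
    image_base64 = find_image(object_prompt)
    if image_base64 is None:
        print(f"Warning: No image found for '{object_prompt}', skipping all prompts.")
        return []

    results = []
    for i, prompt in enumerate(prompt_list, start=1):
        print(f"Processing video {i}/{len(prompt_list)}: {prompt}")
        # generate_video may return None if a job fails; the display step skips those
        results.append((generate_video(prompt, image_base64), prompt))
    return results


lookups = []


def fake_find_image(query: str) -> Optional[str]:
    lookups.append(query)  # record each search so we can count them
    return "aGVsbG8="  # stand-in Base64 payload


def fake_generate_video(prompt: str, image_base64: str) -> Optional[str]:
    return f"s3://example-bucket/{len(prompt)}/output.mp4"


videos = process_prompts_with_shared_image(
    ["shot one", "shot two", "shot three"], "red shoes", fake_find_image, fake_generate_video
)
print(len(lookups), len(videos))
```

The trade-off is flexibility: if each video should start from a different image, querying with each formatted prompt inside the loop would be the better fit, although the original loop's comment indicates that searching for the object only is intentional.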

In [19]:
# Generate multiple videos
video_uris = process_videos_from_prompts(formatted_prompts, OBJECT_PROMPT)

Processing video 1/5: A tight ground-level shot capturing red shoes surrounded by low fog, while Small glowing particles drift in the background behind the shoes..


Video generation job started successfully.
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/ezhmru1lm2b1 is in progress. Started at: 2025-05-02 21:54:30+00:00
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/ezhmru1lm2b1 is in progress. Started at: 2025-05-02 21:54:30+00:00
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/ezhmru1lm2b1 is in progress. Started at: 2025-05-02 21:54:30+00:00
Video generation completed. Video available at: s3://vragtest2-s3bucket-nydb69qplxtd/ezhmru1lm2b1/output.mp4
Processing video 2/5: A moody wide shot of red shoes positioned near flowing smoke trails, while soft ambient lighting shifts slowly.


Video generation job started successfully.
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/304i5i85egj3 is in progress. Started at: 2025-05-02 21:57:33+00:00
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/304i5i85egj3 is in progress. Started at: 2025-05-02 21:57:33+00:00
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/304i5i85egj3 is in progress. Started at: 2025-05-02 21:57:33+00:00
Video generation completed. Video available at: s3://vragtest2-s3bucket-nydb69qplxtd/304i5i85egj3/output.mp4
Processing video 3/5: A grounded cinematic shot of red shoes placed in shallow water, while gentle waves reflect moving light above.


Video generation job started successfully.
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/6plaz8psuw96 is in progress. Started at: 2025-05-02 22:00:37+00:00
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/6plaz8psuw96 is in progress. Started at: 2025-05-02 22:00:37+00:00
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/6plaz8psuw96 is in progress. Started at: 2025-05-02 22:00:37+00:00
Video generation completed. Video available at: s3://vragtest2-s3bucket-nydb69qplxtd/6plaz8psuw96/output.mp4
Processing video 4/5: A tight front-facing composition of red shoes backlit by dancing flames in the distance, while sparks drift nearby.


Video generation job started successfully.
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/acue5j47fllx is in progress. Started at: 2025-05-02 22:03:40+00:00
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/acue5j47fllx is in progress. Started at: 2025-05-02 22:03:40+00:00
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/acue5j47fllx is in progress. Started at: 2025-05-02 22:03:40+00:00
Video generation completed. Video available at: s3://vragtest2-s3bucket-nydb69qplxtd/acue5j47fllx/output.mp4
Processing video 5/5: A clean cinematic shot of red shoes placed under falling snow, while the environment stays silent and still.


Video generation job started successfully.
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/0r0ennseda8p is in progress. Started at: 2025-05-02 22:06:44+00:00
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/0r0ennseda8p is in progress. Started at: 2025-05-02 22:06:44+00:00
Job arn:aws:bedrock:us-east-1:344659571055:async-invoke/0r0ennseda8p is in progress. Started at: 2025-05-02 22:06:44+00:00
Video generation completed. Video available at: s3://vragtest2-s3bucket-nydb69qplxtd/0r0ennseda8p/output.mp4


## Generating and Displaying Videos

The last step downloads the generated videos and displays them in the notebook, so you can check each result against the prompt that produced it.

Key steps in the process:

1. **Video Download Management**
   The code iterates through each `(video_uri, formatted_prompt)` pair and downloads the video from S3 to a local path under `OUTPUT_DIR`.

2. **HTML Display Construction**
   For each successfully downloaded video, the code builds an HTML block that includes:
   - A numbered heading with the video's prompt text
   - A video player with standard playback controls
   - A fixed player size of 640x360 pixels

3. **Error Handling**
   - Videos that failed to generate, indicated by a missing URI, are skipped with a message.
   - Download errors are caught and printed.
   - Processing continues with the remaining videos even if some fail.

All HTML blocks are then rendered together with a single `display(HTML(...))` call, so each video appears with its associated prompt.


In [20]:
import html

# Display all generated videos
os.makedirs(OUTPUT_DIR, exist_ok=True)
video_html = ""
for i, (video_uri, formatted_prompt) in enumerate(video_uris):
    if not video_uri:
        print(f"Skipping Video {i+1} due to missing video URI.")
        continue

    local_video_path = os.path.join(OUTPUT_DIR, f"video_{i+1}.mp4")

    try:
        # Download using the bucket and key taken from the URI (s3://<bucket>/<key>)
        bucket_name, _, key = video_uri[len("s3://"):].partition("/")
        s3_client.download_file(bucket_name, key, local_video_path)

        # Escape the prompt so characters such as '<' or '&' render as text
        video_html += f"""
        <div style="margin-bottom: 20px;">
            <h3>Video {i+1}: {html.escape(formatted_prompt)}</h3>
            <video width="640" height="360" controls>
                <source src="{local_video_path}" type="video/mp4">
                Your browser does not support the video tag.
            </video>
        </div>
        """
    except Exception as e:
        print(f"Error downloading video {i+1}: {e}")

display(HTML(video_html))
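The `<source src>` in the cell above is a filesystem path, which the browser resolves relative to the notebook's URL; if `OUTPUT_DIR` lies outside the directory Jupyter serves, the players may stay blank. One workaround, sketched here as an alternative rather than the notebook's method, is to embed each file as a Base64 data URI. This makes the output self-contained, but each embedded video adds roughly 4/3 of its file size to the notebook, so it suits short clips like these. `video_card_html` is a hypothetical helper name.

```python
import base64
import html
import os
import tempfile


def video_card_html(video_path: str, title: str, width: int = 640, height: int = 360) -> str:
    """Build an HTML block that embeds an MP4 file as a Base64 data URI."""
    with open(video_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    # Escape the prompt text so characters like '<' or '&' render literally
    return (
        '<div style="margin-bottom: 20px;">'
        f"<h3>{html.escape(title)}</h3>"
        f'<video width="{width}" height="{height}" controls>'
        f'<source src="data:video/mp4;base64,{encoded}" type="video/mp4">'
        "Your browser does not support the video tag."
        "</video></div>"
    )


# Demonstrate with a tiny placeholder file (not a playable video)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "video_1.mp4")
    with open(path, "wb") as f:
        f.write(b"\x00\x01\x02")
    card = video_card_html(path, "Video 1: red shoes <draft> & snow")

print(card[:80])
```

In the display loop, `video_html += video_card_html(local_video_path, f"Video {i+1}: {formatted_prompt}")` could replace the inline f-string.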

<div class="alert alert-success">
<b>🎉 Congratulations!</b> You have successfully completed the video generation pipeline notebook!

Key accomplishments:
- ✅ Combined multiple processes into a streamlined workflow
- ✅ Defined object and action prompts for VRAG
- ✅ Implemented a single function for VRAG
- ✅ Created dynamic prompt formatting
- ✅ Processed and displayed multiple videos

You now have a pipeline that turns a short text prompt into a video grounded in images retrieved from your own OpenSearch collection.
</div>
