# Step 3: Video Generation with Text and Image

## Image-Conditioned Video Generation with AWS Bedrock

### Introduction

Welcome to this notebook on image-conditioned video generation with AWS Bedrock's `nova-reel` model. It demonstrates how to generate videos guided by both a text description and a reference image, which gives greater control over the visual style and content of the output.

Image-conditioned video generation combines text-to-video and image-to-video generation. Given both a text prompt and a reference image, the model can create videos that stay visually consistent with the image while incorporating the actions or transformations described in the text.

This approach offers several advantages over text-only video generation:

* **Visual Consistency**: The generated video maintains the visual style, colors, and composition of the reference image
* **Subject Preservation**: Specific subjects or objects from the reference image can be preserved in the video
* **Style Control**: The aesthetic qualities of the reference image influence the generated video
* **Scene Extension**: The video can extend or animate a static scene captured in the reference image

In this notebook, we'll build upon the text-to-video generation capabilities explored in the previous notebook, adding the ability to condition the generation process with a reference image.

## Table of Contents

1. [Setup and Dependencies](#Setup-and-Dependencies)
2. [Configuration and Initialization](#Configuration-and-Initialization)
3. [Preparing the Reference Image](#Preparing-the-Reference-Image)
4. [Building the Model Input](#Building-the-Model-Input)
5. [Generating the Video](#Generating-the-Video)
6. [Results and Next Steps](#Results-and-Next-Steps)

***

## Setup and Dependencies

First, we'll import the necessary libraries and modules for image-conditioned video generation. We'll also import functions from our previous notebooks to avoid code duplication.

In [None]:
# Import required libraries
import boto3
import json
import sagemaker
import time
from pathlib import Path
import base64
from IPython.display import Video, Image
import botocore.exceptions
import nbimporter  # must be imported first so the notebook modules below can be loaded
from _00_image_processing import resize_and_encode
from _02_video_gen_text_only import (
    start_video_generation,
    monitor_video_generation,
    download_and_display_video,
)

## Configuration and Initialization

Next, we'll retrieve the variables stored by the previous notebooks:

In [None]:
%store -r OUTPUT_DIR
%store -r BUCKET

Let's define our text prompt and the path to our reference image:

In [None]:
PROMPT = "camera tilt up from the road to the sky"
INPUT_IMAGE_PATH = "images/road.jpg"

Now we'll initialize the AWS clients we'll need for video generation:

In [None]:
# Initialize AWS clients
bedrock_runtime = boto3.client("bedrock-runtime")
s3_client = boto3.client("s3")
execution_role = sagemaker.get_execution_role()

In [None]:
print(f"Execution Role: {execution_role}")

## Preparing the Reference Image

### Displaying the Reference Image

First, let's display the reference image to see what we're working with:

In [None]:
Image(INPUT_IMAGE_PATH)

### Encoding the Reference Image

Next, we need to encode the image to Base64 format for the Bedrock API. We'll use the `resize_and_encode` function from our image processing notebook to ensure the image is properly formatted:

<div class="alert alert-info">
<b>Note:</b> The image is resized to 1280x720 resolution to match the output video dimensions, ensuring visual consistency between the reference image and the generated video.
</div>

In [None]:
# Encode input image
input_image_base64 = resize_and_encode(INPUT_IMAGE_PATH, output_size=(1280, 720))
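
Before building the payload, it can be useful to confirm which image format the encoded string actually contains, since the payload carries an explicit format label. The sketch below is a minimal check that inspects the leading bytes of the decoded data. The helper name `detect_image_format` is hypothetical and not part of the earlier notebooks, and it assumes the encoded value is a standard Base64 string of PNG or JPEG data.

```python
import base64


def detect_image_format(image_base64):
    """Return "png" or "jpeg" based on the leading bytes of the decoded image.

    Hypothetical helper: assumes standard Base64 input holding PNG or JPEG data.
    """
    # 16 Base64 characters decode to 12 bytes, enough to cover both signatures.
    header = base64.b64decode(image_base64[:16])
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    raise ValueError("Unrecognized image data; expected PNG or JPEG")
```

Calling `detect_image_format(input_image_base64)` should then report the format that `resize_and_encode` produced, which can be compared with the format label used in the payload.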

## Building the Model Input

### Function: `build_model_input`

This function constructs the payload for the Bedrock API, incorporating both the text prompt and the reference image. It's similar to the function in the previous notebook but with modifications to handle the image input.

#### Key Parameters

- `prompt`: The **text prompt** describing the desired scene or action.
- `image_base64`: The **reference image** in Base64 format (optional; omit it for text-only generation).
- `duration`: The **length** of the video in seconds (default: `6`).
- `fps`: The **frame rate** of the video in frames per second (default: `24`).
- `resolution`: The **output video resolution** (default: `1280x720`).
- `seed`: A **random seed** for reproducible generation (default: `0`).

#### Functionality

- Constructs the `textToVideoParams` object from the text prompt and, when one is supplied, the reference image.
- Defines video generation parameters such as duration, FPS, and resolution.
- Returns a complete payload ready for submission to the Bedrock API.

In [None]:
def build_model_input(prompt, image_base64=None, duration=6, fps=24, resolution="1280x720", seed=0):
    """
    Constructs the input payload for the Bedrock video generation model.

    Args:
        prompt (str): Text prompt for video generation.
        image_base64 (str, optional): Base64-encoded image data; omit for text-only generation.
        duration (int): Duration of the video in seconds.
        fps (int): Frames per second for the video.
        resolution (str): Resolution of the video.
        seed (int): Seed for deterministic generation.
    
    Returns:
        dict: The formatted model input payload.
    """
    if image_base64:
        # The "format" label should match the actual encoding of the Base64 data.
        image_parameter = [{"format": "png", "source": {"bytes": image_base64}}]
    else:
        image_parameter = []
    return {
        "taskType": "TEXT_VIDEO",
        "textToVideoParams": {
            "text": prompt,
            "images": image_parameter
        },
        "videoGenerationConfig": {
            "durationSeconds": duration,
            "fps": fps,
            "dimension": resolution,
            "seed": seed
        },
    }
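
Because the payload is a plain dictionary, it can be checked locally before a job is submitted. The sketch below is one possible pre-flight check. The helper name `check_generation_config` is hypothetical, and the expected values (6 seconds, 24 fps, `1280x720`) are assumptions taken from this notebook's defaults, not confirmed service limits, so verify them against the current model documentation.

```python
def check_generation_config(model_input):
    """Return a list of human-readable problems found in the payload.

    Hypothetical helper; the expected values are assumptions, not confirmed limits.
    """
    config = model_input.get("videoGenerationConfig", {})
    problems = []
    # Assumed values, mirroring the defaults used in this notebook.
    expected = {"durationSeconds": 6, "fps": 24, "dimension": "1280x720"}
    for key, value in expected.items():
        if config.get(key) != value:
            problems.append(f"{key} is {config.get(key)!r}, expected {value!r}")
    seed = config.get("seed")
    if not isinstance(seed, int) or seed < 0:
        problems.append(f"seed must be a non-negative integer, got {seed!r}")
    return problems
```

An empty list means the payload matches the assumed configuration; any entries describe what differs.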

Now we'll build the model input payload using our prompt and reference image:

In [None]:
# Build the model input payload
model_input = build_model_input(PROMPT, input_image_base64)
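
Printing the payload directly would dump the entire Base64 image string into the notebook output. The sketch below shows one way to inspect the structure with the image data shortened. The helper name `summarize_payload` is hypothetical, and it assumes the payload layout produced by `build_model_input` above.

```python
import copy
import json


def summarize_payload(model_input, max_chars=32):
    """Return the payload as indented JSON with long image strings shortened."""
    preview = copy.deepcopy(model_input)  # leave the real payload untouched
    for image in preview.get("textToVideoParams", {}).get("images", []):
        data = image["source"]["bytes"]
        if len(data) > max_chars:
            image["source"]["bytes"] = f"{data[:max_chars]}... ({len(data)} chars)"
    return json.dumps(preview, indent=2)
```

For example, `print(summarize_payload(model_input))` shows the task type, prompt, and generation settings alongside a short preview of the image data.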

## Generating the Video

Now we'll use the functions imported from the previous notebook to generate our video. The process involves three main steps:

1. **Starting the Video Generation Job**: Submitting our payload to the Bedrock API
2. **Monitoring the Job Status**: Waiting for the video generation to complete
3. **Downloading and Displaying the Result**: Retrieving the video from S3 and displaying it in the notebook

<div class="alert alert-warning">
<b>Important:</b> The following cells will start the video generation process, which may take several minutes to complete. Make sure you're using the conda_python3 environment to avoid any issues with the Bedrock API.
</div>

In [None]:
# Start video generation.
# If start_async_invoke fails here, make sure the conda_python3 environment is selected in the notebook.
invocation = start_video_generation(bedrock_runtime, BUCKET, model_input)
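
For readers who have not seen the previous notebook, the sketch below illustrates roughly what a job submission might look like with the Bedrock Runtime `start_async_invoke` call. It is not the actual body of `start_video_generation`. The function name `start_generation_sketch` is hypothetical, and the model ID `amazon.nova-reel-v1:0` and the bucket-root output location are assumptions.

```python
def start_generation_sketch(client, bucket, model_input, model_id="amazon.nova-reel-v1:0"):
    """Submit an asynchronous video generation job and return the service response.

    Hypothetical sketch: `client` is a boto3 "bedrock-runtime" client, and the
    default model ID is an assumption.
    """
    return client.start_async_invoke(
        modelId=model_id,
        modelInput=model_input,
        # The service writes the generated files under this S3 location.
        outputDataConfig={"s3OutputDataConfig": {"s3Uri": f"s3://{bucket}"}},
    )
```

The response includes an `invocationArn`, which is what the monitoring step uses to look up the job.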

Now we'll monitor the job status until it completes:

In [None]:
# Track job status
video_uri = monitor_video_generation(bedrock_runtime, invocation["invocationArn"])
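
Monitoring an asynchronous job usually amounts to a polling loop. The sketch below shows one way to write such a loop with the Bedrock Runtime `get_async_invoke` call. It is not the actual body of `monitor_video_generation`. The function name `wait_for_completion_sketch`, the polling interval, and the poll limit are all hypothetical choices.

```python
import time


def wait_for_completion_sketch(client, invocation_arn, poll_seconds=15, max_polls=80):
    """Poll an async invocation until it finishes and return its S3 output URI.

    Hypothetical sketch: `client` is a boto3 "bedrock-runtime" client.
    """
    for _ in range(max_polls):
        job = client.get_async_invoke(invocationArn=invocation_arn)
        status = job["status"]
        if status == "Completed":
            return job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
        if status == "Failed":
            raise RuntimeError(job.get("failureMessage", "Video generation failed"))
        # Any other status is treated as still running; wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job did not finish after {max_polls} polls")
```

Capping the number of polls keeps a stuck job from blocking the notebook indefinitely.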

Finally, we'll download and display the generated video:

In [None]:
# Download and display the video
download_and_display_video(s3_client, BUCKET, video_uri, OUTPUT_DIR)
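
To make the download step less opaque, the sketch below shows how an S3 output URI might be split into a bucket and key and then fetched with the S3 client's `download_file` call. It is not the actual body of `download_and_display_video`. The function names are hypothetical, and the `output.mp4` file name is an assumption about how the service names the generated video.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split "s3://bucket/prefix" into (bucket, prefix)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.strip("/")


def download_video_sketch(s3_client, s3_uri, local_path, filename="output.mp4"):
    """Download the generated video and return the S3 key that was fetched.

    Hypothetical sketch: the default file name is an assumption.
    """
    bucket, prefix = split_s3_uri(s3_uri)
    key = f"{prefix}/{filename}" if prefix else filename
    s3_client.download_file(bucket, key, str(local_path))
    return key
```

Once the file is on disk, it can be shown in the notebook with `IPython.display.Video`, which this notebook already imports.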

## Results and Next Steps

<div class="alert alert-success">
<b>🎉 Congratulations!</b> You have successfully completed the image-conditioned video generation notebook!

Key accomplishments:
- ✅ Prepared a reference image for video generation
- ✅ Combined text and image inputs in the model payload
- ✅ Generated a video that maintains visual consistency with the reference image
- ✅ Downloaded and displayed the generated video
</div>

### Comparing Text-Only vs. Image-Conditioned Generation

The key difference between this notebook and the previous one is the addition of a reference image to guide the video generation process. This provides several advantages:

1. **Visual Consistency**: The generated video maintains the visual style and elements of the reference image
2. **Greater Control**: The combination of text and image inputs allows for more precise control over the generated content
3. **Subject Preservation**: Specific subjects or objects from the reference image can be preserved in the video

### Next Steps

In the next notebooks, we'll explore more advanced video generation techniques, including:

- Multi-step video generation for more complex sequences
- Inpainting techniques for modifying specific regions of videos
- Combining multiple techniques for advanced video creation

Proceed to the next notebook to continue exploring AWS Bedrock's video generation capabilities.