# Video Understanding Using Amazon Nova Models

In this notebook we will demonstrate how to use [Amazon Nova](https://aws.amazon.com/ai/generative-ai/nova/) models for the task of video understanding.

To execute the cells in this notebook you need to enable access to the following models on Bedrock:

* Amazon Nova Pro
* Amazon Nova Reel

See [Add or remove access to Amazon Bedrock foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html) to manage model access in Amazon Bedrock.

## Understanding Videos Smaller Than 25MB

The following cells analyze a video stored in our workspace.

We define the `analyze_video` function for this.

It takes the user prompt and the file name to use (without the extension; the function reads `videos/<file_name>.mov`).

It prints the model response, which contains a textual analysis answering the user request.
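As a minimal sketch of the request payload that `analyze_video` builds, the helper below assembles the `messages` list for a video passed inline as Base64. The name `build_inline_video_messages` and its parameters are hypothetical and not part of the notebook; the message layout mirrors the one used in the function defined in this section.

```python
import base64


def build_inline_video_messages(video_bytes: bytes, video_format: str, user_prompt: str) -> list:
    """Build the `messages` list for a video passed inline as Base64 (hypothetical helper)."""
    # Decode to str so that json.dumps can serialize the payload later.
    base64_string = base64.b64encode(video_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"video": {"format": video_format, "source": {"bytes": base64_string}}},
                {"text": user_prompt},
            ],
        }
    ]


# Placeholder bytes stand in for the contents of a real video file.
messages = build_inline_video_messages(b"not a real video", "mov", "Describe the video")
print(messages[0]["content"][1]["text"])
```

Keeping the payload construction separate from the model call makes it easy to inspect the request before sending it.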

In [1]:
# !pip install boto3 --upgrade # Install the latest version of boto3

In [3]:
import base64
#import boto3
from boto3 import session
import json

from sagemaker import Session, get_execution_role
from sagemaker.s3 import S3Downloader # import S3Downloader





In [95]:
AWS_REGION = "us-east-1"

boto3_session = session.Session(region_name=AWS_REGION)

sm_session = Session(boto3_session)

bedrock_runtime = boto3_session.client("bedrock-runtime")

def analyze_video(file_name:str, user_prompt:str):
    # Read the video file and encode it as a Base64 string.
    file_path = f"videos/{file_name}.mov"
    with open(file_path, "rb") as video_file:
        binary_data = video_file.read()
        base_64_encoded_data = base64.b64encode(binary_data)
        base64_string = base_64_encoded_data.decode("utf-8")

    messages = [{
            "role": "user",
            "content": [
                {
                    "video": {
                        "format": "mov",
                        "source": {"bytes": base64_string}}},
                {
                    "text": user_prompt}]}]

    # Invoke the model and extract the response body.
    response = bedrock_runtime.invoke_model(modelId='amazon.nova-pro-v1:0',
                                body=json.dumps({
                                    "messages": messages,
                                    "inferenceConfig": {
                                        "max_new_tokens": 300,
                                        "top_p": 0.1,
                                        "temperature": 0.3}}))
    model_response = json.loads(response["body"].read())
    # Pretty print the response JSON.
    #print("[Full Response]")
    print(json.dumps(model_response, indent=2))
    # Extract the text content for easy readability.
    content_text = model_response["output"]["message"]["content"][0]["text"]
    #print("\n[Response Content Text]")
    #return content_text


We use the previously defined function in the following cell:

In [96]:
file_name = "damaged_bag"
user_prompt = "You are an expert media analyst. Identify if the bag in the video is broken"
analyze_video(file_name=file_name, user_prompt=user_prompt)

{
  "output": {
    "message": {
      "content": [
        {
          "text": "The bag in the video appears to be broken. There is a noticeable tear or hole in the fabric, which suggests damage."
        }
      ],
      "role": "assistant"
    }
  },
  "stopReason": "end_turn",
  "usage": {
    "inputTokens": 2898,
    "outputTokens": 25,
    "totalTokens": 2923
  }
}
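The response printed above can be reduced to the fields that are usually of interest. The following is a sketch only: `summarize_response` is a hypothetical helper, and it assumes the response has the same shape as the JSON shown in this notebook.

```python
def summarize_response(model_response: dict) -> dict:
    """Extract the answer text, stop reason, and token usage (hypothetical helper)."""
    content_blocks = model_response["output"]["message"]["content"]
    # Join all text blocks; blocks without a "text" key contribute nothing.
    text = "".join(block.get("text", "") for block in content_blocks)
    usage = model_response.get("usage", {})
    return {
        "text": text,
        "stop_reason": model_response.get("stopReason"),
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
    }


# Sample copied from the output shown in this notebook.
sample_response = {
    "output": {
        "message": {
            "content": [
                {
                    "text": "The bag in the video appears to be broken. There is a noticeable tear or hole in the fabric, which suggests damage."
                }
            ],
            "role": "assistant",
        }
    },
    "stopReason": "end_turn",
    "usage": {"inputTokens": 2898, "outputTokens": 25, "totalTokens": 2923},
}

summary = summarize_response(sample_response)
print(summary["text"])
```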


We now use another video and ask the same question:

In [97]:
file_name = "bag"
user_prompt = "You are an expert media analyst. Identify if the bag in the video is broken"
analyze_video(file_name=file_name, user_prompt=user_prompt)

{
  "output": {
    "message": {
      "content": [
        {
          "text": "The bag in the video does not appear to be broken. It is being pulled along the ground without any visible damage or malfunction."
        }
      ],
      "role": "assistant"
    }
  },
  "stopReason": "end_turn",
  "usage": {
    "inputTokens": 2322,
    "outputTokens": 26,
    "totalTokens": 2348
  }
}


## Understanding Videos Larger Than 25MB
(Maximum size is 1GB)

The following cell defines a function that analyzes a video stored in Amazon S3.

Videos larger than 25MB must first be uploaded to an S3 bucket.

The `analyze_video_from_s3` function takes the S3 URI, the account ID of the bucket owner, and the user prompt. In the code below the `bucketOwner` field is commented out, so the account ID is accepted but not sent in the request.
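The size limits mentioned in this notebook (25MB for inline videos, 1GB maximum) suggest a simple check before choosing how to send a video. The sketch below is illustrative only: `choose_video_source` and the constants are hypothetical names, the limits are interpreted as binary units, and a file of exactly 25MB is treated as inline; both of these are assumptions, not documented behavior.

```python
# Assumption: the 25MB and 1GB limits are interpreted as binary units (MiB/GiB).
INLINE_LIMIT_BYTES = 25 * 1024 * 1024
S3_LIMIT_BYTES = 1024 ** 3


def choose_video_source(size_bytes: int) -> str:
    """Return the `source` key to use for a video of the given size (hypothetical helper)."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= INLINE_LIMIT_BYTES:
        # Small enough to embed in the request as Base64.
        return "bytes"
    if size_bytes <= S3_LIMIT_BYTES:
        # Too large to embed; reference the video by its S3 URI instead.
        return "s3Location"
    raise ValueError("video exceeds the 1GB maximum size")


print(choose_video_source(10 * 1024 * 1024))
print(choose_video_source(100 * 1024 * 1024))
```

For a local file, the size can be obtained with `os.path.getsize(path)` before calling the helper.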



In [98]:
def analyze_video_from_s3(s3_uri,bucket_owner,user_prompt):
    message_list = [
        {
            "role": "user",
            "content": [
                {
                    "video": {
                        "format": "mov",
                        "source": {
                            "s3Location": {
                                "uri": s3_uri, 
                                #"bucketOwner": bucket_owner
                            }
                        }
                    }
                },
                {
                    "text": user_prompt
                }
            ]
        }
    ]
    # Configure the inference parameters.
    inf_params = {"max_new_tokens": 300, "top_p": 0.1, "top_k": 20, "temperature": 0.3}

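    # Build the request body in the Nova native format.
    native_request = {
        "schemaVersion": "messages-v1",
        "messages": message_list,
        "inferenceConfig": inf_params,
    }
    # Invoke the model and extract the response body.
    # Reuses the bedrock_runtime client and the Nova Pro model ID from the first example.
    response = bedrock_runtime.invoke_model(modelId="amazon.nova-pro-v1:0", body=json.dumps(native_request))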
    model_response = json.loads(response["body"].read())
    # Pretty print the response JSON.
    #print("[Full Response]")
    print(json.dumps(model_response, indent=2))
    # Extract the text content for easy readability.
    content_text = model_response["output"]["message"]["content"][0]["text"]
    #print("\n[Response Content Text]")
    #print(content_text)

In the following cell we are using the previously defined function:

In [99]:
s3_uri = "s3://bedrock-video-generation-us-east-1-y5s9fj/video_understanding/nfl.mov"
bucket_owner = "912212378130"
user_prompt = "You are an expert media analyst. Summarize what is happening in the video"
analyze_video_from_s3(s3_uri=s3_uri, bucket_owner=bucket_owner, user_prompt=user_prompt)

{
  "output": {
    "message": {
      "content": [
        {
          "text": "The video depicts a football game in progress. Initially, two teams are lined up on the field, preparing for a play. The players are in their respective positions, with the offensive team ready to execute a play. The quarterback takes the snap and throws a pass, which is caught by a receiver. The receiver is then tackled by a defender, leading to a change in possession. The teams reset their formations, and the play continues with the new offensive team taking over. The video captures the dynamic and strategic nature of football, highlighting the coordination and skill required by the players."
        }
      ],
      "role": "assistant"
    }
  },
  "stopReason": "end_turn",
  "usage": {
    "inputTokens": 3490,
    "outputTokens": 115,
    "totalTokens": 3605
  }
}
