# 01_Getting Started with Amazon Nova Models

Amazon Nova is a new generation of multimodal understanding and creative content generation models that offer state-of-the-art quality, unparalleled customization, and the best price-performance. Amazon Nova models incorporate the same secure-by-design approach as all AWS services, with built-in controls for the safe and responsible use of AI.

Amazon Nova has two categories of models:
- **Understanding models**: These models reason over several input modalities, including text, video, and image, and output text.
- **Creative Content Generation models**: These models generate images or videos based on a text or image prompt.
  
### Amazon Nova Models at a Glance
![media/model_intro.png](media/model_intro.png)

**Multimodal Understanding Models**
- **Amazon Nova Micro**: Lightning-fast, cost-effective text-only model
- **Amazon Nova Lite**: Fast, affordable multimodal FM for general intelligence tasks
- **Amazon Nova 2.0 Lite**: Fastest, agentic-forward reasoning FM in the industry for its intelligence tier
- **Amazon Nova Pro**: The fastest, most cost-effective, state-of-the-art multimodal model in the industry
- **Amazon Nova Premier**: Most capable multimodal model for complex tasks and the best teacher for distilling custom models for cost-effective applications

**Creative Content Generation Models**
- **Amazon Nova Canvas**: State-of-the-art image generation model
- **Amazon Nova Reel**: State-of-the-art video generation model


The following notebooks focus primarily on the Amazon Nova understanding models.

**Amazon Nova multimodal understanding** foundation models (FMs) are a family of models that reason over several input modalities, including text, video, documents, and images, and output text. You can access these models through the Bedrock Converse API and the InvokeModel API.


## 2 When to Use What?

### 2.1 When to Use Amazon Nova Micro 1.0 Model

Amazon Nova Micro (Text Input Only) is the fastest and most affordable option, optimized for large-scale, latency-sensitive deployments like conversational interfaces, chats, and high-volume tasks, such as classification, routing, entity extraction, and document summarization.

### 2.2 When to Use Amazon Nova Lite 2.0 Model

Amazon Nova Lite balances intelligence, latency, and cost-effectiveness. It is optimized for complex scenarios where low latency is crucial, such as interactive agents that need to orchestrate multiple tool calls simultaneously. Amazon Nova Lite 2 supports a larger context window, image understanding, video understanding, and text inputs and outputs.

### 2.3 When to Use Amazon Nova Pro 1.0 Model
Amazon Nova Pro is designed for highly complex use cases requiring advanced reasoning, creativity, and code generation. Amazon Nova Pro accepts image, video, and text inputs and outputs text.


### 2.4 When to Use Amazon Nova Premier 1.0 Model
Nova Premier is our best model for complex tasks like software development, multi-step function calling, and orchestrating multi-agent workflows. It is also our most capable teacher model and can be used with Amazon Bedrock Model Distillation to create custom distilled models for specific needs.


---

**In this notebook, we will explore the Nova Lite 2.0 model.**

### Prerequisites

Run the cells in this section to install the required packages. ⚠️ You may see pip dependency errors; you can safely ignore them. ⚠️

_IGNORE ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts._


In [None]:
%pip install -q --force-reinstall \
    "botocore>=1.40.26" \
    "awscli>=1.29.57" \
    "requests" \
    "boto3>=1.40.26"

In [None]:
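# Restart the kernel so the freshly installed packages are loaded.
# Note: this JavaScript call relies on the classic Jupyter Notebook front end;
# in other environments, restart the kernel from the menu instead.
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")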

In [None]:
import boto3
import json
import base64
from datetime import datetime
from botocore.config import Config
from pprint import pprint

LITE_MODEL_ID = "us.amazon.nova-2-lite-v1:0"

# Create a Bedrock Runtime client
client = boto3.client("bedrock-runtime", 
                      region_name="us-east-1", 
                      config=Config(read_timeout=10000))


### InvokeModel body and output

The `invoke_model()` method of the Amazon Bedrock Runtime client (InvokeModel API) is the primary method we use for most text generation and processing tasks.

Although the method is shared across models, the input and output formats vary by foundation model. The request format for Amazon Nova is described below:


```python
{
  "system": [
    {
      "text": string
    }
  ],
  "messages": [
    {
      "role": "user", # first turn should always be the user turn
      "content": [
        {
          "text": string
        },
        {
          "image": {
            "format": "jpeg"| "png" | "gif" | "webp",
            "source": {
              # source can be an S3 location or base64 bytes, depending on the size of the input file
              "s3Location": {
                "uri": string, # example: s3://my-bucket/object-key
                "bucketOwner": string # (Optional) example: 123456789012
              },
              "bytes": "base64EncodedImageDataHere..." # base64-encoded binary
            }
          }
        },
        {
          "video": {
            "format": "mkv" | "mov" | "mp4" | "webm" | "three_gp" | "flv" | "mpeg" | "mpg" | "wmv",
            "source": {
              # source can be an S3 location or base64 bytes, depending on the size of the input file
              "s3Location": {
                "uri": string, # example: s3://my-bucket/object-key
                "bucketOwner": string # (Optional) example: 123456789012
              },
              "bytes": "base64EncodedImageDataHere..." # base64-encoded binary
            }
          }
        }]}, 
      {
      "role": "assistant",
      "content": [
        {
          "text": string # prefilling assistant turn
        }
      ]
    }
  ],
  "inferenceConfig": { # all Optional; key names match the code cells in this notebook
    "maxTokens": int, # greater than 0, equal or less than 5k (default: dynamic*)
    "temperature": float, # greater than 0 and less than 1.0 (default: 0.7)
    "topP": float, # greater than 0, equal or less than 1.0 (default: 0.9)
    "topK": int, # 0 or greater (default: 50)
    "stopSequences": [string]
  },
  "additionalModelRequestFields": { # This section ONLY applies to Nova Lite 2 model
    "reasoningConfig": {
      "type": "enabled",
      "maxReasoningEffort": "low" # Supported: low, medium and high 
    },
  "toolConfig": { #  all Optional
        "tools": [
                {
                    "toolSpec": {
                        "name": string, # meaningful tool name (Max char: 64)
                        "description": string, # meaningful description of the tool
                        "inputSchema": {
                            "json": { # The JSON schema for the tool. For more information, see JSON Schema Reference
                                "type": "object",
                                "properties": {
                                    <args>: { # arguments 
                                        "type": string, # argument data type
                                        "description": string # meaningful description
                                    }
                                },
                                "required": [
                                    string # args
                                ]
                            }
                        }
                    }
                }
            ],
   "toolChoice": "auto" | "tool" | "any"
        }
    }
}
```
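As a concrete instance of the schema above, the sketch below builds a minimal text-only request body as a Python dictionary. The helper name `build_text_request` and its default values are illustrative assumptions; the `maxTokens`, `topP`, and `temperature` keys follow the names used in this notebook's code cells.

```python
import json


def build_text_request(prompt, system_prompt=None, max_tokens=512,
                       temperature=0.7, top_p=0.9):
    """Build a minimal text-only request body (hypothetical helper)."""
    body = {
        # The first turn should always be the user turn.
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "maxTokens": max_tokens,
            "temperature": temperature,
            "topP": top_p,
        },
    }
    # The system prompt is optional, so only include it when provided.
    if system_prompt:
        body["system"] = [{"text": system_prompt}]
    return body


request_body = build_text_request("Summarize this paragraph.",
                                  system_prompt="You are a concise assistant.")
print(json.dumps(request_body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and passed as the `body` argument of `invoke_model()`.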

The request parameters are described below.

* `system` – (Optional) The system prompt for the request.
    A system prompt is a way of providing context and instructions to Amazon Nova, such as specifying a particular goal or role.
* `messages` – (Required) The input messages.
    * `role` – (Required) The role of the conversation turn. Valid values are `user` and `assistant`.
    * `content` – (Required) The content of the conversation turn, as a list of content blocks.
        * `type` – (Required) The type of the content, expressed as the key of each content block in the schema above. Valid values are `text`, `image`, and `video`.
            * If text is chosen (text content)
                * `text` – The text of the conversation turn.
            * If image is chosen (image content)
                * `source` – (Required) The base64-encoded image bytes, or the S3 URI and optional bucket owner, as shown in the schema above.
                * `format` – (Required) The type of the image. You can specify the following image formats:
                    * `jpeg`
                    * `png`
                    * `webp`
                    * `gif`
            * If video is chosen (video content)
                * `source` – (Required) The base64-encoded video bytes, or the S3 URI and optional bucket owner, as shown in the schema above.
                * `format` – (Required) The type of the video. You can specify the following video formats:
                    * `mkv`
                    * `mov`
                    * `mp4`
                    * `webm`
                    * `three_gp`
                    * `flv`
                    * `mpeg`
                    * `mpg`
                    * `wmv`
* `inferenceConfig` – (Optional) Inference configuration values that can be passed with the request. The key names below match the code cells in this notebook.
    * `maxTokens` – (Optional) The maximum number of tokens to generate before stopping.
        Note that Amazon Nova models might stop generating tokens before reaching the value of `maxTokens`. The maximum allowed value is 5K.
    * `temperature` – (Optional) The amount of randomness injected into the response.
    * `topP` – (Optional) Use nucleus sampling. Amazon Nova computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches the probability specified by `topP`. You should alter either `temperature` or `topP`, but not both.
    * `topK` – (Optional) Only sample from the top K options for each subsequent token. Use `topK` to remove long-tail, low-probability responses.
    * `stopSequences` – (Optional) Array of strings containing stop sequences. If the model generates any of those strings, generation stops and the response is returned up to that point.
* `toolConfig` – (Optional) JSON object following the ToolConfig schema, containing the tool specification and tool choice. This is the same schema used by the Converse API.
* `additionalModelRequestFields` – Additional values that should be passed in inference requests for the Nova Lite 2 model.
    * `reasoningConfig` – (Required) Controls how the model reasons about a task before answering.
        * `type` – (Required) Set to `"enabled"`.
        * `maxReasoningEffort` – (Required) Allowed values are `"low"`, `"medium"`, and `"high"`.
          Choose `"high"` for complex tasks and `"low"` for less complicated tasks.
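To make the nucleus-sampling description above concrete, here is a toy sketch of top-p filtering over a made-up token distribution. The function name and the example probabilities are illustrative assumptions; this shows the general technique only, not how the service implements it internally.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = {}
    cumulative = 0.0
    for token, probability in ranked:
        kept[token] = probability
        cumulative += probability
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(kept.values())
    return {token: probability / total for token, probability in kept.items()}


toy_probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
filtered = nucleus_filter(toy_probs, top_p=0.9)
print(filtered)
```

With these toy numbers, the low-probability tail token is removed before sampling, which is the effect the parameter is meant to have.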




### 3. Text Understanding
The examples below demonstrate text understanding capabilities.
Note: The examples below use Nova Lite for illustrative purposes.

In [None]:
# 
# Utility functions to make synchronous and streaming calls to the Nova models
# 

def sync_nova_2_model_invocation(client, modelId, system_list, message_list, inf_params):
    # Invoke the model synchronously and return the parsed response body.
    native_request = {
        "messages": message_list,
        "system": system_list,
        "inferenceConfig": inf_params,
    }

    response = client.invoke_model(modelId=modelId, body=json.dumps(native_request))
    request_id = response["ResponseMetadata"]["RequestId"]
    print(f"Request ID: {request_id}")
    # Read and decode the response body, then parse it as JSON
    response_body = response["body"].read().decode("utf-8")
    json_output = json.loads(response_body)
    return json_output



def async_nova_2_model_invocation(client, modelId, system_list, message_list, inf_params):
    # Despite its name, this helper makes a blocking call and prints the response as it streams in.
    time_to_first_token = None
    native_request = {
        "messages": message_list,
        "system": system_list,
        "inferenceConfig": inf_params,
    }
    start_time = datetime.now()
    # Invoke the model with the response stream
    response = client.invoke_model_with_response_stream(modelId=modelId, body=json.dumps(native_request))
    request_id = response.get("ResponseMetadata").get("RequestId")
    print(f"Request ID: {request_id}")
    print("Awaiting first token...")
    chunk_count = 0
    # Process the response stream
    stream = response.get("body")
    if stream:
        for event in stream:
            content = event["chunk"]["bytes"].decode("utf-8")
            content_json = json.loads(content)
            # Only contentBlockDelta events carry generated output; skip all other events
            content_block_delta = content_json.get("contentBlockDelta")
            if content_block_delta:
                if time_to_first_token is None:
                    time_to_first_token = datetime.now() - start_time
                    print(f"Time to first token: {time_to_first_token}")
                chunk_count += 1
                # A delta may lack a "text" field, so guard against printing "None"
                text = content_block_delta.get("delta", {}).get("text")
                if text:
                    print(text, end="")
    else:
        print("No response stream received.")
    return chunk_count, time_to_first_token
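
The event handling above can be exercised without calling the service. The sketch below assumes the same event shape the loop reads (`event["chunk"]["bytes"]` holding JSON with an optional `contentBlockDelta`). The fake events, the `reasoningContent` delta, and the helper name `collect_stream_text` are made up for illustration.

```python
import json


def collect_stream_text(events):
    """Concatenate the text deltas from an iterable of stream events."""
    pieces = []
    for event in events:
        payload = json.loads(event["chunk"]["bytes"].decode("utf-8"))
        delta = payload.get("contentBlockDelta", {}).get("delta", {})
        text = delta.get("text")
        if text:  # skip events and deltas that carry no text
            pieces.append(text)
    return "".join(pieces)


def _fake_event(payload):
    # Wrap a payload the way the streaming loop above expects to receive it.
    return {"chunk": {"bytes": json.dumps(payload).encode("utf-8")}}


fake_events = [
    _fake_event({"messageStart": {"role": "assistant"}}),
    # Hypothetical non-text delta, included to show that it is skipped.
    _fake_event({"contentBlockDelta": {"delta": {"reasoningContent": {"text": "thinking"}}}}),
    _fake_event({"contentBlockDelta": {"delta": {"text": "Hello, "}}}),
    _fake_event({"contentBlockDelta": {"delta": {"text": "camper!"}}}),
]
streamed_text = collect_stream_text(fake_events)
print(streamed_text)
```

Collecting the pieces into a string, rather than only printing them, makes the streamed output available for later processing.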

In [None]:

#
# Utility function to download contents of a web page
# 

import requests

def get_raw_html(url):
    """
    Fetches the raw HTML content of a given URL.

    Args:
        url (str): The URL of the web page.

    Returns:
        str: The raw HTML content of the page, or None if an error occurs.
    """
    try:
        # Set a timeout so a stalled server cannot hang the notebook indefinitely
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # Raise an exception for bad status codes
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error fetching URL {url}: {e}")
        return None
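
Raw HTML carries markup, scripts, and styles that consume input tokens without adding meaning. As an optional preprocessing step, the sketch below reduces a page to its visible text using only the standard library. The `html_to_text` helper and the sample markup are assumptions for illustration and are not part of the original notebook.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


sample_html = ("<html><body><h1>Letter</h1><script>var x = 1;</script>"
               "<p>Dear   shareholders</p></body></html>")
print(html_to_text(sample_html))
```

One possible use is `html_to_text(get_raw_html(url))`, after checking that the download did not return `None`.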


#### 3.1 Invoke_model() API Call

The example below demonstrates how to use a text-based prompt with the invoke_model API.

In [None]:
modelId = LITE_MODEL_ID

# Define your system prompt(s).
system_list = [
    { "text": "You are an expert in summarizing content from long text provided as context. You do not reference any external source to answer the question." }
]

letter_text = get_raw_html("https://www.aboutamazon.com/news/company-news/amazon-ceo-andy-jassy-2024-letter-to-shareholders")

# Define one or more messages using the "user" and "assistant" roles.
message_list = [
    {"role": "user", "content": [{"text": f"""List the "new whys" discussed in the text below
                                             {letter_text}
                                             """}]},
]

# Configure the inference parameters.
inf_params = {
    "maxTokens": 2048, 
    "topP": 0.9, 
    "temperature": 0.7,
    "reasoningConfig": {
        "type": "enabled",
        "maxReasoningEffort": "low"
    }
} 

json_output = sync_nova_2_model_invocation(client=client, 
                                            modelId=modelId,
                                            system_list=system_list,
                                            message_list=message_list,
                                            inf_params=inf_params)
print(json_output['output']['message']['content'][1]['text'])
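
The cell above reads the answer from `content[1]`, presumably because the first content block holds the reasoning output when reasoning is enabled. A position-independent alternative is to pick out the blocks that contain a `text` key. The sketch below assumes the response shape used above (`output` → `message` → `content`); the sample response, including the shape of its reasoning block, is hypothetical.

```python
def extract_text(json_output):
    """Join the text of every content block that has a 'text' key."""
    content = json_output["output"]["message"]["content"]
    return "\n".join(block["text"] for block in content if "text" in block)


# Hypothetical response; the reasoning block's structure is a placeholder.
sample_output = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"reasoningContent": {"reasoningText": {"text": "[placeholder]"}}},
                {"text": "Here are the points."},
            ],
        }
    }
}
print(extract_text(sample_output))
```

This keeps working if the reasoning block is absent, in which case indexing with `[1]` would raise an `IndexError`.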


#### 3.2 Streaming Invocation API call

The example below demonstrates how to use a text-based prompt with the streaming API.

In [None]:

# Define your system prompt(s).
system_list = [
    { "text": "Act as a creative writing assistant. When the user provides you with a topic, write a short story about that topic in less than 200 words" }
]

# Define one or more messages using the "user" and "assistant" roles.
message_list = [{"role": "user", "content": [{"text": "A camping trip"}]}]


# Configure the inference parameters.    
inf_params = {
    "maxTokens": 1024, 
    "topP": 0.9, 
    "temperature": 0.7,
    "reasoningConfig": {
        "type": "enabled",
        "maxReasoningEffort": "low"
    }
} 

# Invoke the model with the response stream
total_chunks, time_to_first_token = async_nova_2_model_invocation(client=client,
                                            modelId=modelId, 
                                            system_list=system_list, 
                                            message_list=message_list, 
                                            inf_params=inf_params)
print(f"\n\nTime to first token: {time_to_first_token}")
print(f"\n\nTotal chunks: {total_chunks}")




### 4. Multimodal Understanding 

The following examples show how to pass various media types to the model.
Amazon Nova models allow you to include multiple images in the payload, as long as the total payload size does not exceed 25MB. Alternatively, you can specify an Amazon S3 URI for image understanding, which lets you use larger images and multiple images without being constrained by the payload size limit. Amazon Nova models can analyze the passed images to answer questions, classify an image, and summarize images based on the provided instructions.
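
The choice between inline bytes and an S3 URI can be sketched as a simple size check. Base64 encoding grows data by roughly a third, so the sketch compares the encoded size against the limit. It assumes the 25MB limit means 25 × 1024 × 1024 bytes and leaves no headroom for the rest of the request; the helper names and the placeholder S3 URI are illustrative.

```python
import base64
import os
import tempfile

MAX_PAYLOAD_BYTES = 25 * 1024 * 1024  # 25MB limit from the text; binary megabytes assumed


def base64_size(num_bytes):
    """Length of the Base64 encoding (with padding) of num_bytes of data."""
    return 4 * ((num_bytes + 2) // 3)


def choose_image_source(path, s3_uri=None, max_bytes=MAX_PAYLOAD_BYTES):
    """Return an inline 'bytes' source if the encoded file fits, else an S3 source."""
    if base64_size(os.path.getsize(path)) <= max_bytes:
        with open(path, "rb") as image_file:
            return {"bytes": base64.b64encode(image_file.read()).decode("utf-8")}
    if s3_uri is None:
        raise ValueError("File is too large to inline; upload it to S3 and pass s3_uri.")
    return {"s3Location": {"uri": s3_uri}}


# Demo with a tiny placeholder file (not a real image) and an artificially small limit.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"0123456789")
small_source = choose_image_source(tmp.name)
large_source = choose_image_source(tmp.name, s3_uri="s3://my-bucket/object-key", max_bytes=8)
os.remove(tmp.name)
print(list(small_source), list(large_source))
```

The returned dictionary is meant to be used as the `source` value of an image content block.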


#### 4.1 Image Understanding

Let's see how the Nova model does on image understanding use cases.


![A Sunset Image](media/kites_and_plane.png)

In [None]:


# Open the image you'd like to use and encode it as a Base64 string.
with open("media/kites_and_plane.png", "rb") as image_file:
    binary_data = image_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    base64_string = base_64_encoded_data.decode("utf-8")

# Define your system prompt(s).
system_list = [
    { "text": "You are an expert artist. When the user provides you with an image, provide 3 potential art titles" }
]

# Define a "user" message including both the image and a text prompt.
message_list = [
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",
                    "source": {"bytes": base64_string},
                }
            },
            {"text": "Identify the main objects in the image and provide art titles for this image."},
        ],
    }
]

# Configure the inference parameters.
inf_params = {
    "maxTokens": 1000, 
    "topP": 0.9, 
    "temperature": 0.7,
    "reasoningConfig": {
        "type": "enabled",
        "maxReasoningEffort": "low"
    }
} 

json_output = sync_nova_2_model_invocation(client=client, 
                                            modelId=modelId,
                                            system_list=system_list,
                                            message_list=message_list,
                                            inf_params=inf_params)
print(json_output['output']['message']['content'][1]['text'])


In [None]:
## Another example of image understanding 

# Open the image you'd like to use and encode it as a Base64 string.
with open("media/nutritional_benifits.png", "rb") as image_file:
    binary_data = image_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    base64_string = base_64_encoded_data.decode("utf-8")

# Define your system prompt(s).
system_list = [
    { "text": "You are an expert in extracting text from an image" }
]

# Define a "user" message including both the image and a text prompt.
message_list = [
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",
                    "source": {"bytes": base64_string},
                }
            },
            {"text": "Read the text from the image and list the contents as a json payload"},
        ],
    }
]

# Configure the inference parameters.
inf_params = {
    "maxTokens": 10000, 
    "topP": 0.9, 
    "temperature": 0.7,
    "reasoningConfig": {
        "type": "enabled",
        "maxReasoningEffort": "medium"
    }
} 

json_output = sync_nova_2_model_invocation(client=client, 
                                            modelId=modelId,
                                            system_list=system_list,
                                            message_list=message_list,
                                            inf_params=inf_params)
print(json_output['output']['message']['content'][1]['text'])

#### 4.2 Multi-Image Understanding

A single message can contain multiple images. In this example, we ask the model what two images have in common:

![](media/kites_and_plane.png)

![](media/kites_and_plane2.png)

In [None]:

# Open the images you'd like to use and encode each one as a Base64 string.
with open("media/kites_and_plane.png", "rb") as image_file:
    binary_data = image_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    image1_base64_string = base_64_encoded_data.decode("utf-8")

with open("media/kites_and_plane2.png", "rb") as image_file:
    binary_data = image_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    image2_base64_string = base_64_encoded_data.decode("utf-8")

# Define your system prompt(s).
system_list = [
    { "text": "You are an expert artist and very good at identifying objects in an image." }
]


# Define a "user" message including both the image and a text prompt.
message_list = [
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",
                    "source": {"bytes": image1_base64_string},
                }
            },
            {
                "image": {
                    "format": "png",
                    "source": {"bytes": image2_base64_string},
                }
            },
            {"text": "What do these two images have in common?"},
        ],
    }
]

# Configure the inference parameters.
inf_params = {
    "maxTokens": 1000, 
    "topP": 0.9, 
    "temperature": 0.7,
    "reasoningConfig": {
        "type": "enabled",
        "maxReasoningEffort": "low"
    }
} 


json_output = sync_nova_2_model_invocation(client=client, 
                                            modelId=modelId,
                                            system_list=system_list,
                                            message_list=message_list,
                                            inf_params=inf_params)
print(json_output['output']['message']['content'][1]['text'])
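
The encode-and-append pattern above repeats for each image, so it can be folded into a small helper. The sketch below is one possible shape, not part of the original notebook: `image_block` and `multi_image_message` are hypothetical names, and treating `.jpg` files as `jpeg` is an assumption.

```python
import base64
import tempfile
from pathlib import Path

SUPPORTED_IMAGE_FORMATS = {"jpeg", "png", "gif", "webp"}  # formats listed in the schema


def image_block(path):
    """Build one image content block from a local file."""
    suffix = Path(path).suffix.lower().lstrip(".")
    image_format = "jpeg" if suffix == "jpg" else suffix  # assumption: .jpg means JPEG
    if image_format not in SUPPORTED_IMAGE_FORMATS:
        raise ValueError(f"Unsupported image format: {suffix!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return {"image": {"format": image_format, "source": {"bytes": encoded}}}


def multi_image_message(paths, prompt):
    """Build a single user message holding several images followed by a text prompt."""
    content = [image_block(path) for path in paths]
    content.append({"text": prompt})
    return [{"role": "user", "content": content}]


# Demo with placeholder files (not real images) to show the resulting structure.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_paths = []
    for name in ("first.png", "second.jpg"):
        demo_path = Path(tmp_dir) / name
        demo_path.write_bytes(b"placeholder")
        demo_paths.append(demo_path)
    demo_messages = multi_image_message(demo_paths, "What do these two images have in common?")

print([list(block) for block in demo_messages[0]["content"]])
```

The result has the same structure as the `message_list` defined in the cell above.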

#### 4.3 Image Understanding using S3 Path

Replace the S3 URI below with the S3 URI where your image is located.

In [None]:

# Define your system prompt(s).
system_list = [
    { 
        "text": "You are an expert artist. When the user provides you with an image, provide 3 potential art titles"
    }
]

# Define a "user" message including both the image and a text prompt.
message_list = [
    {
        "role": "user",
        "content": [
            {
                "image": {
                    "format": "png",
                    "source": {
                        "s3Location": {
                            # Replace the S3 URI
                            "uri": "s3://s3-demo-bucket-nova-2/bluesky.png"
                        }
                    },
                }
            },
            {"text": "Provide 3 potential art titles for this image."},
        ],
    }
]

# Configure the inference parameters.
inf_params = {
    "maxTokens": 1000, 
    "topP": 0.9, 
    "temperature": 0.7,
    "reasoningConfig": {
        "type": "enabled",
        "maxReasoningEffort": "low"
    }
} 


json_output = sync_nova_2_model_invocation(client=client, 
                                            modelId=modelId,
                                            system_list=system_list,
                                            message_list=message_list,
                                            inf_params=inf_params)
print(json_output['output']['message']['content'][1]['text'])
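
Because the URI has to be edited by hand, a small check can catch a malformed value before the request is sent. The sketch below validates the `s3://bucket/key` form and adds the optional `bucketOwner` field from the schema. The helper name `s3_image_block` is hypothetical, and the bucket and account values are the placeholder examples from the schema comments.

```python
from urllib.parse import urlparse


def s3_image_block(uri, image_format, bucket_owner=None):
    """Build an image content block that points at an object in Amazon S3."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Expected an s3://bucket/key URI, got {uri!r}")
    location = {"uri": uri}
    if bucket_owner is not None:
        location["bucketOwner"] = bucket_owner  # optional per the schema
    return {"image": {"format": image_format, "source": {"s3Location": location}}}


s3_block = s3_image_block("s3://my-bucket/object-key", "png", bucket_owner="123456789012")
print(s3_block)
```

The check only covers the form of the URI; it does not confirm that the object exists or that the caller has permission to read it.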

### 5 Video Understanding

Amazon Nova models allow you to include a single video in the payload, provided either as base64 or through an Amazon S3 URI. With the base64 method, the overall payload size must remain within 25MB. With an Amazon S3 URI, you can use longer videos (up to 1GB in size) without being constrained by the payload size limit. Amazon Nova models can analyze the passed video to answer questions, classify a video, and summarize information in the video based on the provided instructions.

#### 5.1 Video Understanding using local file Path

In [None]:
from IPython.display import Video

Video("media/ducks_in_pond.mp4")

In [None]:

# Open the video you'd like to use and encode it as a Base64 string.
with open("media/ducks_in_pond.mp4", "rb") as video_file:
    binary_data = video_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    base64_string = base_64_encoded_data.decode("utf-8")

# Define your system prompt(s).
system_list = [
    { "text": "You are an expert media analyst. When the user provides you with a video, you identify objects of interest in the video" }
]

# Define a "user" message including both the video and a text prompt.
message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {"bytes": base64_string},
                }
            },
            {"text": "Identify and describe the objects of interest in less than 300 words."},
        ],
    }
]

# Configure the inference parameters.
inf_params = {
    "maxTokens": 2048, 
    "topP": 0.9, 
    "temperature": 0.7,
    "reasoningConfig": {
        "type": "enabled",
        "maxReasoningEffort": "low"
    }
} 

json_output = sync_nova_2_model_invocation(client=client, 
                                            modelId=modelId,
                                            system_list=system_list,
                                            message_list=message_list,
                                            inf_params=inf_params)
print(json_output['output']['message']['content'][1]['text'])
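
The cell above hard-codes `"format": "mp4"`. When the file name varies, the `format` value can be derived from the extension. The sketch below maps extensions to the format values listed in the schema; the helper name is hypothetical, and mapping `.3gp` to `three_gp` is an assumption based on that format's name.

```python
from pathlib import Path

# Extension -> "format" value, using the video formats listed in the schema.
VIDEO_FORMATS = {
    ".mkv": "mkv",
    ".mov": "mov",
    ".mp4": "mp4",
    ".webm": "webm",
    ".3gp": "three_gp",  # assumption: .3gp files use the three_gp format value
    ".flv": "flv",
    ".mpeg": "mpeg",
    ".mpg": "mpg",
    ".wmv": "wmv",
}


def video_format_from_path(path):
    """Return the schema 'format' value for a video file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in VIDEO_FORMATS:
        raise ValueError(f"Unsupported video extension: {suffix!r}")
    return VIDEO_FORMATS[suffix]


print(video_format_from_path("media/ducks_in_pond.mp4"))
```

Failing early on an unsupported extension avoids sending a request that the service would reject.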

#### 5.2 Video Understanding using S3 Path
Replace the S3 URI below with the S3 URI where your video is located.

In [None]:

# Define your system prompt(s).
system_list = [
    { "text": "You are an expert media analyst. When the user provides you with a video, provide 3 potential video titles" }
]

# Define a "user" message including both the video and a text prompt.
message_list = [
    {
        "role": "user",
        "content": [
            {
                "video": {
                    "format": "mp4",
                    "source": {
                        "s3Location": {
                            # Replace the S3 URI
                            "uri": "s3://s3-demo-bucket-nova-2/the-sea.mp4"
                        }
                    },
                }
            },
            {"text": "Provide video titles for this clip."},
        ],
    }
]

# Configure the inference parameters.
inf_params = {
    "maxTokens": 1000, 
    "topP": 0.9, 
    "temperature": 0.7,
    "reasoningConfig": {
        "type": "enabled",
        "maxReasoningEffort": "low"
    }
} 

native_request = {
    "messages": message_list,
    "system": system_list,
    "inferenceConfig": inf_params,
}

json_output = sync_nova_2_model_invocation(client=client, 
                                            modelId=modelId,
                                            system_list=system_list,
                                            message_list=message_list,
                                            inf_params=inf_params)
print(json_output['output']['message']['content'][1]['text'])