# Let us implement an in-vehicle agent to assist drivers based on user manuals

Amazon Nova is a new generation of multimodal understanding and creative content generation models that offer state-of-the-art quality, unparalleled customization, and the best price-performance. Amazon Nova models incorporate the same secure-by-design approach as all AWS services, with built-in controls for the safe and responsible use of AI.

Amazon Nova has two categories of models:
- **Understanding models**: These models can reason over several input modalities, including text, video, and image, and output text.
- **Creative Content Generation models**: These models generate images or videos based on a text or image prompt.
  
### Amazon Nova Models at a Glance

**Multimodal Understanding Models**
- **Amazon Nova Micro**: Lightning-fast, cost-effective text-only model
- **Amazon Nova Lite**: Fastest, most affordable multimodal FM in the industry for its intelligence tier
- **Amazon Nova Pro**: The fastest, most cost-effective, state-of-the-art multimodal model in the industry

**Creative Content Generation Models**
- **Amazon Nova Canvas**: State-of-the-art image generation model
- **Amazon Nova Reel**: State-of-the-art video generation model


This workshop focuses primarily on the Amazon Nova understanding models.

**Amazon Nova Multimodal understanding** foundation models (FMs) are a family of models that are capable of reasoning over several input modalities, including text, video, documents and/or images, and output text. You can access these models through the Bedrock Converse API and InvokeModel API.

---

## 1. Setup

**Step 1: Gain Access to the Model**: If you have not yet requested model access in Bedrock, [request access following these instructions](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html).


---
## 2 When to Use What?

## 2.1 When to Use Amazon Nova Micro Model

Amazon Nova Micro (Text Input Only) is the fastest and most affordable option, optimized for large-scale, latency-sensitive deployments like conversational interfaces, chats, and high-volume tasks, such as classification, routing, entity extraction, and document summarization.

## 2.2 When to Use Amazon Nova Lite Model

Amazon Nova Lite balances intelligence, latency, and cost-effectiveness. It’s optimized for complex scenarios where low latency (minimal delay) is crucial, such as interactive agents that need to orchestrate multiple tool calls simultaneously. Amazon Nova Lite supports image, video, and text inputs and outputs text. 

## 2.3 When to Use Amazon Nova Pro Model
Amazon Nova Pro is designed for highly complex use cases requiring advanced reasoning, creativity, and code generation. Amazon Nova Pro supports image, video, and text inputs and outputs text.
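
The guidance above can be condensed into a small routing helper. This is only a minimal sketch: `choose_nova_model` and its two flags are hypothetical names, and the decision rule is an illustrative reading of the descriptions above rather than an official selection policy. The model IDs are the US inference profile IDs used later in this notebook.

```python
# Hypothetical helper (not part of any AWS SDK): maps coarse task traits
# to one of the Nova understanding models described above.
MICRO_MODEL_ID = "us.amazon.nova-micro-v1:0"
LITE_MODEL_ID = "us.amazon.nova-lite-v1:0"
PRO_MODEL_ID = "us.amazon.nova-pro-v1:0"


def choose_nova_model(has_media_input: bool, needs_advanced_reasoning: bool) -> str:
    """Pick a model ID from two coarse task traits (assumed heuristic)."""
    if needs_advanced_reasoning:
        # Highly complex use cases: reasoning, creativity, code generation.
        return PRO_MODEL_ID
    if has_media_input:
        # Micro is text-only, so image or video input needs at least Lite.
        return LITE_MODEL_ID
    # Text-only, latency-sensitive, high-volume tasks.
    return MICRO_MODEL_ID


print(choose_nova_model(has_media_input=False, needs_advanced_reasoning=False))
```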

---

## Prerequisites

Run the cells in this section to install the packages needed by the notebooks in this workshop. ⚠️ You will see pip dependency errors; you can safely ignore them. ⚠️

_IGNORE ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts._


In [24]:
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

Collecting boto3>=1.28.57
  Using cached boto3-1.36.0-py3-none-any.whl.metadata (6.6 kB)
Collecting awscli>=1.29.57
  Using cached awscli-1.37.0-py3-none-any.whl.metadata (11 kB)
Collecting botocore>=1.31.57
  Using cached botocore-1.36.0-py3-none-any.whl.metadata (5.7 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3>=1.28.57)
  Using cached jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Collecting s3transfer<0.12.0,>=0.11.0 (from boto3>=1.28.57)
  Using cached s3transfer-0.11.0-py3-none-any.whl.metadata (1.7 kB)
Collecting docutils<0.17,>=0.10 (from awscli>=1.29.57)
  Using cached docutils-0.16-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting PyYAML<6.1,>=3.10 (from awscli>=1.29.57)
  Using cached PyYAML-6.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.1 kB)
Collecting colorama<0.4.7,>=0.2.5 (from awscli>=1.29.57)
  Using cached colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting rsa<4.8,>=3.1.2 (from awscli>=1.29.57)
  Using cached rsa-4.7

In [25]:
# restart kernel
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

## Create the boto3 client

Interaction with the Bedrock API is done via the AWS SDK for Python: [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html).

#### Use different clients
boto3 provides different clients for Amazon Bedrock to perform different actions. The [`InvokeModel`](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [`InvokeModelWithResponseStream`](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) actions are supported by the Amazon Bedrock Runtime client, whereas other operations, such as [ListFoundationModels](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListFoundationModels.html), are handled by the [Amazon Bedrock client](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock.html).


#### Use the default credential chain

If you are running this notebook from [Amazon SageMaker Studio](https://aws.amazon.com/sagemaker/studio/) and your SageMaker Studio [execution role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) has permissions to access Bedrock, you can run the cells below as-is. This is also the case if you are running these notebooks from a computer whose default AWS credentials have access to Bedrock.


In [26]:
from IPython.display import display, Markdown, Latex
import base64
from datetime import datetime
import json
import os
import sys

import boto3

In [27]:
boto3_bedrock_control_plane = boto3.client('bedrock', region_name='us-west-2')
boto3_bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

In [28]:
#using us inference profile see: https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html
MICRO_MODEL_ID = "us.amazon.nova-micro-v1:0"
LITE_MODEL_ID = "us.amazon.nova-lite-v1:0"
PRO_MODEL_ID = "us.amazon.nova-pro-v1:0"
CLAUDE_SONNET_MODEL_ID="us.anthropic.claude-3-5-sonnet-20241022-v2:0"

#### Validate the connection

We can check that the client works by calling the `list_foundation_models()` method, which lists the models available to us.

In [29]:
[model['modelId'] for model in boto3_bedrock_control_plane.list_foundation_models()['modelSummaries'] if 'anthropic' in model['modelId'] or model['modelId'].startswith('amazon.nova')]

['amazon.nova-pro-v1:0',
 'amazon.nova-lite-v1:0',
 'amazon.nova-micro-v1:0',
 'anthropic.claude-3-5-sonnet-20241022-v2:0:18k',
 'anthropic.claude-3-5-sonnet-20241022-v2:0:51k',
 'anthropic.claude-3-5-sonnet-20241022-v2:0:200k',
 'anthropic.claude-3-5-sonnet-20241022-v2:0',
 'anthropic.claude-3-5-haiku-20241022-v1:0',
 'anthropic.claude-instant-v1:2:100k',
 'anthropic.claude-instant-v1',
 'anthropic.claude-v2:0:18k',
 'anthropic.claude-v2:0:100k',
 'anthropic.claude-v2:1:18k',
 'anthropic.claude-v2:1:200k',
 'anthropic.claude-v2:1',
 'anthropic.claude-v2',
 'anthropic.claude-3-sonnet-20240229-v1:0:28k',
 'anthropic.claude-3-sonnet-20240229-v1:0:200k',
 'anthropic.claude-3-sonnet-20240229-v1:0',
 'anthropic.claude-3-haiku-20240307-v1:0:48k',
 'anthropic.claude-3-haiku-20240307-v1:0:200k',
 'anthropic.claude-3-haiku-20240307-v1:0',
 'anthropic.claude-3-opus-20240229-v1:0:12k',
 'anthropic.claude-3-opus-20240229-v1:0:28k',
 'anthropic.claude-3-opus-20240229-v1:0:200k',
 'anthropic.claude-

### InvokeModel body and output

The `invoke_model()` method of the Amazon Bedrock runtime client (InvokeModel API) is the primary method we use for most text generation and processing tasks.

Although the method is shared, the format of input and output varies depending on the foundation model used - as described below:


```python
{
  "system": [
    {
      "text": string
    }
  ],
  "messages": [
    {
      "role": "user",# first turn should always be the user turn
      "content": [
        {
          "text": string
        },
        {
          "image": {
            "format": "jpeg"| "png" | "gif" | "webp",
            "source": {
              "bytes": "base64EncodedImageDataHere..."#  base64-encoded binary
            }
          }
        },
        {
          "video": {
            "format": "mkv" | "mov" | "mp4" | "webm" | "three_gp" | "flv" | "mpeg" | "mpg" | "wmv",
            "source": {
            # source can be an S3 location or base64 bytes, depending on the size of the input file.
               "s3Location": {
                "uri": string, #  example: s3://my-bucket/object-key
                "bucketOwner": string #  (Optional) example: 123456789012
               },
              "bytes": "base64EncodedVideoDataHere..." #  base64-encoded binary
            }
          }
        },
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "text": string # prefilling assistant turn
        }
      ]
    }
  ],
 "inferenceConfig":{ # all Optional
    "max_new_tokens": int, #  greater than 0, equal or less than 5k (default: dynamic*)
    "temperature": float, # greater than 0 and less than 1.0 (default: 0.7)
    "top_p": float, #  greater than 0, equal or less than 1.0 (default: 0.9)
    "top_k": int, #  0 or greater (default: 50)
    "stopSequences": [string]
  },
  "toolConfig": { #  all Optional
        "tools": [
                {
                    "toolSpec": {
                        "name": string, # meaningful tool name (Max char: 64)
                        "description": string, # meaningful description of the tool
                        "inputSchema": {
                            "json": { # The JSON schema for the tool. For more information, see JSON Schema Reference
                                "type": "object",
                                "properties": {
                                    <args>: { # arguments 
                                        "type": string, # argument data type
                                        "description": string # meaningful description
                                    }
                                },
                                "required": [
                                    string # args
                                ]
                            }
                        }
                    }
                }
            ],
        "toolChoice": "any" # Amazon Nova models ONLY support tool choice of "any"
  }
}
```

The request parameters are described below.

* `system` – (Optional) The system prompt for the request.
    A system prompt is a way of providing context and instructions to Amazon Nova, such as specifying a particular goal or role.
* `messages` – (Required) The input messages.
    * `role` – The role of the conversation turn. Valid values are user and assistant. 
    * `content` – (required) The content of the conversation turn.
        * `type` – (required) The type of the content. Valid values are `text`, `image`, and `video`.
            * If text is chosen (text content)
                * `text` – The content of the conversation turn.
            * If image is chosen (image content)
                * `source` – (required) The base64-encoded bytes of the image.
                * `format` – (required) The type of the image. You can specify the following image formats.
                    * `jpeg`
                    * `png`
                    * `webp`
                    * `gif`
            * If video is chosen (video content)
                * `source` – (required) The base64-encoded bytes of the video, or an S3 URI and bucket owner as shown in the above schema.
                * `format` – (required) The type of the video. You can specify the following video formats. 
                    * `mkv`
                    *  `mov`  
                    *  `mp4`
                    *  `webm`
                    *  `three_gp`
                    *  `flv`  
                    *  `mpeg`  
                    *  `mpg`
                    *  `wmv`
* `inferenceConfig` – (Optional) Inference configuration values that can be passed with the request.
    * `max_new_tokens` – (Optional) The maximum number of tokens to generate before stopping.
        Note that Amazon Nova models might stop generating tokens before reaching the value of `max_new_tokens`. The maximum allowed value is 5K.
    * `temperature` – (Optional) The amount of randomness injected into the response.
    * `top_p` – (Optional) Use nucleus sampling. Amazon Nova computes the cumulative distribution over all the options for each subsequent token in decreasing probability order and cuts it off once it reaches the probability specified by `top_p`. You should alter either `temperature` or `top_p`, but not both.
    * `top_k` – (Optional) Only sample from the top K options for each subsequent token. Use `top_k` to remove long-tail, low-probability responses.
    * `stopSequences` – (Optional) Array of strings containing stop sequences. If the model generates any of those strings, generation stops and the response is returned up to that point.
* `toolConfig` – (Optional) JSON object following the ToolConfig schema, containing the tool specification and tool choice. This is the same schema used by the Converse API.
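
To make the schema concrete, here is a minimal sketch that assembles an InvokeModel request body from the fields described above. The helper name `build_nova_invoke_body` and its default values are assumptions made for illustration; only the field names come from the schema in this section.

```python
import base64
import json


def build_nova_invoke_body(user_text, system_text=None, image_bytes=None,
                           image_format="png", max_new_tokens=512,
                           temperature=0.7, top_p=0.9):
    """Assemble a request body following the schema above (hypothetical helper)."""
    # The first turn is always the user turn; text comes first, media after.
    content = [{"text": user_text}]
    if image_bytes is not None:
        content.append({
            "image": {
                "format": image_format,
                # The schema above expects base64-encoded image bytes.
                "source": {"bytes": base64.b64encode(image_bytes).decode("utf-8")},
            }
        })
    body = {
        "messages": [{"role": "user", "content": content}],
        "inferenceConfig": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
            "top_p": top_p,
        },
    }
    if system_text:
        body["system"] = [{"text": system_text}]
    return body


body = build_nova_invoke_body(
    "Summarize the tire pressure section in two sentences.",
    system_text="Act as a driving manual assistant.",
)
payload = json.dumps(body)  # invoke_model expects the body as a JSON string or bytes
print(payload)
```

Assuming the `boto3_bedrock_runtime` client and `LITE_MODEL_ID` defined earlier, the payload could then be sent with `boto3_bedrock_runtime.invoke_model(modelId=LITE_MODEL_ID, body=payload)` and the result parsed with `json.loads(response["body"].read())`.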




## 3 Document Understanding (Only Applicable Using the Converse API)

The Amazon Nova models allow users to include documents in the payload through the Converse API's document support; each document is passed as raw bytes in the request.

In [30]:
manual_file="manuals/XC90_owners_manual_MY06_EN_tp8193.pdf"

In [31]:
from IPython.display import IFrame
IFrame(manual_file, width=600, height=300)

### 3.1 Split & Compress documents
Any text document (csv, xls, xlsx, html, txt, md, or doc) that you include in a Nova request must not exceed 4.5MB. All included media documents, including pdf and docx files, must not exceed 18MB in total. You can include a maximum of 5 documents. We therefore split the manual into sub-documents of a fixed page count (the function below defaults to 50 pages; this notebook uses 60) and compress them by removing annotations and reducing image quality. The reduced quality is still more than enough for Nova to interpret.
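
The limits above can be checked before building a request. The following is a minimal sketch: `check_document_limits` is a hypothetical helper, and it assumes 1MB means 1024 × 1024 bytes, which the text above does not specify.

```python
import os

# Limits taken from the paragraph above; the byte conversion is an assumption.
MAX_DOCUMENTS = 5
MAX_TEXT_DOC_BYTES = int(4.5 * 1024 * 1024)   # per text document
MAX_MEDIA_TOTAL_BYTES = 18 * 1024 * 1024      # all media documents combined
TEXT_EXTENSIONS = {".csv", ".xls", ".xlsx", ".html", ".txt", ".md", ".doc"}


def check_document_limits(paths):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(paths) > MAX_DOCUMENTS:
        problems.append(f"{len(paths)} documents exceed the maximum of {MAX_DOCUMENTS}")
    media_total = 0
    for path in paths:
        size = os.path.getsize(path)
        extension = os.path.splitext(path)[1].lower()
        if extension in TEXT_EXTENSIONS:
            if size > MAX_TEXT_DOC_BYTES:
                problems.append(f"{path}: {size} bytes exceeds the per-document text limit")
        else:
            # Everything else (pdf, docx, ...) counts toward the combined media limit.
            media_total += size
    if media_total > MAX_MEDIA_TOTAL_BYTES:
        problems.append(f"media documents total {media_total} bytes, above the combined limit")
    return problems
```

Once the chunks exist, calling `check_document_limits` on the list of chunk file names would show whether the uncompressed chunks already fit or whether the compressed versions are needed.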

In [32]:
!pip install PyPDF2



In [33]:
def split_pdf(pdf_path, doc_size_in_pages=50):
    from PyPDF2 import PdfReader, PdfWriter
    import os
    
    # Create output directory if it doesn't exist
    output_dir = 'manuals'
    os.makedirs(output_dir, exist_ok=True)
    
    # Use absolute path for reading
    pdf = PdfReader(os.path.abspath(pdf_path))
    total_pages = len(pdf.pages)
    
    # Calculate number of documents needed
    num_documents = (total_pages + doc_size_in_pages - 1) // doc_size_in_pages
    
    filenames = []
    
    for doc_num in range(num_documents):
        pdf_writer = PdfWriter()
        start_page = doc_num * doc_size_in_pages
        end_page = min(start_page + doc_size_in_pages, total_pages)
        
        # Add pages for this chunk
        for page_num in range(start_page, end_page):
            pdf_writer.add_page(pdf.pages[page_num])
        
        # Create output filename
        output_filename = f'{output_dir}/{pdf_path.split("/")[-1].split(".")[0]}_{end_page}.pdf'
        
        # Write the PDF
        with open(output_filename, 'wb') as out:
            pdf_writer.write(out)
            print(f'Created: {output_filename}')
            filenames.append(output_filename)
    
    return filenames

In [34]:
doc_size_in_pages=60
splitted_file_names=split_pdf(manual_file, doc_size_in_pages)

Created: manuals/XC90_owners_manual_MY06_EN_tp8193_60.pdf
Created: manuals/XC90_owners_manual_MY06_EN_tp8193_120.pdf
Created: manuals/XC90_owners_manual_MY06_EN_tp8193_180.pdf
Created: manuals/XC90_owners_manual_MY06_EN_tp8193_240.pdf
Created: manuals/XC90_owners_manual_MY06_EN_tp8193_254.pdf
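
The file names above follow from the ceiling division used in `split_pdf`. As a minimal sketch, the hypothetical helper `chunk_page_ranges` below reproduces only the page arithmetic, without touching any PDF. The total of 254 pages is inferred from the last file name.

```python
def chunk_page_ranges(total_pages, doc_size_in_pages):
    """Return (start, end) page index pairs, end exclusive, one per chunk."""
    # Ceiling division: a final partial chunk still counts as a document.
    num_documents = (total_pages + doc_size_in_pages - 1) // doc_size_in_pages
    ranges = []
    for doc_num in range(num_documents):
        start_page = doc_num * doc_size_in_pages
        end_page = min(start_page + doc_size_in_pages, total_pages)
        ranges.append((start_page, end_page))
    return ranges


print(chunk_page_ranges(254, 60))
# → [(0, 60), (60, 120), (120, 180), (180, 240), (240, 254)]
```

With 60 pages per chunk the manual yields exactly five documents, which matches the five-document limit of a single request.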


In [44]:
directory_path="compressed_manuals"
if not os.path.exists(directory_path):
    # Create the directory
    os.makedirs(directory_path)
    print(f"Directory created: {directory_path}")
else:
    print(f"Directory already exists: {directory_path}")

In [35]:
from PIL import Image
import io

def remove_annotations(page):
    if "/Annots" in page:
        del page["/Annots"]
    return page

def compress_images_in_pdf(page):
    if "/Resources" in page and "/XObject" in page["/Resources"]:
        xObject = page["/Resources"]["/XObject"].get_object()
        
        for obj in xObject:
            if xObject[obj]["/Subtype"] == "/Image":
                size = (xObject[obj]["/Width"], xObject[obj]["/Height"])
                data = xObject[obj].get_data()
                
                if xObject[obj].get("/ColorSpace") == "/DeviceRGB":
                    mode = "RGB"
                else:
                    mode = "P"
                
                if "/Filter" in xObject[obj]:
                    if xObject[obj]["/Filter"] == "/FlateDecode":
                        try:
                            img = Image.frombytes(mode, size, data)
                            # Convert P mode to RGB before saving as JPEG
                            if img.mode == 'P':
                                img = img.convert('RGB')
                            
                            img_compressed = io.BytesIO()
                            img.save(img_compressed, format="JPEG", quality=50, optimize=True)
                            xObject[obj]._data = img_compressed.getvalue()
                        except Exception as e:
                            print(f"Warning: Could not compress image: {str(e)}")
                            continue


def deep_compress_pdf(input_file_path, output_file_path):
    reader = PdfReader(input_file_path)
    writer = PdfWriter()

    for page in reader.pages:
        # Remove annotations
        page = remove_annotations(page)
        # Compress images
        compress_images_in_pdf(page)
        writer.add_page(page)

    writer._compress = True
    
    with open(output_file_path, "wb") as output_file:
        writer.write(output_file)


In [36]:
from PyPDF2 import PdfReader, PdfWriter
import os
compressed_file_names=[]
for i in splitted_file_names:
    deep_compress_pdf(i, "compressed_"+i)
    compressed_file_names.append("compressed_"+i)
    

In [37]:
compressed_file_names

['compressed_manuals/XC90_owners_manual_MY06_EN_tp8193_60.pdf',
 'compressed_manuals/XC90_owners_manual_MY06_EN_tp8193_120.pdf',
 'compressed_manuals/XC90_owners_manual_MY06_EN_tp8193_180.pdf',
 'compressed_manuals/XC90_owners_manual_MY06_EN_tp8193_240.pdf',
 'compressed_manuals/XC90_owners_manual_MY06_EN_tp8193_254.pdf']

### 3.2 Prepare Payload
Now we can prepare the payload containing the document chunks (at most five per request).

In [38]:
# Read each chunk into memory as raw bytes (a request accepts at most 5 documents)
doc_bytes_list = []
# Use compressed_file_names instead if any file is bigger than 4.5MB (limits might increase in the future)
for splitted_file_name in splitted_file_names[:5]:
    with open(splitted_file_name, "rb") as file:
        doc_bytes_list.append(file.read()) 

In [39]:
#Test if the partial documents are saved correctly
with open('manuals/temp.pdf', "wb") as file:
    file.write(doc_bytes_list[0])
#display temp.pdf file
IFrame("manuals/temp.pdf", width=600, height=300)

In [40]:
def build_pdf_request(doc_bytes_list, text):
    # Create document content list by iterating through doc_bytes in the list
    document_content = [
        {
            "document": {
                "format": "pdf",
                "name": f"DocumentPDFmessages{i}",
                "source": {
                    "bytes": doc_bytes
                }
            }
        }
        for i, doc_bytes in enumerate(doc_bytes_list)
    ]
    
    # Add the text question at the end
    document_content.append({
        "text": text
    })
    
    # Create the final messages structure
    messages = [{
        "role": "user",
        "content": document_content
    }]
    
    return messages

In [41]:
system_prompt=[{"text": "Act as a driving manual assistant. When the user asks a question, answer only based on the documents provided. Page numbers are located at the lower left or lower right corner of each page. Give a reference to the document section."}]

user_prompts= ["Can you switch the control display on/off?",  "How can I call from the memory?", "Where are the side airbags located?","Where is the on/off button of the audio panel located with respect to keypad of audio panel?"]


In [42]:
messages=build_pdf_request(doc_bytes_list, user_prompts[3]) 
inf_params = {"maxTokens": 1024, "topP": 0.1, "temperature": 0}

model_response = boto3_bedrock_runtime.converse(modelId=PRO_MODEL_ID, system=system_prompt,
                                 messages=messages, 
                                 inferenceConfig=inf_params)

print("\n[Response Content Text]")
print(model_response['output']['message']['content'][0]['text'])


[Response Content Text]
The on/off button of the audio panel is located above the keypad of the audio panel. (Page 199)
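
The cell above asks a single question. To run every prompt in `user_prompts`, the call could be wrapped in a loop, as in the sketch below. `ask_all` is a hypothetical helper that takes the client and the request builder as arguments so it stays independent of notebook state; the inference parameters mirror the ones used above.

```python
def ask_all(client, model_id, system_prompt, doc_bytes_list, prompts, build_request):
    """Send each prompt with the same documents and collect the text answers."""
    answers = {}
    for prompt in prompts:
        response = client.converse(
            modelId=model_id,
            system=system_prompt,
            messages=build_request(doc_bytes_list, prompt),
            inferenceConfig={"maxTokens": 1024, "topP": 0.1, "temperature": 0},
        )
        # The answer text sits in the first content block of the output message.
        answers[prompt] = response["output"]["message"]["content"][0]["text"]
    return answers


# Usage with the objects defined earlier in this notebook:
# answers = ask_all(boto3_bedrock_runtime, PRO_MODEL_ID, system_prompt,
#                   doc_bytes_list, user_prompts, build_pdf_request)
# for question, answer in answers.items():
#     print(f"Q: {question}\nA: {answer}\n")
```

Note that each call resends all document bytes, so latency and cost grow with the number of questions.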
