# Let us implement an in-vehicle agent to assist drivers based on user manuals

In this notebook, we use the multimodal understanding capabilities of Amazon Nova models to build a generative AI component that answers drivers' questions based on the owner's manual for a specific car.
  
### Amazon Nova Models at a Glance

Amazon Nova is a new generation of multimodal understanding and creative content generation models that offer state-of-the-art quality, unparalleled customization, and the best price-performance. Amazon Nova models incorporate the same secure-by-design approach as all AWS services, with built-in controls for the safe and responsible use of AI.

Amazon Nova has two categories of models:
- **Understanding models**: These models are capable of reasoning over several input modalities, including text, video, and image, and output text.
- **Creative Content Generation models**: These models generate images or videos based on a text or image prompt.

**Multimodal Understanding Models**
- **Amazon Nova Micro**: Lightning-fast, cost-effective text-only model
- **Amazon Nova Lite**: Fastest, most affordable multimodal FM in the industry for its intelligence tier
- **Amazon Nova Pro**: The fastest, most cost-effective, state-of-the-art multimodal model in the industry

**Creative Content Generation Models**
- **Amazon Nova Canvas**: State-of-the-art image generation model
- **Amazon Nova Reel**: State-of-the-art video generation model


This workshop focuses primarily on the Amazon Nova understanding models.

**Amazon Nova Multimodal understanding** foundation models (FMs) are a family of models that are capable of reasoning over several input modalities, including text, video, documents and/or images, and output text. You can access these models through the Bedrock Converse API and InvokeModel API.
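
To make the shape of a Converse request concrete, here is a minimal sketch that only assembles the keyword arguments for a text-only call. The helper name `build_text_request` and the example question are illustrative assumptions, not part of the workshop code; no request is sent.

```python
def build_text_request(model_id, prompt, system_text=None, max_tokens=512):
    """Assemble keyword arguments for a text-only Converse call (hypothetical helper)."""
    request = {
        "modelId": model_id,
        # A message has a role and a list of content blocks
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }
    # The system prompt is optional and is passed as a list of text blocks
    if system_text:
        request["system"] = [{"text": system_text}]
    return request


request = build_text_request(
    "us.amazon.nova-lite-v1:0",
    "What does the tire pressure warning light mean?",  # illustrative question
    system_text="Act as a driving manual assistant.",
)
print(request["messages"])
```

With a Bedrock Runtime client, such a dictionary could be passed as `client.converse(**request)`.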

---

## 1. Setup

**Step 1: Gain Access to the Model**: If you have not yet requested model access in Bedrock, [request access following these instructions](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html).


---
### 1.1 Install Packages

In this section, we prepare the environment.

In [1]:
%pip install --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

Collecting boto3>=1.28.57
  Using cached boto3-1.36.2-py3-none-any.whl.metadata (6.6 kB)
Collecting awscli>=1.29.57
  Using cached awscli-1.37.2-py3-none-any.whl.metadata (11 kB)
Collecting botocore>=1.31.57
  Using cached botocore-1.36.2-py3-none-any.whl.metadata (5.7 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3>=1.28.57)
  Using cached jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Collecting s3transfer<0.12.0,>=0.11.0 (from boto3>=1.28.57)
  Using cached s3transfer-0.11.1-py3-none-any.whl.metadata (1.7 kB)
Collecting docutils<0.17,>=0.10 (from awscli>=1.29.57)
  Using cached docutils-0.16-py2.py3-none-any.whl.metadata (2.7 kB)
Collecting PyYAML<6.1,>=3.10 (from awscli>=1.29.57)
  Using cached PyYAML-6.0.2-cp310-cp310-win_amd64.whl.metadata (2.1 kB)
Collecting colorama<0.4.7,>=0.2.5 (from awscli>=1.29.57)
  Using cached colorama-0.4.6-py2.py3-none-any.whl.metadata (17 kB)
Collecting rsa<4.8,>=3.1.2 (from awscli>=1.29.57)
  Using cached rsa-4.7.2-py3-none-any.whl.metadata (3.6

In [2]:
# restart kernel for changes to take effect
from IPython.display import HTML  # IPython.core.display is deprecated for this import
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

### 1.2 Create the boto3 client

Interaction with the Bedrock API is done via the AWS SDK for Python: [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html).

#### Use different clients
boto3 provides different clients for Amazon Bedrock to perform different actions. The [`InvokeModel`](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) and [`InvokeModelWithResponseStream`](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) actions are supported by the Amazon Bedrock Runtime client, whereas other operations, such as [`ListFoundationModels`](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListFoundationModels.html), are handled by the [Amazon Bedrock client](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Operations_Amazon_Bedrock.html).


#### Use the default credential chain

If you are running this notebook from [Amazon SageMaker Studio](https://aws.amazon.com/sagemaker/studio/) and your SageMaker Studio [execution role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) has permissions to access Bedrock, you can run the cells below as-is. This is also the case if you are running these notebooks from a computer whose default AWS credentials have access to Bedrock.


In [3]:
from IPython.display import display, Markdown, Latex
import base64
from datetime import datetime
import json
import os
import sys

import boto3

boto3_bedrock_control_plane = boto3.client('bedrock', region_name='us-west-2')
boto3_bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

In [4]:
#using us inference profile see: https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-support.html
MICRO_MODEL_ID = "us.amazon.nova-micro-v1:0"
LITE_MODEL_ID = "us.amazon.nova-lite-v1:0"
PRO_MODEL_ID = "us.amazon.nova-pro-v1:0"
CLAUDE_SONNET_MODEL_ID="us.anthropic.claude-3-5-sonnet-20241022-v2:0"

#### Validate the connection

We can check that the client works by calling the `list_foundation_models()` method, which tells us all the models available for us to use.

In [5]:
[model['modelId'] for model in boto3_bedrock_control_plane.list_foundation_models()['modelSummaries'] if 'anthropic' in model['modelId'] or model['modelId'].startswith('amazon.nova')]

['amazon.nova-pro-v1:0',
 'amazon.nova-lite-v1:0',
 'amazon.nova-micro-v1:0',
 'anthropic.claude-3-5-sonnet-20241022-v2:0:18k',
 'anthropic.claude-3-5-sonnet-20241022-v2:0:51k',
 'anthropic.claude-3-5-sonnet-20241022-v2:0:200k',
 'anthropic.claude-3-5-sonnet-20241022-v2:0',
 'anthropic.claude-3-5-haiku-20241022-v1:0',
 'anthropic.claude-instant-v1:2:100k',
 'anthropic.claude-instant-v1',
 'anthropic.claude-v2:0:18k',
 'anthropic.claude-v2:0:100k',
 'anthropic.claude-v2:1:18k',
 'anthropic.claude-v2:1:200k',
 'anthropic.claude-v2:1',
 'anthropic.claude-v2',
 'anthropic.claude-3-sonnet-20240229-v1:0:28k',
 'anthropic.claude-3-sonnet-20240229-v1:0:200k',
 'anthropic.claude-3-sonnet-20240229-v1:0',
 'anthropic.claude-3-haiku-20240307-v1:0:48k',
 'anthropic.claude-3-haiku-20240307-v1:0:200k',
 'anthropic.claude-3-haiku-20240307-v1:0',
 'anthropic.claude-3-opus-20240229-v1:0:12k',
 'anthropic.claude-3-opus-20240229-v1:0:28k',
 'anthropic.claude-3-opus-20240229-v1:0:200k',
 'anthropic.claude-
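
Note that the listing returns base model IDs such as `amazon.nova-pro-v1:0`, whereas the constants defined earlier carry a `us.` prefix because they refer to inference profiles. The sketch below checks that each profile has a matching base model in the listing. It assumes that a profile ID is simply the base model ID with a geographic prefix; the prefix tuple and both helper names are illustrative assumptions.

```python
def base_model_id(inference_profile_id, prefixes=("us.", "eu.", "apac.")):
    """Strip an assumed geographic prefix from an inference profile ID."""
    for prefix in prefixes:
        if inference_profile_id.startswith(prefix):
            return inference_profile_id[len(prefix):]
    return inference_profile_id


def missing_models(profile_ids, listed_model_ids):
    """Return the profile IDs whose base model does not appear in the listing."""
    listed = set(listed_model_ids)
    return [pid for pid in profile_ids if base_model_id(pid) not in listed]


# A few IDs copied from the listing above
listed_ids = [
    "amazon.nova-pro-v1:0",
    "amazon.nova-lite-v1:0",
    "amazon.nova-micro-v1:0",
    "anthropic.claude-3-5-sonnet-20241022-v2:0",
]
profile_ids = [
    "us.amazon.nova-micro-v1:0",
    "us.amazon.nova-lite-v1:0",
    "us.amazon.nova-pro-v1:0",
    "us.anthropic.claude-3-5-sonnet-20241022-v2:0",
]
print(missing_models(profile_ids, listed_ids))
```

In the notebook, `listed_ids` would come from `list_foundation_models()['modelSummaries']` rather than a hand-copied list.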

## 2. Document Understanding (Only Applicable Using the Converse API)

Amazon Nova models accept one or more documents in the request payload through the Converse API's document support. Each document is passed to the API as bytes.

In [6]:
manual_file="manuals/XC90_owners_manual_MY06_EN_tp8193.pdf"

In [7]:
#display document
from IPython.display import IFrame
IFrame(manual_file, width=600, height=300)

### 2.1 Split documents
As of 21.01.2025, any text document (csv, xls, xlsx, html, txt, md, or doc) that you send to Nova must not exceed 4.5 MB. All media documents, including pdf and docx files, must not exceed 18 MB in total. You can include a maximum of 5 documents. We therefore split the manual into sub-documents of 60 pages each. If you also compress the files (Section 2.2), the reduced quality is still more than enough for Nova to interpret. See https://docs.aws.amazon.com/nova/latest/userguide/modalities-document.html for the most recent limits.
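
The limits quoted above can be turned into a small pre-flight check. This is a sketch under stated assumptions: the helper name `check_media_documents` is made up for illustration, the default values are the ones given in the text (which may change), and megabytes are taken to be binary (1024 * 1024 bytes).

```python
MB = 1024 * 1024  # assumption: limits are interpreted as binary megabytes


def check_media_documents(sizes_in_bytes, max_documents=5, max_total_mb=18):
    """Return a list of problems; an empty list means the files fit the limits."""
    problems = []
    if len(sizes_in_bytes) > max_documents:
        problems.append(
            f"{len(sizes_in_bytes)} documents exceed the maximum of {max_documents}"
        )
    total = sum(sizes_in_bytes)
    if total > max_total_mb * MB:
        problems.append(
            f"total size {total / MB:.1f} MB exceeds {max_total_mb} MB"
        )
    return problems


# Illustrative sizes, not measured from the manual
ok = check_media_documents([3 * MB, 2 * MB])
too_many = check_media_documents([4 * MB] * 6)
print(ok)
print(too_many)
```

For real files, the sizes could be collected with `os.path.getsize(path)` for each path.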

In [8]:
# Install packages for PDF and image handling (the "image" extra pulls in Pillow, which pypdf needs to re-encode images)
%pip install "pypdf[image]"



In [9]:
import os

from pypdf import PdfReader, PdfWriter


def split_pdf(pdf_path, doc_size_in_pages=50, output_dir='manuals'):
    """Split a PDF into chunks of at most doc_size_in_pages pages and return the new file names."""
    # Make sure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    # Use absolute path for reading
    pdf = PdfReader(os.path.abspath(pdf_path))
    total_pages = len(pdf.pages)

    # File name without directory or extension (safe for names containing dots)
    base_name = os.path.splitext(os.path.basename(pdf_path))[0]

    filenames = []

    # Step through the document one chunk at a time
    for start_page in range(0, total_pages, doc_size_in_pages):
        end_page = min(start_page + doc_size_in_pages, total_pages)
        pdf_writer = PdfWriter()

        # Add pages for this chunk
        for page_num in range(start_page, end_page):
            pdf_writer.add_page(pdf.pages[page_num])

        # Name each chunk after the number of its last page
        output_filename = f'{output_dir}/{base_name}_{end_page}.pdf'

        # Write the PDF
        with open(output_filename, 'wb') as out:
            pdf_writer.write(out)
        print(f'Created: {output_filename}')
        filenames.append(output_filename)

    return filenames

In [10]:
doc_size_in_pages=60
splitted_file_names=split_pdf(manual_file, doc_size_in_pages)

Created: manuals/XC90_owners_manual_MY06_EN_tp8193_60.pdf
Created: manuals/XC90_owners_manual_MY06_EN_tp8193_120.pdf
Created: manuals/XC90_owners_manual_MY06_EN_tp8193_180.pdf
Created: manuals/XC90_owners_manual_MY06_EN_tp8193_240.pdf
Created: manuals/XC90_owners_manual_MY06_EN_tp8193_254.pdf
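
The file names above encode the last page of each chunk. The page arithmetic behind them can be sketched on its own, without touching any PDF. The total of 254 pages is inferred from the last file name, and `chunk_ranges` is an illustrative helper, not part of the notebook.

```python
def chunk_ranges(total_pages, pages_per_doc):
    """Return (start, end) page index pairs, end exclusive, covering the document."""
    return [
        (start, min(start + pages_per_doc, total_pages))
        for start in range(0, total_pages, pages_per_doc)
    ]


# 254 pages is inferred from the name of the last chunk above
ranges = chunk_ranges(254, 60)
print(ranges)
print(len(ranges))
```

With 60 pages per chunk the manual yields exactly five sub-documents, which matches the maximum number of documents per request; a smaller chunk size would produce more files than a single request can carry.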


### 2.2 (Optional) Compress documents

If the documents are still above the size limit after splitting, you can compress them. Note that in this example, compression was not actually necessary as each document is already below the limit.

In [11]:
directory_path = "compressed_manuals"
if not os.path.exists(directory_path):
    # Create the directory
    os.makedirs(directory_path)
    print(f"Directory created: {directory_path}")
else:
    print(f"Directory already exists: {directory_path}")

Directory created: compressed_manuals


In [12]:
from pypdf import PdfWriter
import os

def compress_pdf(input_path, output_path, image_quality=80):
    """
    Compress PDF file by reducing image quality and compressing content streams.
    
    Args:
        input_path (str): Path to input PDF file
        output_path (str): Path to save compressed PDF file
        image_quality (int): Quality of images (0-100), default 80
    """
    try:
        # Create PDF writer object
        writer = PdfWriter(clone_from=input_path)
        
        # Compress each page
        for page in writer.pages:
            # Compress images
            for img in page.images:
                img.replace(img.image, quality=image_quality)
            
            # Compress content streams
            page.compress_content_streams()
        
        # Save the compressed PDF
        with open(output_path, "wb") as output_file:
            writer.write(output_file)
            
        # Print compression results
        input_size = os.path.getsize(input_path) / 1024  # KB
        output_size = os.path.getsize(output_path) / 1024  # KB
        reduction = (1 - (output_size / input_size)) * 100
        
        print(f"Original size: {input_size:.2f} KB")
        print(f"Compressed size: {output_size:.2f} KB")
        print(f"Reduction: {reduction:.2f}%")
        
    except Exception as e:
        print(f"Error compressing PDF: {str(e)}")


In [13]:
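compressed_file_names = []
for file_name in splitted_file_names:
    # Prefixing "manuals/..." with "compressed_" yields "compressed_manuals/..."
    compressed_file_name = "compressed_" + file_name
    compress_pdf(file_name, compressed_file_name)
    compressed_file_names.append(compressed_file_name)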

Original size: 3333.57 KB
Compressed size: 2334.56 KB
Reduction: 29.97%
Original size: 1886.59 KB
Compressed size: 1392.69 KB
Reduction: 26.18%
Original size: 1667.29 KB
Compressed size: 1273.41 KB
Reduction: 23.62%
Original size: 1616.30 KB
Compressed size: 1352.09 KB
Reduction: 16.35%
Original size: 1321.76 KB
Compressed size: 1076.27 KB
Reduction: 18.57%
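
To see how these numbers relate to the 18 MB total limit mentioned earlier, the per-file figures can be summed. The sizes below are copied from the output above; treating 18 MB as 18 * 1024 KB is an assumption about the units.

```python
# (original KB, compressed KB) pairs copied from the output above
sizes_kb = [
    (3333.57, 2334.56),
    (1886.59, 1392.69),
    (1667.29, 1273.41),
    (1616.30, 1352.09),
    (1321.76, 1076.27),
]

original_total = sum(original for original, _ in sizes_kb)
compressed_total = sum(compressed for _, compressed in sizes_kb)
overall_reduction = (1 - compressed_total / original_total) * 100

limit_kb = 18 * 1024  # assumption: 18 MB expressed in binary kilobytes

print(f"Original total:   {original_total:.2f} KB")
print(f"Compressed total: {compressed_total:.2f} KB")
print(f"Overall reduction: {overall_reduction:.2f}%")
print(f"Original files fit the limit: {original_total < limit_kb}")
```

Under that assumption, the uncompressed chunks together already stay below the limit, which is consistent with compression being optional here.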


In [14]:
compressed_file_names

['compressed_manuals/XC90_owners_manual_MY06_EN_tp8193_60.pdf',
 'compressed_manuals/XC90_owners_manual_MY06_EN_tp8193_120.pdf',
 'compressed_manuals/XC90_owners_manual_MY06_EN_tp8193_180.pdf',
 'compressed_manuals/XC90_owners_manual_MY06_EN_tp8193_240.pdf',
 'compressed_manuals/XC90_owners_manual_MY06_EN_tp8193_254.pdf']

### 2.3 Prepare Payload
Now we can prepare the payload to contain all the files. If you compressed the documents, feel free to use `compressed_file_names` below instead.

In [15]:
#save files in doc_bytes array
doc_bytes_list = []
for splitted_file_name in splitted_file_names: #use compressed_file_names if any file is bigger than 4.5MB (limits might increase in the future)
    with open(splitted_file_name, "rb") as file:
        doc_bytes_list.append(file.read()) 

In [16]:
#Test if the partial documents are saved correctly
with open('manuals/temp.pdf', "wb") as file:
    file.write(doc_bytes_list[0])
#display temp.pdf file
IFrame("manuals/temp.pdf", width=600, height=300)
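
Writing a chunk back to disk and viewing it is a visual check. A lighter programmatic check is to confirm that each byte string starts with the PDF header marker `%PDF-`. The helper name `looks_like_pdf` and the sample byte strings are illustrative assumptions; this only inspects the header and does not validate the rest of the file.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes begin with the PDF header marker."""
    return data.startswith(b"%PDF-")


# Synthetic byte strings for illustration only
samples = [b"%PDF-1.7\n% rest of file", b"<html>not a pdf</html>"]
print([looks_like_pdf(sample) for sample in samples])
```

In the notebook, `all(looks_like_pdf(doc) for doc in doc_bytes_list)` would apply the same check to every chunk.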

In [17]:
def build_pdf_request(doc_bytes_list, text):
    # Create document content list by iterating through doc_bytes in the list
    document_content = [
        {
            "document": {
                "format": "pdf",
                "name": f"DocumentPDFmessages{i}",
                "source": {
                    "bytes": doc_bytes
                }
            }
        }
        for i, doc_bytes in enumerate(doc_bytes_list)
    ]
    
    # Add the text question at the end
    document_content.append({
        "text": text
    })
    
    # Create the final messages structure
    messages = [{
        "role": "user",
        "content": document_content
    }]
    
    return messages
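
The function above uses generic names such as `DocumentPDFmessages0` rather than the file names. One reason to be careful here is that the Converse API restricts document names; to our understanding, only alphanumeric characters, single spaces, hyphens, parentheses, and square brackets are allowed, so a file name with underscores or dots would need cleaning. Treat that rule as an assumption to verify against the current API reference. The helper `safe_document_name` below is a hypothetical sketch of such cleaning.

```python
import re


def safe_document_name(file_name):
    """Derive a conservative document name from a file path (hypothetical helper)."""
    # Drop the directory and the extension
    stem = file_name.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    # Replace every character outside the assumed allowed set with a space
    cleaned = re.sub(r"[^A-Za-z0-9\-\(\)\[\] ]", " ", stem)
    # Collapse runs of spaces and trim the ends
    cleaned = re.sub(r" +", " ", cleaned).strip()
    return cleaned or "document"


name = safe_document_name("manuals/XC90_owners_manual_MY06_EN_tp8193_60.pdf")
print(name)
```

Names derived this way stay readable in the model's context while still differing per chunk, because the trailing page number is preserved.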

We use a simple prompt for demonstration purposes here. For detailed guidance, see [Amazon Nova - Prompting Understanding Models](https://docs.aws.amazon.com/nova/latest/userguide/prompting.html).

In [18]:
system_prompt=[{"text": "Act as a driving manual assistant. When the user asks a question, answer only based on the documents provided. DO NOT USE INFORMATION THAT IS NOT IN THE GIVEN DOCUMENTS! Document pages are located at the left or right lower corners of each page. Give a reference to the document section."}]

user_prompts= ["Can you switch the control display on/off?",  "How can I call from the memory?", "Where are the side airbags located?","Where is the on/off button of the audio panel located with respect to keypad of audio panel?"]


In [20]:
messages=build_pdf_request(doc_bytes_list, user_prompts[3]) 
inf_params = {"maxTokens": 1024, "topP": 0.1, "temperature": 0}

model_response = boto3_bedrock_runtime.converse(modelId=PRO_MODEL_ID, system=system_prompt,
                                 messages=messages, 
                                 inferenceConfig=inf_params)

print("\n[Response Content Text]")
print(model_response['output']['message']['content'][0]['text'])


[Response Content Text]
The on/off button of the audio panel is located above the keypad of the audio panel. (Page 199)
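
The cell above asks a single question. To run every prompt in a list, the call can be wrapped in a loop. The sketch below takes the client as a parameter and uses a stand-in `EchoClient` so that it runs without AWS access; `extract_text`, `ask_all`, and `EchoClient` are illustrative names, not part of the notebook.

```python
def extract_text(response):
    """Join all text blocks of a Converse response message."""
    blocks = response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)


def ask_all(client, model_id, system, build_messages, prompts, inference_config):
    """Send each prompt in turn and collect the answers keyed by prompt."""
    answers = {}
    for prompt in prompts:
        response = client.converse(
            modelId=model_id,
            system=system,
            messages=build_messages(prompt),
            inferenceConfig=inference_config,
        )
        answers[prompt] = extract_text(response)
    return answers


class EchoClient:
    """Stand-in for the Bedrock Runtime client so the sketch runs offline."""

    def converse(self, **kwargs):
        question = kwargs["messages"][0]["content"][-1]["text"]
        reply = {"role": "assistant", "content": [{"text": f"(stub) {question}"}]}
        return {"output": {"message": reply}}


answers = ask_all(
    EchoClient(),
    "us.amazon.nova-pro-v1:0",
    [{"text": "Act as a driving manual assistant."}],
    lambda question: [{"role": "user", "content": [{"text": question}]}],
    ["Where are the side airbags located?"],
    {"maxTokens": 1024, "topP": 0.1, "temperature": 0},
)
print(answers)
```

In the notebook, you would pass `boto3_bedrock_runtime` as the client and `lambda q: build_pdf_request(doc_bytes_list, q)` as `build_messages`. Keep in mind that every call sends all document chunks again, so each question incurs the full input size.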
