// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: MIT-0

# Inference with Customized Amazon Nova Models 

This notebook walks through how to run inference on fine-tuned Amazon Nova models. We first demonstrate a single example, followed by example scripts for running batch inference.

# Prerequisites

- Make sure you have executed the 01_Amazon_Nova_Finetuning_Walkthrough.ipynb notebook.
- Make sure you are using the same kernel and instance as the 01_Amazon_Nova_Finetuning_Walkthrough.ipynb notebook.

In [None]:
!pip install -qU -r requirements.txt

In [13]:
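# Restart the kernel so the newly installed packages take effect.
# This JavaScript call only works in the classic Jupyter Notebook; in JupyterLab, restart the kernel from the Kernel menu instead.
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")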

# Setup

In [45]:
import boto3 
from botocore.config import Config
import sys
import pandas as pd
import matplotlib.pyplot as plt
import json
import time 
import concurrent.futures
import shortuuid
import tqdm
import os

In [46]:
my_config = Config(
    region_name = 'us-east-1', 
    signature_version = 'v4',
    retries = {
        'max_attempts': 5,
        'mode': 'standard'
    })

bedrock = boto3.client(service_name="bedrock", config=my_config)

# Construct model input 

Before invoking the customized models, we need to construct the model input in the request format expected by Amazon Nova models.
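For reference, the request body that the helper functions below send to `invoke_model` looks roughly like this. This is a minimal sketch: `build_nova_request_body` is a hypothetical helper whose field names mirror the ones used in this notebook (`messages`, `inferenceConfig`, `max_new_tokens`, `temperature`). The optional `system` field is an assumption based on the Nova request schema and is not used elsewhere in this notebook.

```python
import json


def build_nova_request_body(prompt, max_tokens=1024, temperature=0.2, system_prompt=None):
    """Return the JSON string passed as `body` to `invoke_model` for a Nova model.

    Hypothetical helper: the field names mirror `chat_completion_aws_bedrock_nova`.
    """
    request = {
        # A single user turn; `content` is a list of content blocks.
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        # Nova uses snake_case `max_new_tokens` inside `inferenceConfig`.
        "inferenceConfig": {"max_new_tokens": max_tokens, "temperature": temperature},
    }
    if system_prompt is not None:
        # Assumption: system instructions are passed as a list of text blocks.
        request["system"] = [{"text": system_prompt}]
    return json.dumps(request)


body = build_nova_request_body("What is Amazon EFS?", max_tokens=256)
print(json.dumps(json.loads(body), indent=2))
```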

In [42]:
# API setting constants
API_MAX_RETRY = 16
API_RETRY_SLEEP = 10
API_ERROR_OUTPUT = "$ERROR$"


def create_nova_messages(prompt):
    """
    Create the messages array for Amazon Nova models from a single prompt

    Args:
    prompt (str): Prompt text to send as a single user turn

    Returns:
    list: List of formatted messages for Amazon Nova model
    """
    messages = []

    messages.append({
        "role": "user",
        "content": [{"text": prompt}]
    })

    return messages

def chat_completion_aws_bedrock_nova(model, conv, temperature, max_tokens, aws_region="us-east-1"):
    """
    Call the Amazon Bedrock InvokeModel API for chat completion using Amazon Nova models

    Args:
    model (str): Model ID (use the provisioned model for a customized model)
    conv (str): Prompt text to send to the model
    temperature (float): Temperature parameter for response generation
    max_tokens (int): Maximum tokens in response
    aws_region (str, optional): AWS region, defaults to "us-east-1"

    Returns:
    str: Generated response text, or API_ERROR_OUTPUT if every retry fails
    """

    # Configure AWS client
    bedrock_rt_client = boto3.client(
        service_name='bedrock-runtime',
        region_name=aws_region,
    )

    # Build the request body once; it does not change between retries
    messages = create_nova_messages(conv)
    inferenceConfig = {
        "max_new_tokens": max_tokens,
        "temperature": temperature,
    }
    body = json.dumps({"messages": messages,
                       "inferenceConfig": inferenceConfig})

    # Returned if all retries fail (otherwise `output` would be unbound)
    output = API_ERROR_OUTPUT

    # Retry logic for API calls
    for _ in range(API_MAX_RETRY):
        try:
            # Call Bedrock API
            response = bedrock_rt_client.invoke_model(
                body=body,
                modelId=model,
                accept='application/json',
                contentType='application/json'
            )

            # Parse response
            response_body = json.loads(response.get('body').read())

            output = response_body['output']['message']['content'][0]['text']
            break

        except Exception as e:
            print(type(e), e)
            ## Uncomment time.sleep if you encounter Bedrock invoke throttling errors
            # time.sleep(API_RETRY_SLEEP)

    return output
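The retry loop above retries immediately unless you uncomment `time.sleep`, which tends to burn through retries quickly when Bedrock throttles requests. A common alternative is exponential backoff with jitter. The sketch below is a hypothetical helper, `call_with_backoff`, that retries on any exception to mirror the loop above; with botocore you could narrow it to throttling by checking a `botocore.exceptions.ClientError` for the `ThrottlingException` error code.

```python
import random
import time


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0, error_output="$ERROR$"):
    """Call `fn()` and retry on exceptions with exponential backoff plus jitter.

    Hypothetical helper: returns `error_output` if every attempt fails.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {type(e).__name__}: {e}")
            if attempt == max_retries - 1:
                break  # no point sleeping after the last attempt
            # Delay grows as base_delay * 2**attempt (capped), randomized ("full jitter")
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
    return error_output
```

For example, you could wrap only the network call, as in `call_with_backoff(lambda: bedrock_rt_client.invoke_model(...))`. Jitter spreads retries from concurrent workers apart so they do not all hit the endpoint at the same moment.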

# Inference on customized Amazon Nova model (individual example)

In [51]:
# [Important!] To query your customized model, replace the base model ID below with the `provisioned_model_id` from the previous notebook
base_model_id = 'amazon.nova-lite-v1:0'
temperature = 0.2
max_tokens = 1024

ques = "What specific details are collected and sent to AWS when anonymous operational metrics are enabled for an Amazon EFS file system?"

print(chat_completion_aws_bedrock_nova(base_model_id, ques, temperature+0.01, max_tokens, aws_region="us-east-1"))      

When anonymous operational metrics are enabled for an Amazon Elastic File System (EFS) file system, AWS collects and sends specific details to Amazon CloudWatch. These metrics are designed to help you monitor the performance and usage of your EFS file systems without requiring you to provide personally identifiable information. Here are the specific details that are collected and sent:

1. **File System-Level Metrics**:
    - **FileSystemSize**: The total size of the file system in bytes.
    - **FreeStorageCapacity**: The amount of available storage capacity in bytes.
    - **BurstingCredits**: The number of bursting credits available for the file system.
    - **BurstBalance**: The current balance of bursting credits.
    - **ThroughputMode**: The throughput mode of the file system (e.g., Bursting, Provisioned).

2. **Network Metrics**:
    - **NetworkThroughput**: The amount of network throughput in bytes per second.
    - **NetworkLatency**: The average latency of network requests 

# Batch inference with customized Amazon Nova model 

In this section, we provide code snippets for efficiently running batch inference using the same `chat_completion_aws_bedrock_nova` function as above.
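The pattern used below is a thread pool: each Bedrock call is network-bound, so a few threads can overlap waiting time without any multiprocessing. As a minimal, self-contained sketch of the idea (`run_batch` and `infer_fn` are hypothetical names, not part of this notebook), the following version also returns results in input order even though `as_completed` yields them in completion order:

```python
import concurrent.futures


def run_batch(prompts, infer_fn, max_workers=2):
    """Run `infer_fn(prompt)` over `prompts` in a thread pool, preserving input order.

    `infer_fn` could be, for example,
    `lambda p: chat_completion_aws_bedrock_nova(model_id, p, temperature, max_tokens)`.
    """
    results = [None] * len(prompts)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the index of the prompt it answers
        future_to_idx = {executor.submit(infer_fn, p): i for i, p in enumerate(prompts)}
        for future in concurrent.futures.as_completed(future_to_idx):
            results[future_to_idx[future]] = future.result()
    return results


# Offline usage with a stand-in "model" so no AWS call is made
print(run_batch(["a", "bb", "ccc"], lambda p: p.upper()))  # ['A', 'BB', 'CCC']
```

Keeping `max_workers` small is a deliberate choice: more concurrent requests mostly translate into more throttling errors rather than higher throughput.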

In [52]:
# Load test cases
question_file = "dataset/test_set/question_short.jsonl"

questions = []
with open(question_file, "r", encoding="utf-8") as ques_file:
    for line in ques_file:
        if line.strip():  # skip blank lines ("\n" alone is truthy)
            questions.append(json.loads(line))

print(questions[0]["turns"])

[' "What specific details are collected and sent to AWS when anonymous operational metrics are enabled for an Amazon EFS file system?', "What's required for a successful AWS CloudFormation launch?"]


In [53]:
# Helper function that organizes answers from the customized Amazon Nova model
import threading

# Serializes appends to the shared answer file across worker threads
answer_file_lock = threading.Lock()

def get_answer(
    question: dict, model_id: str, num_choices: int, max_tokens: int, temperature: float, answer_file: str
):

    choices = []

    for i in range(num_choices):
        turns = []

        for j in range(len(question["turns"])):
            # Prompt = all user turns so far, separated by newlines
            # (earlier model answers are not included)
            conv = "\n".join(question["turns"][: j + 1])
            output = chat_completion_aws_bedrock_nova(model_id, conv, temperature+0.01, max_tokens, aws_region="us-east-1")
            turns.append(output)

        choices.append({"index": i, "turns": turns})

    # Dump answers
    ans = {
        "question_id": question["question_id"],
        "answer_id": shortuuid.uuid(),
        "model_id": model_id,
        'use_rag': False,
        "choices": choices,
        "tstamp": time.time(),
    }

    os.makedirs(os.path.dirname(answer_file), exist_ok=True)
    with answer_file_lock, open(answer_file, "a", encoding="utf-8") as f:
        f.write(json.dumps(ans) + "\n")
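`get_answer` sends each turn as a single user prompt built from the question text alone, so the model never sees its own earlier answers. If you want a true multi-turn conversation instead, the Nova `messages` list can alternate `user` and `assistant` roles. A minimal sketch, assuming the same content-block format as `create_nova_messages`; `build_multi_turn_messages` is a hypothetical helper that is not used elsewhere in this notebook:

```python
def build_multi_turn_messages(turns, previous_answers):
    """Interleave user turns and prior assistant answers into a Nova `messages` list.

    `turns` holds all user turns asked so far; `previous_answers` holds the model's
    answers to every turn except the last, so it must be exactly one element shorter.
    """
    if len(previous_answers) != len(turns) - 1:
        raise ValueError("previous_answers must be exactly one shorter than turns")
    messages = []
    for i, turn in enumerate(turns):
        messages.append({"role": "user", "content": [{"text": turn}]})
        if i < len(previous_answers):
            # The model's answer to turn i becomes context for turn i + 1
            messages.append({"role": "assistant", "content": [{"text": previous_answers[i]}]})
    return messages


msgs = build_multi_turn_messages(["Q1", "Q2"], ["A1"])
print([m["role"] for m in msgs])  # ['user', 'assistant', 'user']
```

To use it, you would pass the returned list as `messages` in the request body in place of `create_nova_messages(conv)`, and append each new answer to `previous_answers` before asking the next turn.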

In [54]:
# Run batch inference and save model output

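## [Important!] To query your customized model, replace the base model ID below with the `provisioned_model_id` from the previous notebook
model_id = 'amazon.nova-lite-v1:0'
num_choices = 1
max_tokens = 1024
temperature = 0.2

# Replace "/" so that a model ARN does not create nested directories in the output path
answer_file = f"dataset/model_answer/{model_id.replace('/', '_')}_V2.jsonl"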
print(f"Output to {answer_file}")

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    futures = []
    for question in questions:
        future = executor.submit(
            get_answer,
            question,
            model_id,
            num_choices,
            max_tokens,
            temperature,
            answer_file,
        )
        futures.append(future)

    for future in tqdm.tqdm(
        concurrent.futures.as_completed(futures), total=len(futures)
    ):
        future.result()

Output to dataset/model_answer/amazon.nova-lite-v1:0_V2.jsonl


100%|██████████| 10/10 [01:19<00:00,  8.00s/it]
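The answer file is plain JSON Lines with one record per question, written in completion order. A minimal sketch for loading it into a pandas DataFrame for review; `load_answers` is a hypothetical helper that relies only on the fields written by `get_answer` (`question_id`, `choices`, and each choice's `index` and `turns`):

```python
import json

import pandas as pd


def load_answers(answer_file):
    """Flatten the JSONL written by `get_answer` into one row per (question, choice, turn)."""
    rows = []
    with open(answer_file, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            for choice in record["choices"]:
                for turn_idx, text in enumerate(choice["turns"]):
                    rows.append({
                        "question_id": record["question_id"],
                        "choice": choice["index"],
                        "turn": turn_idx,
                        "answer": text,
                    })
    df = pd.DataFrame(rows, columns=["question_id", "choice", "turn", "answer"])
    # Records were appended as threads finished, so restore a stable order
    return df.sort_values(["question_id", "choice", "turn"]).reset_index(drop=True)


# Example: df = load_answers(answer_file); df.head()
```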


# [Optional] Plot training loss

Optionally, you can also plot the training loss using the `step_wise_training_metrics.csv` file generated by the fine-tuning job. This CSV file and the other model artifacts can be found under Amazon Bedrock -> Custom model -> Custom model name -> Output data (S3 location).
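If you prefer to fetch the CSV programmatically rather than through the console, one option is to copy the S3 URI of the file from the job's output data location and download it with boto3. A minimal sketch: `parse_s3_uri` and `download_training_metrics` are hypothetical helpers, and no particular key layout inside the output folder is assumed, so pass the full URI of the CSV itself. The S3 client is passed in so the helper can be tried without AWS credentials.

```python
import os
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split an `s3://bucket/key` URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_training_metrics(s3_client, s3_uri, local_path):
    """Download one object (e.g. the metrics CSV) to `local_path` using `s3_client.download_file`."""
    bucket, key = parse_s3_uri(s3_uri)
    # Create the local folder first; download_file does not create it
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    s3_client.download_file(bucket, key, local_path)
    return local_path


# Example (replace the URI with the one shown in the console):
# download_training_metrics(boto3.client("s3"), "s3://<bucket>/<path>/step_wise_training_metrics.csv",
#                           "model_training_loss/step_wise_training_metrics.csv")
```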

In [None]:
def plot_training_loss(input_file, output_file):
    ''' This function plots training loss using the default model output file 'step_wise_training_metrics.csv' generated from the finetuning job'''
    
    # Read the CSV file
    df = pd.read_csv(input_file)
    
    # Create the plot
    plt.figure(figsize=(10, 6))
    plt.plot(df['step_number'], df['training_loss'], 'b-', linewidth=2)
    
    # Customize the plot
    plt.title('Training Loss vs Step Number', fontsize=14)
    plt.xlabel('Step Number', fontsize=12)
    plt.ylabel('Training Loss', fontsize=12)
    plt.grid(True, linestyle='--', alpha=0.7)
    
    # Add some padding to the axes
    plt.margins(x=0.02)
    
    # Save the plot
    plt.savefig(output_file, dpi=300, bbox_inches='tight')
    plt.close()
    
    print(f"Plot saved as {output_file}")


# Example usage

plot_training_loss(input_file = 'model_training_loss/aws-ft-nova-lite/step_wise_training_metrics_epoch5_lr_1e-06.csv', 
                   output_file = 'model_training_loss/aws-ft-nova-lite/training_loss_epoch5_lr_1e-06.png')



# Conclusion

In this notebook and the previous one, we walked through how to fine-tune, host, and run inference with customized Amazon Nova models through the Amazon Bedrock API. Please refer to the [guidelines](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-guidelines.html) for more tips on fine-tuning Amazon Nova models to meet your needs.

# Delete provisioned throughput

<b>Warning</b>: Please make sure to delete the provisioned throughput when you are done. It incurs charges for as long as it exists, even if you are not sending any requests to it.

In [None]:
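# `provisioned_model_id` must refer to the provisioned throughput created in the previous notebook (set it here if this kernel session does not define it)
bedrock.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_id)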
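To double-check that nothing is left running, you can list the provisioned throughputs that still exist in the region. A minimal sketch: `list_provisioned_throughputs` is a hypothetical helper built on the Bedrock `list_provisioned_model_throughputs` call, and the response field names (`provisionedModelSummaries`, `provisionedModelName`, `provisionedModelArn`, `status`, `nextToken`) are taken from the ListProvisionedModelThroughputs API, so verify them against your boto3 version.

```python
def list_provisioned_throughputs(bedrock_client):
    """Return (name, arn, status) for every provisioned throughput visible to the client.

    Follows `nextToken` pagination of `list_provisioned_model_throughputs`.
    """
    results, kwargs = [], {}
    while True:
        response = bedrock_client.list_provisioned_model_throughputs(**kwargs)
        for summary in response.get("provisionedModelSummaries", []):
            results.append((summary["provisionedModelName"],
                            summary["provisionedModelArn"],
                            summary["status"]))
        token = response.get("nextToken")
        if not token:
            return results
        kwargs["nextToken"] = token  # request the next page


# Anything still listed here keeps incurring charges:
# for name, arn, status in list_provisioned_throughputs(bedrock):
#     print(name, status, arn)
```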