# Amazon Nova 2 Lite Custom Model Deployment

After you create a custom model with a model customization job, or import a SageMaker AI-trained custom Amazon Nova model, the next step is to set up on-demand inference for the model. With on-demand inference, you only pay for what you use and you don't need to set up provisioned compute resources.

To set up on-demand inference for a custom model, you deploy it with a custom model deployment. After you deploy your custom model, you use the deployment's Amazon Resource Name (ARN) as the modelId parameter when you submit prompts and generate responses with model inference.

## 1. Getting Started

You can customize Amazon Nova models with Amazon Bedrock, SageMaker AI, or SageMaker HyperPod. Once a model is customized, it must be hosted for inference.

A cost-effective solution is Bedrock On-Demand Inference (ODI). ODI allows a custom model to be hosted and made accessible quickly via Bedrock.

On-demand (OD) inference lets you run inference on your custom Amazon Nova models without maintaining provisioned throughput endpoints, which helps you optimize costs and scale efficiently. You are charged based on usage, measured in input and output tokens.
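
Because billing is per token, a rough per-request estimate is simple arithmetic. The sketch below is illustrative only: the two prices are placeholder values, not actual Amazon Bedrock rates, and `estimate_cost` is a hypothetical helper name. Substitute the rates published for your model and Region.

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_1k, price_out_per_1k):
    """Return the estimated charge for one request, in the currency of the prices."""
    input_cost = (input_tokens / 1000) * price_in_per_1k
    output_cost = (output_tokens / 1000) * price_out_per_1k
    return input_cost + output_cost


# Placeholder prices per 1,000 tokens, for illustration only (NOT real Bedrock rates).
PRICE_IN_PER_1K = 0.001
PRICE_OUT_PER_1K = 0.004

cost = estimate_cost(2000, 500, PRICE_IN_PER_1K, PRICE_OUT_PER_1K)
print(f"Estimated cost: {cost:.4f}")
```

Input and output tokens are priced separately, which is why the function takes two rates.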

An alternative is Provisioned Throughput on Bedrock, but this solution tends to be more expensive and should only be chosen after careful analysis.


This notebook will demonstrate deploying a custom model.

[Customizing Amazon Nova Models](https://docs.aws.amazon.com/nova/latest/userguide/customization.html)

![Nova Customization](./images/NovaCustomization.png)

**Important** <br>To complete this notebook, the SFT notebook must be completed first. In that notebook, a model is trained and the relevant artifacts are defined. The model artifact information is carried over for use in this notebook. Specific items from the SFT notebook that are used here are called out below.

## 2. Prerequisites and Dependencies


### Dependencies
Several Python packages must be installed to run this notebook. Please review the packages in requirements.txt.

botocore, boto3, and sagemaker are required for the AWS API calls in this notebook, while the other packages are used to help visualize results.


In [None]:
! pip install -r ./requirements.txt --upgrade

### Prerequisite: SFT Notebook
The SFT notebook walks through training a custom model. As a result of that training, output model artifacts are created.  We will use those output artifacts to host the model in Bedrock. 

**--------------- STOP ---------------** <br><br>To complete this notebook, the SFT notebook must be completed first. This notebook uses the artifacts created in the SFT notebook, which are called out below.
<br><br>

Either restore or set these values.

In [None]:
# These values are obtained as result of executing the SFT notebook
sm_training_job_name = ""
checkpoint_s3_bucket = ""

%store -r sm_training_job_name 
%store -r checkpoint_s3_bucket 

print(sm_training_job_name)
print(checkpoint_s3_bucket)

Note: we do not use sm_training_job_name directly in this notebook, but it is good practice to record it so that you can refer back to the training job for details as needed.
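
Since an empty or malformed checkpoint location only surfaces later as a failed model creation, it can help to check the restored value up front. The following is a minimal sketch, assuming the checkpoint location is an `s3://` URI; `split_s3_uri` and the example URI are hypothetical and not part of the SFT notebook.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key); raise ValueError if it is malformed."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected an s3://bucket/key URI, got: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical example value; in this notebook you would pass checkpoint_s3_bucket.
example_uri = "s3://my-example-bucket/checkpoints/run-1/"
bucket, key = split_s3_uri(example_uri)
print(bucket, key)
```

An empty string, which is the default when nothing was restored, raises a ValueError immediately.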

### Credentials, Sessions, Roles, and more!

This section sets up the necessary AWS credentials and SageMaker session to run the notebook. You'll need proper IAM permissions to use SageMaker.


If you are going to use SageMaker in a local environment, you will need access to an IAM role with the required SageMaker permissions. Learn more in the [AWS Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html).

For more details on the other Amazon Nova prerequisites, see the [AWS Documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/nova-model-general-prerequisites.html).

The code initializes a SageMaker session in us-east-1 and resolves the IAM role. Outside a SageMaker environment, it falls back to looking up an IAM role named sagemaker_execution_role.

In [None]:
import sagemaker
import boto3

sagemaker_session = sagemaker.Session(boto_session=boto3.Session(region_name='us-east-1'))

try:
    role = sagemaker.get_execution_role()
    
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

print(f"sagemaker role arn: {role}")
print(f"sagemaker session region: {sagemaker_session.boto_region_name}")

## 3. Deployment / Inference

Now that the model has been trained, it is time to deploy that model.  The current options include deployment through Bedrock:
- [On-Demand Inference (ODI)](https://docs.aws.amazon.com/nova/latest/userguide/custom-fine-tune-odi.html)
- [Provisioned Throughput (PT)](https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html)


### 3.1 Deployment - On-Demand Inference on Bedrock

Create the fine-tuned model in Bedrock and deploy to perform inference:
- Create model in Bedrock
- Deploy model in Bedrock
- Evaluate performance

After training and evaluating our model, we want to make it available for inference. Amazon Bedrock provides a serverless endpoint for model deployment, allowing us to serve the model without managing infrastructure.

The Bedrock Custom Model feature of Amazon Bedrock lets us import our fine-tuned model and access it through the same API as other foundation models.

Bedrock offers two boto3 clients. The "bedrock" client is for the control plane, used for managing models and customization jobs. The "bedrock-runtime" client is for the data plane, used for inference requests. We will use the "bedrock" client for deploying our model.

In [None]:
# Initialize the Bedrock client
bedrock = boto3.client("bedrock", region_name=sagemaker_session.boto_region_name)

In [None]:
from datetime import datetime

# Define a unique name for imported model
timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f")[:-3]
imported_model_name = f"imported-model-name-{timestamp}"

print(f"imported_model_name: \n{imported_model_name}")
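
The timestamped name is reused later as the prefix of the deployment name, so its length matters. The sketch below checks a generated name against an assumed constraint (at most 63 characters, made of letters and digits optionally separated by single hyphens or underscores). Both the limit and the pattern are assumptions to verify against the current Bedrock API reference, and `make_name` is a hypothetical helper.

```python
import re
from datetime import datetime

# Assumed constraints; confirm them in the Bedrock API reference before relying on them.
MAX_NAME_LENGTH = 63
NAME_PATTERN = re.compile(r"^([0-9a-zA-Z][_-]?)+$")


def make_name(prefix, suffix=""):
    """Build a timestamped name and validate it against the assumed constraints."""
    timestamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f")[:-3]
    name = f"{prefix}-{timestamp}{suffix}"
    if len(name) > MAX_NAME_LENGTH or not NAME_PATTERN.match(name):
        raise ValueError(f"Name does not satisfy the assumed constraints: {name}")
    return name


# The longest name this notebook derives: the model name plus "-deployment".
print(make_name("imported-model-name", suffix="-deployment"))
```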

#### 3.1.1 Create a Custom Model in Bedrock

In [None]:
# Estimated time to complete:  10 minutes

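request_params = {
    "modelName": imported_model_name,
    "modelSourceConfig": {"s3DataSource": {"s3Uri": checkpoint_s3_bucket}},
    "roleArn": role,
    # Idempotency token: a token reused from an earlier request can cause this
    # request to be ignored, so derive it from the per-run timestamp
    "clientRequestToken": f"NovaRecipeSageMaker-{timestamp}",
}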

# Create the model import job
response = bedrock.create_custom_model(**request_params)

model_arn = response["modelArn"]

# Output the model ARN
print(f"Model import job created with ARN: \n{model_arn}")

#### 3.1.2 Monitoring the Create Custom Model Status

After initiating the model import, we need to monitor its progress. The status goes through several states:

* CREATING: Model is being imported
* ACTIVE: Import successful
* FAILED: Import encountered errors

This cell polls the Bedrock API every 10 seconds to check the status of the model import, continuing until it reaches a terminal state (ACTIVE or FAILED). Once the import completes successfully, we'll have the model ARN which we can use for inference.

In [None]:
# Estimated time to complete:  10 minutes
import time

from IPython.display import clear_output

# Poll the custom model status until it reaches a terminal state (ACTIVE or FAILED)
while True:
    response = bedrock.list_custom_models(sortBy='CreationTime',sortOrder='Descending')
    model_summaries = response["modelSummaries"]
    status = ""
    for model in model_summaries:
        if model["modelName"] == imported_model_name:
            status = model["modelStatus"].upper()
            model_arn = model["modelArn"]
            print(f'{model["modelStatus"].upper()} {model["modelArn"]} ...')
            if status in ["ACTIVE", "FAILED"]:
                break
    if status in ["ACTIVE", "FAILED"]:
        break
    clear_output(wait=True)
    time.sleep(10)
    
print(f"model_arn: \n{model_arn}")
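
The loop above runs until a terminal state appears, so it never exits if the model is missing from the listing. A timeout-bounded variant is sketched below. `wait_for_status` is a hypothetical helper that takes any zero-argument callable returning the current status, and the example drives it with a stand-in iterator rather than a real Bedrock call.

```python
import time


def wait_for_status(fetch_status, terminal=("ACTIVE", "FAILED"),
                    interval=10, timeout=1800, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status().upper()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status} after {timeout} seconds")
        sleep(interval)


# Stand-in for a real status lookup: reports Creating twice, then Active.
fake_statuses = iter(["Creating", "Creating", "Active"])
final = wait_for_status(lambda: next(fake_statuses), interval=0)
print(final)
```

With a real client, the callable could be something like `lambda: bedrock.get_custom_model(modelIdentifier=model_arn)["modelStatus"]`, assuming the GetCustomModel response reports the status under that key.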

#### 3.1.3 Deploy the Custom Model in Bedrock using OnDemand Inference (ODI)

In [None]:
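def create_custom_model_deployment(bedrock_client, custom_model_arn, imported_model_name, tags = None):
    """Create a custom model deployment for on-demand inference.

    Args:
        bedrock_client: A boto3 Amazon Bedrock client for making API calls
        custom_model_arn: The ARN of the custom model to deploy
        imported_model_name: The custom model name, used as the prefix of the deployment name
        tags: Optional list of {'key': ..., 'value': ...} tags; defaults are applied if omitted

    Returns:
        str: The ARN of the new custom model deployment

    Raises:
        Exception: If there is an error creating the deployment
    """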

    if tags is None or len(tags) == 0:
        tags=[
                {'key': 'Environment', 'value': 'Production'},
                {'key': 'Team', 'value': 'Amazon'},
                {'key': 'Project', 'value': 'SFT-PEFT'}
            ]
    
    try:
        response = bedrock_client.create_custom_model_deployment(
            modelDeploymentName=f"{imported_model_name}-deployment",
            modelArn=custom_model_arn,
            description="Deployment description",
            tags=tags
        )

        deployment_arn = response['customModelDeploymentArn']
        print(f"Deployment created: {deployment_arn}")
        return deployment_arn

    except Exception as e:
        print(f"Error creating deployment: {str(e)}")
        raise

In [None]:
# Estimated time to complete:  2 minutes

deployment_arn = create_custom_model_deployment(bedrock, model_arn, imported_model_name)

print(f"deployment_arn: \n{deployment_arn}")

#### 3.1.4 Monitor the Custom Model Deployment in Bedrock

In [None]:
# Estimated time to complete:  2 minutes
import time

# Poll the custom model deployment status until it reaches a terminal state (ACTIVE or FAILED)
while True:
    response = bedrock.list_custom_model_deployments()
    model_deployment_summaries = response["modelDeploymentSummaries"]
    status = ""
    for deployment in model_deployment_summaries:
        if deployment["modelArn"] == model_arn:
            status = deployment["status"].upper()
            customModelDeploymentName = deployment["customModelDeploymentName"]
            customModelDeploymentArn = deployment["customModelDeploymentArn"]
            print(f'{deployment["status"].upper()} {deployment["modelArn"]} ...')
            if status in ["ACTIVE", "FAILED"]:
                break
    if status in ["ACTIVE", "FAILED"]:
        break

    clear_output(wait=True)
    time.sleep(10)

print(f"customModelDeploymentName: \n{customModelDeploymentName}\n")
print(f"customModelDeploymentArn: \n{customModelDeploymentArn}")
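
The monitoring loop reads only the first page of results, which is fine for a handful of deployments. If an account has many, the deployment of interest may sit on a later page. The sketch below follows `nextToken` across pages, assuming the list response includes that key when more results exist. `iter_deployments` and `FakeBedrock` are hypothetical names, and the stub exists only so the loop can run without AWS access.

```python
def iter_deployments(bedrock_client, **filters):
    """Yield every deployment summary, following nextToken across pages."""
    kwargs = dict(filters)
    while True:
        page = bedrock_client.list_custom_model_deployments(**kwargs)
        yield from page.get("modelDeploymentSummaries", [])
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token


class FakeBedrock:
    """Stub that serves two pages, used only to exercise the loop offline."""

    def __init__(self):
        self.pages = {
            None: {
                "modelDeploymentSummaries": [{"customModelDeploymentName": "a"}],
                "nextToken": "t1",
            },
            "t1": {"modelDeploymentSummaries": [{"customModelDeploymentName": "b"}]},
        }

    def list_custom_model_deployments(self, **kwargs):
        return self.pages[kwargs.get("nextToken")]


names = [d["customModelDeploymentName"] for d in iter_deployments(FakeBedrock())]
print(names)
```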


#### 3.1.5 Testing the Deployed Model

Now that our model is deployed to Amazon Bedrock, we can invoke it for inference. We'll set up the necessary clients and functions to interact with our model through the Bedrock Runtime API.

Inference Setup Components:
* Bedrock Runtime Client: AWS SDK client for making inference calls
* Helper Function: Handles retry logic and properly formats requests

The helper function:
* Applies the proper chat template to user messages
* Handles retry logic for robustness
* Sets appropriate generation parameters like temperature and top-p

This setup allows us to easily test how well our training worked by sending queries to the model and evaluating its responses.

In [None]:
import boto3
from botocore.config import Config

# Initialize Bedrock Runtime client
boto3_session = boto3.Session()

client = boto3_session.client(
    service_name="bedrock-runtime",
    region_name=sagemaker_session.boto_region_name,
    config=Config(
        connect_timeout=300,  # 5 minutes
        read_timeout=300,  # 5 minutes
        retries={"max_attempts": 3},
    ),
)

In [None]:
import os
import sys

# Make the repository's utils directory importable
sys.path.append(os.path.join(os.path.dirname(os.getcwd()), 'utils'))

from utils import generate_2_0


system_prompt = """
You are a helpful AI assistant that can answer questions and provide information.
"""

messages = [
    {"role": "user", "content": [{"text": "What is the weather in Rome, Italy?"}]},
]

response = generate_2_0(
    client=client,
    model_id=customModelDeploymentArn,
    system_prompt=system_prompt,
    messages=messages,
    temperature=0.1,
    top_p=0.9,
)

response["output"]
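
The generate_2_0 helper comes from the repository's utils module and is not shown here. As a rough equivalent, and not necessarily what the helper does internally, the sketch below sends one prompt through the Bedrock Runtime Converse API with the deployment ARN as the model ID. `ask` and `FakeRuntime` are hypothetical names, the ARN is a placeholder, and the stub client only echoes the prompt so the example runs without AWS access.

```python
def ask(client, model_id, prompt, system_prompt=None,
        temperature=0.1, top_p=0.9, max_tokens=512):
    """Send one user prompt through the Converse API and return the reply text."""
    request = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "temperature": temperature,
            "topP": top_p,
            "maxTokens": max_tokens,
        },
    }
    if system_prompt:
        request["system"] = [{"text": system_prompt}]
    response = client.converse(**request)
    parts = response["output"]["message"]["content"]
    # Keep only text blocks; other block types are skipped.
    return "".join(part.get("text", "") for part in parts)


class FakeRuntime:
    """Stub client that echoes the prompt, so the sketch runs offline."""

    def converse(self, **kwargs):
        prompt = kwargs["messages"][0]["content"][0]["text"]
        message = {"role": "assistant", "content": [{"text": f"echo: {prompt}"}]}
        return {"output": {"message": message}}


# Placeholder ARN; in this notebook you would pass customModelDeploymentArn.
placeholder_arn = "arn:aws:bedrock:us-east-1:111122223333:custom-model-deployment/example"
reply = ask(FakeRuntime(), placeholder_arn, "Hello")
print(reply)
```

To call the real deployment, pass the bedrock-runtime client created above in place of the stub.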

#### 3.1.6 Bedrock Console Testing
As the model is now deployed to Bedrock, it will be available in the Chat / Text Playground.

Access the model by going to the Bedrock console, and selecting the Chat / Text Playground.

![Bedrock Chat / Text Playground](./images/playground.png)

Next, click Select model. In the Select model dialog, choose Custom models to show the list of custom models available on Bedrock. Select your custom model and then click Apply.

![Select model](./images/select-model.png)

You are now back in the Chat / Text Playground and can send prompts to the custom model!

### 3.2 Deployment - Provisioned Throughput (PT) on Bedrock
This notebook will not cover PT on Bedrock at this time.