# Mistral Model Deployment on Inferentia2 using Amazon SageMaker and Amazon Bedrock

In this notebook, we’ll walk through the complete process of deploying a fine-tuned Mistral model for the insurance domain. The model will be compiled with AWS Neuron and deployed on Inferentia2 instances using Amazon SageMaker. Additionally, we’ll convert the model to Hugging Face’s `safetensors` format so that it can be imported into Amazon Bedrock and used with the Converse API.

By the end of this notebook, you’ll have both models accessible through a custom chatbot application built in Streamlit. This application allows you to interact with the models deployed on both Amazon SageMaker and Amazon Bedrock, providing flexibility and hands-on experience with two deployment environments.

## Section 1: Import Required Libraries

Install and import the libraries needed for model conversion, deployment, and API interactions across AWS services.


In [None]:
!pip install -U transformers sagemaker boto3 tiktoken torch blobfile sentencepiece

In [2]:
import os
import boto3
import json
from datetime import datetime
from transformers import AutoModelForCausalLM, AutoTokenizer
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
import sagemaker


sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml


## Section 2: Export Variables for SageMaker Deployment

Define essential variables for model deployment, including model details, batch size, sequence length, and AWS-specific configurations.


In [3]:
MODEL_ID = "bitext/Mistral-7B-Insurance"
BATCH_SIZE = 4
SEQUENCE_LENGTH = 2048
MAX_TOTAL_TOKENS = 2048  # Set independently for the total token limit
NUM_CORES = 2
HF_MODEL_ID_TO_PUSH = "aboavent/Mistral-7B-Insurance-neuron"
HF_TOKEN = os.environ["HF_TOKEN"]  # Read the Hugging Face token from the environment; never hard-code secrets
PRECISION = "fp16"
MODEL_OUTPUT_NAME = "Mistral-7B-Insurance-neuron"
COMPILED_MODEL_OUTPUT_PATH = f"./{MODEL_OUTPUT_NAME}"  # Concatenated path

## Section 3: Hugging Face Authentication

Authenticate with Hugging Face using the provided token to access and upload model resources.


In [None]:
!huggingface-cli login --token $HF_TOKEN


## Section 4: Model Compilation with Optimum CLI

Use `optimum-cli` to export the model to Neuron-compatible format, specifying batch size, sequence length, precision, and number of cores.


In [None]:
!optimum-cli export neuron \
    -m $MODEL_ID \
    --batch_size $BATCH_SIZE \
    --sequence_length $SEQUENCE_LENGTH \
    --num_cores $NUM_CORES \
    --auto_cast_type $PRECISION \
    --trust-remote-code \
    $COMPILED_MODEL_OUTPUT_PATH


## Section 5: Upload Compiled Model to Hugging Face

Create a new repository on Hugging Face (if necessary) and upload the compiled model.


In [None]:
!huggingface-cli repo create $MODEL_OUTPUT_NAME

!huggingface-cli upload $HF_MODEL_ID_TO_PUSH $COMPILED_MODEL_OUTPUT_PATH ./

## Section 6: Define SageMaker Role and Model Configuration

Define the SageMaker role and configure the environment variables for deploying the model on SageMaker.


In [None]:
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

hub = {
    "HF_MODEL_ID": HF_MODEL_ID_TO_PUSH,
    "HF_NUM_CORES": str(NUM_CORES),
    "HF_SEQUENCE_LENGTH": str(SEQUENCE_LENGTH),
    "HF_AUTO_CAST_TYPE": PRECISION,
    "MAX_BATCH_SIZE": str(BATCH_SIZE),
    "MAX_INPUT_TOKENS": "1800",
    "MAX_TOTAL_TOKENS": str(MAX_TOTAL_TOKENS),
    "HF_TOKEN": HF_TOKEN,
    "MESSAGES_API_ENABLED": "true"
}
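The environment dictionary is easy to get subtly wrong, so a quick offline check before deploying can save a long failed startup. The sketch below is illustrative only: `validate_endpoint_env` is a hypothetical helper, and the two rules it enforces (prompt limit below total limit, total limit within the compiled sequence length) are assumptions about how the container treats these settings, not documented guarantees.

```python
def validate_endpoint_env(env):
    """Return a list of human-readable problems found in an endpoint env dict."""
    problems = []
    # Container environment variables are passed as strings.
    for key, value in env.items():
        if not isinstance(value, str):
            problems.append(f"{key} should be a string, got {type(value).__name__}")
    try:
        max_input = int(env["MAX_INPUT_TOKENS"])
        max_total = int(env["MAX_TOTAL_TOKENS"])
        sequence_length = int(env["HF_SEQUENCE_LENGTH"])
    except (KeyError, ValueError) as exc:
        problems.append(f"token limit missing or not an integer: {exc}")
        return problems
    # Assumed rules: the prompt must leave room for generated tokens, and the
    # total must fit in the sequence length the model was compiled with.
    if max_input >= max_total:
        problems.append("MAX_INPUT_TOKENS must be smaller than MAX_TOTAL_TOKENS")
    if max_total > sequence_length:
        problems.append("MAX_TOTAL_TOKENS must not exceed HF_SEQUENCE_LENGTH")
    return problems


# Values mirror the configuration used in this notebook.
example_env = {
    "HF_SEQUENCE_LENGTH": "2048",
    "MAX_INPUT_TOKENS": "1800",
    "MAX_TOTAL_TOKENS": "2048",
    "MAX_BATCH_SIZE": "4",
}
# A deliberately broken variant: oversized prompt limit and a non-string value.
bad_env = dict(example_env, MAX_INPUT_TOKENS="4096", MAX_BATCH_SIZE=4)

print(validate_endpoint_env(example_env))
print(validate_endpoint_env(bad_env))
```

In the notebook, calling `validate_endpoint_env(hub)` before `deploy` would apply the same checks to the real configuration.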


## Section 7: Deploy Model to SageMaker Inference

Deploy the compiled model on an `ml.inf2.xlarge` instance in SageMaker.


In [None]:
%%time
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface-neuronx", version="0.0.24"),
    env=hub,
    role=role
)

huggingface_model._is_compiled_model = True
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",
    container_startup_health_check_timeout=2400,
    volume_size=512
)


## Section 8: Test SageMaker Endpoint

Send sample requests to the SageMaker endpoint through the Hugging Face predictor to verify that the model is deployed and responding correctly.


In [None]:
def create_sample_request(system_prompt, user_query):
    """
    Creates a sample request structure for the predictor based on the given system prompt and user query.

    Parameters:
        system_prompt (str): The initial system prompt to set the model's role.
        user_query (str): The user's query for the insurance model.

    Returns:
        dict: A structured request for the SageMaker predictor.
    """
    return {
        "model": HF_MODEL_ID_TO_PUSH,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query}
        ],
        "parameters": {
            "do_sample": True,
            "max_new_tokens": 128,
            "temperature": 0.7,
            "top_k": 50,
            "top_p": 0.95,
        }
    }

# List of different system prompts and user queries to test various scenarios
system_user_queries = [
    ("You are an expert in health insurance policies.", "What benefits do I get with my current health plan?"),
    ("You are an insurance advisor.", "How can I reduce my monthly insurance premium?"),
    ("You are an expert in auto insurance policies.", "What happens if my car is totaled?"),
    ("You are an expert in life insurance.", "Can you explain the difference between term and whole life insurance?"),
    ("You are an insurance claims specialist.", "What documents are needed to file a claim for home insurance?"),
    ("You are a customer service representative for health insurance.", "Can I add my spouse to my health insurance policy?"),
    ("You are an expert in travel insurance policies.", "What coverage do I have if my flight is canceled?"),
    ("You are a specialist in pet insurance.", "Does my policy cover emergency vet visits?"),
    ("You are an insurance fraud investigator.", "What are some common signs of insurance fraud?"),
    ("You are an advisor on property insurance.", "How do I increase the coverage for natural disasters?")
]

# Loop through each system prompt and user query, create a request, and get a response from the predictor
for i, (system_prompt, user_query) in enumerate(system_user_queries, start=1):
    print(f"--- Sample Request {i} ---")
    request = create_sample_request(system_prompt, user_query)
    response = predictor.predict(request)
    print("System Prompt:", system_prompt)
    print("User Query:", user_query)
    print("Model Response:", response['choices'][0]['message']['content'])
    print("\n")
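The loop above indexes straight into `response['choices'][0]['message']['content']`. A slightly more defensive way to read a chat-completion style response is sketched below; `extract_chat_content` is a hypothetical helper, and the sample dictionary is a trimmed, hand-written copy of the response shape this endpoint returns rather than a live result.

```python
def extract_chat_content(response):
    """Return (assistant_text, finish_reason) from a chat-completion style dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    return first["message"]["content"].strip(), first.get("finish_reason")


# Trimmed, hand-written sample in the same shape as the endpoint's response.
sample_response = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": " To effectively reduce your monthly insurance premium, "
                           "please adhere to the following guidelines:",
            },
            "finish_reason": "length",
        }
    ],
}

text, finish_reason = extract_chat_content(sample_response)
if finish_reason == "length":
    # The model stopped because it hit the token limit, not because it finished.
    print("Note: the response was cut off by the token limit.")
print(text)
```

Checking `finish_reason` makes truncated answers visible instead of silently printing a partial reply.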

### Send a sample request to the model using the SageMaker API


In [76]:
from botocore.exceptions import ClientError  # Needed by the except clause below

sagemaker_client = boto3.client("sagemaker-runtime",
                                region_name="us-east-2")

# Function to query the model on SageMaker
def query_sagemaker_model(endpoint_name, query):
    payload = {
        "model": HF_MODEL_ID_TO_PUSH,  # Updated model name
        "messages": [
            {"role": "system", "content": "You are an expert in customer support for Insurance."},
            {"role": "user", "content": query}  # Send the user query as a string
        ],
        "parameters": {
            "do_sample": True,
            "max_new_tokens": 4096,
            "temperature": 0.5,
            "top_k": 50,
            "top_p": 0.90,
            "max_length": 4096,
            "stop": None
        }
    }
    
    try:
        # Send the request to SageMaker endpoint
        response = sagemaker_client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=json.dumps(payload)
        )
        
        # Parse the response
        result = json.loads(response['Body'].read())
        print(result)
        return result['choices'][0]['message']['content']
    
    except ClientError as e:
        print(f"An error occurred with SageMaker: {e.response['Error']['Message']}")
        return None
    
sagemaker_endpoint_name = "huggingface-pytorch-tgi-inference-ml-in-2024-11-05-03-57-02-330"  # SageMaker endpoint name    
model_response = query_sagemaker_model(sagemaker_endpoint_name, 
                                       "How can I reduce my monthly insurance premium?")
print(model_response)   

{'object': 'chat.completion', 'id': '', 'created': 1730908518, 'model': 'aboavent/Mistral-7B-Insurance-neuron', 'system_fingerprint': '2.1.1-native', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': ' To effectively reduce your monthly insurance premium, please adhere to the following guidelines:\n\n1. Evaluate your coverage: Examine your existing insurance plan and determine if there are any unused or superfluous options that may be contributing to the overall cost.\n2. Shop around: Compare different insurance providers and the plans they offer to determine whether there are more affordable options available in the market.\n3. Increase your deductible: Opting for a higher deductible will'}, 'logprobs': None, 'finish_reason': 'length'}], 'usage': {'prompt_tokens': 0, 'completion_tokens': 100, 'total_tokens': 100}}
 To effectively reduce your monthly insurance premium, please adhere to the following guidelines:

1. Evaluate your coverage: Examine your existing insura

## Section 9: Convert Model to Safetensors Format for Bedrock

Define a function to convert the model to `safetensors` format, which is required for Amazon Bedrock.


In [4]:
%%time

def convert_to_safetensors(model_name, save_directory):
    """
    Convert a Hugging Face model to safetensors format for Amazon Bedrock compatibility.
    
    Parameters:
        model_name (str): Name of the model to convert.
        save_directory (str): Directory to save the converted model and tokenizer.
    """
    os.makedirs(save_directory, exist_ok=True)
    print(f"Loading model {model_name}...")
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)

    print(f"Converting and saving model to {save_directory} in safetensors format...")
    model.save_pretrained(save_directory, safe_serialization=True)
    tokenizer.save_pretrained(save_directory)
    print("Conversion complete!")

# Specify the directory and model name
save_directory = os.path.expanduser("~/Mistral-7B-Insurance")
os.makedirs(save_directory, exist_ok=True)
convert_to_safetensors(MODEL_ID, save_directory)

# List the contents of the save directory to verify the conversion
os.listdir(save_directory)

Loading model bitext/Mistral-7B-Insurance...


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Converting and saving model to ~/Mistral-7B-Insurance in safetensors format...
Conversion complete!
CPU times: user 16.2 s, sys: 51.4 s, total: 1min 7s
Wall time: 5min 13s
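Before uploading, it can be worth confirming that every weight shard named in the index file was actually written. The following is a minimal sketch under the assumption that the directory follows the usual sharded layout, where `model.safetensors.index.json` contains a `weight_map` from parameter names to shard files. `missing_safetensors_shards` is a hypothetical helper, and the demo uses a temporary directory with empty placeholder files rather than real weights.

```python
import json
import os
import tempfile


def missing_safetensors_shards(model_dir):
    """List shard files referenced by the index that are absent from model_dir."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    if not os.path.isfile(index_path):
        # Unsharded checkpoints are stored as a single model.safetensors file.
        single_file = os.path.join(model_dir, "model.safetensors")
        return [] if os.path.isfile(single_file) else ["model.safetensors"]
    with open(index_path, encoding="utf-8") as handle:
        weight_map = json.load(handle)["weight_map"]
    expected = sorted(set(weight_map.values()))
    return [name for name in expected
            if not os.path.isfile(os.path.join(model_dir, name))]


with tempfile.TemporaryDirectory() as demo_dir:
    index = {"weight_map": {
        "layer.0.weight": "model-00001-of-00002.safetensors",
        "layer.1.weight": "model-00002-of-00002.safetensors",
    }}
    index_file = os.path.join(demo_dir, "model.safetensors.index.json")
    with open(index_file, "w", encoding="utf-8") as handle:
        json.dump(index, handle)
    # Create only the first shard so the second one is reported as missing.
    open(os.path.join(demo_dir, "model-00001-of-00002.safetensors"), "wb").close()
    missing = missing_safetensors_shards(demo_dir)

print(missing)
```

In the notebook, `missing_safetensors_shards(save_directory)` returning an empty list would indicate that the conversion wrote all expected shards.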


## Section 10: Upload Converted Model to S3

Upload the `safetensors` formatted model files to an S3 bucket, making them accessible to Amazon Bedrock.


In [14]:
%%time

from botocore.exceptions import ClientError
from tqdm import tqdm  # Progress bar

# Define S3 and local directory configurations
s3_client = boto3.client("s3", region_name="us-west-2")
s3_bucket_name = "mistral-7b-insurance-bedrock-import"  # Updated to lowercase and valid name
s3_model_directory = "safetensors"
local_model_directory = save_directory  # Use save_directory from Section 9

def create_bucket_if_not_exists(bucket_name, region="us-west-2"):
    """
    Creates the S3 bucket if it does not exist.
    
    Parameters:
        bucket_name (str): The name of the bucket to create.
        region (str): The AWS region for the bucket.
    """
    try:
        s3_client.head_bucket(Bucket=bucket_name)
        print(f"Bucket '{bucket_name}' already exists.")
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == '404':
            print(f"Bucket '{bucket_name}' does not exist. Creating bucket...")
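            # us-east-1 is the default location and rejects an explicit LocationConstraint
            create_kwargs = {"Bucket": bucket_name}
            if region != "us-east-1":
                create_kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
            s3_client.create_bucket(**create_kwargs)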
            print(f"Bucket '{bucket_name}' created successfully.")
        else:
            print(f"Unexpected error: {e}")
            raise

# Create the bucket if it doesn't exist
create_bucket_if_not_exists(s3_bucket_name)

def upload_to_s3(local_directory, bucket, s3_directory):
    """
    Uploads all files from a local directory to the specified S3 bucket and directory.

    Parameters:
        local_directory (str): Path to the local directory containing files to upload.
        bucket (str): Name of the S3 bucket.
        s3_directory (str): Directory path within the S3 bucket to store the files.
    """
    files = [f for f in os.listdir(local_directory) if os.path.isfile(os.path.join(local_directory, f))]
    
    # Progress bar for uploads
    for filename in tqdm(files, desc="Uploading files to S3"):
        file_path = os.path.join(local_directory, filename)
        s3_path = f"{s3_directory}/{filename}"
        print(f"Uploading {filename} to s3://{bucket}/{s3_path}...")
        s3_client.upload_file(file_path, bucket, s3_path)
        print(f"{filename} uploaded successfully.")

# Run the upload function
upload_to_s3(local_model_directory, s3_bucket_name, s3_model_directory)

Bucket 'mistral-7b-insurance-bedrock-import' already exists.


Uploading files to S3:   0%|          | 0/12 [00:00<?, ?it/s]

Uploading model-00002-of-00006.safetensors to s3://mistral-7b-insurance-bedrock-import/safetensors/model-00002-of-00006.safetensors...


Uploading files to S3:  17%|█▋        | 2/12 [00:29<02:00, 12.07s/it]

model-00002-of-00006.safetensors uploaded successfully.
Uploading tokenizer_config.json to s3://mistral-7b-insurance-bedrock-import/safetensors/tokenizer_config.json...
tokenizer_config.json uploaded successfully.
Uploading model-00006-of-00006.safetensors to s3://mistral-7b-insurance-bedrock-import/safetensors/model-00006-of-00006.safetensors...


Uploading files to S3:  25%|██▌       | 3/12 [00:55<02:47, 18.62s/it]

model-00006-of-00006.safetensors uploaded successfully.
Uploading model-00004-of-00006.safetensors to s3://mistral-7b-insurance-bedrock-import/safetensors/model-00004-of-00006.safetensors...


Uploading files to S3:  33%|███▎      | 4/12 [01:28<03:15, 24.39s/it]

model-00004-of-00006.safetensors uploaded successfully.
Uploading tokenizer.model to s3://mistral-7b-insurance-bedrock-import/safetensors/tokenizer.model...


Uploading files to S3:  50%|█████     | 6/12 [01:29<01:02, 10.42s/it]

tokenizer.model uploaded successfully.
Uploading model.safetensors.index.json to s3://mistral-7b-insurance-bedrock-import/safetensors/model.safetensors.index.json...
model.safetensors.index.json uploaded successfully.
Uploading config.json to s3://mistral-7b-insurance-bedrock-import/safetensors/config.json...


Uploading files to S3:  58%|█████▊    | 7/12 [01:29<00:35,  7.06s/it]

config.json uploaded successfully.
Uploading model-00003-of-00006.safetensors to s3://mistral-7b-insurance-bedrock-import/safetensors/model-00003-of-00006.safetensors...


Uploading files to S3:  75%|███████▌  | 9/12 [02:03<00:32, 10.81s/it]

model-00003-of-00006.safetensors uploaded successfully.
Uploading generation_config.json to s3://mistral-7b-insurance-bedrock-import/safetensors/generation_config.json...
generation_config.json uploaded successfully.
Uploading special_tokens_map.json to s3://mistral-7b-insurance-bedrock-import/safetensors/special_tokens_map.json...


Uploading files to S3:  83%|████████▎ | 10/12 [02:03<00:15,  7.53s/it]

special_tokens_map.json uploaded successfully.
Uploading model-00005-of-00006.safetensors to s3://mistral-7b-insurance-bedrock-import/safetensors/model-00005-of-00006.safetensors...


Uploading files to S3:  92%|█████████▏| 11/12 [02:36<00:15, 15.13s/it]

model-00005-of-00006.safetensors uploaded successfully.
Uploading model-00001-of-00006.safetensors to s3://mistral-7b-insurance-bedrock-import/safetensors/model-00001-of-00006.safetensors...


Uploading files to S3: 100%|██████████| 12/12 [03:07<00:00, 15.62s/it]

model-00001-of-00006.safetensors uploaded successfully.
CPU times: user 1min 19s, sys: 42.3 s, total: 2min 2s
Wall time: 3min 7s
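The upload function above only considers files at the top level of the model directory. If a model directory ever contains subfolders, one option is to plan the uploads recursively first and then pass each pair to `upload_file`. The sketch below is an illustration, not part of the original workflow: `plan_s3_uploads` is a hypothetical helper, and the demo builds a throwaway directory instead of touching S3.

```python
import os
import tempfile


def plan_s3_uploads(local_directory, s3_prefix):
    """Return (local_path, s3_key) pairs for every file under local_directory."""
    plan = []
    for root, _dirs, files in os.walk(local_directory):
        for name in files:
            local_path = os.path.join(root, name)
            relative = os.path.relpath(local_path, local_directory)
            # S3 keys always use forward slashes, regardless of the local OS.
            key = "/".join([s3_prefix.strip("/")] + relative.split(os.sep))
            plan.append((local_path, key))
    return sorted(plan, key=lambda pair: pair[1])


with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "nested"))
    for relative in ("config.json", os.path.join("nested", "extra.json")):
        open(os.path.join(demo_dir, relative), "w").close()
    keys = [key for _path, key in plan_s3_uploads(demo_dir, "safetensors")]

print(keys)
```

Separating planning from uploading also makes it easy to review the target keys before any data is transferred.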





## Section 11: Import Model into Amazon Bedrock

Create an IAM execution role that Amazon Bedrock assumes when running the model import job, granting it access to the model files uploaded to S3.
Then start the import job and poll its status until it finishes.


In [17]:
import boto3
from botocore.exceptions import ClientError
import json

# Parameters for region and source account
region = "us-west-2"  # Replace with your desired AWS region
source_account = "603555443475"  # Replace with your AWS account ID

# IAM client and role/policy details
iam_client = boto3.client('iam')
role_name = "BedrockModelImportExecutionRole"
policy_name = "BedrockModelImportPolicy"
s3_bucket_name = "mistral-7b-insurance-bedrock-import"  # Replace with your actual bucket name

# Define the trust policy to allow Bedrock to assume this role with specific conditions
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": source_account  # Parameterized account ID
                },
                "ArnEquals": {
                    "aws:SourceArn": f"arn:aws:bedrock:{region}:{source_account}:model-import-job/*"  # Parameterized region and account ID
                }
            }
        }
    ]
}

# Define the permissions policy for S3 and Bedrock access
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                f"arn:aws:s3:::{s3_bucket_name}",
                f"arn:aws:s3:::{s3_bucket_name}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateModel",
                "bedrock:GetModel",
                "bedrock:ListModels",
                "bedrock:CreateModelImportJob",
                "bedrock:GetModelImportJob"
            ],
            "Resource": "*"
        }
    ]
}

# Create the IAM role
try:
    print("Creating IAM Role...")
    role_response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
        Description="Role for Amazon Bedrock model import job with S3 access"
    )
    role_arn = role_response['Role']['Arn']
    print(f"IAM Role created with ARN: {role_arn}")
except ClientError as e:
    if e.response['Error']['Code'] == 'EntityAlreadyExists':
        print(f"Role '{role_name}' already exists.")
        role_arn = iam_client.get_role(RoleName=role_name)['Role']['Arn']
    else:
        raise

# Attach the permissions policy to the role
try:
    print("Attaching policy to IAM Role...")
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(permissions_policy)
    )
    print("Policy attached successfully.")
except ClientError as e:
    print(f"Error attaching policy: {e}")
    raise

# The role ARN will be used in the next cell to create the model import job


Creating IAM Role...
IAM Role created with ARN: arn:aws:iam::603555443475:role/BedrockModelImportExecutionRole
Attaching policy to IAM Role...
Policy attached successfully.
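The account ID in the trust policy is hard-coded above. As a hedged alternative, it can be looked up from the active credentials with the STS `get_caller_identity` call. In the sketch below, `current_account_id` and `import_job_source_arn` are hypothetical helpers, and `FakeStsClient` is a stand-in that lets the example run offline with a placeholder account ID; in the notebook you would pass `boto3.client("sts")` instead.

```python
def current_account_id(sts_client):
    """Return the AWS account ID of the credentials behind an STS client."""
    return sts_client.get_caller_identity()["Account"]


def import_job_source_arn(region, account_id):
    """Build the SourceArn pattern used in the trust policy condition."""
    return f"arn:aws:bedrock:{region}:{account_id}:model-import-job/*"


class FakeStsClient:
    """Offline stand-in so the sketch runs without AWS credentials."""

    def get_caller_identity(self):
        # Placeholder account ID, not a real account.
        return {"Account": "123456789012"}


account_id = current_account_id(FakeStsClient())
source_arn = import_job_source_arn("us-west-2", account_id)
print(source_arn)
```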


In [18]:
from datetime import datetime

bedrock_client = boto3.client('bedrock', region_name=region)  # Use the parameterized region
s3_model_uri = f"s3://{s3_bucket_name}/{s3_model_directory}/"  # Reference the bucket and directory
imported_model_name = "Mistral-7B-Insurance-Model"

# Build a unique job name from the current timestamp
job_name = f"mistral-7b-insurance-import-job-{datetime.now().strftime('%Y%m%d%H%M%S')}"

# Create the model import job
response = bedrock_client.create_model_import_job(
    jobName=job_name,
    importedModelName=imported_model_name,
    roleArn=role_arn,  # Use the ARN from the IAM role created in the previous cell
    modelDataSource={'s3DataSource': {'s3Uri': s3_model_uri}}
)

print("Model import job created:", response)
print(json.dumps(response, indent=4))


Model import job created: {'ResponseMetadata': {'RequestId': '54057d46-a7c8-40c0-8647-e5e31cbdf7dc', 'HTTPStatusCode': 201, 'HTTPHeaders': {'date': 'Tue, 05 Nov 2024 21:49:04 GMT', 'content-type': 'application/json', 'content-length': '81', 'connection': 'keep-alive', 'x-amzn-requestid': '54057d46-a7c8-40c0-8647-e5e31cbdf7dc'}, 'RetryAttempts': 0}, 'jobArn': 'arn:aws:bedrock:us-west-2:603555443475:model-import-job/01252npqffte'}
{
    "ResponseMetadata": {
        "RequestId": "54057d46-a7c8-40c0-8647-e5e31cbdf7dc",
        "HTTPStatusCode": 201,
        "HTTPHeaders": {
            "date": "Tue, 05 Nov 2024 21:49:04 GMT",
            "content-type": "application/json",
            "content-length": "81",
            "connection": "keep-alive",
            "x-amzn-requestid": "54057d46-a7c8-40c0-8647-e5e31cbdf7dc"
        },
        "RetryAttempts": 0
    },
    "jobArn": "arn:aws:bedrock:us-west-2:603555443475:model-import-job/01252npqffte"
}
CPU times: user 22.2 ms, sys: 0 ns, total:

In [43]:
%%time

import time
from botocore.exceptions import ClientError

# Use the job name from the response of create_model_import_job to track the job
polling_interval = 30  # Time in seconds between each status check

def check_job_status(job_name):
    """
    Checks the status of the model import job and returns the current status, failure message, and imported model ARN if available.

    Parameters:
        job_name (str): The name of the model import job to check.

    Returns:
        dict: Contains the status, failure message, and imported model ARN if the job is completed.
    """
    try:
        status_response = bedrock_client.get_model_import_job(jobIdentifier=job_name)
        return {
            "status": status_response["status"],
            "failureMessage": status_response.get("failureMessage", ""),
            "importedModelArn": status_response.get("importedModelArn", None)
        }
    except ClientError as e:
        print(f"An error occurred: {e}")
        return None

# Loop to check the job status periodically
print(f"Checking status for job {job_name} every {polling_interval} seconds...")
imported_model_arn = None
while True:
    result = check_job_status(job_name)
    if result is None:
        print("Unable to retrieve job status. Exiting.")
        break

    status = result["status"]
    failure_message = result["failureMessage"]
    imported_model_arn = result["importedModelArn"]
    print(f"Current status: {status}")

    # Check if the job has reached a final state
    if status in ["Completed", "Failed"]:
        if status == "Failed" and failure_message:
            print(f"Job failed with message: {failure_message}")
            imported_model_arn = None  # Clear the ARN if the job failed
        else:
            print(f"Job {job_name} finished with status: {status}")
            print(f"Imported Model ARN: {imported_model_arn}")
        break

    # Wait before the next status check
    time.sleep(polling_interval)

# Keep the imported model ARN as the model ID for later cells (None if the import failed)
imported_model_id = imported_model_arn
if imported_model_id:
    print(f"Model ID (ARN) for further use: {imported_model_id}")
else:
    print("Model import job did not complete successfully.")

Checking status for job mistral-import-job-20241105214904 every 30 seconds...
Current status: Completed
Job mistral-import-job-20241105214904 finished with status: Completed
Imported Model ARN: arn:aws:bedrock:us-west-2:603555443475:imported-model/39kzslj1khll
Model ID (ARN) for further use: arn:aws:bedrock:us-west-2:603555443475:imported-model/39kzslj1khll
CPU times: user 8.92 ms, sys: 5.88 ms, total: 14.8 ms
Wall time: 313 ms
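The polling loop above runs until the job reaches a final state, with no upper bound on waiting time. A minimal sketch of the same idea with a timeout is shown below. `wait_for_terminal_status` is a hypothetical helper that takes any zero-argument status function, so the demo can use a scripted sequence of statuses instead of a real import job.

```python
import time


def wait_for_terminal_status(get_status, terminal=("Completed", "Failed"),
                             interval=30, timeout=3600, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        sleep(interval)


# Scripted statuses stand in for repeated get_model_import_job calls.
scripted = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_terminal_status(
    lambda: next(scripted),
    interval=0,
    sleep=lambda seconds: None,  # Skip real sleeping in the demo
)
print(final_status)
```

With the notebook's objects, the status function could be something like `lambda: check_job_status(job_name)["status"]`, assuming the status call succeeds.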


## Section 12: Call Imported Model Using Amazon Bedrock Converse API

Send a test request to the imported model on Amazon Bedrock using the Converse API to verify its functionality.


In [47]:
import boto3
import json
from botocore.exceptions import ClientError

# Initialize the Bedrock runtime client
bedrock_runtime_client = boto3.client('bedrock-runtime', region_name="us-west-2")  # Replace with your region

# Ensure imported_model_id is set from the previous section where the model import job completed
# imported_model_id should be the imported model's ARN (importedModelArn)
if not imported_model_id:
    raise ValueError("Model ID (importedModelArn) is not set. Ensure the model import job completed successfully.")

# Define the conversation messages, with user role correctly structured
# Add the system-like instruction as part of the initial user message
messages = [
    {
        "role": "user",
        "content": [
            {"text": "You are an expert in customer support for insurance. Please help me understand my health insurance benefits."}
        ]
    }
]

# Define the converse function
def converse(messages):
    """
    Calls the Bedrock converse API without a system message.

    Parameters:
        messages (list): List of conversation messages.

    Returns:
        dict: The API response from Bedrock.
    """
    # Configure the conversation payload
    converse_config = {
        "modelId": imported_model_id,  # Use the imported model ARN as the model ID
        "messages": messages,
        "inferenceConfig": {
            "temperature": 0.5
        }
    }
    
    print("\nConversation:")
    for message in messages:
        print(f"{message['role'].capitalize()}: {json.dumps(message['content'], indent=2)}")
    
    # Call the converse API
    try:
        response = bedrock_runtime_client.converse(**converse_config)
        return response
    except ClientError as e:
        error_message = e.response['Error']['Message']
        print(f"An error occurred: {error_message}")
        print("Converse config:")
        print(json.dumps(converse_config, indent=2))
        return None

def print_converse_response(response):
    """
    Prints the conversation response in a readable format.

    Parameters:
        response (dict): The API response from Bedrock.
    """
    if response:
        print(f"Response: {response['output']['message']['content'][0]['text']}")
        if 'trace' in response:
            print("Trace:")
            print(json.dumps(response['trace'], indent=2))
    else:
        print("No response received.")

# Example usage
# Run the converse function and print the response
response = converse(messages)
print_converse_response(response)


Conversation:
User: [
  {
    "text": "You are an expert in customer support for insurance. Please help me understand my health insurance benefits."
  }
]
Response: To effectively understand your health insurance benefits, please adhere to the following steps:

1. Access our website at {{WEBSITE_URL}}.
2. Enter your login credentials to access your account.
3. Proceed to the {{HEALTH_INSURANCE_SECTION}} section of your account.
4. Click on the {{VIEW_DETAILS_TAB}} tab to review your health insurance details.

Should you require additional support, please reach out to our customer service team via our helpline.
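The cell above folds the role instruction into the first user message and reads only the first content block of the reply. Both steps can be wrapped in small helpers, sketched here under the assumption that the reply consists of text blocks. `build_user_message` and `converse_text` are hypothetical names, and the response dictionary is hand-written in the Converse response shape rather than returned by a live call.

```python
def build_user_message(instruction, question):
    """Combine a role instruction and a question into one Converse user message."""
    return {"role": "user", "content": [{"text": f"{instruction} {question}"}]}


def converse_text(response):
    """Join the text of all content blocks in a Converse-style response."""
    blocks = response["output"]["message"]["content"]
    return "".join(block.get("text", "") for block in blocks)


message = build_user_message(
    "You are an expert in customer support for insurance.",
    "Please help me understand my health insurance benefits.",
)

# Hand-written stand-in for a Converse API response.
fake_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "To understand your benefits, "},
                        {"text": "start with your policy summary."}],
        }
    }
}

print(message["content"][0]["text"])
print(converse_text(fake_response))
```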


## Section 13: Call Imported Model Using Amazon Bedrock Converse Streaming API

This section demonstrates how to use the Amazon Bedrock `converse_stream` API to stream responses for several test cases as they are generated. Each entry in the sample set is a separate single-turn query that simulates a different customer support scenario.
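A streamed response arrives as a sequence of event dictionaries such as `messageStart`, `contentBlockDelta`, `messageStop`, and `metadata`. As a rough sketch of how these can be folded into one result instead of being printed piece by piece, the hypothetical `collect_stream` helper below walks a list of hand-written events in that shape; no Bedrock call is made.

```python
def collect_stream(events):
    """Fold Converse-style stream events into a single summary dict."""
    parts = []
    summary = {"role": None, "stop_reason": None, "usage": None}
    for event in events:
        if "messageStart" in event:
            summary["role"] = event["messageStart"]["role"]
        elif "contentBlockDelta" in event:
            # Each delta carries a small piece of the generated text.
            parts.append(event["contentBlockDelta"]["delta"].get("text", ""))
        elif "messageStop" in event:
            summary["stop_reason"] = event["messageStop"]["stopReason"]
        elif "metadata" in event:
            summary["usage"] = event["metadata"].get("usage")
    summary["text"] = "".join(parts)
    return summary


# Hand-written events that mimic the shape of a streamed response.
fake_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "To file"}, "contentBlockIndex": 0}},
    {"contentBlockDelta": {"delta": {"text": " a claim"}, "contentBlockIndex": 0}},
    {"messageStop": {"stopReason": "end_turn"}},
    {"metadata": {"usage": {"inputTokens": 19, "outputTokens": 171, "totalTokens": 190}}},
]

result = collect_stream(fake_events)
print(result["text"])
print(result["stop_reason"], result["usage"]["totalTokens"])
```

Collecting the pieces this way is useful when the full reply is needed afterwards, for example to append it to the conversation history.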

In [52]:
import boto3
import logging
from botocore.exceptions import ClientError

# Initialize logging
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)

# Initialize the Bedrock runtime client
bedrock_runtime_client = boto3.client('bedrock-runtime', region_name="us-west-2")  # Replace with your region

# Ensure imported_model_id is set from the previous section where the model import job completed
if not imported_model_id:
    raise ValueError("Model ID (importedModelArn) is not set. Ensure the model import job completed successfully.")

# Define multiple conversation samples without system prompts
sample_messages = [
    [
        {
            "role": "user",
            "content": [{"text": "Can you help me understand my health insurance benefits?"}]
        }
    ],
    [
        {
            "role": "user",
            "content": [{"text": "What does my policy cover if I need to see a specialist?"}]
        }
    ],
    [
        {
            "role": "user",
            "content": [{"text": "Are dental treatments covered in my current insurance plan?"}]
        }
    ],
    [
        {
            "role": "user",
            "content": [{"text": "How do I file a claim for a recent doctor visit?"}]
        }
    ],
    [
        {
            "role": "user",
            "content": [{"text": "Can you explain what deductible means in my policy?"}]
        }
    ]
]

# Inference parameters
inference_config = {"temperature": 0.5}
additional_model_fields = {"top_k": 200}

# Define the streaming converse function
def stream_conversation(bedrock_client, model_id, messages, inference_config, additional_model_fields):
    """
    Calls the Bedrock converse_stream API and handles streaming response.

    Parameters:
        bedrock_client: The Boto3 Bedrock runtime client.
        model_id (str): The model ID to use.
        messages (list): The messages to send.
        inference_config (dict): The inference configuration to use.
        additional_model_fields (dict): Additional model fields to use.
    """
    logger.info("Streaming messages with model %s", model_id)

    response = bedrock_client.converse_stream(
        modelId=model_id,
        messages=messages,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_fields
    )

    stream = response.get('stream')
    if stream:
        for event in stream:
            if 'messageStart' in event:
                print(f"\nRole: {event['messageStart']['role']}")

            if 'contentBlockDelta' in event:
                print(event['contentBlockDelta']['delta']['text'], end="")

            if 'messageStop' in event:
                print(f"\nStop reason: {event['messageStop']['stopReason']}")

            if 'metadata' in event:
                metadata = event['metadata']
                if 'usage' in metadata:
                    print("\nToken usage")
                    print(f"Input tokens: {metadata['usage']['inputTokens']}")
                    print(f"Output tokens: {metadata['usage']['outputTokens']}")
                    print(f"Total tokens: {metadata['usage']['totalTokens']}")
                if 'metrics' in metadata:
                    print(f"Latency: {metadata['metrics']['latencyMs']} milliseconds")

# Example usage of streaming for multiple test cases
try:
    for i, messages in enumerate(sample_messages, 1):
        print("\n" + "="*50)  # Line separator for clarity
        print(f"\nStarting streaming response for sample #{i}: {messages[0]['content'][0]['text']}")
        stream_conversation(
            bedrock_runtime_client,
            imported_model_id,  # Use the imported model ARN as model ID
            messages,
            inference_config,
            additional_model_fields
        )
        print(f"\nFinished streaming response for sample #{i}: {messages[0]['content'][0]['text']}")
except ClientError as err:
    error_message = err.response['Error']['Message']
    logger.error("A client error occurred: %s", error_message)
    print(f"A client error occurred: {error_message}")
else:
    print(f"\nFinished streaming all test cases with model {imported_model_id}.")
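
The streaming loop above prints each event as it arrives. When the reply needs to be reused, for example to show it in a chat UI, the same event shapes can be folded into a single result instead. The following is a minimal sketch under stated assumptions, not part of the original cell: `collect_stream` is a hypothetical helper name, and `fake_events` is hand-written data that only mimics the event keys handled above (`messageStart`, `contentBlockDelta`, `messageStop`, `metadata`), so it runs without calling Bedrock.

```python
def collect_stream(events):
    """Fold Converse stream events into one result dict instead of printing them."""
    result = {"role": None, "text": "", "stop_reason": None,
              "usage": {}, "latency_ms": None}
    chunks = []
    for event in events:
        if "messageStart" in event:
            result["role"] = event["messageStart"]["role"]
        elif "contentBlockDelta" in event:
            # Deltas without a "text" key contribute an empty string.
            chunks.append(event["contentBlockDelta"]["delta"].get("text", ""))
        elif "messageStop" in event:
            result["stop_reason"] = event["messageStop"]["stopReason"]
        elif "metadata" in event:
            metadata = event["metadata"]
            result["usage"] = metadata.get("usage", {})
            result["latency_ms"] = metadata.get("metrics", {}).get("latencyMs")
    result["text"] = "".join(chunks)
    return result


# Hand-written events for illustration only (hypothetical, not real model output).
fake_events = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "A deductible is "}}},
    {"contentBlockDelta": {"delta": {"text": "what you pay first."}}},
    {"messageStop": {"stopReason": "end_turn"}},
    {"metadata": {"usage": {"inputTokens": 21, "outputTokens": 95, "totalTokens": 116},
                  "metrics": {"latencyMs": 1565}}},
]

summary = collect_stream(fake_events)
print(summary["text"])
print(summary["stop_reason"], summary["usage"]["totalTokens"], summary["latency_ms"])
```

With a real client, the iterable returned by `response.get('stream')` could be passed in place of `fake_events`. Collecting first and printing afterwards gives up the token-by-token display, so this suits cases where the full text matters more than perceived latency.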


INFO: Streaming messages with model arn:aws:bedrock:us-west-2:603555443475:imported-model/39kzslj1khll




Starting streaming response for sample #1: Can you help me understand my health insurance benefits?

Role: assistant
To effectively utilize your health insurance benefits, please adhere to the following guidelines:

1. Access your account via {{WEBSITE_URL}}.
2. Locate the {{HEALTH_INSURANCE_SECTION}} within your account dashboard.
3. Select the {{CLAIM_FORM}} option to initiate the claim process.
4. Complete the claim form with all necessary information, ensuring that you include your policy number, the details of your claim, and any relevant documentation.
5. Double-check all entered information for accuracy.
6. Submit your claim form by sending it to the designated claims department as indicated on the website.

After you have submitted your claim, please be patient as our team processes your request and responds to you in a timely manner.
Stop reason: end_turn

Token usage
Input tokens: 19
Output tokens: 171
Total tokens: 190
Latency: 2642 milliseconds

Finished streaming response for sample #1: Can you help me understand my health insurance benefits?


Starting streaming response for sample #2: What does my policy cover if I need to see a specialist?

Role: assistant
 To ascertain the details of your insurance coverage for visiting a specialist, please adhere to the following guidelines:

1. Access your account at {{WEBSITE_URL}}.
2. Locate the section pertaining to {{HEALTH_INSURANCE_SECTION}}.
3. Select the option for {{SPECIALIST_CONSULTATIONS_OPTION}}.
4. Analyze the specifics of your coverage as outlined in the provided information.

Should you require additional clarification or assistance, please do not hesitate to reach out to our customer support team.
Stop reason: end_turn

Token usage
Input tokens: 22
Output tokens: 127
Total tokens: 149
Latency: 1902 milliseconds

Finished streaming response for sample #2: What does my policy cover if I need to see a specialist?


Starting streaming response for sample #3: Are dental treatments covered in my current insurance plan?

Role: assistant
 To verify whether your dental treatments are covered by your insurance plan, please adhere to the following guidelines:

1. Access your account at {{WEBSITE_URL}}.
2. Proceed to the {{COVERAGE_SECTION}} section.
3. Choose your dental insurance policy from the displayed options.
4. Select the {{COVERAGE_DETAILS}} link to review the dental treatments that are included in your coverage.

If you require additional support, please reach out to our customer service team by dialing our support number.
Stop reason: end_turn

Token usage
Input tokens: 19
Output tokens: 119
Total tokens: 138
Latency: 2017 milliseconds

Finished streaming response for sample #3: Are dental treatments covered in my current insurance plan?


Starting streaming response for sample #4: How do I file a claim for a recent doctor visit?

Role: assistant
 To initiate the claim process for your recent doctor visit, please adhere to the following guidelines:

1. Access our website at {{WEBSITE_URL}}.
2. Enter your account credentials to log in.
3. Proceed to the {{CLAIM_SECTION}} section of the site.
4. Choose the option for {{FILE_CLAIM_OPTION}}.
5. Complete the claim form, ensuring all mandatory fields are filled in and necessary documents are attached.
6. Verify your information and submit the claim form.

Our claims department will examine your submission and respond at the earliest opportunity.
Stop reason: end_turn

Token usage
Input tokens: 21
Output tokens: 134
Total tokens: 155
Latency: 2158 milliseconds

Finished streaming response for sample #4: How do I file a claim for a recent doctor visit?


Starting streaming response for sample #5: Can you explain what deductible means in my policy?

Role: assistant
 A deductible is the amount of money you are required to pay out of pocket before your insurance coverage begins. This means that in the event of a claim, you will be responsible for paying the specified deductible amount before your insurance provider will cover the remaining costs. For instance, if your policy has a $500 deductible, you will need to pay $500 before your insurance provider will cover any further expenses related to your claim.
Stop reason: end_turn

Token usage
Input tokens: 21
Output tokens: 95
Total tokens: 116
Latency: 1565 milliseconds

Finished streaming response for sample #5: Can you explain what deductible