### Bedrock Deployment and Inference

After training and evaluating our model, we want to make it available for inference. Amazon Bedrock provides a serverless endpoint for model deployment, allowing us to serve the model without managing infrastructure.

The Custom Model feature of Amazon Bedrock lets us import our fine-tuned model and access it through the same API as other foundation models.
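Because an imported model is called through the same runtime API, a Converse request for the base model and one for our custom deployment can share the same body; only `modelId` changes. The sketch below illustrates this with a hypothetical helper, `build_converse_kwargs`, and a placeholder deployment ARN. It only builds the request dictionary and makes no AWS calls.

```python
from typing import Any, Dict


def build_converse_kwargs(model_id: str, system_prompt: str, user_text: str) -> Dict[str, Any]:
    """Build keyword arguments for bedrock-runtime `converse` (hypothetical helper).

    The same request shape is used for a base model ID or a custom model
    deployment ARN; only `model_id` differs between the two.
    """
    return {
        "modelId": model_id,
        "system": [{"text": system_prompt}],
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
        ],
    }


# Base model via its inference profile ID, as used later in this notebook
base = build_converse_kwargs("us.amazon.nova-2-lite-v1:0", "You are helpful.", "Hi")
# Placeholder only: in practice this is the customModelDeploymentArn returned by Bedrock
custom = build_converse_kwargs("<custom-model-deployment-arn>", "You are helpful.", "Hi")
```

With a runtime client, the request would then be sent as `brt.converse(**custom)`.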


In [None]:
import os
import json
import time

import boto3
import sagemaker
import matplotlib.pyplot as plt

# Resolve the AWS region: environment variable first, then the boto3 default, then us-east-1
region = os.environ.get('AWS_DEFAULT_REGION') or boto3.Session().region_name or 'us-east-1'
sagemaker_session_bucket = None

# Create the SageMaker client and session in the resolved region
sm = boto3.client('sagemaker', region_name=region)
sess = sagemaker.session.Session(boto_session=boto3.session.Session(region_name=region), sagemaker_client=sm)

if sagemaker_session_bucket is None:
    # Fall back to the session's default bucket if no bucket name is given
    sagemaker_session_bucket = sess.default_bucket()

# Use the notebook's execution role; outside SageMaker, look the role up by name in IAM
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

bucket_name = sagemaker_session_bucket
default_prefix = sess.default_bucket_prefix

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {bucket_name}")
print(f"sagemaker session region: {sess.boto_region_name}")

In [None]:
# Restore the checkpoint S3 URI saved earlier with `%store post_sft_chkpt`
%store -r post_sft_chkpt

### Step 1: Import the Model from the S3 Checkpoint into Bedrock

First, we make the final checkpoint, whose S3 URI is stored in `post_sft_chkpt`, available in Bedrock as a custom model. For this we use the Bedrock `create_custom_model` API, as shown below:

In [None]:
import boto3

# Initialize the Bedrock client
bedrock = boto3.client("bedrock", region_name=sess.boto_region_name)

model_path = post_sft_chkpt

# Define name for imported model
imported_model_name = "nova-lite-2-post-sft-rft-codetalk"

request_params = {
    "modelName": imported_model_name,
    "modelSourceConfig": {"s3DataSource": {"s3Uri": model_path}},
    "roleArn": role,
    # Idempotency token: lets AWS recognize retries of the same request
    "clientRequestToken": "NovaRecipeSageMaker",
}

# Start importing the checkpoint as a Bedrock custom model
response = bedrock.create_custom_model(**request_params)

model_arn = response["modelArn"]

# The import runs asynchronously; the model ARN is returned immediately
print(f"Custom model import started, model ARN: {model_arn}")
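The cell above passes a fixed `clientRequestToken`. AWS APIs generally treat this field as an idempotency token, so reusing the same value for a different import (for example, after changing `imported_model_name`) may be rejected or treated as a repeat of the earlier request. A minimal sketch, assuming you want a fresh token per import and an early check that the checkpoint path is really an S3 URI, is shown below; `build_import_request` is a hypothetical helper, not part of boto3.

```python
import uuid
from typing import Optional
from urllib.parse import urlparse


def build_import_request(model_name: str, s3_uri: str, role_arn: str,
                         token: Optional[str] = None) -> dict:
    """Build `create_custom_model` parameters (hypothetical helper).

    A fresh UUID is used as clientRequestToken unless one is passed in;
    pass a fixed token when you deliberately want retries to be idempotent.
    """
    parsed = urlparse(s3_uri)
    # Reject local paths or other schemes before calling Bedrock
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected an s3://bucket/prefix URI, got {s3_uri!r}")
    return {
        "modelName": model_name,
        "modelSourceConfig": {"s3DataSource": {"s3Uri": s3_uri}},
        "roleArn": role_arn,
        "clientRequestToken": token or str(uuid.uuid4()),
    }
```

Usage would look like `bedrock.create_custom_model(**build_import_request(imported_model_name, post_sft_chkpt, role))`.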


### Step 1.5: Monitor the Model Status

After initiating the model import, we need to monitor its progress. The status goes through several states:

- **CREATING**: the model is being imported
- **ACTIVE**: the import succeeded
- **FAILED**: the import encountered errors

The next cell polls the Bedrock API every 10 seconds to check the status of the model import, continuing until it reaches a terminal state (ACTIVE or FAILED). Once the import succeeds, `model_arn` holds the ARN of the custom model, which we use to create the deployment.


In [None]:
from IPython.display import clear_output
import time

# Check CMI job status
while True:
    response = bedrock.list_custom_models(sortBy='CreationTime',sortOrder='Descending')
    model_summaries = response["modelSummaries"]
    status = ""
    for model in model_summaries:
        if model["modelName"] == imported_model_name:
            status = model["modelStatus"].upper()
            model_arn = model["modelArn"]
            print(f'{model["modelStatus"].upper()} {model["modelArn"]} ...')
            if status in ["ACTIVE", "FAILED"]:
                break
    if status in ["ACTIVE", "FAILED"]:
        break
    clear_output(wait=True)
    time.sleep(10)
    
model_arn
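The polling loop above runs until the model reaches ACTIVE or FAILED, and would spin forever if the model name were never found. A minimal sketch of the same idea with an explicit timeout follows; `wait_for_terminal_status` is a hypothetical helper that takes any zero-argument function returning the current status string, so it can wrap the same `list_custom_models` lookup used in the cell above.

```python
import time
from typing import Callable, Tuple


def wait_for_terminal_status(
    get_status: Callable[[], str],
    terminal: Tuple[str, ...] = ("ACTIVE", "FAILED"),
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        # Normalize case so "Active" and "ACTIVE" are treated the same
        status = (get_status() or "").upper()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status or 'UNKNOWN'} after {timeout_seconds} s")
        sleep(poll_seconds)
```

Injecting `sleep` keeps the helper testable without real waiting; in the notebook the default `time.sleep` is used.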

### Step 2: Create an On-Demand Inference Deployment

With the model imported, we deploy it so that it can serve inference requests. We use the Bedrock `create_custom_model_deployment` API to host the custom model with Bedrock on-demand inference. The following cell creates a deployment for our fine-tuned Nova 2 Lite model.

In [None]:
# Create an on-demand deployment for the imported custom model
response = bedrock.create_custom_model_deployment(
    modelDeploymentName="nova-lite-2-post-sft-rft-deployment",
    modelArn=model_arn,  # ARN of the imported custom model from Step 1
    description="on-demand custom model deployment",
    # Placeholder tag; replace with your own key/value pairs
    tags=[
        {
            'key': 'key',
            'value': 'value'
        }
    ]
)

# The deployment ARN is used as the modelId for inference calls
deployed_model_id = response['customModelDeploymentArn']
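The deployment cell passes tags as a list of objects with lowercase `key` and `value` fields, using a placeholder pair. A small sketch for writing tags as a plain dictionary and converting them to that list shape is shown below; `to_bedrock_tags` is a hypothetical helper, and the tag names in the test are illustrative only.

```python
from typing import Dict, List


def to_bedrock_tags(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {"team": "ml"} into [{"key": "team", "value": "ml"}] (hypothetical helper).

    Output matches the key/value list shape used in the deployment cell above.
    Keys are sorted so the result is deterministic.
    """
    return [{"key": k, "value": v} for k, v in sorted(tags.items())]
```

In the deployment call this would read `tags=to_bedrock_tags({"project": "codetalk"})`.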



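### Step 2.5: Monitor the Status of the Deployment

Next we monitor the status of the deployment. The `get_custom_model_deployment` API retrieves detailed information about a single custom model deployment in Amazon Bedrock, including its current status, configuration, and metadata. The cell below uses `list_custom_model_deployments` to summarize every custom model deployment in the account and Region, including its status and any failure message.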


In [None]:
# List all custom model deployments

response = bedrock.list_custom_model_deployments()

deployments = response.get('modelDeploymentSummaries', [])

if deployments:
    print(f"Found {len(deployments)} custom model deployment(s):")
    print("-" * 80)
    
    for deployment in deployments:
        status_emoji = {
            'Active': '🟢',
            'Creating': '🟡',
            'Failed': '🔴'
        }.get(deployment['status'], '⚪')
        
        print(f"{status_emoji} Name: {deployment['customModelDeploymentName']}")
        print(f"   Status: {deployment['status']}")
        print(f"   ARN: {deployment['customModelDeploymentArn']}")
        print(f"   Model ARN: {deployment['modelArn']}")
        print(f"   Created: {deployment['createdAt']}")
        print(f"   Last Updated: {deployment['lastUpdatedAt']}")
        
        # Show failure message if deployment failed
        if deployment.get('failureMessage'):
            print(f"   ❌ Failure: {deployment['failureMessage']}")
        
        print("-" * 80)
    
    # Summary by status
    status_counts = {}
    for deployment in deployments:
        status = deployment['status']
        status_counts[status] = status_counts.get(status, 0) + 1
    
    print(f"\n📊 Status Summary:")
    for status, count in status_counts.items():
        emoji = {'Active': '🟢', 'Creating': '🟡', 'Failed': '🔴'}.get(status, '⚪')
        print(f"   {emoji} {status}: {count}")
        
else:
    print("No custom model deployments found")

![imgs/rft_bedrock.png](imgs/rft_bedrock.png)
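The listing above reports every deployment in the account. To block until our own deployment is ready before calling it, a minimal sketch is shown below. It assumes `get_custom_model_deployment` accepts `customModelDeploymentIdentifier` and returns `status` and `failureMessage` fields like the list summaries; `wait_for_deployment` itself is a hypothetical helper.

```python
import time


def wait_for_deployment(bedrock_client, deployment_arn: str, poll_seconds: float = 30.0,
                        timeout_seconds: float = 3600.0, sleep=time.sleep) -> dict:
    """Poll a single custom model deployment until it is Active or Failed (hypothetical helper)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        # Assumed request parameter and response fields; see the lead-in above
        info = bedrock_client.get_custom_model_deployment(
            customModelDeploymentIdentifier=deployment_arn
        )
        status = info.get("status")
        if status == "Active":
            return info
        if status == "Failed":
            raise RuntimeError(f"Deployment failed: {info.get('failureMessage', 'no message')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Deployment still {status} after {timeout_seconds} s")
        sleep(poll_seconds)
```

In the notebook this would be called as `wait_for_deployment(bedrock, deployed_model_id)`.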

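### Step 3: Make Inference Calls Using Bedrock

#### Testing the Deployed Model

Now that we have deployed our fine-tuned Nova 2 Lite model, we can test it using the API methods provided by Amazon Bedrock. Each method offers different capabilities for interacting with the model.

#### Converse API

The Converse API provides synchronous conversation with the model and returns the complete response in a single call. The first cell below sends the prompt to the base model through the `us.amazon.nova-2-lite-v1:0` inference profile as a baseline; the second sends the same prompt to our fine-tuned deployment using `deployed_model_id`.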
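Both models return their answer as text wrapped in a ```` ```json ```` fence, so the text has to be unwrapped before it can be parsed. A minimal sketch, assuming the first fenced block holds the JSON (and falling back to the whole text when there is no fence), is shown below; `extract_json` is a hypothetical helper.

```python
import json
import re

# Matches an optional "json" language tag and captures everything up to the closing fence
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def extract_json(text: str):
    """Parse the first fenced code block in text, or the whole text if it is unfenced."""
    match = _FENCE.search(text)
    payload = match.group(1) if match else text.strip()
    return json.loads(payload)
```

Applied to a response, this would be `extract_json(result['message']['content'][0]['text'])`; `json.loads` raises `json.JSONDecodeError` if the model emits invalid JSON.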


In [16]:
import json

brt = boto3.client(service_name='bedrock-runtime', region_name = 'us-east-1')

try:
    response = brt.converse(
        modelId="us.amazon.nova-2-lite-v1:0",
        system = [
                    {
                      "text": """You are a securities law tool selection specialist.\n\nYour task: \n1. Classify query type from 8 predefined categories\n2. Select appropriate tools, tool input, and tool sequence. Provide reasoning for each tool choice.\n3. Focus on connecting SEC regulations, EDGAR filings, and case law through expert tool selection decisions.\n4. Output structured JSON format:\n```json\n\"{\"Query analysis\": {\"Type\": \"[predefined_type]\", \"Information needed\": \"[specific_requirements]\", \"Tools\": [{\"Tool\": \"[tool_name]\", \"Parameters\": {[parameter_dict]}, \"Reasoning\": \"[why_this_tool]\"}, {\"Tool\": \"[tool_name_2]\", \"Parameters\": {[parameter_dict_2]}, \"Reasoning\": \"[why_this_tool_2]\"}]}}\"\n```\n\nAVAILABLE TOOLS: statute_retrieval, case_law_search, compliance_checker, citation_validator\n\nPREDEFINED TYPES: regulatory_definition, judicial_interpretation, compliance_validation, citation_verification, regulatory_compliance_analysis, judicial_compliance_assessment, cross_document_analysis, regulatory_interpretation_research"""
                    }
              ],
        messages=[
                    {
              "role": "user",
              "content": [
                {
                  "text": "Does this private placement comply with Regulation D safe harbor provisions?"
                }
              ]
            }
        ]
    )

    print("Request ID:", response['ResponseMetadata']['RequestId'])
    result = response.get('output')
    print(result)
    print("\n\n\n")
    print(result['message']['content'][0]['text'])

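except Exception as e:
    # Print the full error for debugging
    print("Error:", e)
    # botocore ClientError exposes the service response, which includes the Request ID;
    # other exception types have no .response attribute
    error_response = getattr(e, "response", None) or {}
    if 'ResponseMetadata' in error_response:
        print("Request ID:", error_response['ResponseMetadata']['RequestId'])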

Request ID: dbd33c3d-a219-42d8-9e3c-b67200e66f0d
{'message': {'role': 'assistant', 'content': [{'text': '```json\n{\n  "Query analysis": {\n    "Type": "compliance_validation",\n    "Information needed": [\n      "Details of the private placement transaction",\n      "Information about investors involved",\n      "Offering amount and structure",\n      "Advertising or solicitation activities",\n      "Accredited investor status verification",\n      "Use of General Solicitation",\n      "Timing and number of sales"\n    ],\n    "Tools": [\n      {\n        "Tool": "compliance_checker",\n        "Parameters": {\n          "regulation": "Regulation D",\n          "safe_harbor": "Rule 504 or Rule 506",\n          "transaction_details": "Private placement memorandum, investor subscription agreements, advertising materials if any"\n        },\n        "Reasoning": "The compliance_checker tool is specifically designed to evaluate whether a given transaction meets the requirements of specific
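To compare the base and fine-tuned answers beyond eyeballing them, each parsed response can be checked against the system prompt. The base model's answer above, for instance, returns "Information needed" as a list, while the prompt's template shows a single string. The sketch below is a hypothetical checker, `check_analysis`, that assumes a strict reading of the template: a known `Type`, a string for "Information needed", and a non-empty `Tools` list drawn from the four available tools, each with a `Reasoning`.

```python
from typing import List

# Copied from the system prompt used in the Converse cells
ALLOWED_TOOLS = {"statute_retrieval", "case_law_search", "compliance_checker", "citation_validator"}
ALLOWED_TYPES = {
    "regulatory_definition", "judicial_interpretation", "compliance_validation",
    "citation_verification", "regulatory_compliance_analysis", "judicial_compliance_assessment",
    "cross_document_analysis", "regulatory_interpretation_research",
}


def check_analysis(parsed: dict) -> List[str]:
    """Return schema problems in a parsed tool-selection response (empty list if none)."""
    analysis = parsed.get("Query analysis")
    if not isinstance(analysis, dict):
        return ["missing 'Query analysis' object"]
    problems = []
    if analysis.get("Type") not in ALLOWED_TYPES:
        problems.append(f"unknown Type: {analysis.get('Type')!r}")
    # Assumption: the template's "[specific_requirements]" placeholder means a single string
    if not isinstance(analysis.get("Information needed"), str):
        problems.append("'Information needed' is not a string")
    tools = analysis.get("Tools")
    if not isinstance(tools, list) or not tools:
        problems.append("'Tools' must be a non-empty list")
        return problems
    for i, tool in enumerate(tools):
        if not isinstance(tool, dict):
            problems.append(f"Tools[{i}]: not an object")
            continue
        if tool.get("Tool") not in ALLOWED_TOOLS:
            problems.append(f"Tools[{i}]: unknown tool {tool.get('Tool')!r}")
        if not tool.get("Reasoning"):
            problems.append(f"Tools[{i}]: missing Reasoning")
    return problems
```

Run on the parsed JSON of each response, an empty list means the answer follows the requested format.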

In [15]:
import json

# Call the runtime in the same Region as the custom model deployment
brt = boto3.client(service_name='bedrock-runtime', region_name=sess.boto_region_name)

try:
    response = brt.converse(
        modelId=deployed_model_id,
        system = [
                    {
                      "text": """You are a securities law tool selection specialist.\n\nYour task: \n1. Classify query type from 8 predefined categories\n2. Select appropriate tools, tool input, and tool sequence. Provide reasoning for each tool choice.\n3. Focus on connecting SEC regulations, EDGAR filings, and case law through expert tool selection decisions.\n4. Output structured JSON format:\n```json\n\"{\"Query analysis\": {\"Type\": \"[predefined_type]\", \"Information needed\": \"[specific_requirements]\", \"Tools\": [{\"Tool\": \"[tool_name]\", \"Parameters\": {[parameter_dict]}, \"Reasoning\": \"[why_this_tool]\"}, {\"Tool\": \"[tool_name_2]\", \"Parameters\": {[parameter_dict_2]}, \"Reasoning\": \"[why_this_tool_2]\"}]}}\"\n```\n\nAVAILABLE TOOLS: statute_retrieval, case_law_search, compliance_checker, citation_validator\n\nPREDEFINED TYPES: regulatory_definition, judicial_interpretation, compliance_validation, citation_verification, regulatory_compliance_analysis, judicial_compliance_assessment, cross_document_analysis, regulatory_interpretation_research"""
                    }
              ],
        messages=[
                    {
              "role": "user",
              "content": [
                {
                  "text": "Does this private placement comply with Regulation D safe harbor provisions?"
                }
              ]
            }
        ]
    )

    print("Request ID:", response['ResponseMetadata']['RequestId'])
    result = response.get('output')
    print(result)
    print("\n\n\n")
    print(result['message']['content'][0]['text'])

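except Exception as e:
    # Print the full error for debugging
    print("Error:", e)
    # botocore ClientError exposes the service response, which includes the Request ID;
    # other exception types have no .response attribute
    error_response = getattr(e, "response", None) or {}
    if 'ResponseMetadata' in error_response:
        print("Request ID:", error_response['ResponseMetadata']['RequestId'])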

Request ID: 13add242-73d5-4aad-b50a-b4652ca012ed
{'message': {'role': 'assistant', 'content': [{'text': '```json\n{\n  "Query analysis": {\n    "Type": "regulatory_compliance_analysis",\n    "Information needed": "Regulation D requirements + EDGAR clause validation",\n    "Tools": [\n      {\n        "Tool": "statute_retrieval",\n        "Parameters": {"regulation": "Regulation D"},\n        "Reasoning": "Need authoritative text of Reg D safe harbor conditions"\n      },\n      {\n        "Tool": "compliance_checker",\n        "Parameters": {"query": "Private placement compliance", "edgar_check": "EDGAR subscription agreement language", "regulation": "Regulation D", "case_interpretation_check": "N/A"},\n        "Reasoning": "Must validate EDGAR clauses against Reg D requirements after retrieving statutory text"\n      }\n    ]\n  }\n}\n```'}]}}




```json
{
  "Query analysis": {
    "Type": "regulatory_compliance_analysis",
    "Information needed": "Regulation D requirements + EDGAR 