# Hosting SageMaker AI models in Bedrock with Bedrock Custom Model Import

In this notebook, you'll take a model artifact that you trained with Amazon SageMaker AI and host it in Amazon Bedrock using Bedrock Custom Model Import.


***

## Global variables

This section defines the Python variables used throughout the notebook.

In [13]:
import sagemaker
from datasets import load_dataset
import pandas as pd
from transformers import AutoTokenizer
import boto3
import os

sagemaker_session = sagemaker.Session()
bucket_name = sagemaker_session.default_bucket()
account_id = boto3.client('sts').get_caller_identity().get('Account')

print(bucket_name)
print(account_id)

sagemaker-us-west-2-340043819279
340043819279


# Import SageMaker AI fine-tuned Model to Amazon Bedrock

This notebook demonstrates how to import models into Amazon Bedrock using the Custom Model Import (CMI) feature.

## Prerequisites

- An AWS account with access to Amazon Bedrock
- Appropriate IAM roles and permissions for Bedrock and Amazon S3, following [the instructions here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-import-iam-role.html)
- An S3 bucket prepared to store the custom model
- Sufficient local storage space (at least 17 GB for 8B models and 135 GB for 70B models)
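
The storage requirement above can be checked before any model files are downloaded. The following is a minimal sketch, assuming the files would be staged under the current working directory; `REQUIRED_GB` and `has_enough_space` are hypothetical names introduced here for illustration.

```python
import shutil

# Thresholds from the prerequisites above, in decimal gigabytes.
REQUIRED_GB = {"8B": 17, "70B": 135}


def has_enough_space(model_size: str, path: str = ".") -> bool:
    """Return True if `path` has at least the free space listed for `model_size`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= REQUIRED_GB[model_size]


print(f"Enough space for an 8B model: {has_enough_space('8B')}")
```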


### Step 1: Verify S3 Path

Update these parameters according to your AWS environment:

In [4]:
s3_prefix = 'llama3-1-8b-merge-adapter-2025-03-25-00-31-59-900/output/model/merged'

In [5]:
full_s3_path = f"s3://{bucket_name}/{s3_prefix}/"
full_s3_path


's3://sagemaker-us-west-2-340043819279/llama3-1-8b-merge-adapter-2025-03-25-00-31-59-900/output/model/merged/'

In [6]:
!aws s3 ls {full_s3_path}

2025-03-25 00:37:51        906 config.json
2025-03-25 00:38:30        184 generation_config.json
2025-03-25 00:37:42 4886466168 model-00001-of-00007.safetensors
2025-03-25 00:37:59 4832007448 model-00002-of-00007.safetensors
2025-03-25 00:37:31 4999813112 model-00003-of-00007.safetensors
2025-03-25 00:38:08 4999813128 model-00004-of-00007.safetensors
2025-03-25 00:37:51 4832007496 model-00005-of-00007.safetensors
2025-03-25 00:38:15 4999813120 model-00006-of-00007.safetensors
2025-03-25 00:38:24 2571158184 model-00007-of-00007.safetensors
2025-03-25 00:37:51      23950 model.safetensors.index.json
2025-03-25 00:37:42        325 special_tokens_map.json
2025-03-25 00:38:07    9085657 tokenizer.json
2025-03-25 00:37:31      55380 tokenizer_config.json
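
Rather than inspecting the listing by eye, the file names can be checked programmatically. This is a rough sketch: the set of expected files is an assumption based on the listing above (a config, tokenizer files, and at least one `.safetensors` shard), and `missing_model_files` is a hypothetical helper, not an AWS API.

```python
# Assumed set of expected artifacts, inferred from the listing above.
EXPECTED_FILES = ("config.json", "tokenizer.json", "tokenizer_config.json")


def missing_model_files(keys: list[str]) -> list[str]:
    """Return the expected artifacts that are absent from a list of S3 keys or file names."""
    names = {key.rsplit("/", 1)[-1] for key in keys}
    missing = [name for name in EXPECTED_FILES if name not in names]
    if not any(name.endswith(".safetensors") for name in names):
        missing.append("*.safetensors")
    return missing


listing = [
    "config.json",
    "generation_config.json",
    "model-00001-of-00007.safetensors",
    "model.safetensors.index.json",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
]
print(missing_model_files(listing))
```

An empty list means every expected artifact was found under the prefix.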


The Bedrock Custom Model Import job requires a service role to run. The appropriate policies can be found in the [CMI documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-import-iam-role.html).
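
As an illustration of what the service role involves, the sketch below builds a trust policy that lets Bedrock assume the role for import jobs in one account and Region. The policy shape is an assumption and should be checked against the linked documentation before use; `build_trust_policy` is a hypothetical helper and `111122223333` is a placeholder account ID.

```python
import json


def build_trust_policy(account_id: str, region: str) -> dict:
    """Build a trust policy allowing Bedrock model import jobs to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
                # Assumed conditions: restrict the role to import jobs in this account.
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnEquals": {
                        "aws:SourceArn": f"arn:aws:bedrock:{region}:{account_id}:model-import-job/*"
                    },
                },
            }
        ],
    }


print(json.dumps(build_trust_policy("111122223333", "us-west-2"), indent=2))
```

The role also needs a permissions policy granting read access to the S3 location that holds the model files.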

In [14]:
from datetime import datetime

timestamp = datetime.now().strftime("%m-%d-%Y-%H%M%S")
role_name = "bedrock-cmi-role"  # Replace with your own Bedrock service role name if different


# Define your parameters (update these for your setup)
imported_model_name = 'Fine-Tuned-RAFT'  # e.g. Deepseek-8B-model
job_name = imported_model_name + '-' + timestamp  # e.g. Deepseek-8B-job
role_arn = f'arn:aws:iam::{account_id}:role/{role_name}'  # Must have the permissions listed in the prerequisites

# Region (currently only 'us-west-2' and 'us-east-1' support CMI with Deepseek-Distilled-Llama models)
region_info = sagemaker_session.boto_region_name  # Or set it explicitly, e.g. 'us-west-2' or 'us-east-1'

In [15]:
job_name

'Fine-Tuned-RAFT-03-28-2025-150703'

### Step 2: Create Custom Model Import Job

Create the import job in Amazon Bedrock.

<div class="alert alert-warning">
<b>Note:</b> A CMI job for an 8B model can take 5-20 minutes to complete.
</div>

In [16]:
# Initialize the Bedrock client
bedrock = boto3.client('bedrock', region_name=region_info)

s3_uri = f's3://{bucket_name}/{s3_prefix}/'

# Create the model import job
response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName=imported_model_name,
    roleArn=role_arn,
    modelDataSource={
        's3DataSource': {
            's3Uri': s3_uri
        }
    }
)

job_Arn = response['jobArn']

# Output the job ARN
print(f"Model import job created with ARN: {response['jobArn']}")


Model import job created with ARN: arn:aws:bedrock:us-west-2:340043819279:model-import-job/vbzwuxncklo7


### Step 3: Monitor Import Job Status

Poll the import job until it completes or fails.

In [17]:
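import time

# Poll the import job until it reaches a terminal state
while True:
    response = bedrock.get_model_import_job(jobIdentifier=job_Arn)
    status = response['status'].upper()
    print(f"Status: {status}")

    if status in ['COMPLETED', 'FAILED']:
        break

    time.sleep(60)  # Check every 60 seconds

# A failed job has no imported model to invoke, so stop here
if status == 'FAILED':
    raise RuntimeError(f"Import job failed: {response.get('failureMessage', 'no failure message')}")

# Get the model ID (the ARN of the imported model)
model_id = response['importedModelArn']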

Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: INPROGRESS
Status: COMPLETED
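
The polling loop above runs until the job reaches a terminal state, however long that takes. A variant with an overall timeout might look like the sketch below. `wait_for_import_job` is a hypothetical helper, and the status source and sleep function are passed in as callables so the logic can be exercised without calling AWS.

```python
import time
from typing import Callable


def wait_for_import_job(
    get_status: Callable[[], str],
    poll_seconds: float = 60,
    timeout_seconds: float = 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_status` until the job completes; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status().upper()
        if status == "COMPLETED":
            return status
        if status == "FAILED":
            raise RuntimeError("Model import job failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Import job still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Simulated status sequence standing in for repeated get_model_import_job calls.
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
print(wait_for_import_job(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

In this notebook, the status callable could be `lambda: bedrock.get_model_import_job(jobIdentifier=job_Arn)["status"]`.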


### Step 4: Wait for Model Initialization

Allow time for the model to initialize:

In [18]:
model_id

'arn:aws:bedrock:us-west-2:340043819279:imported-model/gh3v4a6z3l69'

In [19]:
# Wait 5 minutes for the model to initialize (cold start)
time.sleep(300)
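
A fixed five-minute sleep may be longer or shorter than the model actually needs. An alternative is to retry only while the service reports `ModelNotReadyException`, the error code that appears in the inference output later in this notebook. The sketch below assumes the exception carries a botocore-style `response["Error"]["Code"]` field; `call_when_ready` is a hypothetical helper.

```python
import time
from typing import Any, Callable


def call_when_ready(
    invoke: Callable[[], Any],
    max_attempts: int = 10,
    delay_seconds: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `invoke`, retrying only while the model reports that it is not ready."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke()
        except Exception as exc:
            # Assumption: botocore-style errors expose the code under response["Error"]["Code"].
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            if code != "ModelNotReadyException" or attempt == max_attempts:
                raise
            sleep(delay_seconds)
```

Any other error is re-raised immediately, so problems such as missing permissions are not hidden behind repeated retries.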

### Step 5: Model Inference

In [46]:
from botocore.config import Config
import json

def format_messages(messages: list[dict[str, str]]) -> str:
    """
    Format messages into a single prompt string for Llama 3+ chat models.
    
    The model only supports 'system', 'user' and 'assistant' roles, starting with 'system', then 'user' and 
    alternating (u/a/u/a/u...). The last message must be from 'user'.
    """
    output = "<|begin_of_text|>"
    # Adding an inferred prefix
    system_prefix = f"\n\nCutting Knowledge Date: December 2024\nToday Date: {datetime.now().strftime('%d %b %Y')}\n\n"
    for entry in messages:
        output += f"<|start_header_id|>{entry['role']}<|end_header_id|>"
        if entry['role'] == 'system':
            output += f"{system_prefix}{entry['content']}<|eot_id|>"
        elif 'content' in entry:
            output += f"\n\n{entry['content']}<|eot_id|>"
    # End with an open assistant header so the model generates the reply
    output += "<|start_header_id|>assistant<|end_header_id|>\n"
    return output


def send_prompt(messages, temperature=0.3, max_tokens=4096, top_p=0.9, continuation=False, max_retries=10):
    # Convert the message list into a single Llama 3 prompt string
    frmt_input = format_messages(messages)

    client = boto3.Session().client(
        service_name='bedrock-runtime',
        region_name=region_info,
        config=Config(
            connect_timeout=300,  # 5 minutes
            read_timeout=300,     # 5 minutes
            retries={'max_attempts': 3}
        )
    )

    attempt = 0
    while attempt < max_retries:
        try:
            response = client.invoke_model(
                modelId=model_id,
                body=json.dumps({
                    'prompt': frmt_input,
                    'temperature': temperature,
                    'max_gen_len': max_tokens,
                    'top_p': top_p
                }),
                accept='application/json',
                contentType='application/json'
            )

            # The response body is a stream; read and parse it as JSON
            result = json.loads(response['body'].read().decode('utf-8'))
            return result

        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {str(e)}")
            attempt += 1
            if attempt < max_retries:
                time.sleep(30)  # Give the model time to become ready before retrying

    raise RuntimeError("Failed to get response after maximum retries")
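
The dictionary returned by `send_prompt` holds the generated text under the `generation` key, as the inference output in this notebook shows. A small helper for pulling out just the text might look like the sketch below; `extract_generation` is a hypothetical name, and falling back to an empty string when the key is missing is a design choice, not documented behavior.

```python
def extract_generation(result: dict) -> str:
    """Return the generated text from a parsed response, stripped of surrounding whitespace."""
    return result.get("generation", "").strip()


# Made-up response used only to illustrate the shape of the input.
sample_result = {"generation": "        Answer: Yes, the study detected the autoantibodies.  "}
print(extract_generation(sample_result))
```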

In [47]:
def build_messages(data):
    system_content = f"""You are an assistant for question-answering tasks. Answer the following question in 5 sentences using the provided context. If you don't know the answer, just say "I don't know."."""
    user_content = f"""
        Context: {data["CONTEXT"]} 
        
        Question: {data["QUESTION"]}
        """

    messages = [
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_content}
    ]
    
    return messages

In [48]:
from datasets import load_dataset, concatenate_datasets

test_dataset = load_dataset("json", data_files="../data/sft_test_data.json", split="train")

In [49]:
test_item = test_dataset[2]
test_item

{'CONTEXT': 'The aim of this study was to detect anti-topoisomerase I (anti-topo I) autoantibodies, which are known to be limited in systemic sclerosis patients, in silicosis patients with no clinical symptoms of autoimmune disease.\n\nSerum anti-topo I autoantibodies were detected using ELISA. Differences in clinical parameters between patients with and without anti-topo I autoantibodies were analyzed.\n\nSeven of 69 patients had anti-topo I autoantibodies. These 7 patients showed elevated PaCO(2) values (P=0.0212), and inverse correlations between serum soluble Fas levels and PaCO(2) values were found.\n\nSINV (TR339-eGFP) (+) strand RNA, infectious virus titers and infection rates transiently increased in mosquitoes following dsRNA injection to cognate Ago2, Dcr2, or TSN mRNAs. Detection of SINV RNA-derived small RNAs at 2 and 7 days post-infection in non-silenced mosquitoes provided important confirmation of RNAi pathway activity. Two different recombinant SINV viruses (MRE16-eGFP 

In [50]:
messages = build_messages(test_item)

model_response = send_prompt(messages)

print(f"""
    ============== Question ============
    {test_item["QUESTION"]}
    
    ========= Generated Answer =========
    {model_response}

    ======== Ground Truth Answer =======
    {test_item["ANSWER"]}
    """
)

Attempt 1 failed: An error occurred (ModelNotReadyException) when calling the InvokeModel operation (reached max retries: 3): Model is not ready for inference. Wait and try your request again. Refer to https://docs.aws.amazon.com/bedrock/latest/userguide/invoke-imported-model.html#handle-model-not-ready-exception.
Attempt 2 failed: An error occurred (ModelNotReadyException) when calling the InvokeModel operation (reached max retries: 3): Model is not ready for inference. Wait and try your request again. Refer to https://docs.aws.amazon.com/bedrock/latest/userguide/invoke-imported-model.html#handle-model-not-ready-exception.

    Do detection of anti-topoisomerase I autoantibody in patients with silicosis?
    
    {'generation': '        Answer: Yes, the study detected anti-topoisomerase I autoantibodies in 7 out of 69 patients with silicosis. These autoantibodies were found using an ELISA test. The presence of anti-topo I autoantibodies was associated with elevated PaCO2 values in the