# Import DeepSeek-R1-Distill-Llama Models to Amazon Bedrock

This notebook demonstrates how to import DeepSeek's distilled Llama models to Amazon Bedrock using the Custom Model Import (CMI) feature. We'll use the 8B-parameter model as an example, <u>but the same process applies to the 70B variant</u>.

## Introduction

DeepSeek has released several distilled versions of DeepSeek-R1 built on the Llama architecture. These models retain strong reasoning performance while being much smaller and cheaper to run than the full DeepSeek-R1 model. The 8B model used here is based on Llama 3.1 8B and has been **fine-tuned for reasoning tasks** using data generated by DeepSeek-R1.

## Prerequisites

- An AWS account with access to Amazon Bedrock
- Appropriate IAM roles and permissions for Bedrock and S3, as described in [the CMI IAM role documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-import-iam-role.html)
- An S3 bucket to store the model files for import
- Sufficient local storage space (at least 17 GB for the 8B model and 135 GB for the 70B model)
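Before downloading, you can check whether the target disk has room for the weights. The sketch below is a hypothetical helper (not part of any AWS or Hugging Face API) that reuses the approximate sizes listed above and treats 1 GB as 10^9 bytes.

```python
import shutil

# Approximate free space needed per model, from the prerequisites above (GB = 10**9 bytes)
REQUIRED_GB = {
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B": 17,
    "deepseek-ai/DeepSeek-R1-Distill-Llama-70B": 135,
}

def has_enough_space(path: str, model_id: str) -> bool:
    """Return True if the filesystem containing `path` has the listed free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    required_gb = REQUIRED_GB[model_id]
    print(f"Free: {free_gb:.1f} GB, required: about {required_gb} GB")
    return free_gb >= required_gb

# Check the current working directory, where the model folder will be created
has_enough_space(".", "deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
```

The check runs against `.` because the model folder does not exist yet before the download.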

## Important Note About Inference API Options
This notebook supports two ways to interact with your imported model:
1. Direct [InvokeModel API](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) 
2. [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-call.html)

If you plan to use the Converse API, you need to update the model's <i>chat_template</i> configuration before uploading the model to S3. This notebook covers both paths. For details on using the Converse API with CMI, see the [Converse API code samples for CMI](https://docs.aws.amazon.com/bedrock/latest/userguide/custom-model-import-code-samples-converse.html).


### Step 1: Install Required Packages

First, let's install the necessary Python packages:

In [None]:
!pip install --upgrade boto3
!pip install --upgrade huggingface_hub hf_transfer tqdm

### Step 2: Configure Parameters

Update these parameters according to your AWS environment:

In [None]:
# Define your parameters (update these values for your setup)
bucket_name = "<YOUR-PREDEFINED-S3-BUCKET-TO-HOST-IMPORT-MODEL>"
s3_prefix = "<S3-PREFIX>"  # e.g. DeepSeek-R1-Distill-Llama-8B
local_directory = "<LOCAL-FOLDER-TO-STORE-DOWNLOADED-MODEL>"  # e.g. DeepSeek-R1-Distill-Llama-8B

job_name = '<CMI-JOB-NAME>'  # e.g. Deepseek-8B-job
imported_model_name = '<CMI-MODEL-NAME>'  # e.g. Deepseek-8B-model
role_arn = '<IAM-ROLE-ARN>'  # Make sure this role has the permissions listed in the prerequisites

# Region (currently only 'us-west-2' and 'us-east-1' support CMI with DeepSeek-R1-Distill-Llama models)
region_info = 'us-west-2'  # Change to 'us-east-1' if needed
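A quick sanity check can catch placeholders left unchanged before any long-running step starts. `config_problems` is a hypothetical helper; its supported-region set simply mirrors the comment above and may change over time.

```python
# Mirrors the region comment above; CMI region support may change over time
SUPPORTED_REGIONS = {"us-west-2", "us-east-1"}

def config_problems(params: dict, region: str) -> list:
    """Return human-readable problems with the Step 2 settings (hypothetical helper)."""
    problems = [
        f"{name} is still a placeholder ({value})"
        for name, value in params.items()
        if isinstance(value, str) and value.startswith("<") and value.endswith(">")
    ]
    if region not in SUPPORTED_REGIONS:
        problems.append(f"region {region!r} is not one of {sorted(SUPPORTED_REGIONS)}")
    return problems

# Usage in this notebook (not executed here):
# problems = config_problems(
#     {"bucket_name": bucket_name, "s3_prefix": s3_prefix, "local_directory": local_directory,
#      "job_name": job_name, "imported_model_name": imported_model_name, "role_arn": role_arn},
#     region_info,
# )
# assert not problems, problems
```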

### Step 3: Download Model from Hugging Face

Download the model files from Hugging Face. To use the 70B model instead, change `model_id` to `"deepseek-ai/DeepSeek-R1-Distill-Llama-70B"`.

<div class="alert alert-warning">
<b>Note:</b> Downloading the 8B model files may take 2-10 minutes depending on your internet connection speed.
</div>

In [None]:
import os

# Enable hf_transfer for faster downloads. huggingface_hub reads this variable
# when it is imported, so it must be set before the import below.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import snapshot_download

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

# Download every file in the model repository into local_directory
snapshot_download(repo_id=model_id, local_dir=f"./{local_directory}")
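After the download finishes, it can be worth confirming that the folder contains what the import job will need. The sketch below assumes that CMI expects Hugging Face-format weights in `.safetensors` files alongside `config.json`, `tokenizer.json`, and `tokenizer_config.json`; check the Custom Model Import documentation for the authoritative list. `summarize_model_dir` is a hypothetical helper.

```python
from pathlib import Path

# Files assumed to be needed alongside the weights (verify against the CMI docs)
EXPECTED_FILES = ["config.json", "tokenizer.json", "tokenizer_config.json"]

def summarize_model_dir(local_directory: str) -> dict:
    """List safetensors shards, report missing expected files, and total size (hypothetical helper)."""
    root = Path(local_directory)
    weights = sorted(p.name for p in root.glob("*.safetensors"))
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return {"weights": weights, "missing": missing, "total_gb": total_bytes / 1e9}

# Usage in this notebook (not executed here):
# print(summarize_model_dir(local_directory))
```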

### Step 3b: (Optional) Configure for Converse API
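Run this step only if you plan to use the Converse API. It overwrites the `chat_template` field in the downloaded `tokenizer_config.json` with a Llama 3.1-style chat template before the files are uploaded to S3.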
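Because the next cell overwrites `tokenizer_config.json` in place, you may want to keep the original first. `backup_file` is a hypothetical helper; keeping the copy outside `local_directory` means the Step 4 upload, which walks that folder, will not pick it up.

```python
import shutil
from pathlib import Path

def backup_file(path: str, backup_dir: str) -> Path:
    """Copy `path` into `backup_dir` once; an existing backup is never overwritten."""
    src = Path(path)
    dst_dir = Path(backup_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    if not dst.exists():
        shutil.copy2(src, dst)
    return dst

# Usage in this notebook (not executed here):
# backup_file(f"{local_directory}/tokenizer_config.json", f"{local_directory}-backup")
```

To undo the change later, copy the backup back over the original with `shutil.copy2`.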



In [None]:
import json

def update_tokenizer_config(local_directory):
    """
    Updates the tokenizer_config.json file with the required chat template for Converse API support
    """
        
    config_path = f"{local_directory}/tokenizer_config.json"
    
    try:
        # Read existing config
        with open(config_path, 'r') as f:
            config = json.load(f)
        
        # Update chat template
        config["chat_template"] = (
            "{{- bos_token }}\n{%- if custom_tools is defined %}\n    {%- set tools = custom_tools %}\n{%- endif %}\n{%- if not tools_in_user_message is defined %}\n    {%- set tools_in_user_message = true %}\n{%- endif %}\n{%- if not date_string is defined %}\n    {%- set date_string = \"26 Jul 2024\" %}\n{%- endif %}\n{%- if not tools is defined %}\n    {%- set tools = none %}\n{%- endif %}\n\n{#- This block extracts the system message, so we can slot it into the right place. #}\n{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content']|trim %}\n    {%- set messages = messages[1:] %}\n{%- else %}\n    {%- set system_message = \"\" %}\n{%- endif %}\n\n{#- System message + builtin tools #}\n{{- \"<|start_header_id|>system<|end_header_id|>\\n\\n\" }}\n{%- if builtin_tools is defined or tools is not none %}\n    {{- \"Environment: ipython\\n\" }}\n{%- endif %}\n{%- if builtin_tools is defined %}\n    {{- \"Tools: \" + builtin_tools | reject('equalto', 'code_interpreter') | join(\", \") + \"\\n\\n\"}}\n{%- endif %}\n{{- \"Cutting Knowledge Date: December 2023\\n\" }}\n{{- \"Today Date: \" + date_string + \"\\n\\n\" }}\n{%- if tools is not none and not tools_in_user_message %}\n    {{- \"You have access to the following functions. To call a function, please respond with JSON for a function call.\" }}\n    {{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' "
            "}}\n    {{- \"Do not use variables.\\n\\n\" }}\n    {%- for t in tools %}\n        {{- t | tojson(indent=4) }}\n        {{- \"\\n\\n\" }}\n    {%- endfor %}\n{%- endif %}\n{{- system_message }}\n{{- \"<|eot_id|>\" }}\n\n{#- Custom tools are passed in a user message with some extra guidance #}\n{%- if tools_in_user_message and not tools is none %}\n    {#- Extract the first user message so we can plug it in here #}\n    {%- if messages | length != 0 %}\n        {%- set first_user_message = messages[0]['content']|trim %}\n        {%- set messages = messages[1:] %}\n    {%- else %}\n        {{- raise_exception(\"Cannot put tools in the first user message when there's no first user message!\") }}\n{%- endif %}\n    {{- '<|start_header_id|>user<|end_header_id|>\\n\\n' -}}\n    {{- \"Given the following functions, please respond with a JSON for a function call \" }}\n    {{- \"with its proper arguments that best answers the given prompt.\\n\\n\" }}\n    {{- 'Respond in the format {\"name\": function name, \"parameters\": dictionary of argument name and its value}.' "
            "}}\n    {{- \"Do not use variables.\\n\\n\" }}\n    {%- for t in tools %}\n        {{- t | tojson(indent=4) }}\n        {{- \"\\n\\n\" }}\n    {%- endfor %}\n    {{- first_user_message + \"<|eot_id|>\"}}\n{%- endif %}\n\n{%- for message in messages %}\n    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}\n        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'+ message['content'] | trim + '<|eot_id|>' }}\n    {%- elif 'tool_calls' in message %}\n        {%- if not message.tool_calls|length == 1 %}\n            {{- raise_exception(\"This model only supports single tool-calls at once!\") }}\n        {%- endif %}\n        {%- set tool_call = message.tool_calls[0].function %}\n        {%- if builtin_tools is defined and tool_call.name in builtin_tools %}\n            {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' -}}\n            {{- \"<|python_tag|>\" + tool_call.name + \".call(\" }}\n            {%- for arg_name, arg_val in tool_call.arguments | items %}\n                {{- arg_name + '=\"' + arg_val + '\"' }}\n                {%- if not loop.last %}\n                    {{- \", \" }}\n                {%- endif %}\n                {%- endfor %}\n            {{- \")\" }}\n        {%- else  %}\n            {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' -}}\n            {{- '{\"name\": \"' + tool_call.name + '\", ' }}\n            {{- '\"parameters\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- \"}\" }}\n        {%- endif %}\n        {%- if builtin_tools is defined %}\n            {#- This means we're in ipython mode #}\n            {{- \"<|eom_id|>\" }}\n        {%- else %}\n            {{- \"<|eot_id|>\" }}\n        {%- endif %}\n    {%- elif message.role == \"tool\" or message.role == \"ipython\" %}\n        {{- \"<|start_header_id|>ipython<|end_header_id|>\\n\\n\" }}\n        {%- if message.content is mapping or message.content is iterable %}\n"
            "            {{- message.content | tojson }}\n        {%- else %}\n            {{- message.content }}\n        {%- endif %}\n        {{- \"<|eot_id|>\" }}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}\n{%- endif %}\n"
        )

        # Write updated config back
        with open(config_path, 'w') as f:
            json.dump(config, f)
        
        print("Successfully updated tokenizer_config.json with chat_template for Converse API support")
    except Exception as e:
        print(f"Error updating tokenizer config: {str(e)}")
        raise

# Run this cell only if you plan to use the Converse API
update_tokenizer_config(local_directory)

### Step 4: Upload Model to S3

Upload the downloaded model files to your S3 bucket.

<div class="alert alert-warning">
<b>Note:</b> Uploading the 8B model files normally takes 10-20 minutes.
</div>

In [None]:
import os
import time
import json
import boto3
from pathlib import Path
from tqdm import tqdm

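def upload_directory_to_s3(local_directory, bucket_name, s3_prefix):
    s3_client = boto3.client('s3')
    local_directory = Path(local_directory)

    # Collect all files first, skipping hidden folders such as the .cache/
    # metadata folder that snapshot_download may create inside local_dir
    all_files = []
    for root, dirs, files in os.walk(local_directory):
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for filename in files:
            local_path = Path(root) / filename
            relative_path = local_path.relative_to(local_directory)
            # Use forward slashes in S3 keys, even on Windows
            s3_key = f"{s3_prefix}/{relative_path.as_posix()}"
            all_files.append((local_path, s3_key))

    # Upload with a progress bar, collecting failures instead of silently continuing
    failed = []
    for local_path, s3_key in tqdm(all_files, desc="Uploading files"):
        try:
            s3_client.upload_file(str(local_path), bucket_name, s3_key)
        except Exception as e:
            print(f"Error uploading {local_path}: {str(e)}")
            failed.append(local_path)

    if failed:
        raise RuntimeError(f"{len(failed)} file(s) failed to upload; re-run this cell before creating the import job")


# Upload all files
upload_directory_to_s3(local_directory, bucket_name, s3_prefix)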
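Because a partial upload only surfaces later as a failed import job, it can help to compare the local folder with what actually landed in S3. The helpers below are hypothetical; `list_s3_keys` uses the standard boto3 `list_objects_v2` paginator, and `expected_s3_keys` skips hidden entries such as the `.cache/` folder that `snapshot_download` may create.

```python
from pathlib import Path

def expected_s3_keys(local_directory: str, s3_prefix: str) -> set:
    """S3 keys the upload should have produced, ignoring hidden entries such as .cache/."""
    root = Path(local_directory)
    keys = set()
    for p in root.rglob("*"):
        rel = p.relative_to(root)
        if p.is_file() and not any(part.startswith(".") for part in rel.parts):
            keys.add(f"{s3_prefix}/{rel.as_posix()}")
    return keys

def list_s3_keys(bucket_name: str, s3_prefix: str) -> set:
    """All object keys under s3_prefix/, using the list_objects_v2 paginator."""
    import boto3  # imported lazily so the pure helpers work without boto3 installed
    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys = set()
    for page in paginator.paginate(Bucket=bucket_name, Prefix=f"{s3_prefix}/"):
        keys.update(obj["Key"] for obj in page.get("Contents", []))
    return keys

def missing_from_s3(local_directory: str, s3_prefix: str, uploaded_keys) -> list:
    """Sorted list of expected keys that are not present in uploaded_keys."""
    return sorted(expected_s3_keys(local_directory, s3_prefix) - set(uploaded_keys))

# Usage in this notebook (not executed here):
# missing = missing_from_s3(local_directory, s3_prefix, list_s3_keys(bucket_name, s3_prefix))
# print("All files uploaded" if not missing else f"Missing: {missing}")
```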


### Step 5: Create Custom Model Import Job

Create the model import job in Amazon Bedrock.

<div class="alert alert-warning">
<b>Note:</b> The import job for the 8B model can take 5-20 minutes to complete.
</div>

In [None]:
# Initialize the Bedrock client
bedrock = boto3.client('bedrock', region_name=region_info)

s3_uri = f's3://{bucket_name}/{s3_prefix}/'

# Create the model import job
response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName=imported_model_name,
    roleArn=role_arn,
    modelDataSource={
        's3DataSource': {
            's3Uri': s3_uri
        }
    }
)

job_Arn = response['jobArn']

# Output the job ARN
print(f"Model import job created with ARN: {response['jobArn']}")
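If you need to re-run the import (for example after a failed job), reusing the same `jobName` or `importedModelName` may be rejected as a duplicate. One simple approach, sketched here as a hypothetical helper, is to append a UTC timestamp to the base names from Step 2.

```python
from datetime import datetime, timezone
from typing import Optional

def timestamped_name(base: str, now: Optional[datetime] = None) -> str:
    """Append a UTC timestamp such as -20250102-030405 to `base` (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    return f"{base}-{now:%Y%m%d-%H%M%S}"

# Usage in this notebook (not executed here):
# job_name = timestamped_name("Deepseek-8B-job")
# imported_model_name = timestamped_name("Deepseek-8B-model")
print(timestamped_name("Deepseek-8B-job", datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)))
# Deepseek-8B-job-20250102-030405
```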


### Step 6: Monitor Import Job Status

Poll the import job until it completes or fails.

In [None]:
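# Poll the CMI job status every 60 seconds until it finishes
while True:
    response = bedrock.get_model_import_job(jobIdentifier=job_Arn)
    status = response['status'].upper()
    print(f"Status: {status}")

    if status in ['COMPLETED', 'FAILED']:
        break

    time.sleep(60)

if status == 'FAILED':
    raise RuntimeError(f"Model import failed: {response.get('failureMessage')}")

# Get the ARN of the imported model; it is used as the modelId for inference
model_id = response['importedModelArn']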
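The loop above polls indefinitely if the job never leaves `InProgress`. A variant with an upper bound on the waiting time is sketched below; `wait_for_terminal_status` is a hypothetical helper, and the one-hour default timeout is an arbitrary choice, not a Bedrock limit.

```python
import time
from typing import Callable, Iterable

def wait_for_terminal_status(
    get_status: Callable[[], str],
    terminal: Iterable[str] = ("COMPLETED", "FAILED"),
    interval_s: int = 60,
    timeout_s: int = 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or about timeout_s seconds pass."""
    terminal = {s.upper() for s in terminal}
    waited = 0
    while True:
        status = get_status().upper()
        print(f"Status: {status}")
        if status in terminal:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"Still {status} after about {waited} seconds")
        sleep(interval_s)
        waited += interval_s

# Usage in this notebook (not executed here):
# status = wait_for_terminal_status(
#     lambda: bedrock.get_model_import_job(jobIdentifier=job_Arn)["status"])
```

Passing `sleep` as a parameter keeps the helper easy to test without actually waiting.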

### Step 7: Wait for Model Initialization

Allow time for the model to initialize:

In [None]:
# Wait 5 minutes to give the imported model time to load (cold start)
time.sleep(300)

### Step 8: Model Inference Options

After the model has been imported and initialized, you can interact with it through two different APIs:
1. Direct InvokeModel API
2. Converse API (requires Step 3b)

In [None]:
import json
import time
import boto3

client = boto3.client("bedrock-runtime", region_name=region_info)

def invoke_r1_direct(user_prompt, max_retries=10, return_prompt=False):
    """
    Direct invocation using invoke_model API
    
    Parameters:
        user_prompt (str): The prompt to send to the model
        max_retries (int): Number of retries if the model doesn't respond
        return_prompt (bool): If True, prints the formatted prompt
    
    Returns:
        str: The model's response
    """
    formatted_prompt = (
        f"<s>[INST]\n"
        f"\nHuman: {user_prompt}[/INST]"
        "\n\nAssistant: "
    )

    if return_prompt:
        print("==== Prompt ====")
        print(formatted_prompt)
        print("================")

    native_request = {
        "prompt": formatted_prompt,
        "max_gen_len": 4096,
        "top_p": 0.9,
        "temperature": 0.6
    }

    attempt = 0
    response_text = ""
    while attempt < max_retries:
        try:
            response = client.invoke_model(modelId=model_id, body=json.dumps(native_request))
            response_body = json.loads(response.get('body').read())
            if 'generation' in response_body:
                response_text = response_body['generation'].strip()
                break
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {str(e)}")
        attempt += 1
        time.sleep(30)

    return response_text

def invoke_r1_converse(user_prompt, system_prompt="You are a helpful assistant.", max_retries=10):
    """
    Invocation using the Converse API
    
    Parameters:
        user_prompt (str): The prompt to send to the model
        system_prompt (str): System prompt to set the model's behavior
        max_retries (int): Number of retries if the model doesn't respond
    
    Returns:
        str: The model's response
    """
    messages = [
        {
            "role": "user",
            "content": [{
                "text": user_prompt
            }]
        }
    ]

    system_prompts = [{"text": system_prompt}]

    attempt = 0
    while attempt < max_retries:
        try:
            response = client.converse(
                modelId=model_id,
                messages=messages,
                system=system_prompts,
                inferenceConfig={
                    "temperature": 0.6,
                    "topP": 0.9,
                    "maxTokens": 2048
                }
            )
            
            output_message = response['output']['message']
            return output_message['content'][0]['text']
            
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {str(e)}")
            attempt += 1
            time.sleep(30)
    
    return "Failed to get response after maximum retries"
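Both helpers above retry with a fixed 30-second pause. An alternative is capped exponential backoff with jitter, which retries quickly at first and then waits longer while an imported model is still cold. `call_with_backoff` is a hypothetical helper; the `ModelNotReadyException` error code in the usage comment is what Bedrock Runtime reports for imported models that are not yet ready to serve, but treat the exact predicate as an assumption to verify against the errors you actually see.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 6,
    base_delay_s: float = 5.0,
    max_delay_s: float = 120.0,
    is_retryable: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable exceptions with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # 5s, 10s, 20s, ... capped at max_delay_s, then scaled into [50%, 100%]
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.0)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)

# Usage in this notebook (not executed here):
# is_not_ready = lambda e: getattr(e, "response", {}).get("Error", {}).get("Code") == "ModelNotReadyException"
# reply = call_with_backoff(
#     lambda: client.converse(modelId=model_id, messages=[{"role": "user", "content": [{"text": "Hello"}]}]),
#     is_retryable=is_not_ready)
```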

#### Example Usage
Let's test the model with a simple reasoning task:

In [None]:
test_prompt = """Given the following financial data:
- Company A's revenue grew from $10M to $15M in 2023
- Operating costs increased by 20%
- Initial operating costs were $7M

Calculate the company's operating margin for 2023. Please reason step by step, and put your final answer within \\boxed{}.
"""

print("=== Using direct invoke_model API ===")
response_direct = invoke_r1_direct(test_prompt)
print(response_direct)
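To check the model's answer, the arithmetic in the test prompt can be worked out directly. This sketch assumes that "initial operating costs" means the 2022 base and that operating margin is operating income divided by revenue; under those assumptions the expected answer is 44%.

```python
# Figures from the test prompt, in millions of dollars
revenue_2023 = 15.0
operating_costs_initial = 7.0                                 # assumed to be the 2022 base
operating_costs_2023 = operating_costs_initial * 1.20         # +20% -> 8.4
operating_income_2023 = revenue_2023 - operating_costs_2023   # 15 - 8.4 = 6.6
operating_margin_2023 = operating_income_2023 / revenue_2023  # 6.6 / 15 = 0.44

print(f"Operating costs 2023: ${operating_costs_2023:.1f}M")  # Operating costs 2023: $8.4M
print(f"Operating margin 2023: {operating_margin_2023:.1%}")  # Operating margin 2023: 44.0%
```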

<div class="alert alert-warning">
Run the following cell only if you completed <b>Step 3b</b>.
</div>

In [None]:
print("\n=== Using Converse API ===")
response_converse = invoke_r1_converse(
    test_prompt,
    system_prompt="You are a helpful financial analyst."
)
print(response_converse)
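R1-style models usually emit their chain of thought before a closing `</think>` tag and, with this prompt, wrap the final result in `\boxed{}`. The hypothetical helpers below separate the reasoning from the answer and pull out the last boxed expression; they assume that output convention and fall back to the whole text (or `None`) when it is absent.

```python
from typing import Optional, Tuple

def split_reasoning(text: str) -> Tuple[str, str]:
    """Split an R1-style response into (reasoning, answer) around the first '</think>' tag."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()
    return head.replace("<think>", "").strip(), tail.strip()

def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, honouring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces

# Usage in this notebook (not executed here):
# reasoning, answer = split_reasoning(response_direct)
# print(extract_boxed(answer))
```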

## Conclusion

This notebook demonstrated the end-to-end process of importing DeepSeek's distilled Llama models to Amazon Bedrock using Custom Model Import (CMI). Starting from downloading the model from Hugging Face, through preparing and uploading the files to S3, to creating a CMI job and running inference, we've covered the essential steps to get your DeepSeek distilled Llama models running on Amazon Bedrock.


While we've used the DeepSeek-R1-Distill-Llama-8B model in this example, the same process applies to other variants including the 70B model. For more information about Custom Model Import and its features, refer to the [Amazon Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html).