# Import DeepSeek-R1-Distill-Llama Models to Amazon Bedrock

This notebook demonstrates how to import DeepSeek's distilled Llama models to Amazon Bedrock using the Custom Model Import (CMI) feature. We'll use the 8B-parameter model as an example, <u>but the same process applies to the 70B variant</u>.

## Introduction

DeepSeek has released several distilled versions of DeepSeek-R1 built on the Llama architecture. These models keep strong reasoning performance while being much smaller, and therefore cheaper to run, than the full DeepSeek-R1 model. The 8B model we'll use here is derived from Llama 3.1 and was fine-tuned on reasoning samples generated by DeepSeek-R1, so it is **optimized for reasoning tasks**.

## Prerequisites

- An AWS account with access to Amazon Bedrock
- Appropriate IAM roles and permissions for Bedrock and S3, following [the instructions here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-import-iam-role.html)
- An S3 bucket prepared to store the custom model
- Sufficient local storage space (at least 17 GB for the 8B model and 135 GB for the 70B model)
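Before starting the download, it can help to confirm that the local disk has room for the model. The following is a minimal standard-library sketch; `has_enough_space` and `REQUIRED_GB` are hypothetical names, and the thresholds simply restate the figures above in decimal gigabytes.

```python
import shutil

# Free space needed per variant, restating the prerequisites (decimal GB)
REQUIRED_GB = {"8B": 17, "70B": 135}

def has_enough_space(path=".", variant="8B"):
    """Return (ok, free_gb) for the filesystem that holds `path`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= REQUIRED_GB[variant], free_gb

ok, free_gb = has_enough_space(".", "8B")
print(f"Free space: {free_gb:.1f} GB, enough for the 8B model: {ok}")
```

Point `path` at the folder you will use as `local_directory`, since that may sit on a different disk than the notebook.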

### Step 1: Install Required Packages

First, let's install the necessary Python packages:

In [None]:
!pip install -U huggingface_hub
!pip install boto3 --upgrade

### Step 2: Configure Parameters

Update these parameters according to your AWS environment:

In [None]:
# Define your parameters (update these values for your setup)
bucket_name = "<YOUR-PREDEFINED-S3-BUCKET-TO-HOST-IMPORT-MODEL>"
s3_prefix = "<S3-PREFIX>" # e.g. DeepSeek-R1-Distill-Llama-8B
local_directory = "<LOCAL-FOLDER-TO-STORE-DOWNLOADED-MODEL>" # e.g. DeepSeek-R1-Distill-Llama-8B

job_name = '<CMI-JOB-NAME>' # e.g. Deepseek-8B-job
imported_model_name = '<CMI-MODEL-NAME>' # e.g. Deepseek-8B-model
role_arn = '<IAM-ROLE-ARN>' # Make sure this role has the permissions listed in the Prerequisites

# Region (at the time of writing, only 'us-west-2' and 'us-east-1' support CMI with DeepSeek-R1-Distill-Llama models)
region_info = 'us-west-2' # Change to 'us-east-1' if needed
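Because the download and upload take a long time, catching an un-replaced `<...>` placeholder or an unsupported region up front is cheap insurance. This is a hedged sketch: `config_problems` is a hypothetical helper, the example values are made up, and the region set only mirrors the comment above, which may change over time.

```python
# Regions the notebook lists as supporting CMI for these models (may change over time)
SUPPORTED_REGIONS = {"us-west-2", "us-east-1"}

def config_problems(params, region):
    """Return a list of human-readable problems with the notebook parameters."""
    problems = [
        f"{name} still holds a placeholder"
        for name, value in params.items()
        if isinstance(value, str) and value.startswith("<") and value.endswith(">")
    ]
    if region not in SUPPORTED_REGIONS:
        problems.append(f"region {region!r} is not one of {sorted(SUPPORTED_REGIONS)}")
    return problems

# Hypothetical example values; in the notebook, pass the Step 2 variables instead
example = {"bucket_name": "my-model-bucket", "s3_prefix": "<S3-PREFIX>"}
print(config_problems(example, "us-west-2"))  # ['s3_prefix still holds a placeholder']
```

In the notebook you would call it as `config_problems({"bucket_name": bucket_name, "s3_prefix": s3_prefix, "role_arn": role_arn}, region_info)` and expect an empty list.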

### Step 3: Download Model from Hugging Face

Download the model files from Hugging Face. To use the 70B model instead, set `model_id` to `"deepseek-ai/DeepSeek-R1-Distill-Llama-70B"`.

<div class="alert alert-warning">
<b>Note:</b> Downloading the 8B model files may take 10-20 minutes depending on your internet connection speed.
</div>

In [None]:
from huggingface_hub import snapshot_download

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
# Download into local_directory itself, so the upload step reads from the same path
snapshot_download(repo_id=model_id, local_dir=local_directory)
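An interrupted download is easier to catch before the upload than after a failed import job. The sketch below checks for a config file, tokenizer files, and at least one `.safetensors` weight file; that file list is an assumption based on a typical Hugging Face Llama checkpoint, so treat the Bedrock Custom Model Import documentation as the authoritative list. `missing_model_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumed minimum set of files for a Hugging Face Llama-style checkpoint;
# see the Bedrock Custom Model Import docs for the authoritative requirements
REQUIRED_FILES = ["config.json", "tokenizer.json", "tokenizer_config.json"]

def missing_model_files(model_dir):
    """Return expected files absent from model_dir (weights are checked by pattern)."""
    model_dir = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]
    if not any(model_dir.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing

# Usage in the notebook: print(missing_model_files(local_directory))  # expect []
```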

### Step 4: Upload Model to S3

Upload the downloaded model files to your S3 bucket.

<div class="alert alert-warning">
<b>Note:</b> Uploading the 8B model files normally takes 10-20 minutes.
</div>

In [None]:
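import os
import time
import json
import boto3
from pathlib import Path
from tqdm import tqdm

def upload_directory_to_s3(local_directory, bucket_name, s3_prefix):
    s3_client = boto3.client('s3')
    local_directory = Path(local_directory)
    
    # Collect all files first, skipping the .cache folder that huggingface_hub
    # creates inside local_dir to track download metadata
    all_files = []
    for root, dirs, files in os.walk(local_directory):
        dirs[:] = [d for d in dirs if d != '.cache']
        for filename in files:
            local_path = Path(root) / filename
            # Use POSIX separators so the S3 keys are also correct on Windows
            relative_path = local_path.relative_to(local_directory).as_posix()
            s3_key = f"{s3_prefix}/{relative_path}"
            all_files.append((local_path, s3_key))
    
    # Upload with a progress bar, recording any failures
    failed = []
    for local_path, s3_key in tqdm(all_files, desc="Uploading files"):
        try:
            s3_client.upload_file(str(local_path), bucket_name, s3_key)
        except Exception as e:
            print(f"Error uploading {local_path}: {e}")
            failed.append(local_path)
    
    # Fail loudly: an incomplete upload can make the import job fail later
    if failed:
        raise RuntimeError(f"{len(failed)} file(s) failed to upload")


# Upload all files
upload_directory_to_s3(local_directory, bucket_name, s3_prefix)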
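Before creating the import job, you may want to confirm that every local file actually reached S3 with the right size. This is a hedged sketch: `missing_or_mismatched` is a hypothetical helper that takes the S3 client as a parameter, builds keys the same way as the upload function (`prefix/relative/path`), and lists objects with the `list_objects_v2` paginator, since a single call returns at most 1,000 keys.

```python
from pathlib import Path

def missing_or_mismatched(s3_client, bucket, prefix, local_dir):
    """Return S3 keys that are absent from the bucket or whose size differs locally."""
    local_dir = Path(local_dir)
    # Expected keys and sizes; huggingface_hub's .cache metadata folder is skipped
    expected = {
        f"{prefix}/{p.relative_to(local_dir).as_posix()}": p.stat().st_size
        for p in local_dir.rglob("*")
        if p.is_file() and ".cache" not in p.relative_to(local_dir).parts
    }
    # Page through everything stored under the prefix
    remote = {}
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=f"{prefix}/"):
        for obj in page.get("Contents", []):
            remote[obj["Key"]] = obj["Size"]
    return sorted(key for key, size in expected.items() if remote.get(key) != size)

# Usage in the notebook:
# print(missing_or_mismatched(boto3.client("s3"), bucket_name, s3_prefix, local_directory))
```

Passing the client in, rather than creating it inside the function, keeps the check easy to run against a stub.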


### Step 5: Create Custom Model Import Job

Create the model import job in Amazon Bedrock.

<div class="alert alert-warning">
<b>Note:</b> An import job for the 8B model can take 14-18 minutes to complete.
</div>

In [None]:
# Initialize the Bedrock client
bedrock = boto3.client('bedrock', region_name=region_info)

s3_uri = f's3://{bucket_name}/{s3_prefix}/'

# Create the model import job
response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName=imported_model_name,
    roleArn=role_arn,
    modelDataSource={
        's3DataSource': {
            's3Uri': s3_uri
        }
    }
)

job_Arn = response['jobArn']

# Output the job ARN
print(f"Model import job created with ARN: {response['jobArn']}")


### Step 6: Monitor Import Job Status

Poll the import job until it completes or fails.

In [None]:
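# Poll the CMI job status every 60 seconds until it finishes
while True:
    response = bedrock.get_model_import_job(jobIdentifier=job_Arn)
    status = response['status'].upper()
    print(f"Status: {status}")
    
    if status in ['COMPLETED', 'FAILED']:
        break
        
    time.sleep(60)

# A failed job has no imported model ARN, so stop with the failure reason
if status == 'FAILED':
    raise RuntimeError(f"Import job failed: {response.get('failureMessage')}")

# Use the imported model's ARN as the model ID for inference
# (this overwrites the Hugging Face repo ID stored in model_id earlier)
model_id = response['importedModelArn']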
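The loop above polls with no upper bound, so a stuck job would keep the cell running indefinitely. A hedged sketch of the same polling with an overall deadline follows; `wait_for_status` is a hypothetical helper, the one-hour default is an arbitrary assumption, and `sleep` is a parameter only so the function can be exercised without waiting.

```python
import time

def wait_for_status(get_status, done=("COMPLETED", "FAILED"), interval=60,
                    timeout=3600, sleep=time.sleep):
    """Poll get_status() until it returns a status in `done` or `timeout` seconds pass.

    get_status is any zero-argument callable returning a status string. The
    monotonic clock is used so wall-clock changes do not move the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status().upper()
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(interval)

# Usage in the notebook, wired to the Step 5 objects:
# final = wait_for_status(lambda: bedrock.get_model_import_job(jobIdentifier=job_Arn)["status"])
```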

### Step 7: Wait for Model Initialization

Allow time for the model to initialize:

In [None]:
# Wait 5 minutes for the model to finish its cold start
time.sleep(300)
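A fixed five-minute wait is simple but may be too short or too long. An alternative is to retry the first call with exponential backoff until the model answers. This is a generic sketch, not a Bedrock API: `call_with_backoff` is a hypothetical helper, and the delay values are arbitrary assumptions.

```python
import random
import time

def call_with_backoff(fn, is_retryable, max_attempts=6, base_delay=10.0,
                      max_delay=120.0, sleep=time.sleep):
    """Call fn(); on a retryable exception, wait with exponential backoff and retry.

    The delay cap grows as base_delay * 2**attempt (limited to max_delay), and the
    actual wait is drawn uniformly below that cap ("full jitter"). The last error
    is re-raised once max_attempts is reached or if is_retryable returns False.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))

# Possible wiring (assumes a cold model surfaces botocore's ClientError with
# the error code "ModelNotReadyException"):
# from botocore.exceptions import ClientError
# not_ready = lambda e: isinstance(e, ClientError) and e.response["Error"]["Code"] == "ModelNotReadyException"
```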

### Step 8: Model Inference

After the model has been imported and initialized, you can call it through the inference APIs that Amazon Bedrock supports. Here we use the `invoke_model` API through a small helper function:

In [None]:
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name=region_info)

def invoke_r1(user_prompt, max_retries=10, return_prompt=False):
    """
    user_prompt: The entire instruction for the model, including any directives
                 like 'Please reason step by step...' or context as needed.

    max_retries: Number of attempts if the model is not ready or returns no output.

    return_prompt: If True, prints the final prompt sent to the model.

    Returns the generated text, or an empty string if every attempt fails.
    """

    # Note: We avoid using a separate system prompt, per the DeepSeek-R1 recommendation.
    formatted_prompt = (
        f"<s>[INST]\n"
        f"\nHuman: {user_prompt}[/INST]"
        "\n\nAssistant: "
    )

    if return_prompt:
        print("==== Prompt ====")
        print(formatted_prompt)
        print("================")

    native_request = {
        "prompt": formatted_prompt,
        "max_gen_len": 4096,
        "top_p": 0.9,
        # Set temperature to around 0.6 to help prevent repetitiveness or incoherence
        "temperature": 0.6
    }

    attempt = 0
    response_text = ""
    while attempt < max_retries:
        try:
            response = client.invoke_model(modelId=model_id, body=json.dumps(native_request))
            response_body = json.loads(response['body'].read())
        except ClientError as e:
            # A cold imported model raises ModelNotReadyException; re-raise anything else
            if e.response['Error']['Code'] != 'ModelNotReadyException':
                raise
            response_body = {}
        if 'generation' in response_body:
            response_text = response_body['generation'].strip()
            break
        print("Model does not appear to be ready. Retrying.")
        attempt += 1
        time.sleep(30)

    return response_text
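DeepSeek-R1-style models usually write their chain of thought first and close it with a `</think>` tag, and the prompt below asks for the final answer inside `\boxed{}`. A hedged sketch for splitting the text returned by `invoke_r1` into those parts; `split_r1_output` is a hypothetical helper, and it assumes the `</think>` convention (falling back to treating everything as the answer) and a `\boxed{}` without nested braces.

```python
import re

def split_r1_output(text):
    """Split an R1-style completion into (reasoning, answer, boxed).

    If no '</think>' tag is present, the whole text is treated as the answer.
    `boxed` is the content of the last \\boxed{...} in the answer, or None;
    nested braces such as \\boxed{\\frac{1}{2}} are not matched.
    """
    reasoning, sep, answer = text.partition("</think>")
    if not sep:
        reasoning, answer = "", text
    reasoning = reasoning.replace("<think>", "").strip()
    matches = re.findall(r"\\boxed\{([^{}]*)\}", answer)
    return reasoning, answer.strip(), matches[-1] if matches else None
```

For example, `reasoning, answer, boxed = split_r1_output(invoke_r1("..."))` lets you log the reasoning separately and compare only `boxed` against an expected value.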

#### Example Usage
Let's test the model with a simple reasoning task:

In [None]:
my_question = """Given the following financial data:
- Company A's revenue grew from $10M to $15M in 2023
- Operating costs increased by 20%
- Initial operating costs were $7M

Calculate the company's operating margin for 2023. Please reason step by step, and put your final answer within \\boxed{}.
"""

response = invoke_r1(my_question)
print(response)
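To judge the model's answer, it helps to work the arithmetic yourself. This sketch assumes "operating margin" means (revenue - operating costs) / revenue and that the 20% increase applies to the $7M initial operating costs.

```python
revenue_2023 = 15.0                                  # $M, from the question
initial_costs = 7.0                                  # $M, initial operating costs
costs_2023 = initial_costs * 1.20                    # 20% increase: 8.4
operating_income = revenue_2023 - costs_2023         # 15 - 8.4 = 6.6
operating_margin = operating_income / revenue_2023   # 6.6 / 15 = 0.44
print(f"Operating margin: {operating_margin:.1%}")   # Operating margin: 44.0%
```

Under these assumptions, a correct response should arrive at roughly 44% in its boxed answer.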

#### Additional Inference Methods
For other inference methods like streaming responses or using the Converse API, refer to the [Invoke your imported model page](https://docs.aws.amazon.com/bedrock/latest/userguide/invoke-imported-model.html). 

Note that the Converse API requires a chat template in your model's configuration files; for details, see [this page](https://docs.aws.amazon.com/bedrock/latest/userguide/custom-model-import-code-samples-converse.html).
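As a hedged sketch, a single-turn Converse call with boto3's `converse` method could look like the following, assuming the imported model's configuration includes a chat template as noted above. `converse_once` is a hypothetical helper; its sampling defaults mirror the values used in `invoke_r1`.

```python
def converse_once(client, model_id, prompt, max_tokens=4096, temperature=0.6, top_p=0.9):
    """Send one user turn through the Bedrock Converse API and return the reply text.

    `client` is a boto3 "bedrock-runtime" client and `model_id` the imported model ARN.
    """
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": max_tokens, "temperature": temperature, "topP": top_p},
    )
    # The reply is a list of content blocks; join the text blocks
    content = response["output"]["message"]["content"]
    return "".join(block.get("text", "") for block in content)
```

In the notebook this would be called as `converse_once(client, model_id, "Your question here")`, reusing the `bedrock-runtime` client and the imported model ARN from the earlier steps.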

## Conclusion

This notebook demonstrates the end-to-end process of importing DeepSeek's distilled Llama models to Amazon Bedrock using Custom Model Import (CMI). Starting from downloading the model from Hugging Face, through preparing and uploading files to S3, to creating a CMI job and performing inference, we've covered the essential steps to get your DeepSeek distilled Llama models running on Amazon Bedrock.


While we've used the DeepSeek-R1-Distill-Llama-8B model in this example, the same process applies to other variants including the 70B model. For more information about Custom Model Import and its features, refer to the [Amazon Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html).