<h1 style="background: linear-gradient(to right, #ff6b6b, #4ecdc4); 
           color: white; 
           padding: 20px; 
           border-radius: 10px; 
           text-align: center; 
           font-family: Arial, sans-serif; 
           text-shadow: 2px 2px 4px rgba(0,0,0,0.5);">
    DeepSeek-R1 with Amazon Bedrock 
</h1>

# Prerequisites

- An AWS account with access to Amazon Bedrock
- Sufficient local storage space (at least 17 GB for the 8B model and 135 GB for the 70B model)
- (Optional) An Amazon S3 bucket prepared to store the custom model
- (Optional) An AWS IAM Role with permissions for Bedrock to read from S3
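
Before downloading, it can help to confirm the storage prerequisite. The following is a minimal sketch, not part of the original notebook: `has_enough_space` and `REQUIRED_GB` are hypothetical names, the sizes are the approximate figures listed above, and it checks the current directory, whereas the Hugging Face cache may live on a different drive.

```python
import shutil

# Approximate sizes from the prerequisites above, in GB (assumed, not exact).
REQUIRED_GB = {"8B": 17, "70B": 135}

def has_enough_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem containing `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3

for size, gb in REQUIRED_GB.items():
    status = "OK" if has_enough_space(".", gb) else "not enough free space"
    print(f"{size} model ({gb} GB): {status}")
```

Pass the directory that will actually hold the download if it differs from the working directory.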

## Step 1: Install required packages

In [9]:
!pip install boto3 huggingface_hub transformers[torch] -U




Collecting accelerate>=0.26.0 (from transformers[torch])
  Downloading accelerate-1.3.0-py3-none-any.whl.metadata (19 kB)
Downloading accelerate-1.3.0-py3-none-any.whl (336 kB)
   ---------------------------------------- 0.0/336.6 kB ? eta -:--:--
   ------------------------------ --------- 256.0/336.6 kB 7.9 MB/s eta 0:00:01
   ---------------------------------------- 336.6/336.6 kB 6.9 MB/s eta 0:00:00
Installing collected packages: accelerate
Successfully installed accelerate-1.3.0


## Step 2: Configure parameters
Note: When using the defaults, unique random strings will be appended to resource names to prevent naming conflicts.

In [2]:
# Default configuration values
default_region = 'us-east-1'
default_repository_id = 'deepseek-ai/DeepSeek-R1-Distill-Llama-8B'
default_s3_root_folder = '/'
default_s3_bucket_base_name = 'bedrock-imported-models'
default_import_role_base_name = 'AmazonBedrockModelImportRole'
default_import_policy_name = 'AmazonBedrockModelImportPolicy'

# Collect required parameters from user
# Allow user to specify custom Hugging Face model repository
repository_id = input(f"Enter Hugging Face repository ID ['{default_repository_id}']: ") or default_repository_id

# Allow user to specify AWS region for deployment
aws_region = input(f"Enter the AWS region ['{default_region}']: ") or default_region

# Get IAM role name for model import permissions
# If left empty, a new role will be created automatically
import_role_name = input("Enter the IAM role name for the model import [Leave empty to create a new role]") or None

# Get S3 storage configuration
# If left empty, a new bucket will be created
s3_bucket_name = input('Enter the S3 bucket name [Leave empty to create a new bucket]') or None
s3_root_folder = input(f"Enter the S3 root prefix ['{default_s3_root_folder}']") or default_s3_root_folder

# Display final configuration settings
print('Configuration:')
print(f"- HF Repository ID: {repository_id}")
print(f"- Import role ARN: {import_role_name or 'Create a new IAM role'}")
print(f"- S3 bucket: {s3_bucket_name or 'Create a new S3 bucket'}")
print(f"- S3 root folder: {s3_root_folder}")


Configuration:
- HF Repository ID: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- Import role ARN: AmazonBedrockModelImportRole-42sjopt3
- S3 bucket: bedrock-imported-models-42sjopt3
- S3 root folder: /


## Step 3: Create new IAM Role and S3 Bucket if required

In [3]:
import boto3
import json
import random
import string

from botocore.exceptions import ClientError

# Create a random resource name postfix
postfix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=8))

def get_aws_account_id():
    sts_client = boto3.client('sts')
    return sts_client.get_caller_identity()['Account']

def get_or_create_role(role_name):
    iam_client = boto3.client('iam')
    
    if not role_name:
        print('Creating new IAM role...')
        
        account_id = get_aws_account_id()
        role_name = f"{default_import_role_base_name}-{postfix}"
       
        trust_policy = {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": { "Service": "bedrock.amazonaws.com" },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": { "aws:SourceAccount": account_id },
                    "ArnEquals": {
                        "aws:SourceArn": f"arn:aws:bedrock:{aws_region}:{account_id}:model-import-job/*"
                    }
                }
            }]
        }
        
        inline_policy = {
            "Version": "2012-10-17",
            "Statement": [{
                    "Effect": "Allow",
                    "Action": [
                        "s3:GetObject",
                        "s3:ListBucket"
                    ],
                    "Resource": [
                        s3_bucket_arn,
                        f"{s3_bucket_arn}/*"
                    ],
                    "Condition": {
                        "StringEquals": {
                            "aws:ResourceAccount": account_id
                        }
                    }
                }
            ]
        }
        
        try:
            role = iam_client.create_role(
                RoleName=role_name,
                AssumeRolePolicyDocument=json.dumps(trust_policy)
            )

            iam_client.put_role_policy(
                RoleName=role_name,
                PolicyName=f"{default_import_policy_name}",
                PolicyDocument=json.dumps(inline_policy)
            )
            
            print(f"Successfully created IAM role: {role_name}")
        except ClientError as e:
            print(f"Error creating IAM role: {e}")
            exit(1)
    else:
        print(f"Checking IAM role: {role_name}")
        try:
            role = iam_client.get_role(RoleName=role_name)
            print('Found existing role.')
        except ClientError as e:
            print(f"Error retrieving IAM role: {e}")
            exit(1)
            
    return role["Role"]

def get_or_create_bucket(bucket_name):
    s3_client = boto3.client('s3', region_name=aws_region)
    
    if not bucket_name:
        print(f"Creating new S3 bucket...")
        
        bucket_name = f"{default_s3_bucket_base_name}-{postfix}"
        
        try:
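            # Regions other than us-east-1 require an explicit LocationConstraint
            if aws_region == 'us-east-1':
                s3_client.create_bucket(Bucket=bucket_name)
            else:
                s3_client.create_bucket(
                    Bucket=bucket_name,
                    CreateBucketConfiguration={'LocationConstraint': aws_region}
                )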
            
            # Wait until bucket exists
            waiter = s3_client.get_waiter('bucket_exists')
            waiter.wait(
                Bucket=bucket_name,
                WaiterConfig={
                    'Delay': 5,
                    'MaxAttempts': 20
                }
            )
            
            print(f"Successfully created S3 bucket: {bucket_name}")
            return bucket_name
            
        except ClientError as e:
            print(f"Error creating S3 bucket: {e}")
            exit(1)
    else:
        print(f"Checking S3 bucket: {bucket_name}")
        try:
            s3_client.head_bucket(Bucket=bucket_name)
            print('Found existing bucket')
            return bucket_name
        except ClientError as e:
            print(f"Error retrieving S3 bucket: {e}")
            exit(1)
            
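# Resolve the bucket first: the role's inline policy references the bucket ARN,
# so the final bucket name must be known before the role is created.
s3_bucket_name = get_or_create_bucket(s3_bucket_name) or s3_bucket_name
s3_bucket_arn = f"arn:aws:s3:::{s3_bucket_name}"

import_role = get_or_create_role(import_role_name)
import_role_arn = import_role["Arn"]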

Checking IAM role: AmazonBedrockModelImportRole-42sjopt3
Found existing role.
Checking S3 bucket: bedrock-imported-models-42sjopt3
Found existing bucket


---

# Download and deploy the model
## Step 1: Download the weights from Hugging Face

In [4]:
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repository_id)
print(f"Model downloaded to: {local_dir}")

Model downloaded to: C:\Users\traubd\.cache\huggingface\hub\models--deepseek-ai--DeepSeek-R1-Distill-Llama-8B\snapshots\ebf7e8d03db3d86a442d22d30d499abb7ec27bea


## Step 2: Upload the weights to Amazon S3

In [5]:
import os

s3 = boto3.client('s3')

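# Normalize the prefix so that '/', 'models', and '/models/' all produce clean keys
s3_prefix = s3_root_folder.strip('/')
s3_folder = f"{s3_prefix}/{repository_id}" if s3_prefix else repository_id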
s3_folder_uri = f"s3://{s3_bucket_name}/{s3_folder}"

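def file_exists_in_s3(bucket_name, s3_key):
    # head_object checks the exact key; a prefix listing would also match longer keys
    try:
        s3.head_object(Bucket=bucket_name, Key=s3_key)
        return True
    except ClientError as e:
        if e.response['Error']['Code'] in ('404', 'NoSuchKey', 'NotFound'):
            return False
        raise

def upload_to_s3():
    for root, _, files in os.walk(local_dir):
        for file in files:
            file_path = os.path.join(root, file)
            # S3 keys use forward slashes, even when running on Windows
            relative_path = os.path.relpath(file_path, local_dir).replace(os.sep, '/')
            s3_key = f"{s3_folder}/{relative_path}"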
            
            if file_exists_in_s3(s3_bucket_name, s3_key):
                print(f"Skipping existing file: s3://{s3_bucket_name}/{s3_key}")
                continue
                
            print(f"Uploading: {file_path} to s3://{s3_bucket_name}/{s3_key}")
            s3.upload_file(file_path, s3_bucket_name, s3_key)

print('Uploading model files to S3...')

upload_to_s3()

print(f"Successfully uploaded model files to {s3_folder_uri}")

Uploading model files to S3...
Skipping existing file: s3://bedrock-imported-models-42sjopt3/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/.gitattributes
Skipping existing file: s3://bedrock-imported-models-42sjopt3/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/config.json
Skipping existing file: s3://bedrock-imported-models-42sjopt3/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/generation_config.json
Skipping existing file: s3://bedrock-imported-models-42sjopt3/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/LICENSE
Skipping existing file: s3://bedrock-imported-models-42sjopt3/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/model-00001-of-000002.safetensors
Skipping existing file: s3://bedrock-imported-models-42sjopt3/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/model-00002-of-000002.safetensors
Skipping existing file: s3://bedrock-imported-models-42sjopt3/deepseek-ai/DeepSeek-R1-Distill-Llama-8B/model.safetensors.index.json
Skipping existing file: s3://bedrock-imported-models-42sjopt3/deepseek-ai/DeepSeek-R1-Distill

## Step 3: Deploy the model to Amazon Bedrock
### 3.1: Start a model import job 

In [None]:
import datetime

bedrock = boto3.client('bedrock', region_name=aws_region)
model_name = repository_id.split('/')[-1].replace('.', '-').replace('_', '-')

timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")

job_name = f'{model_name}-{timestamp}'

print(f"Starting model import job: {job_name}")

# Create the model import job
response = bedrock.create_model_import_job(
    jobName=job_name,
    importedModelName=model_name,
    roleArn=import_role_arn,
    modelDataSource={'s3DataSource': {'s3Uri': s3_folder_uri}}
)

print(f"Model import job started")

job_arn = response['jobArn']


### 3.2: Monitor the import job status

In [None]:
import time

print(f"Checking status of import job: {job_name}")
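model_arn = None
while True:
    response = bedrock.get_model_import_job(jobIdentifier=job_arn)
    status = response['status']
    if status == 'Failed':
        print('Model import failed!')

        # failureMessage may be missing, so fall back to a placeholder
        failure_message = response.get('failureMessage', 'No failure message provided')
        print(f"Reason: {failure_message}")
        break
    elif status == 'Completed':
        print('Model import complete.')

        model_arn = response['importedModelArn']
        print(f"Imported model ARN: {model_arn}")
        break
    else:
        print('Importing...')

    time.sleep(60)  # Check every 60 seconds

# Stop here on failure instead of continuing with an undefined model ARN
if model_arn is None:
    raise RuntimeError(f"Model import job {job_name} did not complete successfully")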
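
The loop above polls indefinitely. One possible variation is a small helper that gives up after a timeout. This is a sketch under assumptions: `wait_for_status` and its parameters are hypothetical names, not part of boto3, and the status strings default to the `Completed` and `Failed` values used in this notebook.

```python
import time
from typing import Callable

def wait_for_status(fetch_status: Callable[[], str],
                    done: str = "Completed",
                    failed: str = "Failed",
                    interval_s: float = 60.0,
                    timeout_s: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status() until it returns `done` or `failed`, or until timeout_s elapses."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status in (done, failed):
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"Still '{status}' after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

In this notebook it could be called as `wait_for_status(lambda: bedrock.get_model_import_job(jobIdentifier=job_arn)['status'])`. The `sleep` parameter is injectable only so the helper can be exercised without real delays.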


### 3.3: If necessary, wait for the model to initialize

In [None]:
# Wait for 5 minutes for model initialization 
print('Waiting 5 minutes for model initialization...')

for i in range(5, 0, -1):
    print(f'{i} minute{"s" if i > 1 else ""}...')
    time.sleep(60)
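
As an alternative to a fixed wait, the first invocation can be retried until it succeeds. The helper below is a generic sketch: `call_with_retry` is a hypothetical name, and which exception signals a model that is still loading is an assumption you should verify against your client (for example the runtime client's `ModelNotReadyException`, if your boto3 version exposes it).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def call_with_retry(fn: Callable[[], T],
                    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                    max_attempts: int = 5,
                    delay_s: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); if it raises an exception listed in retry_on, wait and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {delay_s:.0f}s...")
            sleep(delay_s)
    raise ValueError("max_attempts must be at least 1")
```

Narrowing `retry_on` to the specific not-ready exception avoids retrying on errors that will never resolve, such as a malformed request.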

# Test the model

## Step 1: Define the prompt 

In [30]:
prompt = "Explain the concept of 'rubber duck debugging' in a single paragraph."


## Step 2: Invoke the model
 
### Option 2.1: Using the `InvokeModelWithResponseStream` API
This API streams the response, so you can process it in chunks as it is generated.

**Pros:** Lower latency to first displayed content; provides visual feedback during generation; reduced memory overhead for large responses

**Cons:** More complex implementation; requires handling partial/incomplete responses; need to manage stream state and error handling
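
To illustrate the partial-response handling mentioned above, here is a small sketch that extracts text from streamed events without calling Bedrock. `iter_generation_text` is a hypothetical helper; the simulated events mirror the `chunk`/`bytes`/`generation` shape this notebook reads, and the `stop_reason` field in the last event is an assumed example of a chunk that carries no text.

```python
import json
from typing import Iterable, Iterator

def iter_generation_text(events: Iterable[dict]) -> Iterator[str]:
    """Yield the 'generation' text of each streamed event, skipping events without it."""
    for event in events:
        chunk_bytes = event.get("chunk", {}).get("bytes")
        if not chunk_bytes:
            continue  # Not a content chunk
        chunk = json.loads(chunk_bytes)
        text = chunk.get("generation")
        if text:
            yield text

# Simulated events (assumed shape, for illustration only)
fake_events = [
    {"chunk": {"bytes": json.dumps({"generation": "Hello"}).encode()}},
    {"chunk": {"bytes": json.dumps({"generation": ", world"}).encode()}},
    {"chunk": {"bytes": json.dumps({"stop_reason": "stop"}).encode()}},
]
full_text = "".join(iter_generation_text(fake_events))
print(full_text)
```

Separating extraction from printing keeps the streaming loop reusable: the same generator can feed a UI, a log, or a string buffer.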

In [17]:
from botocore.config import Config
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(repository_id)

final_response = ""

def invoke_bedrock_model_stream(model_arn, message, region_name=aws_region, max_tokens=4096):
    global final_response
    final_response = ""  # Reset so repeated calls do not concatenate outputs
    
    config = Config(
        retries={
            'total_max_attempts': 10,
            'mode': 'standard'
        }
    )
    session = boto3.session.Session()
    br_runtime = session.client(service_name='bedrock-runtime',
                                region_name=region_name,
                                config=config)
    
    messages = [{'role': 'user', 'content': message}]
    
    formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    print(f"Formatted prompt: \n{formatted_prompt}")
    
    payload = {
        "prompt": formatted_prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9
    }
    response = br_runtime.invoke_model_with_response_stream(
        modelId=model_arn,
        body=json.dumps(payload)
    )
    print("\nModel output:\n")
    for event in response["body"]:
        chunk = json.loads(event['chunk']['bytes'])
        if "generation" in chunk:
            text = chunk["generation"]
            final_response += text 
            print(text, end="", flush=True)

In [31]:
invoke_bedrock_model_stream(model_arn=model_arn, message=prompt)

Formatted prompt: 
<｜begin▁of▁sentence｜><｜User｜>Explain the concept of 'rubber duck debugging' in a single paragraph.<｜Assistant｜>

Model output:

<think>
Okay, so I'm trying to understand this concept called "rubber duck debugging." I've heard the term before, but I'm not exactly sure what it means. Let me think about it. I know that "debugging" usually means fixing errors in a program or software, right? So maybe "rubber duck" has something to do with that process.

Wait, rubber duck... I think I've heard it used in a metaphorical sense before. Maybe it's about taking a step back from the problem. Like, when you're stuck on a bug, you try to explain it to someone else, and in doing so, you figure it out yourself. So, if you're working on a computer issue, sometimes you just talk through it with a rubber duck or a stuffed animal, which forces you to articulate the problem clearly.

I remember reading somewhere that it's a technique used by software developers. They talk about explaini

### Option 2.2: Using the `InvokeModel` API
This provides a single, complete response after model processing is finished.

**Pros:** Simpler implementation; response arrives fully formatted; easier to use for downstream processing

**Cons:** Longer perceived wait time; no visual feedback during processing; entire response must be held in memory at once

In [27]:
def invoke_model(model, message, region_name=aws_region, max_tokens=4096):
    config = Config(
        retries={
            'total_max_attempts': 10, 
            'mode': 'standard'
        }
    )

    session = boto3.session.Session()
    br_runtime = session.client('bedrock-runtime', region_name=region_name, config=config)
    
    messages = [{'role': 'user', 'content': message}]
    
    formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    print(f"Formatted prompt:\n{formatted_prompt}")
    
    payload = {
        "prompt": formatted_prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9
    }
        
    try:
        response = br_runtime.invoke_model(
            modelId=model, 
            body=json.dumps(payload) 
        )
        result = json.loads(response["body"].read().decode("utf-8"))
    except Exception as e:
        # Log the error and re-raise; otherwise `result` would be undefined below
        print(repr(e))
        raise

    return result

In [32]:
response = invoke_model(model=model_arn, message=prompt)

Formatted prompt:
<｜begin▁of▁sentence｜><｜User｜>Explain the concept of 'rubber duck debugging' in a single paragraph.<｜Assistant｜>


In [33]:
from IPython import display
display.Markdown(response['generation'])

<think>
Okay, so I need to explain the concept of 'rubber duck debugging' in a single paragraph. I've heard the term before, but I'm not exactly sure what it means. Let me try to break it down. 

First, I think it's related to programming or software development because the term 'debugging' suggests fixing errors. Rubber duck debugging sounds like it's a specific method for debugging. Maybe it's a technique where you explain your problem to a rubber duck? That seems a bit odd because a duck isn't going to understand programming. But perhaps it's a metaphor for a process.

I remember hearing that sometimes when you're stuck on a problem, explaining it to someone else can help you figure it out yourself. So maybe rubber duck debugging is similar. Instead of a person, you use a rubber duck as a dummy audience. By explaining the problem out loud or to an object, the act of verbalizing the issue can help clarify your own thoughts. That makes sense because sometimes when you talk through a problem, you catch your own mistakes or see a solution.

I should also consider the origins of the term. I think it's a practice used by programmers. They might use a rubber duck as a prop to simulate explaining the code to someone else, like a colleague or a rubber duck. This way, they can test their own understanding and identify where they're getting stuck.

So, putting it all together, rubber duck debugging is a technique where you explain a programming problem to a rubber duck or an imaginary audience. By doing so, you can better understand the problem and find the solution. It's a form of debugging that relies on verbal communication to enhance problem-solving.
</think>

Rubber duck debugging is a technique used in programming and problem-solving where one explains a problem to a rubber duck or an imaginary audience. By verbalizing the issue, often through storytelling, individuals can gain clarity and identify their own misunderstandings or errors, facilitating a better grasp of the problem and leading to a potential solution. This practice leverages the power of communication to enhance self-discovery and critical thinking.
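
The output above contains the model's reasoning inside `<think>` tags, followed by the final answer. If only the answer is needed downstream, the two parts can be separated. This is a minimal sketch: `split_reasoning` is a hypothetical helper, and it assumes at most one reasoning block at the start of the text, as in the output shown here.

```python
from typing import Tuple

def split_reasoning(text: str,
                    open_tag: str = "<think>",
                    close_tag: str = "</think>") -> Tuple[str, str]:
    """Split model output into (reasoning, answer).

    If the closing tag is absent, reasoning is empty and the whole text is the answer.
    """
    before, sep, after = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    reasoning = before.replace(open_tag, "", 1).strip()
    return reasoning, after.strip()

sample = "<think>\nLet me think.\n</think>\n\nRubber duck debugging is a technique."
reasoning, answer = split_reasoning(sample)
print(answer)
```

With the `InvokeModel` result from this notebook, the call would be `split_reasoning(response['generation'])`. A response cut off by `max_tokens` before `</think>` is returned unchanged as the answer, so callers may want to check for that case.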