# Bedrock Marketplace DeepSeek

In this notebook, we'll use the Bedrock Marketplace API to deploy a `DeepSeek-R1-Distill-Llama-8B` model to a Bedrock Marketplace deployment endpoint.

**Pre-requisites:**
- Make sure the model is available in the region where you are deploying it. We'll use `us-east-2` in this notebook.
- Make sure you have enough account-level quota for `ml.g6.2xlarge` for endpoint usage. If not, you can request a quota increase from the AWS console: [SageMaker Service Quotas](https://us-east-2.console.aws.amazon.com/servicequotas/home/services/sagemaker/quotas?region=us-east-2)
- Upgrade your `boto3` library to the latest version by running `!pip install boto3 --upgrade` in a code cell. This notebook was tested with `boto3==1.36.14`.
- An IAM role that grants access to model artifacts in AWS ECR registries and S3, and permission to deploy the model.
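
To confirm the `boto3` prerequisite before running the rest of the notebook, a small version check can help. The following is a minimal sketch; treating the tested version `1.36.14` as a minimum is an assumption, and the helper names `parse_version` and `meets_minimum` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_BOTO3 = "1.36.14"  # the version this notebook was tested with (assumed minimum)


def parse_version(text):
    """Turn '1.36.14' into (1, 36, 14) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".")[:3])


def meets_minimum(installed, minimum=MIN_BOTO3):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version("boto3")
    if meets_minimum(installed):
        print(f"boto3 {installed} is recent enough")
    else:
        print(f"boto3 {installed} is older than {MIN_BOTO3}; consider upgrading")
except PackageNotFoundError:
    print("boto3 is not installed")
```

Comparing tuples of integers avoids the pitfall of string comparison, where `"1.9.0"` would sort after `"1.36.14"`.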


In [1]:
import boto3
import json
import re
import time

## Create an IAM role for Deployment
You can reuse this role for multiple deployments.

**IMPORTANT**: The IAM policy used here is for demonstration purposes only. In production environments, always follow the principle of least privilege by granting only the minimum necessary permissions required for your specific use case.

In [2]:
iam = boto3.client("iam")
role_name = "AmazonSageMakerExecutionRoleForBedrockMarketplace"

Let's define the trust policy and load the inline IAM policy from `iam_policy.json`.

In [3]:
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Read inline policy from file
with open("iam_policy.json", "r") as f:
    inline_policy = json.load(f)
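
Before sending a policy document to IAM, it can be useful to catch obvious structural mistakes locally. The sketch below is an illustration only: `check_policy_document` is a hypothetical helper, it checks just a few basic fields, and it is no substitute for IAM's own validation. It is demonstrated on a trust policy of the same shape as the one above, since the contents of `iam_policy.json` are not shown here.

```python
def check_policy_document(policy):
    """Return a list of problems found in an IAM policy dict (empty if none)."""
    if not isinstance(policy, dict):
        return ["policy must be a JSON object"]

    statements = policy.get("Statement")
    if isinstance(statements, dict):
        statements = [statements]  # a single statement object is also accepted
    if not isinstance(statements, list) or not statements:
        return ["missing or empty 'Statement'"]

    problems = []
    for i, stmt in enumerate(statements):
        if not isinstance(stmt, dict):
            problems.append(f"statement {i}: must be a JSON object")
            continue
        if stmt.get("Effect") not in ("Allow", "Deny"):
            problems.append(f"statement {i}: 'Effect' must be 'Allow' or 'Deny'")
        if "Action" not in stmt and "NotAction" not in stmt:
            problems.append(f"statement {i}: missing 'Action'")
    return problems


# Example document with the same shape as the trust policy in this notebook.
example_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(check_policy_document(example_trust_policy) or "no problems found")
```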

Create the role if it doesn't exist.


In [4]:
try:
    role = iam.get_role(RoleName=role_name)
    print("Role exists, using it...")
except iam.exceptions.NoSuchEntityException:
    print("Role does not exist, creating it...")
    role = iam.create_role(
        RoleName=role_name, AssumeRolePolicyDocument=json.dumps(trust_policy)
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="inline-policy",
        PolicyDocument=json.dumps(inline_policy),
    )

Role exists, using it...


## Create a new endpoint


In [5]:
region = "us-east-2"
model_arn = "arn:aws:sagemaker:us-east-2:aws:hub-content/SageMakerPublicHub/Model/deepseek-llm-r1-distill-llama-8b/1.0.0"
endpoint_name = "deepseek-r1-llama-8b-ep"
instance_type = "ml.g6.2xlarge"

bedrock_runtime = boto3.client("bedrock-runtime", region_name=region)
bedrock = boto3.client("bedrock", region_name=region)

In [6]:
def create_endpoint(
    model_arn, endpoint_name, instance_type, role_arn, timeout
):
    """
    Create a Bedrock Marketplace Endpoint for a model. After creating
    a new endpoint, this function will poll the endpoint until it is in service
    or until the timeout is reached.
    """
    endpoint = bedrock.create_marketplace_model_endpoint(
        modelSourceIdentifier=model_arn,
        endpointConfig={
            "sageMaker": {
                "initialInstanceCount": 1,
                "instanceType": instance_type,
                "executionRole": role_arn,
            }
        },
        endpointName=endpoint_name,
    )
    endpoint_arn = endpoint["marketplaceModelEndpoint"]["endpointArn"]

    # Poll the endpoint until it is in service
    print("Endpoint is created, waiting for it to be in service...")
    for _ in range(timeout // 5):
        endpoint = bedrock.get_marketplace_model_endpoint(
            endpointArn=endpoint_arn
        )
        status = endpoint["marketplaceModelEndpoint"]["endpointStatus"]
        if status == "InService":
            print("Endpoint is in service.")
            break
        time.sleep(5)
    else:
        print("Timeout: Endpoint is not in service yet.")

    return endpoint_arn


def get_or_create_endpoint(
    model_arn, endpoint_name, instance_type, role_arn, timeout=360
):
    """
    Get or create a Bedrock Marketplace Endpoint for a model and return the
    endpoint ARN.
    """
    endpoints = bedrock.list_marketplace_model_endpoints(
        modelSourceEquals=model_arn
    )["marketplaceModelEndpoints"]

    if endpoints:
        print("Endpoint exists, using it...")
        return endpoints[0]["endpointArn"]

    print("Endpoint does not exist, creating it...")
    endpoint_arn = create_endpoint(
        model_arn, endpoint_name, instance_type, role_arn, timeout
    )

    return endpoint_arn
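
The polling loop in `create_endpoint` only stops early on `InService`, so an endpoint that ends up in a terminal error state would be polled until the timeout. One way to handle this is a generic polling helper that also stops on failure statuses. This is a minimal sketch under stated assumptions: `wait_for_status` and its parameters are hypothetical, and the status strings other than `InService` are placeholders, so check the service documentation for the values the API actually reports.

```python
import time


def wait_for_status(get_status, target="InService", failure_statuses=(),
                    timeout=360, interval=5, sleep=time.sleep):
    """Poll get_status() until it returns target or a failure status.

    Gives up after roughly `timeout` seconds and returns the last status seen.
    """
    status = None
    for _ in range(max(1, timeout // interval)):
        status = get_status()
        if status == target or status in failure_statuses:
            return status
        sleep(interval)
    return status


# Demo with a fake status source, so the sketch runs without AWS access.
# "Creating" is a placeholder status used only for this demo.
fake_statuses = iter(["Creating", "Creating", "InService"])
print(wait_for_status(lambda: next(fake_statuses), sleep=lambda _: None))
```

With a real client, `get_status` could be a small function that calls `bedrock.get_marketplace_model_endpoint(endpointArn=endpoint_arn)` and returns `["marketplaceModelEndpoint"]["endpointStatus"]`, as `create_endpoint` does. Passing `sleep` as a parameter keeps the helper easy to test without waiting.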

Get the endpoint ARN. If the endpoint exists, we'll reuse it; otherwise, we'll create a new one.

In [7]:
endpoint_arn = get_or_create_endpoint(
    model_arn=model_arn,
    endpoint_name=endpoint_name,
    instance_type=instance_type,
    role_arn=role["Role"]["Arn"],
)

Endpoint does not exist, creating it...
Endpoint is created, waiting for it to be in service...
Endpoint is in service.


## Inference

In [8]:
def deepseek_r1_chat(
    bedrock_runtime,
    endpoint_arn,
    prompt,
    temperature=0.6,
    max_tokens=1000,
    top_p=0.9,
):
    """Constructs the payload for the deepseek model, sends it to the endpoint,
    and returns the response.

    Returns:
      The response is a string made of two parts: the model's chain of
      thought, contained within the <think></think> tags, and the final
      answer.
    """

    payload = {
        "inputs": f"""You are an AI assistant. Do as the user asks.
    ### Instruction: {prompt}
    ### Response: <think>""",
        "parameters": {
            "max_new_tokens": max_tokens,
            "top_p": top_p,
            "temperature": temperature,
        },
    }

    # Invoke model
    response = bedrock_runtime.invoke_model(
        modelId=endpoint_arn, body=json.dumps(payload)
    )

    output = None
    if response:
        body = response.get("body")
        if body:
            body = json.loads(body.read().decode("utf-8"))
            output = body.get("generated_text")

    return output

In [9]:
def deepseek_r1_parse_output(response):
    """Parses the response from the deepseek model and returns the output.

    Returns:
      Dict[str, str]: The parsed output with the keys "cot" and "answer".
    """
    output = {
        "cot": "",
        "answer": "",
    }

    if not response:
        return output

    # Extract content after "### Response:"; fall back to the full text if
    # the marker is missing instead of raising an IndexError
    content = response.split("### Response:", 1)[-1].strip()

    # Extract content between <think> and </think>
    cot_match = re.search(r"<think>(.*?)</think>", content, re.DOTALL)
    if cot_match:
        output["cot"] = cot_match.group(1).strip()

    # Extract final answer which comes after </think>
    if "</think>" in content:
        output["answer"] = content.split("</think>", 1)[1].strip()

    return output
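
If generation stops at `max_new_tokens` before the model emits `</think>`, a strict `<think>...</think>` match finds nothing, and both the chain of thought and the answer come back empty. The sketch below shows one tolerant alternative. `split_reasoning` and its `truncated` flag are hypothetical and not part of the notebook's code; it assumes the same prompt template with a `### Response:` marker.

```python
import re


def split_reasoning(text, marker="### Response:"):
    """Split generated text into chain of thought and final answer.

    An unterminated <think> block is still returned as chain of thought,
    with an empty answer and truncated=True.
    """
    # Keep only the text after the marker; use the whole text if it is absent.
    content = text.split(marker, 1)[-1].strip()

    # Match up to the closing tag, or up to the end of the string if missing.
    match = re.search(r"<think>(.*?)(</think>|\Z)", content, re.DOTALL)
    if not match:
        return {"cot": "", "answer": content, "truncated": False}

    closed = match.group(2) == "</think>"
    answer = content[match.end():].strip() if closed else ""
    return {
        "cot": match.group(1).strip(),
        "answer": answer,
        "truncated": not closed,
    }


complete = "### Response: <think>\n3 - 2 = 1\n</think>\nBob has 1 apple left."
cut_off = "### Response: <think>\nBob starts with 3 apples and"

print(split_reasoning(complete))
print(split_reasoning(cut_off))
```

When `truncated` is true, a caller could retry with a larger `max_tokens` value rather than treat the empty answer as final.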

In [10]:
prompt = "If Alice has 6 apples and gives half to Bob, who then eats 2 apples and shares the rest equally with his sister, how many apples does Bob's sister receive?"

response = deepseek_r1_chat(bedrock_runtime, endpoint_arn, prompt)
output = deepseek_r1_parse_output(response)

In [11]:
print(f"Raw Response:\n{response}\n")
print("-" * 80)
print(
    f"Chain of Thought:\n{output['cot']}\n\n"
    f"Final Answer:\n{output['answer']}"
)

Raw Response:
You are an AI assistant. Do as the user asks.
    ### Instruction: If Alice has 6 apples and gives half to Bob, who then eats 2 apples and shares the rest equally with his sister, how many apples does Bob's sister receive?
    ### Response: <think>
Okay, so I need to figure out how many apples Bob's sister receives after all these transactions. Let me break it down step by step.

First, Alice starts with 6 apples. She gives half to Bob. Hmm, half of 6 is 3, right? So Bob gets 3 apples. That leaves Alice with 3 apples as well because she gave away half.

Now, Bob has 3 apples. He eats 2 of them. So, if he eats 2, how many does he have left? Let me subtract 2 from 3. That gives me 1 apple remaining. So Bob now has 1 apple left.

Next, Bob shares the rest equally with his sister. The "rest" here refers to the apples he has after eating, which is 1 apple. He wants to share this equally, so he'll split it into two parts. If he shares 1 apple equally between himself and his sis

## Delete the endpoint


In [12]:
response = bedrock.delete_marketplace_model_endpoint(endpointArn=endpoint_arn)
if response["ResponseMetadata"]["HTTPStatusCode"] == 200:
    print("Deleting endpoint...")

Deleting endpoint...
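
A `200` response only means that the delete request was accepted. To check whether the endpoint is still registered, the listing call used earlier in `get_or_create_endpoint` can be reused. The following is a minimal sketch: `endpoint_is_listed` is a hypothetical helper, it inspects only the first page of results, and how quickly a deletion is reflected in the listing is not something this notebook establishes. The stub client exists only so the example runs without AWS access.

```python
def endpoint_is_listed(bedrock_client, model_arn, endpoint_arn):
    """Return True if endpoint_arn still appears among the model's endpoints."""
    endpoints = bedrock_client.list_marketplace_model_endpoints(
        modelSourceEquals=model_arn
    )["marketplaceModelEndpoints"]
    return any(ep["endpointArn"] == endpoint_arn for ep in endpoints)


class StubBedrockClient:
    """Stand-in for boto3.client("bedrock"), used only for this demo."""

    def __init__(self, endpoint_arns):
        self._endpoint_arns = endpoint_arns

    def list_marketplace_model_endpoints(self, modelSourceEquals):
        return {
            "marketplaceModelEndpoints": [
                {"endpointArn": arn} for arn in self._endpoint_arns
            ]
        }


# Placeholder ARNs for illustration only.
stub = StubBedrockClient(["example-endpoint-arn"])
print(endpoint_is_listed(stub, "example-model-arn", "example-endpoint-arn"))
print(endpoint_is_listed(stub, "example-model-arn", "other-endpoint-arn"))
```

In the notebook, the real `bedrock` client, `model_arn`, and `endpoint_arn` would take the place of the stub and placeholder values.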
