# Using the DeepSeek-R1-Distill-Qwen-14B reasoning model hosted on Amazon SageMaker with the DJL Serving DLC


❗This notebook runs well on an `ml.t3.medium` instance with the `PyTorch 2.2.0 Python 3.10 CPU optimized` kernel in **SageMaker Studio Classic** or the `Python3` kernel in **JupyterLab**.

# Set up Environment

In [None]:
%%capture --no-stderr

!pip install -U pip
!pip install -U "sagemaker>=2.237.3"
!pip install -U "transformers>=4.47.0"

In [None]:
import boto3

aws_region = boto3.Session().region_name
aws_region

In [None]:
import boto3
from typing import Dict


def get_cfn_outputs(stackname: str, region_name: str = 'us-east-1') -> Dict[str, str]:
    """Return the CloudFormation stack outputs as an {OutputKey: OutputValue} dict."""
    cfn = boto3.client('cloudformation', region_name=region_name)
    outputs = {}
    for output in cfn.describe_stacks(StackName=stackname)['Stacks'][0]['Outputs']:
        outputs[output['OutputKey']] = output['OutputValue']
    return outputs

In [None]:
CFN_STACK_NAME = "DeepSeekR1EndpointStack"
cfn_stack_outputs = get_cfn_outputs(CFN_STACK_NAME, aws_region)

endpoint_name = cfn_stack_outputs['EndpointName']
model_name = cfn_stack_outputs['ModelName']

model_name, endpoint_name

# Create a Predictor from the SageMaker endpoint name

In [None]:
from sagemaker import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer


predictor = Predictor(
    endpoint_name=endpoint_name,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)

# Run Inference

### Message API

- Ref: https://docs.djl.ai/docs/serving/serving/docs/lmi/user_guides/chat_input_output_schema.html#message
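
The request body used in the cells below is a chat-style payload: a `messages` list plus sampling parameters. The following is a minimal sketch of a payload builder, not part of the original notebook. The helper name `build_chat_payload` and its defaults are hypothetical. The default temperature of 0.6 and the choice to send a single `user` message (no system prompt) follow the usage recommendations in the DeepSeek-R1 model card listed under References; check the card for the current guidance.

```python
from typing import Any, Dict, Optional


def build_chat_payload(
    prompt: str,
    max_tokens: int = 5000,
    temperature: Optional[float] = 0.6,
    top_p: Optional[float] = None,
    stream: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: build a chat request body with a single user message."""
    payload: Dict[str, Any] = {
        # All instructions go into the user message; no system prompt is added.
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    # Only include optional sampling parameters that were explicitly set.
    if temperature is not None:
        payload["temperature"] = temperature
    if top_p is not None:
        payload["top_p"] = top_p
    if stream:
        payload["stream"] = True
    return payload


payload = build_chat_payload("Help me write a quick sort code in python", top_p=0.9)
print(payload)
```

The resulting dictionary can be passed to `predictor.predict(...)` in place of the inline dictionaries used in this notebook.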

In [None]:
import json


prompt = "Help me write a quick sort code in python"

messages = [
    {
        "role": "user",
        "content": prompt
    }
]

response = predictor.predict({
    "messages": messages,
    "max_tokens": 5000
})

print(json.dumps(response, indent=2))

{
  "id": "chatcmpl-140334570478352",
  "object": "chat.completion",
  "created": 1737954736,
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Sure, here is a simple implementation of the Quick Sort algorithm in Python:\n\n```python\ndef quick_sort(arr):\n    if len(arr) <= 1:\n        return arr\n    pivot = arr[len(arr) // 2]\n    left = [x for x in arr if x < pivot]\n    middle = [x for x in arr if x == pivot]\n    right = [x for x in arr if x > pivot]\n    return quick_sort(left) + middle + quick_sort(right)\n\nprint(quick_sort([3,6,8,10,1,2,1]))\n# Output: [1, 1, 2, 3, 6, 8, 10]\n```\n\nThis quick sort implementation uses the divide-and-conquer strategy. It works by choosing a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The sub-arrays are then recursively sorted. This is done until the entire array is sorted.\n"
  

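DeepSeek-R1 family models usually emit their reasoning inside `<think>...</think>` tags before the final answer. Whether those tags reach the client depends on the chat template and serving configuration, so treat the tag handling below as an assumption. This is a minimal sketch, with a hypothetical helper `split_reasoning` and a made-up sample response shaped like the chat completion above, of how the assistant text could be separated into reasoning and answer.

```python
import re
from typing import Tuple

# Matches one <think>...</think> block, including newlines inside it.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> Tuple[str, str]:
    """Hypothetical helper: return (reasoning, answer) from assistant content."""
    match = THINK_PATTERN.search(content)
    if match is None:
        # Some chat templates open the <think> block in the prompt,
        # so only the closing tag shows up in the generated text.
        head, sep, tail = content.partition("</think>")
        if sep:
            return head.strip(), tail.strip()
        # No tags at all: everything is the answer.
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer = (content[:match.start()] + content[match.end():]).strip()
    return reasoning, answer


# Made-up response for illustration; only the structure mirrors the real output.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "<think>\nPick a pivot, partition, recurse.\n</think>\n\nUse a recursive quick sort.",
            },
        }
    ]
}

content = sample_response["choices"][0]["message"]["content"]
reasoning, answer = split_reasoning(content)
print("reasoning:", reasoning)
print("answer:", answer)
```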
In [None]:
import json


prompt = "Generate 10 random numbers. Please reason step by step."

messages = [
    {
        "role": "user",
        "content": prompt
    }
]

response = predictor.predict({
    "messages": messages,
    "max_tokens": 5000,
    "temperature": 0.6,
    "top_p": 0.9,
})

print(response['choices'][0]['message']['content'])

Sure, let's generate 10 random numbers using Python programming language.

Here's the step-by-step process:

1. **Import the random module:** This module provides various functions for generating random numbers.

```python
import random
```

2. **Initialize an empty list:** We will use this list to store our random numbers.

```python
random_numbers = []
```

3. **Use a loop to generate random numbers:** The `randint()` function in the random module generates a random integer. We will use a loop to generate 10 random integers.

```python
for i in range(10):
    random_numbers.append(random.randint(1, 100))
```

In the above code, `randint(1, 100)` generates a random integer between 1 and 100. The loop runs 10 times, so it generates 10 random integers.

4. **Print the list of random numbers:** Finally, we print the list of random numbers.

```python
print(random_numbers)
```

So, the complete code will look like this:

```python
import random

random_numbers = []

for i in range(10):
  

# Streaming

### Message Schema streaming

In [None]:
import io
import json
from sagemaker.iterators import BaseIterator


class MessageTokenIterator(BaseIterator):
    def __init__(self, stream):
        super().__init__(stream)
        self.byte_iterator = iter(stream)
        self.buffer = io.BytesIO()
        self.read_pos = 0

    def __iter__(self):
        return self

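    def __next__(self):
        while True:
            self.buffer.seek(self.read_pos)
            line = self.buffer.readline()

            if line and line[-1] == ord("\n"):
                self.read_pos += len(line)
                full_line = line[:-1].decode('utf-8').strip()
                # Skip blank separator lines between events.
                if not full_line:
                    continue
                # Remove an optional "data:" prefix. str.lstrip('data:') strips a
                # set of characters rather than the prefix, so slice it off instead.
                if full_line.startswith('data:'):
                    full_line = full_line[len('data:'):]
                line_data = json.loads(full_line)
                # A delta may carry no content (or None); return '' in that case.
                return line_data['choices'][0]['delta'].get('content') or ''
            # No complete line buffered yet: read the next chunk.
            # StopIteration from the exhausted stream ends the iteration.
            chunk = next(self.byte_iterator)
            self.buffer.seek(0, io.SEEK_END)
            self.buffer.write(chunk['PayloadPart']['Bytes'])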
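
The iterator above has to cope with events that arrive split across byte chunks: it appends each chunk to a buffer and only parses once a full newline-terminated line is available. The following is a minimal offline sketch of that buffering idea, with no SageMaker dependency. The function name `iter_stream_tokens` and the fake chunks are hypothetical; only the `PayloadPart`/`Bytes` and `choices[0].delta.content` shapes are taken from the code above.

```python
import io
import json
from typing import Dict, Iterable, Iterator


def iter_stream_tokens(events: Iterable[Dict]) -> Iterator[str]:
    """Yield delta content from newline-delimited JSON events split across chunks."""
    buffer = io.BytesIO()
    read_pos = 0
    for event in events:
        # Append the new bytes at the end of the buffer.
        buffer.seek(0, io.SEEK_END)
        buffer.write(event["PayloadPart"]["Bytes"])
        # Drain every complete line that is now available.
        while True:
            buffer.seek(read_pos)
            line = buffer.readline()
            if not line.endswith(b"\n"):
                break  # partial line: wait for more bytes
            read_pos += len(line)
            text = line.decode("utf-8").strip()
            if not text:
                continue  # blank separator line
            if text.startswith("data:"):
                text = text[len("data:"):]
            data = json.loads(text)
            yield data["choices"][0]["delta"].get("content") or ""


# Two fake events, deliberately cut at arbitrary byte offsets.
line1 = json.dumps({"choices": [{"delta": {"content": "Hello"}}]}).encode() + b"\n"
line2 = json.dumps({"choices": [{"delta": {"content": ", world"}}]}).encode() + b"\n"
raw = line1 + line2
fake_stream = [
    {"PayloadPart": {"Bytes": raw[:10]}},
    {"PayloadPart": {"Bytes": raw[10:40]}},
    {"PayloadPart": {"Bytes": raw[40:]}},
]

tokens = list(iter_stream_tokens(fake_stream))
print("".join(tokens))
```

Decoding only complete lines also avoids splitting a multi-byte UTF-8 character across two chunks, which matters for non-ASCII output.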

In [None]:
prompt = "Generate 10 random numbers. Please reason step by step."

messages = [
    {
        "role": "user",
        "content": prompt
    }
]

payload = {
    "messages": messages,
    "max_tokens": 5000,
    "temperature": 0.6,
    "top_p": 0.9,
    "stream": True
}

response_stream = predictor.predict_stream(
    data=payload,
    custom_attributes="accept_eula=false",
    iterator=MessageTokenIterator,
)

for token in response_stream:
    print(token, end="", flush=True)

Sure, here's a simple Python script that generates 10 random numbers using the random module.

```python
import random

random_numbers = []
for i in range(10):
    random_numbers.append(random.randint(1, 100))

print(random_numbers)
```

This script works as follows:

1. The `random` module is imported. This module contains various functions for generating random numbers.

2. An empty list `random_numbers` is created to store the generated random numbers.

3. A `for` loop is started, which will run 10 times.

4. Inside the loop, `random.randint(1, 100)` is called to generate a random integer between 1 and 100 (inclusive). This number is then appended to `random_numbers`.

5. After the loop has run 10 times, `random_numbers` will contain 10 random integers between 1 and 100.

6. Finally, `random_numbers` is printed to the console.

This will print out 10 random numbers between 1 and 100.


In [None]:
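# Korean prompt. English translation: "A geometric sequence {a_n} whose first term
# and common ratio are both the positive number k satisfies
# (a_4/a_2)+(a_2/a_1)=30. What is the value of k?"
prompt = "첫째항과 공비가 모두 양수 k인 등비수열 {a_n}이 (a_4/a_2)+(a_2/a_1)=30을 만족할 때, k의 값은?"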

messages = [
    {
        "role": "user",
        "content": prompt
    }
]

payload = {
    "messages": messages,
    "max_tokens": 5000,
    "temperature": 0.6,
    "top_p": 0.8,
    "stream": True
}

response_stream = predictor.predict_stream(
    data=payload,
    custom_attributes="accept_eula=false",
    iterator=MessageTokenIterator,
)

for token in response_stream:
    print(token, end="", flush=True)

등비수열의 항은 다음과 같이 나타낼 수 있습니다:

a_n = a_1 * r^n

여기서 r은 공비이고, a_1은 첫 번째 항입니다. 주어진 등비수열의 관계식은 다음과 같습니다:

(a_4/a_2) + (a_2/a_1) = 30

이를 풀면:

a_2 * a_2 * a_4 = 30 * a_1 * a_1

여기서 a_2와 a_4는 수열의 두 번째와 네 번째 항에 해당하고, a_1은 첫 번째 항에 해당합니다. 따라서, 우리는 다음 식을 얻을 수 있습니다:

a_2^2 * a_4 = 30 * a_1^2

이 식에서 a_2와 a_4는 양수이므로, a_2^2는 30*a_1^2의 제곱근입니다. 따라서, a_2의 값은 30*a_1^2의 제곱근입니다. 따라서, a_2의 값은 양수이며, 따라서 k의 값은 30*a_1^2의 제곱근입니다.

이를 Python 코드로 표현하면 다음과 같습니다:

```python
import math

# a_1의 값을 제곱근으로 설정
a_1 = math.sqrt(30)

# a_2의 값을 계산
a_2 = math.sqrt(30 * a_1**2)

# k의 값을 계산
k = a_2

print(k)
```

이 코드는 a_1의 값을 30의 제곱근으로 설정하고, a_2의 값을 30*a_1^2의 제곱근으로 설정하고, k의 값을 a_2로 설정합니다. 그런 다음, k의 값을 출력합니다.
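
The problem in the prompt above (a geometric sequence whose first term and common ratio both equal a positive number k, with a_4/a_2 + a_2/a_1 = 30) has a closed-form answer, which is useful as a reference when judging a response like the one above. With a_n = k * k^(n-1) = k^n, the ratios are a_4/a_2 = k^2 and a_2/a_1 = k, so the condition becomes k^2 + k - 30 = 0. The sketch below, which is not part of the original notebook, solves that quadratic and checks the positive root against the original condition.

```python
import math

# Condition reduces to k**2 + k - 30 = 0 because a_n = k**n.
a, b, c = 1.0, 1.0, -30.0
disc = b * b - 4 * a * c
roots = [(-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)]

# The problem states that k is positive, so keep the positive root only.
k = next(r for r in roots if r > 0)


def term(n: int, ratio: float) -> float:
    """n-th term of the sequence whose first term and common ratio are both `ratio`."""
    return ratio ** n


# Substitute back into the original condition as a sanity check.
lhs = term(4, k) / term(2, k) + term(2, k) / term(1, k)
print(k, lhs)
```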


# Clean up the environment

In [None]:
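# Delete the model first: it is looked up through the endpoint configuration,
# which delete_endpoint() removes by default.
predictor.delete_model()
predictor.delete_endpoint()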

# References

- [DeepSeek-R1 Model Card](https://huggingface.co/deepseek-ai/DeepSeek-R1#usage-recommendations)
- [deepseek-ai/deepseek-coder-6.7b-instruct SageMaker LMI deployment guide](https://github.com/aws-samples/llm_deploy_gcr/blob/main/sagemaker/deepseek_coder_6.7_instruct.ipynb)