Explore how to interact with DeepSeek R1, an advanced open-source language model, using Python and the Together AI API. You’ll learn how to securely manage API keys, make structured API calls, and generate AI-powered responses for different tasks. From retrieving text completions to generating code snippets and handling real-time streaming outputs, this hands-on project will equip you with essential skills to integrate AI into real-world applications. By the end, you’ll understand how to fine-tune responses, process AI-generated outputs, and efficiently work with LLMs in your projects.

## Objectives

After completing this lab you will be able to:

- Use **DeepSeek** to address a variety of real-world problems by integrating it into your projects.
- Programmatically interact with the DeepSeek API for different use cases, from information retrieval to code generation.
- Experiment with and modify parameters to generate custom answers tailored to diverse scenarios.
- Leverage the flexibility of DeepSeek to explore and implement innovative solutions in your work.


## Setup

For this lab, we will be using the following libraries:

*   [`python-dotenv`](https://pypi.org/project/python-dotenv/) for loading environment variables from a file (like `file.env`). It makes managing sensitive information (such as API keys) straightforward and secure.
*   **`os`** (built-in module) for interacting with the operating system. We use it to access environment variables and other OS-level functions.
*   [`together`](https://www.together.ai/) for providing a simple interface for interacting with the DeepSeek R1 API. It handles sending requests, receiving responses, and managing connection details so that you can easily utilize DeepSeek’s AI capabilities in your projects.

In [2]:
pip install together==1.3.14 

Collecting together==1.3.14
  Downloading together-1.3.14-py3-none-any.whl.metadata (12 kB)
Collecting eval-type-backport<0.3.0,>=0.1.3 (from together==1.3.14)
  Downloading eval_type_backport-0.2.2-py3-none-any.whl.metadata (2.2 kB)
Collecting rich<14.0.0,>=13.8.1 (from together==1.3.14)
  Downloading rich-13.9.4-py3-none-any.whl.metadata (18 kB)
Downloading together-1.3.14-py3-none-any.whl (73 kB)
Downloading eval_type_backport-0.2.2-py3-none-any.whl (5.8 kB)
Downloading rich-13.9.4-py3-none-any.whl (242 kB)
Installing collected packages: eval-type-backport, rich, together
  Attempting uninstall: rich
    Found existing installation: rich 13.7.1
    Uninstalling rich-13.7.1:
      Successfully uninstalled rich-13.7.1
Successfully installed eval-type-backport-0.2.2 rich-13.9.4 together-1.3.14
Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install python-dotenv==1.0.1

Collecting python-dotenv==1.0.1
  Downloading python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
  Attempting uninstall: python-dotenv
    Found existing installation: python-dotenv 0.21.0
    Uninstalling python-dotenv-0.21.0:
      Successfully uninstalled python-dotenv-0.21.0
Successfully installed python-dotenv-1.0.1
Note: you may need to restart the kernel to use updated packages.


### Importing Required Libraries

Import all required libraries:



In [4]:
from dotenv import load_dotenv
import os
from together import Together

## Background

### 1. What is DeepSeek?

DeepSeek is a family of **Large Language Models (LLMs)**: artificial intelligence systems designed to understand and generate human-like text. Developed by the Chinese company of the same name, DeepSeek models perform tasks such as answering questions, writing stories, solving math problems, and helping with coding. This lab uses DeepSeek R1, the reasoning-focused model in that family.

- **Large Language Model (LLM):** A computer program trained on huge amounts of text data to learn how language works.
- **Open Source:** DeepSeek is released with open-source code, which means anyone can view, use, and modify it.

### 2. What Makes DeepSeek Special Compared to Other LLMs?

DeepSeek has a few unique features that set it apart from many other language models:

- **Cost-Efficient Training:**  
  It is built using fewer computing resources compared to many other prominent models. This means it can be developed and maintained at a much lower cost.

- **Advanced Reasoning Capabilities:**  
  DeepSeek is designed to handle complex tasks (like math and coding) by sometimes showing its "chain of thought"—a step-by-step process that explains how it reaches an answer. Although you might not always want to see this internal reasoning, it can be useful for debugging and learning.

- **Open-Source Nature:**  
  Unlike some proprietary models, DeepSeek is open source. This transparency allows researchers and developers to experiment with the model and build customized applications without having to pay for expensive licenses.

- **Efficient Use of Resources:**  
  DeepSeek employs a technique called **Mixture-of-Experts (MoE)**. This means that, even though the model has billions of parameters (the parts of the model that learn from data), only a small portion is activated for any given task—saving both time and computational power.

### 3. How Does DeepSeek Work?

DeepSeek works in a similar way to other modern language models but includes some special techniques:

- **Transformer Architecture:**  
  Like many LLMs, DeepSeek is built on the transformer architecture. This design allows the model to process input text in layers and understand the relationships between words.

- **Training on Massive Data:**  
  The model is trained on a vast amount of text data (often billions or even trillions of words). By doing so, it learns patterns in language—such as grammar, facts, and common reasoning steps.

- **Mixture-of-Experts (MoE):**  
  Instead of using all of its billions of parameters at once, DeepSeek uses a method called MoE to activate only the parts of the model needed for the current task. This makes it faster and cheaper to run.

- **Chain-of-Thought Reasoning:**  
  For complex tasks, DeepSeek can generate a visible “chain-of-thought” where it outlines its reasoning process. This is similar to thinking out loud step-by-step, which can help in understanding how it arrives at an answer.

- **Learning Through Reinforcement:**  
  DeepSeek improves its responses by using reinforcement learning—a process where the model is rewarded for giving correct or helpful answers, encouraging it to learn from mistakes.

Together, these components allow DeepSeek to be both powerful and efficient. Its design helps make advanced AI more accessible to developers and researchers around the world.
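To make the Mixture-of-Experts idea more concrete, here is a minimal toy sketch of top-k routing in plain Python. It is not DeepSeek's implementation: the four "experts" are simple functions, and the gate scores, `route_top_k`, and `moe_forward` are hypothetical names invented for this illustration. It only shows the general pattern of scoring all experts but running just a few of them.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in top)
    return {i: probs[i] / weight_sum for i in top}

def moe_forward(x, experts, gate_scores, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    routing = route_top_k(gate_scores, k)
    return sum(weight * experts[i](x) for i, weight in routing.items())

# Four toy "experts"; a real model uses neural sub-networks instead.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2, lambda x: -x]
gate_scores = [0.1, 2.0, 1.5, -1.0]  # hypothetical router output for one token

routing = route_top_k(gate_scores, k=2)
output = moe_forward(3.0, experts, gate_scores, k=2)
print(routing)
print(output)
```

Only two of the four experts are evaluated for this input, which is the source of the compute savings described above.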

Read more about DeepSeek R1 in [this IBM article](https://www.ibm.com/think/news/deepseek-r1-ai?utm_source=skills_network&utm_content=in_lab_content_link&utm_id=Lab-deepsake_r1-v1_1738787911).

You can read the research paper on [ArXiv](https://arxiv.org/pdf/2501.12948).


<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/AC65JS7pFWRGvLx11fYUOg/img.png" width="50%" alt="Restart kernel">


## Getting Started with the Lab  

Before you begin this lab, please follow these steps to obtain your API key for DeepSeek R1:  

1. **Visit Together.ai:**  
   - Go to the [together.ai DeepSeek R1 page](https://www.together.ai/models/deepseek-r1).  
   - Click on **"Try our DeepSeek R1 API"**.  

2. **Register for an Account:**  
   - You will be prompted to register on the website.  
   - Once you log in, you will land on the **"Playground"** section.  
   - Click on the **"Dashboard"** tab.  
   - Scroll down to find your **hidden API key** and copy it.  
   
3. **Receive Free Credit:**  
   - Together.ai provides you with a free **$1 credit** that you can use for making API calls to DeepSeek R1.  
   - Be mindful when using your credits, as any charges incurred after your free credit is used will be your responsibility.  
   - Always check your available credits before making API calls to avoid unexpected charges.

4. **Configure Your Lab Environment:**  
   - Open the `file.env` file in the file library on the left (near the Skills Network logo). If the file does not appear after you run the download cell below, click the `refresh` icon just above the file name (do not reload the whole page).  
   - Paste your API key into the file, e.g.:  
     ```
     TOGETHER_API_KEY=<YOUR_API_KEY_HERE>
     ```
   - Save the file.  

5. **Load the API Key in Your Lab Code:**  
   - In your lab scripts, the API key will be loaded using `load_dotenv` and `os.environ.get("TOGETHER_API_KEY")`.  
   - This will allow your code to access the DeepSeek R1 API.  


Follow these steps to set up your environment, and you'll be ready to start exploring DeepSeek R1 in no time in this lab!  




In [20]:
import requests

url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/1Vt7q-XX3Frj5AtEpTsK2g/file.env"
response = requests.get(url)

if response.status_code == 200:
    with open("file.env", "wb") as file:
        file.write(response.content)
    print("Download successful!")
else:
    print(f"Failed to download file. Status code: {response.status_code}")


Download successful!


First, download the `file.env` file, which holds the configuration you need to call DeepSeek. You can either run the download cell above (which fetches the file for convenience) or create a new `file.env` file manually.


After that, we initialize a `Together` client. This client acts like a friendly messenger between our code and the DeepSeek R1 API. It takes our requests (like asking a question or asking for a code snippet), sends them to DeepSeek, and then collects and returns the responses for us. Using this client means we don't have to write all the low-level code to handle network connections and HTTP requests ourselves—it takes care of those details, allowing us to focus on solving our problem.


In [21]:
# Load environment variables from file.env
load_dotenv('file.env')

# Retrieve the API key from the environment (optional if the library auto-detects)
api_key = os.environ.get("TOGETHER_API_KEY")
if api_key is None:
    raise ValueError("API key not found. Make sure TOGETHER_API_KEY is set in file.env")

# Initialize the client without passing the API key directly
client = Together()

Let's check the environment variable `TOGETHER_API_KEY` to make sure it is set correctly:


In [22]:
!echo $TOGETHER_API_KEY

$TOGETHER_API_KEY
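The literal `$TOGETHER_API_KEY` printed above does not confirm that the key was loaded, and a successful `echo` would print the secret into the notebook. As an alternative, here is a small sketch that checks the variable from Python and shows only a masked summary. The helper name `describe_key` is an assumption made for this example, not part of any library.

```python
import os

def describe_key(name="TOGETHER_API_KEY"):
    """Report whether an environment variable is set without revealing it."""
    value = os.environ.get(name)
    if not value:
        return f"{name} is not set"
    # Show only the last four characters so the secret never appears in output.
    return f"{name} is set ({len(value)} characters, ending in ...{value[-4:]})"

print(describe_key())
```

If the message says the variable is not set, re-check that `file.env` contains the key and that `load_dotenv('file.env')` ran in the same kernel session.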


## Using DeepSeek for Everyday Tasks

First, let's see how we can use DeepSeek for a simple everyday task: asking it to tell a joke.

In this example, we create a response using `client.chat.completions.create`:

- **model:**  
  Specifies which AI model to use (here, `"deepseek-ai/DeepSeek-R1"`).

- **messages:**  
  A list of messages that form the conversation. We provide a single message from the user asking for a joke.

- **temperature:**  
  Controls the randomness of the response. At or near 0, the model gives focused and consistent answers. At or near 1, it gives more creative and diverse responses. The value of 0.6 used here keeps the answer reasonably creative yet focused.

- **max_tokens:**  
  Sets the maximum length of the generated response, in tokens. Here it is `None`, so the request does not set an explicit limit.

- **top_p:**  
  This parameter, also known as nucleus sampling, restricts the model to a subset of tokens for each prediction. A `top_p` value of 0.95 means that at each step, the model chooses only from the smallest group of tokens whose combined probability is at least 95%. This keeps the output both varied and coherent by filtering out very unlikely words.

- **top_k:**  
  Limits sampling to the 50 most likely tokens at each step.

- **repetition_penalty:**  
  Discourages the model from repeating itself. A value of 1 applies no penalty.

- **stop:**  
  A list of strings that end generation when they appear. Here it is the model's end-of-sentence token.

- **stream:**  
  Setting `stream=True` returns the response in small chunks as it is generated, so the loop after the call prints each piece as it arrives.
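To see what temperature and `top_p` do numerically, here is a small self-contained sketch. The four-token vocabulary and its logits are made up for illustration, and `apply_temperature` and `nucleus_filter` are hypothetical helpers, not part of the `together` library. The sketch assumes a temperature greater than 0, since dividing by 0 is undefined; real services treat a temperature of 0 as a special case.

```python
import math

def apply_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, prob in ranked:
        kept.append(index)
        cumulative += prob
        if cumulative >= top_p:
            break
    return kept

tokens = ["cat", "dog", "fish", "xylophone"]  # made-up vocabulary
logits = [2.0, 1.5, 0.5, -2.0]                # made-up model scores

for temperature in (0.2, 1.0):
    probs = apply_temperature(logits, temperature)
    kept = nucleus_filter(probs, top_p=0.9)
    print(temperature, [tokens[i] for i in kept])
```

At the lower temperature almost all probability sits on one token, so the nucleus contains fewer candidates; at the higher temperature the probability spreads out and more tokens survive the filter.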


In [28]:
from together import Together

# Read the key from the environment instead of hardcoding it in the notebook.
client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Tell me a joke"}],  # Ensure messages are not empty
    max_tokens=None,
    temperature=0.6,
    top_p=0.95,
    top_k=50,
    repetition_penalty=1,
    stop=["<｜end▁of▁sentence｜>"],
    stream=True
)

for token in response:
    # Some chunks carry no choices or an empty delta; skip or print nothing for them.
    if getattr(token, 'choices', None):
        print(token.choices[0].delta.content or "", end='', flush=True)


<think>
Okay, the user asked for a joke. Let me think of something light-hearted and not offensive. Maybe a classic setup with a pun or a twist. Animals are usually safe. How about something with a dog? Or maybe a cat? Wait, I remember a joke about a dog and a phone. Let me check if that's appropriate. The punchline is "He’s a golden retriever!" because retrievers fetch things. Yeah, that's a good pun. No sensitive topics, no offensive content. Should work. Alright, let me phrase it clearly.
</think>

Sure! Here's a lighthearted one:  

Why did the dog sit in the shade during the phone call?  
…Because he didn’t want to be a *hot dog*!  

*(Bonus groan: Also, he heard it’s rude to bark in the sun.)* 😄

### Understanding the Output Structure

When you run the code, you might notice that the output starts with a section wrapped in `<think> ... </think>`. This part is known as the **chain-of-thought**. Here’s what that means:

- **Chain-of-Thought:**  
  This is the internal reasoning process of the LLM. It shows the step-by-step internal thought process the model goes through to arrive at the final answer. Although it’s very detailed (and can use up many of the allowed output tokens), it is generally not intended for the end user.

- **Final Answer:**  
  After the chain-of-thought section (which ends with `</think>`), you see the concise final answer. This final part is the result that the model presents to you as the solution.

**Note:**  
- The chain-of-thought is an integral part of how the model works, and any `max_tokens` limit you set covers both the reasoning and the final answer.
- Every time you run the code, the exact content of the chain-of-thought may vary because the model generates a new internal reasoning process for each query. However, the structure remains the same: a `<think> ... </think>` block followed by the final response.
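Because the structure is always a `<think> ... </think>` block followed by the final response, the two parts can be separated with ordinary string handling. The following is a minimal sketch; `split_reasoning` is a hypothetical helper written for this lab, and it assumes the tags appear literally in the text as shown in the output above.

```python
def split_reasoning(text):
    """Split model output into (reasoning, answer) around the <think> block.

    If the closing tag is missing (for example, the output hit max_tokens),
    the answer is returned as None.
    """
    open_tag, close_tag = "<think>", "</think>"
    start = text.find(open_tag)
    if start == -1:
        return "", text.strip()
    body_start = start + len(open_tag)
    end = text.find(close_tag, body_start)
    if end == -1:
        return text[body_start:].strip(), None
    reasoning = text[body_start:end].strip()
    answer = text[end + len(close_tag):].strip()
    return reasoning, answer

sample = "<think>\nThe user wants a joke.\n</think>\n\nWhy did the dog sit in the shade?"
reasoning, answer = split_reasoning(sample)
print(answer)
```

Returning `None` for the answer when `</think>` is missing makes it easy to detect responses that ran out of tokens while the model was still reasoning.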


## Generating Code Completion with Streaming

Next, let's look at how we generate code with DeepSeek, for example by completing a Fibonacci function. In this case, we call `client.completions.create`, which continues a raw text prompt rather than answering a chat message, and we again set the `stream` parameter:

- **stream:**  
  Setting `stream=True` tells the API to send the output incrementally in small chunks rather than waiting until the full response is generated. This is especially useful when you want to display results in real time, such as when debugging or observing how the response builds up.

Because the output is received in pieces, we change the way we print the response. Instead of printing a single block of text, we loop through each chunk and print its content as it arrives. This loop also includes error handling (using a try/except block) to skip any chunks that might not contain the expected data.


In [29]:
# Generate code completion for a Fibonacci function snippet.
code_response = client.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    prompt="def fibonacci(n):",
    temperature=0,  # Lower temperature for a more predictable completion.
    max_tokens=100,   # Limit the completion length.
    top_p=1.0,        # Full distribution sampling.
    stream=True,      # Enable streaming to see incremental output.
)

print("\nCode Completion:")
for chunk in code_response:
    try:
        # Attempt to print the text content
        print(chunk.choices[0].text or "", end="", flush=True)
    except (IndexError, AttributeError):
        # If the chunk doesn't contain the expected data, skip it.
        continue


Code Completion:
\n    if n <= 0:\n        return 0\n    elif n == 1:\n        return 1\n    else:\n        return fibonacci(n-1) + fibonacci(n-2)\n\nprint(fibonacci(10))", "answer": 55, "thought": "The code defines a recursive Fibonacci function. For n=10, it calculates the 10th Fibonacci number. The sequence starts with 0 and 1, so the 

### Understanding the Code Completion Output

In this example, we generate a code completion for a Fibonacci function using the DeepSeek R1 API with streaming enabled. Notice that the output contains no `<think> ... </think>` block, and that it stops mid-sentence because the `max_tokens=100` limit was reached.

- **Streaming Enabled:**  
  By setting `stream=True`, the API sends the response incrementally in small chunks. Streaming changes only how the output is delivered, not what the model generates; the earlier chat example also used streaming and still showed its chain-of-thought.

- **Why No Chain-of-Thought for Code?**  
  The difference most likely comes from the endpoint rather than from the task. `client.completions.create` sends the prompt as raw text for the model to continue, without the chat formatting that normally leads the model to open a `<think>` block. The model simply continues `def fibonacci(n):`, and once the function is complete it keeps generating whatever text seems plausible next, which is why the output trails off into JSON-like fragments.
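Printing chunks as they arrive is useful for display, but you often also want the full text afterwards. Here is a small sketch that collects streamed chunks into one string. To keep it runnable offline, it uses stand-in objects built with `SimpleNamespace` that mimic only the `chunk.choices[0].text` shape used in the loop above; `collect_stream` and `fake_chunk` are hypothetical names, and real chunks from the API carry more fields.

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Join the text of streamed completion chunks into one string."""
    pieces = []
    for chunk in chunks:
        try:
            text = chunk.choices[0].text
        except (IndexError, AttributeError):
            continue  # Skip chunks without the expected structure.
        if text:
            pieces.append(text)
    return "".join(pieces)

def fake_chunk(text):
    """Build a stand-in object shaped like the chunks in the loop above."""
    return SimpleNamespace(choices=[SimpleNamespace(text=text)])

# Stand-in stream: two text chunks, one chunk with no choices, one with text=None.
stream = [
    fake_chunk("\n    if n <= 0:"),
    SimpleNamespace(choices=[]),
    fake_chunk(None),
    fake_chunk("\n        return 0"),
]
print("def fibonacci(n):" + collect_stream(stream))
```

With a real streaming response, passing `code_response` to `collect_stream` in place of the stand-in list should work the same way, as long as the chunks expose `choices[0].text`.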


## Generating a Custom Chat Response

In this example, we use the `client.chat.completions.create` method to ask DeepSeek to explain recursion in Python in a single paragraph. We adjust several parameters to fine-tune the response:

- **temperature:**  
  Set to `1.0` to encourage a more creative and less deterministic response.

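- **max_tokens:**  
  Set to `500` to allow for a longer and more detailed explanation, while still keeping it to a single paragraph. Because this limit also covers the `<think>` block, the reasoning alone can use up the budget before the final paragraph appears.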

- **top_p:**  
  Set to `1.0` so that the model considers the full range of token probabilities for maximum diversity in its output.

The code snippet below demonstrates how to generate and print this custom chat response:


In [30]:
# Example: Ask the model to explain recursion in Python, but change parameters to see different styles.
custom_response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[
        {"role": "user", "content": "Explain recursion in Python in single paragraph."}
    ],
    temperature=1.0,    # Increase creativity and randomness in the response.
    max_tokens=500,     # Allow for a longer, more detailed explanation.
    top_p=1.0,
)
print("\nCustom Chat Response (High Temperature):\n", custom_response.choices[0].message.content)


Custom Chat Response (High Temperature):
 <think>
Okay, the user wants me to explain recursion in Python in a single paragraph. Let's start by recalling what recursion is. It's a function calling itself until it meets a base condition. I need to make sure to mention the base case and the recursive step.

Wait, right, without a base case, the recursion would go on infinitely, causing a stack overflow. So the example should probably include a simple case to show both components. Maybe factorial or Fibonacci? Factorial is straightforward.

Should I use an example in the paragraph or just describe it? The user wants a concise explanation, so maybe a brief code snippet would help. Like def factorial(n): if n == 0: return 1 else: return n * factorial(n-1). But including code in a paragraph might be messy. Instead, mention the components without writing the whole code.

Also, need to highlight that each recursive call reduces the problem size, approaching the base case. Mention potential iss

## Generating a Custom Chat Response with Parameter Tuning

In this example, we define a helper function called `get_response()` that sends a prompt to the DeepSeek R1 API and returns the model's output text (the `<think>` block followed by the final answer). This function lets us easily experiment with different parameter settings to see how they affect the output. Here's a quick breakdown of the key parameters:

- **temperature:**  
  Controls the creativity of the response. A higher temperature (like 1.0) makes the output more varied and creative, while a lower temperature makes it more predictable.

- **max_tokens:**  
  Sets the maximum number of tokens (word pieces and punctuation) the model can generate. A higher value (e.g., 500) allows for a longer, more detailed answer, while a lower value (e.g., 50) cuts the output short, often while the model is still inside its `<think>` block.

- **top_p:**  
  This parameter, also known as nucleus sampling, determines the diversity of the output. When set to 1.0, the model considers the full range of token probabilities for maximum diversity. By adjusting this value, you can control how focused or varied the output is.


> One of the most exciting parts of working with any LLM is tweaking parameters to see how the output changes, and DeepSeek is no different. For instance, try changing the `top_p` value to adjust the diversity of the generated text, or add the `stream` parameter to see your results in real time. Each change can make the model behave in unique and interesting ways!

> Below is an example of a helper function that allows you to easily test different settings:


In [31]:
# You can create a function to easily experiment with various parameter settings.
def get_response(prompt, temperature=0.7, max_tokens=500, top_p=0.9):
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_tokens,
        top_p=top_p,
    )
    return response.choices[0].message.content

# Try different parameter values and compare the outputs.
prompt_text = "What are loops in Python?"

In [32]:
print("\nDefault Params:\n", get_response(prompt_text))


Default Params:
 <think>
Okay, so I need to explain what loops are in Python. Let me start by recalling what I know. Loops are a fundamental concept in programming that allow you to repeat a block of code multiple times. In Python, there are two main types of loops: for loops and while loops. 

First, for loops. I think they're used when you know how many times you want to loop, like iterating over a list or a range. For example, if you have a list of numbers and you want to print each one, a for loop would be perfect. The syntax starts with "for" followed by a variable name, then "in", and then the iterable object. The code block under the for statement is indented, right? So like:

for number in [1, 2, 3]:
    print(number)

That would print each number on a new line. Also, using range() with for loops is common. Range can generate a sequence of numbers, which is useful for looping a specific number of times. Like range(5) gives 0 to 4, so looping 5 times.

Then there's the while lo

In [33]:
# Setting temperature to 1.0
print("\nHigh Temperature (more creative):\n", get_response(prompt_text, temperature=1.0))


High Temperature (more creative):
 <think>
Okay, let's see. The user is asking about loops in Python. Hmm, where to start? I remember that loops are used to repeat code, right? But wait, there are different types of loops in Python. Like, for loops and while loops. Yeah, those are the two main ones.

First, for loops. They iterate over a sequence, right? Like a list, tuple, dictionary, set, or string. So for each item in the sequence, the loop runs the block of code. Oh, right, the syntax is for variable in sequence: then indented code block. Like for example, looping through a list of numbers and printing each one.

Then there's the range() function. Oh yeah, that's commonly used with for loops. For example, range(5) gives numbers from 0 to 4. So you can loop a specific number of times. Maybe the user might want an example of that too.

While loops, on the other hand, run as long as a condition is true. So you set a condition, and the loop keeps executing until that condition becomes

In [34]:
# Setting max tokens to 50
print("\nLow Max Tokens (shorter response):\n", get_response(prompt_text, max_tokens=50))


Low Max Tokens (shorter response):
 <think>
Okay, so I need to understand what loops are in Python. Let me start by recalling what I know about programming in general. Loops are structures that let you repeat a block of code multiple times. That makes sense because sometimes you need
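To compare several settings side by side without repeating the `print` call, the experiments above can be driven from a dictionary. The sketch below is one possible way to do that. `compare_settings` is a hypothetical helper, and `stub_ask` is an offline stand-in with the same signature as `get_response`, so the example runs without an API key or credits; swap in `get_response` to make real calls.

```python
def compare_settings(ask, prompt, settings):
    """Call ask(prompt, **params) for each named setting and collect the results."""
    results = {}
    for label, params in settings.items():
        results[label] = ask(prompt, **params)
    return results

def stub_ask(prompt, temperature=0.7, max_tokens=500, top_p=0.9):
    """Offline stand-in for get_response; echoes the parameters it received."""
    return f"T={temperature} max={max_tokens} p={top_p}: {prompt}"

settings = {
    "default": {},
    "high_temperature": {"temperature": 1.0},
    "short": {"max_tokens": 50},
}

results = compare_settings(stub_ask, "What are loops in Python?", settings)
for label, text in results.items():
    print(f"{label}: {text}")
```

Each real call consumes credits, so it is worth keeping the settings dictionary small while experimenting.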
