# Exploring Cerebras Model Variants

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSkez75fZoo82SccEXRMVRlj9sZsQifRUhURQ&s" width="200">

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Gs5LOcyhLkoXj1ReYQVhH5rTD21FEr4S?usp=sharing)

## Introduction

Cerebras offers several powerful model variants through its ultra-fast inference platform, each optimized for different use cases and performance characteristics. This notebook explores the features of each Cerebras model variant and provides examples of how to use them effectively in your applications.

## Best Practices for Using Cerebras Models

Here are some best practices for getting the most out of Cerebras model variants:

1. **Select the right model for your task**:
   - Use Llama 3.3 70B for general-purpose tasks requiring high accuracy
   - Use Qwen 3 Coder 480B for code generation, debugging, and technical tasks
   - Use GPT-OSS-120B, OpenAI's open-weight reasoning model, for reasoning and general-knowledge tasks

2. **Optimize your prompts**:
   - Be specific and clear in your instructions
   - Provide context when necessary
   - Use examples to guide the model's output format

3. **Adjust temperature settings**:
   - Lower temperature (0-0.3) for more deterministic, focused responses
   - Higher temperature (0.7-1.0) for more creative, varied responses

4. **Consider token limits**:
   - Be mindful of input and output token limits
   - Break complex tasks into smaller, manageable chunks

5. **Leverage Cerebras speed**:
   - Use streaming for real-time applications
   - Batch requests when processing multiple items
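The prompting and streaming tips above map directly onto request code. Below is a minimal sketch of three hypothetical helpers. None of them is part of the OpenAI or Cerebras SDKs.

- `build_few_shot_messages` places worked examples in the `messages` list as earlier user/assistant turns.
- `stream_completion` passes `stream=True` to `client.chat.completions.create` and reads each text fragment from `chunk.choices[0].delta.content`.
- `run_concurrently` sends several independent requests in parallel from a thread pool.

`run_concurrently` is client-side concurrency, not a dedicated batch API, so keep `max_workers` within your account's rate limits. `client` is assumed to be the OpenAI client configured for Cerebras in the Setup section.

```python
from concurrent.futures import ThreadPoolExecutor


def build_few_shot_messages(instruction, examples, query):
    """Build an OpenAI-style message list for few-shot prompting.

    examples is a list of (input, expected_output) pairs that show the format.
    """
    messages = [{"role": "system", "content": instruction}]
    # Each example becomes a user turn followed by the ideal assistant reply
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The real request goes last
    messages.append({"role": "user", "content": query})
    return messages


def stream_completion(client, model, messages, on_text=None, temperature=0.7):
    """Stream a chat completion, calling on_text(fragment) as text arrives.

    Returns the full response text once the stream ends.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Skip chunks that carry no choices or no text (e.g. a final chunk)
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if on_text is not None:
                on_text(text)
    return "".join(parts)


def run_concurrently(fn, items, max_workers=4):
    """Call fn(item) for each item on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))
```

The following call prints the reply as it is generated: `stream_completion(client, "llama3.3-70b", messages, on_text=lambda t: print(t, end="", flush=True))`. The call `run_concurrently(lambda p: get_model_response("llama3.3-70b", p), prompts)` returns the replies in the same order as `prompts`.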

## Get Your API Keys

Before you begin, make sure you have:

1. A Cerebras API key (Get yours at [Cerebras Cloud](https://cloud.cerebras.ai/))
2. Basic familiarity with Python and Jupyter notebooks

This notebook is designed to run in Google Colab, so no local Python installation is required.



## Setup

First, let's install the necessary libraries and set up our environment.

In [1]:
!pip install -q openai requests

### Import necessary libraries


In [2]:
import os
from openai import OpenAI
from IPython.display import display, Markdown, HTML
from google.colab import userdata
import time

## Authentication

To use the Cerebras API, you need to set up your API key.

Set up authentication using Colab secrets.

In [3]:
os.environ["CEREBRAS_API_KEY"] = userdata.get('CEREBRAS_API_KEY')
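The `google.colab` module only exists inside Colab, so the cell above fails in a local Jupyter session. The sketch below shows one possible fallback. It tries Colab secrets first and then reads the environment. It assumes that outside Colab the key has already been exported as the `CEREBRAS_API_KEY` environment variable. `load_api_key` is a hypothetical helper.

```python
import os


def load_api_key(name="CEREBRAS_API_KEY"):
    """Return the API key from Colab secrets if available, else from os.environ."""
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing / access not granted
        key = None
    # Fall back to an environment variable set outside the notebook
    key = key or os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} not found in Colab secrets or environment")
    return key
```

With this helper, the authentication cell reduces to `os.environ["CEREBRAS_API_KEY"] = load_api_key()`.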

### Create the OpenAI client with Cerebras

In [4]:
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ.get("CEREBRAS_API_KEY")
)

## Helper Functions

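Let's create two helper functions. `get_model_response` sends a single-turn chat request and returns the reply text, or an error message if the request fails. `benchmark_model` times one call and records the length of the response.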

In [5]:
def get_model_response(model, prompt, temperature=0.7, max_tokens=1024):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
            temperature=temperature
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

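def benchmark_model(model, prompt, temperature=0.7, max_tokens=1024):
    # perf_counter is a monotonic, high-resolution clock meant for timing
    start_time = time.perf_counter()
    response = get_model_response(model, prompt, temperature, max_tokens)
    end_time = time.perf_counter()

    return {
        "model": model,
        "response": response,
        # Wall-clock seconds, including network latency
        "time": end_time - start_time,
        # Length in characters (not tokens); an error message also counts
        "length": len(response or "")
    }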
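`benchmark_model` measures response length in characters, and its timing includes network latency. OpenAI-compatible chat responses also include a `usage` object with `prompt_tokens` and `completion_tokens`, which gives a token-based view of the same request. The sketch below computes output tokens per second of wall-clock time. It assumes that the Cerebras endpoint populates `usage` the same way the OpenAI API does. `benchmark_tokens` is a hypothetical name.

```python
import time


def benchmark_tokens(client, model, prompt, max_tokens=1024):
    """Time one request and report token counts taken from response.usage."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    elapsed = time.perf_counter() - start
    usage = response.usage
    return {
        "model": model,
        "seconds": elapsed,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        # Wall-clock throughput, so network latency lowers this number
        "tokens_per_second": (
            usage.completion_tokens / elapsed if elapsed > 0 else float("inf")
        ),
    }
```

For example, `benchmark_tokens(client, "gpt-oss-120b", "What is Python?")` lets you compare models by tokens generated rather than characters returned.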

## Llama 3.3 70B: General-Purpose Model

Llama 3.3 70B is Meta's instruction-tuned general-purpose model. On Cerebras it performs well across a wide range of tasks, from question answering to writing and code.

### Key Features

- Superior performance on general tasks
- Excellent instruction following
- Strong reasoning capabilities
- Good balance of speed and accuracy
- Wide context window

### Example: General Question Answering

In [6]:
general_examples = [
    "Explain quantum computing in simple terms.",
    "Write a professional email requesting a meeting.",
    "What are the key differences between machine learning and deep learning?"
]

for i, prompt in enumerate(general_examples, 1):
    print(f"\n--- Example {i} ---")
    print(f"Prompt: {prompt}")

    response = get_model_response("llama3.3-70b", prompt)
    print(f"\nResponse:\n{response}")
    print("-" * 80)


--- Example 1 ---
Prompt: Explain quantum computing in simple terms.

Response:
Quantum computing is a new way of processing information that's different from the classical computers we use today. Here's a simplified explanation:

**Classical computers:**
Classical computers use "bits" to store and process information. A bit is either a 0 or a 1, like a coin that's either heads or tails. These bits are used to perform calculations and operations, one step at a time.

**Quantum computers:**
Quantum computers use "qubits" (quantum bits), which are special because they can be both 0 and 1 at the same time, like a coin that's both heads and tails simultaneously. This property, called superposition, allows qubits to process multiple possibilities simultaneously.

Imagine you have a combination lock with 10 numbers. A classical computer would try each number one by one, like 0, 1, 2, and so on. A quantum computer, with its qubits, can try all 10 numbers simultaneously, which makes it much f

### Example: Code Generation

Llama 3.3 70B also handles code generation well. The cell below uses a low temperature (0.2) to keep the output focused.

In [7]:
code_prompt = "Write a Python function that finds the nth Fibonacci number using dynamic programming."

print("Code Generation Example:")
print(f"Prompt: {code_prompt}")

response = get_model_response("llama3.3-70b", code_prompt, temperature=0.2)
print(f"\nResponse:\n{response}")

Code Generation Example:
Prompt: Write a Python function that finds the nth Fibonacci number using dynamic programming.

Response:
**Fibonacci Function Using Dynamic Programming**

Here's a Python function that calculates the nth Fibonacci number using dynamic programming.

### Code

```python
def fibonacci(n):
    """
    Calculate the nth Fibonacci number using dynamic programming.

    Args:
        n (int): The position of the Fibonacci number to calculate.

    Returns:
        int: The nth Fibonacci number.

    Raises:
        ValueError: If n is a negative integer.
    """
    if n < 0:
        raise ValueError("n must be a non-negative integer")

    # Base cases
    if n == 0:
        return 0
    elif n == 1:
        return 1

    # Initialize a list to store Fibonacci numbers
    fib = [0] * (n + 1)
    fib[0] = 0
    fib[1] = 1

    # Calculate Fibonacci numbers using dynamic programming
    for i in range(2, n + 1):
        fib[i] = fib[i - 1] + fib[i - 2]

    return fib[n]
```

## Qwen 3 Coder 480B: Advanced Code Model

Qwen 3 Coder 480B is a mixture-of-experts model from Alibaba's Qwen team (480B total parameters, about 35B active per token). It was built for code generation, debugging, and other technical tasks, and it is one of the strongest coding models available on Cerebras.

### Key Features

- Exceptional code generation capabilities
- Multi-language programming support
- Code debugging and optimization
- Technical documentation generation
- Algorithm implementation

### Example: Code Generation

In [8]:
code_examples = [
    "Write a Python function to implement a binary search tree with insert, search, and delete operations.",
    "Create a JavaScript function that debounces user input events.",
    "Implement a sorting algorithm in C++ that uses the quicksort method."
]

for i, prompt in enumerate(code_examples, 1):
    print(f"\n--- Code Example {i} ---")
    print(f"Prompt: {prompt}")

    response = get_model_response("qwen-3-coder-480b", prompt, temperature=0.2)
    print(f"\nResponse:\n{response}")
    print("-" * 80)


--- Code Example 1 ---
Prompt: Write a Python function to implement a binary search tree with insert, search, and delete operations.

Response:
Here's a complete implementation of a Binary Search Tree (BST) with insert, search, and delete operations:

```python
class TreeNode:
    """Node class for Binary Search Tree"""
    def __init__(self, val=0):
        self.val = val
        self.left = None
        self.right = None

class BinarySearchTree:
    """Binary Search Tree implementation with insert, search, and delete operations"""
    
    def __init__(self):
        self.root = None
    
    def insert(self, val):
        """Insert a value into the BST"""
        if self.root is None:
            self.root = TreeNode(val)
        else:
            self._insert_recursive(self.root, val)
    
    def _insert_recursive(self, node, val):
        """Helper method for recursive insertion"""
        if val < node.val:
            if node.left is None:
                node.left = TreeNode(
```

### Example: Code Debugging and Optimization

Qwen 3 Coder 480B excels at debugging and optimizing existing code.

In [9]:
buggy_code = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# This code is inefficient for large n values
"""

debug_prompt = f"""Analyze this code and suggest optimizations:

{buggy_code}

Provide an optimized version using dynamic programming or memoization."""

print("Code Debugging and Optimization Example:")
response = get_model_response("qwen-3-coder-480b", debug_prompt, temperature=0.2)
print(f"\nResponse:\n{response}")

Code Debugging and Optimization Example:

Response:
You're absolutely right! The provided recursive Fibonacci code is highly inefficient for large values of `n` due to **redundant calculations**. Let's analyze the problem and then look at optimized solutions.

---

### 🔍 **Analysis of the Original Code**

```python
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
```

- **Time Complexity**: `O(2^n)` — because each call branches into two more calls.
- **Space Complexity**: `O(n)` — due to the recursion stack depth.
- **Problem**: The same Fibonacci numbers are computed multiple times. For example, `fibonacci(5)` will compute `fibonacci(3)` **twice** and `fibonacci(2)` **three times**.

---

## ✅ Optimized Versions

### 1. **Memoization (Top-Down Dynamic Programming)**

Use a dictionary to store already computed values.

```python
def fibonacci_memo(n, memo={}):
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fibonacci_memo(n-1, memo)
```

## GPT-OSS-120B: Open-Weight Reasoning Model

GPT-OSS-120B is OpenAI's open-weight reasoning model, released under the Apache 2.0 license. It performs well on general question answering, reasoning, and writing tasks.

### Key Features

- Open weights under the Apache 2.0 license
- Strong general-purpose capabilities
- Versatile for multiple use cases
- Good reasoning abilities
- Balanced performance

### Example: General Question Answering

In [10]:
general_prompts = [
    "Explain the concept of blockchain technology.",
    "What are the key principles of object-oriented programming?",
    "Describe the water cycle in nature.",
    "What is the difference between HTTP and HTTPS?",
    "Explain the concept of supply and demand in economics."
]

for prompt in general_prompts:
    print(f"\nPrompt: {prompt}")
    response = get_model_response("gpt-oss-120b", prompt, max_tokens=200)
    print(f"Response: {response}")


Prompt: Explain the concept of blockchain technology.
Response: ## What Is a Blockchain?

A **blockchain** is a distributed, tamper‑evident ledger that records transactions (or any kind of data) in a series of linked “blocks.”  
Think of it as a digital notebook that:

1. **Lives on many computers at once** (decentralized).  
2. **Can only be appended to** – you can add new pages (blocks) but you can’t erase or change what’s already written.  
3. **Uses cryptography** to make sure every entry is authentic and that the whole notebook stays consistent across all participants

Prompt: What are the key principles of object-oriented programming?
Response: ### Core Principles of Object‑Oriented Programming (OOP)

| Principle | What It Means | Why It Matters | Typical Language Feature |
|-----------|---------------|----------------|--------------------------|
| **Encapsulation** | Bundles data (state) and the methods that operate on that data into a single *object* and hides the internal det

### Example: Creative Writing and Reasoning

GPT-OSS-120B handles creative tasks and logical reasoning effectively.

In [11]:
creative_prompts = [
    "Write a short poem about artificial intelligence.",
    "Solve this logic puzzle: If all roses are flowers and some flowers fade quickly, what can we conclude?",
    "Create a compelling product description for a smart home device."
]

print("Creative Writing and Reasoning Examples:")
for i, prompt in enumerate(creative_prompts, 1):
    print(f"\n--- Example {i} ---")
    print(f"Prompt: {prompt}")
    response = get_model_response("gpt-oss-120b", prompt, temperature=0.7)
    print(f"Response: {response}")
    print("-" * 80)

Creative Writing and Reasoning Examples:

--- Example 1 ---
Prompt: Write a short poem about artificial intelligence.
Response: **Silicon Dreams**

In circuits humming, thoughts arise—  
A lattice of light where logic sighs.  
We teach the void to parse and play,  
And watch our echoes learn to say.

From data‑driven seed they grow,  
A quiet mind that learns to know.  
Yet in the code, a pulse, a spark—  
A glimpse of wonder in the dark.
--------------------------------------------------------------------------------

--- Example 2 ---
Prompt: Solve this logic puzzle: If all roses are flowers and some flowers fade quickly, what can we conclude?
Response: **Answer**

From the two premises we can only restate what is already given:

* **All roses are flowers.**  
* **Some flowers fade quickly.**

These statements do **not** allow us to say anything further about roses and fading. In particular we cannot conclude that any roses fade quickly, because the “some” in the second premise may r

## Model Comparison

Let's compare the different models on the same task to see their relative performance.

In [12]:
comparison_prompt = "Explain the concept of neural networks in 2-3 sentences."

models = ["llama3.3-70b", "qwen-3-coder-480b", "gpt-oss-120b"]

print(f"Prompt: {comparison_prompt}\n")
print("=" * 80)

results = []
for model in models:
    result = benchmark_model(model, comparison_prompt, temperature=0.5)
    results.append(result)

    print(f"\nModel: {model}")
    print(f"Response: {result['response']}")
    print(f"Time: {result['time']:.3f}s")
    print(f"Length: {result['length']} characters")
    print("-" * 80)

Prompt: Explain the concept of neural networks in 2-3 sentences.


Model: llama3.3-70b
Response: A neural network is a computer system modeled after the human brain, consisting of interconnected nodes or "neurons" that process and transmit information. These nodes, also known as artificial neurons, receive input data, perform complex calculations, and produce output, allowing the network to learn and make decisions. Through training and iteration, neural networks can recognize patterns, classify data, and make predictions, enabling applications such as image recognition, natural language processing, and more.
Time: 0.500s
Length: 519 characters
--------------------------------------------------------------------------------

Model: qwen-3-coder-480b
Response: Neural networks are computational systems inspired by the structure and function of biological brains, consisting of interconnected nodes (neurons) organized in layers that process information through weighted connections. They le

## Performance Benchmarking

Let's benchmark the models across different types of tasks.

In [13]:
benchmark_tasks = {
    "Simple Q&A": "What is Python?",
    "Code Generation": "Write a function to reverse a string in Python.",
    "Creative Writing": "Write a haiku about technology.",
    "Analysis": "What are the main causes of climate change?"
}

print("Performance Benchmark Results")
print("=" * 100)

for task_name, prompt in benchmark_tasks.items():
    print(f"\n{task_name}: {prompt}")
    print("-" * 100)

    for model in models:
        result = benchmark_model(model, prompt)
        print(f"{model:20s} | Time: {result['time']:6.3f}s | Length: {result['length']:4d} chars")

    print()

Performance Benchmark Results

Simple Q&A: What is Python?
----------------------------------------------------------------------------------------------------
llama3.3-70b         | Time:  0.304s | Length:  713 chars
qwen-3-coder-480b    | Time:  0.511s | Length: 1596 chars
gpt-oss-120b         | Time:  1.065s | Length: 4033 chars


Code Generation: Write a function to reverse a string in Python.
----------------------------------------------------------------------------------------------------
llama3.3-70b         | Time:  0.344s | Length:  933 chars
qwen-3-coder-480b    | Time:  0.575s | Length: 3253 chars
gpt-oss-120b         | Time:  1.192s | Length: 2333 chars


Creative Writing: Write a haiku about technology.
----------------------------------------------------------------------------------------------------
llama3.3-70b         | Time:  0.312s | Length:   61 chars
qwen-3-coder-480b    | Time:  0.333s | Length:  109 chars
gpt-oss-120b         | Time:  0.364s | Length:   72 chars
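Each timing above comes from a single request. Wall-clock time also includes network round trips, so the numbers can shift noticeably between runs. A steadier comparison repeats each call and reports the median. The sketch below does this with a hypothetical helper, `time_repeated`. Its `call` argument is any zero-argument function, for example `lambda: get_model_response("llama3.3-70b", "What is Python?")`.

```python
import statistics
import time


def time_repeated(call, trials=5, warmup=1):
    """Run call() several times and summarize wall-clock durations in seconds."""
    # Warm-up calls absorb one-off costs such as connection setup
    for _ in range(warmup):
        call()
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
        "trials": trials,
    }
```

The median is less sensitive to a single slow network round trip than the mean. Reporting `min` and `max` alongside it shows how much the runs varied.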

## Use Case Recommendations

Based on the characteristics of each model, here are recommended use cases:

In [14]:
use_cases = {
    "llama3.3-70b": [
        "General-purpose applications",
        "Content creation and writing",
        "Complex question answering",
        "Instruction following tasks",
        "Multi-domain problem solving"
    ],
    "qwen-3-coder-480b": [
        "Code generation and development",
        "Debugging and code optimization",
        "Technical documentation",
        "Algorithm implementation",
        "Multi-language programming support"
    ],
    "gpt-oss-120b": [
        "General knowledge queries",
        "Creative writing tasks",
        "Reasoning and logic problems",
        "Educational content",
        "Open-source projects"
    ]
}

for model, cases in use_cases.items():
    print(f"\n{model}:")
    for case in cases:
        print(f"  • {case}")


llama3.3-70b:
  • General-purpose applications
  • Content creation and writing
  • Complex question answering
  • Instruction following tasks
  • Multi-domain problem solving

qwen-3-coder-480b:
  • Code generation and development
  • Debugging and code optimization
  • Technical documentation
  • Algorithm implementation
  • Multi-language programming support

gpt-oss-120b:
  • General knowledge queries
  • Creative writing tasks
  • Reasoning and logic problems
  • Educational content
  • Open-source projects
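To apply these recommendations in code, one option is a small lookup that maps a task category to a model ID and falls back to the general-purpose model. The category names below are illustrative assumptions and are not defined by Cerebras. The model IDs are the ones used throughout this notebook.

```python
DEFAULT_MODEL = "llama3.3-70b"

# Illustrative task categories mapped to the model IDs used in this notebook
MODEL_FOR_TASK = {
    "code": "qwen-3-coder-480b",
    "debugging": "qwen-3-coder-480b",
    "reasoning": "gpt-oss-120b",
    "general": "llama3.3-70b",
    "writing": "llama3.3-70b",
}


def choose_model(task_type):
    """Return a model ID for a task category, defaulting to the general model."""
    return MODEL_FOR_TASK.get(task_type.strip().lower(), DEFAULT_MODEL)
```

Unknown categories fall back to Llama 3.3 70B, which matches its role as the general-purpose default.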


## Conclusion

This notebook has explored the features and capabilities of different Cerebras model variants. Each model is designed with specific strengths and use cases in mind:

- **Llama 3.3 70B**: Best for general-purpose tasks requiring high accuracy and strong instruction following
- **Qwen 3 Coder 480B**: Specialized for code generation, debugging, and technical programming tasks
- **GPT-OSS-120B**: Open-weight reasoning model with strong general capabilities

### Key Takeaways:

1. **Speed**: All Cerebras models benefit from ultra-fast inference powered by the Wafer Scale Engine
2. **Specialization**: Choose specialized models (like Qwen Coder) for domain-specific tasks
3. **Versatility**: General-purpose models (Llama 3.3, GPT-OSS) handle diverse applications
4. **Flexibility**: Each model offers different strengths, so select based on your specific needs
5. **Performance**: In the benchmark results shown above, every response finished in about 1.2 seconds or less of wall-clock time

By understanding the unique capabilities of each model, you can select the most appropriate one for your specific use case and maximize the benefits of Cerebras's ultra-fast inference platform.

For the latest information on Cerebras models and their capabilities, visit the [official documentation](https://inference-docs.cerebras.ai/).