<div style="display: flex; align-items: center; gap: 40px;">

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSkez75fZoo82SccEXRMVRlj9sZsQifRUhURQ&s" width="300">

<div>
  <h2>Cerebras Inference</h2>
  <p>Cerebras Systems builds the world's largest computer chip - the Wafer Scale Engine (WSE) - designed specifically for AI workloads. This cookbook provides comprehensive examples, tutorials, and best practices for developing and deploying AI models using Cerebras infrastructure, including both training on WSE clusters and fast inference via Cerebras Cloud.</p>
</div>

</div>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/17BMZjby9ZZ6cJopyu4xWiAE6Nj3gNIBy?usp=sharing)

# 🧠 Getting Started with Cerebras SDK

This notebook provides a comprehensive guide to using the Cerebras SDK for AI inference. We'll cover:

1. **Setup and Installation** - Installing the SDK and configuring API keys
2. **Basic Chat Completions** - Simple text generation examples
3. **Advanced Usage** - Streaming, sampling parameters, and function calling
4. **Model Comparison** - Testing different models including gpt-oss-120b

## 📋 Prerequisites

- A Cerebras account and API key from [Cerebras Cloud](https://cloud.cerebras.ai/)
- Python 3.7+
- Basic understanding of Python and AI/ML concepts

---

## Step 1: Installation and Setup

First, let's install the required packages and set up our environment.

In [None]:
# Install required packages
!pip install --upgrade cerebras_cloud_sdk langchain langchain-community python-dotenv

# Import necessary libraries
import os
import json
import time
from typing import List, Dict, Any
from cerebras.cloud.sdk import Cerebras
from google.colab import userdata

print("✅ Installation complete!")

## Step 2: API Key Configuration

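Set up your Cerebras API key. **Never hardcode your API key in production code!** The cell below reads the key from Colab's Secrets panel, where it must be stored under the name `CEREBRAS_API_KEY`.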

Get your API key from: https://cloud.cerebras.ai/
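Outside Colab, `google.colab.userdata` is not available. A minimal sketch of one alternative, assuming the key has been exported as an environment variable named `CEREBRAS_API_KEY`; the helper `load_api_key` is a hypothetical name used only for illustration:

```python
import os


def load_api_key(env=None, name="CEREBRAS_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    # `env` defaults to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client.")
    return key


# Usage (assumes the variable was exported in the shell beforehand):
# client = Cerebras(api_key=load_api_key())
```

Failing early with a descriptive error tends to be easier to debug than an authentication failure on the first request.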

In [None]:
# Initialize the Cerebras client
client = Cerebras(
    api_key=userdata.get("CEREBRAS_API_KEY"),
)

print("🎯 Cerebras client initialized successfully!")



### Example 1: Basic Chat Completion


In [None]:
print("📝 EXAMPLE 1: Basic Chat Completion")
print("="*60)

# Create a simple chat completion
response = client.chat.completions.create(
    model="gpt-oss-120b",  # Using the GPT-OSS-120B model
    messages=[
        {
            "role": "system",
            "content": "You are a helpful AI assistant specialized in explaining complex topics clearly."
        },
        {
            "role": "user",
            "content": "Explain what makes Cerebras Wafer-Scale Engine unique for AI training in simple terms."
        }
    ],
    max_tokens=500,
    temperature=0.7,
)

print("🤖 Model Response:")
print("-" * 40)
print(response.choices[0].message.content)

### Example 2: Complex Reasoning Task Using GPT-OSS-120B

In [None]:
print("\n" + "="*60)
print("🎯 EXAMPLE 2: GPT-OSS-120B Model Usage")
print("="*60)


# Complex reasoning task using GPT-OSS-120B
response = client.chat.completions.create(
    model="gpt-oss-120b",  # Using the powerful GPT-OSS-120B model
    messages=[
        {
            "role": "system",
            "content": "You are an expert software architect and AI researcher. Provide detailed, technical insights."
        },
        {
            "role": "user",
            "content": """
            Design a scalable architecture for a real-time AI inference system that can handle:
            1. 10,000+ concurrent requests
            2. Multiple model types (text, image, multimodal)
            3. Sub-100ms latency requirements
            4. Global deployment across 5 regions

            Include specific technologies, patterns, and considerations.
            """
        }
    ],
    max_tokens=1000,
    temperature=0.3,  # Lower temperature for more focused technical responses
    top_p=0.9,
)

print("🧠 GPT-OSS-120B Response:")
print("-" * 50)
print(response.choices[0].message.content)
print("-" * 50)
print(f"Model: {response.model}")
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")

### Example 3: Streaming Responses


In [None]:
print("\n" + "="*60)
print("🌊 EXAMPLE 3: Streaming Responses")
print("="*60)

print("🔄 Starting streaming response...")
print("📝 Response (streaming):")
print("-" * 40)

# Create streaming completion
stream = client.chat.completions.create(
    model="gpt-oss-120b",  # Using larger model for better responses
    messages=[
        {
            "role": "user",
            "content": "Write a creative short story about an AI that discovers it's running on a Cerebras Wafer-Scale Engine. Make it engaging and technical."
        }
    ],
    max_tokens=800,
    temperature=0.8,
    stream=True,  # Enable streaming
)

# Process streaming chunks
full_response = ""
for chunk in stream:
    # Skip chunks that carry no choices (nothing to print)
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="", flush=True)
        full_response += content

print("\n" + "-" * 40)
print(f"📊 Total characters streamed: {len(full_response)}")
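The chunk-handling loop above can also be factored into a helper that records how long the first token takes to arrive, which is often the number that matters for perceived latency. This is a sketch, not part of the SDK: `collect_stream` and `fake_chunk` are hypothetical names, and `fake_chunk` only mimics the `choices[0].delta.content` shape used in this notebook so the helper can be tried without a network call.

```python
import time
from types import SimpleNamespace


def collect_stream(chunks):
    """Join streamed text deltas; return (text, seconds_to_first_token)."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # a chunk may carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter() - start
            parts.append(delta)
    return "".join(parts), first_token_at


def fake_chunk(text):
    """Hypothetical stand-in for a streamed chunk, for offline testing."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


text, ttft = collect_stream([fake_chunk("Hello"), fake_chunk(None), fake_chunk(", world")])
print(text)
```

With a real request, the `stream` object returned by `client.chat.completions.create(..., stream=True)` would be passed in place of the fake chunks.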

### Example 4: Advanced Parameters And Function Calling

In [None]:
print("\n" + "="*60)
print("🌡️ EXAMPLE 4: Testing Different Temperatures")
print("="*60)

question = "Write a creative story about AI"

# Low temperature = more focused
print("\n• Low temperature (0.2) - Focused response:")
response_low = client.chat.completions.create(
    model="gpt-oss-120b", # Using a model that supports temperature
    messages=[{"role": "user", "content": question}],
    max_tokens=200,
    temperature=0.2  # Low = more predictable
)
print(response_low.choices[0].message.content)

# High temperature = more creative
print("\n• High temperature (0.9) - Creative response:")
response_high = client.chat.completions.create(
    model="gpt-oss-120b", # Using a model that supports temperature
    messages=[{"role": "user", "content": question}],
    max_tokens=200,
    temperature=0.9  # High = more creative
)
print(response_high.choices[0].message.content)

print("\n" + "="*50)
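To see why a low temperature gives more predictable text, it can help to look at the standard formulation: logits are divided by the temperature before the softmax. The sketch below uses made-up logits for three candidate tokens; it illustrates the general technique and is not a description of how the Cerebras service samples internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.9):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.2 almost all of the probability mass sits on the top-scoring token, while at 0.9 the alternatives keep a meaningful share, so sampled outputs vary more.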

### Basic Function Calling

In [None]:
print("🛠️ Function Calling")
print("-" * 30)

# Simple function definition
simple_tool = [
    {
        "type": "function",
        "function": {
            "name": "add_numbers",
            "description": "Add two numbers together",
            "parameters": {
                "type": "object",
                "properties": {
                    "number1": {"type": "number", "description": "First number"},
                    "number2": {"type": "number", "description": "Second number"}
                },
                "required": ["number1", "number2"]
            }
        }
    }
]

try:
    print("🤖 Asking AI to use the add_numbers function:")

    response = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[
            {"role": "user", "content": "Please add 25 and 17 using the add_numbers function"}
        ],
        tools=simple_tool,
        tool_choice="auto",  # Let AI decide when to use tools
        max_tokens=200,
        temperature=0.3
    )

    # Check if AI wants to call our function
    if response.choices[0].message.tool_calls:
        print("✅ AI decided to use the function!")

        for tool_call in response.choices[0].message.tool_calls:
            print(f"Function name: {tool_call.function.name}")
            print(f"Function arguments: {tool_call.function.arguments}")

            # Parse the arguments and do the calculation
            args = json.loads(tool_call.function.arguments)
            result = args["number1"] + args["number2"]
            print(f"🧮 Calculation result: {args['number1']} + {args['number2']} = {result}")
    else:
        print("ℹ️ AI responded without using the function:")
        print(response.choices[0].message.content)

except Exception as e:
    print(f"❌ An error occurred: {e}")
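The cell above stops once the result has been computed locally. In OpenAI-style chat APIs the result is usually sent back to the model as a message with the `tool` role, so the model can phrase the final answer. The sketch below builds such a message, assuming the Cerebras endpoint accepts the same `role` / `tool_call_id` / `content` shape (worth confirming against the SDK documentation). `TOOL_REGISTRY` and `build_tool_message` are hypothetical names for illustration.

```python
import json


def add_numbers(number1, number2):
    """Local implementation of the tool declared in the schema."""
    return number1 + number2


# Map tool names to local callables (hypothetical registry for this sketch).
TOOL_REGISTRY = {"add_numbers": add_numbers}


def build_tool_message(tool_call_id, name, arguments_json):
    """Run the named tool and wrap its result as a `tool` role message."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    result = TOOL_REGISTRY[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    }


message = build_tool_message("call_0", "add_numbers", '{"number1": 25, "number2": 17}')
print(message["content"])
```

In a full round trip, this message would be appended to the conversation after the assistant message that contained the tool call, followed by a second `client.chat.completions.create` request.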

### Example 5: Code Generation with Function Calling

In [None]:
print("\n" + "="*50)
print("💻 EXAMPLE 5: Code Generation with Function Calling")
print("="*50)

# Define the generate_code function tool
generate_code_tool = [
    {
        "type": "function",
        "function": {
            "name": "generate_code",
            "description": "Generate code snippets in specified language",
            "parameters": {
                "type": "object",
                "properties": {
                    "language": {"type": "string", "description": "Programming language"},
                    "task": {"type": "string", "description": "What the code should do"},
                    "complexity": {"type": "string", "enum": ["simple", "medium", "advanced"]}
                },
                "required": ["language", "task"]
            }
        }
    }
]

try:
    print("🤖 Asking AI to generate code:")
    response = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{
            "role": "user",
            "content": "Generate Python code to sort a list of numbers, medium complexity"
        }],
        tools=generate_code_tool,
        tool_choice="auto",
        max_tokens=400,
        temperature=0.3
    )

    if response.choices[0].message.tool_calls:
        print("✅ AI decided to use the function!")
        for tool_call in response.choices[0].message.tool_calls:
            print(f"Function: {tool_call.function.name}")
            args = json.loads(tool_call.function.arguments)
            print(f"🔧 Code Request: {args}")

            # Simulate code generation
            if tool_call.function.name == "generate_code":
                print(f"📝 Generated {args['language']} code for: {args['task']}")
    else:
        print("ℹ️ AI responded without using functions:")
        print(response.choices[0].message.content)

except Exception as e:
    print(f"❌ An error occurred: {e}")

print("\n" + "="*50)

### Example 6: Model Comparison

In [None]:
def example_6_model_comparison(client: Cerebras) -> Dict[str, Any]:
    """
    Compare different Cerebras models for the same task.

    Args:
        client: Initialized Cerebras client

    Returns:
        A dictionary containing the results of the model comparison.
    """
    print("\n" + "="*60)
    print("📊 EXAMPLE 6: Model Comparison")
    print("="*60)

    # Test prompt for comparison
    test_prompt = "Explain quantum computing in exactly 3 sentences."

    # Models to compare
    models = [
        "qwen-3-coder-480b",
        "llama-3.3-70b",
        "gpt-oss-120b",
        "llama-4-maverick-17b-128e-instruct"
    ]

    results = {}

    for model in models:
        try:
            print(f"\n🧪 Testing model: {model}")
            start_time = time.time()

            response = client.chat.completions.create(
                model=model,
                messages=[
                    {
                        "role": "user",
                        "content": test_prompt
                    }
                ],
                max_tokens=200,
                temperature=0.5,
            )

            end_time = time.time()
            response_time = end_time - start_time

            results[model] = {
                "response": response.choices[0].message.content,
                "tokens": response.usage.total_tokens,
                "time": response_time
            }

            print(f"⏱️ Response time: {response_time:.2f}s")
            print(f"📝 Response: {response.choices[0].message.content[:100]}...")

        except Exception as e:
            print(f"❌ Error with {model}: {str(e)}")
            results[model] = {"error": str(e)}

    return results

In [None]:
# Run the comparison first so the summary header appears after the per-model output
results = example_6_model_comparison(client)

# Summary comparison
print("\n" + "="*40)
print("📈 COMPARISON SUMMARY")
print("="*40)

for model, result in results.items():
    if "error" not in result:
        print(f"\n🤖 {model}:")
        print(f"   ⏱️ Time: {result['time']:.2f}s")
        print(f"   🎯 Tokens: {result['tokens']}")
        print(f"   📏 Length: {len(result['response'])} chars")
    else:
        print(f"\n🤖 {model}: Error - {result['error']}")
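Raw times are hard to compare when models return different numbers of tokens. One rough normalization is tokens per second, sketched below over a dictionary shaped like the one `example_6_model_comparison` returns. The sample values are made up, and `rank_by_throughput` is a hypothetical helper. The `tokens` field here is the total (prompt plus completion) and the time includes network latency, so the figure is only an approximation of generation speed.

```python
def rank_by_throughput(results):
    """Return (model, tokens_per_second) pairs, fastest first; skip errors."""
    ranked = []
    for model, result in results.items():
        if "error" in result or result["time"] <= 0:
            continue
        ranked.append((model, result["tokens"] / result["time"]))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up numbers in the same shape that example_6_model_comparison returns.
sample = {
    "model-a": {"response": "...", "tokens": 120, "time": 0.5},
    "model-b": {"response": "...", "tokens": 90, "time": 0.25},
    "model-c": {"error": "model not found"},
}
for model, tps in rank_by_throughput(sample):
    print(f"{model}: {tps:.0f} tokens/s")
```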