
<div style="display: flex; align-items: center; gap: 40px;">

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSkez75fZoo82SccEXRMVRlj9sZsQifRUhURQ&s" width="200">
<img src="https://images.crunchbase.com/image/upload/c_pad,f_auto,q_auto:eco,dpr_1/fc52752016ff487da8e4686a2b7fcb6d" width="120">

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1cbX10xhxtjENyJJGtVFupeiLqP4cr3t8?usp=sharing)


<div>
  <h2>Cerebras Inference</h2>
  <p>Cerebras Systems builds the world's largest computer chip - the Wafer Scale Engine (WSE) - designed specifically for AI workloads. This cookbook provides comprehensive examples, tutorials, and best practices for developing and deploying AI models using Cerebras infrastructure, including both training on WSE clusters and fast inference via Cerebras Cloud.</p>

</div>
</div>

## What is Agno?

Agno is a lightweight library for building agents with memory, knowledge, tools, and reasoning capabilities. It's model-agnostic, allowing you to connect to 23+ model providers without lock-in.

# Cerebras with Agno Agent


## Get Your API Keys

Before you begin, make sure you have:

1. A Cerebras API key (Get yours at [Cerebras Cloud](https://cloud.cerebras.ai/))
2. Basic familiarity with Python and Jupyter notebooks

This notebook is designed to run in Google Colab, so no local Python installation is required.

## Setup and Installation

First, let's install the required packages:

In [1]:
!pip install -q openai agno ddgs


## Setting up Environment Variables

Store your Cerebras API key in an environment variable rather than hard-coding it in the notebook. In Colab, add it as a secret named `CEREBRAS_API_KEY`, then copy it into the environment:

In [2]:
import os
from google.colab import userdata

os.environ["CEREBRAS_API_KEY"] = userdata.get("CEREBRAS_API_KEY")
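
Outside Colab, `google.colab` is not importable, so the key has to come from the environment directly. The sketch below is a hypothetical helper (`load_api_key` is not part of the original notebook) that reads the variable and fails early with a readable message if it is missing or empty.

```python
import os


def load_api_key(name: str = "CEREBRAS_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper: it raises immediately with a clear message
    instead of letting a later API call fail with an authentication error.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to Colab secrets or export it "
            "in your shell before running the notebook."
        )
    return value
```

Calling `load_api_key()` once near the top of the notebook makes a missing key visible before any request is sent.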

## Basic Usage of Cerebras with OpenAI Client

Cerebras exposes an OpenAI-compatible API, so the standard OpenAI client works once `base_url` and `api_key` point at Cerebras:

In [3]:
from openai import OpenAI

client = OpenAI(
    base_url='https://api.cerebras.ai/v1',
    api_key=os.environ["CEREBRAS_API_KEY"]
)

response = client.chat.completions.create(
    model="llama3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant that specializes in AI and machine learning."},
        {"role": "user", "content": "Explain the concept of neural networks in simple terms."}
    ]
)

print(response.choices[0].message.content)

**What is a Neural Network?**

A neural network is a type of computer system that is designed to think and learn like a human brain. It's made up of many interconnected nodes or "neurons" that work together to process and understand information.

**How Does it Work?**

Imagine you're trying to recognize a picture of a dog. Here's how a neural network would approach this task:

1. **Input Layer**: The picture of the dog is fed into the network as an input.
2. **Hidden Layers**: The input is then passed through multiple layers of neurons, each of which looks for specific features in the picture, such as edges, shapes, and textures.
3. **Output Layer**: The final layer of neurons takes the output from the previous layers and makes a prediction about what the picture is (in this case, a dog).

**Key Concepts:**

* **Artificial Neurons**: Each node in the network is an artificial neuron that receives one or more inputs, performs a computation, and sends the output to other neurons.
* **Conn

## Testing Different Cerebras Models

Cerebras serves several models through the same endpoint. Let's send the same prompt to three of them:

In [4]:
models = ["llama3.3-70b", "qwen-3-coder-480b", "gpt-oss-120b"]

for model in models:
    print(f"\n{'='*50}")
    print(f"Testing model: {model}")
    print(f"{'='*50}\n")

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": "What is the capital of France?"}
        ],
        max_tokens=50
    )

    print(response.choices[0].message.content)


Testing model: llama3.3-70b

The capital of France is Paris.

Testing model: qwen-3-coder-480b

The capital of France is Paris.

Testing model: gpt-oss-120b

The capital of France is **Paris**.
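
Model availability can differ between accounts and change over time, so a loop like the one above stops at the first model that is rejected. A minimal sketch of a more forgiving loop follows; `try_models` and the `ask` callable are hypothetical names, and `ask` stands in for whatever function sends the prompt to a given model.

```python
from typing import Callable, Dict, Iterable


def try_models(models: Iterable[str], ask: Callable[[str], str]) -> Dict[str, str]:
    """Call `ask(model)` for each model and record the answer or the error."""
    results: Dict[str, str] = {}
    for model in models:
        try:
            results[model] = ask(model)
        except Exception as exc:  # broad on purpose: any failure is only reported
            results[model] = f"ERROR: {type(exc).__name__}: {exc}"
    return results


# Stand-in for a real API call, used here only to show the behaviour.
def fake_ask(model: str) -> str:
    if model == "unknown-model":
        raise ValueError("model not found")
    return "The capital of France is Paris."


for name, answer in try_models(["llama3.3-70b", "unknown-model"], fake_ask).items():
    print(f"{name}: {answer}")
```

In the notebook, `ask` could wrap the `client.chat.completions.create(...)` call from the cell above and return `response.choices[0].message.content`.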


## Integrating Cerebras with Agno Agent

Now, let's integrate Cerebras with Agno to create an intelligent agent:

In [None]:
from agno.agent import Agent
from agno.models.openai.like import OpenAILike

cerebras_agent = Agent(
    model=OpenAILike(
        id="llama3.3-70b",
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url="https://api.cerebras.ai/v1"
    ),
    description="You are a helpful AI assistant powered by Cerebras inference.",
    markdown=True
)

cerebras_agent.print_response("Tell me about the benefits of fast inference in AI applications.", stream=True)
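
The same three `OpenAILike` arguments are repeated in every agent in this notebook. One way to keep them in a single place is a small helper that returns them as keyword arguments. This is a sketch: `cerebras_model_kwargs` and `CEREBRAS_BASE_URL` are hypothetical names introduced here, not part of Agno or the original notebook.

```python
import os

CEREBRAS_BASE_URL = "https://api.cerebras.ai/v1"


def cerebras_model_kwargs(model_id: str = "llama3.3-70b") -> dict:
    """Keyword arguments shared by the OpenAILike(...) calls in this notebook."""
    return {
        "id": model_id,
        "api_key": os.getenv("CEREBRAS_API_KEY"),
        "base_url": CEREBRAS_BASE_URL,
    }


# Intended use (requires agno to be installed):
#   from agno.models.openai.like import OpenAILike
#   model = OpenAILike(**cerebras_model_kwargs())
```

Switching every agent to a different model then means changing one default instead of editing each cell.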

## Adding Tools to the Cerebras Agent

Let's enhance our Cerebras agent by adding tools, such as web search capabilities:

In [None]:
from agno.tools.duckduckgo import DuckDuckGoTools

cerebras_agent_with_tools = Agent(
    model=OpenAILike(
        id="llama3.3-70b",
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url="https://api.cerebras.ai/v1"
    ),
    description="You are a helpful assistant with web search capabilities.",
    tools=[DuckDuckGoTools()],
    markdown=True
)

cerebras_agent_with_tools.print_response("What are the latest developments in Cerebras technology?", stream=True)
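
Tools do not have to come from a prebuilt toolkit. Agno's documentation describes passing plain Python functions in the `tools` list, with the type hints and docstring describing the tool to the model. The function below is a hypothetical example of such a tool (`text_stats` is not part of the original notebook); check the Agno docs for your installed version before relying on this pattern.

```python
import json


def text_stats(text: str) -> str:
    """Return word and character counts for `text` as a JSON string.

    Args:
        text: The text to measure.
    """
    words = text.split()
    return json.dumps({"words": len(words), "characters": len(text)})


# Assumed usage with the agent defined above (requires agno):
#   Agent(model=..., tools=[DuckDuckGoTools(), text_stats], markdown=True)
print(text_stats("fast inference matters"))
```

Returning a string keeps the result easy for the model to read back into its answer.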

## Creating a Research Agent with Reasoning

Now, let's create a more advanced Cerebras agent that can handle research tasks with reasoning capabilities:

In [None]:
research_agent = Agent(
    model=OpenAILike(
        id="llama3.3-70b",
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url="https://api.cerebras.ai/v1"
    ),
    description="You are a research assistant that can analyze complex topics and provide well-reasoned answers.",
    tools=[DuckDuckGoTools()],
    instructions=[
        "Break down complex questions into smaller parts.",
        "Use web search when you need current information.",
        "Provide detailed, well-structured answers.",
        "Cite your sources when using web search results."
    ],
    markdown=True
)

research_agent.print_response(
    "Compare the performance of Cerebras inference with traditional GPU-based inference. What makes it faster?",
    stream=True
)

## Creating a Specialized Code Assistant Agent

In [None]:
code_agent = Agent(
    model=OpenAILike(
        id="llama3.3-70b",
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url="https://api.cerebras.ai/v1"
    ),
    description="You are an expert code assistant that helps with programming questions.",
    instructions=[
        "Provide clear, well-commented code examples.",
        "Explain complex concepts step by step.",
        "Follow best practices and coding standards.",
        "Include error handling when appropriate."
    ],
    markdown=True
)

code_agent.print_response(
    "Write a Python function that implements a binary search algorithm with detailed explanations.",
    stream=True
)

## Creating a Multi-Agent System

In [None]:
general_agent = Agent(
    model=OpenAILike(
        id="llama3.3-70b",
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url="https://api.cerebras.ai/v1"
    ),
    name="General Assistant",
    description="General purpose assistant",
    markdown=True
)

technical_agent = Agent(
    model=OpenAILike(
        id="llama3.3-70b",
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url="https://api.cerebras.ai/v1"
    ),
    name="Technical Expert",
    description="Expert in technical topics and programming",
    tools=[DuckDuckGoTools()],
    markdown=True
)

print("General Agent Response:")
general_agent.print_response("What is machine learning?", stream=True)

print("\n\nTechnical Agent Response:")
technical_agent.print_response("What is machine learning?", stream=True)

## Creating a Conversational Agent with Memory

In [None]:
conversational_agent = Agent(
    model=OpenAILike(
        id="llama3.3-70b",
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url="https://api.cerebras.ai/v1"
    ),
    description="You are a friendly conversational assistant that remembers context.",
    markdown=True,
)

print("First interaction:")
conversational_agent.print_response("My name is John and I'm learning about AI.", stream=True)

print("\n\nSecond interaction:")
conversational_agent.print_response("What's my name and what am I learning about?", stream=True)
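
Whether the second answer recalls the name depends on the earlier turns being sent along with the new request. Agno has its own history and memory options, whose parameter names vary between versions and are not shown here. The underlying idea can be sketched with a plain message list: `ChatHistory` and the `complete` callable below are hypothetical names, and `complete` stands in for any function that takes a list of chat messages and returns the reply text.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class ChatHistory:
    """Keeps the running message list that gives a chat model its context."""

    def __init__(self, system_prompt: str) -> None:
        self.messages: List[Message] = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text: str, complete: Callable[[List[Message]], str]) -> str:
        """Append the user turn, call the model, and store its reply."""
        self.messages.append({"role": "user", "content": user_text})
        # Pass a copy of the list so `complete` cannot add or remove turns.
        reply = complete(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With the OpenAI client from earlier, `complete` could be a function that calls `client.chat.completions.create(model=..., messages=msgs)` and returns `response.choices[0].message.content`. Every call then carries the full conversation so far.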

## Performance Comparison: Streaming vs Non-Streaming

In [None]:
import time

test_agent = Agent(
    model=OpenAILike(
        id="llama3.3-70b",
        api_key=os.getenv("CEREBRAS_API_KEY"),
        base_url="https://api.cerebras.ai/v1"
    ),
    markdown=True
)

query = "Explain quantum computing in 3 paragraphs."

print("Non-streaming response:")
start_time = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
test_agent.print_response(query, stream=False)
non_stream_time = time.perf_counter() - start_time
print(f"\nTime taken: {non_stream_time:.2f} seconds")

print("\n\nStreaming response:")
start_time = time.perf_counter()
test_agent.print_response(query, stream=True)
stream_time = time.perf_counter() - start_time
print(f"\nTime taken: {stream_time:.2f} seconds")
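
The timings above measure the whole call, including terminal rendering, and the totals are often close in both modes. What streaming mainly changes is how soon the first text appears. A minimal sketch for measuring that separately follows; `time_stream` is a hypothetical helper that works on any iterable of text chunks.

```python
import time
from typing import Iterable, Tuple


def time_stream(chunks: Iterable[str]) -> Tuple[str, float, float]:
    """Consume a stream of text chunks.

    Returns (full_text, seconds_to_first_chunk, total_seconds).
    """
    start = time.perf_counter()
    first = None
    parts = []
    for chunk in chunks:
        if first is None:
            first = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    if first is None:  # empty stream: no chunk ever arrived
        first = total
    return "".join(parts), first, total
```

With the OpenAI client, the chunks could come from a request made with `stream=True`, for example `(c.choices[0].delta.content or "" for c in stream if c.choices)`.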

## Conclusion

In this notebook, we've demonstrated how to use Cerebras models with the OpenAI client and integrate them with Agno to create intelligent agents. We've explored:

1. Basic usage of Cerebras with the OpenAI client
2. Testing different Cerebras models
3. Creating a simple Cerebras agent with Agno
4. Adding tools like web search to the Cerebras agent
5. Building a research agent with reasoning capabilities
6. Creating specialized agents (code assistant)
7. Implementing multi-agent systems
8. Building conversational agents with memory
9. Performance comparison between streaming and non-streaming responses

Cerebras's ultra-fast inference capabilities, combined with Agno's flexible agent framework, provide a powerful platform for building intelligent applications that can respond quickly and leverage various tools and knowledge sources.