<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Gen AI Experiments](https://img.shields.io/badge/Gen%20AI%20Experiments-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://github.com/buildfastwithai/gen-ai-experiments)
[![Gen AI Experiments GitHub](https://img.shields.io/github/stars/buildfastwithai/gen-ai-experiments?style=for-the-badge&logo=github&color=gold)](http://github.com/buildfastwithai/gen-ai-experiments)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/169ZBo3jo0nCQ3duAFzxq0sMY3eQVQh6z?usp=sharing)

## Master Generative AI in 8 Weeks

**What You'll Learn:**
- Cutting-edge Generative AI tools & frameworks
- 6 weeks of hands-on, project-based learning
- Weekly live mentorship
- No prior coding experience required
- Access to an innovation-driven community

Transform your AI ideas into reality through hands-on projects and expert mentorship.

👉 [Start Your Journey](https://www.buildfastwithai.com/genai-course)

# 🚀 GLM-5 Testing Notebook

**Z.ai's Flagship Open-Source Foundation Model for Complex Systems & Agent Workflows**

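This notebook exercises **GLM-5**, Z.ai's flagship open-source model for complex systems design and long-horizon agent workflows, through the OpenRouter API. It covers chat, tool calling, code generation, structured JSON output, streaming, and a system design task.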

---

## 📊 Model Specifications

| Feature | Specification |
|---------|---------------|
| **Provider** | Z.ai (via OpenRouter) |
| **Model ID** | `z-ai/glm-5` |
| **Architecture** | Open-source foundation model |
| **Tool Calling** | ✅ Supported |
| **Reasoning** | ✅ Deep backend reasoning |
| **Streaming** | ✅ Supported |
| **JSON Mode** | ✅ Structured output |

---

## 🔑 Key Capabilities

1. **Complex Systems Design** - Full-system construction beyond simple code generation
2. **Long-Horizon Agent Workflows** - Agentic planning and autonomous execution
3. **Deep Backend Reasoning** - Advanced step-by-step reasoning with self-correction
4. **Production-Grade Code** - Large-scale programming tasks rivaling closed-source models
5. **Iterative Self-Correction** - Identifies and fixes issues autonomously
6. **Open Source** - Model weights available on [HuggingFace](https://huggingface.co/zai-org/GLM-5)

---
## 📦 Setup & Installation

In [1]:
# @title Install Dependencies
!pip install -q openai

In [2]:
# @title Configure API Client
import os
from google.colab import userdata
from openai import OpenAI

# Setup OpenRouter client
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=userdata.get("OPENROUTER_API_KEY")
)

MODEL = "z-ai/glm-5"

print(f"✅ Client configured for: {MODEL}")

✅ Client configured for: z-ai/glm-5
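
The setup cell reads the key through `google.colab.userdata`, which is only available inside a Colab runtime. A minimal sketch of a fallback for other environments follows. It assumes the key is exported as an environment variable with the same name, `OPENROUTER_API_KEY`; the helper name `load_api_key` is hypothetical and not part of the original notebook.

```python
import os


def load_api_key(name: str = "OPENROUTER_API_KEY") -> str:
    """Return the API key from Colab secrets if available, else from the environment."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook.
        pass
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment")
    return key
```

With this helper, the client could be created with `api_key=load_api_key()` instead of calling `userdata.get` directly.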


---
## 🧪 Example 1: Basic Chat & Reasoning

Test GLM-5's fundamental chat capabilities and its deep reasoning ability.

In [3]:
# @title Basic Chat Completion
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are GLM-5, Z.ai's flagship open-source foundation model. Be helpful, precise, and thorough."},
        {"role": "user", "content": "What are the key architectural innovations that make modern large language models effective? Explain concisely."}
    ],
    max_tokens=1024
)

print(response.choices[0].message.content)
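
With reasoning models, a small `max_tokens` budget can be used up before any visible answer is produced, in which case `message.content` may come back empty. The sketch below reports that case instead of printing a blank line. It assumes the response follows the OpenAI chat-completions shape (`choices[0].finish_reason` and `choices[0].message.content`); `completion_text` is a hypothetical helper, and the stand-in object exists only so the example runs without an API call.

```python
from types import SimpleNamespace


def completion_text(response) -> str:
    """Return the assistant text, or a short diagnostic if it is empty."""
    choice = response.choices[0]
    text = choice.message.content or ""
    if text.strip():
        return text
    if choice.finish_reason == "length":
        return "[empty response: the max_tokens limit was reached before any answer text]"
    return f"[empty response: finish_reason={choice.finish_reason!r}]"


# Stand-in with the same attribute layout as a chat completion (not a real response).
fake = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="length",
                             message=SimpleNamespace(content=""))]
)
print(completion_text(fake))
```

In the notebook, `print(completion_text(response))` would replace the direct `print` of `message.content`.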




---
## 🛠️ Example 2: Tool Calling

GLM-5 supports function/tool calling for building agentic workflows.

In [4]:
# @title Define Tools
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Perform mathematical calculations",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {"type": "string", "description": "Math expression to evaluate"}
                },
                "required": ["expression"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_knowledge",
            "description": "Search a knowledge base for information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }
]

# Simulate tool execution with canned results (no real APIs are called)
def execute_tool(name, args):
    if name == "get_weather":
        return f"Weather in {args.get('location', 'Unknown')}: 22°C, Partly Cloudy"
    elif name == "calculate":
        try:
            # eval() is for demo purposes only; builtins are disabled, but this
            # is still not safe for untrusted input.
            result = eval(args.get('expression', '0'), {"__builtins__": {}}, {})
            return f"Result: {result}"
        except Exception:
            return "Error in calculation"
    elif name == "search_knowledge":
        return f"Found information about: {args.get('query', 'topic')}"
    return "Unknown tool"

print("✅ Tools defined: get_weather, calculate, search_knowledge")

✅ Tools defined: get_weather, calculate, search_knowledge
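
The `calculate` tool evaluates a string that the model produced, so calling `eval` on it is risky outside a demo. One possible alternative is to parse the expression with the standard-library `ast` module and allow only basic arithmetic. The helper name `safe_eval` and the choice of permitted operators are assumptions made for this sketch.

```python
import ast
import operator

# Only these arithmetic operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str):
    """Evaluate a plain arithmetic expression without calling eval().

    Raises ValueError for unsupported syntax and SyntaxError for invalid input.
    """

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("120 * 0.18"))
```

Function calls, attribute access, and names are rejected because they never match one of the allowed node types.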


In [5]:
# @title Tool Calling Demo
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "What's the weather in Beijing and also calculate 18% tip on $120.00?"}
    ],
    tools=tools,
    max_tokens=1024
)

# Check for tool calls
message = response.choices[0].message
print("📤 Model Response:")

if message.tool_calls:
    print(f"\n🔧 Tool Calls Detected: {len(message.tool_calls)}")
    for tc in message.tool_calls:
        print(f"   ➤ {tc.function.name}: {tc.function.arguments}")

        # Execute and show result
        args = json.loads(tc.function.arguments)
        result = execute_tool(tc.function.name, args)
        print(f"   📋 Result: {result}")
else:
    print(message.content)

📤 Model Response:

🔧 Tool Calls Detected: 2
   ➤ get_weather: {"location":"Beijing"}
   📋 Result: Weather in Beijing: 22°C, Partly Cloudy
   ➤ calculate: {"expression":"120 * 0.18"}
   📋 Result: Result: 21.599999999999998
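
The demo stops once the tools have run locally. In the OpenAI-style chat format, the results are usually sent back as `tool` messages so the model can compose a final answer. The sketch below builds that follow-up message list. `build_followup_messages` is a hypothetical helper, `execute` is any callable with the same signature as `execute_tool`, and the stand-in objects only mimic the attribute layout of `response.choices[0].message` so the example runs offline.

```python
import json
from types import SimpleNamespace


def build_followup_messages(messages, assistant_message, execute):
    """Append the assistant's tool calls and one `tool` message per call."""
    followup = list(messages)  # copy so the caller's history is not mutated
    followup.append({
        "role": "assistant",
        "content": assistant_message.content or "",
        "tool_calls": [
            {
                "id": tc.id,
                "type": "function",
                "function": {"name": tc.function.name,
                             "arguments": tc.function.arguments},
            }
            for tc in assistant_message.tool_calls
        ],
    })
    for tc in assistant_message.tool_calls:
        args = json.loads(tc.function.arguments)
        followup.append({
            "role": "tool",
            "tool_call_id": tc.id,  # links the result to the call it answers
            "content": str(execute(tc.function.name, args)),
        })
    return followup


# Stand-in for `response.choices[0].message` (not a real API response).
call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Beijing"}'),
)
assistant = SimpleNamespace(content=None, tool_calls=[call])
history = [{"role": "user", "content": "What's the weather in Beijing?"}]

followup = build_followup_messages(
    history, assistant, lambda name, args: f"Weather in {args['location']}: sunny"
)
print(followup[-1])
```

Passing `followup` as `messages` in a second `client.chat.completions.create(...)` call (with the same `tools`) would let the model turn the tool results into a natural-language reply.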


---
## 💻 Example 3: Code Generation

GLM-5 is built for production-grade programming. This example tests its ability to generate complex, well-documented code.

In [6]:
# @title Complex Code Generation
coding_prompt = """
Create a Python class for a Rate Limiter with the following features:
1. Token bucket algorithm implementation
2. Configurable rate (tokens per second) and burst size
3. Thread-safe operations
4. Methods: acquire, try_acquire, wait, get_stats
5. Include type hints, docstrings, and usage examples

Provide a complete, production-ready implementation.
"""

response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are an expert Python developer. Write clean, production-quality code with full documentation."},
        {"role": "user", "content": coding_prompt}
    ],
    max_tokens=4096
)

print(response.choices[0].message.content)

Here is a complete, production-ready implementation of a `RateLimiter` class using the Token Bucket algorithm.

```python
import time
import threading
from typing import Optional, Dict, Union


class RateLimiter:
    """
    A thread-safe implementation of the Token Bucket algorithm for rate limiting.

    This class controls the rate at which actions are performed by maintaining a
    "bucket" of tokens. Tokens are added at a fixed rate up to a maximum capacity
    (burst size). Actions consume tokens; if the bucket is empty, the action must
    wait or fail.

    Attributes:
        rate (float): The rate at which tokens are added to the bucket per second.
        burst_size (int): The maximum number of tokens the bucket can hold.
    
    Example:
        >>> limiter = RateLimiter(rate=5, burst_size=10)
        >>> if limiter.try_acquire():
        ...     print("Action permitted")
        ... else:
        ...     print("Rate limit exceeded")
    """

    def __init__(self, rate: f
```
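
The captured output above is cut off before the implementation begins. For orientation only, here is a minimal sketch of the token bucket idea named in the prompt. It is not the model's output: the class name `TokenBucket`, the injectable `clock` parameter, and the refill details are assumptions chosen for illustration, and only `try_acquire` from the requested method list is covered.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Minimal thread-safe token bucket (illustrative, not GLM-5's output)."""

    def __init__(self, rate: float, burst_size: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or burst_size <= 0:
            raise ValueError("rate and burst_size must be positive")
        self.rate = rate              # tokens added per second
        self.burst_size = burst_size  # maximum tokens the bucket can hold
        self._clock = clock
        self._tokens = float(burst_size)  # start with a full bucket
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self) -> None:
        # Add tokens for the elapsed time, capped at the burst size.
        now = self._clock()
        self._tokens = min(self.burst_size,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self, tokens: int = 1) -> bool:
        """Take `tokens` if available and return True; otherwise return False."""
        with self._lock:
            self._refill()
            if self._tokens >= tokens:
                self._tokens -= tokens
                return True
            return False


limiter = TokenBucket(rate=5, burst_size=10)
print(limiter.try_acquire())
```

Injecting the clock keeps the refill logic testable without sleeping; a blocking `acquire` could be layered on top by waiting for the shortfall divided by `rate`.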

---
## 📊 Example 4: Structured JSON Output

Extract structured data from unstructured text using GLM-5's JSON mode.

In [None]:
# @title JSON Structured Output
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": """You are a data extraction specialist.
Always respond with valid JSON matching this schema:
{
  "company_name": string,
  "industry": string,
  "key_products": [string],
  "headquarters": string,
  "open_source_models": [string],
  "notable_facts": [string]
}"""},
        {"role": "user", "content": "Extract structured information about Z.ai (Zhipu AI), the company behind the GLM series of language models."}
    ],
    max_tokens=1024,
    response_format={"type": "json_object"}
)

import json
result = json.loads(response.choices[0].message.content)
print(json.dumps(result, indent=2))
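
`json.loads` raises if the reply is not pure JSON, and it does not check that the fields requested in the system prompt are actually present. A sketch of more defensive parsing follows. `parse_company_json` and `REQUIRED_KEYS` are hypothetical names; the key set is copied from the schema in the system prompt above, and the fence-stripping step is a precaution for models that wrap JSON in a Markdown code fence.

```python
import json

REQUIRED_KEYS = {
    "company_name", "industry", "key_products",
    "headquarters", "open_source_models", "notable_facts",
}

FENCE = "`" * 3  # Markdown code-fence marker


def parse_company_json(text: str) -> dict:
    """Parse model output as JSON and check that every schema key is present."""
    cleaned = text.strip()
    # Drop the opening and closing fence lines if the reply is wrapped in one.
    if cleaned.startswith(FENCE):
        lines = cleaned.splitlines()
        if lines[-1].strip() == FENCE:
            lines = lines[:-1]
        cleaned = "\n".join(lines[1:])
    data = json.loads(cleaned)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

In the cell above, `result = parse_company_json(response.choices[0].message.content)` would replace the bare `json.loads` call.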

---
## 🌊 Example 5: Streaming Responses

Stream long-form content generation for real-time feedback.

In [8]:
# @title Streaming Demo
stream = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Explain the evolution of open-source AI models from GPT-2 to modern models like GLM-5. Keep it to 3 concise paragraphs."}
    ],
    max_tokens=1024,
    stream=True
)

full_response = ""
print("📡 Streaming response:\n")

for chunk in stream:
    # Skip chunks that carry no choices or no text content
    if chunk.choices and chunk.choices[0].delta.content:
        content = chunk.choices[0].delta.content
        full_response += content
        print(content, end="", flush=True)

print("\n\n✅ Stream complete!")

📡 Streaming response:



✅ Stream complete!
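
The captured run printed no text between the start and end markers. One possible cause, not verified here, is that the chunks carried reasoning tokens rather than `content` until the `max_tokens` budget ran out. The sketch below collects both kinds of text. It assumes that reasoning deltas, when present, arrive on a `reasoning` attribute of `delta` (check the provider's documentation), and it reads that attribute with `getattr` so a missing field is harmless. `collect_stream` and `_chunk` are hypothetical helpers, and the fake chunks only mimic the attribute layout of real ones.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate answer text and (if present) reasoning text from stream chunks."""
    content_parts, reasoning_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices at all
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            content_parts.append(delta.content)
        # Assumption: reasoning tokens arrive on `delta.reasoning`.
        if getattr(delta, "reasoning", None):
            reasoning_parts.append(delta.reasoning)
    return "".join(content_parts), "".join(reasoning_parts)


def _chunk(**delta):
    """Build a stand-in chunk so the sketch runs without an API call."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


fake_stream = [
    _chunk(reasoning="Plan the answer. "),
    _chunk(content="Open models "),
    SimpleNamespace(choices=[]),
    _chunk(content="have grown quickly."),
]
answer, reasoning = collect_stream(fake_stream)
print(answer)
```

If `answer` is empty while `reasoning` is not, raising `max_tokens` is a reasonable first thing to try.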


---
## 🧠 Example 6: Complex Reasoning & System Design

Push GLM-5's flagship capability: complex systems design and deep analytical reasoning.

In [None]:
# @title System Design & Reasoning Task
design_prompt = """
Design a real-time notification system for a large-scale social media platform.

Requirements:
- Handle 10M+ concurrent users
- Support push notifications, in-app alerts, and email digests
- Sub-second delivery for priority notifications
- Notification preferences per user (frequency, channels, mute)
- Deduplication and batching for non-urgent notifications

Provide:
1. High-level architecture with component breakdown
2. Data flow diagram (described textually)
3. Key technology choices with justification
4. Scaling strategy and bottleneck analysis
5. Failure handling and graceful degradation approach
"""

response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a senior systems architect with expertise in distributed systems at scale. Provide thorough, production-ready design."},
        {"role": "user", "content": design_prompt}
    ],
    max_tokens=4096
)

print(response.choices[0].message.content)

---
## 📈 Summary & Token Usage

Inspect token usage for the most recent API call. For costs, check the current pricing on the model's OpenRouter page.

In [None]:
# @title Check Token Usage (Last Response)
if hasattr(response, 'usage') and response.usage:
    usage = response.usage
    print("📊 Token Usage (Last Request):")
    print(f"   • Prompt tokens: {usage.prompt_tokens}")
    print(f"   • Completion tokens: {usage.completion_tokens}")
    print(f"   • Total tokens: {usage.total_tokens}")
else:
    print("Token usage not available in response")

print("\n💡 Check latest GLM-5 pricing at:")
print("   https://openrouter.ai/z-ai/glm-5")
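
The cell above reports only the last response. To keep running totals and a rough cost estimate across several calls, usage can be accumulated as in the sketch below. `UsageTracker` is a hypothetical helper, and the prices passed to it are placeholders for illustration, not GLM-5's actual rates; take the real values from the pricing page.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTracker:
    """Running token totals with a cost estimate from caller-supplied prices."""
    prompt_price_per_m: float       # USD per 1M prompt tokens (supplied by caller)
    completion_price_per_m: float   # USD per 1M completion tokens (supplied by caller)
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def add(self, usage) -> None:
        """Add one response's `usage` object; responses without usage are ignored."""
        if usage is None:
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def estimated_cost(self) -> float:
        """Estimated cost in USD for all tokens recorded so far."""
        return (self.prompt_tokens * self.prompt_price_per_m
                + self.completion_tokens * self.completion_price_per_m) / 1_000_000


# Placeholder prices for illustration only.
tracker = UsageTracker(prompt_price_per_m=1.0, completion_price_per_m=2.0)
# Stand-ins for `response.usage` from two calls.
tracker.add(SimpleNamespace(prompt_tokens=1000, completion_tokens=500))
tracker.add(SimpleNamespace(prompt_tokens=2000, completion_tokens=1500))
print(tracker.prompt_tokens, tracker.completion_tokens, tracker.estimated_cost)
```

In the notebook, calling `tracker.add(response.usage)` after each request would keep the totals current.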

---
## 🎯 Key Takeaways

**GLM-5 excels at:**
- ✅ Complex systems design and architecture
- ✅ Long-horizon agentic workflows
- ✅ Production-grade code generation
- ✅ Deep reasoning with self-correction
- ✅ Tool calling for autonomous execution
- ✅ Structured JSON output generation

**Best use cases:**
- 🏗️ Full-system construction & architecture design
- 💻 Large-scale software development
- 🤖 Building autonomous AI agents
- 📊 Complex analytical tasks & reasoning
- 🔧 Production backend engineering

**Resources:**
- 📖 [OpenRouter Model Page](https://openrouter.ai/z-ai/glm-5)
- 🤗 [HuggingFace Model Weights](https://huggingface.co/zai-org/GLM-5)

---
*Notebook by @BuildFastWithAI*