<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Gen AI Experiments](https://img.shields.io/badge/Gen%20AI%20Experiments-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://github.com/buildfastwithai/gen-ai-experiments)
[![Gen AI Experiments GitHub](https://img.shields.io/github/stars/buildfastwithai/gen-ai-experiments?style=for-the-badge&logo=github&color=gold)](http://github.com/buildfastwithai/gen-ai-experiments)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OchQHwAA4xrNVIVykNwkYeFYbTeHY7jk?usp=sharing)

# 🚀 Qwen3.5 397B-A17B: Testing Notebook

**Alibaba's Latest Hybrid MoE Vision-Language Model via OpenRouter**

This notebook tests the capabilities of **Qwen3.5 397B-A17B**, a native vision-language model built on a hybrid architecture that combines **linear attention** with a **sparse Mixture-of-Experts (MoE)** design for efficient inference.

---

## 📊 Model Specifications

| Feature | Detail |
|---------|--------|
| **Provider** | Qwen (Alibaba) via OpenRouter |
| **Model ID** | `qwen/qwen3.5-397b-a17b` |
| **Total Parameters** | 397 Billion |
| **Active Parameters** | 17 Billion (sparse MoE) |
| **Architecture** | Hybrid Linear Attention + Sparse MoE |
| **Type** | Native Vision-Language Model (VLM) |
| **Reasoning** | ✅ Supported |
| **Streaming** | ✅ Supported |
| **Tool Calling** | ✅ Supported |
| **Vision** | ✅ Image & Video Understanding |

---

## 🔑 Key Strengths

- 🧠 **Reasoning & Logic**: Deep thinking with step-by-step reasoning
- 💻 **Code Generation**: Strong generalization across coding tasks
- 🤖 **Agent Capabilities**: Built for agentic workflows & tool use
- 👁️ **Vision Understanding**: Image, video, and GUI interaction
- ⚡ **Efficient Inference**: Only 17B of the 397B parameters are active per token
- 🔗 [Model Weights on HuggingFace](https://huggingface.co/Qwen/Qwen3.5-397B-A17B)

---
## 📦 Setup & Installation

In [1]:
# @title Install Dependencies
!pip install -q openai

In [2]:
# @title Configure OpenRouter Client
from google.colab import userdata
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=userdata.get("OPENROUTER_API_KEY")
)

MODEL = "qwen/qwen3.5-397b-a17b"

print(f"✅ Client configured for: {MODEL}")

✅ Client configured for: qwen/qwen3.5-397b-a17b
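
The setup cell relies on `google.colab.userdata`, which is only importable inside Colab. Outside Colab, a small fallback might look like the sketch below. `get_secret` is a hypothetical helper name, and the sketch assumes the key is exported as an environment variable with the same name as the Colab secret.

```python
import os


def get_secret(name: str) -> str:
    """Read a secret from Colab's userdata when available, else from the environment."""
    value = None
    try:
        # Only importable inside Google Colab
        from google.colab import userdata
        value = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or not accessible there
        value = None
    # Fall back to an environment variable of the same name (an assumption)
    value = value or os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} not found in Colab userdata or the environment")
    return value
```

With this in place, the client could be created with `api_key=get_secret("OPENROUTER_API_KEY")` both in Colab and in a local Jupyter session.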


---
## 💬 Example 1: Basic Chat & Reasoning

Test Qwen3.5's fundamental chat ability and reasoning skills.

In [None]:
# @title Basic Chat
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Explain the difference between GPT, BERT, and Mixture-of-Experts architectures in simple terms. Use analogies."}
    ],
    max_tokens=1024
)

print(response.choices[0].message.content)

---
## 🧠 Example 2: Deep Reasoning with Thinking Mode

Qwen3.5 supports a **reasoning mode** in which its step-by-step thinking is returned alongside the final answer. This is useful for math and logic puzzles.

In [None]:
# @title Reasoning Mode
import json

response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": """A bat and a ball cost $1.10 in total.
The bat costs $1.00 more than the ball.
How much does the ball cost?

Think step by step and show your work."""}
    ],
    max_tokens=2048,
    extra_body={
        "reasoning": {
            "effort": "high"
        }
    }
)

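# Check for reasoning content
msg = response.choices[0].message

# OpenRouter normally returns the reasoning text in `reasoning`; some providers
# use `reasoning_content` instead, so check both fields.
reasoning = getattr(msg, "reasoning", None) or getattr(msg, "reasoning_content", None)

# Display reasoning if available
if reasoning:
    print("💭 REASONING:")
    print("=" * 50)
    print(reasoning)
    print("\n")

print("✅ ANSWER:")
print("=" * 50)
print(msg.content)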
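
To judge the model's answer, the puzzle can be checked by hand. If the ball costs x, the bat costs x + 1.00, so 2x + 1.00 = 1.10 and x = 0.05. A minimal sketch that verifies this with exact decimal arithmetic (the variable names are illustrative):

```python
from decimal import Decimal

total = Decimal("1.10")       # bat + ball
difference = Decimal("1.00")  # bat - ball

# Subtracting the second equation from the first gives 2 * ball = total - difference
ball = (total - difference) / 2
bat = ball + difference

# Both original constraints must hold
assert bat + ball == total
assert bat - ball == difference
print(f"ball = ${ball}, bat = ${bat}")  # ball = $0.05, bat = $1.05
```

The intuitive but wrong answer of $0.10 fails the second check, since the bat would then cost only $0.90 more than the ball.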

---
## 💻 Example 3: Code Generation

Qwen3.5 has strong code generation abilities. Let's test it with a real-world coding task.

In [None]:
# @title Code Generation
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are an expert Python developer. Write clean, production-quality code with comments."},
        {"role": "user", "content": """Write a Python class called 'TaskManager' that:
1. Can add tasks with a title, priority (high/medium/low), and due date
2. Can list all tasks sorted by priority
3. Can mark tasks as complete
4. Can show only pending tasks
5. Include a __str__ method for nice output

Show usage examples at the end."""}
    ],
    max_tokens=4096
)

print(response.choices[0].message.content)

---
## 🛠️ Example 4: Tool Calling / Function Calling

Qwen3.5 excels at agentic tasks. Let's test its ability to understand when and how to call tools.

In [None]:
# @title Define Tools
import json  # needed below to parse tool-call arguments, even if Example 2 was skipped

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the current stock price for a given ticker symbol",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {"type": "string", "description": "Stock ticker symbol (e.g., AAPL, GOOGL)"},
                    "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"], "description": "Currency for the price"}
                },
                "required": ["ticker"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email to a recipient",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string", "description": "Recipient email address"},
                    "subject": {"type": "string", "description": "Email subject"},
                    "body": {"type": "string", "description": "Email body text"}
                },
                "required": ["to", "subject", "body"]
            }
        }
    }
]

# Ask a question that requires tool use
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Check the stock price of Apple and Tesla, then email me a summary at john@example.com"}
    ],
    tools=tools,
    max_tokens=1024
)

message = response.choices[0].message

if message.tool_calls:
    print(f"🔧 Tool Calls Detected: {len(message.tool_calls)}\n")
    for i, tc in enumerate(message.tool_calls, 1):
        args = json.loads(tc.function.arguments)
        print(f"  Call {i}: {tc.function.name}")
        print(f"  Args:   {json.dumps(args, indent=2)}")
        print()
else:
    print(message.content)
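
The cell above only prints the tool calls the model requests; it never runs them. In a full agent loop, each call would be executed locally and its result sent back as a `tool` message. The following is a minimal offline sketch under stated assumptions: `get_stock_price` and `send_email` are stubs returning made-up values, and `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, not part of any SDK.

```python
import json
from types import SimpleNamespace


def get_stock_price(ticker: str, currency: str = "USD") -> dict:
    # Stub: a real implementation would query a market-data API
    return {"ticker": ticker, "price": 123.45, "currency": currency}


def send_email(to: str, subject: str, body: str) -> dict:
    # Stub: a real implementation would call an email service
    return {"status": "queued", "to": to}


# Maps the tool names declared in `tools` to local Python callables
TOOL_REGISTRY = {"get_stock_price": get_stock_price, "send_email": send_email}


def run_tool_calls(tool_calls) -> list[dict]:
    """Execute each requested call and build the `tool` messages to send back."""
    results = []
    for tc in tool_calls:
        func = TOOL_REGISTRY.get(tc.function.name)
        if func is None:
            output = {"error": f"unknown tool: {tc.function.name}"}
        else:
            output = func(**json.loads(tc.function.arguments))
        results.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": json.dumps(output),
        })
    return results


# Fake tool call shaped like an entry of `message.tool_calls`
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_stock_price", arguments='{"ticker": "AAPL"}'),
)
tool_messages = run_tool_calls([fake_call])
print(tool_messages)
```

In a live session, these messages would be appended after the assistant message that contains the tool calls, and the conversation sent back through another `client.chat.completions.create` call so the model can compose its final answer.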

---
## 🌊 Example 5: Streaming Real-time Output

Stream responses token-by-token for a better user experience with long outputs.

In [None]:
# @title Streaming Response
print("📡 Streaming from Qwen3.5 397B:\n")

stream = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Write a short, compelling story (under 200 words) about a robot that learns to paint."}
    ],
    max_tokens=1024,
    stream=True
)

full_response = ""
for chunk in stream:
    # Some providers send chunks with no choices (e.g., a final usage chunk); skip them
    if chunk.choices and chunk.choices[0].delta.content:
        content = chunk.choices[0].delta.content
        full_response += content
        print(content, end="", flush=True)

print(f"\n\n✅ Stream complete! ({len(full_response)} chars)")
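
For streamed output, how quickly the first token arrives often matters more than total time. The sketch below measures time to first token alongside the collected text. `consume_stream` is a hypothetical helper, and the fake chunks only mimic the `choices[0].delta.content` shape used above so that the example runs offline.

```python
import time
from types import SimpleNamespace


def consume_stream(stream):
    """Return (full_text, seconds_to_first_token) for an iterable of chat chunks."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for chunk in stream:
        # Skip chunks with no choices or no text (e.g., role-only or usage chunks)
        if not chunk.choices or not chunk.choices[0].delta.content:
            continue
        if first_token_at is None:
            first_token_at = time.perf_counter() - start
        parts.append(chunk.choices[0].delta.content)
    return "".join(parts), first_token_at


def fake_chunk(text):
    # Stand-in for a streamed chunk; not a real SDK object
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


chunks = [fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("Hello"), fake_chunk(", world")]
text, ttft = consume_stream(chunks)
print(text)
```

With a real stream, the timer should start just before the `client.chat.completions.create(..., stream=True)` call rather than inside the helper, so that connection setup is included in the measurement.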

---
## ⚔️ Example 6: Head-to-Head, Qwen3.5 vs Kimi K2.5

Let's compare **Qwen3.5 397B** (via OpenRouter) with **Kimi K2.5** (via NVIDIA) on the same prompts to see how they stack up!

Both are massive MoE models:
- **Qwen3.5**: 397B total / 17B active params
- **Kimi K2.5**: 1T total params, 384 experts

In [8]:
# @title Setup Both Clients
!pip install -q langchain-nvidia-ai-endpoints

In [None]:
# @title Configure Kimi K2.5 Client (NVIDIA)
from langchain_nvidia_ai_endpoints import ChatNVIDIA

kimi_client = ChatNVIDIA(
    model="moonshotai/kimi-k2.5",
    api_key=userdata.get("NVIDIA_API_KEY"),
    temperature=0.7,
    max_tokens=2048,
)

print("✅ Kimi K2.5 (NVIDIA) ready!")
print("✅ Qwen3.5 (OpenRouter) ready!")

In [None]:
# @title 🏆 Comparison Test 1: Logical Reasoning
import time

REASONING_PROMPT = """There are 3 boxes. One has only apples, one has only oranges,
and one has both. All boxes are labeled WRONG. You can pick one fruit from
one box. Which box do you pick from, and how do you figure out all labels?
Explain step by step."""

print("=" * 60)
print("🧠 LOGICAL REASONING COMPARISON")
print("=" * 60)

# --- Qwen 3.5 ---
print("\n📘 QWEN 3.5 397B:")
print("-" * 40)
start = time.time()
qwen_response = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": REASONING_PROMPT}],
    max_tokens=2048
)
qwen_time = time.time() - start
qwen_answer = qwen_response.choices[0].message.content
print(qwen_answer)
print(f"\n⏱️ Time: {qwen_time:.2f}s")

# --- Kimi K2.5 ---
print("\n\n📗 KIMI K2.5:")
print("-" * 40)
start = time.time()
kimi_response = kimi_client.invoke(REASONING_PROMPT)
kimi_time = time.time() - start
print(kimi_response.content)
print(f"\n⏱️ Time: {kimi_time:.2f}s")

# Summary
print("\n" + "=" * 60)
print("📊 SPEED COMPARISON:")
print(f"   Qwen 3.5: {qwen_time:.2f}s")
print(f"   Kimi K2.5: {kimi_time:.2f}s")
faster = "Qwen 3.5" if qwen_time < kimi_time else "Kimi K2.5"
diff = abs(qwen_time - kimi_time)
print(f"   🏆 {faster} was {diff:.2f}s faster")
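
To judge which model reasons correctly, it helps to know the answer to the three-box puzzle independently. The brute-force sketch below enumerates every way the boxes can be mislabeled and checks which box gives a conclusive single draw. The names `wrong_labelings` and `is_decisive` are illustrative, and the sketch assumes a draw from the mixed box can yield either fruit.

```python
from itertools import permutations

LABELS = ("apples", "oranges", "both")
# Fruits that could be drawn from a box with the given true contents
POSSIBLE_DRAWS = {
    "apples": {"apple"},
    "oranges": {"orange"},
    "both": {"apple", "orange"},
}


def wrong_labelings():
    """All assignments of true contents where every label is wrong."""
    return [
        contents
        for contents in permutations(LABELS)
        if all(actual != label for actual, label in zip(contents, LABELS))
    ]


def is_decisive(picked_label: str) -> bool:
    """True if any fruit drawn from the box with this label pins down all contents."""
    idx = LABELS.index(picked_label)
    consistent = {}  # drawn fruit -> labelings that could have produced it
    for contents in wrong_labelings():
        for fruit in POSSIBLE_DRAWS[contents[idx]]:
            consistent.setdefault(fruit, []).append(contents)
    return all(len(options) == 1 for options in consistent.values())


result = {label: is_decisive(label) for label in LABELS}
print(result)  # {'apples': False, 'oranges': False, 'both': True}
```

Only the box labeled "both" is decisive: it must hold a single fruit type, so one draw identifies it, and the remaining two labels follow because each is also known to be wrong.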

In [None]:
# @title 🏆 Comparison Test 2: Code Generation

CODE_PROMPT = """Write a Python function called 'analyze_text' that takes a string and returns a dictionary with:
- word_count: total words
- char_count: total characters (no spaces)
- sentence_count: number of sentences
- most_common_word: the most frequent word
- avg_word_length: average word length

Include type hints and a docstring."""

print("=" * 60)
print("💻 CODE GENERATION COMPARISON")
print("=" * 60)

# --- Qwen 3.5 ---
print("\n📘 QWEN 3.5 397B:")
print("-" * 40)
start = time.time()
qwen_code = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "Write clean Python code only. No extra explanation."},
        {"role": "user", "content": CODE_PROMPT}
    ],
    max_tokens=2048
)
qwen_code_time = time.time() - start
print(qwen_code.choices[0].message.content)
print(f"\n⏱️ Time: {qwen_code_time:.2f}s")

# --- Kimi K2.5 ---
print("\n\n📗 KIMI K2.5:")
print("-" * 40)
# Import before starting the timer so the import is not counted as model latency
from langchain_core.messages import SystemMessage, HumanMessage
start = time.time()
kimi_code = kimi_client.invoke([
    SystemMessage(content="Write clean Python code only. No extra explanation."),
    HumanMessage(content=CODE_PROMPT)
])
kimi_code_time = time.time() - start
print(kimi_code.content)
print(f"\n⏱️ Time: {kimi_code_time:.2f}s")

# Summary
print("\n" + "=" * 60)
print("📊 SPEED COMPARISON:")
print(f"   Qwen 3.5: {qwen_code_time:.2f}s")
print(f"   Kimi K2.5: {kimi_code_time:.2f}s")
faster = "Qwen 3.5" if qwen_code_time < kimi_code_time else "Kimi K2.5"
diff = abs(qwen_code_time - kimi_code_time)
print(f"   🏆 {faster} was {diff:.2f}s faster")
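
To check the generated functions rather than just read them, it helps to have a baseline to compare against. The following is one possible reading of the prompt, not the expected answer from either model. It assumes that words are runs of letters, digits, and apostrophes compared case-insensitively, and that sentences end at `.`, `!`, or `?`.

```python
import re
from collections import Counter


def analyze_text(text: str) -> dict:
    """Return basic statistics for `text` (one interpretation of the prompt)."""
    # Assumption: a word is a run of letters, digits, or apostrophes
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    # Assumption: sentences are separated by '.', '!' or '?'
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    return {
        "word_count": len(words),
        "char_count": sum(1 for ch in text if not ch.isspace()),
        "sentence_count": len(sentences),
        "most_common_word": counts.most_common(1)[0][0] if counts else None,
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
    }


stats = analyze_text("The cat sat. The dog ran!")
print(stats)
```

Running each model's function on the same inputs and comparing the dictionaries shows where their interpretations differ, for example on punctuation or empty strings.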

In [None]:
# @title 🏆 Comparison Test 3: Creative Writing

CREATIVE_PROMPT = "Write a haiku about artificial intelligence, then explain the meaning behind it in one sentence."

print("=" * 60)
print("✍️ CREATIVE WRITING COMPARISON")
print("=" * 60)

# --- Qwen 3.5 ---
print("\n📘 QWEN 3.5 397B:")
print("-" * 40)
start = time.time()
qwen_creative = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": CREATIVE_PROMPT}],
    max_tokens=512
)
qwen_creative_time = time.time() - start
print(qwen_creative.choices[0].message.content)
print(f"\n⏱️ Time: {qwen_creative_time:.2f}s")

# --- Kimi K2.5 ---
print("\n\n📗 KIMI K2.5:")
print("-" * 40)
start = time.time()
kimi_creative = kimi_client.invoke(CREATIVE_PROMPT)
kimi_creative_time = time.time() - start
print(kimi_creative.content)
print(f"\n⏱️ Time: {kimi_creative_time:.2f}s")

print("\n" + "=" * 60)
print("📊 SPEED COMPARISON:")
print(f"   Qwen 3.5: {qwen_creative_time:.2f}s")
print(f"   Kimi K2.5: {kimi_creative_time:.2f}s")
faster = "Qwen 3.5" if qwen_creative_time < kimi_creative_time else "Kimi K2.5"
diff = abs(qwen_creative_time - kimi_creative_time)
print(f"   🏆 {faster} was {diff:.2f}s faster")
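
Each test above times a single call, so network jitter or provider queueing can easily decide the "winner". A sketch of a helper that repeats a call and reports the median duration follows. `time_call` is a hypothetical name, and the lambda is a stand-in workload rather than a real API call.

```python
import statistics
import time


def time_call(fn, repeats: int = 3):
    """Run `fn` several times; return (last_result, median_seconds)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic clock, suited to measuring intervals
        result = fn()
        durations.append(time.perf_counter() - start)
    return result, statistics.median(durations)


# Stand-in workload; in the notebook this could wrap a chat completion call
result, median_s = time_call(lambda: sum(range(100_000)), repeats=5)
print(f"median: {median_s:.4f}s")
```

In the notebook, a call such as `time_call(lambda: kimi_client.invoke(CREATIVE_PROMPT))` would replace the manual `start`/`time.time()` bookkeeping, at the cost of extra API requests per test.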

---
## 📈 Token Usage & Summary

In [None]:
# @title Final Summary
print("📊 Overall Comparison Summary")
print("=" * 60)
print(f"\n{'Test':<25} {'Qwen 3.5':>12} {'Kimi K2.5':>12} {'Winner':>12}")
print("-" * 60)

tests = [
    ("Logical Reasoning", qwen_time, kimi_time),
    ("Code Generation", qwen_code_time, kimi_code_time),
    ("Creative Writing", qwen_creative_time, kimi_creative_time),
]

qwen_wins = 0
kimi_wins = 0

for name, qt, kt in tests:
    winner = "Qwen 3.5" if qt < kt else "Kimi K2.5"
    if qt < kt:
        qwen_wins += 1
    else:
        kimi_wins += 1
    print(f"{name:<25} {qt:>10.2f}s {kt:>10.2f}s {winner:>12}")

print("-" * 60)
print(f"\n🏆 Speed Winner: {'Qwen 3.5' if qwen_wins > kimi_wins else 'Kimi K2.5'} ({max(qwen_wins, kimi_wins)}/{len(tests)} tests faster)")
print("\n💡 Note: Speed ≠ Quality! Read both responses to judge which model gave better answers.")
print("\n📝 Pricing:")
print("   • Qwen 3.5: Check https://openrouter.ai/qwen/qwen3.5-397b-a17b")
print("   • Kimi K2.5: FREE via NVIDIA (build.nvidia.com)")
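
The summary above compares wall-clock time only. Because the two models can produce answers of very different lengths, tokens per second is often a fairer measure. The sketch below derives it from the `usage` object on an OpenAI-style chat completion response (`prompt_tokens`, `completion_tokens`, `total_tokens`). `throughput` is a hypothetical helper, the response is a stand-in with made-up numbers, and some providers may omit `usage` entirely.

```python
from types import SimpleNamespace


def throughput(response, elapsed_s: float):
    """Return completion tokens per second, or None if usage is unavailable."""
    usage = getattr(response, "usage", None)
    if usage is None or elapsed_s <= 0:
        return None
    return usage.completion_tokens / elapsed_s


# Stand-in for a chat completion response; the numbers are invented
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=40, completion_tokens=500, total_tokens=540)
)
print(f"{throughput(fake_response, 12.5):.1f} tokens/s")  # 40.0 tokens/s
```

For the Qwen calls this could be used as `throughput(qwen_response, qwen_time)`. The LangChain response objects returned by `kimi_client.invoke` report usage in a different shape, so the helper applies as written only to responses from the OpenAI client.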

---
## 🎯 Key Takeaways

### Qwen3.5 397B-A17B Strengths:
- ✅ Hybrid architecture = efficient inference
- ✅ Strong reasoning with thinking mode
- ✅ Excellent code generation
- ✅ Native vision-language support
- ✅ Agent-ready with tool calling

### Comparison Notes:
| Feature | Qwen 3.5 | Kimi K2.5 |
|---------|----------|----------|
| **Total Params** | 397B | 1T |
| **Active Params** | 17B | 32B (384 experts) |
| **Architecture** | Linear Attn + MoE | MoE + MoonViT |
| **Vision** | ✅ Native VLM | ✅ VLM |
| **Provider** | OpenRouter | NVIDIA (free) |
| **Open Source** | ✅ | ✅ |

### 📚 Resources:
- 🔗 [Qwen3.5 on OpenRouter](https://openrouter.ai/qwen/qwen3.5-397b-a17b)
- 🔗 [Qwen3.5 on HuggingFace](https://huggingface.co/Qwen/Qwen3.5-397B-A17B)
- 🔗 [Kimi K2.5 on NVIDIA](https://build.nvidia.com/moonshotai/kimi-k2.5)

---
*Notebook by @BuildFastWithAI*