<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Gen AI Experiments](https://img.shields.io/badge/Gen%20AI%20Experiments-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://github.com/buildfastwithai/gen-ai-experiments)
[![Gen AI Experiments GitHub](https://img.shields.io/github/stars/buildfastwithai/gen-ai-experiments?style=for-the-badge&logo=github&color=gold)](http://github.com/buildfastwithai/gen-ai-experiments)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/[NOTEBOOK_ID])

## Master Generative AI in 8 Weeks
**What You'll Learn:**
- Master cutting-edge AI tools & frameworks
- 6 weeks of hands-on, project-based learning
- Weekly live mentorship sessions
- No coding experience required
- Join the Innovation Community

Transform your AI ideas into reality through hands-on projects and expert mentorship.

[Start Your Journey](https://www.buildfastwithai.com/genai-course)

---

# 🚀 Step 3.5 Flash - Complete Testing Notebook

## Model Overview

**Step 3.5 Flash** is StepFun's most capable open-source foundation model, engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency.

### Key Specifications

| Specification | Value |
|--------------|-------|
| **Architecture** | Sparse Mixture of Experts (MoE) |
| **Total Parameters** | 196B |
| **Active Parameters** | ~11B (per token) |
| **Context Window** | 256K tokens |
| **Vocabulary Size** | 128,896 tokens |
| **Layers** | 45-layer Transformer (4,096 hidden dim) |
| **License** | Apache 2.0 |

### 🌟 Key Capabilities

1. **Deep Reasoning at Speed**: Generation throughput of 100–300 tok/s (peaks at 350 tok/s)
2. **Robust Coding & Agents**: 74.4% on SWE-bench Verified, 51.0% on Terminal-Bench 2.0
3. **Efficient Long Context**: 256K context window with 3:1 Sliding Window Attention
4. **Tool Calling & Function Calling**: Native support for function calling and tool use
5. **Reasoning Tokens**: Support for a step-by-step reasoning process
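
The tool-calling capability listed above is not exercised elsewhere in this notebook. As a minimal sketch, assuming the OpenAI-compatible `tools` request format, a tool definition and a local dispatcher could look like the following. The `get_weather` function, its schema, and the `dispatch_tool_call` helper are hypothetical and used only for illustration; no API call is made here.

```python
import json

# Hypothetical local tool; a real one would call a weather service.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny", "temp_c": 22}

# Tool schema in the OpenAI-compatible "tools" format (JSON Schema parameters).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the named tool with JSON-encoded arguments and return a JSON result."""
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    kwargs = json.loads(arguments_json)
    return json.dumps(REGISTRY[name](**kwargs))

# Simulated tool call, shaped like tool_call.function.name / .arguments
print(dispatch_tool_call("get_weather", '{"city": "Paris"}'))
```

In a real request, `TOOLS` would be passed as `tools=TOOLS` to `client.chat.completions.create`, and each entry in `response.choices[0].message.tool_calls` would be dispatched this way, with the result sent back in a `"role": "tool"` message.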

### MoE Architecture Highlights

- **288 routed experts per layer** + 1 shared expert (always active)
- **Top-8 experts** selected per token
- **3-way Multi-Token Prediction (MTP-3)** for faster generation
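
To make the routing numbers above concrete, here is a toy sketch of top-k expert selection in pure Python. It is illustrative only and is not StepFun's implementation: the random router scores, the softmax over the selected experts, and the function name `route_token` are all assumptions.

```python
import heapq
import math
import random

NUM_ROUTED_EXPERTS = 288  # routed experts per layer (from the list above)
TOP_K = 8                 # experts selected per token

def route_token(router_logits: list[float], top_k: int = TOP_K) -> dict[int, float]:
    """Pick the top_k highest-scoring experts and normalize their weights.

    Softmax over only the selected logits is one common choice; the actual
    gating function used by the model is not stated in this notebook.
    """
    top = heapq.nlargest(top_k, enumerate(router_logits), key=lambda pair: pair[1])
    max_logit = max(score for _, score in top)
    exps = {idx: math.exp(score - max_logit) for idx, score in top}
    total = sum(exps.values())
    return {idx: value / total for idx, value in exps.items()}

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
weights = route_token(logits)

print(f"Selected experts: {sorted(weights)}")
print(f"Routed experts used per token: {len(weights)} of {NUM_ROUTED_EXPERTS}")
print(f"Weights sum to: {sum(weights.values()):.3f}")
```

The shared expert is not part of this selection, since it processes every token regardless of the router scores.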

---

## 📦 Section 1: Installation & Setup

In [2]:
# Install required packages
!pip install -q openai

In [3]:
# Import libraries and setup client
from google.colab import userdata
from openai import OpenAI
import json
import time

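# Initialize OpenRouter client for Step 3.5 Flash (FREE tier)
# The key is read from Colab secrets instead of being hardcoded. The secret name
# "OPENROUTER_API_KEY" is an assumption; use the name you stored your key under.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=userdata.get("OPENROUTER_API_KEY")
)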

# Model identifier for Step 3.5 Flash (free tier)
MODEL_NAME = "stepfun/step-3.5-flash:free"

print("✅ Setup complete!")
print(f"📍 Using model: {MODEL_NAME}")
print(f"🌐 Provider: OpenRouter (Free Tier)")

✅ Setup complete!
📍 Using model: stepfun/step-3.5-flash:free
🌐 Provider: OpenRouter (Free Tier)
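
Outside Colab, `google.colab.userdata` is not available. As a sketch, the key could instead be read from an environment variable. The variable name `OPENROUTER_API_KEY` and the helper `load_api_key` are assumptions, not part of the original notebook.

```python
import os

def load_api_key(env_var: str = "OPENROUTER_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the client."
        )
    return key

# Demo with a throwaway variable and a placeholder value (not a real key).
os.environ["DEMO_API_KEY"] = "placeholder-key"
print(len(load_api_key("DEMO_API_KEY")))
```

The returned string can then be passed as `api_key=` when constructing the `OpenAI` client.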


---

## 🎯 Section 2: Basic Usage - Hello World

In [4]:
# Basic "Hello World" test
start_time = time.time()

response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "system",
            "content": "You are an AI chat assistant provided by StepFun. You are good at Chinese, English, and many other languages."
        },
        {
            "role": "user",
            "content": "Hello! Introduce yourself briefly."
        }
    ]
)

elapsed = time.time() - start_time

print("🤖 Response:")
print(response.choices[0].message.content)
print(f"\n⏱️ Response time: {elapsed:.2f}s")

# Display usage stats if available
if response.usage:
    print(f"📊 Tokens - Prompt: {response.usage.prompt_tokens}, Completion: {response.usage.completion_tokens}")

🤖 Response:
Hello! I’m **Step**, an AI model developed by **StepFun** (阶跃星辰). I’m designed to understand and generate natural language, reason logically, and even interpret images. I can help with a wide range of tasks - from answering questions and writing creatively, to analyzing data and solving problems.

I support multiple languages (including Chinese, English, and more), and I aim to be honest, helpful, and respectful in our conversations.

Feel free to ask me anything - I’m here to assist! 😊

⏱️ Response time: 2.60s
📊 Tokens - Prompt: 47, Completion: 220
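
The usage stats above can be turned into a rough end-to-end throughput figure. This sketch uses the numbers from the recorded output (220 completion tokens in 2.60 s); the helper name `tokens_per_second` is hypothetical. Because the elapsed time includes network latency and time to first token, the result is expected to be lower than the model's raw generation throughput.

```python
def tokens_per_second(completion_tokens: int, elapsed_seconds: float) -> float:
    """End-to-end throughput: completion tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completion_tokens / elapsed_seconds

# Values taken from the recorded output above.
rate = tokens_per_second(220, 2.60)
print(f"{rate:.1f} tok/s")  # 84.6 tok/s
```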


---

## 🧠 Section 3: Reasoning Capabilities

Step 3.5 Flash supports reasoning-enabled responses that show a step-by-step thinking process.

In [None]:
# Testing reasoning capabilities with a math problem
reasoning_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user",
            "content": """Solve this step by step:

A farmer has chickens and cows. If the animals have a total of 50 heads and 140 legs, how many chickens and how many cows are there?

Show your reasoning process clearly."""
        }
    ],
    temperature=0.7,
    max_tokens=1024
)

print("🧮 Math Problem Solution:")
print("="*50)
print(reasoning_response.choices[0].message.content)
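
For checking the model's answer, the puzzle reduces to two linear equations: chickens + cows = 50 (heads) and 2 × chickens + 4 × cows = 140 (legs). A small sketch that solves it directly (the function name `solve_heads_legs` is hypothetical):

```python
def solve_heads_legs(heads: int, legs: int) -> tuple[int, int]:
    """Return (chickens, cows) for animals with 2 and 4 legs respectively."""
    # Substituting chickens = heads - cows into 2*chickens + 4*cows = legs
    # gives cows = (legs - 2*heads) / 2.
    extra_legs = legs - 2 * heads
    if extra_legs < 0 or extra_legs % 2 != 0:
        raise ValueError("no whole-number solution")
    cows = extra_legs // 2
    chickens = heads - cows
    if chickens < 0:
        raise ValueError("no non-negative solution")
    return chickens, cows

chickens, cows = solve_heads_legs(50, 140)
print(f"Chickens: {chickens}, Cows: {cows}")  # Chickens: 30, Cows: 20
```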

In [6]:
# Testing complex reasoning - Logic puzzle
logic_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user",
            "content": """Solve this logic puzzle:

Five houses in a row are each painted a different color. In each house lives a person of a different nationality.
Each person drinks a different beverage, smokes a different brand of cigarettes, and keeps a different pet.

Given clues:
1. The Brit lives in the red house.
2. The Swede keeps dogs.
3. The Dane drinks tea.
4. The green house is immediately to the left of the white house.
5. The owner of the green house drinks coffee.

Question: Based on these clues, what can you deduce about the arrangement?"""
        }
    ],
    temperature=0.5,
    max_tokens=1500
)

print("🧩 Logic Puzzle Analysis:")
print("="*50)
print(logic_response.choices[0].message.content)

🧩 Logic Puzzle Analysis:
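
The recorded output above is empty after the header. One possible cause with reasoning models is that the token budget is spent on reasoning before any final answer is written, leaving `message.content` empty; this is a guess, not something the output confirms. The sketch below reads the message defensively. It assumes the provider may attach the reasoning text as a `reasoning` attribute on the message, and the helper name `describe_message` is hypothetical.

```python
from types import SimpleNamespace

def describe_message(message) -> str:
    """Return the final answer, or fall back to any reasoning text."""
    content = getattr(message, "content", None) or ""
    reasoning = getattr(message, "reasoning", None) or ""
    if content.strip():
        return content
    if reasoning.strip():
        return "[no final answer; reasoning only]\n" + reasoning
    return "[empty response]"

# Stand-in objects shaped like response.choices[0].message
answered = SimpleNamespace(content="Green is left of white.", reasoning="...")
truncated = SimpleNamespace(content=None, reasoning="Clue 4 means green and white are adjacent")

print(describe_message(answered))
print(describe_message(truncated))
```

Checking whether `response.choices[0].finish_reason` equals `"length"` is another way to detect a reply that was cut off by `max_tokens`.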



---

## 💻 Section 4: Coding Capabilities

Step 3.5 Flash scores **74.4% on SWE-bench Verified**, which suggests it is well suited to coding tasks.

In [None]:
# Code generation test
code_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "system",
            "content": "You are an expert Python developer. Write clean, well-documented code with type hints."
        },
        {
            "role": "user",
            "content": """Create a Python class for a simple in-memory cache with the following features:
1. Set a value with an optional TTL (time-to-live in seconds)
2. Get a value (return None if expired or not found)
3. Delete a value
4. Clear all values
5. Get cache stats (hits, misses, size)

Include example usage."""
        }
    ],
    temperature=0.3,
)

print("💻 Generated Python Code:")
print("="*50)
print(code_response.choices[0].message.content)
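
For comparison with the model's output, here is one minimal sketch of a cache meeting the five requirements in the prompt. It is a reference point rather than the expected answer; the class name `TTLCache` and the injectable `clock` parameter are choices made for this example.

```python
import time
from typing import Any, Callable, Optional

class TTLCache:
    """In-memory cache with optional per-key TTL and hit/miss counters."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[Any, Optional[float]]] = {}
        self._hits = 0
        self._misses = 0

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._store[key] = (value, expires_at)

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is not None:
            value, expires_at = entry
            if expires_at is None or self._clock() < expires_at:
                self._hits += 1
                return value
            del self._store[key]  # drop the expired entry lazily
        self._misses += 1
        return None

    def delete(self, key: str) -> bool:
        return self._store.pop(key, None) is not None

    def clear(self) -> None:
        self._store.clear()

    def stats(self) -> dict[str, int]:
        return {"hits": self._hits, "misses": self._misses, "size": len(self._store)}

# Example usage with a fake clock
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("a", 1, ttl=10)
cache.set("b", 2)
print(cache.get("a"))   # 1
now[0] = 11.0           # advance past the TTL of "a"
print(cache.get("a"))   # None
print(cache.stats())    # {'hits': 1, 'misses': 1, 'size': 1}
```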

In [None]:
# Code review and bug fixing
buggy_code = """
def find_duplicates(lst):
    duplicates = []
    for i in range(len(lst)):
        for j in range(len(lst)):
            if lst[i] == lst[j]:
                duplicates.append(lst[i])
    return duplicates
"""

review_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user",
            "content": f"""Review this Python code, identify the bugs/issues, and provide a corrected version:

```python
{buggy_code}
```

Explain each issue found."""
        }
    ],
    temperature=0.3,
    max_tokens=1500
)

print("🔍 Code Review:")
print("="*50)
print(review_response.choices[0].message.content)
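
For reference when reading the model's review: the nested loops in `buggy_code` compare every element with itself (when `i == j`), so every item is reported, and true duplicates are appended several times. One possible corrected version, as a sketch that preserves first-seen order:

```python
def find_duplicates(lst):
    """Return each value that appears more than once, in first-seen order."""
    seen = set()
    reported = set()
    duplicates = []
    for item in lst:
        if item in seen and item not in reported:
            duplicates.append(item)
            reported.add(item)
        seen.add(item)
    return duplicates

print(find_duplicates([1, 2, 2, 3, 3, 3, 4]))  # [2, 3]
```

This runs in O(n) time for hashable items, compared with O(n^2) for the nested loops.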

In [None]:
# Algorithm implementation
algo_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user",
            "content": """Implement a Trie (Prefix Tree) data structure in Python with the following methods:
1. insert(word) - Insert a word
2. search(word) - Return True if word exists
3. starts_with(prefix) - Return True if any word starts with prefix
4. get_all_words_with_prefix(prefix) - Return all words with given prefix

Include time complexity comments."""
        }
    ],
    temperature=0.3,
    max_tokens=2000
)

print("🌳 Trie Implementation:")
print("="*50)
print(algo_response.choices[0].message.content)
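
As a reference for judging the model's answer, one compact sketch of the requested Trie follows. It is one possible implementation, not the expected output; the `_find` helper is an addition not named in the prompt.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.is_word = False

class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Insert a word. O(len(word))."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix: str):
        """Return the node at the end of prefix, or None. O(len(prefix))."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word: str) -> bool:
        """True if the exact word was inserted. O(len(word))."""
        node = self._find(word)
        return node is not None and node.is_word

    def starts_with(self, prefix: str) -> bool:
        """True if at least one stored word begins with prefix. O(len(prefix))."""
        node = self._find(prefix)
        return node is not None and (node.is_word or bool(node.children))

    def get_all_words_with_prefix(self, prefix: str) -> list[str]:
        """All stored words beginning with prefix, sorted.

        O(len(prefix) + size of the subtree) plus the cost of sorting.
        """
        start = self._find(prefix)
        if start is None:
            return []
        words = []
        stack = [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_word:
                words.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(words)

trie = Trie()
for w in ["car", "card", "care", "dog"]:
    trie.insert(w)
print(trie.search("car"))                      # True
print(trie.search("ca"))                       # False
print(trie.starts_with("ca"))                  # True
print(trie.get_all_words_with_prefix("car"))   # ['car', 'card', 'care']
```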

---

## 🌊 Section 5: Streaming Responses

In [None]:
# Streaming response example
print("📖 Streaming Story Generation:")
print("="*50)

stream = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user",
            "content": "Write a short story (about 200 words) about an AI learning to appreciate art."
        }
    ],
    stream=True,
    temperature=0.8,
    max_tokens=500
)

full_response = ""
for chunk in stream:
    # Some providers emit chunks with an empty choices list (e.g. a final usage chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        content = chunk.choices[0].delta.content
        full_response += content
        print(content, end="", flush=True)

print("\n\n✅ Streaming complete!")
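
When streaming, time to first token is often as informative as total time. The sketch below accumulates a stream while recording both. `collect_stream` is a hypothetical helper, and the fake chunks only mimic the `chunk.choices[0].delta.content` shape used above so the example runs without an API call.

```python
import time
from types import SimpleNamespace

def collect_stream(stream):
    """Join streamed text and report (text, seconds_to_first_token, total_seconds)."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for chunk in stream:
        # Skip chunks that carry no choices (e.g. a trailing usage-only chunk).
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            if first_token_at is None:
                first_token_at = time.perf_counter() - start
            parts.append(piece)
    total = time.perf_counter() - start
    return "".join(parts), first_token_at, total

def fake_chunk(text):
    """Build a stand-in object shaped like a streaming chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

fake_stream = [
    fake_chunk(None),             # role-only chunk with no text
    fake_chunk("Hello"),
    fake_chunk(", "),
    fake_chunk("art"),
    SimpleNamespace(choices=[]),  # chunk with no choices
]
text, ttft, total = collect_stream(fake_stream)
print(text)  # Hello, art
```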

---

## 🌍 Section 6: Multilingual Capabilities

In [None]:
# Test multilingual capabilities
languages = [
    ("Chinese", "Áî®‰∏≠ÊñáÂÜô‰∏ÄÈ¶ñÂÖ≥‰∫é‰∫∫Â∑•Êô∫ËÉΩÁöÑÁü≠ËØó"),
    ("Japanese", "Êó•Êú¨Ë™û„ÅßAI„Å´„Å§„ÅÑ„Å¶„ÅÆ‰ø≥Âè•„ÇíÊõ∏„ÅÑ„Å¶„Åè„Å†„Åï„ÅÑ"),
    ("Spanish", "Escribe un breve p√°rrafo sobre el futuro de la IA en espa√±ol"),
    ("French", "√âcrivez trois avantages de l'intelligence artificielle en fran√ßais")
]

print("🌍 Multilingual Test:")
print("="*50)

for lang_name, prompt in languages:
    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
    )
    print(f"\n📌 {lang_name}:")
    print("-" * 40)
    print(response.choices[0].message.content)
    print()

---

## 💬 Section 7: Multi-Turn Conversation

In [None]:
# Multi-turn conversation example
conversation = [
    {
        "role": "system",
        "content": "You are a helpful AI tutor teaching Python programming. Be encouraging and provide examples."
    }
]

user_messages = [
    "Hi! I want to learn about Python decorators. What are they?",
    "Can you show me a simple example?",
    "How would I create a decorator that logs function execution time?"
]

print("👨‍🏫 Python Tutorial Conversation:")
print("="*50)

for user_msg in user_messages:
    conversation.append({"role": "user", "content": user_msg})

    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=conversation,
        temperature=0.5,
        max_tokens=800
    )

    assistant_msg = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": assistant_msg})

    print(f"\n👤 User: {user_msg}")
    print(f"\n🤖 Assistant: {assistant_msg}")
    print("-"*40)
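
Each turn resends the whole `conversation` list, so the prompt grows with every exchange. Below is a minimal sketch of one way to cap it, keeping the system message plus the most recent messages. The helper name `trim_history` and the `max_messages` limit are assumptions for illustration.

```python
def trim_history(conversation, max_messages=6):
    """Keep system messages plus the last max_messages non-system messages."""
    system = [m for m in conversation if m["role"] == "system"]
    others = [m for m in conversation if m["role"] != "system"]
    if max_messages <= 0:
        return system
    return system + others[-max_messages:]

# Build a sample history: one system message and five user/assistant pairs.
history = [{"role": "system", "content": "You are a tutor."}]
for i in range(5):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, max_messages=4)
print(len(trimmed))            # 5
print(trimmed[1]["content"])   # question 3
```

Using an even `max_messages` keeps user and assistant messages paired, so the trimmed history still starts with a user turn.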

---

## 📝 Section 8: Long Context Demo (256K)

In [None]:
# Demonstrating document analysis as a small stand-in for long-context processing
# Build a short multi-chapter text to analyze (raise the range to test longer inputs)

chapters = []
for i in range(1, 6):
    chapters.append(f"""
Chapter {i}: The Development of Artificial Intelligence

In this chapter, we explore the evolution of AI from its early beginnings to the present day.
The field of AI was formally founded in 1956 at the Dartmouth Conference.
Early AI focused on symbolic reasoning and expert systems.
The advent of machine learning in the 1980s marked a paradigm shift.
Deep learning, emerging in the 2010s, revolutionized the field.
Today, large language models represent the cutting edge of AI research.
The future promises even more advanced AI systems with enhanced reasoning capabilities.
""")

long_document = "\n".join(chapters)

# Analyze the document
analysis_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user",
            "content": f"""Here is a multi-chapter document:

{long_document}

Please provide:
1. A summary of the main themes
2. Key milestones mentioned
3. The overall narrative arc"""
        }
    ],
    temperature=0.4,
    max_tokens=800
)

print("📚 Long Context Analysis:")
print("="*50)
print(f"Input document: {len(long_document)} characters")
# content can be None (e.g. if the reply was cut off), so fall back to an empty string
print("\n" + (analysis_response.choices[0].message.content or ""))
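
The document built above is only a few thousand characters, far below the 256K-token window. To judge how much text would actually approach that limit, a rough estimate can help. This sketch assumes about 4 characters per token, a common rule of thumb for English text that does not reflect this model's tokenizer, and treats 256K as 256,000 tokens; both are assumptions, as are the helper names.

```python
CONTEXT_WINDOW_TOKENS = 256_000  # assumed reading of "256K"
CHARS_PER_TOKEN = 4              # rough heuristic, not the model's tokenizer

def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Very rough token estimate based on character count."""
    return -(-len(text) // chars_per_token)  # ceiling division

def context_usage(text: str) -> float:
    """Estimated fraction of the context window the text would occupy."""
    return estimate_tokens(text) / CONTEXT_WINDOW_TOKENS

sample = "The field of AI was formally founded in 1956 at the Dartmouth Conference.\n" * 100
print(f"Estimated tokens: {estimate_tokens(sample)}")
print(f"Estimated share of context window: {context_usage(sample):.2%}")
```

For an accurate count, the `prompt_tokens` value in `response.usage` reports what the provider actually measured for a request.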

---

## ⚡ Section 9: Performance Benchmarking

In [None]:
import time
from statistics import mean, stdev

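# Performance benchmark: times full request round trips, so the reported
# throughput includes network and queueing latency, not just generation speed.
def benchmark_request(prompt, n_runs=5):
    times = []
    token_counts = []

    for _ in range(n_runs):
        start = time.perf_counter()  # monotonic clock, better suited to timing
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=200
        )
        elapsed = time.perf_counter() - start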
        times.append(elapsed)
        if response.usage:
            token_counts.append(response.usage.completion_tokens)

    return {
        "avg_time": mean(times),
        "std_time": stdev(times) if len(times) > 1 else 0,
        "avg_tokens": mean(token_counts) if token_counts else 0,
        "avg_tok_per_sec": mean(token_counts) / mean(times) if token_counts else 0
    }

print("⚡ Performance Benchmark:")
print("="*50)
print("Running 5 iterations for each test...\n")

# Simple prompt
simple_result = benchmark_request("What is the capital of France?")
print(f"📝 Simple Query:")
print(f"   Avg Response Time: {simple_result['avg_time']:.2f}s (±{simple_result['std_time']:.2f}s)")
print(f"   Avg Tokens: {simple_result['avg_tokens']:.0f}")
print(f"   Throughput: {simple_result['avg_tok_per_sec']:.1f} tok/s")

# Complex prompt
complex_result = benchmark_request("Explain the key differences between REST and GraphQL APIs.")
print(f"\n📝 Complex Query:")
print(f"   Avg Response Time: {complex_result['avg_time']:.2f}s (±{complex_result['std_time']:.2f}s)")
print(f"   Avg Tokens: {complex_result['avg_tokens']:.0f}")
print(f"   Throughput: {complex_result['avg_tok_per_sec']:.1f} tok/s")
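
`avg_tok_per_sec` above divides mean tokens by mean time, which is not the same as averaging each run's throughput. The sketch below computes both from recorded runs so the difference is visible. `summarize_runs` is a hypothetical helper, and the sample numbers are made up for illustration, not measured results.

```python
from statistics import mean

def summarize_runs(times, tokens):
    """Compare aggregate throughput with the mean of per-run throughputs."""
    if len(times) != len(tokens) or not times:
        raise ValueError("times and tokens must be non-empty and the same length")
    return {
        "aggregate_tok_per_sec": sum(tokens) / sum(times),
        "mean_run_tok_per_sec": mean(n / t for n, t in zip(tokens, times)),
    }

# Illustrative values only (seconds, completion tokens).
result = summarize_runs(times=[1.0, 2.0, 4.0], tokens=[100, 100, 100])
print(f"Aggregate: {result['aggregate_tok_per_sec']:.1f} tok/s")     # 42.9 tok/s
print(f"Mean of runs: {result['mean_run_tok_per_sec']:.1f} tok/s")   # 58.3 tok/s
```

The aggregate figure weights slow runs more heavily, so it is usually the more conservative of the two.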

---

## 🏢 Section 10: Real-World Use Cases

In [None]:
# Use Case 1: Customer Support Agent
support_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "system",
            "content": """You are a helpful customer support agent for TechStore.
- Be polite and professional
- Provide clear solutions
- Offer alternatives when needed
- Acknowledge customer feelings"""
        },
        {
            "role": "user",
            "content": "I ordered a laptop 5 days ago and it still hasn't arrived. I'm very frustrated!"
        }
    ],
    temperature=0.5,
    max_tokens=400
)

print("🎧 Customer Support Agent:")
print("="*50)
print(support_response.choices[0].message.content)

In [None]:
# Use Case 2: Data Analysis Assistant
data_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "system",
            "content": "You are a data analysis expert. Provide insights and Python code for analysis."
        },
        {
            "role": "user",
            "content": """I have sales data with columns: date, product, quantity, price, region.
I want to:
1. Find top 5 products by revenue
2. Calculate month-over-month growth
3. Identify best performing region

Provide Python pandas code for this analysis."""
        }
    ],
    temperature=0.3,
    max_tokens=1200
)

print("📊 Data Analysis Assistant:")
print("="*50)
print(data_response.choices[0].message.content)

In [None]:
# Use Case 3: Technical Documentation Writer
docs_response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[
        {
            "role": "user",
            "content": """Write API documentation for this Python function:

```python
def process_batch(
    items: List[Dict[str, Any]],
    batch_size: int = 100,
    retry_count: int = 3,
    timeout: float = 30.0,
    callback: Optional[Callable[[int, int], None]] = None
) -> BatchResult:
```

Include: description, parameters, return value, examples, and possible exceptions."""
        }
    ],
    temperature=0.4,
    max_tokens=1000
)

print("📖 API Documentation:")
print("="*50)
print(docs_response.choices[0].message.content)

---

## 📋 Summary & Key Takeaways

### ✅ What We Tested

| Capability | Status | Notes |
|------------|--------|-------|
| Basic Chat | ✅ | Fast response times |
| Reasoning | ✅ | Strong math & logic |
| Coding | ✅ | Excellent code generation |
| Streaming | ✅ | Real-time token output |
| Multilingual | ✅ | Chinese, Japanese, Spanish, French |
| Long Context | ✅ | 256K token window |

### 🌟 Model Highlights

- **Efficiency**: 196B total params, only ~11B active per token
- **Speed**: 100-300 tok/s generation throughput
- **Context**: 256K token window with efficient SWA
- **Coding**: 74.4% on SWE-bench Verified
- **Agency**: 51.0% on Terminal-Bench 2.0

### 📚 Resources

- **OpenRouter**: https://openrouter.ai/stepfun/step-3.5-flash:free
- **ModelScope**: https://modelscope.cn/models/stepfun-ai/Step-3.5-Flash-FP8
- **Documentation**: https://static.stepfun.com/blog/step-3.5-flash/
- **License**: Apache 2.0

---

## 🚀 Next Steps

1. Explore more advanced agentic workflows
2. Build RAG applications with long context
3. Create custom tool integrations
4. Deploy in production with StepFun or OpenRouter APIs

---

<center>

*Built with ❤️ by @BuildFastWithAI*

[www.buildfastwithai.com](https://www.buildfastwithai.com)

</center>