# Open Notebook & Additional Resources

<a target="_blank" href="https://colab.research.google.com/github/Nicolepcx/ORM-self-improving-ai-agents-course/blob/main/hands_on/session_05_HANDS_ON_MCP_server_Rube.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
<a target="_blank" href="https://learning.oreilly.com/library/view/ai-agents-the/0642572247775/">
  <img src="https://img.shields.io/badge/AI%20Agents%20Book-Read%20on%20O'Reilly-d40101?style=flat" alt="AI Agents Book – Read on O'Reilly"/>
</a>





<font color="red" size="10">
<b>HANDS-ON TIME: 15 mins</b>
</font>

# About this Notebook

## What to do first (read this now)

⏱ **Hands-on time: 15 minutes**

During the live session, **do not try to run everything** in this notebook.

**What you should do during the course:**

1. Scroll to the **Hands-on** section
2. Adjust:
   - `USE_CASE`
   - optionally `NUM_SCENARIOS` and `MAX_TURNS`
3. Run the notebook
4. Inspect the generated scenarios and tool workflows

That's it.

---

## What this notebook is really about (read later)

This notebook introduces **MCP-based tool ecosystems** and how agents reason over **large, heterogeneous tool sets**.

Instead of hard coding tools, the agent discovers them dynamically via MCP servers such as:

* Rube (Composio)
* Exa (integrated in the solution notebook)

The focus is not execution performance, but **tool discovery, planning, and orchestration**.

---

## The core idea you are practicing

This notebook shows how agents move from:

* “I have one tool” <br>
→ to
* “I have hundreds of tools, how do I decide?”

The important concepts are:

* **Tool discovery** via MCP
* **Tool selection** based on a natural language use case
* **Workflow planning** before execution
* **Multi-turn tool reasoning**

The scenario generator (`generate_scenarios`) is the bridge between:
* raw tool schemas
* and realistic agent tasks
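
To make the "raw tool schema" side of that bridge concrete, here is a minimal sketch of how a schema can be reduced to the name/description/parameters shape that this notebook later passes to `generate_scenarios`. The `EXAMPLE_SEND_EMAIL` tool and the `to_tool_info` and `summarize_tool` helpers are hypothetical and exist only for illustration; they are not part of ART or Rube.

```python
from typing import Any

# Hypothetical tool, shaped like the JSON Schema an MCP server exposes.
raw_tool: dict[str, Any] = {
    "name": "EXAMPLE_SEND_EMAIL",
    "description": "Send an email to a recipient.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "to": {"type": "string"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "body"],
    },
}


def to_tool_info(tool: dict[str, Any]) -> dict[str, Any]:
    """Reduce a raw schema to a name/description/parameters dict."""
    return {
        "name": tool["name"],
        "description": tool.get("description") or "",
        "parameters": tool.get("inputSchema") or {},
    }


def summarize_tool(tool_info: dict[str, Any]) -> str:
    """Build a one-line summary; required parameters are marked with '*'."""
    params = tool_info["parameters"]
    required = set(params.get("required", []))
    names = [
        f"{name}*" if name in required else name
        for name in params.get("properties", {})
    ]
    return f"{tool_info['name']}({', '.join(names)}): {tool_info['description']}"


tool_info = to_tool_info(raw_tool)
print(summarize_tool(tool_info))
```

A summary line like this is roughly the level of detail a scenario generator needs in order to invent tasks that the tool could plausibly solve.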

---

## What the TODO actually controls

The TODO section controls the **agent's intent**.

By changing `USE_CASE`, you change:
* which tools are discovered
* how the agent frames the problem
* what kind of workflow is produced

By changing:
* `NUM_SCENARIOS`
* `MAX_TURNS`

you control the **breadth vs depth** of exploration.

This is exactly how you would prototype real-world agent behavior before training.
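
As a rough illustration of the breadth vs depth trade-off, the sketch below computes a worst-case number of model calls for a given setting. It assumes one scenario-generation call plus at most `max_turns` chat calls per scenario. That is a simplification for budgeting, not a description of how this notebook is wired (here the multi-turn loop runs once, for `USE_CASE`), and `max_model_calls` is a hypothetical helper.

```python
def max_model_calls(num_scenarios: int, max_turns: int) -> int:
    """Worst case: one generation call plus max_turns chat calls per scenario."""
    if num_scenarios < 0 or max_turns < 0:
        raise ValueError("num_scenarios and max_turns must be non-negative")
    return 1 + num_scenarios * max_turns


# Compare a baseline with a "broader" and a "deeper" setting.
for num_scenarios, max_turns in [(5, 5), (15, 5), (5, 10)]:
    calls = max_model_calls(num_scenarios, max_turns)
    print(f"scenarios={num_scenarios} turns={max_turns} -> up to {calls} calls")
```

Under this assumption, tripling the number of scenarios costs more than doubling the turn limit, which is worth keeping in mind when experimenting within the 15-minute window.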

---

## One takeaway

If you remember one thing:

**Modern agents don't just call tools.  
They discover, plan, and compose tools dynamically.**

MCP is the infrastructure layer that makes that possible.


This notebook is the *Hands-on* for Session 5 of the O'Reilly Media live event *Develop Self-Improving AI Agents with Reinforcement Learning* by
[Nicole Koenigstein](https://www.linkedin.com/in/nicole-koenigstein/).

# Timer

In [None]:
SET_TIMER = False  # False, True, or minutes as a number

import requests, types
url = "https://raw.githubusercontent.com/Nicolepcx/ORM-self-improving-ai-agents-course/main/timer.py"

timer = types.ModuleType("timer")
exec(requests.get(url).text, timer.__dict__)

timer.start_exam_timer(enabled=SET_TIMER, minutes=15, warn_minutes=5)

## Installation


In [1]:
%%capture
import os

if "COLAB_" not in "".join(os.environ.keys()):
    !uv pip install openpipe-art[backend]==0.4.11 tenacity "mcp>=1.11.0" "gql<4" aiohttp --prerelease allow --no-cache-dir
else:
    try:
        import numpy
        get_numpy = f"numpy=={numpy.__version__}"
    except Exception:
        get_numpy = "numpy"
    try:
        import subprocess
        is_t4 = "Tesla T4" in str(subprocess.check_output(["nvidia-smi"]))
    except Exception:
        is_t4 = False
    get_vllm, get_triton = (
        ("vllm==0.9.2", "triton==3.2.0") if is_t4 else ("vllm", "triton")
    )
    !uv pip install --upgrade \
        openpipe-art[backend]==0.4.11 tenacity pillow==11.3.0 protobuf==5.29.5 {get_vllm} {get_numpy} --prerelease allow --no-cache-dir
    !uv pip install -qqq {get_triton}


# Set API Keys

<font color="red" size="5">
<b>Attention: required for the notebook to work</b>
</font>
<br>

You need an `OPENROUTER_API_KEY` ([get your key here](https://openrouter.ai/)) and a `RUBE_TOKEN` ([get it here](https://rube.app/)).

In [2]:
import os
from dotenv import load_dotenv


load_dotenv()

OPENROUTER_API_KEY = os.getenv('OPENROUTER_API_KEY')
RUBE_TOKEN = os.getenv('RUBE_TOKEN')

# Initialize variables that might be used later
mcp_scenarios = []



# Imports

In [None]:
import json
import random
import time
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

import art
from art.trajectories import Trajectory, Choice, TrajectoryGroup
from art.mcp import generate_scenarios
from art.mcp.generate_scenarios import preview_scenarios
from art.utils.logging import info, ok, step, warn, err

from openai import OpenAI

from contextlib import asynccontextmanager
from contextlib import AsyncExitStack

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client





## What are Agent Trajectories?

In ART, a **trajectory** represents a complete interaction sequence between an agent and its environment:

- **Messages**: User inputs, system prompts, assistant responses
- **Tool Calls**: Function invocations with arguments
- **Tool Results**: Responses from tool executions
- **Rewards**: Scores from RULER or other evaluators
- **Metrics**: Metadata about the interaction (turns, success, etc.)

Multiple trajectories are collected and grouped, then RULER performs **relative ranking** to determine which trajectories are better, enabling the model to learn from comparisons rather than absolute scores.
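
To see why comparisons within a group are useful, consider the minimal sketch below. It is not RULER's or ART's implementation: `SketchTrajectory` is a hypothetical stand-in for a trajectory, the rewards are made-up numbers, and centering rewards on the group mean is an assumption borrowed from group-relative training methods.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class SketchTrajectory:
    """Illustrative stand-in for a trajectory; not ART's Trajectory class."""
    messages: list[dict] = field(default_factory=list)
    reward: float = 0.0


def group_relative_advantages(group: list[SketchTrajectory]) -> list[float]:
    """Center rewards on the group mean so only relative quality matters."""
    baseline = mean(t.reward for t in group)
    return [t.reward - baseline for t in group]


# Three attempts at the same scenario with made-up rewards.
group = [SketchTrajectory(reward=r) for r in (0.25, 0.5, 0.75)]
advantages = group_relative_advantages(group)

# Shifting every reward by the same constant leaves the advantages unchanged.
shifted = [SketchTrajectory(reward=t.reward + 1.0) for t in group]
shifted_advantages = group_relative_advantages(shifted)

print(advantages)
print(shifted_advantages)
```

Because only differences within the group survive, the scorer does not need a calibrated absolute scale; it only needs to order the attempts consistently.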


### Integrating Better Tools for Deep Search

**Why multiple tools matter:**
- Different search engines have different strengths (web search, code search, academic papers, etc.)
- Combining tools enables more comprehensive information gathering
- A "deep search agent" should leverage multiple sources for thorough research

**Available MCP Servers:**
- **Smithery**: Provides access to various tools and APIs
- **Rube**: New app from Composio offering integrations with 100+ tools and services via MCP
- **Exa**: Already integrated - provides web and code search

**Think about:**
- Which tools complement Exa's search capabilities?
- How should the agent decide which tool to use for different query types?
- What's the best strategy for combining results from multiple sources?
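
One naive baseline for the second question is to score each tool by word overlap between the query and the tool's name and description. The sketch below shows that idea under stated assumptions: the tool catalog is hypothetical, and `tokenize` and `rank_tools` are illustrative helpers. An LLM-driven agent would normally make this choice from the tool schemas instead.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_tools(query: str, tools: list[dict[str, str]]) -> list[tuple[str, int]]:
    """Score each tool by how many query words appear in its name or description."""
    query_words = tokenize(query)
    scored = [
        (
            tool["name"],
            len(query_words & tokenize(tool["name"] + " " + tool["description"])),
        )
        for tool in tools
    ]
    # sorted() is stable, so ties keep their catalog order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical tool catalog, for illustration only.
tools = [
    {"name": "web_search", "description": "Search the web for pages"},
    {"name": "code_search", "description": "Search source code repositories"},
    {"name": "send_email", "description": "Send an email message"},
]

ranked = rank_tools("search source code repositories for a parser", tools)
for name, score in ranked:
    print(f"{name}: {score}")
```

Keyword overlap breaks down quickly with hundreds of similarly described tools, which is part of the motivation for dedicated discovery tools such as `RUBE_SEARCH_TOOLS`.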


In [3]:
# Rube MCP Setup
# Rube is the new app from Composio for MCP connections
# No installation needed - just use the MCP URL and token
# Visit https://rube.app to generate your signed token

In [4]:
# @title Rube MCP Setup

# Rube MCP Configuration
# 1. Visit https://rube.app to generate your signed token
# 2. Set the token in your environment or .env file as RUBE_TOKEN
# 3. The MCP URL is fixed: https://rube.app/mcp

RUBE_MCP_URL = "https://rube.app/mcp"
RUBE_TOKEN = os.getenv('RUBE_TOKEN', '')

if not RUBE_TOKEN:
    warn("RUBE_TOKEN not set. Please:")
    warn("1. Visit https://rube.app")
    warn("2. Generate a signed token")
    warn("3. Set it in your .env file as RUBE_TOKEN or export RUBE_TOKEN=your_token")
    RUBE_MCP_URL = None
else:
    ok("Rube MCP configured")
    print(f"MCP URL: {RUBE_MCP_URL}")
    print("Token: [configured]")


[09:26:26] OK    Rube MCP configured
MCP URL: https://rube.app/mcp
Token: [configured]


# Hands-on

<font color="red" size="10">
<b>TODO: </b>
</font>
<br>
<font color="black" size="5">
<b>Try generating different scenarios by adjusting <code>USE_CASE</code>. You can also experiment with <code>NUM_SCENARIOS</code> or <code>MAX_TURNS</code>.</b>
</font>



In [12]:
NUM_SCENARIOS = 5 # Small number for demo
MAX_TURNS = 5 # Small number for demo
USE_CASE = "I want to search for tools that can help me send an email. Show me what tools are available and what the workflow would look like."

In [5]:
# @title Rube MCP Helper Functions

# Rube MCP session helper (similar to Exa MCP setup)
if RUBE_MCP_URL and RUBE_TOKEN:
    RUBE_HEADERS = {"Authorization": f"Bearer {RUBE_TOKEN}"}

    @asynccontextmanager
    async def rube_mcp_session():
        """Create a Rube MCP session with authentication."""
        async with streamablehttp_client(RUBE_MCP_URL, headers=RUBE_HEADERS) as (read, write, _):
            async with ClientSession(read, write) as session:
                await session.initialize()
                yield session

    async def list_rube_tools_and_resources():
        """List available tools and resources from Rube MCP server."""
        async with rube_mcp_session() as session:
            tools = await session.list_tools()
            try:
                resources = await session.list_resources()
            except Exception:
                class _Empty:
                    resources = []
                resources = _Empty()
            return tools, resources

    ok("Rube MCP helpers configured")
else:
    warn("Rube MCP not configured - skipping Rube examples")


[09:26:30] OK    Rube MCP helpers configured


In [6]:
# @title OpenRouter + Rube MCP: Comprehensive Workflow Example

def convert_mcp_tool_to_openai(tool) -> dict[str, Any]:
    """Convert MCP tool to OpenAI function format."""
    schema = tool.inputSchema or {"type": "object", "properties": {}, "required": []}
    if schema.get("type") != "object":
        schema = {"type": "object", "properties": {}, "required": []}
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description or "",
            "parameters": schema,
        },
    }


def _extract_streams(transport):
    """Extract read/write streams from transport."""
    if isinstance(transport, tuple):
        if len(transport) >= 2:
            return transport[0], transport[1]
        raise ValueError(f"Unexpected tuple transport len={len(transport)}")
    if hasattr(transport, "read_stream") and hasattr(transport, "write_stream"):
        return transport.read_stream, transport.write_stream
    raise ValueError(f"Unsupported transport shape: {type(transport)}")


async def list_rube_tools_example(mcp_server_url: str, headers: dict):
    """Example: List all available Rube tools."""
    exit_stack = AsyncExitStack()

    transport = await exit_stack.enter_async_context(streamablehttp_client(mcp_server_url, headers=headers))
    read_stream, write_stream = _extract_streams(transport)

    session = await exit_stack.enter_async_context(ClientSession(read_stream, write_stream))
    await session.initialize()

    tools_resp = await session.list_tools()
    tools = tools_resp.tools or []

    print(f"📋 Found {len(tools)} Rube tools:\n")

    # Group tools by category
    tool_categories = {
        "Search & Discovery": ["RUBE_SEARCH_TOOLS", "RUBE_FIND_RECIPE", "RUBE_GET_TOOL_SCHEMAS"],
        "Execution": ["RUBE_MULTI_EXECUTE_TOOL", "RUBE_EXECUTE_RECIPE"],
        "Workflow Planning": ["RUBE_CREATE_PLAN"],
        "Connection Management": ["RUBE_MANAGE_CONNECTIONS"],
        "Recipe Management": ["RUBE_CREATE_UPDATE_RECIPE", "RUBE_GET_RECIPE_DETAILS", "RUBE_MANAGE_RECIPE_SCHEDULE"],
        "Remote Processing": ["RUBE_REMOTE_WORKBENCH", "RUBE_REMOTE_BASH_TOOL"],
    }

    for category, prefixes in tool_categories.items():
        category_tools = [t for t in tools if any(t.name.startswith(prefix) for prefix in prefixes)]
        if category_tools:
            print(f"  {category}:")
            for tool in category_tools:
                print(f"    - {tool.name}")
                if tool.description:
                    desc = tool.description.split('\n')[0][:80]
                    print(f"      {desc}...")
            print()

    await exit_stack.aclose()
    return tools


async def rube_workflow_example(mcp_server_url: str, headers: dict, use_case: str):
    """Example: Complete Rube workflow - search tools, plan, and execute."""
    exit_stack = AsyncExitStack()

    transport = await exit_stack.enter_async_context(streamablehttp_client(mcp_server_url, headers=headers))
    read_stream, write_stream = _extract_streams(transport)

    session = await exit_stack.enter_async_context(ClientSession(read_stream, write_stream))
    await session.initialize()

    tools_resp = await session.list_tools()
    openai_tools = [convert_mcp_tool_to_openai(t) for t in (tools_resp.tools or [])]

    llm = OpenAI(api_key=OPENROUTER_API_KEY, base_url="https://openrouter.ai/api/v1")

    system_prompt = """You are an AI assistant with access to Rube MCP tools for workflow automation.

Rube provides access to 500+ apps including:
- Communication: Slack, Gmail, Outlook, Teams, WhatsApp
- Development: GitHub, GitLab, Jira
- Productivity: Notion, Google Workspace, Microsoft 365
- Social: X (Twitter), Instagram, TikTok
- AI Tools: Various AI services

Key Rube tools:
1. RUBE_SEARCH_TOOLS - Always call this first to discover available tools for a use case
2. RUBE_MANAGE_CONNECTIONS - Connect to apps (OAuth, API keys, etc.)
3. RUBE_CREATE_PLAN - Generate execution plans for complex workflows
4. RUBE_MULTI_EXECUTE_TOOL - Execute tools in parallel
5. RUBE_FIND_RECIPE - Find existing reusable recipes
6. RUBE_EXECUTE_RECIPE - Run a saved recipe

Workflow pattern:
1. Search for tools using RUBE_SEARCH_TOOLS
2. Check connections, connect if needed via RUBE_MANAGE_CONNECTIONS
3. For complex tasks, create a plan with RUBE_CREATE_PLAN
4. Execute tools via RUBE_MULTI_EXECUTE_TOOL
5. Optionally save as recipe for reuse

Always use the session_id from RUBE_SEARCH_TOOLS in subsequent calls."""

    messages: list[dict[str, Any]] = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": use_case},
    ]

    print(f"🚀 Starting Rube workflow for: {use_case}\n")

    # Multi-turn conversation to complete workflow
    max_turns = MAX_TURNS
    for turn in range(max_turns):
        resp = llm.chat.completions.create(
            model="openai/o4-mini",
            messages=messages,
            tools=openai_tools,
            tool_choice="auto" if openai_tools else None,
        )

        msg = resp.choices[0].message
        # Drop None-valued fields so the message can be replayed to the API as-is
        messages.append(msg.model_dump(exclude_none=True))

        if msg.content:
            print(f"💬 Assistant: {msg.content}\n")

        if msg.tool_calls:
            print(f"🔧 Tool calls ({len(msg.tool_calls)}):")
            for tc in msg.tool_calls:
                tool_name = tc.function.name
                tool_args = json.loads(tc.function.arguments or "{}")

                # Truncate long arguments for display
                args_str = json.dumps(tool_args, indent=2)
                if len(args_str) > 200:
                    args_str = args_str[:200] + "..."

                print(f"  - {tool_name}")
                print(f"    Args: {args_str}")

                try:
                    tool_result = await session.call_tool(tool_name, tool_args)
                    # MCP returns a list of content blocks, not a string;
                    # join the text parts so the result can be printed and sent to the LLM.
                    result_text = "\n".join(
                        getattr(block, "text", str(block))
                        for block in (tool_result.content or [])
                    )

                    # Truncate long results for display only
                    preview = result_text[:200] + ("..." if len(result_text) > 200 else "")
                    print(f"    Result: {preview}\n")

                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "name": tool_name,
                        "content": result_text,
                    })
                except Exception as e:
                    error_msg = f"Error: {str(e)}"
                    print(f"    {error_msg}\n")
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "name": tool_name,
                        "content": error_msg,
                    })
        else:
            # No more tool calls, workflow complete
            break

    await exit_stack.aclose()
    return messages[-1].get("content", "")


# Example 1: List available Rube tools
if RUBE_MCP_URL and RUBE_TOKEN:
    print("=" * 60)
    print("Example 1: Listing Rube Tools")
    print("=" * 60)
    try:
        tools = await list_rube_tools_example(RUBE_MCP_URL, RUBE_HEADERS)
        ok(f"Successfully listed {len(tools)} tools")
    except Exception as e:
        warn(f"Failed to list tools: {e}")
else:
    warn("Rube MCP not configured. Please set RUBE_TOKEN.")


# Example 2: Complete workflow (requires OpenRouter API key)
if RUBE_MCP_URL and RUBE_TOKEN and OPENROUTER_API_KEY:
    print("\n" + "=" * 60)
    print("Example 2: Rube Workflow - Search for Tools")
    print("=" * 60)

    # Simple example: search for tools to send an email
    try:
        result = await rube_workflow_example(
            RUBE_MCP_URL,
            RUBE_HEADERS,
            use_case = USE_CASE
        )
        print("\n✅ Workflow completed!")
    except Exception as e:
        warn(f"Workflow failed: {e}")
else:
    info("\n💡 To run the full workflow example:")
    info("   1. Set RUBE_TOKEN in your .env file")
    info("   2. Set OPENROUTER_API_KEY in your .env file")
    info("   3. Re-run this cell")


Example 1: Listing Rube Tools
📋 Found 12 Rube tools:

  Search & Discovery:
    - RUBE_FIND_RECIPE
      ...
    - RUBE_SEARCH_TOOLS
      ...
    - RUBE_GET_TOOL_SCHEMAS
      Retrieve input schemas for tools by slug. Returns complete parameter definitions...

  Execution:
    - RUBE_MULTI_EXECUTE_TOOL
      ...
    - RUBE_EXECUTE_RECIPE
      Executes a Recipe...

  Workflow Planning:
    - RUBE_CREATE_PLAN
      ...

  Connection Management:
    - RUBE_MANAGE_CONNECTIONS
      ...

  Recipe Management:
    - RUBE_MANAGE_RECIPE_SCHEDULE
      ...
    - RUBE_CREATE_UPDATE_RECIPE
      Convert executed workflow into a reusable notebook. Only use when workflow is co...
    - RUBE_GET_RECIPE_DETAILS
      ...

  Remote Processing:
    - RUBE_REMOTE_BASH_TOOL
      ...
    - RUBE_REMOTE_WORKBENCH
      ...

[09:26:32] OK    Successfully listed 12 tools

Example 2: Rube Workflow - Search for Tools
🚀 Starting Rube workflow for: I want to search for tools that can help me send

In [7]:
# @title Generate Scenarios with Rube Tools

# Simple example: Generate scenarios for a selected Rube tool

if RUBE_MCP_URL and RUBE_TOKEN and OPENROUTER_API_KEY:
    # Step 1: Get available tools from Rube
    tools_result, resources_result = await list_rube_tools_and_resources()

    # Step 2: Choose a tool (or tools) to generate scenarios for
    # For this example, let's pick a simple tool like RUBE_SEARCH_TOOLS
    # You can modify this to select different tools
    selected_tool_name = "RUBE_SEARCH_TOOLS"  # Change this to any Rube tool

    # Find the selected tool
    selected_tool = None
    for tool in (tools_result.tools or []):
        if tool.name == selected_tool_name:
            selected_tool = tool
            break

    if selected_tool:
        info(f"Selected tool: {selected_tool_name}")

        # Step 3: Prepare tool info for scenario generation
        tool_info = {
            "name": selected_tool.name,
            "description": selected_tool.description or "",
            "parameters": selected_tool.inputSchema or {}
        }

        # Step 4: Generate scenarios (similar to Exa example)
        try:
            scenario_collection = await generate_scenarios(
                tools=[tool_info],
                resources=[],
                num_scenarios=NUM_SCENARIOS,
                show_preview=True,
                generator_model="openai/o4-mini",
                generator_api_key=OPENROUTER_API_KEY,
            )

            scenarios = [{"task": s.task, "difficulty": s.difficulty} for s in scenario_collection.scenarios]
            ok(f"Generated {len(scenarios)} scenarios for {selected_tool_name}")

            info("Sample scenarios:")
            preview_scenarios(scenarios, n=min(3, len(scenarios)))

        except Exception as e:
            warn(f"Scenario generation failed: {e}")
    else:
        warn(f"Tool '{selected_tool_name}' not found. Available tools:")
        for tool in (tools_result.tools or [])[:10]:  # Show first 10
            print(f"  - {tool.name}")
else:
    warn("Rube MCP or OpenRouter not configured. Please set RUBE_TOKEN and OPENROUTER_API_KEY.")

[09:26:46] INFO  Selected tool: RUBE_SEARCH_TOOLS
[09:26:46] OK    Using model: openai/o4-mini
[09:26:46] INFO  Available: 1 tool(s), 0 resource(s).
[09:26:46] STEP  Preparing prompt & JSON schema &
[09:26:46] STEP  Calling model: openai/o4-mini &
[09:27:04] OK    Model responded in 17.25s.
[09:27:04] INFO  Raw content length: 1316 chars.
[09:27:04] OK    Parsed 5 scenario(s) successfully.
[09:27:04] INFO  Difficulty distribution:
   1/5:   1  █
   2/5:   1  █
   3/5:   1  █
   4/5:   1  █
   5/5:   1  █
   1. Search X (formerly Twitter) for tweets mentioning our brand in the last 24 hours and generate a summary report of the ke…  (difficulty 1/5)
   2. Send a templated welcome email via Gmail to a list of new hires, schedule an onboarding meeting for each in Google Calen…  (difficulty 2/5)
   3. Retrieve al

In [9]:
# @title Collect Tools from MCP
search_tools = []
all_tools = {}

# Collect Rube tools
if RUBE_MCP_URL and RUBE_TOKEN:
    try:
        tools_result, resources_result = await list_rube_tools_and_resources()
        rube_tools = []
        for tool in (tools_result.tools or []):
            tool_dict = {
                "name": tool.name,
                "description": tool.description or "",
                "parameters": tool.inputSchema or {},
                "source": "rube"
            }
            search_tools.append(tool_dict)
            rube_tools.append(tool_dict)
        all_tools["rube"] = rube_tools
        if rube_tools:
            ok(f"Collected {len(rube_tools)} tool(s) from Rube")
    except Exception as e:
        warn(f"Failed to collect Rube tools: {e}")
        all_tools["rube"] = []

if search_tools:
    info(f"Total tools collected: {len(search_tools)} from {len(all_tools)} source(s)")
else:
    warn("No tools collected. Make sure the Rube MCP server is configured.")

[09:27:47] OK    Collected 12 tool(s) from Rube
[09:27:47] INFO  Total tools collected: 12 from 1 source(s)


In [11]:
# @title Generate Scenarios with Enhanced Tool Set

async def generate_scenarios_with_enhanced_tools(search_tools: List[Dict], num_scenarios: int = 15):
    """
    Generate scenarios that leverage the enhanced tool set.
    These scenarios should encourage the agent to use multiple tools effectively.
    """
    if not OPENROUTER_API_KEY:
        warn("OPENROUTER_API_KEY required for scenario generation")
        return []

    # Organize tools by source for scenario generation
    tools_by_source = {}
    for tool in search_tools:
        source = tool["source"]
        if source not in tools_by_source:
            tools_by_source[source] = []
        tools_by_source[source].append({
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("parameters", {})
        })

    info(f"Generating {num_scenarios} scenarios with enhanced tool set...")
    info(f"Tools available from: {', '.join(tools_by_source.keys())}")

    # Combine all tools for scenario generation
    all_tools_flat = [
        {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("parameters", {})
        }
        for tool in search_tools
    ]

    try:
        scenario_collection = await generate_scenarios(
            tools=all_tools_flat,
            resources=[],  # Can add resources if available
            num_scenarios=num_scenarios,
            show_preview=False,
            generator_model="openai/o4-mini",
            generator_api_key=OPENROUTER_API_KEY,
        )

        enhanced_scenarios = [
            {
                "task": s.task,
                "difficulty": s.difficulty,
                "tools_available": len(all_tools_flat)
            }
            for s in scenario_collection.scenarios
        ]

        ok(f"Generated {len(enhanced_scenarios)} enhanced scenarios")

        info("\nSample enhanced scenarios (encouraging multi-tool usage):")
        preview_scenarios(enhanced_scenarios, n=min(5, len(enhanced_scenarios)))

        return enhanced_scenarios

    except Exception as e:
        warn(f"Scenario generation failed: {e}")
        return []

# Generate enhanced scenarios
if search_tools and OPENROUTER_API_KEY:
    enhanced_scenarios = await generate_scenarios_with_enhanced_tools(
        search_tools,
        num_scenarios=15
    )


[09:29:17] INFO  Generating 15 scenarios with enhanced tool set...
[09:29:17] INFO  Tools available from: rube
[09:29:17] OK    Using model: openai/o4-mini
[09:29:17] INFO  Available: 12 tool(s), 0 resource(s).
[09:29:17] STEP  Preparing prompt & JSON schema &
[09:29:17] STEP  Calling model: openai/o4-mini &
[09:29:47] OK    Model responded in 29.85s.
[09:29:47] INFO  Raw content length: 3528 chars.
[09:29:47] OK    Parsed 15 scenario(s) successfully.
[09:29:47] INFO  Difficulty distribution:
   1/5:   3  ███
   2/5:   2  ██
   3/5:   3  ███
   4/5:   3  ███
   5/5:   4  ████
[09:29:47] OK    Generated 15 scenarios in 29.98s total.
[09:29:47] OK    Generated 15 enhanced scenarios
[09:29:47] INFO  
Sample enhanced scenarios (encouraging multi-tool usage):
   1. Locate and run the recipe that

### Summary: Improving Your Agent

**What we've covered:**

1. **More Comparisons** ✅
   - Generate 5-10+ trajectories per scenario (not just 2-3)
   - Use RULER to rank trajectories relatively
   - Vary strategies (different models, prompts, tool usage patterns)
   - More comparisons = better learning signal for the model

2. **Better Tools** ✅
   - Integrate multiple MCP servers (Exa + Rube)
   - Select complementary tools for deep search
   - Generate scenarios that encourage multi-tool usage
   - Train agent to intelligently combine results


**Remember**: RULER learns what "good" means from your specific MCP servers and scenarios - no labeled data required!


### Additional Resources

- **ART Documentation**: https://art.openpipe.ai
- **RULER Guide**: https://art.openpipe.ai/fundamentals/ruler
- **MCP Protocol**: https://modelcontextprotocol.io
- **Rube**: https://rube.app (generate your signed token for MCP access)
