# Chapter 11 — Multi-Agent Orchestration

## 1. Introduction

**Orchestration** is how your app coordinates agents: which agent runs, in what order, and how the next step is chosen. In practice you’ll mix two styles:

- **LLM-led orchestration** — let an agent plan, call tools, and hand off to specialists.
- **Code-led orchestration** — drive the flow with explicit Python logic (deterministic, cheap, fast).

Use both: start LLM-led for open-ended tasks, then “fence” the agent with code (fixed routes, retry limits, stop conditions) to improve reliability, cost, and latency.
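
As a rough illustration of what "fencing" could look like, the sketch below wraps an open-ended step in plain Python that enforces a timeout, a retry limit, and an output check. Every name here (`run_fenced`, `fake_agent`, `is_valid`, the default values) is hypothetical and not part of any SDK; `fake_agent` merely stands in for a real agent call.

```python
import asyncio
from typing import Awaitable, Callable


async def run_fenced(
    step: Callable[[str], Awaitable[str]],  # hypothetical: any async LLM-led step
    prompt: str,
    is_valid: Callable[[str], bool],        # deterministic check written in code
    max_attempts: int = 2,
    timeout_s: float = 5.0,
    fallback: str = "Sorry, I could not produce a reliable answer.",
) -> str:
    """Run an open-ended step while code enforces time, retries, and output checks."""
    for _ in range(max_attempts):
        try:
            out = await asyncio.wait_for(step(prompt), timeout=timeout_s)
        except asyncio.TimeoutError:
            continue  # too slow: this attempt is spent, try again
        if is_valid(out):
            return out
    # All attempts timed out or failed validation: return a fixed answer.
    return fallback


# Stand-in for an agent call (an assumption for this sketch).
async def fake_agent(prompt: str) -> str:
    return f"Answer to: {prompt}"


result = asyncio.run(
    run_fenced(fake_agent, "What is 2 + 2?", is_valid=lambda s: len(s) < 200)
)
print(result)
```

Inside a notebook, where an event loop is already running, you would likely write `await run_fenced(...)` instead of `asyncio.run(...)`. The point of the design is that the model only produces text; the limits and the fallback live in code you can read and test.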

---

## 2. LLM-led orchestration

An agent with **instructions + tools + handoffs** can plan: search, retrieve files, run code, and delegate to specialists.

**Works best for**: fuzzy goals, exploratory research, variable paths.  
**Keys to success**:
- Tight **instructions** (which tools exist, when and how to use them, and what is out of bounds).
- **Handoffs** to specialists (writer, planner, analyst, reviewer).
- **Self-reflection** loops (ask the model to check its own plan/output).
- **Observability** (hooks + tracing) and iterative prompt tuning.
- **Eval sets** to prevent regressions.
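
To make the eval-set idea concrete, here is a minimal sketch of a routing regression check. It assumes you can reduce your system to a function that maps a query to the name of the specialist that handled it. `EVAL_SET`, `evaluate`, and `stub_pick` are hypothetical names, the cases are made up for illustration, and `stub_pick` is a toy stand-in rather than a real agent.

```python
from typing import Callable

# Hypothetical eval cases: (user query, specialist we expect to handle it).
EVAL_SET = [
    ("Solve 2x + 3 = 11. What is x?", "Math Tutor"),
    ("When did the French Revolution begin?", "History Tutor"),
    ("What is 15% of 80?", "Math Tutor"),
]


def evaluate(pick: Callable[[str], str], cases=EVAL_SET) -> tuple[float, list[str]]:
    """Return (accuracy, failing queries) for a routing function."""
    failures = [query for query, expected in cases if pick(query) != expected]
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures


# Toy stand-in for the system under test (an assumption for this sketch):
# anything containing a digit is treated as math.
def stub_pick(query: str) -> str:
    return "Math Tutor" if any(ch.isdigit() for ch in query) else "History Tutor"


accuracy, failures = evaluate(stub_pick)
print(f"accuracy={accuracy:.0%} failures={failures}")
```

Re-running a check like this after every prompt change gives an early signal when a tweak that helps one query quietly breaks another.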



In [None]:
from agents import Agent, Runner, AgentHooks, RunContextWrapper, set_tracing_disabled
from agents.extensions.models.litellm_model import LitellmModel
from dotenv import load_dotenv
import os

# --- Model setup ---
load_dotenv()
api_key = os.getenv("API_KEY")

llm = LitellmModel(
    model="gpt-4.1-nano-2025-04-14",   
    api_key=api_key,
    base_url="https://api.openai.com/v1",
)

set_tracing_disabled(True)  # keep traces off for a minimal demo

# --- Hook: accept both positional/keyword forms to avoid signature errors ---
class HandoffLog(AgentHooks):
    async def on_handoff(self, *args, **kwargs):
        """
        SDKs may call on_handoff with positional OR keyword args depending on version.
        We'll extract safely either way.
        Expected names (when kw): context, agent (to), source (from)
        """
        # kw-first
        from_agent = kwargs.get("source")
        to_agent = kwargs.get("agent")

        # fallback to positionals if needed: (context, agent, source) OR variants
        if to_agent is None and len(args) >= 2:
            # args[1] might be context or agent depending on version; try to detect
            for a in args:
                if isinstance(a, Agent):
                    # first Agent we see (other than self) could be 'agent' or 'source'
                    if to_agent is None:
                        to_agent = a
                    elif from_agent is None:
                        from_agent = a

        if from_agent and to_agent:
            print(f"[HANDOFF] {from_agent.name} → {to_agent.name}")
        else:
            print("[HANDOFF] (could not parse from/to agent names for this SDK version)")

# --- Specialists ---
history_tutor = Agent(
    name="History Tutor",
    handoff_description="Specialist for historical questions.",
    instructions="Answer history questions clearly, with key events and context.",
    model=llm,
)

math_tutor = Agent(
    name="Math Tutor",
    handoff_description="Specialist for math questions.",
    instructions="Solve the math problem step-by-step, then give a concise final answer.",
    model=llm,
)

# --- Triage (LLM-led orchestration: it decides via handoffs) ---
triage = Agent(
    name="Triage Agent",
    instructions=(
        "Decide which specialist should handle the user's question. "
        "If it's math, hand off to Math Tutor; if it's history, hand off to History Tutor. "
        "You must NOT answer yourself—always hand off to the appropriate specialist."
    ),
    handoffs=[history_tutor, math_tutor],
    model=llm,
    hooks=HandoffLog(),  # prints handoff decisions
)

# --- Try two queries; triage chooses the path ---
res1 = await Runner.run(triage, "What is the capital of France?")
print("\n[ANSWER 1]\n", res1.final_output)

res2 = await Runner.run(triage, "Solve 2x + 3 = 11. What is x?")
print("\n[ANSWER 2]\n", res2.final_output)


[HANDOFF] Triage Agent → History Tutor

[ANSWER 1]
 The capital of France is Paris.
[HANDOFF] Triage Agent → Math Tutor

[ANSWER 2]
 Let's solve the equation \( 2x + 3 = 11 \) step by step:

1. Subtract 3 from both sides to isolate the term with \( x \):
   \[
   2x + 3 - 3 = 11 - 3
   \]
   \[
   2x = 8
   \]

2. Divide both sides by 2 to solve for \( x \):
   \[
   \frac{2x}{2} = \frac{8}{2}
   \]
   \[
   x = 4
   \]

**Final answer:** \( x = 4 \).


## 3. Code-led orchestration

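LLMs can self-orchestrate, but when **speed, cost, and determinism** matter, code-led orchestration is the more predictable choice. The mindset: **use the model to *judge*, use code to *decide***. The model produces or assesses content; plain Python picks the route, the order of steps, and the stop condition.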
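
One way to picture the judge/decide split: the model returns a free-text verdict, and a small pure function turns it into an action. The sketch below assumes a reviewer that replies with `PASS` or `FAIL` plus a short reason. `parse_verdict` and `next_action` are hypothetical helpers, not SDK functions.

```python
import re


def parse_verdict(raw: str) -> tuple[bool, str]:
    """Turn a free-text reviewer reply into (passed, reason)."""
    text = raw.strip()
    match = re.match(r"(PASS|FAIL)\b", text, flags=re.IGNORECASE)
    if match is None:
        # Unknown format: treat as a failure so code never silently accepts it.
        return False, f"unparseable verdict: {text!r}"
    passed = match.group(1).upper() == "PASS"
    reason = text[match.end():].lstrip(" :-.\n")
    return passed, ("" if passed else reason)


def next_action(raw: str, round_no: int, max_rounds: int = 3) -> str:
    """Code, not the model, decides what happens after a review."""
    passed, _ = parse_verdict(raw)
    if passed:
        return "accept"
    return "revise" if round_no < max_rounds else "give_up"


print(parse_verdict("PASS."))
print(parse_verdict("fail: too vague"))
print(next_action("FAIL: missing date", round_no=3))
```

Because both helpers are ordinary functions, they can be unit-tested without calling a model, which is much of the appeal of keeping decisions in code.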


In [6]:
import os, re, asyncio
from dotenv import load_dotenv
from agents import Agent, Runner, set_tracing_disabled, ModelSettings
from agents.extensions.models.litellm_model import LitellmModel

# --- 0) Model setup (used only to PRODUCE content, not to decide orchestration) ---
load_dotenv()
api_key = os.getenv("API_KEY")
llm = LitellmModel(
    model="gpt-4.1-nano-2025-04-14",
    api_key=api_key,
    base_url="https://api.openai.com/v1",
)
set_tracing_disabled(True)

# --- 1) Specialists: content generators only ---
research_agent = Agent(
    name="Research",
    instructions="Answer general knowledge questions clearly and concisely.",
    model=llm,
    model_settings=ModelSettings(temperature=0.4),
)

math_agent = Agent(
    name="Math",
    instructions="Solve math problems step-by-step, end with a single final numeric answer.",
    model=llm,
    model_settings=ModelSettings(temperature=0.2),
)

history_agent = Agent(
    name="History",
    instructions="Explain historical facts with key context and dates in 2–3 sentences.",
    model=llm,
    model_settings=ModelSettings(temperature=0.2),
)

editor_agent = Agent(
    name="Editor",
    instructions="Rewrite the text to be clearer and more concise while preserving meaning.",
    model=llm,
    model_settings=ModelSettings(temperature=0.3),
)

reviewer_agent = Agent(
    name="Reviewer",
    instructions=(
        "Review the text for clarity and correctness. "
        "Respond ONLY with PASS if it is good, otherwise respond with FAIL and one short reason."
    ),
    model=llm,
    model_settings=ModelSettings(temperature=0.0),
)

# --- 2) PURE-PYTHON ROUTER: code decides which agent to run ---
HISTORY_HINTS = {"who was", "when was", "in what year", "battle of", "revolution", "dynasty", "emperor", "king", "queen"}

def determine_route(query: str) -> str:
    q = query.lower().strip()
    # (a) crude math detection: equation-like or mathy keywords
    if re.search(r"[\d\)\]]\s*([+\-/*^]|=)\s*[\d\(\[]", q) or "solve" in q or "derivative" in q:
        return "math"
    # (b) crude history detection: keyword hints
    if any(h in q for h in HISTORY_HINTS):
        return "history"
    # (c) default to research
    return "research"

# --- 3) CODE-LED PIPELINE RUNNERS ---
async def run_code_led_once(query: str):
    # Step 1: route in code
    route = determine_route(query)
    print(f"[ROUTER(code)] → {route}")

    # Step 2: run the chosen specialist (NO model planning; we call directly)
    agent_by_route = {"math": math_agent, "history": history_agent, "research": research_agent}
    base = await Runner.run(agent_by_route[route], query)
    print(f"[{route.upper()}] {base.final_output}")

    # Step 3: serial pipeline: edit → review loop (code-enforced stop condition)
    edited = await Runner.run(editor_agent, base.final_output)

    # Review until PASS or max rounds
    text = edited.final_output
    for i in range(3):
        verdict = await Runner.run(reviewer_agent, text)
        v = verdict.final_output.strip().upper()
        print(f"[REVIEW round {i+1}] {v}")
        if v.startswith("PASS"):  # tolerate trailing punctuation such as "PASS."
            break
        # If FAIL, ask the editor to revise based on that single reason
        text = (await Runner.run(editor_agent, f"Revise this to address the issue: {verdict.final_output}\n\n{text}")).final_output

    print("\n[FINAL OUTPUT]\n" + text + "\n" + "-"*60)

# --- 4) PARALLEL DEMO (independent queries in parallel; code decides to parallelize) ---
async def run_parallel_demo():
    queries = [
        "Who was Napoleon?",               # history
        "Solve 2x + 3 = 11",               # math
        "What is quantum computing?",      # research
    ]
    await asyncio.gather(*(run_code_led_once(q) for q in queries))

# --- 5) Try it ---
await run_parallel_demo()

[ROUTER(code)] → history
[ROUTER(code)] → math
[ROUTER(code)] → research
[RESEARCH] Quantum computing is a type of computing that uses principles of quantum mechanics to perform calculations. Unlike classical computers, which use bits (0s and 1s), quantum computers use quantum bits or qubits. Qubits can exist in multiple states simultaneously thanks to superposition, and they can be entangled with each other, allowing quantum computers to process complex problems more efficiently than classical computers for certain tasks. Quantum computing has potential applications in cryptography, optimization, drug discovery, and solving complex scientific problems.
[HISTORY] Napoleon Bonaparte was a French military and political leader who rose to prominence during the French Revolution and became Emperor of the French from 1804 to 1814, and briefly in 1815 during the Hundred Days. He is known for his extensive military campaigns across Europe, which significantly shaped European history, and for 