<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://arize.com/docs/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email">Community</a>
    </p>
</center>

# LangGraph Agents: Orchestrator–Worker Pattern

In this tutorial, we’ll build a multi-agent system using LangGraph's **Orchestrator–Worker pattern**, ideal for dynamically decomposing a task into subtasks, assigning them to specialized LLM agents, and synthesizing their responses.

This pattern is particularly well-suited when the structure of subtasks is unknown ahead of time—such as when writing modular code, creating multi-section reports, or conducting research. The **orchestrator** plans and delegates, while the **workers** each complete their assigned section.

We’ll also use **Phoenix** to trace and debug the orchestration process. With Phoenix, you can visually inspect which tasks the orchestrator generated, how each worker handled its section, and how the final output was assembled.

By the end of this notebook, you’ll learn how to:
- Use structured outputs to plan subtasks dynamically.
- Assign subtasks to LLM workers via LangGraph's `Send` API.
- Collect and synthesize multi-step LLM outputs.
- Trace and visualize orchestration using Phoenix.


In [None]:
!pip install langgraph langchain langchain_community "arize-phoenix==9.0.1" arize-phoenix-otel openinference-instrumentation-langchain

In [None]:
!pip install langchain_openai

In [3]:
import os
from getpass import getpass

from langgraph.graph import END, START, StateGraph



In [4]:
os.environ["OPENAI_API_KEY"] = getpass("🔑 Enter your OpenAI API key: ")

# Configure Phoenix Tracing

Go to https://app.phoenix.arize.com/ and generate an API key. You will also need your Phoenix collector endpoint. Together, these let you trace your LangGraph application with Phoenix.

In [5]:
if "PHOENIX_API_KEY" not in os.environ:
    os.environ["PHOENIX_API_KEY"] = getpass("🔑 Enter your Phoenix API key: ")

if "PHOENIX_COLLECTOR_ENDPOINT" not in os.environ:
    os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = getpass("🔑 Enter your Phoenix Collector Endpoint")

In [6]:
from phoenix.otel import register

tracer_provider = register(project_name="Orchestrator", auto_instrument=True)


Orchestrator‑Workers • Research‑Paper Generator
----------------------------------------------
The orchestrator plans research‑paper *subsections* (abstract, background …),
spawns one worker per subsection, then stitches everything into a full draft.

In [7]:
import operator
from typing import Annotated, List, TypedDict

from IPython.display import Markdown
from langchain_core.messages import HumanMessage, SystemMessage

# Step 1: Defining the Planning Schema
To begin, we define a structured output schema using Pydantic. This schema ensures that the LLM returns well-formatted, predictable output when tasked with planning the structure of a research paper.

We create two models:

- `Subsection`: Represents a single unit of the paper, including its name and a brief description of what it should cover.
- `Subsections`: A wrapper that holds a list of these units.

By using these models with LangChain's `with_structured_output` method, we make the orchestrator LLM return an organized plan rather than freeform text, so that downstream nodes (the worker LLMs) can reliably use it.

This schema acts as the blueprint for the rest of the workflow.

In [8]:
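# Import directly from pydantic; the langchain_core.pydantic_v1 shim is deprecated.
from pydantic import BaseModel, Field

# Send now lives in langgraph.types; importing it from langgraph.constants is deprecated.
from langgraph.types import Send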


class Subsection(BaseModel):
    name: str = Field(description="Name for this subsection of the research paper.")
    description: str = Field(
        description="Concise description of the general subjects to be covered in this subsection."
    )


class Subsections(BaseModel):
    Subsections: List[Subsection] = Field(description="All subsections of the research paper.")
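
To make the shape of the plan concrete, here is a minimal plain-Python sketch that parses a JSON plan into typed objects. It uses dataclasses as a stand-in for the Pydantic models above, and the names `PlannedSubsection`, `parse_plan`, and `example_output` are hypothetical, introduced only for illustration. The JSON is hand-written, not real model output.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PlannedSubsection:
    """Stand-in for the Subsection model: one planned unit of the paper."""

    name: str
    description: str


def parse_plan(raw_json: str) -> List[PlannedSubsection]:
    """Turn a JSON plan into typed objects, failing loudly on a bad shape."""
    payload = json.loads(raw_json)
    # A missing or unexpected key raises TypeError here instead of
    # silently passing malformed data on to the workers.
    return [PlannedSubsection(**item) for item in payload["Subsections"]]


# Hand-written example of the structure the orchestrator is asked to return.
example_output = json.dumps(
    {
        "Subsections": [
            {"name": "Abstract", "description": "One-paragraph summary of the paper."},
            {"name": "Background", "description": "Context needed to follow the paper."},
        ]
    }
)

plan = parse_plan(example_output)
print([section.name for section in plan])
```

The point of the typed plan is that each worker can read `name` and `description` as attributes without any string parsing of the orchestrator's reply.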




# Step 2: Set Up LLM and Tools
We initialize `gpt-3.5-turbo` as our base LLM and bind it to the `Subsections` schema to create the orchestrator. We also load a Tavily search tool so that worker agents can enrich sections with live web data. Tavily requires its own API key, which we read first.

In [16]:
# getpass was imported as a function (from getpass import getpass), so call it directly.
TAVILY_API_KEY = getpass("Tavily API Key:")
os.environ["TAVILY_API_KEY"] = TAVILY_API_KEY

In [10]:
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
orchestrator_llm = llm.with_structured_output(Subsections)

# Return at most five results per query.
search = TavilySearchResults(max_results=5)


# Step 3: Define Graph State
We define two state schemas:

- `State` holds the overall research paper workflow, including the topic, planned subsections, shared search results, completed text, and final output.
- `WorkerState` captures the task assigned to each worker (a single subsection), the shared search results, and the list in which each worker's contribution is accumulated.

This shared state structure lets LangGraph coordinate work between the orchestrator and its worker agents.

In [11]:
class State(TypedDict):
    topic: str
    subsections: List[Subsection]
    completed_subsections: Annotated[List[str], operator.add]
    final_paper: str
    search_results: str


class WorkerState(TypedDict):
    subsection: Annotated[Subsection, lambda x, y: y]
    completed_subsections: Annotated[List[str], operator.add]
    search_results: str
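
The `Annotated[..., reducer]` fields above tell LangGraph to combine updates to a key with the given function instead of overwriting the previous value. The following plain-Python sketch only simulates that merge to show the effect of the two reducers used here. The helper `apply_reducer` is hypothetical and is not part of LangGraph.

```python
import operator
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def apply_reducer(reducer: Callable[[T, T], T], current: T, updates: List[T]) -> T:
    """Fold a sequence of updates into the current value, one at a time."""
    return reduce(reducer, updates, current)


# Each worker returns a one-element list; operator.add concatenates them.
worker_updates = [["## Abstract ..."], ["## Background ..."], ["## Methods ..."]]
completed = apply_reducer(operator.add, [], worker_updates)

# The `lambda x, y: y` reducer on `subsection` simply keeps the newest value.
latest = apply_reducer(lambda old, new: new, "Abstract", ["Background", "Methods"])

print(completed)
print(latest)
```

This is why several workers can write to `completed_subsections` in the same step without clobbering one another: list concatenation keeps every contribution.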

# Step 4: Define Nodes
We define four nodes in the graph:

- `orchestrator`: Dynamically plans the structure of the paper by generating a list of subsections using structured output.
- `subsection_writer`: Acts as a worker that writes one full subsection, using the provided name, description, and shared search results.
- `synthesiser`: Merges all completed subsections into a single draft, separating sections with visual dividers.
- `search_tool`: Runs a single web search on the topic and stores the results so that every worker can use them as background.

Each node contributes to a modular, scalable paper-writing pipeline — and with Phoenix tracing, you can inspect every generation step in detail.

In [12]:
def orchestrator(state: State):
    """Plan the research‑paper subsections dynamically."""
    plan = orchestrator_llm.invoke(
        [
            SystemMessage(content="Generate a detailed subsection plan for a research paper."),
            HumanMessage(content=f"Paper topic: {state['topic']}"),
        ]
    )
    return {"subsections": plan.Subsections}


def subsection_writer(state: WorkerState):
    """Write one subsection from its name, description, and the shared search results."""
    sub = state["subsection"]
    search_info = state.get("search_results", "")

    response = llm.invoke(
        [
            SystemMessage(
                content=(
                    "You're writing a research-paper subsection using the following web search result as background and also your own knowledge."
                )
            ),
            HumanMessage(
                content=(
                    f"Subsection: {sub.name}\n"
                    f"Description: {sub.description}\n"
                    f"Shared Search Results:\n{search_info}\n\n"
                    "Now write the section."
                )
            ),
        ]
    )
    return {"completed_subsections": [response.content]}


def synthesiser(state: State):
    """Concatenate all finished subsections into the final paper draft."""
    full_paper = "\n\n---\n\n".join(state["completed_subsections"])
    return {"final_paper": full_paper}


def search_tool(state: State):
    """Run one web search for the topic; every worker reuses the results."""
    query = f"{state['topic']} research summary"
    search_results = search.invoke(query)
    return {"search_results": search_results}
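
The search results are interpolated straight into the worker prompt, so it can help to flatten them into readable text first. The sketch below is one way to do that, assuming each result is a dictionary with `url` and `content` keys. That shape is an assumption about the search tool's output, and `format_search_results` and the sample data are hypothetical.

```python
from typing import Dict, List


def format_search_results(results: List[Dict[str, str]], max_chars: int = 200) -> str:
    """Render search hits as numbered, truncated snippets for use in a prompt."""
    blocks = []
    for index, item in enumerate(results, start=1):
        # Fall back to placeholders so a missing key never raises.
        source = item.get("url", "unknown source")
        snippet = item.get("content", "").strip()[:max_chars]
        blocks.append(f"[{index}] {source}\n{snippet}")
    return "\n\n".join(blocks)


# Placeholder data standing in for real search output.
sample_results = [
    {"url": "https://example.com/a", "content": "First placeholder snippet."},
    {"url": "https://example.com/b", "content": "Second placeholder snippet."},
]

formatted = format_search_results(sample_results)
print(formatted)
```

Truncating each snippet keeps the prompt size predictable when many workers receive the same shared results.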

# Step 5: Assign Workers Dynamically
This function uses LangGraph’s Send API to launch a separate subsection_writer worker for each planned subsection. By dynamically spawning one worker per section, the system scales flexibly based on the topic’s complexity.

This approach is ideal for research paper generation, where the number of sections is not known ahead of time — and Phoenix helps trace the output from each worker node independently.

In [13]:
def assign_workers(state: State):
    """Launch one subsection_writer per planned subsection (after shared search)."""
    return [
        Send("subsection_writer", {"subsection": s, "search_results": state["search_results"]})
        for s in state["subsections"]
    ]
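
As an analogy for this fan-out, the sketch below builds one payload per subsection and runs a stub worker on each in a thread pool. LangGraph schedules `Send` tasks itself, so the thread pool is only an illustration of the map step, and `stub_writer` and `fan_out` are hypothetical names that make no LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def stub_writer(payload: Dict[str, str]) -> str:
    """Stand-in for subsection_writer: formats a heading instead of calling an LLM."""
    return f"## {payload['subsection']}\n(draft informed by: {payload['search_results']})"


def fan_out(subsections: List[str], search_results: str) -> List[str]:
    """Create one payload per subsection and process the payloads concurrently."""
    payloads = [
        {"subsection": name, "search_results": search_results} for name in subsections
    ]
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in input order, regardless of finish order.
        return list(pool.map(stub_writer, payloads))


drafts = fan_out(["Abstract", "Background", "Conclusion"], "shared notes")
print(len(drafts))
```

The number of workers is decided at run time by the length of the plan, which is the property that distinguishes this pattern from a graph with a fixed set of parallel branches.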

# Step 6: Construct the LangGraph Workflow
Here, we build the full LangGraph pipeline using a `StateGraph`. The workflow begins with the `orchestrator` node (to plan subsections), runs `search_tool` once to gather shared background, dynamically routes work to `subsection_writer` nodes (via `assign_workers`), and then aggregates all outputs in the `synthesiser` node.

LangGraph’s conditional edges and Send API enable scalable parallelism — and with Phoenix tracing enabled, you can view how each section is created, tracked, and stitched together.

In [14]:
builder = StateGraph(State)

builder.add_node("orchestrator", orchestrator)
builder.add_node("search_tool", search_tool)
builder.add_node("subsection_writer", subsection_writer)
builder.add_node("synthesiser", synthesiser)

builder.add_edge(START, "orchestrator")
builder.add_edge("orchestrator", "search_tool")
builder.add_conditional_edges("search_tool", assign_workers, ["subsection_writer"])
builder.add_edge("subsection_writer", "synthesiser")
builder.add_edge("synthesiser", END)


research_paper_workflow = builder.compile()
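
A quick way to sanity-check the wiring is to write the edges down as data and confirm that every node is reachable from the start. The sketch below does this in plain Python. The `EDGES` list mirrors the calls above by hand, the `"START"` and `"END"` strings are illustrative labels rather than LangGraph's constants, and `reachable_order` is a hypothetical helper.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hand-copied from the add_edge / add_conditional_edges calls above.
EDGES: List[Tuple[str, str]] = [
    ("START", "orchestrator"),
    ("orchestrator", "search_tool"),
    ("search_tool", "subsection_writer"),  # conditional fan-out via assign_workers
    ("subsection_writer", "synthesiser"),
    ("synthesiser", "END"),
]


def reachable_order(edges: List[Tuple[str, str]], start: str) -> List[str]:
    """Breadth-first walk that lists nodes in the order they become reachable."""
    adjacency: Dict[str, List[str]] = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)

    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


order = reachable_order(EDGES, "START")
print(" -> ".join(order))
```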

# Step 7: Run the Research Paper Generator
We now invoke the compiled graph once for each topic in a list of sample questions about scaling laws for large language models. For each topic, the orchestrator plans the outline, each worker drafts a subsection in parallel, and the synthesiser assembles the full paper. Only the draft for the last topic is displayed at the end.

With Phoenix integrated, every step is traced — from section planning to writing and synthesis — giving you full visibility into the execution flow and helping debug or refine outputs.

In [15]:
research_topics = [
    "How do scaling laws impact the performance of large language models?",
    "What are the key challenges in training very large transformer models?",
    "How much data is needed to train a performant LLM?",
    "Explain the relationship between model size and accuracy in language models.",
    "Why are modern language models undertrained, and how can we fix it?",
    "What is compute-optimal training for LLMs?",
    "Compare different scaling strategies for training foundation models.",
    "How do researchers determine the best size for a transformer model?",
    "What are the trade-offs between training time and model performance?",
    "Summarize recent findings on training efficiency for large-scale language models.",
]

for topic in research_topics:
    state = research_paper_workflow.invoke({"topic": topic})

print("===== RESEARCH PAPER DRAFT =====\n")
Markdown(state["final_paper"])


# Step 8: Check out your traces in Phoenix!

# Let's add some Evaluations (Evals)

In this section we will evaluate Agent Path Convergence.

**avg(minimum steps taken for this query / steps in the run)**

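This measures how consistently your orchestrator behaves across similar queries: a score of 1 means every run took the minimum number of steps, and lower scores mean some runs took longer paths.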

See https://arize.com/docs/phoenix/evaluation/how-to-evals/running-pre-tested-evals/agent-path-convergence
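
As a worked example of the formula, the sketch below computes the score for four hypothetical runs. The step counts are made up for illustration, and `path_convergence` is a hypothetical helper, not a Phoenix function.

```python
from typing import List


def path_convergence(step_counts: List[int]) -> float:
    """Average of (minimum steps across runs / steps in this run)."""
    if not step_counts or min(step_counts) <= 0:
        raise ValueError("step_counts must be non-empty and strictly positive")
    optimal = min(step_counts)
    return sum(optimal / steps for steps in step_counts) / len(step_counts)


# Four runs taking 4, 5, 4, and 8 steps: ratios are 1.0, 0.8, 1.0, and 0.5.
score = path_convergence([4, 5, 4, 8])
print(round(score, 3))  # 0.825
```

If every run takes the same number of steps, the score is exactly 1.0, which is the best possible value.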

In [None]:
from phoenix.client import Client
from phoenix.client.types.spans import SpanQuery

client = Client()
df = client.spans.get_spans_dataframe(
    query=SpanQuery().where("name == 'LangGraph'"), 
    project_identifier="Orchestrator"
)
df

In [None]:
optimal_path_length = 7  # adjust this for your use case

## Generate scores

In [None]:
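import json

# Use the number of planned subsections as a proxy for the number of steps in each run.
all_steps = []
for row in df["attributes.output.value"]:
    data = json.loads(row)
    num_subsections = len(data.get("subsections", []))
    all_steps.append(num_subsections)

# The shortest observed run is treated as the optimal path.
optimal = min(all_steps)

# Score each run as optimal / actual; a run with no subsections scores 0 to avoid dividing by zero.
ratios = [optimal / step if step else 0.0 for step in all_steps]

df["score"] = ratios
# Size the explanation column from the dataframe instead of hardcoding the row count.
df["explanation"] = ["Minimum path length / this path length"] * len(df)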

In [None]:
df

# View your Evals in Phoenix

At the top of your traces you will see a score under "Agent Path Convergence". That is the average of the scores we computed and should serve as your final metric for this evaluation.

In [None]:
from phoenix.client import AsyncClient

px_client = AsyncClient()
await px_client.annotations.log_span_annotations_dataframe(
    dataframe=df,
    annotation_name="Agent Path Convergence",
    annotator_kind="LLM",
)