<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://arize.com/docs/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email">Community</a>
    </p>
</center>

# <center>OpenAI Agents: Orchestrator-Worker Patterns</center>

A starter guide for building an agent loop using the `openai-agents` library.

This pattern uses an orchestrator and workers. The orchestrator chooses which worker handles each sub-task. The worker attempts the sub-task and returns a result. The orchestrator then uses that result to choose the next worker, repeating until a final result is returned.

In the following example, we'll build an agent which creates a portfolio of stocks and ETFs based on a user's investment strategy.
1.  **Orchestrator:** Chooses which worker to use based on the user's investment strategy.
2.  **Research Agent:** Searches the web for information about stocks and ETFs that could support the user's investment strategy.
3.  **Evaluation Agent:** Evaluates the research report and provides feedback on what data is missing.
4.  **Portfolio Agent:** Creates a portfolio of stocks and ETFs based on the research report.
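
Before bringing in the `openai-agents` library, the control flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: the worker functions, the `choose_worker` policy, and the "two notes means pass" threshold are all hypothetical stand-ins, not part of the library or of the agents built later in this guide.

```python
from typing import Callable

State = dict  # shared scratchpad passed between the orchestrator and its workers


def research(state: State) -> State:
    # Hypothetical research worker: appends one more note to the report.
    notes = state.setdefault("report", [])
    notes.append(f"note {len(notes) + 1}")
    state.pop("score", None)  # new research invalidates the previous evaluation
    return state


def evaluate(state: State) -> State:
    # Hypothetical evaluator: passes once the report holds at least two notes.
    state["score"] = "pass" if len(state["report"]) >= 2 else "needs_improvement"
    return state


def build_portfolio(state: State) -> State:
    # Hypothetical portfolio worker: produces the final result.
    state["portfolio"] = {"notes_used": len(state["report"])}
    return state


WORKERS: dict[str, Callable[[State], State]] = {
    "research": research,
    "evaluate": evaluate,
    "portfolio": build_portfolio,
}


def choose_worker(state: State) -> str:
    # Orchestrator policy: pick the next worker from the current state.
    if "report" not in state:
        return "research"
    if "score" not in state:
        return "evaluate"
    return "portfolio" if state["score"] == "pass" else "research"


def run_orchestrator(max_steps: int = 10) -> tuple[State, list[str]]:
    state: State = {}
    trace: list[str] = []
    for _ in range(max_steps):  # bounded, so a stuck loop cannot run forever
        name = choose_worker(state)
        trace.append(name)
        state = WORKERS[name](state)
        if "portfolio" in state:
            break
    return state, trace


final_state, trace = run_orchestrator()
print(trace)
```

In the real workflow the routing decision is made by an LLM rather than by hard-coded rules, but the shape of the loop is the same: choose a worker, run it, inspect the result, repeat.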

⚠️ You'll need a [free Phoenix Cloud](https://app.arize.com/auth/phoenix/login) account and an OpenAI API key for this tutorial.

### Install Libraries

In [None]:
# Install base libraries for OpenAI
%pip install -qq openai openai-agents

# Install libraries for OpenInference/OpenTelemetry tracing
%pip install -qqq arize-phoenix openinference-instrumentation-openai-agents openinference-instrumentation-openai openinference-instrumentation

### Setup Keys

Add your OpenAI API key to the environment variable `OPENAI_API_KEY`.

Copy your Phoenix `API_KEY` from your settings page at [app.phoenix.arize.com](https://app.phoenix.arize.com). The cell below also prompts for `PHOENIX_COLLECTOR_ENDPOINT` if it is not already set.

In [None]:
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("🔑 Enter your OpenAI API key: ")

if "PHOENIX_API_KEY" not in os.environ:
    os.environ["PHOENIX_API_KEY"] = getpass("🔑 Enter your Phoenix API key: ")

if "PHOENIX_COLLECTOR_ENDPOINT" not in os.environ:
    # The endpoint is a URL rather than a secret, so a plain prompt is enough
    os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = input("Enter your Phoenix Collector Endpoint: ")

### Setup Tracing and Phoenix Client

In [None]:
from phoenix.otel import register

PROJECT_NAME = "openai-agents-orchestrator-workers"
tracer_provider = register(project_name=PROJECT_NAME, auto_instrument=True)

In [None]:
from phoenix.client import AsyncClient

px_client = AsyncClient()

## Creating the agents

In [None]:
from pprint import pprint
from textwrap import dedent
from typing import Literal

from agents import Agent, Runner, TResponseInputItem, WebSearchTool
from agents.model_settings import ModelSettings
from pydantic import BaseModel, Field


class PortfolioItem(BaseModel):
    ticker: str = Field(description="The ticker of the stock or ETF.")
    allocation: float = Field(
        description="The percentage allocation of the ticker in the portfolio. The sum of all allocations should be 100."
    )
    reason: str = Field(description="The reason why this ticker is included in the portfolio.")


class Portfolio(BaseModel):
    tickers: list[PortfolioItem] = Field(
        description="A list of tickers that could support the user's stated investment strategy."
    )


class EvaluationFeedback(BaseModel):
    feedback: str = Field(
        description="What data is missing in order to create a portfolio of stocks and ETFs based on the user's investment strategy."
    )
    score: Literal["pass", "needs_improvement", "fail"] = Field(
        description="A score on the research report. Pass if you have at least 5 tickers with data that supports the user's investment strategy to create a portfolio, needs_improvement if you do not have enough supporting data, and fail if you have no tickers."
    )


evaluation_agent = Agent(
    name="Evaluation Agent",
    instructions=dedent(
        """You are a senior financial analyst. You will be provided with a stock research report with positive and negative catalysts. Your task is to evaluate the report and provide feedback on what to improve."""
    ),
    model="gpt-4.1",
    output_type=EvaluationFeedback,
)

portfolio_agent = Agent(
    name="Portfolio Agent",
    instructions=dedent(
        """You are a senior financial analyst. You will be provided with a stock research report. Your task is to create a portfolio of stocks and ETFs that could support the user's stated investment strategy. Include facts and data from the research report in the stated reasons for the portfolio allocation."""
    ),
    model="o4-mini",
    output_type=Portfolio,
)

research_agent = Agent(
    name="FinancialSearchAgent",
    instructions=dedent(
        """You are a research assistant specializing in financial topics. Given a stock ticker, use web search to retrieve up‑to‑date context and produce a short summary of at most 50 words. Focus on key numbers, events, or quotes that will be useful to a financial analyst."""
    ),
    model="gpt-4.1",
    tools=[WebSearchTool()],
    model_settings=ModelSettings(tool_choice="required", parallel_tool_calls=True),
)

orchestrator_agent = Agent(
    name="Routing Agent",
    instructions=dedent("""You are a senior financial analyst. You are trying to create a portfolio based on my stated investment strategy. Your task is to handoff to the appropriate agent or tool.

    First, handoff to the research_agent to give you a report on stocks and ETFs that could support the user's stated investment strategy.
    Then, handoff to the evaluation_agent to give you a score on the research report. If the evaluation_agent returns a needs_improvement or fail, continue using the research_agent to gather more information.
    Once the evaluation_agent returns a pass, handoff to the portfolio_agent to create a portfolio."""),
    model="gpt-4.1",
    handoffs=[
        research_agent,
        evaluation_agent,
        portfolio_agent,
    ],
)
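
The `allocation` field description asks for allocations that sum to 100, but a field description alone does not enforce that. As a minimal sketch, a check like the one below could be run on the final output. The `Item` dataclass, the `allocations_are_valid` helper, the placeholder tickers, and the tolerance value are all hypothetical and not part of the notebook's models.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical stand-in for PortfolioItem, reduced to the fields the check needs.
    ticker: str
    allocation: float


def allocations_are_valid(items: list[Item], total: float = 100.0, tolerance: float = 0.01) -> bool:
    # Reject empty portfolios and negative weights, then compare the sum to the target.
    if not items or any(item.allocation < 0 for item in items):
        return False
    return math.isclose(sum(item.allocation for item in items), total, abs_tol=tolerance)


# Placeholder tickers, used only to exercise the check
balanced = [Item("AAA", 60.0), Item("BBB", 40.0)]
short = [Item("AAA", 60.0), Item("BBB", 30.0)]

print(allocations_are_valid(balanced))
print(allocations_are_valid(short))
```

A failed check could be fed back to the orchestrator as another pass rather than being silently accepted.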

### Run our Workflow!

Run the cell below and enter your investment strategy when prompted.

In [None]:
import asyncio
from uuid import uuid4

import opentelemetry.trace as trace
from openinference.semconv.trace import SpanAttributes

tracer = trace.get_tracer("openai-agents-orchestrator-workers")

MAX_PASSES = 10
PASS_TIMEOUT = 500


async def run_agent_workflow():
    user_input = input("Enter your investment strategy: ")
    input_items: list[TResponseInputItem] = [{"content": user_input, "role": "user"}]

    with tracer.start_as_current_span(
        "Agent workflow",
        attributes={
            SpanAttributes.OPENINFERENCE_SPAN_KIND: "agent",
            SpanAttributes.INPUT_VALUE: user_input,
            SpanAttributes.SESSION_ID: str(uuid4()),
        },
    ) as root_span:
        passes = 0
        out = None  # stays None if the first pass times out before producing output
        while passes < MAX_PASSES:
            try:
                orchestrator = await asyncio.wait_for(
                    Runner.run(orchestrator_agent, input_items),
                    timeout=PASS_TIMEOUT,
                )
            except asyncio.TimeoutError:
                print(f"Pass {passes + 1} hit the {PASS_TIMEOUT}s timeout—aborting.")
                break

            out = orchestrator.final_output
            pprint(out)

            if isinstance(out, Portfolio):
                break

            input_items = orchestrator.to_input_list()
            passes += 1

        root_span.set_attribute(SpanAttributes.OUTPUT_VALUE, str(out))

    print("AGENT COMPLETE")


await run_agent_workflow()
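
The loop above combines two guards: a cap on the number of passes and a per-pass timeout via `asyncio.wait_for`. The sketch below isolates that pattern with a stub coroutine so it can be run without any API calls. `slow_pass` and `run_with_budget` are hypothetical names, and the delays are arbitrary.

```python
import asyncio


async def slow_pass(delay: float) -> str:
    # Hypothetical stand-in for one orchestrator pass: sleeps, then returns.
    await asyncio.sleep(delay)
    return f"finished after {delay}s"


async def run_with_budget(delays: list[float], timeout: float, max_passes: int) -> list[str]:
    log: list[str] = []
    for attempt, delay in enumerate(delays[:max_passes], start=1):
        try:
            # wait_for cancels the pass and raises TimeoutError once the budget is exceeded
            result = await asyncio.wait_for(slow_pass(delay), timeout=timeout)
        except asyncio.TimeoutError:
            log.append(f"pass {attempt}: timed out")
            break  # stop the whole loop, mirroring the workflow above
        log.append(f"pass {attempt}: {result}")
    return log


# As a script, asyncio.run drives the coroutine; in a notebook, `await` it directly instead.
log = asyncio.run(run_with_budget([0.01, 0.5, 0.01], timeout=0.1, max_passes=10))
print(log)
```

The second pass exceeds the timeout, so the third is never attempted. Whether to break or retry after a timeout is a design choice; the workflow above breaks.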

# Evaluate the Agent

Here, we will evaluate the agent’s trajectory. This means checking whether the sequence of steps it took was logical, efficient, and aligned with completing the user’s request. Then, we will log those results back to Phoenix.

In [None]:
import pandas as pd

df = await px_client.spans.get_spans_dataframe(project_name=PROJECT_NAME)

trace_df = df.groupby("context.trace_id").agg(
    {
        "attributes.input.value": "first",
        "attributes.output.value": lambda x: " ".join(x.dropna()),
    }
)


def extract_input_content(input_value):
    try:
        if pd.isna(input_value) or input_value is None:
            return None

        # JSON string
        if isinstance(input_value, str):
            import json

            try:
                parsed = json.loads(input_value)
                inputs = parsed.get("input", [])
                if isinstance(inputs, list) and len(inputs) > 0:
                    return inputs[0].get("content")
                return None
            except Exception:
                return None

        return None

    except (AttributeError, TypeError, KeyError):
        return None


# Apply function row by row
trace_df["attributes.input.value"] = trace_df["attributes.input.value"].apply(extract_input_content)
trace_df.head()
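
To make the parsing step concrete, here is a stdlib-only sketch of the same extraction applied to single values. The sample payload is hypothetical and only mirrors the shape the function above looks for (a JSON object with an `input` list of messages); real span payloads may differ.

```python
import json

# Hypothetical payload shaped like {"input": [{"role": ..., "content": ...}]}
sample = json.dumps({"input": [{"role": "user", "content": "Low-risk dividend strategy"}]})


def first_input_content(raw):
    # Only JSON strings are handled; anything else yields None.
    if not isinstance(raw, str):
        return None
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    inputs = parsed.get("input", [])
    if isinstance(inputs, list) and inputs and isinstance(inputs[0], dict):
        return inputs[0].get("content")
    return None


print(first_input_content(sample))
print(first_input_content("plain text, not JSON"))
```

Note that a plain-text input value is not valid JSON, so it comes back as `None` rather than as the original string.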

In [None]:
TRAJECTORY_PERFORMANCE_PROMPT = """
You are a helpful AI bot that checks whether an AI agent's internal trajectory is accurate and effective.

You will be given:
1. An input query from a user that the agent responded to
2. The agent's actual trajectory of tool calls and responses

An accurate trajectory:
- Progresses logically from step to step
- Follows a sensible order of steps for the task
- Shows a clear path toward completing a goal
- Is reasonably efficient (doesn't take unnecessary detours)

##

User Query:
{attributes.input.value}

Trajectory:
{attributes.output.value}

##

Your response must be a single string, either `correct` or `incorrect`, and must not include any additional text.

- Respond with `correct` if the agent's trajectory adheres to the rubric and accomplishes the task effectively.
- Respond with `incorrect` if the trajectory is confusing, misaligned with the goal, inefficient, or does not accomplish the task.
"""
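
The placeholders in the template are named after DataFrame columns, so each row of `trace_df` yields one filled-in prompt. As a rough illustration of that substitution (a sketch, not Phoenix's own implementation), the hypothetical `fill_template` helper below treats dotted names as plain dictionary keys. Python's built-in `str.format` would not work here, since it interprets dots as attribute access.

```python
import re


def fill_template(template: str, row: dict) -> str:
    # Replace each {name} with row[name]; unknown placeholders are left untouched.
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(row[key]) if key in row else match.group(0)

    return re.sub(r"\{([^{}]+)\}", substitute, template)


template = "User Query:\n{attributes.input.value}\n\nTrajectory:\n{attributes.output.value}"

# Hypothetical row values, for illustration only
row = {
    "attributes.input.value": "Growth-focused tech portfolio",
    "attributes.output.value": "research -> evaluation -> portfolio",
}

prompt = fill_template(template, row)
print(prompt)
```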

In [None]:
from phoenix.evals import OpenAIModel, llm_classify
from phoenix.trace import suppress_tracing

model = OpenAIModel(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o-mini",
    temperature=0.0,
)

rails = ["correct", "incorrect"]

with suppress_tracing():
    eval_results = llm_classify(
        dataframe=trace_df,
        template=TRAJECTORY_PERFORMANCE_PROMPT,
        model=model,
        rails=rails,
        provide_explanation=True,
        verbose=False,
        concurrency=20,
    )

eval_results["score"] = eval_results["label"].apply(lambda x: 1 if x == "correct" else 0)

In [None]:
root_spans = df[df["parent_id"].isna()][["context.trace_id", "context.span_id"]]
eval_results = eval_results[["score", "label", "explanation"]]

trajectory_eval_df = pd.merge(trace_df, eval_results, left_index=True, right_index=True, how="left")

trajectory_eval_df = pd.merge(
    trajectory_eval_df.reset_index(), root_spans, on="context.trace_id", how="left"
).set_index("context.span_id", drop=False)
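
The merge above attaches each trace-level result to the root span of its trace, since annotations are logged against span IDs. The same idea in plain dictionaries, as a sketch with made-up span records and labels:

```python
# Hypothetical span records: a root span is one with no parent_id
spans = [
    {"trace_id": "t1", "span_id": "s1", "parent_id": None},
    {"trace_id": "t1", "span_id": "s2", "parent_id": "s1"},
    {"trace_id": "t2", "span_id": "s3", "parent_id": None},
]

# Hypothetical trace-level labels, as an LLM judge might return them
labels = {"t1": "correct", "t2": "incorrect"}

# Map each trace to its root span
root_by_trace = {s["trace_id"]: s["span_id"] for s in spans if s["parent_id"] is None}

# Key each result by root span ID and derive a numeric score from the label
annotations = {
    root_by_trace[trace_id]: {"label": label, "score": 1 if label == "correct" else 0}
    for trace_id, label in labels.items()
    if trace_id in root_by_trace
}

print(annotations)
```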

### Log Evals to Phoenix

In [None]:
await px_client.annotations.log_span_annotations_dataframe(
    dataframe=trajectory_eval_df,
    annotation_name="TRAJECTORY PERFORMANCE",
    annotator_kind="LLM",
)

# View Results in Phoenix

When viewing the traces in Phoenix, you can see how the orchestrator delegated subtasks to specialized agents step by step. The trace shows the order in which each agent responded, making it easy to verify the flow from research to evaluation to portfolio creation.

The agent trajectory evaluation is tied to the root span of the trace, allowing you to assess the overall sequence of steps. 

![Results](https://storage.googleapis.com/arize-phoenix-assets/assets/images/openai-agentframework-eval.png)
