<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://arize.com/docs/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email">Community</a>
    </p>
</center>

# <center>AutoGen Agents: Orchestrator-Worker Patterns</center>

In this tutorial, we'll explore orchestrator agent workflows with [AutoGen GroupChats](https://microsoft.github.io/autogen/dev//user-guide/core-user-guide/design-patterns/group-chat.html).

This pattern enables collaboration among multiple specialized agents, activating only the most relevant one for the current subtask. Instead of following a fixed sequence, agents participate dynamically depending on the state of the conversation. At termination, their results are synthesized into a single response.

The orchestrator-worker workflow simplifies this routing through a central orchestrator (`GroupChatManager`) that selectively delegates tasks to the appropriate agents (workers). Each agent monitors the conversation but contributes only when its specific expertise is required. With Phoenix tracing, you get full visibility into the orchestration flow: which agents engaged, when they were activated, and why.

In this example, we'll build a smart trip planning assistant where subtasks like destination research, hotel booking, and activity suggestions are dynamically sent to the right specialized agent.

By the end of this tutorial, you’ll learn how to:

- Set up multiple specialized AutoGen agents in a `GroupChat`

- Use a `GroupChatManager` to enable dynamic agent routing

- Incorporate human feedback in your agent setup

- Trace and visualize agent interactions using Phoenix

⚠️ You'll need a [free Phoenix Cloud](https://app.arize.com/auth/phoenix/login) account and an OpenAI API key for this tutorial.


## Set up Keys and Dependencies


In [None]:
%pip install -qqqq autogen-agentchat autogen_ext openai "ag2[openai]"

In [None]:
%pip install -qqqq arize-phoenix arize-phoenix-otel openinference-instrumentation-openai openinference-instrumentation-autogen-agentchat

In [None]:
import os
from getpass import getpass

import autogen
import pandas as pd

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("🔑 Enter your OpenAI API key: ")

if "PHOENIX_API_KEY" not in os.environ:
    os.environ["PHOENIX_API_KEY"] = getpass("🔑 Enter your Phoenix API key: ")

if "PHOENIX_COLLECTOR_ENDPOINT" not in os.environ:
    os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = getpass("🔑 Enter your Phoenix Collector Endpoint: ")

## Configure Tracing


In [None]:
from phoenix.otel import register

project_name = "autogen-agents-orchestrator-worker"
tracer_provider = register(
    project_name=project_name,
    auto_instrument=True,
)

## Example Orchestrator Task: Travel Agent

This example shows how to build a dynamic travel planning assistant. A `GroupChatManager` coordinates specialized agents to adapt to the user's evolving travel needs.

**User Interaction**:
A `UserProxyAgent` acts as the human user, configured with `human_input_mode="TERMINATE"` and a custom `is_termination_msg` that ends the session when a message ends with `TERMINATE`.

**Specialized Travel Agents**:
Three `AssistantAgent` instances each handle a specific task:

- Flight Planner — suggests flight options.

- Hotel Finder — recommends accommodations.

- Activity Suggester — proposes activities and attractions.

**GroupChat Setup**:
A `GroupChat` bundles the user and the specialized agents and caps the conversation at a maximum number of rounds (10 in this example).

**Orchestrator**:
The `GroupChatManager` oversees the conversation, routing tasks to the right agent based on context.

![Diagram](https://storage.googleapis.com/arize-phoenix-assets/assets/images/autogen_orchestrator_diagram.png)
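
The control flow described above can be sketched in plain Python, without AutoGen or an LLM. This is a toy illustration under stated assumptions: the `WORKERS` table, `select_next_speaker`, and `run_group_chat` are hypothetical names, the speaker selection is a fixed-order stand-in for the LLM-based choice a real manager makes, and the way rounds are counted here may differ from how `GroupChat` counts them.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for the three specialists; real agents would call an LLM.
WORKERS: Dict[str, Callable[[str], str]] = {
    "FlightPlanner": lambda task: f"Flights for: {task}",
    "HotelFinder": lambda task: f"Hotels for: {task}",
    "ActivitySuggester": lambda task: f"Activities for: {task}",
}


def select_next_speaker(history: List[dict]) -> Optional[str]:
    """Pick the first worker that has not spoken yet.

    A fixed-order stand-in for the context-based selection an orchestrator performs.
    """
    spoken = {message["name"] for message in history}
    for name in WORKERS:
        if name not in spoken:
            return name
    return None  # every specialist has contributed


def run_group_chat(task: str, max_round: int = 10) -> List[dict]:
    """Route the task to one worker per round until done or max_round is reached."""
    history = [{"name": "UserProxy", "content": task}]
    for _ in range(max_round):
        speaker = select_next_speaker(history)
        if speaker is None:
            # Nothing left to delegate: summarize and end the conversation.
            history.append({"name": "Manager", "content": "Plan complete. TERMINATE"})
            break
        history.append({"name": speaker, "content": WORKERS[speaker](task)})
    return history


transcript = run_group_chat("2-day trip to Cabo in October")
for message in transcript:
    print(f'{message["name"]}: {message["content"]}')
```

Lowering `max_round` cuts the conversation short before every specialist has spoken, which is the trade-off the round limit controls.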

## Define Agents

The `llm_config` specifies the configuration used for all the assistant agents.


In [None]:
llm_config = {
    "model": "gpt-4o",
    "api_key": os.environ["OPENAI_API_KEY"],
}

In [None]:
# Specialized LLM Agents
flight_planner = autogen.AssistantAgent(
    name="FlightPlanner",
    llm_config=llm_config,
    system_message="You are a flight planning assistant. You help book flights and find the best travel routes. Focus on using freely accessible sources.",
)

hotel_finder = autogen.AssistantAgent(
    name="HotelFinder",
    llm_config=llm_config,
    system_message="You are a hotel booking assistant. You help find the best accommodations. Focus on using freely accessible sources.",
)

activity_suggester = autogen.AssistantAgent(
    name="ActivitySuggester",
    llm_config=llm_config,
    system_message="You are a travel activity expert. You suggest interesting activities and tours in a destination.",
)

In [None]:
user_proxy = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=10,
    # `content` may be present but None, so fall back to an empty string.
    is_termination_msg=lambda x: (x.get("content") or "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
    system_message="A human user seeking travel planning assistance. Reply TERMINATE when the task is done.",
)

agents = [user_proxy, flight_planner, hotel_finder, activity_suggester]

group_chat = autogen.GroupChat(agents=agents, messages=[], max_round=10)

manager = autogen.GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config,
    system_message="""
    You are a coordinator managing a travel planning discussion between a user, a flight planner, a hotel finder, and an activity suggester.
    Your goal is to ensure the user's request is fully addressed by coordinating the specialists.
    Ensure each specialist contributes relevant information sequentially (e.g., flights first, then hotels, then activities, unless the user specifies otherwise).
    Summarize the final plan. Be sure to reply TERMINATE when the plan is complete.
    """,
)
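
The termination check passed to `UserProxyAgent` can be exercised on its own before running a full chat. The sketch below is a standalone, defensively written version of that predicate; the assumption that a message's `content` can be `None` is mine, and the sample messages are made up.

```python
def is_termination_msg(message: dict) -> bool:
    """Return True when the message text ends with the TERMINATE keyword."""
    # Assumption: `content` may be missing or None, so fall back to "".
    content = message.get("content") or ""
    # Trailing whitespace is ignored; the keyword must be the last token.
    return content.rstrip().endswith("TERMINATE")


print(is_termination_msg({"content": "Here is the final plan. TERMINATE\n"}))  # True
print(is_termination_msg({"content": "TERMINATE is not at the end"}))  # False
print(is_termination_msg({"content": None}))  # False
```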

## Run Agent

In [None]:
from opentelemetry.trace import StatusCode

tracer = tracer_provider.get_tracer(__name__)
with tracer.start_as_current_span(
    "TravelAgent",
    openinference_span_kind="agent",
) as agent_span:
    user_proxy.initiate_chat(
        manager,
        message="I want to plan a 2-day trip to Cabo sometime in October. I'm interested in good food. Find flight options from SFO, suggest mid-range hotels near the city center, and recommend some relevant activities.",
    )
    # Mark the span OK only after the chat has finished without raising.
    agent_span.set_status(StatusCode.OK)

## Evaluating the Agent

Here, we will evaluate the agent’s trajectory. This means checking whether the sequence of steps it took was logical, efficient, and aligned with completing the user’s request. Then, we will log those results back to Phoenix.

In [None]:
from phoenix.client import Client

px_client = Client()

df = px_client.spans.get_spans_dataframe(project_identifier=project_name, timeout=None)

trace_df = df.groupby("context.trace_id").agg(
    {
        "attributes.input.value": "first",
        "attributes.output.value": lambda x: " ".join(x.dropna()),
    }
)


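import json


def extract_input_content(input_value):
    """Return the first message's content from a JSON-encoded span input."""
    # Non-string values pass through unchanged; missing values become None.
    if not isinstance(input_value, str):
        try:
            return None if pd.isna(input_value) else input_value
        except (TypeError, ValueError):
            # pd.isna is ambiguous for list-like values; return them as-is.
            return input_value

    # String values are expected to be JSON with a "messages" list.
    try:
        parsed = json.loads(input_value)
        messages = parsed.get("messages", [])
        if isinstance(messages, list) and messages:
            return messages[0].get("content")
    except (json.JSONDecodeError, AttributeError):
        pass
    return None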


# Apply function row by row
trace_df["attributes.input.value"] = trace_df["attributes.input.value"].apply(extract_input_content)


trace_df.head()
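
To make the aggregation step concrete, here is a minimal sketch of the same `groupby` on a hand-built dataframe. The trace IDs and values are made up; only the column names are taken from the cell above.

```python
import pandas as pd

# Hypothetical spans: two traces with made-up IDs and values.
spans = pd.DataFrame(
    {
        "context.trace_id": ["t1", "t1", "t1", "t2"],
        "attributes.input.value": ["plan a trip", "find flights", "find hotels", "book a table"],
        "attributes.output.value": ["flights found", None, "hotels found", "table booked"],
    }
)

# One row per trace: keep the first input and join all non-null outputs.
per_trace = spans.groupby("context.trace_id").agg(
    {
        "attributes.input.value": "first",
        "attributes.output.value": lambda x: " ".join(x.dropna()),
    }
)
print(per_trace)
```

Note that `"first"` takes the first non-null value in row order, so this approach assumes the first span returned for a trace carries the user's request.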

In [None]:
TRAJECTORY_PERFORMANCE_PROMPT = """
You are a helpful AI bot that checks whether an AI agent's internal trajectory is accurate and effective.

You will be given:
1. An input query from a user that the agent responded to
2. The agent's actual trajectory of tool calls and responses

An accurate trajectory:
- Progresses logically from step to step
- Shows a clear path toward completing a goal
- Is reasonably efficient (doesn't take unnecessary detours)

##

User Query:
{attributes.input.value}

Trajectory:
{attributes.output.value}

##

Your response must be a single string, either `correct` or `incorrect`, and must not include any additional text.

- Respond with `correct` if the agent's trajectory adheres to the rubric and accomplishes the task effectively.
- Respond with `incorrect` if the trajectory is confusing, misaligned with the goal, inefficient, or does not accomplish the task.
"""
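
The placeholders in the template are named after dataframe columns, dots included. Python's built-in `str.format` would read those dots as attribute access, so the sketch below fills a shortened template with a small regex-based helper instead. `fill_template` is a hypothetical function for illustration, not Phoenix's implementation, and the row values are made up.

```python
import re

# A shortened stand-in for the evaluation template.
TEMPLATE = "User Query:\n{attributes.input.value}\n\nTrajectory:\n{attributes.output.value}"

# Hypothetical row, keyed by the same column names the template references.
row = {
    "attributes.input.value": "Plan a 2-day trip to Cabo",
    "attributes.output.value": "FlightPlanner: ... HotelFinder: ...",
}


def fill_template(template: str, values: dict) -> str:
    """Replace each {column.name} placeholder with the matching value."""
    return re.sub(r"\{([^{}]+)\}", lambda match: str(values[match.group(1)]), template)


print(fill_template(TEMPLATE, row))
```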

In [None]:
from phoenix.evals import OpenAIModel, llm_classify
from phoenix.trace import suppress_tracing

model = OpenAIModel(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o-mini",
    temperature=0.0,
)

rails = ["correct", "incorrect"]

with suppress_tracing():
    eval_results = llm_classify(
        dataframe=trace_df,
        template=TRAJECTORY_PERFORMANCE_PROMPT,
        model=model,
        rails=rails,
        provide_explanation=True,
        verbose=False,
        concurrency=20,
    )

eval_results["score"] = eval_results["label"].apply(lambda x: 1 if x == "correct" else 0)
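
The score mapping above treats every label other than `correct` as 0, including any response that matched neither rail. The sketch below uses made-up labels to show how that choice affects an aggregate rate; the `"NOT_PARSABLE"` string is an assumed placeholder for an unparsable response.

```python
# Hypothetical labels; "NOT_PARSABLE" is an assumed placeholder for a
# response that matched neither rail.
labels = ["correct", "incorrect", "correct", "NOT_PARSABLE"]
rails = ["correct", "incorrect"]

# Same mapping as the tutorial: only "correct" scores 1.
scores = [1 if label == "correct" else 0 for label in labels]

# Rate over all rows (unparsable counts as 0) versus over parsable rows only.
parsable = [label for label in labels if label in rails]
overall_rate = sum(scores) / len(labels)
parsable_rate = parsable.count("correct") / len(parsable)

print(scores)
print(overall_rate, round(parsable_rate, 3))
```

If unparsable responses are common, reporting both rates makes it clearer whether a low score reflects the agent or the evaluator.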

In [None]:
root_spans = df[df["parent_id"].isna()][["context.trace_id", "context.span_id"]]
eval_results = eval_results[["score", "label", "explanation"]]

trajectory_eval_df = pd.merge(trace_df, eval_results, left_index=True, right_index=True, how="left")

trajectory_eval_df = pd.merge(
    trajectory_eval_df.reset_index(), root_spans, on="context.trace_id", how="left"
).set_index("context.span_id", drop=False)
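
The join above attaches each trace-level result to the span that has no parent. A minimal sketch on made-up data shows the shape of the result; the IDs are hypothetical and only the column names come from the cell above.

```python
import pandas as pd

# Hypothetical spans: trace t1 has a root and a child, trace t2 has only a root.
spans = pd.DataFrame(
    {
        "context.trace_id": ["t1", "t1", "t2"],
        "context.span_id": ["s1", "s2", "s3"],
        "parent_id": [None, "s1", None],
    }
)
# Root spans are the ones without a parent.
root_spans = spans[spans["parent_id"].isna()][["context.trace_id", "context.span_id"]]

# Hypothetical trace-level evaluation results, indexed by trace ID.
evals = pd.DataFrame(
    {"score": [1, 0], "label": ["correct", "incorrect"]},
    index=pd.Index(["t1", "t2"], name="context.trace_id"),
)

# Attach the root span ID to each trace's result and index by it.
joined = pd.merge(
    evals.reset_index(), root_spans, on="context.trace_id", how="left"
).set_index("context.span_id", drop=False)
print(joined)
```

Because the merge is a left join, a trace whose root span is missing from the export would end up with a null span ID, so it is worth checking for nulls before logging.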

### Log Evals to Phoenix

In [None]:
px_client.annotations.log_span_annotations_dataframe(
    dataframe=trajectory_eval_df,
    annotation_name="TRAJECTORY PERFORMANCE",
    annotator_kind="LLM",
)

## View Results in Phoenix

When viewing the traces in Phoenix, you can see how the `GroupChatManager` delegated subtasks to specialized agents step by step. The trace shows the order in which each agent responded, making it easy to verify the flow from flight planning to hotel booking to activity suggestions.

The agent trajectory evaluation is tied to the root span of the trace, allowing you to assess the overall sequence of steps. 

![Results](https://storage.googleapis.com/arize-phoenix-assets/assets/images/autogen-agentframework-eval.png)
