<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://docs.arize.com/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email">Community</a>
    </p>
</center>

# AutoGen Agents: Parallelization

In this tutorial, we'll explore parallel task execution with [AutoGen](https://microsoft.github.io/autogen/stable//index.html) and how to trace the workflow using Phoenix.

Parallelization is a powerful agent pattern where multiple tasks are run concurrently, significantly speeding up the overall process. Unlike purely sequential workflows, this approach is suitable when tasks are independent and can be processed simultaneously. Tracing allows us to monitor this concurrent flow and understand the timing of parallel branches.

AutoGen doesn't have a built-in parallel execution manager, but its core agent capabilities integrate seamlessly with standard Python concurrency libraries. We can use these libraries to launch multiple agent interactions concurrently.
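
As a rough illustration of that idea, the sketch below runs four independent blocking calls concurrently with the standard library's `concurrent.futures`. The `fake_agent_call` function is a hypothetical stand-in for an agent interaction (it only sleeps), and the tutorial itself uses `threading.Thread` directly rather than a thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_agent_call(prompt: str) -> str:
    """Hypothetical stand-in for a blocking LLM request."""
    time.sleep(0.2)  # simulate network latency
    return f"reply to: {prompt}"


prompts = ["features", "value proposition", "target customer", "tagline"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    # map() returns results in the same order as the input prompts
    replies = list(pool.map(fake_agent_call, prompts))
elapsed = time.perf_counter() - start

print(replies)
print(f"{len(prompts)} calls finished in {elapsed:.2f}s")
```

Because each call spends its time waiting rather than computing, the threads overlap and the total should land close to the latency of a single call rather than the sum of all four.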

By the end of this tutorial, you’ll learn how to:

- Set up a basic AutoGen agent to perform specific tasks.

- Execute a series of agent tasks in parallel using Python's threading library.

- Use Phoenix to trace parallel agent interactions.

- Compare parallelized and sequential agent workflows.

⚠️ You'll need an OpenAI API key and a Phoenix API key for this tutorial.

## Set up Keys and Dependencies


In [None]:
!pip install -qq pyautogen==0.9 autogen-agentchat~=0.2

In [None]:
!pip install -qqq arize-phoenix arize-phoenix-otel openinference-instrumentation-openai

In [None]:
import os
from getpass import getpass

import autogen

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")

os.environ["OPENAI_API_KEY"] = openai_api_key

In [None]:
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
if not os.environ.get("PHOENIX_CLIENT_HEADERS"):
    os.environ["PHOENIX_CLIENT_HEADERS"] = "api_key=" + getpass("Enter your Phoenix API key: ")

## Configure Tracing


In [None]:
from phoenix.otel import register

tracer_provider = register(
    project_name="autogen-agents",
    endpoint="https://app.phoenix.arize.com/v1/traces",
    auto_instrument=True,
)

## Example Parallelization Task: Product Description Generator


## Define Agent With Parallelization

This section defines a parallelized workflow where multiple tasks are handled by a specialized `AssistantAgent`. Each task runs in its own thread, and tracing captures task-specific spans, including timing. A temporary `UserProxyAgent` is created for each thread to manage isolated chats with the agent.
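
Why does the worker function take a parent context as an argument? OpenTelemetry's Python context is, by default, backed by `contextvars`, and a newly started thread typically does not see values set in the thread that launched it. The sketch below is an analogy built only on the standard library, not the OpenTelemetry API itself; `current_span_name` and `worker` are hypothetical names.

```python
import contextvars
import threading

# Hypothetical stand-in for the tracing context (e.g. the current span)
current_span_name = contextvars.ContextVar("current_span_name", default="<none>")

seen = {}


def worker(label, parent_ctx=None):
    if parent_ctx is not None:
        # Read the variable inside the context captured in the parent thread
        seen[label] = parent_ctx.run(current_span_name.get)
    else:
        # Read the variable in whatever context the new thread started with
        seen[label] = current_span_name.get()


current_span_name.set("Parallelized")
captured = contextvars.copy_context()  # snapshot taken in the parent thread

t1 = threading.Thread(target=worker, args=("without_context",))
t2 = threading.Thread(target=worker, args=("with_context", captured))
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join()

print(seen)
```

The `with_context` entry reports `Parallelized` because the snapshot was handed over explicitly. The `without_context` entry depends on the interpreter's thread-context inheritance setting, which is why passing the context explicitly is the safer choice.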

As each task completes, the generated content and task metadata are collected into a shared results list using a lock. This setup enables concurrent LLM interactions.
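
The lock-guarded collection step can be sketched in isolation as follows. The `record` helper and the task contents are hypothetical; only the pattern of a shared list plus a `threading.Lock` comes from the tutorial.

```python
import threading

results = []
results_lock = threading.Lock()


def record(task_id: str, content: str) -> None:
    # Hold the lock only for the append so threads wait on each other briefly
    with results_lock:
        results.append({"task_id": task_id, "generated_content": content})


threads = [
    threading.Thread(target=record, args=(f"task-{i}", f"content {i}"))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Completion order is not guaranteed, so sort before inspecting
print(sorted(r["task_id"] for r in results))
```

A single `list.append` is already atomic in CPython with the GIL, but the lock keeps the code correct if the update later grows into several steps, such as appending and updating a counter together.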

In this example, we'll generate different components of a product description for a smartwatch (features, value proposition, target customer, tagline) by calling a marketing agent.

![Diagram](https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize_autogen_parallelization.png)

The `llm_config` specifies the model configuration used by the assistant agent.


In [None]:
llm_config = {
    "config_list": [
        {
            "model": "gpt-4",
            "api_key": os.environ.get("OPENAI_API_KEY"),
        }
    ]
}

In [None]:
# Specialized LLM AssistantAgent
import threading

marketing_writer_agent = autogen.AssistantAgent(
    name="MarketingWriter",
    llm_config=llm_config,
    system_message="You are a concise and persuasive marketing copywriter. Generate content based only on the specific prompt you receive. Keep your response focused on the requested information.",
)

results = []
results_lock = threading.Lock()

In [None]:
import time

import opentelemetry.context as context_api
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = tracer_provider.get_tracer(__name__)


def run_agent_task_with_tracing(agent, task_prompt, task_id, parent_context):
    # A new thread may not inherit the caller's OpenTelemetry context, so attach
    # the parent context explicitly to nest this task's span under the parent span.
    token = context_api.attach(parent_context)
    thread_name = threading.current_thread().name

    try:
        with tracer.start_as_current_span(task_id, openinference_span_kind="agent") as child_span:
            start_time = time.time()

            # Create a temporary UserProxyAgent for the current thread
            temp_user_proxy = autogen.UserProxyAgent(
                name=f"UserProxy_{thread_name}",
                human_input_mode="NEVER",
                max_consecutive_auto_reply=1,
                is_termination_msg=lambda x: True,
                code_execution_config=False,
            )

            # Initiate the chat
            temp_user_proxy.initiate_chat(
                agent,
                message=task_prompt,
                clear_history=True,
                silent=True,
            )

            # Get the assistant's last message, if any
            assistant_reply = "No reply found."
            last_message = temp_user_proxy.last_message(agent)
            if last_message:
                assistant_reply = last_message.get("content", "No reply found.")

            duration = time.time() - start_time
            print(f"Thread {task_id}: Finished in {duration:.2f} seconds.")

            child_span.set_status(Status(StatusCode.OK))

            # Hold the lock while appending to the list shared across threads
            with results_lock:
                results.append(
                    {
                        "task_id": task_id,
                        "generated_content": assistant_reply,
                        "duration": duration,
                    }
                )
    finally:
        # Always restore the thread's previous context, even if the chat raised
        context_api.detach(token)

### Run and Trace Agent

Next, we run our agent. After all threads complete, the results are collected, sorted based on the original task order, and printed out.
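
Because threads finish in whatever order their LLM calls return, the results need to be re-sorted before printing. A minimal sketch of that step, using a hypothetical completion order and placeholder content, builds a position lookup once instead of searching the task list for every comparison:

```python
task_ids = ["Features", "ValueProp", "TargetCustomer", "Tagline"]

# Hypothetical completion order with placeholder content
completed = [
    {"task_id": "Tagline", "generated_content": "placeholder"},
    {"task_id": "Features", "generated_content": "placeholder"},
    {"task_id": "TargetCustomer", "generated_content": "placeholder"},
    {"task_id": "ValueProp", "generated_content": "placeholder"},
]

# Map each task id to its position in the original task list
order = {task_id: position for position, task_id in enumerate(task_ids)}
completed.sort(key=lambda result: order[result["task_id"]])

print([result["task_id"] for result in completed])
```

For four tasks the difference from calling `list.index` inside the sort key is negligible; the lookup table mainly matters when the task list grows.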

In [None]:
start_total_time = time.time()

results = []

with tracer.start_as_current_span("Parallelized", openinference_span_kind="agent") as parent_span:
    tasks = [
        {"id": "Features", "prompt": "List key features of the new SmartX Pro smartwatch..."},
        {"id": "ValueProp", "prompt": "Describe the main value proposition of SmartX Pro..."},
        {"id": "TargetCustomer", "prompt": "Describe the ideal customer profile for SmartX Pro..."},
        {"id": "Tagline", "prompt": "Create 3 catchy marketing taglines for SmartX Pro."},
    ]

    parent_context_to_pass = context_api.get_current()

    threads = []
    for task_info in tasks:
        thread = threading.Thread(
            target=run_agent_task_with_tracing,
            args=(
                marketing_writer_agent,
                task_info["prompt"],
                task_info["id"],
                parent_context_to_pass,
            ),
            name=task_info["id"],
        )
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()

end_total_time = time.time()
total_duration = end_total_time - start_total_time
print(f"\nAll threads completed in {total_duration:.2f} seconds.")

# --- Print Results ---
results.sort(key=lambda x: [t["id"] for t in tasks].index(x["task_id"]))
print("--- Results ---")
for result in results:
    print(f"--- {result['task_id']} ---")
    print(f"Generated Content:\n{result['generated_content']}")

## Define Agent Without Parallelization (Latency Comparison)

This section runs the same set of content generation tasks, but sequentially instead of in parallel. Each task is executed one after the other. This allows for a direct comparison of total execution time between sequential and parallel workflows.

In [None]:
results = []


def run_single_task(agent, task_prompt, task_id):
    with tracer.start_as_current_span(task_id, openinference_span_kind="agent") as span:
        start_time = time.time()

        # Create a temp UserProxyAgent
        temp_user_proxy = autogen.UserProxyAgent(
            name="UserProxy_Sequential",
            human_input_mode="NEVER",
            max_consecutive_auto_reply=1,
            is_termination_msg=lambda x: True,
            code_execution_config=False,
        )

        reply = "No reply found."
        temp_user_proxy.initiate_chat(
            agent,
            message=task_prompt,
            clear_history=True,
            silent=True,
        )

        # Get last message
        if temp_user_proxy.last_message(agent):
            reply = temp_user_proxy.last_message(agent).get("content", "No reply found.")
        span.set_status(Status(StatusCode.OK))

        end_time = time.time()
        duration = end_time - start_time
        print(f"Finished task '{task_id}' in {duration:.2f} seconds.")

        results.append({"task_id": task_id, "prompt": task_prompt, "generated_content": reply})

### Run and Trace Sequential Agent

In [None]:
start_total_time = time.time()

with tracer.start_as_current_span("Sequential", openinference_span_kind="agent") as parent_span:
    # Define tasks
    tasks = [
        {
            "id": "Features",
            "prompt": "List key features of the new SmartX Pro smartwatch. Be specific (e.g., screen type, battery life, sensors, connectivity).",
        },
        {
            "id": "ValueProp",
            "prompt": "Describe the main value proposition of SmartX Pro for busy professionals. Focus on how it solves their problems or improves their lives.",
        },
        {
            "id": "TargetCustomer",
            "prompt": "Describe the ideal customer profile for the SmartX Pro smartwatch. Include demographics, lifestyle, and needs.",
        },
        {
            "id": "Tagline",
            "prompt": "Create 3 catchy and distinct marketing taglines for SmartX Pro.",
        },
    ]

    for i, task_info in enumerate(tasks):
        parent_span.add_event(f"Starting task {i+1}: {task_info['id']}")
        run_single_task(
            agent=marketing_writer_agent, task_prompt=task_info["prompt"], task_id=task_info["id"]
        )
        parent_span.add_event(f"Finished task {i+1}: {task_info['id']}")


end_total_time = time.time()
total_duration = end_total_time - start_total_time
print(f"\nAll tasks completed sequentially in {total_duration:.2f} seconds.")


print("--- Results ---")
for result in results:
    print(f"--- {result['task_id']} ---")
    print(f"Generated Content:\n{result['generated_content']}")

## View Results in Phoenix

The trace below shows the parallelized workflow. The sequential version takes 3–4 times longer to complete than the parallelized one, which is in line with running four independent tasks concurrently instead of one after another. In Phoenix, we can view a detailed breakdown of LLM inputs and outputs, along with metadata such as latency, token counts, and model configurations.

![Results](https://storage.googleapis.com/arize-phoenix-assets/assets/gifs/phoenix_autogen_parallelized.gif)
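
The latency gap between the two runs can be reasoned about with a rough model: a sequential run costs about the sum of the task latencies, while a parallel run is bounded by the slowest task. The numbers below are hypothetical placeholders, not measurements from this tutorial, and the model ignores thread start-up cost and API rate limits.

```python
# Hypothetical per-task latencies in seconds (not measured values)
durations = {"Features": 6.0, "ValueProp": 5.0, "TargetCustomer": 7.0, "Tagline": 4.0}

sequential_estimate = sum(durations.values())  # tasks run back to back
parallel_estimate = max(durations.values())  # bounded by the slowest task
speedup = sequential_estimate / parallel_estimate

print(f"sequential ~{sequential_estimate:.1f}s, parallel ~{parallel_estimate:.1f}s")
print(f"estimated speedup: {speedup:.1f}x")
```

With four tasks the speedup approaches 4x only when all tasks take about the same time; one slow task pulls the ratio down.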