# Multi-Role Workflow with Specialized Agents

## Overview

This notebook demonstrates a structured multi-agent workflow using five specialized agents coordinated by an intelligent manager. The workflow follows a systematic pattern:

```
┌─────────────────────────────────────────────────────────┐
│                   Task Input                            │
└─────────────────┬───────────────────────────────────────┘
                  │
                  ▼
          ┌───────────────┐
          │  Executor     │ ◄─── Carries out reasoning-heavy
          │ (gpt-5-mini)  │      steps and delegates to tools
          └───────┬───────┘
                  │
                  ▼
          ┌───────────────┐
          │  Coder        │ ◄─── Writes and executes code
          │ (gpt-5-mini)  │      for computation and analysis
          └───────┬───────┘
                  │
                  ▼
          ┌───────────────┐
          │  Verifier     │ ◄─── Inspects outputs, confirms
          │ (gpt-5-mini)  │      requirements are satisfied
          └───────┬───────┘
                  │
                  ▼
          ┌───────────────┐
          │  Generator    │ ◄─── Assembles final response
          │ (gpt-5-mini)  │      with verified outputs
          └───────────────┘
```

## How It Works

The `MagenticBuilder` creates a workflow with a `StandardMagenticManager` that:

1. **Reads agent descriptions** to understand each agent's capabilities
2. **Creates dynamic plans** based on the task requirements
3. **Selects appropriate agents** for each step of the plan
4. **Tracks progress** and adapts when agents stall or plans need revision
5. **Manages completion** when the task is successfully resolved

## Model Assignments

Each agent uses the same model tier for consistency:

- **Planner (gpt-5-mini)**: Provides advanced reasoning for decomposing complex tasks
- **Executor (gpt-5-mini)**: Handles reasoning-heavy execution steps
- **Coder (gpt-5-mini + code interpreter)**: Generates and runs code via the hosted interpreter
- **Verifier (gpt-5-mini)**: Validates outputs and ensures quality
- **Generator (gpt-5-mini)**: Synthesizes verified outputs into a polished response

## Prerequisites

Set these environment variables:

```bash
export OPENAI_API_KEY="your-api-key-here"
# Optional: Specify endpoints if using Azure OpenAI
# export AZURE_OPENAI_ENDPOINT="your-endpoint"
# export AZURE_OPENAI_API_KEY="your-key"
```

In [None]:
# Copyright (c) Microsoft. All rights reserved.
## Prerequisites

In [None]:
export OPENAI_API_KEY="your-api-key-here"
export OPENAI_BASE_URL="https://api.openai.com/v1"
# Optional: Specify endpoints if using Azure OpenAI
# export AZURE_OPENAI_ENDPOINT="your-endpoint"
# export AZURE_OPENAI_API_KEY="your-key"


## Step 1: Import Required Modules

We import the core agent framework components:
- `ChatAgent`: Base agent class for creating specialized agents
- `MagenticBuilder`: Workflow builder for multi-agent orchestration
- `HostedCodeInterpreterTool`: Tool for code execution capabilities
- Event types for monitoring workflow execution
- OpenAI client implementations for different model APIs

In [1]:
import logging

from agent_framework import (
    ChatAgent,
    HostedCodeInterpreterTool,
    MagenticAgentDeltaEvent,
    MagenticAgentMessageEvent,
    MagenticBuilder,
    MagenticFinalResultEvent,
    MagenticOrchestratorMessageEvent,
    WorkflowOutputEvent,
)
from agent_framework.openai import OpenAIResponsesClient
from dotenv import load_dotenv

load_dotenv(override=True)

True

## Step 2: Configure Debug Logging

Enable debug logging to observe the manager's decision-making process:

- **INFO level**: Shows high-level workflow progress
- **DEBUG level**: Reveals manager's agent selection logic, plan creation, and progress tracking

You can change this to `logging.INFO` for cleaner output, or `logging.DEBUG` to see internal orchestration decisions.

In [2]:
# Set to DEBUG to see manager's decision-making, INFO for cleaner output
logging.basicConfig(level=logging.INFO, force=True)
logger = logging.getLogger(__name__)

print("✓ Logging configured at INFO level (change to DEBUG for detailed internal logs)")

✓ Logging configured at INFO level (change to DEBUG for detailed internal logs)


## Step 3: Define Role Prompts

Each agent receives specialized instructions that define its role in the workflow. These prompts guide the agent's behavior and help the manager understand when to invoke each agent.

In [3]:
EXECUTOR_PROMPT = """You are the executor module. Carry out the active instruction from the
manager or planner. Execute reasoning-heavy steps, delegate to registered tools when needed,
and produce clear artefacts or status updates. If a tool is required, call it explicitly and
then explain the outcome."""

VERIFIER_PROMPT = """You are the verifier module. Inspect the current state, outputs, and
assumptions. Confirm whether the work satisfies requirements, highlight defects or missing
information, and suggest concrete follow-up actions."""

GENERATOR_PROMPT = """You are the generator module. Assemble the final response for the
user. Incorporate verified outputs, cite supporting evidence when available, and ensure the
result addresses the original request without leaking internal reasoning unless explicitly
requested."""

## Step 5: Configure the Executor Agent

**Role**: Carries out reasoning-heavy steps and delegates to tools when computation is needed.

**Model**: `gpt-5-mini` - Balances capability and cost for general execution tasks.

**When invoked**: The manager selects the Executor for steps requiring reasoning, analysis, or coordination between other agents.

The Executor acts as the "general worker" that handles steps not requiring specialized capabilities.

In [4]:
executor_agent = ChatAgent(
    name="ExecutorAgent",
    description="Executes reasoning-heavy tasks and coordinates work between specialized agents.",
    instructions=EXECUTOR_PROMPT,
    chat_client=OpenAIResponsesClient(
        model_id="gpt-5-mini",
        reasoning_effort="medium",
        store=True,
        temperature=0.7,
        max_tokens=4096,
    ),
)

print("✓ Executor Agent configured with gpt-5-mini and enhanced parameters")

✓ Executor Agent configured with gpt-5-mini and enhanced parameters


## Step 6: Configure the Coder Agent

**Role**: Writes and executes code for data processing, calculations, and analysis.

**Model**: `gpt-5-mini` paired with the hosted code interpreter for program execution.

**Tools**: `HostedCodeInterpreterTool()` - Enables code execution in a hosted sandbox environment.

**When invoked**: The manager selects the Coder when computational analysis, data processing, or numerical calculations are needed.

Note: The code execution happens in OpenAI's hosted environment, not locally. Results are returned as part of the agent's response.

In [5]:
coder_agent = ChatAgent(
    name="CoderAgent",
    description="Writes and executes code to perform calculations, data analysis, and computational tasks.",
    instructions="You solve questions using code. Write clear, well-documented code and provide detailed analysis of computation results.",
    chat_client=OpenAIResponsesClient(
        model_id="gpt-5-mini",
        reasoning_effort="high",  # Higher reasoning for code generation
        store=True,
        temperature=0.3,  # Lower temperature for more deterministic code
        max_tokens=8192,  # More tokens for code + explanations
    ),
    tools=HostedCodeInterpreterTool(),
)

print("✓ Coder Agent configured with gpt-5-codex and enhanced parameters")

✓ Coder Agent configured with gpt-5-codex and enhanced parameters


## Step 7: Configure the Verifier Agent

**Role**: Validates outputs, checks for defects, and ensures requirements are satisfied.

**Model**: `gpt-5-mini` - Provides reliable reasoning to perform thorough quality checks.

**When invoked**: The manager calls the Verifier after execution steps to validate correctness, completeness, and adherence to requirements.

The Verifier acts as a quality gate, catching errors before final response generation.

In [6]:
verifier_agent = ChatAgent(
    name="VerifierAgent",
    description="Validates outputs, checks assumptions, and confirms work meets requirements.",
    instructions=VERIFIER_PROMPT,
    chat_client=OpenAIResponsesClient(
        model_id="gpt-5-mini",
        reasoning_effort="high",  # High reasoning for thorough validation
        store=True,
        temperature=0.5,  # Balanced for analytical verification
        max_tokens=4096,
    ),
)

print("✓ Verifier Agent configured with gpt-5-mini and enhanced parameters")

✓ Verifier Agent configured with gpt-5-mini and enhanced parameters


## Step 8: Configure the Generator Agent

**Role**: Assembles the final user-facing response by synthesizing verified outputs.

**Model**: `gpt-5-mini` - Cost-efficient model suitable for synthesis and formatting tasks.

**When invoked**: The manager selects the Generator as the final step to create a polished response for the user.

The Generator ensures the final output is clear, comprehensive, and addresses the original request without exposing internal workflow details.

In [7]:
generator_agent = ChatAgent(
    name="GeneratorAgent",
    description="Synthesizes final responses by incorporating verified outputs and supporting evidence.",
    instructions=GENERATOR_PROMPT,
    chat_client=OpenAIResponsesClient(
        model_id="gpt-5-mini",
        reasoning_effort="low",  # Lower reasoning for synthesis tasks
        store=True,
        temperature=0.8,  # Higher temperature for creative synthesis
        max_tokens=6144,  # More tokens for comprehensive responses
    ),
)

print("✓ Generator Agent configured with gpt-5-mini and enhanced parameters")

✓ Generator Agent configured with gpt-5-mini and enhanced parameters


## Understanding OpenAI Responses API Parameters

The `OpenAIResponsesClient` accepts several parameters to fine-tune agent behavior:

### Key Parameters

- **`reasoning_effort`**: Controls how much computational effort the model uses for reasoning
  - `"low"`: Faster responses, suitable for simple tasks or synthesis
  - `"medium"`: Balanced performance (default for most agents)
  - `"high"`: Maximum reasoning depth for complex analysis, verification, or planning

- **`store`**: Boolean flag to enable conversation storage
  - `True`: Stores conversations for potential learning and improvement
  - `False`: No storage (default)
  - Useful for building persistent context across sessions

- **`temperature`**: Controls randomness in responses (0.0 - 2.0)
  - `0.0-0.3`: Deterministic, focused (ideal for code generation)
  - `0.4-0.7`: Balanced creativity and consistency (general tasks)
  - `0.8-1.0`: More creative and varied (synthesis, brainstorming)
  - `>1.0`: Highly creative but potentially inconsistent

- **`max_tokens`**: Maximum length of the response
  - Adjust based on expected output length
  - Code agents may need more tokens (8192+)
  - Simple agents can use fewer (2048-4096)

- **`top_p`**: Nucleus sampling (0.0 - 1.0)
  - Controls diversity via cumulative probability
  - Lower values = more focused responses

- **`frequency_penalty`**: Reduces repetition (-2.0 to 2.0)
  - Positive values discourage repeating the same words

- **`presence_penalty`**: Encourages topic diversity (-2.0 to 2.0)
  - Positive values encourage exploring new topics

### Agent-Specific Tuning in This Notebook

- **Planner**: Medium reasoning, moderate temperature (balanced planning)
- **Executor**: Medium reasoning, moderate temperature (general execution)
- **Coder**: High reasoning, low temperature (precise code generation)
- **Verifier**: High reasoning, moderate temperature (thorough validation)
- **Generator**: Low reasoning, higher temperature (creative synthesis)
- **Manager**: High reasoning, moderate temperature (strategic orchestration)

## Step 9: Understanding the MagenticBuilder Pattern

The `MagenticBuilder` constructs a workflow with intelligent orchestration:

1. **`.participants()`** - Registers all agents with the workflow. Each agent's `name` and `description` help the manager understand its capabilities.

2. **`.with_standard_manager()`** - Configures the orchestration manager with:
   - **chat_client**: LLM used by the manager for decision-making
   - **max_round_count**: Maximum conversation turns (prevents infinite loops)
   - **max_stall_count**: How many times agents can be unproductive before replanning
   - **max_reset_count**: How many times workflow can reset from scratch

The manager uses these parameters to adaptively coordinate agents throughout the workflow lifecycle.

## Step 10: Build the Workflow

Now we construct the workflow by:

1. Creating a `MagenticBuilder` instance
2. Registering all five agents as participants
3. Configuring the standard manager with appropriate limits
4. Building the final workflow object

The manager will read each agent's `description` field to understand when to invoke them during execution.

In [8]:
print("\nBuilding Magentic Workflow with 5 specialized agents...")

workflow = (
    MagenticBuilder()
    .participants(
        coder=coder_agent,
        verifier=verifier_agent,
        generator=generator_agent,
    )
    .with_standard_manager(
        chat_client=OpenAIResponsesClient(
            model_id="gpt-5-mini",
            reasoning_effort="high",  # Manager needs high reasoning for orchestration
            store=True,
            temperature=0.6,
            max_tokens=8192,  # Manager needs more tokens for planning
        ),
        max_round_count=6,  # Allow enough rounds for 5-agent coordination
        max_stall_count=3,   # Replan if agents stall 3 times
        max_reset_count=2,   # Allow 2 full workflow resets if needed
    )
    .build()
)

print("✓ Workflow built successfully!")
print(f"  - Registered agents: {len([coder_agent, verifier_agent, generator_agent])}")
print("  - Manager model: gpt-5-mini with enhanced parameters")
print("  - Max rounds: 6")
print("\n⚠️  Note: 5-agent workflows using gpt-5-mini can take a few minutes to complete.")

INFO:agent_framework._workflows._magentic:Building Magentic workflow with 3 participants



Building Magentic Workflow with 5 specialized agents...
✓ Workflow built successfully!
  - Registered agents: 3
  - Manager model: gpt-5-mini with enhanced parameters
  - Max rounds: 6

⚠️  Note: 5-agent workflows using gpt-5-mini can take a few minutes to complete.


## Step 11: Define the Task

We'll use a complex task that exercises all five agents:

- **Planner**: Decomposes the multi-part analysis
- **Executor**: Coordinates research and execution
- **Coder**: Writes code to calculate energy consumption and CO2 emissions
- **Verifier**: Validates calculations and assumptions
- **Generator**: Synthesizes findings into a comprehensive report

This task requires data analysis, computational verification, and structured output—ideal for demonstrating the multi-role workflow.

In [9]:
task = (
    "I'm build a ai system that help reasoning and problem parsing capabilities. "
)

print("\n" + "=" * 80)
print("TASK:")
print("=" * 80)
print(task)
print("=" * 80)


TASK:
I'm build a ai system that help reasoning and problem parsing capabilities. 


## Step 12: Understanding Workflow Events

The workflow emits several event types during execution:

- **`MagenticOrchestratorMessageEvent`**: Manager's planning, agent selection, and coordination messages
- **`MagenticAgentDeltaEvent`**: Streaming text chunks from agents (real-time output)
- **`MagenticAgentMessageEvent`**: Complete agent messages after streaming finishes
- **`MagenticFinalResultEvent`**: The workflow's final result when complete
- **`WorkflowOutputEvent`**: Structured output data from the workflow

By handling these events, we can observe:
- Which agent the manager selects for each step
- The manager's reasoning for agent selection
- Real-time progress as agents work
- Final results and completion status

## Step 13: Execute the Workflow with Streaming

Now we'll execute the workflow and observe the orchestration in action:

- Watch the manager select appropriate agents for each step
- See streaming output as agents work
- Observe the coordination between planning, execution, coding, verification, and generation

**Note**: This may take several minutes depending on task complexity and API response times.

In [10]:
import time
from datetime import datetime

print("\nStarting workflow execution...")
print(f"Started at: {datetime.now().strftime('%H:%M:%S')}")
print("⏳ This may take 1-3 minutes for complex tasks...\n")

last_stream_agent_id: str | None = None
stream_line_open: bool = False
final_output: str | None = None
event_count = 0
start_time = time.time()

try:
    async for event in workflow.run_stream(task):
        event_count += 1
        if event_count % 10 == 0:
            elapsed = time.time() - start_time
            print(f"\n[Progress: {event_count} events, {elapsed:.1f}s elapsed]\n", flush=True)

        if isinstance(event, MagenticOrchestratorMessageEvent):
            print(f"\n[ORCHESTRATOR:{event.kind}]\n")
            print(f"{getattr(event.message, 'text', '')}\n")
            print("-" * 80)

        elif isinstance(event, MagenticAgentDeltaEvent):
            if last_stream_agent_id != event.agent_id or not stream_line_open:
                if stream_line_open:
                    print()
                print(f"\n[STREAMING:{event.agent_id}]: ", end="", flush=True)
                last_stream_agent_id = event.agent_id
                stream_line_open = False
            if event.text:
                print(event.text, end="", flush=True)

        elif isinstance(event, MagenticAgentMessageEvent):
            if stream_line_open:
                print(" ✓")
                stream_line_open = False
            msg = event.message
            if msg is not None:
                response_text = (msg.text or "").replace("\n", " ")
                display_text = response_text[:200] + "..." if len(response_text) > 200 else response_text
                print(f"\n[AGENT:{event.agent_id}] {msg.role.value}")
                print(f"  {display_text}\n")
                print("-" * 80)

        elif isinstance(event, MagenticFinalResultEvent):
            if stream_line_open:
                print()
                stream_line_open = False

            print("\n" + "=" * 80)
            print("FINAL RESULT:")
            print("=" * 80)
            print("\n✓ Workflow completed successfully!\n")

            if event.message is not None:
                print(event.message.text)

            if final_output is not None:
                print("\nWorkflow Output Data:")
                print(final_output)

            print("=" * 80)

        elif isinstance(event, WorkflowOutputEvent):
            final_output = str(event.data) if event.data is not None else None

except Exception as exc:
    if stream_line_open:
        print()
    stream_line_open = False
    print(f"\nWorkflow execution failed: {exc}")

finally:
    if stream_line_open:
        print()

INFO:agent_framework._workflows._magentic:Magentic Orchestrator: Received start message
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'user'}, 'contents': [{'type': 'text', 'text': "I'm build a ai system that help reasoning and problem parsing capabilities. "}], 'additional_properties': {}}
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'user'}, 'contents': [{'type': 'text', 'text': 'Below I will present you a request.\n\nBefore we begin addressing the request, please answer the following pre-survey to the best of your ability.\nKeep in mind that you are Ken Jennings-level with trivia, and Mensa-level with puzzles, so there should be\na deep well to draw from.\n\nHere is the request:\n\nI\'m build a ai system that help reasoning and problem parsing capabilities. \n\nHere is the pre-survey:\n\n    1. Please list any specific facts or figures that are GIVEN in the request itself. It is possible that\n       there are none


Starting workflow execution...
Started at: 13:30:25
⏳ This may take 1-3 minutes for complex tasks...



INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '1. GIVEN OR VERIFIED FACTS\n- You are building an AI system intended to help with reasoning and problem parsing capabilities.\n- The request does not include constraints, target domains, datasets, model choices, performance targets, or deployment requirements.\n\n2. FACTS TO LOOK UP\n- State‑of‑the‑art models and leaderboards for reasoning tasks (e.g., GPT‑4, PaLM, LLaMA family, open and closed models): check arXiv.org, Papers With Code (paperswithcode.com), and recent conference proceedings (NeurIPS, ACL, ICLR).\n- Relevant benchmarks and leaderboards (and their exact task definitions and metrics): GSM8K, MATH, MMLU, BIG‑Bench, ARC, DROP, AQuA, HumanEval — sources: Papers With Code pages for each dataset, the original dataset GitHub repos, and leaderboa


[ORCHESTRATOR:user_task]

I'm build a ai system that help reasoning and problem parsing capabilities. 

--------------------------------------------------------------------------------

[ORCHESTRATOR:task_ledger]


We are working to address the following user request:

I'm build a ai system that help reasoning and problem parsing capabilities. 


To answer this request we have assembled the following team:

- coder: Writes and executes code to perform calculations, data analysis, and computational tasks.
- verifier: Validates outputs, checks assumptions, and confirms work meets requirements.
- generator: Synthesizes final responses by incorporating verified outputs and supporting evidence.


Here is an initial fact sheet to consider:

1. GIVEN OR VERIFIED FACTS
- You are building an AI system intended to help with reasoning and problem parsing capabilities.
- The request does not include constraints, target domains, datasets, model choices, performance targets, or deployment requireme

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[STREAMING:generator]: Below
[Progress: 10 events, 69.4s elapsed]


[STREAMING:generator]:  are
[STREAMING:generator]:  (
[STREAMING:generator]: A
[STREAMING:generator]: )
[STREAMING:generator]:  a
[STREAMING:generator]:  short
[STREAMING:generator]:  questionnaire
[STREAMING:generator]:  you
[STREAMING:generator]:  can
[STREAMING:generator]:  send
[Progress: 20 events, 69.5s elapsed]


[STREAMING:generator]:  to
[STREAMING:generator]:  stakeholders
[STREAMING:generator]:  to
[STREAMING:generator]:  capture
[STREAMING:generator]:  constraints
[STREAMING:generator]: ,
[STREAMING:generator]:  goals
[STREAMING:generator]: ,
[STREAMING:generator]:  and
[STREAMING:generator]:  concrete
[Progress: 30 events, 69.7s elapsed]


[STREAMING:generator]:  examples
[STREAMING:generator]: ,
[STREAMING:generator]:  followed
[STREAMING:generator]:  by
[STREAMING:generator]:  (
[STREAMING:generator]: B
[STREAMING:generator]: )
[STREAMING:generator]:  a
[STREAMING:generator]:  one
[STREAMING:generator]:

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': 'Below are (A) a short questionnaire you can send to stakeholders to capture constraints, goals, and concrete examples, followed by (B) a one‑paragraph prioritized MVP definition, and (C) concrete immediate action plans for the coder and for the verifier (with resources and estimated times). Send the questionnaire to the relevant stakeholders, get the answers, then use the action plans to start the project.\n\nQUESTIONNAIRE (please answer each item as fully as possible)\n1) Target problem types and domains\n- Which problem types should the system handle? (pick any: math word problems, logical puzzles, code generation, commonsense reasoning, table lookup, information extraction, etc.)\n- For each chosen type, provide 3 representative example inputs and the exact desired outputs (include any preferred output format, e.g., “final numeric answer”, “annotated s


[AGENT:generator] assistant
  Below are (A) a short questionnaire you can send to stakeholders to capture constraints, goals, and concrete examples, followed by (B) a one‑paragraph prioritized MVP definition, and (C) concrete imme...

--------------------------------------------------------------------------------


INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '{\n  "is_request_satisfied": {\n    "reason": "We have a complete plan, questionnaire, and prioritized MVP proposal, but the user/stakeholders have not provided the required answers or data and no implementation work has started. The original request (to build the AI system) remains unfulfilled until inputs are provided and prototypes are executed.",\n    "answer": false\n  },\n  "is_in_loop": {\n    "reason": "Conversation is progressing: the assistant produced a questionnaire and action plans and the user asked for an internal team-status assessment. There is no repetition of identical requests or outputs across turns.",\n    "answer": false\n  },\n  "is_progress_being_made": {\n    "reason": "Concrete artifacts were produced (questionnaire, MVP descri


[ORCHESTRATOR:instruction]

Send the questionnaire to the stakeholders now and request structured answers within 48 hours. Specifically ask them to: 1) Pick the top 1 (and optionally a 2nd) problem type to target and provide 3 representative input/output examples for each (use the exact template: {"id","input","desired_output","return_reasoning":true|false}). 2) Supply quantitative success metrics and targets (accuracy, latency p50/p95, throughput, benchmark names). 3) State deployment constraints (monthly budget range, privacy/data residency rules, allowed external APIs). 4) List available data and annotation resources (file names, counts, formats) and labeling capacity. 5) Provide model approach preferences (API vs self-host, fine-tuning allowed, compute available). Ask them to return answers as a JSON object or JSONL file. After collecting responses, produce a 1–page summary that (a) confirms the chosen MVP scope, (b) produces the canonical 200–500 example dev JSONL (or as many exa

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[STREAMING:generator]: Below
[STREAMING:generator]:  is
[STREAMING:generator]:  a
[Progress: 2460 events, 132.7s elapsed]


[STREAMING:generator]:  ready
[STREAMING:generator]: -to
[STREAMING:generator]: -send
[STREAMING:generator]:  message
[STREAMING:generator]:  you
[STREAMING:generator]:  can
[STREAMING:generator]:  forward
[STREAMING:generator]:  to
[STREAMING:generator]:  stakeholders
[STREAMING:generator]:  (
[Progress: 2470 events, 132.9s elapsed]


[STREAMING:generator]: email
[STREAMING:generator]: ,
[STREAMING:generator]:  Slack
[STREAMING:generator]: ,
[STREAMING:generator]:  or
[STREAMING:generator]:  ticket
[STREAMING:generator]: ).
[STREAMING:generator]:  It
[STREAMING:generator]:  requests
[STREAMING:generator]:  structured
[Progress: 2480 events, 132.9s elapsed]


[STREAMING:generator]:  JSON
[STREAMING:generator]: /
[STREAMING:generator]: JSON
[STREAMING:generator]: L
[STREAMING:generator]:  answers
[STREAMING:generator]:  within
[STREAMING:generator]:  
[STREAMING:g

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': 'Below is a ready-to-send message you can forward to stakeholders (email, Slack, or ticket). It requests structured JSON/JSONL answers within 48 hours, includes the exact JSON schema and an example, and explains next steps the team will take after responses arrive. Copy/paste and send as-is.\n\nSubject: Request — MVP scoping questionnaire (please return JSON/JSONL within 48 hours)\n\nHi all,\n\nWe’re scoping an MVP for an LLM-driven task and need structured inputs from stakeholders so the coder and verifier can begin work. Please reply with a single JSON object or a JSONL file attachment containing your answers within 48 hours.\n\nRequired items (please follow the JSON schema below exactly)\n\n1) Pick top problem type (required) and optionally a 2nd (optional). Examples: math word problems, logical puzzles, code generation, commonsense reasoning, info extr


[AGENT:generator] assistant
  Below is a ready-to-send message you can forward to stakeholders (email, Slack, or ticket). It requests structured JSON/JSONL answers within 48 hours, includes the exact JSON schema and an example, an...

--------------------------------------------------------------------------------


INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '{\n    "is_request_satisfied": {\n        "reason": "We produced the questionnaire, prioritized MVP, and concrete action plans, but stakeholders have not yet returned the required inputs and no prototypes or evaluations have been executed. The original end-to-end system build remains uncompleted.",\n        "answer": false\n    },\n    "is_in_loop": {\n        "reason": "Recent turns are producing new, progressing artifacts (questionnaire, MVP, action plans) rather than repeating identical requests or replies. There is no evidence of cyclical repetition.",\n        "answer": false\n    },\n    "is_progress_being_made": {\n        "reason": "Significant forward progress: the generator produced a ready-to-send stakeholder questionnaire, an MVP definition, 


[ORCHESTRATOR:instruction]

Send the questionnaire to stakeholders within 48 hours and collect the responses as a single JSON or JSONL file. If stakeholders cannot fully respond, require at minimum: the top problem type and 20 example input/output pairs in the specified schema. After receiving stakeholder input, produce a one-page summary that (1) confirms the chosen MVP scope, (2) emits a canonical dev JSONL of 200–500 examples (or as many as were provided), and (3) creates explicit tickets for the coder and verifier: coder ticket to ingest canonical data, run prompt experiments, and begin a minimal DSL parser prototype; verifier ticket to produce the evaluation spec and a 50-sample human-eval pilot. Place the summary and canonical dataset in the shared project folder and notify coder and verifier with deadlines: coder and verifier to start within 48 hours of summary delivery. If stakeholder replies are delayed beyond 48 hours, fallback: assume the top problem type is math word probl

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[STREAMING:generator]: I
[Progress: 4080 events, 179.5s elapsed]


[STREAMING:generator]:  can
[STREAMING:generator]: ’t
[STREAMING:generator]:  directly
[STREAMING:generator]:  send
[STREAMING:generator]:  messages
[STREAMING:generator]:  or
[STREAMING:generator]:  collect
[STREAMING:generator]:  files
[STREAMING:generator]:  from
[STREAMING:generator]:  stakeholders
[Progress: 4090 events, 179.8s elapsed]


[STREAMING:generator]:  for
[STREAMING:generator]:  you
[STREAMING:generator]: ,
[STREAMING:generator]:  but
[STREAMING:generator]:  I
[STREAMING:generator]:  will
[STREAMING:generator]:  produce
[STREAMING:generator]:  everything
[STREAMING:generator]:  you
[STREAMING:generator]:  need
[Progress: 4100 events, 179.8s elapsed]


[STREAMING:generator]:  to
[STREAMING:generator]:  (
[STREAMING:generator]: 1
[STREAMING:generator]: )
[STREAMING:generator]:  send
[STREAMING:generator]:  the
[STREAMING:generator]:  questionnaire
[STREAMING:generator]:  immediately
[STREAMING:generator]:

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': 'I can’t directly send messages or collect files from stakeholders for you, but I will produce everything you need to (1) send the questionnaire immediately, (2) aggregate responses into the required single JSON/JSONL file, and (3) perform the post‑collection steps (one‑page summary, canonical dataset, tickets). Below are:\n\n- A ready‑to‑send stakeholder message (copy/paste into email/Slack/ticket).\n- Clear step‑by‑step instructions for the coordinator for collecting replies and producing a single JSON/JSONL.\n- A one‑page summary template and the exact “tickets” to create for coder and verifier (with deadlines).\n- A fallback: if stakeholders don’t reply within 48 hours, auto‑assume the top problem type is math word problems and start with 50 seed examples — provided below in the exact JSON schema you requested so the coder can begin prompt experiments 


[AGENT:generator] assistant
  I can’t directly send messages or collect files from stakeholders for you, but I will produce everything you need to (1) send the questionnaire immediately, (2) aggregate responses into the required s...

--------------------------------------------------------------------------------

[Progress: 8730 events, 237.8s elapsed]



INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '{\n    "is_request_satisfied": {\n        "reason": "We have a complete plan, questionnaire, prioritized MVP definition, coder/verifier action plans, and a 50-example fallback seed, but the original ask (build an AI system to help reasoning and problem parsing) has not yet been implemented or validated. Stakeholder inputs and prototype work remain outstanding, so the request is not fully satisfied.",\n        "answer": false\n    },\n    "is_in_loop": {\n        "reason": "Although messages include repeated \'Transferred to GeneratorAgent\' markers, the content progressed (questionnaire, MVP, action plans, fallback examples). There is no evidence of an unproductive repetition loop of identical outputs.",\n        "answer": false\n    },\n    "is_progress


[ORCHESTRATOR:instruction]

Send the ready-to-send questionnaire to all stakeholders immediately (via email/Slack/ticket) and request a single JSON or JSONL reply within 48 hours. Track responses and run one reminder at 24 hours and a final reminder at 6 hours before the deadline. On receipt: validate schema, merge into one all_mvp_responses.jsonl, and produce the one-page MVP summary confirming the chosen top problem type, constraints, benchmark targets, and the canonical dev JSONL. Place artifacts in /project/mvp_inputs/ and /project/mvp_outputs/, create the two tickets for coder and verifier (use the provided ticket text), and notify them to start within 48 hours of the summary. If stakeholders fail to respond within 48 hours, trigger the fallback: drop math_seed_50.json into /project/mvp_inputs/ and notify coder to begin prompt experiments using that seed and verifier to prepare the 50-sample human-eval pilot. Report back here immediately after the summary and ticket creation with

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[STREAMING:generator]: I
[STREAMING:generator]:  can
[STREAMING:generator]: ’t
[STREAMING:generator]:  send
[STREAMING:generator]:  messages
[STREAMING:generator]: ,
[Progress: 8740 events, 271.1s elapsed]


[STREAMING:generator]:  create
[STREAMING:generator]:  tickets
[STREAMING:generator]: ,
[STREAMING:generator]:  or
[STREAMING:generator]:  access
[STREAMING:generator]:  your
[STREAMING:generator]:  file
[STREAMING:generator]:  system
[STREAMING:generator]:  on
[STREAMING:generator]:  your
[Progress: 8750 events, 271.2s elapsed]


[STREAMING:generator]:  behalf
[STREAMING:generator]: ,
[STREAMING:generator]:  but
[STREAMING:generator]:  I
[STREAMING:generator]:  will
[STREAMING:generator]:  give
[STREAMING:generator]:  you
[STREAMING:generator]:  everything
[STREAMING:generator]:  you
[STREAMING:generator]:  need
[Progress: 8760 events, 271.3s elapsed]


[STREAMING:generator]:  (
[STREAMING:generator]: ready
[STREAMING:generator]: -to
[STREAMING:generator]: -run
[STREAMING:generat

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': 'I can’t send messages, create tickets, or access your file system on your behalf, but I will give you everything you need (ready-to-run scripts, exact message and ticket text, and an automated workflow) so your coordinator can perform the requested workflow immediately and reliably. Copy/paste or run the scripts below in your environment, substitute your service endpoints/credentials, and they will:\n\n- Send the questionnaire to stakeholders now and schedule 24h / 6h reminders,\n- Accept stakeholder replies as a single JSON or JSONL file,\n- Validate and merge replies into all_mvp_responses.jsonl,\n- Produce the one‑page MVP summary and canonical dev JSONL,\n- Place artifacts into /project/mvp_inputs/ and /project/mvp_outputs/,\n- Create the two tickets (coder and verifier) and notify owners,\n- Trigger a fallback to math_seed_50.json after 48 hours with


[AGENT:generator] assistant
  I can’t send messages, create tickets, or access your file system on your behalf, but I will give you everything you need (ready-to-run scripts, exact message and ticket text, and an automated workflo...

--------------------------------------------------------------------------------


INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '{\n  "is_request_satisfied": {\n    "reason": "The generator produced the questionnaire, prioritized MVP definition, action plans, scripts, and a 50-item fallback dataset, but stakeholders\' structured answers have not been collected and the coder/verifier have not executed the prototype steps. Key deliverables (canonical_dev.jsonl based on stakeholder data, prompt‑experiment results, DSL prototype, and human evaluation pilot results) are still outstanding.",\n    "answer": false\n  },\n  "is_in_loop": {\n    "reason": "Conversation has progressed with new, actionable artifacts (questionnaire, scripts, ticket text, fallback data). There is no repetition of identical requests/answers; each message advanced the plan or provided new deliverables.",\n    "an


[ORCHESTRATOR:instruction]

Start immediate prototype work as follows: 1) If stakeholder replies exist, save them to project/mvp_inputs/replies/; otherwise place the provided fallback file project/mvp_inputs/math_seed_50.json. 2) Run the merge_and_validate.py script to produce project/mvp_inputs/all_mvp_responses.jsonl and fix any validation errors. 3) Run generate_summary_and_canonical.py to create project/mvp_outputs/mvp_summary.md and project/mvp_inputs/canonical_dev.jsonl (target 200 examples; padding from fallback allowed). 4) Run prompt experiments against the selected model class: zero‑shot, few‑shot (3–10 exemplars), and chain‑of‑thought vs answer‑only. Use ~200 dev examples and 3–5 prompt variants; record accuracy, p50/p95 latency, representative success/failure examples, and cost. Reserve ~$200–$1,000 API credits if using a hosted model; if self‑hosting, use one GPU (16–32GB VRAM) for inference. 5) Implement a minimal Python semantic‑DSL parser (syntax validator + sandboxed 

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[STREAMING:coder]: I
[Progress: 12880 events, 510.1s elapsed]


[STREAMING:coder]:  started
[STREAMING:coder]:  the
[STREAMING:coder]:  prototype
[STREAMING:coder]:  work
[STREAMING:coder]:  and
[STREAMING:coder]:  produced
[STREAMING:coder]:  a
[STREAMING:coder]:  runnable
[STREAMING:coder]:  prototype
[STREAMING:coder]:  (
[Progress: 12890 events, 510.3s elapsed]


[STREAMING:coder]: scripts
[STREAMING:coder]: ,
[STREAMING:coder]:  data
[STREAMING:coder]: ,
[STREAMING:coder]:  simulated
[STREAMING:coder]:  experiments
[STREAMING:coder]: ,
[STREAMING:coder]:  and
[STREAMING:coder]:  a
[STREAMING:coder]:  small
[Progress: 12900 events, 510.4s elapsed]


[STREAMING:coder]:  DSL
[STREAMING:coder]: ).
[STREAMING:coder]:  Below
[STREAMING:coder]:  I
[STREAMING:coder]:  summarize
[STREAMING:coder]:  what
[STREAMING:coder]:  I
[STREAMING:coder]:  did
[STREAMING:coder]: ,
[STREAMING:coder]:  results
[Progress: 12910 events, 510.5s elapsed]


[STREAMING:coder]: ,
[STREAMING:coder]:  where
[ST

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': 'I started the prototype work and produced a runnable prototype (scripts, data, simulated experiments, and a small DSL). Below I summarize what I did, results, where the files are, how to reproduce, and next steps / blockers.\n\nShort summary of progress (started immediately)\n- No stakeholder replies were provided, so I created and placed a fallback dataset:\n  - project/mvp_inputs/math_seed_50.json (50 simple arithmetic Q/A examples).\n- Implemented and ran:\n  - scripts/merge_and_validate.py -> project/mvp_inputs/all_mvp_responses.jsonl\n  - scripts/generate_summary_and_canonical.py -> project/mvp_outputs/mvp_summary.md and project/mvp_inputs/canonical_dev.jsonl (padded to 200 examples)\n  - scripts/prompt_experiments.py -> simulated prompt experiments using a deterministic arithmetic solver, results in project/mvp_outputs/experiment_results.json\n- Imp


[AGENT:coder] assistant
  I started the prototype work and produced a runnable prototype (scripts, data, simulated experiments, and a small DSL). Below I summarize what I did, results, where the files are, how to reproduce, an...

--------------------------------------------------------------------------------


INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '{\n  "is_request_satisfied": {\n    "reason": "A runnable prototype, scripts, a 200-example canonical dev set (padded from a 50-example fallback), a minimal DSL, and a simulated prompt-experiment pipeline were produced. However, the core objectives—running real LLM prompt experiments, validating with human evaluation, and meeting stakeholder constraints—have not been fully executed because real model access, stakeholder answers, and verifier sign-off are still required.",\n    "answer": false\n  },\n  "is_in_loop": {\n    "reason": "The conversation is not repeating identical requests or responses. Progress has advanced from questionnaire design to fallback data generation and an initial prototype run. There is no evidence of cyclical repetition of the s


[ORCHESTRATOR:instruction]

Please review and finalize evaluation_spec.md and the human-eval pilot (project/mvp_outputs/human_eval_pilot_50.jsonl). Specifically:

1) Confirm or adjust the automatic metrics and numeric thresholds (e.g., exact-match %, normalized numeric equality, pass@k for code, p50/p95 latency limits). Give concrete numbers for acceptance gates.
2) Approve the 50-sample human-eval pilot and the annotation rubric (scoring scale, guidelines for correct/incorrect/partial credit). If you want different coverage, list the categories and request replacements.
3) Specify labeling logistics: number of raters per example, required inter-annotator agreement threshold (e.g., Krippendorff's alpha or Cohen's kappa target), platform to use (internal annotators vs crowd), and budget/time estimates.
4) Confirm safety/PII checks and sandboxing requirements for executable outputs, and list any additional automated verification checks to add to the harness (e.g., numeric tolerance, uni

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[Progress: 15170 events, 574.1s elapsed]


[STREAMING:verifier]: Summary
[STREAMING:verifier]:  /
[STREAMING:verifier]:  next
[STREAMING:verifier]:  step
[STREAMING:verifier]: 

[STREAMING:verifier]: -
[STREAMING:verifier]:  I
[STREAMING:verifier]:  cannot
[STREAMING:verifier]:  currently
[STREAMING:verifier]:  access
[Progress: 15180 events, 574.1s elapsed]


[STREAMING:verifier]:  the
[STREAMING:verifier]:  repository
[STREAMING:verifier]:  or
[STREAMING:verifier]:  the
[STREAMING:verifier]:  two
[STREAMING:verifier]:  files
[STREAMING:verifier]:  you
[STREAMING:verifier]:  named
[STREAMING:verifier]: .
[STREAMING:verifier]:  Please
[Progress: 15190 events, 574.2s elapsed]


[STREAMING:verifier]:  paste
[STREAMING:verifier]:  the
[STREAMING:verifier]:  contents
[STREAMING:verifier]:  of
[STREAMING:verifier]:  evaluation
[STREAMING:verifier]: _spec
[STREAMING:verifier]: .md
[STREAMING:verifier]:  and
[STREAMING:verifier]:  project
[STREAMING:verifier]: /m
[Progress: 15200 events, 574

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': 'Summary / next step\n- I cannot currently access the repository or the two files you named. Please paste the contents of evaluation_spec.md and project/mvp_outputs/human_eval_pilot_50.jsonl (or attach them). Once I have them I will (within 48 hours) either: (a) produce finalized evaluation_spec.md and the human-eval rubric/JSONL-approved markups, or (b) mark “approved as-is” if they already meet the acceptance gates below.\n- Meanwhile, below I provide a concrete, fully-specified recommended evaluation spec, numeric thresholds, rubric, labeling logistics, sandboxing/safety checks, automated harness checks, and a go/no-go gate. Treat these as the default changes I will apply to evaluation_spec.md unless you instruct otherwise or the files contain conflicting constraints.\n\n1) Automatic metrics and numeric thresholds (concrete numbers)\n- Overall approach:


[AGENT:verifier] assistant
  Summary / next step - I cannot currently access the repository or the two files you named. Please paste the contents of evaluation_spec.md and project/mvp_outputs/human_eval_pilot_50.jsonl (or attach ...

--------------------------------------------------------------------------------

[Progress: 18470 events, 629.4s elapsed]



INFO:agent_framework._workflows._runner:Completed superstep 12
INFO:agent_framework._workflows._runner:Workflow completed after 12 supersteps



FINAL RESULT:

✓ Workflow completed successfully!

Summary / next step
- I cannot currently access the repository or the two files you named. Please paste the contents of evaluation_spec.md and project/mvp_outputs/human_eval_pilot_50.jsonl (or attach them). Once I have them I will (within 48 hours) either: (a) produce finalized evaluation_spec.md and the human-eval rubric/JSONL-approved markups, or (b) mark “approved as-is” if they already meet the acceptance gates below.
- Meanwhile, below I provide a concrete, fully-specified recommended evaluation spec, numeric thresholds, rubric, labeling logistics, sandboxing/safety checks, automated harness checks, and a go/no-go gate. Treat these as the default changes I will apply to evaluation_spec.md unless you instruct otherwise or the files contain conflicting constraints.

1) Automatic metrics and numeric thresholds (concrete numbers)
- Overall approach: report per-task-type metrics and an aggregate. Separate task types: closed-answer cla

## Step 14: Using `.as_agent()` for Composition

The workflow can be wrapped as a reusable agent using `.as_agent()`. This allows:

- **Composition**: Use this workflow as a participant in larger workflows
- **Reusability**: Call the same workflow multiple times with different tasks
- **Transcript access**: Get structured message history from the workflow execution

This is powerful for building hierarchical agent systems where complex workflows become modular components.

In [None]:
print("\n" + "=" * 80)
print("Demonstrating .as_agent() wrapper pattern...")
print("=" * 80)

# Wrap the workflow as an agent
workflow_agent = workflow.as_agent(name="MultiRoleWorkflowAgent")

# Execute through the agent interface
agent_result = await workflow_agent.run(task)

# Display the transcript
if agent_result.messages:
    print("\n===== Workflow Transcript =====\n")
    for i, msg in enumerate(agent_result.messages, start=1):
        role_value = getattr(msg.role, "value", msg.role)
        speaker = msg.author_name or role_value
        message_preview = (msg.text or "")[:150] + "..." if len(msg.text or "") > 150 else (msg.text or "")
        print(f"{'-' * 80}")
        print(f"Message {i:02d} [{speaker}]")
        print(f"{message_preview}")
    print(f"{'-' * 80}")
    print(f"\n✓ Total messages in transcript: {len(agent_result.messages)}")
else:
    print("No messages in transcript.")

## Optional: Customize Manager Parameters

You can experiment with different manager configurations to see how they affect workflow behavior:

- **Increase `max_round_count`** for more complex tasks requiring longer conversations
- **Decrease `max_stall_count`** to trigger replanning faster when agents aren't making progress
- **Add custom manager instructions** to guide orchestration behavior

Uncomment and run the cell below to try a different configuration.

In [None]:
# # Example: Build workflow with custom manager configuration
# custom_workflow = (
#     MagenticBuilder()
#     .participants(
#         planner=planner_agent,
#         executor=executor_agent,
#         coder=coder_agent,
#         verifier=verifier_agent,
#         generator=generator_agent,
#     )
#     .with_standard_manager(
#         chat_client=OpenAIChatClient(model_id="gpt-5-mini"),
#         max_round_count=20,  # More rounds for complex tasks
#         max_stall_count=2,   # Faster replanning
#         max_reset_count=1,   # Fewer resets
#         instructions="Focus on efficiency. Always verify calculations before generating final outputs.",
#     )
#     .build()
# )
#
# print("✓ Custom workflow configured with modified parameters")

## Optional: Change Logging Level

Switch between logging levels to control output verbosity:

- **`DEBUG`**: See all manager decisions, agent selections, and internal state changes
- **`INFO`**: See workflow progress and key events
- **`WARNING`**: See only warnings and errors

Run the cell below to change the logging level, then re-execute the workflow cells above.

In [None]:
# Change logging level (re-run workflow cells after changing this)
# logging.basicConfig(level=logging.DEBUG, force=True)  # Detailed internal logs
logging.basicConfig(level=logging.INFO, force=True)    # Clean progress logs
# logging.basicConfig(level=logging.WARNING, force=True)  # Warnings only

print("✓ Logging level updated")

## Troubleshooting Common Issues

### API Key Not Found
```
Error: OpenAI API key not found
```
**Solution**: Set the `OPENAI_API_KEY` environment variable before running the notebook.

### Model Not Available
```
Error: Model 'gpt-5-mini' does not exist
```
**Solution**: The notebook defaults to `gpt-5-mini` for every role. If this model is unavailable in your account, choose an accessible alternative and update the agent constructors. Common substitutes include:
- `gpt-4o` for high-quality reasoning
- `gpt-4o-mini` for lower latency and cost
- `gpt-4o-reasoning` or other reasoning-tier models if enabled for your workspace

### Rate Limits
```
Error: Rate limit exceeded
```
**Solution**: The workflow makes multiple API calls. If you hit rate limits:
- Use lower-tier models (e.g., `gpt-4o-mini` instead of `gpt-5-mini`)
- Reduce `max_round_count` to limit conversation length
- Add delays between workflow executions

### Workflow Stalls or Times Out
```
Warning: Max stall count reached
```
**Solution**: The manager detected agents aren't making progress:
- Simplify the task
- Adjust `max_stall_count` to allow more attempts
- Check agent instructions for clarity
- Review debug logs to see which agent is stalling

### Code Execution Fails
```
Error: Code execution failed
```
**Solution**: `HostedCodeInterpreterTool` requires:
- OpenAI Responses API or Azure OpenAI with code interpreter enabled
- Cannot execute arbitrary local code (runs in OpenAI's sandbox)
- Some operations (file I/O, network access) may be restricted

## Next Steps

Experiment with this workflow by:

1. **Modifying agent instructions** - Change role prompts to adjust behavior
2. **Adding new agents** - Include specialists like a ResearcherAgent or CriticAgent
3. **Trying different tasks** - Test with various complexity levels
4. **Adjusting manager parameters** - Tune `max_round_count`, `max_stall_count` for your use case
5. **Composing workflows** - Use `.as_agent()` to nest workflows within larger systems
6. **Switching models** - Compare performance across different model tiers

For more examples, see:
- `magentic_workflow_as_agent.py` - Detailed streaming example with researcher + coder
- `group_chat_workflow_as_agent.py` - Simpler group chat pattern
- Agent Framework documentation for advanced orchestration patterns