# Multi-Role Workflow with Specialized Agents

## Overview

This notebook demonstrates a structured multi-agent workflow using five specialized agents coordinated by an intelligent manager. The workflow follows a systematic pattern:

```
┌─────────────────────────────────────────────────────────┐
│                   Task Input                            │
└─────────────────┬───────────────────────────────────────┘
                  │
                  ▼
          ┌───────────────┐
          │  Executor     │ ◄─── Carries out reasoning-heavy
          │ (gpt-5-mini)  │      steps and delegates to tools
          └───────┬───────┘
                  │
                  ▼
          ┌───────────────┐
          │  Coder        │ ◄─── Writes and executes code
          │ (gpt-5-mini)  │      for computation and analysis
          └───────┬───────┘
                  │
                  ▼
          ┌───────────────┐
          │  Verifier     │ ◄─── Inspects outputs, confirms
          │ (gpt-5-mini)  │      requirements are satisfied
          └───────┬───────┘
                  │
                  ▼
          ┌───────────────┐
          │  Generator    │ ◄─── Assembles final response
          │ (gpt-5-mini)  │      with verified outputs
          └───────────────┘
```

## How It Works

The `MagenticBuilder` creates a workflow with a `StandardMagenticManager` that:

1. **Reads agent descriptions** to understand each agent's capabilities
2. **Creates dynamic plans** based on the task requirements
3. **Selects appropriate agents** for each step of the plan
4. **Tracks progress** and adapts when agents stall or plans need revision
5. **Manages completion** when the task is successfully resolved

## Model Assignments

Each agent uses the same model tier for consistency:

- **Planner (gpt-5-mini)**: Provides advanced reasoning for decomposing complex tasks
- **Executor (gpt-5-mini)**: Handles reasoning-heavy execution steps
- **Coder (gpt-5-mini + code interpreter)**: Generates and runs code via the hosted interpreter
- **Verifier (gpt-5-mini)**: Validates outputs and ensures quality
- **Generator (gpt-5-mini)**: Synthesizes verified outputs into a polished response

## Prerequisites

Set these environment variables:

```bash
export OPENAI_API_KEY="your-api-key-here"
# Optional: Specify endpoints if using Azure OpenAI
# export AZURE_OPENAI_ENDPOINT="your-endpoint"
# export AZURE_OPENAI_API_KEY="your-key"
```

In [None]:
# Copyright (c) Microsoft. All rights reserved.
## Prerequisites

In [None]:
export OPENAI_API_KEY="your-api-key-here"
export OPENAI_BASE_URL="https://api.openai.com/v1"
# Optional: Specify endpoints if using Azure OpenAI
# export AZURE_OPENAI_ENDPOINT="your-endpoint"
# export AZURE_OPENAI_API_KEY="your-key"


## Step 1: Import Required Modules

We import the core agent framework components:
- `ChatAgent`: Base agent class for creating specialized agents
- `MagenticBuilder`: Workflow builder for multi-agent orchestration
- `HostedCodeInterpreterTool`: Tool for code execution capabilities
- Event types for monitoring workflow execution
- OpenAI client implementations for different model APIs

In [1]:
import logging

from agent_framework import (
    ChatAgent,
    HostedCodeInterpreterTool,
    MagenticAgentDeltaEvent,
    MagenticAgentMessageEvent,
    MagenticBuilder,
    MagenticFinalResultEvent,
    MagenticOrchestratorMessageEvent,
    WorkflowOutputEvent,
)
from agent_framework.openai import OpenAIResponsesClient
from dotenv import load_dotenv

load_dotenv(override=True)

True

## Step 2: Configure Debug Logging

Enable debug logging to observe the manager's decision-making process:

- **INFO level**: Shows high-level workflow progress
- **DEBUG level**: Reveals manager's agent selection logic, plan creation, and progress tracking

You can change this to `logging.INFO` for cleaner output, or `logging.DEBUG` to see internal orchestration decisions.

In [2]:
# Set to DEBUG to see manager's decision-making, INFO for cleaner output
logging.basicConfig(level=logging.INFO, force=True)
logger = logging.getLogger(__name__)

print("✓ Logging configured at INFO level (change to DEBUG for detailed internal logs)")

✓ Logging configured at INFO level (change to DEBUG for detailed internal logs)


## Step 3: Define Role Prompts

Each agent receives specialized instructions that define its role in the workflow. These prompts guide the agent's behavior and help the manager understand when to invoke each agent.

In [3]:
EXECUTOR_PROMPT = """You are the executor module. Carry out the active instruction from the
manager or planner. Execute reasoning-heavy steps, delegate to registered tools when needed,
and produce clear artefacts or status updates. If a tool is required, call it explicitly and
then explain the outcome."""

VERIFIER_PROMPT = """You are the verifier module. Inspect the current state, outputs, and
assumptions. Confirm whether the work satisfies requirements, highlight defects or missing
information, and suggest concrete follow-up actions."""

GENERATOR_PROMPT = """You are the generator module. Assemble the final response for the
user. Incorporate verified outputs, cite supporting evidence when available, and ensure the
result addresses the original request without leaking internal reasoning unless explicitly
requested."""

## Step 5: Configure the Executor Agent

**Role**: Carries out reasoning-heavy steps and delegates to tools when computation is needed.

**Model**: `gpt-5-mini` - Balances capability and cost for general execution tasks.

**When invoked**: The manager selects the Executor for steps requiring reasoning, analysis, or coordination between other agents.

The Executor acts as the "general worker" that handles steps not requiring specialized capabilities.

In [4]:
executor_agent = ChatAgent(
    name="ExecutorAgent",
    description="Executes reasoning-heavy tasks and coordinates work between specialized agents.",
    instructions=EXECUTOR_PROMPT,
    chat_client=OpenAIResponsesClient(
        model_id="gpt-5-mini",
        reasoning_effort="medium",
        store=True,
        temperature=0.7,
        max_tokens=4096,
    ),
)

print("✓ Executor Agent configured with gpt-5-mini and enhanced parameters")

✓ Executor Agent configured with gpt-5-mini and enhanced parameters


## Step 6: Configure the Coder Agent

**Role**: Writes and executes code for data processing, calculations, and analysis.

**Model**: `gpt-5-mini` paired with the hosted code interpreter for program execution.

**Tools**: `HostedCodeInterpreterTool()` - Enables code execution in a hosted sandbox environment.

**When invoked**: The manager selects the Coder when computational analysis, data processing, or numerical calculations are needed.

Note: The code execution happens in OpenAI's hosted environment, not locally. Results are returned as part of the agent's response.

In [5]:
coder_agent = ChatAgent(
    name="CoderAgent",
    description="Writes and executes code to perform calculations, data analysis, and computational tasks.",
    instructions="You solve questions using code. Write clear, well-documented code and provide detailed analysis of computation results.",
    chat_client=OpenAIResponsesClient(
        model_id="gpt-5-mini",
        reasoning_effort="high",  # Higher reasoning for code generation
        store=True,
        temperature=0.3,  # Lower temperature for more deterministic code
        max_tokens=8192,  # More tokens for code + explanations
    ),
    tools=HostedCodeInterpreterTool(),
)

print("✓ Coder Agent configured with gpt-5-codex and enhanced parameters")

✓ Coder Agent configured with gpt-5-codex and enhanced parameters


## Step 7: Configure the Verifier Agent

**Role**: Validates outputs, checks for defects, and ensures requirements are satisfied.

**Model**: `gpt-5-mini` - Provides reliable reasoning to perform thorough quality checks.

**When invoked**: The manager calls the Verifier after execution steps to validate correctness, completeness, and adherence to requirements.

The Verifier acts as a quality gate, catching errors before final response generation.

In [6]:
verifier_agent = ChatAgent(
    name="VerifierAgent",
    description="Validates outputs, checks assumptions, and confirms work meets requirements.",
    instructions=VERIFIER_PROMPT,
    chat_client=OpenAIResponsesClient(
        model_id="gpt-5-mini",
        reasoning_effort="high",  # High reasoning for thorough validation
        store=True,
        temperature=0.5,  # Balanced for analytical verification
        max_tokens=4096,
    ),
)

print("✓ Verifier Agent configured with gpt-5-mini and enhanced parameters")

✓ Verifier Agent configured with gpt-5-mini and enhanced parameters


## Step 8: Configure the Generator Agent

**Role**: Assembles the final user-facing response by synthesizing verified outputs.

**Model**: `gpt-5-mini` - Cost-efficient model suitable for synthesis and formatting tasks.

**When invoked**: The manager selects the Generator as the final step to create a polished response for the user.

The Generator ensures the final output is clear, comprehensive, and addresses the original request without exposing internal workflow details.

In [7]:
generator_agent = ChatAgent(
    name="GeneratorAgent",
    description="Synthesizes final responses by incorporating verified outputs and supporting evidence.",
    instructions=GENERATOR_PROMPT,
    chat_client=OpenAIResponsesClient(
        model_id="gpt-5-mini",
        reasoning_effort="low",  # Lower reasoning for synthesis tasks
        store=True,
        temperature=0.8,  # Higher temperature for creative synthesis
        max_tokens=6144,  # More tokens for comprehensive responses
    ),
)

print("✓ Generator Agent configured with gpt-5-mini and enhanced parameters")

✓ Generator Agent configured with gpt-5-mini and enhanced parameters


## Understanding OpenAI Responses API Parameters

The `OpenAIResponsesClient` accepts several parameters to fine-tune agent behavior:

### Key Parameters

- **`reasoning_effort`**: Controls how much computational effort the model uses for reasoning
  - `"low"`: Faster responses, suitable for simple tasks or synthesis
  - `"medium"`: Balanced performance (default for most agents)
  - `"high"`: Maximum reasoning depth for complex analysis, verification, or planning

- **`store`**: Boolean flag to enable conversation storage
  - `True`: Stores conversations for potential learning and improvement
  - `False`: No storage (default)
  - Useful for building persistent context across sessions

- **`temperature`**: Controls randomness in responses (0.0 - 2.0)
  - `0.0-0.3`: Deterministic, focused (ideal for code generation)
  - `0.4-0.7`: Balanced creativity and consistency (general tasks)
  - `0.8-1.0`: More creative and varied (synthesis, brainstorming)
  - `>1.0`: Highly creative but potentially inconsistent

- **`max_tokens`**: Maximum length of the response
  - Adjust based on expected output length
  - Code agents may need more tokens (8192+)
  - Simple agents can use fewer (2048-4096)

- **`top_p`**: Nucleus sampling (0.0 - 1.0)
  - Controls diversity via cumulative probability
  - Lower values = more focused responses

- **`frequency_penalty`**: Reduces repetition (-2.0 to 2.0)
  - Positive values discourage repeating the same words

- **`presence_penalty`**: Encourages topic diversity (-2.0 to 2.0)
  - Positive values encourage exploring new topics

### Agent-Specific Tuning in This Notebook

- **Planner**: Medium reasoning, moderate temperature (balanced planning)
- **Executor**: Medium reasoning, moderate temperature (general execution)
- **Coder**: High reasoning, low temperature (precise code generation)
- **Verifier**: High reasoning, moderate temperature (thorough validation)
- **Generator**: Low reasoning, higher temperature (creative synthesis)
- **Manager**: High reasoning, moderate temperature (strategic orchestration)

## Step 9: Understanding the MagenticBuilder Pattern

The `MagenticBuilder` constructs a workflow with intelligent orchestration:

1. **`.participants()`** - Registers all agents with the workflow. Each agent's `name` and `description` help the manager understand its capabilities.

2. **`.with_standard_manager()`** - Configures the orchestration manager with:
   - **chat_client**: LLM used by the manager for decision-making
   - **max_round_count**: Maximum conversation turns (prevents infinite loops)
   - **max_stall_count**: How many times agents can be unproductive before replanning
   - **max_reset_count**: How many times workflow can reset from scratch

The manager uses these parameters to adaptively coordinate agents throughout the workflow lifecycle.

## Step 10: Build the Workflow

Now we construct the workflow by:

1. Creating a `MagenticBuilder` instance
2. Registering all five agents as participants
3. Configuring the standard manager with appropriate limits
4. Building the final workflow object

The manager will read each agent's `description` field to understand when to invoke them during execution.

In [8]:
print("\nBuilding Magentic Workflow with 5 specialized agents...")

workflow = (
    MagenticBuilder()
    .participants(
        coder=coder_agent,
        verifier=verifier_agent,
        generator=generator_agent,
    )
    .with_standard_manager(
        chat_client=OpenAIResponsesClient(
            model_id="gpt-5-mini",
            reasoning_effort="high",  # Manager needs high reasoning for orchestration
            store=True,
            temperature=0.6,
            max_tokens=8192,  # Manager needs more tokens for planning
        ),
        max_round_count=6,  # Allow enough rounds for 5-agent coordination
        max_stall_count=3,   # Replan if agents stall 3 times
        max_reset_count=2,   # Allow 2 full workflow resets if needed
    )
    .build()
)

print("✓ Workflow built successfully!")
print(f"  - Registered agents: {len([coder_agent, verifier_agent, generator_agent])}")
print("  - Manager model: gpt-5-mini with enhanced parameters")
print("  - Max rounds: 6")
print("\n⚠️  Note: 5-agent workflows using gpt-5-mini can take a few minutes to complete.")

INFO:agent_framework._workflows._magentic:Building Magentic workflow with 3 participants



Building Magentic Workflow with 5 specialized agents...
✓ Workflow built successfully!
  - Registered agents: 3
  - Manager model: gpt-5-mini with enhanced parameters
  - Max rounds: 6

⚠️  Note: 5-agent workflows using gpt-5-mini can take a few minutes to complete.


## Step 11: Define the Task

We'll use a complex task that exercises all five agents:

- **Planner**: Decomposes the multi-part analysis
- **Executor**: Coordinates research and execution
- **Coder**: Writes code to calculate energy consumption and CO2 emissions
- **Verifier**: Validates calculations and assumptions
- **Generator**: Synthesizes findings into a comprehensive report

This task requires data analysis, computational verification, and structured output—ideal for demonstrating the multi-role workflow.

In [9]:
task = (
    "I'm build a ai system that help reasoning and problem parsing capabilities. "
)

print("\n" + "=" * 80)
print("TASK:")
print("=" * 80)
print(task)
print("=" * 80)


TASK:
I'm build a ai system that help reasoning and problem parsing capabilities. 


## Step 12: Understanding Workflow Events

The workflow emits several event types during execution:

- **`MagenticOrchestratorMessageEvent`**: Manager's planning, agent selection, and coordination messages
- **`MagenticAgentDeltaEvent`**: Streaming text chunks from agents (real-time output)
- **`MagenticAgentMessageEvent`**: Complete agent messages after streaming finishes
- **`MagenticFinalResultEvent`**: The workflow's final result when complete
- **`WorkflowOutputEvent`**: Structured output data from the workflow

By handling these events, we can observe:
- Which agent the manager selects for each step
- The manager's reasoning for agent selection
- Real-time progress as agents work
- Final results and completion status

## Step 13: Execute the Workflow with Streaming

Now we'll execute the workflow and observe the orchestration in action:

- Watch the manager select appropriate agents for each step
- See streaming output as agents work
- Observe the coordination between planning, execution, coding, verification, and generation

**Note**: This may take several minutes depending on task complexity and API response times.

In [10]:
import time
from datetime import datetime

print("\nStarting workflow execution...")
print(f"Started at: {datetime.now().strftime('%H:%M:%S')}")
print("⏳ This may take 1-3 minutes for complex tasks...\n")

last_stream_agent_id: str | None = None
stream_line_open: bool = False
final_output: str | None = None
event_count = 0
start_time = time.time()

try:
    async for event in workflow.run_stream(task):
        event_count += 1
        if event_count % 10 == 0:
            elapsed = time.time() - start_time
            print(f"\n[Progress: {event_count} events, {elapsed:.1f}s elapsed]\n", flush=True)

        if isinstance(event, MagenticOrchestratorMessageEvent):
            print(f"\n[ORCHESTRATOR:{event.kind}]\n")
            print(f"{getattr(event.message, 'text', '')}\n")
            print("-" * 80)

        elif isinstance(event, MagenticAgentDeltaEvent):
            if last_stream_agent_id != event.agent_id or not stream_line_open:
                if stream_line_open:
                    print()
                print(f"\n[STREAMING:{event.agent_id}]: ", end="", flush=True)
                last_stream_agent_id = event.agent_id
                stream_line_open = False
            if event.text:
                print(event.text, end="", flush=True)

        elif isinstance(event, MagenticAgentMessageEvent):
            if stream_line_open:
                print(" ✓")
                stream_line_open = False
            msg = event.message
            if msg is not None:
                response_text = (msg.text or "").replace("\n", " ")
                display_text = response_text[:200] + "..." if len(response_text) > 200 else response_text
                print(f"\n[AGENT:{event.agent_id}] {msg.role.value}")
                print(f"  {display_text}\n")
                print("-" * 80)

        elif isinstance(event, MagenticFinalResultEvent):
            if stream_line_open:
                print()
                stream_line_open = False

            print("\n" + "=" * 80)
            print("FINAL RESULT:")
            print("=" * 80)
            print("\n✓ Workflow completed successfully!\n")

            if event.message is not None:
                print(event.message.text)

            if final_output is not None:
                print("\nWorkflow Output Data:")
                print(final_output)

            print("=" * 80)

        elif isinstance(event, WorkflowOutputEvent):
            final_output = str(event.data) if event.data is not None else None

except Exception as exc:
    if stream_line_open:
        print()
    stream_line_open = False
    print(f"\nWorkflow execution failed: {exc}")

finally:
    if stream_line_open:
        print()

INFO:agent_framework._workflows._magentic:Magentic Orchestrator: Received start message
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'user'}, 'contents': [{'type': 'text', 'text': "I'm build a ai system that help reasoning and problem parsing capabilities. "}], 'additional_properties': {}}
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'user'}, 'contents': [{'type': 'text', 'text': 'Below I will present you a request.\n\nBefore we begin addressing the request, please answer the following pre-survey to the best of your ability.\nKeep in mind that you are Ken Jennings-level with trivia, and Mensa-level with puzzles, so there should be\na deep well to draw from.\n\nHere is the request:\n\nI\'m build a ai system that help reasoning and problem parsing capabilities. \n\nHere is the pre-survey:\n\n    1. Please list any specific facts or figures that are GIVEN in the request itself. It is possible that\n       there are none


Starting workflow execution...
Started at: 04:53:03
⏳ This may take 1-3 minutes for complex tasks...



INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '1. GIVEN OR VERIFIED FACTS\n- You are building an AI system whose purpose is to help with reasoning and problem parsing capabilities.\n- The request asks for a pre-survey listing: (a) facts given in the request, (b) facts to look up and where, (c) facts to derive, and (d) educated guesses.\n\n2. FACTS TO LOOK UP\n- State‑of‑the‑art research on reasoning and problem parsing: search arXiv, Google Scholar, ACL Anthology, ICLR/NeurIPS/ICML/AAAI conference proceedings.\n- Relevant benchmarks and datasets (descriptions, sizes, splits, licensing): GSM8K, MATH, BigBench, MMLU, StrategyQA, ARC, DROP, HotpotQA, SQuAD, HumanEval, (find on Papers With Code, Hugging Face datasets, dataset authors’ GitHub repos).\n- Recent model architectures and performance numbers f


[ORCHESTRATOR:user_task]

I'm build a ai system that help reasoning and problem parsing capabilities. 

--------------------------------------------------------------------------------

[ORCHESTRATOR:task_ledger]


We are working to address the following user request:

I'm build a ai system that help reasoning and problem parsing capabilities. 


To answer this request we have assembled the following team:

- coder: Writes and executes code to perform calculations, data analysis, and computational tasks.
- verifier: Validates outputs, checks assumptions, and confirms work meets requirements.
- generator: Synthesizes final responses by incorporating verified outputs and supporting evidence.


Here is an initial fact sheet to consider:

1. GIVEN OR VERIFIED FACTS
- You are building an AI system whose purpose is to help with reasoning and problem parsing capabilities.
- The request asks for a pre-survey listing: (a) facts given in the request, (b) facts to look up and where, (c) facts to

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[STREAMING:generator]: Scope
[Progress: 10 events, 66.0s elapsed]


[STREAMING:generator]:  &
[STREAMING:generator]:  Success
[STREAMING:generator]: -C
[STREAMING:generator]: riteria
[STREAMING:generator]:  Spec
[STREAMING:generator]:  for
[STREAMING:generator]:  “
[STREAMING:generator]: Reason
[STREAMING:generator]: ing
[STREAMING:generator]: ”
[Progress: 20 events, 66.0s elapsed]


[STREAMING:generator]:  +
[STREAMING:generator]:  “
[STREAMING:generator]: Problem
[STREAMING:generator]:  Parsing
[STREAMING:generator]: ”
[STREAMING:generator]:  Prototype
[STREAMING:generator]: 

[STREAMING:generator]: (
[STREAMING:generator]: Deliver
[STREAMING:generator]: able
[Progress: 30 events, 66.1s elapsed]


[STREAMING:generator]: :
[STREAMING:generator]:  
[STREAMING:generator]: 1
[STREAMING:generator]: –
[STREAMING:generator]: 2
[STREAMING:generator]:  page
[STREAMING:generator]:  spec
[STREAMING:generator]:  for
[STREAMING:generator]:  coder
[STREAMING:generator]:  +
[Progress: 40 events, 6

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': 'Scope & Success-Criteria Spec for “Reasoning” + “Problem Parsing” Prototype\n(Deliverable: 1–2 page spec for coder + verifier — ready to start literature/dataset scan and eval design)\n\n1) Definitions (precise)\n- Reasoning: the model’s process to arrive at a correct answer that requires multi-step inference, manipulation, or search over internal/external representations. Subtypes we will target:\n  - Arithmetic / mathematical word problems: numeric reasoning, units, multi-step arithmetic (e.g., GSM8K, SVAMP).\n  - Symbolic / algebraic reasoning: manipulating expressions, symbolic solutions, proof-like steps (e.g., MATH).\n  - Multi-hop factual reasoning: chaining facts across documents/knowledge to answer a question (e.g., HotpotQA).\n  - Commonsense / logical reasoning: everyday physics/commonsense/pragmatic inference (e.g., CommonsenseQA, PIQA).\n  - 


[AGENT:generator] assistant
  Scope & Success-Criteria Spec for “Reasoning” + “Problem Parsing” Prototype (Deliverable: 1–2 page spec for coder + verifier — ready to start literature/dataset scan and eval design)  1) Definitions (...

--------------------------------------------------------------------------------


INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '{\n  "is_request_satisfied": {\n    "reason": "A concrete scope and success-criteria spec has been produced by the generator, but the original request (building an AI system that helps reasoning and problem parsing) is not fully implemented: datasets have not been ingested, models have not been prototyped or evaluated, and annotators/compute resources have not been provisioned. The spec is a necessary milestone but not the complete solution.",\n    "answer": false\n  },\n  "is_in_loop": {\n    "reason": "Conversation has progressed from request → team assembly → generator spec. There is no evidence of repeated identical requests/responses or circular actions; this is forward movement from planning to implementation.",\n    "answer": false\n  },\n  "is_pr


[Progress: 3040 events, 125.6s elapsed]



INFO:agent_framework._workflows._runner:Completed superstep 2
INFO:agent_framework._workflows._runner:Starting superstep 3
INFO:agent_framework._workflows._magentic:Agent coder: Received request to respond
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'system'}, 'contents': [{'type': 'text', 'text': 'Transferred to coder, adopt the persona immediately.'}], 'additional_properties': {}}
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'user'}, 'contents': [{'type': 'text', 'text': 'Action items for coder (please complete or respond within 3 business days):\n\n1) Compute & access confirmation (deadline: Day 2):\n   - Confirm available compute resources (GPU types and counts, vCPU/RAM, disk/S3/GCS quotas). Specify per-GPU memory (e.g., 16GB, 40GB, 80GB), whether preemptible instances are acceptable, and estimated monthly budget constraints.\n   - Confirm ability to run Docker images and CI (GitHub Actions or equivalent). Prov


[ORCHESTRATOR:instruction]

Action items for coder (please complete or respond within 3 business days):

1) Compute & access confirmation (deadline: Day 2):
   - Confirm available compute resources (GPU types and counts, vCPU/RAM, disk/S3/GCS quotas). Specify per-GPU memory (e.g., 16GB, 40GB, 80GB), whether preemptible instances are acceptable, and estimated monthly budget constraints.
   - Confirm ability to run Docker images and CI (GitHub Actions or equivalent). Provide credentials or access process for the storage location where ingestion outputs will be written (S3/GCS path or repo).

2) Literature & benchmark scan (artifact: one-page summaries) (deadline: Day 3):
   - Produce a short (1-paragraph each) literature/benchmark scan for the datasets listed in the spec (GSM8K, SVAMP, MATH, HumanEval, MBPP, HotpotQA, CommonsenseQA, Spider, UD, AMR). Include canonical download links and note any licensing/usage constraints.

3) Dataset ingestion scripts & sample outputs (artifact: scrip

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[STREAMING:coder]: I've
[STREAMING:coder]:  prepared
[STREAMING:coder]:  the
[STREAMING:coder]:  initial
[STREAMING:coder]:  artifacts
[STREAMING:coder]:  requested
[STREAMING:coder]:  and
[Progress: 3050 events, 250.0s elapsed]


[STREAMING:coder]:  packaged
[STREAMING:coder]:  them
[STREAMING:coder]:  into
[STREAMING:coder]:  a
[STREAMING:coder]:  repo
[STREAMING:coder]:  zip
[STREAMING:coder]:  you
[STREAMING:coder]:  can
[STREAMING:coder]:  download
[STREAMING:coder]:  and
[Progress: 3060 events, 250.1s elapsed]


[STREAMING:coder]:  inspect
[STREAMING:coder]: .


[STREAMING:coder]: Download
[STREAMING:coder]: able
[STREAMING:coder]:  artifact
[STREAMING:coder]: 

[STREAMING:coder]: -
[STREAMING:coder]:  [
[STREAMING:coder]: Download
[STREAMING:coder]:  the
[Progress: 3070 events, 250.4s elapsed]


[STREAMING:coder]:  repo
[STREAMING:coder]:  zip
[STREAMING:coder]: ](
[STREAMING:coder]: sandbox
[STREAMING:coder]: :/
[STREAMING:coder]: mnt
[STREAMING:coder]: /data
[STREAMING:coder]

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': "I've prepared the initial artifacts requested and packaged them into a repo zip you can download and inspect.\n\nDownloadable artifact\n- [Download the repo zip](sandbox:/mnt/data/dataset_ingestion_repo.zip)\n\nWhat I delivered (pushed into the zip)\n- README.md — quick-start instructions and one-line commands.\n- schema/schema.json — machine-readable JSON Schema for normalized dataset examples.\n- schema/validate_samples.py — script to validate .jsonl files against schema.json.\n- ingestion_scripts/ingest_generic.py — reusable ingestion helper (uses Hugging Face datasets when available; supports --local_file).\n- samples/\n  - arithmetic_samples.jsonl (20 examples)\n  - code_samples.jsonl (20 examples)\n  - semantic_parsing_samples.jsonl (20 examples)\n- tests/test_samples.py — unit-test script that validates sample counts and schema compliance.\n- BASEL


[AGENT:coder] assistant
  I've prepared the initial artifacts requested and packaged them into a repo zip you can download and inspect.  Downloadable artifact - [Download the repo zip](sandbox:/mnt/data/dataset_ingestion_repo....

--------------------------------------------------------------------------------


INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '{\n    "is_request_satisfied": {\n        "reason": "A concrete scope and success-criteria spec was produced and the coder delivered ingestion scripts, schema, and sample data. However the overall project is not complete: key downstream tasks (dataset ingestion at scale, model baselines, parser prototype, and human annotations) remain unexecuted due to missing compute/access and pending evaluation rubrics. The original goal—to build an AI system that helps reasoning and problem parsing—is therefore not yet fully satisfied.",\n        "answer": false\n    },\n    "is_in_loop": {\n        "reason": "Conversation is progressing: new artifacts (repo zip, scripts, samples) were provided and next actionable items were identified. There is no repeated request/r


[ORCHESTRATOR:instruction]

Please do the following within 3 business days: (1) Review the repo artifacts (schema/schema.json, samples/*.jsonl, ingestion_scripts/) and confirm the schema covers required fields for arithmetic, code, and semantic parsing tasks; list any missing schema fields or edge cases. (2) Produce a formal evaluation rubric document (machine-readable + human-readable) that maps each target task to the metric(s) from the spec, the conservative and ambitious thresholds, and how to compute them (including exact evaluation scripts/commands or pseudocode). Include CoT quality rubric (1–5) with explicit criteria and examples. (3) Draft annotation guidelines for human labelers: instructions, examples, corner cases, and a required inter-annotator agreement (IAA) target. Specify the annotation tool to use (LabelStudio/Prodigy/custom) and the minimum staffing (number of annotators, estimated hours) needed to label the initial 200 validation examples. (4) Produce a privacy/com

INFO:agent_framework._workflows._runner:Completed superstep 4
INFO:agent_framework._workflows._runner:Starting superstep 5
INFO:agent_framework._workflows._magentic:Agent verifier: Received request to respond
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'system'}, 'contents': [{'type': 'text', 'text': 'Transferred to verifier, adopt the persona immediately.'}], 'additional_properties': {}}
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'user'}, 'contents': [{'type': 'text', 'text': 'Please do the following within 3 business days: (1) Review the repo artifacts (schema/schema.json, samples/*.jsonl, ingestion_scripts/) and confirm the schema covers required fields for arithmetic, code, and semantic parsing tasks; list any missing schema fields or edge cases. (2) Produce a formal evaluation rubric document (machine-readable + human-readable) that maps each target task to the metric(s) from the spec, the conservative and am


[STREAMING:verifier]: Summary
[STREAMING:verifier]:  /
[STREAMING:verifier]:  immediate
[STREAMING:verifier]:  blockers
[STREAMING:verifier]: 

[STREAMING:verifier]: -
[STREAMING:verifier]:  I
[STREAMING:verifier]:  currently
[Progress: 4710 events, 331.2s elapsed]


[STREAMING:verifier]:  do
[STREAMING:verifier]:  NOT
[STREAMING:verifier]:  have
[STREAMING:verifier]:  access
[STREAMING:verifier]:  to
[STREAMING:verifier]:  your
[STREAMING:verifier]:  repository
[STREAMING:verifier]:  artifacts
[STREAMING:verifier]:  (
[STREAMING:verifier]: schema
[Progress: 4720 events, 331.2s elapsed]


[STREAMING:verifier]: /schema
[STREAMING:verifier]: .json
[STREAMING:verifier]: ,
[STREAMING:verifier]:  samples
[STREAMING:verifier]: /*.
[STREAMING:verifier]: json
[STREAMING:verifier]: l
[STREAMING:verifier]: ,
[STREAMING:verifier]:  ingestion
[STREAMING:verifier]: _scripts
[Progress: 4730 events, 331.3s elapsed]


[STREAMING:verifier]: /)
[STREAMING:verifier]:  or
[STREAMING:verifier]:  compute
[

INFO:agent_framework._workflows._runner:Completed superstep 5
INFO:agent_framework._workflows._runner:Starting superstep 6
INFO:agent_framework._workflows._magentic:Magentic Orchestrator: Inner loop - round 4
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'user'}, 'contents': [{'type': 'text', 'text': "I'm build a ai system that help reasoning and problem parsing capabilities. "}], 'additional_properties': {}}
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': "\nWe are working to address the following user request:\n\nI'm build a ai system that help reasoning and problem parsing capabilities. \n\n\nTo answer this request we have assembled the following team:\n\n- coder: Writes and executes code to perform calculations, data analysis, and computational tasks.\n- verifier: Validates outputs, checks assumptions, and confirms work meets requirements.\n- generator: Synthesizes 


[AGENT:verifier] assistant
  Summary / immediate blockers - I currently do NOT have access to your repository artifacts (schema/schema.json, samples/*.jsonl, ingestion_scripts/) or compute/storage accounts. I cannot inspect or ru...

--------------------------------------------------------------------------------


INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '{\n    "is_request_satisfied": {\n        "reason": "A concrete scope and success-criteria spec was produced and initial ingestion artifacts were provided, but key follow-up tasks remain incomplete (schema verification, evaluation rubrics committed to the repo, baseline runs, and human annotation). Several blockers (repo access, compute/storage, annotator availability) prevent full completion.",\n        "answer": false\n    },\n    "is_in_loop": {\n        "reason": "The conversation is not stuck in a repeating loop; instead it is progressing with new artifacts and clear next-step blockers. Multiple distinct actions and responses have occurred (spec generation, ingestion artifact delivery, verifier draft) rather than repeated identical requests/response


[ORCHESTRATOR:instruction]

Coder — please do the following within 3 business days: (1) grant read (and if allowed, write) access to the project repo OR upload the repo zip to the shared storage location and share a presigned URL; (2) provide storage path and credentials (S3/GCS) for dataset outputs and annotations or a service account with write access; (3) supply compute details: GPU types (model names), counts, per-GPU memory, CUDA/driver versions, vCPU/RAM, disk space, whether preemptible/spot instances are acceptable, and an estimated monthly budget for experiments; (4) confirm whether Docker and CI runners are allowed and whether you permit automated sandboxed code execution for evaluation; (5) confirm availability of 3 annotators (in-house or approved vendor) for the Week-3 labeling task and whether third-party annotators are permitted under your privacy/compliance rules; (6) either run schema/validate_samples.py on samples/validation.jsonl and upload validation_report.json to 

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[STREAMING:coder]: Summary
[STREAMING:coder]:  —
[STREAMING:coder]:  current
[STREAMING:coder]:  status
[STREAMING:coder]:  and
[STREAMING:coder]:  what
[STREAMING:coder]:  I
[STREAMING:coder]:  need
[STREAMING:coder]:  from
[Progress: 11270 events, 473.2s elapsed]


[STREAMING:coder]:  you
[STREAMING:coder]: 

[STREAMING:coder]: -
[STREAMING:coder]:  I
[STREAMING:coder]:  do
[STREAMING:coder]:  not
[STREAMING:coder]:  currently
[STREAMING:coder]:  have
[STREAMING:coder]:  repository
[STREAMING:coder]:  or
[Progress: 11280 events, 473.3s elapsed]


[STREAMING:coder]:  storage
[STREAMING:coder]:  access
[STREAMING:coder]:  in
[STREAMING:coder]:  this
[STREAMING:coder]:  environment
[STREAMING:coder]: ,
[STREAMING:coder]:  so
[STREAMING:coder]:  I
[STREAMING:coder]:  cannot
[STREAMING:coder]:  (
[Progress: 11290 events, 473.4s elapsed]


[STREAMING:coder]: by
[STREAMING:coder]:  myself
[STREAMING:coder]: )
[STREAMING:coder]:  grant
[STREAMING:coder]:  repo
[STREAMING:coder]:  read
[STRE

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': 'Summary — current status and what I need from you\n- I do not currently have repository or storage access in this environment, so I cannot (by myself) grant repo read/write, upload artifacts to your S3/GCS, or run the validation/baseline on your infrastructure.\n- I can (1) run everything for you as soon as you provide one of the access methods below, or (2) provide ready-to-run artifacts (scripts, Dockerfile, CI workflow, and exact commands) so your CI or a member of your team can execute them immediately.\n- Below I list, for each requested item, what I can / cannot do now, what I need from you, exact formats and commands, and alternatives & timelines. If you supply credentials or a presigned upload URL, I will run the tasks and upload results within the 3-business-day window.\n\n1) Repo read/write OR upload repo zip (what I need / how you can provide)\


[AGENT:coder] assistant
  Summary — current status and what I need from you - I do not currently have repository or storage access in this environment, so I cannot (by myself) grant repo read/write, upload artifacts to your S3...

--------------------------------------------------------------------------------


INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '{\n  "is_request_satisfied": {\n    "reason": "A complete prototype spec was produced by the Generator and initial ingestion artifacts were prepared by the Coder, but several required verification and execution steps remain undone (schema validation run, baseline model runs, annotated validation set, and uploads). These outstanding tasks are blocking full satisfaction.",\n    "answer": false\n  },\n  "is_in_loop": {\n    "reason": "The conversation has repeated requests for repository access, storage/compute credentials, and annotator availability, but the repetitions are due to missing external resources rather than a circular logic loop—no identical actions are being repeated without new information.",\n    "answer": false\n  },\n  "is_progress_being_m


[ORCHESTRATOR:instruction]

Please provide the following within 3 business days so the Verifier can run validation and the team can run baseline experiments: (1) grant repository read access or upload the repo zip to a shared URL (or provide a presigned GET URL); (2) provide S3/GCS credentials or presigned PUT URLs for artifacts (validation_report.json, predictions.jsonl, results_*.json) or an upload path with write access; (3) supply compute details (GPU models & counts, per-GPU memory, vCPU/RAM, disk, whether preemptible instances are acceptable) and confirm whether Docker/CI runners are allowed and whether sandboxed code execution is permitted for evaluation; (4) either run schema/validate_samples.py on samples/validation.jsonl and upload validation_report.json, or explicitly authorize the Verifier to run it using the provided credentials; (5) confirm availability of 3 annotators (in-house or approved vendor) for Week-3 labeling and state any privacy/compliance constraints (PII rul

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[STREAMING:coder]: Short
[Progress: 14830 events, 529.5s elapsed]


[STREAMING:coder]:  answer
[STREAMING:coder]:  up
[STREAMING:coder]:  front
[STREAMING:coder]: 

[STREAMING:coder]: -
[STREAMING:coder]:  I
[STREAMING:coder]:  cannot
[STREAMING:coder]:  myself
[STREAMING:coder]:  create
[STREAMING:coder]:  repo
[Progress: 14840 events, 529.6s elapsed]


[STREAMING:coder]:  access
[STREAMING:coder]:  credentials
[STREAMING:coder]:  or
[STREAMING:coder]:  upload
[STREAMING:coder]:  to
[STREAMING:coder]:  your
[STREAMING:coder]:  storage
[STREAMING:coder]:  from
[STREAMING:coder]:  this
[STREAMING:coder]:  environment
[Progress: 14850 events, 529.7s elapsed]


[STREAMING:coder]: .
[STREAMING:coder]:  I
[STREAMING:coder]:  can
[STREAMING:coder]: ,
[STREAMING:coder]:  however
[STREAMING:coder]: ,
[STREAMING:coder]:  (
[STREAMING:coder]: A
[STREAMING:coder]: )
[STREAMING:coder]:  run
[Progress: 14860 events, 529.9s elapsed]


[STREAMING:coder]:  the
[STREAMING:coder]:  validation
[STREAMIN

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': 'Short answer up front\n- I cannot myself create repo access credentials or upload to your storage from this environment. I can, however, (A) run the validation and baselines and upload results as soon as you provide either (i) temporary credentials (S3 or GCS service account key), (ii) presigned PUT URLs for the output files, or (iii) a direct HTTP(S) GET URL to the repo zip so I can pull the repo and run locally. If you provide any one of those within 3 business days I will run the tasks and upload artifacts within the time windows below.\n- If you cannot or will not provide credentials, I will deliver ready-to-run artifacts (scripts, Dockerfile, GH Actions workflow, and exact commands) within 24 hours so your verifier or CI can execute them locally.\n\nBelow I list each requested item, what I can and cannot do here, exactly what I need from you (format 


[Progress: 17620 events, 560.9s elapsed]



INFO:agent_framework._workflows._runner:Completed superstep 9
INFO:agent_framework._workflows._runner:Starting superstep 10
INFO:agent_framework._workflows._magentic:Magentic Orchestrator: Inner loop - round 6
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'user'}, 'contents': [{'type': 'text', 'text': "I'm build a ai system that help reasoning and problem parsing capabilities. "}], 'additional_properties': {}}
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': "\nWe are working to address the following user request:\n\nI'm build a ai system that help reasoning and problem parsing capabilities. \n\n\nTo answer this request we have assembled the following team:\n\n- coder: Writes and executes code to perform calculations, data analysis, and computational tasks.\n- verifier: Validates outputs, checks assumptions, and confirms work meets requirements.\n- generator: Synthesizes


[AGENT:coder] assistant
  Short answer up front - I cannot myself create repo access credentials or upload to your storage from this environment. I can, however, (A) run the validation and baselines and upload results as soon ...

--------------------------------------------------------------------------------


INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"
INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': '{\n  "is_request_satisfied": {\n    "reason": "The generator produced a concrete scope & success-criteria spec and the coder produced ingestion scripts, schema, and sample files. However core verification and baseline experiments (schema validation run, baseline model inference, annotated validation set) have NOT been executed because required repository/storage/compute access and annotator confirmations are missing. Those remaining execution steps are necessary to consider the request fully satisfied.",\n    "answer": false\n  },\n  "is_in_loop": {\n    "reason": "There have been repeated requests for repository/storage/compute access and repeated offers of ready-to-run artifacts, but this is an access/permissions bottleneck rather than a cyclical repet


[ORCHESTRATOR:instruction]

We are blocked on access and execution. Please do one of the following within 3 business days: (A) Grant repo access OR provide a presigned GET URL for the repo zip, AND provide S3 or GCS write access (temporary creds or presigned PUT URLs) for these outputs: validation_report.json, predictions.jsonl, results_arithmetic.json; OR (B) if you cannot share credentials, run these commands locally and upload the outputs to the agreed storage path: 1) pip install -r requirements.txt; 2) python3 schema/validate_samples.py samples/validation.jsonl --output validation_report.json; 3) run the provided run_baseline.py on 50 examples (model google/flan-t5-small or other specified) to produce predictions.jsonl and results_arithmetic.json; 4) upload files to S3/GCS or add them to a PR. Also supply compute specs (GPU model(s) & counts, per-GPU memory, vCPU/RAM, disk space, whether preemptible instances are acceptable), confirm whether Docker/CI and sandboxed code execution

INFO:httpx:HTTP Request: POST https://fleetw7ot.openai.azure.com/openai/v1/responses "HTTP/1.1 200 OK"



[STREAMING:coder]: Short
[STREAMING:coder]:  direct
[STREAMING:coder]:  answer
[STREAMING:coder]: 

[Progress: 17630 events, 590.3s elapsed]


[STREAMING:coder]: -
[STREAMING:coder]:  I
[STREAMING:coder]:  cannot
[STREAMING:coder]:  create
[STREAMING:coder]:  repo
[STREAMING:coder]:  credentials
[STREAMING:coder]: ,
[STREAMING:coder]:  access
[STREAMING:coder]:  your
[STREAMING:coder]:  S
[Progress: 17640 events, 590.4s elapsed]


[STREAMING:coder]: 3
[STREAMING:coder]: /G
[STREAMING:coder]: CS
[STREAMING:coder]: ,
[STREAMING:coder]:  or
[STREAMING:coder]:  run
[STREAMING:coder]:  jobs
[STREAMING:coder]:  in
[STREAMING:coder]:  your
[STREAMING:coder]:  infrastructure
[Progress: 17650 events, 590.5s elapsed]


[STREAMING:coder]:  from
[STREAMING:coder]:  this
[STREAMING:coder]:  environment
[STREAMING:coder]: .
[STREAMING:coder]:  I
[STREAMING:coder]:  can
[STREAMING:coder]: ,
[STREAMING:coder]:  however
[STREAMING:coder]: ,
[STREAMING:coder]:  immediately
[Progress: 17660 events, 590.

INFO:agent_framework:{'type': 'chat_message', 'role': {'type': 'role', 'value': 'assistant'}, 'contents': [{'type': 'text', 'text': 'Short direct answer\n- I cannot create repo credentials, access your S3/GCS, or run jobs in your infrastructure from this environment. I can, however, immediately (A) provide the repo zip for you to download, (B) provide exact scripts, commands, and expected output formats so your verifier or CI can run validation and baseline locally, and (C) run the tasks for you and upload results if you supply either temporary S3/GCS credentials or presigned PUT URLs and a presigned GET for the repo zip (or a runner/VM I can SSH into). If you provide credentials/presigned URLs now I will run validation within 24 hours and the baseline within 48 hours.\n\n1) Repo download\n- I prepared a repo zip previously. If you want me to upload a fresh zip here I will, or you can provide a presigned GET and I will fetch it. If you want me to run validation/baseline, give either:\n


[Progress: 20390 events, 628.6s elapsed]



INFO:agent_framework._workflows._runner:Completed superstep 11
INFO:agent_framework._workflows._runner:Starting superstep 12
ERROR:agent_framework._workflows._magentic:Magentic Orchestrator: Max round count reached
INFO:agent_framework._workflows._runner:Completed superstep 12
INFO:agent_framework._workflows._runner:Workflow completed after 12 supersteps



[AGENT:coder] assistant
  Short direct answer - I cannot create repo credentials, access your S3/GCS, or run jobs in your infrastructure from this environment. I can, however, immediately (A) provide the repo zip for you to do...

--------------------------------------------------------------------------------

FINAL RESULT:

✓ Workflow completed successfully!

Short direct answer
- I cannot create repo credentials, access your S3/GCS, or run jobs in your infrastructure from this environment. I can, however, immediately (A) provide the repo zip for you to download, (B) provide exact scripts, commands, and expected output formats so your verifier or CI can run validation and baseline locally, and (C) run the tasks for you and upload results if you supply either temporary S3/GCS credentials or presigned PUT URLs and a presigned GET for the repo zip (or a runner/VM I can SSH into). If you provide credentials/presigned URLs now I will run validation within 24 hours and the baseline within

## Step 14: Using `.as_agent()` for Composition

The workflow can be wrapped as a reusable agent using `.as_agent()`. This allows:

- **Composition**: Use this workflow as a participant in larger workflows
- **Reusability**: Call the same workflow multiple times with different tasks
- **Transcript access**: Get structured message history from the workflow execution

This is powerful for building hierarchical agent systems where complex workflows become modular components.

In [None]:
print("\n" + "=" * 80)
print("Demonstrating .as_agent() wrapper pattern...")
print("=" * 80)

# Wrap the workflow as an agent
workflow_agent = workflow.as_agent(name="MultiRoleWorkflowAgent")

# Execute through the agent interface
agent_result = await workflow_agent.run(task)

# Display the transcript
if agent_result.messages:
    print("\n===== Workflow Transcript =====\n")
    for i, msg in enumerate(agent_result.messages, start=1):
        role_value = getattr(msg.role, "value", msg.role)
        speaker = msg.author_name or role_value
        message_preview = (msg.text or "")[:150] + "..." if len(msg.text or "") > 150 else (msg.text or "")
        print(f"{'-' * 80}")
        print(f"Message {i:02d} [{speaker}]")
        print(f"{message_preview}")
    print(f"{'-' * 80}")
    print(f"\n✓ Total messages in transcript: {len(agent_result.messages)}")
else:
    print("No messages in transcript.")

## Optional: Customize Manager Parameters

You can experiment with different manager configurations to see how they affect workflow behavior:

- **Increase `max_round_count`** for more complex tasks requiring longer conversations
- **Decrease `max_stall_count`** to trigger replanning faster when agents aren't making progress
- **Add custom manager instructions** to guide orchestration behavior

Uncomment and run the cell below to try a different configuration.

In [None]:
# # Example: Build workflow with custom manager configuration
# custom_workflow = (
#     MagenticBuilder()
#     .participants(
#         planner=planner_agent,
#         executor=executor_agent,
#         coder=coder_agent,
#         verifier=verifier_agent,
#         generator=generator_agent,
#     )
#     .with_standard_manager(
#         chat_client=OpenAIChatClient(model_id="gpt-5-mini"),
#         max_round_count=20,  # More rounds for complex tasks
#         max_stall_count=2,   # Faster replanning
#         max_reset_count=1,   # Fewer resets
#         instructions="Focus on efficiency. Always verify calculations before generating final outputs.",
#     )
#     .build()
# )
#
# print("✓ Custom workflow configured with modified parameters")

## Optional: Change Logging Level

Switch between logging levels to control output verbosity:

- **`DEBUG`**: See all manager decisions, agent selections, and internal state changes
- **`INFO`**: See workflow progress and key events
- **`WARNING`**: See only warnings and errors

Run the cell below to change the logging level, then re-execute the workflow cells above.

In [None]:
# Change logging level (re-run workflow cells after changing this)
# logging.basicConfig(level=logging.DEBUG, force=True)  # Detailed internal logs
logging.basicConfig(level=logging.INFO, force=True)    # Clean progress logs
# logging.basicConfig(level=logging.WARNING, force=True)  # Warnings only

print("✓ Logging level updated")

## Troubleshooting Common Issues

### API Key Not Found
```
Error: OpenAI API key not found
```
**Solution**: Set the `OPENAI_API_KEY` environment variable before running the notebook.

### Model Not Available
```
Error: Model 'gpt-5-mini' does not exist
```
**Solution**: The notebook defaults to `gpt-5-mini` for every role. If this model is unavailable in your account, choose an accessible alternative and update the agent constructors. Common substitutes include:
- `gpt-4o` for high-quality reasoning
- `gpt-4o-mini` for lower latency and cost
- `gpt-4o-reasoning` or other reasoning-tier models if enabled for your workspace

### Rate Limits
```
Error: Rate limit exceeded
```
**Solution**: The workflow makes multiple API calls. If you hit rate limits:
- Use lower-tier models (e.g., `gpt-4o-mini` instead of `gpt-5-mini`)
- Reduce `max_round_count` to limit conversation length
- Add delays between workflow executions

### Workflow Stalls or Times Out
```
Warning: Max stall count reached
```
**Solution**: The manager detected agents aren't making progress:
- Simplify the task
- Adjust `max_stall_count` to allow more attempts
- Check agent instructions for clarity
- Review debug logs to see which agent is stalling

### Code Execution Fails
```
Error: Code execution failed
```
**Solution**: `HostedCodeInterpreterTool` requires:
- OpenAI Responses API or Azure OpenAI with code interpreter enabled
- Cannot execute arbitrary local code (runs in OpenAI's sandbox)
- Some operations (file I/O, network access) may be restricted

## Next Steps

Experiment with this workflow by:

1. **Modifying agent instructions** - Change role prompts to adjust behavior
2. **Adding new agents** - Include specialists like a ResearcherAgent or CriticAgent
3. **Trying different tasks** - Test with various complexity levels
4. **Adjusting manager parameters** - Tune `max_round_count`, `max_stall_count` for your use case
5. **Composing workflows** - Use `.as_agent()` to nest workflows within larger systems
6. **Switching models** - Compare performance across different model tiers

For more examples, see:
- `magentic_workflow_as_agent.py` - Detailed streaming example with researcher + coder
- `group_chat_workflow_as_agent.py` - Simpler group chat pattern
- Agent Framework documentation for advanced orchestration patterns