# CSC 480-F25 Lab 2: Task Decomposition, Multi‑Agent Design Patterns, and Communication Protocols

# Authors:

***Anthony Man, Ravi Panchal***

California Polytechnic State University, San Luis Obispo;

Computer Science & Software Engineering Department

# Overview

This lab focuses on:
- Breaking down complex problems into smaller tasks (task decomposition)
- Defining specialized agents with clear roles and responsibilities  
- Choosing appropriate multi-agent design patterns
- Specifying communication protocols between agents
- Implementing a three-agent system using AutoGen

NOTE: If you already know what you're going to do for your final project, you may use this lab to make progress on it. Otherwise, treat this lab as a distinct exercise.

## Learning Objectives

By the end of this lab, you will be able to:

- Decompose a complex problem into a comprehensive set of constituent tasks
- Identify and define the roles, responsibilities, and abilities of individual agents
- Select and justify an appropriate multi-agent design pattern (Manager‑Worker, Sequential Pipeline, Collaborative Team)
- Explain and apply foundational agent communication protocols at a high level (MCP, A2A)
- Start thinking about the architecture of your project (agents, tasks, design pattern, and interaction protocols)

# Part 1: Task Decomposition and Agent Design

## 1. Problem Statement & Task Decomposition

**Main Problem:** PSA card grading

**Task Breakdown:** List the steps and sub-tasks needed to achieve the goal:
1. Task 1: Identify the card and perform image processing
   * Subtask 1: Center the card
   * Subtask 2: Improve image quality
   * Subtask 3: Crop unnecessary background
2. Task 2: Grade edge quality
3. Task 3: Grade surface quality
4. Task 4: Grade corner quality
5. Task 5: Grade centering quality
6. Task 6: Combine the scores for each aspect into an overall score
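
The breakdown above implies an ordering: Tasks 2-5 need the output of Task 1, and Task 6 needs Tasks 2-5. A minimal sketch of that ordering follows, using the standard-library `graphlib` module. The task names and the `execution_stages` helper are hypothetical labels chosen for illustration, not part of the lab template.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
# The names are illustrative stand-ins for Tasks 1-6 above.
TASK_DEPENDENCIES = {
    "process_image": set(),
    "grade_edges": {"process_image"},
    "grade_surface": {"process_image"},
    "grade_corners": {"process_image"},
    "grade_centering": {"process_image"},
    "combine_scores": {
        "grade_edges",
        "grade_surface",
        "grade_corners",
        "grade_centering",
    },
}


def execution_stages(dependencies):
    """Group tasks into stages; tasks in the same stage do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(execution_stages(TASK_DEPENDENCIES), start=1):
        print(f"Stage {number}: {', '.join(stage)}")
```

Every task that lands in the same stage is a candidate for parallel execution, which is one way to check a decomposition before assigning tasks to agents.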

## 2. Agent Definition

Define your primary agents with clear roles and responsibilities:

### Agent 1: Image_processor
- **Role:** Image Processing
- **Responsibilities:** Performs task 1, ensuring that the image is suitable for the other agents to properly grade the card
- **Inputs:** Raw image of card
- **Outputs:** Processed image of card
- **Success Criteria:** The image is free of noise and the card is centered. Brightness may also be adjusted.

### Agent 2: Edge_grader
- **Role:** Grade edges
- **Responsibilities:** Performs task 2, providing a grade for edge quality
- **Inputs:** Processed image from Image_processor
- **Outputs:** Grade out of 10 based on PSA standards
- **Success Criteria:** The grade closely aligns with what a human grader would give.

### Agent 3: Surface_grader
- **Role:** Grade surface
- **Responsibilities:** Performs task 3, providing a grade for surface quality
- **Inputs:** Processed image from Image_processor
- **Outputs:** Grade out of 10 based on PSA standards
- **Success Criteria:** The grade closely aligns with what a human grader would give.

### Agent 4: Corner_grader
- **Role:** Grade corners
- **Responsibilities:** Performs task 4, providing a grade for corner quality
- **Inputs:** Processed image from Image_processor
- **Outputs:** Grade out of 10 based on PSA standards
- **Success Criteria:** The grade closely aligns with what a human grader would give.

### Agent 5: Centering_grader
- **Role:** Grade centering
- **Responsibilities:** Performs task 5, providing a grade for centering quality
- **Inputs:** Processed image from Image_processor
- **Outputs:** Grade out of 10 based on PSA standards
- **Success Criteria:** The grade closely aligns with what a human grader would give.

### Agent 6: Overall_grader
- **Role:** Provide overall grade
- **Responsibilities:** Performs task 6 by combining the outputs of Agents 2-5 and outputting an appropriate overall grade for the card
- **Inputs:** A vector of grades from Agents 2-5
- **Outputs:** A grade out of 10 based on PSA standards
- **Success Criteria:** The grade closely aligns with what a human grader would give.

## 3. Design Pattern Selection

**Chosen Pattern:** Sequential Pipeline (with Agents 2-5 being parallel for efficiency)

**Justification:** We chose a sequential pipeline because Agents 2-5 must wait for Agent 1's output, and Agent 6 must wait for the outputs of Agents 2-5. Each agent works independently on its own task, so a collaborative team pattern does not fit, and no agent oversees another agent's tasks, so manager-worker does not fit either. Because Agents 2-5 depend only on Agent 1's output, they can run in parallel.
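
To make the control flow concrete, here is a rough sketch of a sequential pipeline whose middle stage fans out in parallel, written with plain `asyncio`. All function names, the placeholder grades, and the averaging rule in `combine_grades` are assumptions made for illustration. In particular, the average is not PSA's actual grading rule, and the stubs stand in for real image processing and grading agents.

```python
import asyncio


async def process_image(raw_image: str) -> str:
    """Stage 1 stand-in: pretend to center, clean, and crop the image."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return f"processed({raw_image})"


async def grade_aspect(aspect: str, image: str) -> tuple[str, float]:
    """Stage 2 stand-in: return a hard-coded placeholder grade for one aspect."""
    await asyncio.sleep(0)
    placeholder_grades = {"edges": 8.0, "surface": 9.0, "corners": 7.0, "centering": 8.0}
    return aspect, placeholder_grades[aspect]


def combine_grades(grades: dict[str, float]) -> float:
    """Stage 3 stand-in: a plain average (an assumption, not PSA's rule)."""
    return sum(grades.values()) / len(grades)


async def run_pipeline(raw_image: str) -> float:
    # Stage 1 must finish before any grader starts.
    image = await process_image(raw_image)
    # Stage 2: the four graders only need the processed image, so they run concurrently.
    aspects = ["edges", "surface", "corners", "centering"]
    results = await asyncio.gather(*(grade_aspect(aspect, image) for aspect in aspects))
    # Stage 3 waits for all four grades.
    return combine_grades(dict(results))


if __name__ == "__main__":
    print(asyncio.run(run_pipeline("card.jpg")))
```

The two `await` points mark the sequential hand-offs, while `asyncio.gather` is the only place where work overlaps, mirroring the pattern described above.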


## 4. Communication Design

### Model Context Protocol (MCP)
**Shared Context & Tools:**


* Context: Each agent shares its output so that other agents can use it
* Communication Mechanism: Outputs are published to a shared memory
* Access Rules: Agents can read the outputs of previous agents once the data is available
* Schemas: Images are in standard image formats; grades are numeric vectors (with values between 0 and 10)
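
The shared-memory idea above can be sketched as a small in-memory store with a publish/read interface and a schema check on grades. This is not an MCP implementation; it is a plain Python stand-in, and the `SharedContext` class, the `GRADE_KEYS` names, and `validate_grade` are all hypothetical.

```python
from dataclasses import dataclass, field

# Keys under which the four grading agents are assumed to publish.
GRADE_KEYS = ("edges", "surface", "corners", "centering")


def validate_grade(value) -> None:
    """Schema check: a grade must be a number between 0 and 10."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("grade must be a number")
    if not 0 <= value <= 10:
        raise ValueError("grade must be between 0 and 10")


@dataclass
class SharedContext:
    """In-memory store: agents publish outputs under a key, later agents read them."""

    _store: dict = field(default_factory=dict)

    def publish(self, key: str, value) -> None:
        if key in GRADE_KEYS:
            validate_grade(value)
        self._store[key] = value

    def read(self, key: str):
        # Access rule: data can only be read once it has been published.
        if key not in self._store:
            raise KeyError(f"'{key}' has not been published yet")
        return self._store[key]

    def grade_vector(self) -> list:
        """Collect the grades in a fixed order for the overall grader."""
        return [self.read(key) for key in GRADE_KEYS]


if __name__ == "__main__":
    context = SharedContext()
    context.publish("processed_image", "card_processed.png")
    for key, grade in zip(GRADE_KEYS, (8, 9, 7, 8)):
        context.publish(key, grade)
    print(context.grade_vector())
```

Raising on a missing key makes the "once the data is available" rule explicit: an agent that runs too early fails loudly instead of grading with partial input.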

## 5. Interaction Diagram

[Create a flowchart or sequence diagram showing agents, your chosen pattern, and MCP/A2A interactions. You can use text-based diagrams, draw by hand and insert an image, or use diagramming tools.]

```
User → Agent1 → Agents2-5 → Agent6 → User
  ↓       ↓       ↓           ↓
 MCP    MCP     MCP         Final
Tool   image    grades      Output
```

# Part 2: Three-Agent AutoGen Implementation

## Environment Setup

Install required packages and set up Azure OpenAI configuration.

In [6]:
%pip install "autogen-core" "autogen-agentchat" "autogen-ext[openai,azure]"



In [7]:
import os
import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.ui import Console
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from google.colab import userdata

In [8]:
# Same configuration as in Lab 1 (L1)
azure_deployment = "gpt-5-mini"
api_version = "2024-12-01-preview"  # If you're using gpt-5-mini as demonstrated

# e.g. https://ccoha-mfoynknp-eastus2.cognitiveservices.azure.com/
azure_endpoint = "https://csc480resource1.openai.azure.com/"

## An Example Three-Agent System Architecture

Based on the suggested roles from the overview:
- **Planner**: Decomposes the user goal into a concise, ordered plan
- **Implementer**: Executes the steps from the plan
- **Critic/Integrator**: Reviews output and suggests improvements or approves completion

In [9]:
async def setup_three_agent_system():
    """Set up the three-agent system with Planner, Implementer, and Critic roles"""

    # Create the Azure OpenAI client
    client = AzureOpenAIChatCompletionClient(
        azure_deployment=azure_deployment,
        model="gpt-5-mini",
        api_version=api_version,
        azure_endpoint=azure_endpoint,
        # Alternative outside Colab: api_key=os.getenv("AZURE_SUBSCRIPTION_KEY"),
        api_key=userdata.get("AZURE_SUBSCRIPTION_KEY"),  # read the key from Colab secrets
    )

    # Define the three agents with specific system prompts
    planner = AssistantAgent(
        name="Planner",
        model_client=client,
        system_message="""You are a task decomposition specialist. Your role is to:
        1. Break down complex user goals into clear, numbered steps
        2. Define specific inputs/outputs for each step
        3. Provide testable acceptance criteria for each step
        4. Present plans in a structured, actionable format

        Always end your response with 'PLAN_COMPLETE' when finished."""
    )

    implementer = AssistantAgent(
        name="Implementer",
        model_client=client,
        system_message="""You are an implementation specialist. Your role is to:
        1. Follow the Planner's steps methodically
        2. Address each step with concrete outputs/artifacts
        3. Provide detailed solutions that meet the specified criteria
        4. Ask for clarification if any step is unclear

        Always end your response with 'IMPLEMENTATION_COMPLETE' when finished."""
    )

    critic = AssistantAgent(
        name="Critic",
        model_client=client,
        system_message="""You are a quality assurance specialist. Your role is to:
        1. Review the Implementer's work against the Planner's criteria
        2. Identify gaps, errors, or missing elements
        3. Suggest specific improvements if needed
        4. Only approve with 'APPROVED' if all criteria are fully met

        Be thorough but constructive in your feedback."""
    )

    return planner, implementer, critic

In [10]:
async def run_three_agent_workflow(task_description):
    """Run the three-agent workflow for a given task"""

    # Set up the agents
    planner, implementer, critic = await setup_three_agent_system()

    # Create termination condition - look for "APPROVED" in messages
    # See https://microsoft.github.io/autogen/stable/reference/python/autogen_agentchat.base.html#autogen_agentchat.base.TerminationCondition
    termination = TextMentionTermination("APPROVED")

    # Create a group chat with the three agents
    # The RoundRobinGroupChat will manage turn-taking between agents
    # See https://microsoft.github.io/autogen/stable/reference/python/autogen_agentchat.teams.html#autogen_agentchat.teams.RoundRobinGroupChat
    team = RoundRobinGroupChat([planner, implementer, critic], termination_condition=termination)

    # Run the workflow
    print("Starting three-agent workflow...")
    print("=" * 60)

    # Start the conversation
    result = await Console(
        team.run_stream(task=task_description)
    )

    return result

## Example Task 1: Analytical Task

Test the three-agent system with a small example analytical task.

In [15]:
# Test Task 1: Compare sorting algorithms
analytical_task = """
Compare the pros and cons of Quick Sort vs Merge Sort algorithms.
Provide a structured analysis that includes:
- Time and space complexity for both
- Best/worst case scenarios
- Practical use cases for each
- A final recommendation with justification
"""

# Run the workflow on the analytical task
await run_three_agent_workflow(analytical_task)

Starting three-agent workflow...
---------- TextMessage (user) ----------

Compare the pros and cons of Quick Sort vs Merge Sort algorithms.
Provide a structured analysis that includes:
- Time and space complexity for both
- Best/worst case scenarios
- Practical use cases for each
- A final recommendation with justification

---------- TextMessage (Planner) ----------
1) Step 1 — High-level comparison (purpose and summary)
- Inputs: names of algorithms (Quick Sort, Merge Sort)
- Outputs: one-paragraph summary comparing their roles and tradeoffs
- Acceptance criteria (testable): output contains at least one clear sentence describing the main tradeoff (speed/space vs stability/predictability)
Analysis:
- Quick Sort: divide-and-conquer, in-place partitioning, typically fastest in-memory comparison sort because of low constant factors and good cache locality. Not stable by default and has worst-case O(n^2) unless mitigated.
- Merge Sort: divide-and-conquer that merges sorted halves, stable

TaskResult(messages=[TextMessage(id='bde8ca05-183a-48ce-883f-5f7721356f7a', source='user', models_usage=None, metadata={}, created_at=datetime.datetime(2025, 10, 1, 22, 34, 47, 843742, tzinfo=datetime.timezone.utc), content='\nCompare the pros and cons of Quick Sort vs Merge Sort algorithms.\nProvide a structured analysis that includes:\n- Time and space complexity for both\n- Best/worst case scenarios\n- Practical use cases for each\n- A final recommendation with justification\n', type='TextMessage'), TextMessage(id='c2ad1482-3f63-467c-90f4-b8e39084dba6', source='Planner', models_usage=RequestUsage(prompt_tokens=140, completion_tokens=1976), metadata={}, created_at=datetime.datetime(2025, 10, 1, 22, 35, 13, 274382, tzinfo=datetime.timezone.utc), content='1) Step 1 — High-level comparison (purpose and summary)\n- Inputs: names of algorithms (Quick Sort, Merge Sort)\n- Outputs: one-paragraph summary comparing their roles and tradeoffs\n- Acceptance criteria (testable): output contains a

## Example Task 2: Creative Task

Test the three-agent system with a creative writing task.

In [16]:
# Test Task 2: Creative writing with constraints
creative_task = """
Write a 150-word summary of the benefits of renewable energy that:
- Uses exactly 3 specific statistics or data points
- Includes at least 2 types of renewable energy sources
- Has a compelling call-to-action in the final sentence
- Maintains an optimistic but factual tone throughout
"""

# Run the workflow on the creative task
await run_three_agent_workflow(creative_task)

Starting three-agent workflow...
---------- TextMessage (user) ----------

Write a 150-word summary of the benefits of renewable energy that:
- Uses exactly 3 specific statistics or data points
- Includes at least 2 types of renewable energy sources
- Has a compelling call-to-action in the final sentence
- Maintains an optimistic but factual tone throughout

---------- TextMessage (Planner) ----------
1) Define requirements
- Input: user prompt specifying: 150-word summary; exactly 3 specific statistics/data points; at least 2 renewable energy types; final sentence is a compelling call-to-action; optimistic but factual tone.
- Output: explicit checklist of constraints to satisfy.
- Acceptance criteria (testable):
  - Checklist includes all items from user prompt.
  - Constraints are clear and unambiguous for drafting and validation.

2) Draft the 150-word summary
- Input: constraint checklist from Step 1.
- Output: final text (exactly 150 words) satisfying constraints.
- Acceptance cri

TaskResult(messages=[TextMessage(id='51ff2b0e-1411-42bc-8840-479fa173d1d3', source='user', models_usage=None, metadata={}, created_at=datetime.datetime(2025, 10, 1, 22, 36, 6, 132885, tzinfo=datetime.timezone.utc), content='\nWrite a 150-word summary of the benefits of renewable energy that:\n- Uses exactly 3 specific statistics or data points\n- Includes at least 2 types of renewable energy sources\n- Has a compelling call-to-action in the final sentence\n- Maintains an optimistic but factual tone throughout\n', type='TextMessage'), TextMessage(id='bed5a473-4a14-41f0-8a6a-882bcaa0681e', source='Planner', models_usage=RequestUsage(prompt_tokens=150, completion_tokens=3229), metadata={}, created_at=datetime.datetime(2025, 10, 1, 22, 36, 44, 27285, tzinfo=datetime.timezone.utc), content='1) Define requirements\n- Input: user prompt specifying: 150-word summary; exactly 3 specific statistics/data points; at least 2 renewable energy types; final sentence is a compelling call-to-action; opt

## Your Task

Write your own task for the planner-implementer-critic system

In [17]:
your_task = """
Generate a discussion between Shrek and Donkey (from Dreamwork's "Shrek") that discusses Lord Farquad's height
but they were cursed by a witch and have to speak in rhymes
"""
# Run the workflow on our first custom task
await run_three_agent_workflow(your_task)

Starting three-agent workflow...
---------- TextMessage (user) ----------

Generate a discussion between Shrek and Donkey (from Dreamwork's "Shrek") that discusses Lord Farquad's height
but they were cursed by a witch and have to speak in rhymes

---------- TextMessage (Planner) ----------
1) Step 1 — Clarify constraints and success goals
   - Input: User request: "Generate a discussion between Shrek and Donkey that discusses Lord Farquaad's height; they were cursed by a witch and have to speak in rhymes."
   - Output: A short, rhyming dialogue between two labeled characters "Shrek" and "Donkey" that (a) references the witch's curse, (b) discusses Lord Farquaad's height, and (c) is original (not copying lines from the film).
   - Acceptance criteria (testable):
     - The produced text contains a clear label for each speaking turn: "Shrek:" and "Donkey:".
     - The dialogue explicitly mentions a witch's curse.
     - The dialogue explicitly refers to Lord Farquaad and comments on his 

TaskResult(messages=[TextMessage(id='97a42145-45cb-4304-b1e1-d0ba462ab2ca', source='user', models_usage=None, metadata={}, created_at=datetime.datetime(2025, 10, 1, 22, 39, 6, 219568, tzinfo=datetime.timezone.utc), content='\nGenerate a discussion between Shrek and Donkey (from Dreamwork\'s "Shrek") that discusses Lord Farquad\'s height\nbut they were cursed by a witch and have to speak in rhymes\n', type='TextMessage'), TextMessage(id='d4c7ea68-017d-4bee-9ecb-e1b57df5216c', source='Planner', models_usage=RequestUsage(prompt_tokens=132, completion_tokens=3675), metadata={}, created_at=datetime.datetime(2025, 10, 1, 22, 39, 43, 559743, tzinfo=datetime.timezone.utc), content='1) Step 1 — Clarify constraints and success goals\n   - Input: User request: "Generate a discussion between Shrek and Donkey that discusses Lord Farquaad\'s height; they were cursed by a witch and have to speak in rhymes."\n   - Output: A short, rhyming dialogue between two labeled characters "Shrek" and "Donkey" th

In [12]:
your_task = """
Compare AMD and Nvidia products in terms of:
- pricing
- performance
- overall value according to current customers
"""
# Run the workflow on our second custom task
await run_three_agent_workflow(your_task)

Starting three-agent workflow...
---------- TextMessage (user) ----------

Compare AMD and Nvidia products in terms of:
- pricing
- performance
- overall value according to current customers

---------- TextMessage (Planner) ----------
1) Clarify scope and target buyer profiles
- Action: agree which product families and market segments to compare (consumer GPUs only? also workstation/professional GPUs? include integrated GPUs? include AMD CPUs vs Nvidia GPUs? mobile vs desktop?). Also confirm currency and region for pricing (USD / US retailers, EUR / EU, etc.) and buyer types to prioritize (gamers at 1080p/1440p/4K, content creators, AI/ML users, miners).
- Inputs: your confirmation of scope (list of product families/segments and currency/region and buyer types).
- Outputs: a final scope statement listing SKUs/segments to analyze and buyer personas to target.
- Acceptance criteria: you explicitly confirm scope and region/currency. If not confirmed, default to: desktop discrete consumer

TaskResult(messages=[TextMessage(id='a86cbd56-62f3-4760-8504-8a00b21079ac', source='user', models_usage=None, metadata={}, created_at=datetime.datetime(2025, 10, 1, 22, 54, 34, 711041, tzinfo=datetime.timezone.utc), content='\nCompare AMD and Nvidia products in terms of:\n- pricing\n- performance\n- overall value according to current customers\n', type='TextMessage'), TextMessage(id='599957c3-283b-4b18-90e9-53229466ddb5', source='Planner', models_usage=RequestUsage(prompt_tokens=114, completion_tokens=2437), metadata={}, created_at=datetime.datetime(2025, 10, 1, 22, 55, 1, 598977, tzinfo=datetime.timezone.utc), content='1) Clarify scope and target buyer profiles\n- Action: agree which product families and market segments to compare (consumer GPUs only? also workstation/professional GPUs? include integrated GPUs? include AMD CPUs vs Nvidia GPUs? mobile vs desktop?). Also confirm currency and region for pricing (USD / US retailers, EUR / EU, etc.) and buyer types to prioritize (gamers at

NOTE: The output may contain duplicated text because `Console` prints a text message both when it is sent and when it is received.

## Reflection & Analysis

### What worked well?
The model appeared to do better when the prompt had very clear constraints. It successfully performed the two example tasks, whose instructions were more detailed, and produced output that the critic was happy with.

### What struggled?
It struggled with tasks that laid out a general problem statement but did not give all the guardrails. In the tasks we wrote ourselves, we did not thoroughly specify the output format, and the model seemed to struggle with them. In our Shrek scenario, the critic was not satisfied with the implementer's performance. In the AMD vs Nvidia prompt, the implementer did not fully execute the planner's plan and instead asked for clarification, yet the critic was still satisfied with the implementer's output.

### Potential improvements:
The system might be more effective if agents could rerun their step when a later agent requests it. For example, when the implementer asks for clarification, the planner should be able to clarify, and when the critic is not satisfied, the implementer should be able to rework its solution using the feedback. This would let the system keep iterating toward a complete and satisfactory response (if such a response is possible). Making each agent more concise would likely improve performance as well.
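
One way to picture the proposed feedback loop is a bounded revise-and-review cycle. The sketch below is framework-independent and uses stub callables in place of real agents; `run_with_feedback`, `stub_implement`, and `stub_review` are hypothetical names, and the round limit is an assumed safeguard against endless revision.

```python
from typing import Callable, Optional


def run_with_feedback(
    implement: Callable[[str, Optional[str]], str],
    review: Callable[[str], Optional[str]],
    plan: str,
    max_rounds: int = 3,
) -> tuple[str, bool, int]:
    """Let the implementer revise until the critic approves or rounds run out.

    Returns (final draft, approved flag, number of rounds used).
    """
    feedback: Optional[str] = None
    draft = ""
    for round_number in range(1, max_rounds + 1):
        draft = implement(plan, feedback)  # feedback is None on the first attempt
        feedback = review(draft)           # None means the critic approves
        if feedback is None:
            return draft, True, round_number
    return draft, False, max_rounds


def stub_implement(plan: str, feedback: Optional[str]) -> str:
    """Stand-in implementer: only improves its draft after receiving feedback."""
    return f"{plan} (revised)" if feedback else plan


def stub_review(draft: str) -> Optional[str]:
    """Stand-in critic: approves only revised drafts."""
    return None if draft.endswith("(revised)") else "Please revise."


if __name__ == "__main__":
    print(run_with_feedback(stub_implement, stub_review, "rhyming dialogue"))
```

Returning the approval flag alongside the draft keeps the "if such a response is possible" caveat visible: the caller can tell a satisfied critic apart from an exhausted round budget.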

# Optional Extensions

## Adding Tools (Optional Challenge)

Consider adding a simple tool function that agents can use, such as a calculator or text analyzer.

In [None]:
# Example tool function (uncomment and modify as needed)
# def calculator(expression: str) -> float:
#     """Simple calculator tool for basic mathematical operations"""
#     try:
#         # WARNING: eval() should not be used in production - use a proper math parser
#         result = eval(expression)
#         return result
#     except Exception as e:
#         return f"Error: {str(e)}"

# To integrate tools with AutoGen agents, you would typically:
# 1. Define tool schemas following MCP-like patterns
# 2. Register tools with specific agents
# 3. Update system prompts to mention available tools
# 4. Handle tool calls in the conversation flow
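
As the warning in the commented example notes, `eval()` is unsafe for untrusted input. One possible alternative, sketched here under the assumption that only basic arithmetic is needed, walks the expression's syntax tree with the standard-library `ast` module and rejects everything except numbers and the four basic operators. The helper name `_evaluate` is hypothetical, and registering the function as an AutoGen tool is not shown.

```python
import ast
import operator

# Whitelisted operators; anything else in the expression is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST) -> float:
    """Recursively evaluate a whitelisted arithmetic syntax tree."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def calculator(expression: str) -> float:
    """Evaluate +, -, *, / and parentheses without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return float(_evaluate(tree.body))


if __name__ == "__main__":
    print(calculator("2 * (3 + 4)"))
```

Function calls, names, and attribute access all fall through to the final `raise`, so an input such as `__import__('os')` is refused rather than executed. Malformed input still raises `SyntaxError`, and division by zero raises `ZeroDivisionError`, so a caller would need to decide how to report those to the agent.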

# Summary and Next Steps

## Key Takeaways

- **Task Decomposition**: [Your insights about breaking down complex problems]
- **Agent Design**: [What you learned about defining agent roles and responsibilities]  
- **Communication Protocols**: [Understanding of MCP and A2A interactions]

## References

- [AutoGen Documentation](https://microsoft.github.io/autogen/stable/index.html)
- [Model Context Protocol](https://modelcontextprotocol.io/docs/getting-started/intro)
- [Agent-to-Agent Protocol](https://a2a-protocol.org/latest/)
- L2_overview.md (Lab requirements and detailed explanations)