# CSC 480-F25 Lab 3: Agentic Heuristic Search (NYT Spelling Bee)

# Authors:

***Arnav Bhola, Pranav Krishna***

California Polytechnic State University, San Luis Obispo;

Computer Science & Software Engineering Department

# Overview

This lab focuses on:
- Integrating a provided generalized search engine as a tool inside an agentic workflow
- Designing an agentic heuristic system that collaborates to estimate $h(n)$ for NYT Spelling Bee states
- Implementing a custom cost function $g(n)$ and evaluating search strategies (Uniform Cost, A*)
- Coordinating agent communication via MCP-style tool exposure and A2A interactions
- Reflecting on how agentic heuristics complement classical search methods

NOTE: The Spelling Bee problem definition and generalized search function are provided for you. Your primary work is to wire them into your agentic solution and iterate on the heuristic design (see part 2 of this notebook).

## Learning Objectives

By the end of this lab, you will be able to:

- Integrate a provided generalized search function as a tool within an AutoGen-based agentic system
- Design and implement an agent team that produces numeric heuristic estimates to guide search
- Define and justify a cost function that complements your heuristic in the Spelling Bee domain
- Specify MCP-style tool schemas and A2A message flows for heuristic collaboration
- Analyze how different heuristic strategies impact search quality, cost, and convergence

# Part 1: Agentic Heuristic Design and Planning

## 1. Problem Statement & Search Context

**Provided Problem:** NYT Spelling Bee puzzle instance (letters, required center letter, dictionary utilities)

**Goal:** Integrate the provided Spelling Bee problem specification with the generalized search engine and your agentic heuristic.

**Task Breakdown:** Outline the high-level steps you will take to reach a working solution.
1. Configure SpellingBeeProblem with letters and required center letter
2. Expose generalized_search by implementing cost_fn and heuristic_fn wrappers
3. Design 2-3 specialized agents that analyze different aspects of state quality
4. Use RoundRobinGroupChat to coordinate agents and aggregate scores
5. Test on sample puzzles and iterate based on expansion counts

## 2. Agentic Heuristic Team Definition

Define the agents who collaborate to estimate $h(n)$ for a given Spelling Bee state. Feel free to use more, or fewer.

### Agent 1: FeasibilityAnalyst
- **Role:** Constraint checker and dictionary validator
- **Responsibilities:** 
  - Check if state contains only allowed letters
  - Verify required letter is present
  - Assess if current prefix exists in valid word prefixes
- **Inputs:** Current state (partial word), allowed letters, required letter
- **Outputs:** Score 0-10 reflecting constraint satisfaction, rationale
- **Success Criteria:** Correctly identifies invalid prefixes that can't lead to solutions

### Agent 2: CompletenessEstimator  
- **Role:** Distance-to-goal estimator
- **Responsibilities:**
  - Estimate how many more letters are needed to form a valid word
  - Consider minimum word length (4 letters)
  - Assess likelihood of common word patterns
- **Inputs:** Current state length, min word length, letter usage patterns
- **Outputs:** Score 0-10 (higher = closer to completion), rationale
- **Success Criteria:** Provides lower scores for short states, higher for near-complete words

### Agent 3: HeuristicAggregator
- **Role:** Score coordinator and combiner
- **Responsibilities:**
  - Collect scores from FeasibilityAnalyst and CompletenessEstimator
  - Weight and combine scores into a single $h(n)$
  - Ensure admissibility (never overestimate distance to goal)
- **Inputs:** Sub-agent scores and rationales
- **Outputs:** Single numeric FINAL_SCORE
- **Success Criteria:** Returns consistent, numeric heuristic values
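
One way the aggregator's combination step could work is sketched below. This is a minimal illustration, not the logic the LLM-based HeuristicAggregator actually applies: the function name `combine_scores`, the weight `w_feasibility`, and the scaling to letter units are all assumptions. It converts the two 0-10 "closeness" scores into a remaining-distance estimate, then caps it at the number of letters still needed to reach the minimum word length. Assuming every step costs at least 1, that cap is a lower bound on the true remaining cost, so the result stays admissible.

```python
def combine_scores(
    feasibility: float,
    completeness: float,
    state_len: int,
    min_word_len: int = 4,
    w_feasibility: float = 0.5,  # hypothetical weight, not from the lab
) -> float:
    """Turn two 0-10 closeness scores into a capped distance estimate."""
    # Clamp both scores into the documented 0-10 range.
    f = min(max(feasibility, 0.0), 10.0)
    c = min(max(completeness, 0.0), 10.0)

    # Weighted closeness on a 0-10 scale (higher = closer to a goal).
    closeness = w_feasibility * f + (1.0 - w_feasibility) * c

    # Invert and rescale: 10 -> 0 letters away, 0 -> min_word_len letters away.
    estimate = (10.0 - closeness) / 10.0 * min_word_len

    # Lower bound on remaining steps: letters still missing to reach min length.
    letters_needed = float(max(0, min_word_len - state_len))

    # Never exceed the lower bound, so the estimate cannot overestimate.
    return min(estimate, letters_needed)


print(combine_scores(8.0, 6.0, state_len=3))    # capped at 1 remaining letter
print(combine_scores(10.0, 10.0, state_len=3))  # perfect scores give 0.0
```

Note that this cap returns 0 for every state that already has four or more letters, so a sketch like this only differentiates short prefixes.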

## 3. Tool Integration & Coordination Pattern

**Chosen Pattern:** Manager-Worker with Sequential Coordination

**Justification:**
- Manager (HeuristicAggregator) coordinates two specialist workers
- Sequential because feasibility must be checked before estimating completeness
- Simpler than full collaborative team, reduces message overhead
- Clear responsibility boundaries minimize conflicts

**Integration Plan:** Outline how you will connect the provided components.
- generalized_search calls heuristic_fn(state) whenever it needs h(n)
- heuristic_fn creates RoundRobinGroupChat with 3 agents
- Agents discuss state in sequence, aggregator produces FINAL_SCORE
- cost_fn returns len(next_state) as the step cost, so adding the k-th letter costs k and deeper states cost more
- Search maintains frontier/explored set; we just provide callbacks
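
The callback relationship in this plan can be sketched with a small best-first search. This is not the provided `generalized_search`; `best_first_search`, the toy word list, and the synchronous heuristic are assumptions made for illustration (the lab's `heuristic_fn` is async and agent-backed). The sketch shows where `cost_fn` and `heuristic_fn` are called and how the two strategies differ only in the priority: $f(n) = g(n)$ for Uniform Cost and $f(n) = g(n) + h(n)$ for A*.

```python
import heapq
from itertools import count
from typing import Callable, Iterable, Optional

WORDS = {"ALAN", "ALIGN"}  # toy dictionary, for illustration only
LETTERS = "AGILN"


def successors(state: str) -> Iterable[tuple[str, str]]:
    """Yield (action, next_state) pairs that are still prefixes of some word."""
    for ch in LETTERS:
        nxt = state + ch
        if any(word.startswith(nxt) for word in WORDS):
            yield ch, nxt


def best_first_search(
    cost_fn: Callable[[str, str, str], float],
    heuristic_fn: Callable[[str], float],
    strategy: str = "a_star",
) -> Optional[tuple[str, float, int]]:
    """Return (goal_state, path_cost, expansions), or None if no goal exists."""
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(0.0, next(tie), 0.0, "")]
    best_g = {"": 0.0}
    expansions = 0
    while frontier:
        _, _, g, state = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue  # stale entry: a cheaper path to this state was found
        if state in WORDS:
            return state, g, expansions
        expansions += 1
        for action, nxt in successors(state):
            g_next = g + cost_fn(state, action, nxt)
            if g_next < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_next
                # Uniform Cost ignores h(n); A* adds it to the priority.
                h = heuristic_fn(nxt) if strategy == "a_star" else 0.0
                heapq.heappush(frontier, (g_next + h, next(tie), g_next, nxt))
    return None


def step_cost(parent: str, action: str, nxt: str) -> float:
    return float(len(nxt))


def letters_missing(state: str) -> float:
    return float(max(0, 4 - len(state)))


ucs = best_first_search(step_cost, letters_missing, strategy="uniform_cost")
astar = best_first_search(step_cost, letters_missing, strategy="a_star")
print(ucs, astar)
```

Both strategies reach the same goal here because the toy heuristic is admissible. What a better heuristic changes is the number of expansions, which is the quantity to compare when iterating on the agent team.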

## 4. Communication Design

### Model Context Protocol (MCP)
**Tool Schema:** generalized_search exposed as async function
- Parameters: problem (SpellingBeeProblem), cost_fn (callable), heuristic_fn (async callable), strategy (str)
- Returns: SearchResult with success, goal_state, actions, cost, expansions
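
A tool description for this schema might look like the sketch below. The top-level keys `name`, `description`, and `inputSchema` follow the MCP tool definition format, but every property inside is an assumption: callables such as `cost_fn` and `heuristic_fn` cannot be serialized as JSON, so this sketch assumes they are bound on the server side and that only the puzzle letters and strategy cross the tool boundary. `RESULT_FIELDS` is likewise a hypothetical summary of the `SearchResult` fields listed above.

```python
import json

# Hypothetical MCP-style description of the search tool.
GENERALIZED_SEARCH_TOOL = {
    "name": "generalized_search",
    "description": "Run heuristic search over a Spelling Bee puzzle.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "letters": {
                "type": "array",
                "items": {"type": "string", "minLength": 1, "maxLength": 1},
                "minItems": 7,
                "maxItems": 7,
            },
            "required_letter": {"type": "string", "minLength": 1, "maxLength": 1},
            "strategy": {"type": "string", "enum": ["a_star", "uniform_cost"]},
        },
        "required": ["letters", "required_letter"],
    },
}

# Hypothetical shape of the returned result, mirroring the fields named above.
RESULT_FIELDS = {
    "success": "boolean",
    "goal_state": "string",
    "actions": "array of strings",
    "cost": "number",
    "expansions": "integer",
}

print(json.dumps(GENERALIZED_SEARCH_TOOL, indent=2))
```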

### Agent-to-Agent (A2A) Interactions

Describe the kind of communications you expect from your system.

#### Interaction 1: HeuristicAggregator → FeasibilityAnalyst
- **Purpose:** Request constraint analysis for current state
- **Key Fields:** state string, allowed letters, required letter
- **Message Format:** "Evaluate feasibility for state: {state}"

#### Interaction 2: HeuristicAggregator → CompletenessEstimator  
- **Purpose:** Request distance-to-goal estimate
- **Key Fields:** state string, state length, min required length
- **Message Format:** "Estimate completeness for state: {state}"

#### Interaction 3: Sub-agents → HeuristicAggregator
- **Purpose:** Return partial scores with reasoning
- **Key Fields:** Numeric score, rationale text with "SCORE: X.X"
- **Message Format:** Text ending with "SCORE: <float>"

*(Add more interactions as needed.)*
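
Since sub-agents report their result as free text ending in `SCORE: <float>`, the receiving side needs a tolerant parser. The sketch below is one possible approach; the helper name `extract_score` is an assumption, not part of the lab utilities. It takes the last `SCORE:` occurrence, accepts integers as well as decimals, and deliberately does not match the aggregator's `FINAL_SCORE:` marker.

```python
import re
from typing import Optional

# \b prevents a match inside "FINAL_SCORE:" because "_" is a word character.
SCORE_PATTERN = re.compile(r"\bSCORE:\s*(-?\d+(?:\.\d+)?)")


def extract_score(message: str) -> Optional[float]:
    """Return the last 'SCORE: <number>' found in the message, or None."""
    matches = SCORE_PATTERN.findall(message)
    if not matches:
        return None
    return float(matches[-1])


reply = "The prefix uses only allowed letters.\nSCORE: 7.5"
print(extract_score(reply))
```

Returning `None` instead of a default number lets the caller decide how to treat a missing score, for example by retrying or by falling back to a neutral value.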

# Part 2: Integrating the Generalized Search Tool

## Environment Setup

Install required packages and configure model access before running the agentic heuristic experiments.

In [1]:
%pip install "autogen-core" "autogen-agentchat" "autogen-ext[openai,azure]" "python-dotenv"

Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
import asyncio
from dataclasses import asdict

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.base import TaskResult
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient

from utils import SpellingBeeProblem, SearchResult, generalized_search

In [3]:
from pathlib import Path
from dotenv import load_dotenv

In [4]:
cwd = Path.cwd()
env_path = cwd.parent / ".env"
loaded = load_dotenv(env_path)

In [5]:
# Just like in the other labs
azure_deployment = os.getenv("AZURE_DEPLOYMENT_NAME")
api_version = "2024-12-01-preview" 
azure_endpoint = os.getenv("AZURE_ENDPOINT")
# Expect AZURE_SUBSCRIPTION_KEY to be set in environment variables
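
Because the client is built from environment variables, it can help to check them before creating any agents. The following is a small sketch under the assumption that the three variable names used in this notebook are the only ones required; the helper `missing_env_vars` is hypothetical.

```python
import os
from typing import Mapping, Optional, Sequence

REQUIRED_VARS = (
    "AZURE_DEPLOYMENT_NAME",
    "AZURE_ENDPOINT",
    "AZURE_SUBSCRIPTION_KEY",
)


def missing_env_vars(
    names: Sequence[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```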

## Agentic Heuristic Architecture Blueprint

Based on the overview, sketch out how your heuristic agents and the generalized search tool interact:
- **Feasibility Agent**: Evaluates constraint satisfaction and dictionary viability.
- **Completeness Agent**: Estimates remaining effort to reach a valid Spelling Bee solution.
- **Score Aggregator Agent**: Combines the scores and analysis of other agents into a final score.
- *(Optional)* Additional agents for scoring letter diversity, pangram potential, etc.

NOTE: The system below is just an example. Feel free to make it your own as you designed in part 1.

In [6]:
def setup_agentic_heuristic_system():
    """Instantiate heuristic agents for Spelling Bee state evaluation."""
    client = AzureOpenAIChatCompletionClient(
        azure_deployment=azure_deployment,
        model="gpt-5-mini",
        api_version=api_version,
        azure_endpoint=azure_endpoint,
        api_key=os.getenv("AZURE_SUBSCRIPTION_KEY"),
    )

    feasibility = AssistantAgent(
        name="FeasibilityAnalyst",
        model_client=client,
        system_message="""You analyze whether a partial word can become a valid Spelling Bee answer.
        This word must contain the required letter and only have allowed letters.
        Score: low (bad) to high (excellent). Include a rationale and a numeric score (0-10) on a final line 'SCORE: <float>'.""",
    )

    completeness = AssistantAgent(
        name="CompletenessEstimator",
        model_client=client,
        system_message="""You estimate remaining effort to reach a full valid word. 
        Consider minimum word length (4 letters)
        Assess likelihood of common word patterns
        Score: low (far) to high (close). Include a rationale and a numeric score (0-10; higher = closer to completion) on a final line 'SCORE: <float>'.""",
    )

    aggregator = AssistantAgent(
        name="HeuristicAggregator",
        model_client=client,
        system_message="""
        You coordinate heuristic scoring for Spelling Bee nodes.
        You will:
        1. Consider analyses from collaborators.
        2. Combine their numeric scores and rationales.
        3. Ensure admissibility (never overestimate distance to goal)
        4. Return a single numeric heuristic estimate h(n).
        Always end responses with FINAL_SCORE: <float>.""",
    )

    return feasibility, completeness, aggregator

In [7]:
def cost_fn(parent_state, action, next_state) -> float:
    """
    Return the incremental cost g(n) for moving to next_state.
    Customize this to reflect letter usage, word length, or other criteria.

    Args:
        parent_state: The current sub-word.
        action: The next letter added.
        next_state: The sub-word after adding the action letter.
    Returns:
        A numeric cost value (float).
    """
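    # Base step cost is the length of the new state, so deeper steps cost more.
    # Since len(parent_state) == len(next_state) - 1, the test below is true
    # once next_state has five or more letters; those steps cost 20% extra,
    # which nudges the search toward shorter words.
    if len(parent_state) + len(next_state) > 7:
        return len(next_state) * 1.2
    return float(len(next_state))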
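
To see how this step cost accumulates along a path, the sketch below re-declares the same `cost_fn` logic so that it runs on its own and sums the cost of spelling a word one letter at a time. The helper `path_cost` is an assumption added for illustration. For `ALAN` the steps cost 1, 2, 3, and 4, which matches the `cost: 10.0` reported in the search output for that goal.

```python
def cost_fn(parent_state: str, action: str, next_state: str) -> float:
    """Same rule as the notebook's cost_fn, repeated for a standalone example."""
    if len(parent_state) + len(next_state) > 7:
        return len(next_state) * 1.2
    return float(len(next_state))


def path_cost(word: str) -> float:
    """Sum the step costs of building `word` letter by letter."""
    total = 0.0
    state = ""
    for letter in word:
        next_state = state + letter
        total += cost_fn(state, letter, next_state)
        state = next_state
    return total


print(path_cost("ALAN"))   # 1 + 2 + 3 + 4
print(path_cost("ALIGN"))  # the fifth letter is charged 5 * 1.2
```

The surcharge first applies at the fifth letter, so four-letter goals are noticeably cheaper than any longer word under this cost function.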

In [8]:
async def run_agentic_search(
    spelling_bee: SpellingBeeProblem, strategy: str = "a_star"
):
    """Execute the generalized search with your cost and heuristic functions."""
    feasibility, completeness, aggregator = setup_agentic_heuristic_system()

    async def heuristic_fn(state, min_state_len=3) -> float:
        """
        Delegate to the agentic heuristic system to compute h(n) for `state`. In our problem space,
        state is a partial word. You will need to implement the logic to send messages to your agents,
        gather their responses, and compute a final numeric heuristic value.

        Args:
            state: The current state (partial word) to evaluate.
        Returns:
            A numeric heuristic estimate (float).
        """
        if len(state) < min_state_len:
            # No heuristic value for very short states
            # There isn't enough information to evaluate
            return 0.0

        # See https://microsoft.github.io/autogen/stable//reference/python/autogen_agentchat.teams.html
        team = RoundRobinGroupChat(
            [feasibility, completeness, aggregator],
            termination_condition=TextMentionTermination("FINAL_SCORE:"),
        )

        # Run the team and keep the last message of the final TaskResult
        import re  # local import so this cell does not depend on other edits

        response = None
        messages_generator = team.run_stream(
            task=f"Evaluate heuristic for state: {state}", output_task_messages=False
        )
        async for message in messages_generator:
            if isinstance(message, TaskResult):
                response = message.messages[-1].content
                break

        # Guard against a stream that ended without a text response
        if not isinstance(response, str):
            print("No text response from aggregator. Defaulting to 0.0")
            return 0.0

        # Parse the numeric score from the aggregator's response.
        # The regex keeps all decimal digits and also accepts integers,
        # unlike slicing up to the first "." which truncated 6.5 to 6.0.
        print(response.splitlines())
        match = re.search(r"FINAL_SCORE:\s*(-?\d+(?:\.\d+)?)", response)
        if match is None:
            print("Failed to parse score from aggregator response. Defaulting to 0.0")
            return 0.0
        return float(match.group(1))

    print(f"Running {strategy} search on: {spelling_bee}")

    result: SearchResult = await generalized_search(
        problem=spelling_bee,
        cost_fn=cost_fn,
        heuristic_fn=heuristic_fn,
        strategy=strategy,
        max_expansions=43,  # May want to set this for debugging
        verbose=True,
    )

    result_summary = asdict(result)
    print("SearchResult summary:")
    for key, value in result_summary.items():
        print(f"  {key}: {value}")

    return result

## Example Puzzle 1: Starter Configuration

Test the integrated system on a small Spelling Bee instance provided with the lab.

In [None]:
# Provided helper will create a Spelling Bee problem instance, e.g. letters="ADELOPR", center="O"
starter_problem = SpellingBeeProblem.from_letters(
    letters=["A", "D", "E", "L", "O", "P", "R"],
    required_letter="O",
)

# Run once cost_fn and heuristic_fn are implemented
await run_agentic_search(starter_problem, strategy="a_star")

## Example Puzzle 2: Alternate Strategy Comparison

Run the same instance under Uniform Cost Search to compare behavior vs. A*.

In [None]:
# Compare strategies once heuristic_fn is operational
await run_agentic_search(starter_problem, strategy="uniform_cost")

Running uniform_cost search on: SpellingBeeProblem(letters='AIZGBNL', required='L', words=10)
SearchResult summary:
  success: True
  goal_state: ALAN
  actions: ['A', 'L', 'A', 'N']
  cost: 10.0
  expansions: 18
  explored: 18
  frontier_size: 8


SearchResult(success=True, goal_state='ALAN', actions=['A', 'L', 'A', 'N'], cost=10.0, expansions=18, explored=18, frontier_size=8)

## Your Experiment

Define your own Spelling Bee instance or heuristic variant and record results. This could be [today's puzzle](https://www.nytimes.com/puzzles/spelling-bee).

In [11]:
custom_problem = SpellingBeeProblem.from_letters(
    letters=["A", "I", "Z", "G", "B", "N", "L"],
    required_letter="L",
)

await run_agentic_search(custom_problem, strategy="a_star")

Running a_star search on: SpellingBeeProblem(letters='AIZGBNL', required='L', words=10)
['Combined assessment:', '', '- Both analysts agree the stem "ALA" is promising but short of the common 4-letter minimum and its viability depends on allowed/required-letter constraints.', '- CompletenessEstimator notes many common 4+ continuations (e.g., "alas", "alae"), so often only one additional letter is needed.', '- FeasibilityAnalyst highlights dependence on the required/center letter and allowed-letter set; if constraints exclude needed letters the branch may be infeasible.', '', 'Admissibility consideration (never overestimate remaining distance): to stay conservative I take the lower of the two collaborator numeric estimates so we do not overstate the remaining cost.', '', 'FINAL_SCORE: 6.0']
['Combined assessment:', '', '- Both analysts rate "ALI" as a promising 3-letter stem with strong morphological potential and many common 4+ continuations (alien, align, alike, alive, alias, alibi, e

SearchResult(success=True, goal_state='ALAN', actions=['A', 'L', 'A', 'N'], cost=10.0, expansions=18, explored=18, frontier_size=8)

## Reflection & Analysis

### Heuristic effectiveness
[Discuss where your agentic heuristic provided strong guidance.]

### Failure modes / surprises
[Document puzzles or states where the heuristic misled the search or produced high cost.]

### Cost vs. heuristic alignment
[Reflect on whether $g(n)$ and $h(n)$ share compatible units/scales.]

### Communication insights
[Explain how MCP/A2A design choices supported or hindered collaboration.]

### Future improvements
[Outline ideas for richer heuristics, additional agents, or tool integrations.]

## References

- `L3_overview.md`
- [AutoGen Documentation](https://microsoft.github.io/autogen/stable/index.html)
- [Model Context Protocol](https://modelcontextprotocol.io/docs/getting-started/intro)
- [Agent-to-Agent Protocol](https://a2a-protocol.org/latest/)
- [NYT Spelling Bee](https://www.nytimes.com/puzzles/spelling-bee)