Implement Observation-Decision Loop Test (End-to-End Pipeline) #28

@justinmadison

Overview

Implement a test that validates the observation-decision loop, the missing piece in our end-to-end pipeline testing. The test verifies that game observations can be sent to the Python backend, processed into decisions, and returned to Godot, all without executing actual agent movement.

Current Status

Already Tested ✅

  • Backend connectivity (test_autoload_services.gd)
  • Tool execution (test_tool_execution.gd)
  • HTTP communication (test_tool_execution_simple.tscn)

Missing ❌

  • Observation-based decision loop
  • Perception → Decision → Action cycle
  • Continuous tick loop with backend

Architecture Flow

┌─────────────────────────────┐
│   Simplified Test Scene     │
│   - Mock agent position     │
│   - Mock resources/hazards  │
└────────────┬────────────────┘
             │ Build observations
             ▼
┌─────────────────────────────┐
│   Observation Dictionary    │
│   - position: [x,y,z]      │
│   - nearby_resources: []   │
│   - nearby_hazards: []     │
└────────────┬────────────────┘
             │ HTTP POST /observe
             ▼
┌─────────────────────────────┐
│   Python Backend            │
│   - Mock decision logic     │
│   - Returns tool + params   │
└────────────┬────────────────┘
             │ JSON response
             ▼
┌─────────────────────────────┐
│   Test Scene                │
│   - Log decision            │
│   - Don't execute           │
│   - Continue loop           │
└─────────────────────────────┘
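As an illustration of the two payloads in this flow, the sketch below builds an observation dictionary and shows the response shape the backend is expected to return. It is not a final schema: the `type`, `position`, and `distance` keys on each entry are inferred from the mock decision logic in this issue, the `(type, position)` input pairs are an assumption made for the sketch, and the sample coordinates are made up.

```python
import json
import math


def build_observation(agent_pos, resources, hazards):
    """Build the observation dict from mock scene data.

    `resources` and `hazards` are lists of (type, position) pairs; that input
    shape is an assumption of this sketch, not something the issue specifies.
    """
    def describe(entries):
        return [
            {
                "type": kind,
                "position": list(pos),
                # Straight-line distance from the agent to the entry.
                "distance": math.dist(agent_pos, pos),
            }
            for kind, pos in entries
        ]

    return {
        "position": list(agent_pos),
        "nearby_resources": describe(resources),
        "nearby_hazards": describe(hazards),
    }


observation = build_observation(
    agent_pos=(0.0, 0.0, 0.0),
    resources=[("berry", (3.0, 0.0, 4.0)), ("wood", (6.0, 0.0, 8.0))],
    hazards=[("fire", (2.0, 0.0, 0.0))],
)

# Body of the POST /observe request.
request_body = json.dumps(observation)

# Shape of the JSON the backend is expected to send back.
example_response = {
    "tool": "move_away",
    "params": {"from_position": [2.0, 0.0, 0.0]},
    "reasoning": "Avoiding fire hazard",
}
```

Precomputing `distance` on the Godot side keeps the backend rules simple, since the mock logic only compares each entry's distance against a threshold.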

Implementation Tasks

Phase 1: Backend Endpoint (30 min)

File: python/ipc/server.py

  • Add /observe POST endpoint
  • Implement make_mock_decision() function with rule-based logic:
    • Priority 1: Avoid nearby hazards (distance < 3.0)
    • Priority 2: Move to nearest resource (distance < 5.0)
    • Default: Idle
  • Return decision with tool name, params, and reasoning
  • Add logging for debugging
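The endpoint work above could be sketched as a framework-independent handler. The issue does not say which web framework `python/ipc/server.py` uses, so this sketch deliberately avoids one: `handle_observe` and the logger name are hypothetical, and the decision function is passed in rather than imported.

```python
import json
import logging

logger = logging.getLogger("ipc.observe")  # hypothetical logger name


def handle_observe(raw_body, decide):
    """Turn a raw POST /observe body into (status_code, response_dict).

    `decide` is the decision function, for example make_mock_decision.
    """
    try:
        obs = json.loads(raw_body)
    except (TypeError, ValueError) as exc:
        # json.JSONDecodeError is a subclass of ValueError.
        logger.warning("Rejected malformed observation: %s", exc)
        return 400, {"error": "request body must be valid JSON"}

    if not isinstance(obs, dict):
        return 400, {"error": "observation must be a JSON object"}

    logger.debug("Observation received: %s", obs)
    decision = decide(obs)
    logger.info("Decision: %s (%s)", decision["tool"], decision["reasoning"])
    return 200, decision
```

Whatever framework the server uses, its route for `/observe` would only need to pass the request body to a function like this and serialize the returned dictionary with the returned status code.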

Phase 2: Test Scene (1 hour)

File: scripts/tests/test_observation_loop.gd

  • Create test script extending Node
  • Add mock foraging data (agent position, resources, hazards)
  • Implement build_observation() to create observation dict
  • Implement send_observation() using HTTPRequest
  • Process 10 ticks with 0.5s delay between each
  • Log observations sent and decisions received
  • Add keyboard controls (Q to quit)
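The test script itself will be GDScript built on HTTPRequest. As a stand-in for exercising the endpoint before the scene exists, the tick loop above can be sketched in Python. Everything here is illustrative: `post_observation` and `run_observation_loop` are hypothetical names, and the host and port in the default URL are assumptions, since the issue does not state where the IPC server listens.

```python
import json
import time
import urllib.request


def post_observation(observation, url="http://127.0.0.1:5000/observe", timeout=5.0):
    """POST one observation as JSON and return the decoded decision.

    The default URL is an assumption; use the address the IPC server listens on.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(observation).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_observation_loop(observation, send=post_observation, ticks=10, delay=0.5):
    """Send the same mock observation once per tick and return the decisions."""
    decisions = []
    for tick in range(ticks):
        decision = send(observation)
        print(f"--- Tick {tick} ---")
        print(f"  Tool: {decision['tool']}")
        print(f"  Reasoning: {decision['reasoning']}")
        decisions.append(decision)
        if tick < ticks - 1:
            time.sleep(delay)  # pause between ticks, not after the last one
    return decisions
```

Passing `send` as a parameter lets the loop run against a fake transport, so the tick logic can be checked without a running server.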

File: scenes/tests/test_observation_loop.tscn

  • Create simple scene with test script node
  • No 3D environment needed

Phase 3: Documentation (30 min)

File: scenes/tests/README.md

  • Add section for test_observation_loop.tscn
  • Document purpose, how to run, expected output
  • Add to test suite list

Mock Decision Logic

def make_mock_decision(obs: dict) -> dict:
    nearby_resources = obs.get("nearby_resources", [])
    nearby_hazards = obs.get("nearby_hazards", [])
    
    # Priority 1: Avoid hazards
    for hazard in nearby_hazards:
        if hazard["distance"] < 3.0:
            return {
                "tool": "move_away",
                "params": {"from_position": hazard["position"]},
                "reasoning": f"Avoiding {hazard['type']} hazard"
            }
    
    # Priority 2: Collect resources
    if nearby_resources:
        closest = min(nearby_resources, key=lambda r: r["distance"])
        if closest["distance"] < 5.0:
            return {
                "tool": "move_to",
                "params": {"target_position": closest["position"]},
                "reasoning": f"Moving to collect {closest['type']}"
            }
    
    # Default: idle
    return {
        "tool": "idle",
        "params": {},
        "reasoning": "No immediate actions needed"
    }
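To check that the priority order holds, a few hand-made observations can be run through the decision function. This is a sketch under assumptions: `check_priorities` is a hypothetical helper, the sample entries are invented, and the function under test is passed in as `decide` so the helper does not depend on how `server.py` is imported.

```python
def check_priorities(decide):
    """Run hand-made observations through `decide`.

    Returns a list of (scenario_name, expected_tool, actual_tool) mismatches;
    an empty list means every scenario picked the expected tool.
    """
    fire = {"type": "fire", "position": [2.0, 0.0, 0.0], "distance": 2.0}
    near_berry = {"type": "berry", "position": [3.0, 0.0, 0.0], "distance": 3.0}
    edge_berry = {"type": "berry", "position": [5.0, 0.0, 0.0], "distance": 5.0}

    scenarios = [
        ("hazard beats resource",
         {"nearby_resources": [near_berry], "nearby_hazards": [fire]}, "move_away"),
        ("resource in range",
         {"nearby_resources": [near_berry], "nearby_hazards": []}, "move_to"),
        # 5.0 is not < 5.0, so the boundary case falls through to idle.
        ("resource at threshold",
         {"nearby_resources": [edge_berry], "nearby_hazards": []}, "idle"),
        ("empty observation", {}, "idle"),
    ]

    mismatches = []
    for name, obs, expected in scenarios:
        actual = decide(obs)["tool"]
        if actual != expected:
            mismatches.append((name, expected, actual))
    return mismatches
```

Calling `check_priorities(make_mock_decision)` should return an empty list for the logic shown above. The threshold scenario is there because both comparisons are strict.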

Success Criteria

  • /observe endpoint responds to POST requests
  • Mock decision logic returns valid actions
  • Test scene runs for 10 ticks without errors
  • Each tick sends observation and receives decision
  • Decisions logged to console with clear formatting
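  • Decisions follow the mock rules for the given game state: a hazard within 3.0 triggers move_away, otherwise a resource within 5.0 triggers move_to, otherwise idle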
  • No crashes, memory leaks, or hangs
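One way to make "valid actions" checkable is a small validator for the response shape. This is a sketch, not an agreed contract: `decision_problems` and `KNOWN_TOOLS` are hypothetical names, and the tool set is limited to the three tools the mock logic in this issue returns.

```python
# Tools returned by the mock decision logic in this issue.
KNOWN_TOOLS = {"move_away", "move_to", "idle"}


def decision_problems(decision):
    """Return a list of human-readable problems; empty means the decision is valid."""
    if not isinstance(decision, dict):
        return ["decision is not a JSON object"]

    problems = []
    tool = decision.get("tool")
    if not isinstance(tool, str) or tool not in KNOWN_TOOLS:
        problems.append(f"unknown tool: {tool!r}")
    if not isinstance(decision.get("params"), dict):
        problems.append("params must be an object")
    reasoning = decision.get("reasoning")
    if not isinstance(reasoning, str) or not reasoning:
        problems.append("reasoning must be a non-empty string")
    return problems
```

Returning a list of problems instead of raising on the first one keeps the log output useful when a response is wrong in more than one way.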

Testing Steps

  1. Start Python IPC server: START_IPC_SERVER.bat
  2. Open scenes/tests/test_observation_loop.tscn in Godot
  3. Press F6 to run
  4. Watch console for 10 ticks of observations and decisions
  5. Verify that each decision follows the mock decision logic for the observation sent (for example, move_away when a hazard is within 3.0)
  6. Press Q to quit

Expected Console Output

=== Observation-Decision Loop Test ===
Waiting for backend connection...
✓ Connected to backend!

=== Starting Observation Loop ===
Running 10 ticks...

--- Tick 0 ---
Observation:
  Position: (0, 0, 0)
  Nearby resources: 2
  Nearby hazards: 1
✓ Decision received:
  Tool: move_away
  Reasoning: Avoiding fire hazard

--- Tick 1 ---
...

=== Test Complete ===
All 10 ticks processed successfully!
Press Q to quit

Out of Scope (for this issue)

  • ❌ Actual agent movement execution
  • ❌ Real LLM integration (using mock logic)
  • ❌ Integration with foraging.gd (separate task)
  • ❌ Multiple agents
  • ❌ Physics/collision

Follow-Up Tasks

After this test passes:

  1. Integrate observation loop into foraging scene
  2. Replace mock decisions with real LLM backend
  3. Implement actual movement execution
  4. Add multi-agent support

Estimated Time

  • Phase 1: 30 minutes
  • Phase 2: 1 hour
  • Phase 3: 30 minutes
  • Total: ~2 hours

Related Files

  • scripts/tests/test_tool_execution.gd - Reference for test structure
  • scripts/foraging.gd - Will eventually integrate this pattern
  • python/ipc/server.py - Backend server to modify
  • python/tools/movement.py - Example tool implementations

Status: Done
