Implement Observation-Decision Loop Test (End-to-End Pipeline)

## Overview

Implement a test that validates the observation-decision loop, the one piece still missing from our end-to-end pipeline testing. The test verifies that game observations can be sent to the Python backend, turned into decisions, and returned to Godot, all without executing any agent movement.

## Current Status

### Already Tested ✅
- Backend connectivity (test_autoload_services.gd)
- Tool execution (test_tool_execution.gd)
- HTTP communication (test_tool_execution_simple.tscn)

### Missing ❌
- Observation-based decision loop
- Perception → Decision → Action cycle
- Continuous tick loop with backend

## Architecture Flow

```
┌─────────────────────────────┐
│   Simplified Test Scene     │
│   - Mock agent position     │
│   - Mock resources/hazards  │
└────────────┬────────────────┘
             │ Build observations
             ▼
┌─────────────────────────────┐
│   Observation Dictionary    │
│   - position: [x,y,z]      │
│   - nearby_resources: []   │
│   - nearby_hazards: []     │
└────────────┬────────────────┘
             │ HTTP POST /observe
             ▼
┌─────────────────────────────┐
│   Python Backend            │
│   - Mock decision logic     │
│   - Returns tool + params   │
└────────────┬────────────────┘
             │ JSON response
             ▼
┌─────────────────────────────┐
│   Test Scene                │
│   - Log decision            │
│   - Don't execute           │
│   - Continue loop           │
└─────────────────────────────┘
```
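
To make the data contract in the diagram concrete, here is a minimal sketch of one request and one response. The top-level keys come from the diagram, and the per-item keys (`type`, `position`, `distance`) are the ones the mock decision logic in this issue reads. The concrete values, including the `berry` resource type, are illustrative assumptions.

```python
import json

# Hypothetical observation payload. Top-level keys follow the diagram;
# the values and the "berry" type are made up for illustration.
observation = {
    "position": [0.0, 0.0, 0.0],
    "nearby_resources": [
        {"type": "berry", "position": [4.0, 0.0, 0.0], "distance": 4.0},
    ],
    "nearby_hazards": [
        {"type": "fire", "position": [2.0, 0.0, 0.0], "distance": 2.0},
    ],
}

# Request body that Godot would POST to /observe.
request_body = json.dumps(observation)

# Shape of the JSON the backend is expected to send back.
decision = {
    "tool": "move_away",
    "params": {"from_position": [2.0, 0.0, 0.0]},
    "reasoning": "Avoiding fire hazard",
}

# Round-trip both structures through JSON to confirm they serialize cleanly.
assert json.loads(request_body) == observation
assert json.loads(json.dumps(decision)) == decision
```

Keeping both sides as plain lists, dicts, strings, and floats means Godot's JSON parser and Python's `json` module can exchange them without custom encoding.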

## Implementation Tasks

### Phase 1: Backend Endpoint (30 min)

**File:** `python/ipc/server.py`

- [ ] Add `/observe` POST endpoint
- [ ] Implement `make_mock_decision()` function with rule-based logic:
  - Priority 1: Avoid nearby hazards (distance < 3.0)
  - Priority 2: Move to nearest resource (distance < 5.0)
  - Default: Idle
- [ ] Return decision with tool name, params, and reasoning
- [ ] Add logging for debugging
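
Since this issue does not say which web framework `server.py` uses, the sketch below keeps the endpoint logic framework-agnostic: a plain function that takes the raw POST body and returns a status code plus JSON text. The function name `handle_observe`, the logger name, and the choice of 400 for malformed input are assumptions, not existing project code.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("ipc.observe")  # logger name is an assumption


def handle_observe(body: bytes, decide: Callable[[dict], dict]) -> tuple[int, str]:
    """Turn a raw POST body into (HTTP status, JSON response text)."""
    try:
        obs = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        logger.warning("rejected /observe body: not valid JSON")
        return 400, json.dumps({"error": "body is not valid JSON"})

    if not isinstance(obs, dict):
        logger.warning("rejected /observe body: not a JSON object")
        return 400, json.dumps({"error": "observation must be a JSON object"})

    # Delegate to the decision function (e.g. make_mock_decision).
    decision = decide(obs)
    logger.info("observation at %s -> %s", obs.get("position"), decision.get("tool"))
    return 200, json.dumps(decision)
```

The route registered in `server.py` would then only need to pass the request body and `make_mock_decision` to this function and write the result back. Keeping the decision function as a parameter should also make it easier to swap in the real LLM backend later.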

### Phase 2: Test Scene (1 hour)

**File:** `scripts/tests/test_observation_loop.gd`

- [ ] Create test script extending Node
- [ ] Add mock foraging data (agent position, resources, hazards)
- [ ] Implement `build_observation()` to create observation dict
- [ ] Implement `send_observation()` using HTTPRequest
- [ ] Process 10 ticks with 0.5s delay between each
- [ ] Log observations sent and decisions received
- [ ] Add keyboard controls (Q to quit)

**File:** `scenes/tests/test_observation_loop.tscn`

- [ ] Create simple scene with test script node
- [ ] No 3D environment needed
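
The test script itself would be GDScript using `HTTPRequest`, but the loop's shape can be sketched in Python, which also gives a quick way to smoke-test `/observe` before the scene exists. This is a stand-in under assumptions: the URL and port, the drifting-hazard mock data, and the `run_loop` helper are all hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed address; the issue does not state the server's host or port.
BACKEND_URL = "http://127.0.0.1:5000/observe"


def build_observation(tick: int) -> dict:
    """Build a mock observation; the hazard drifts away as ticks advance."""
    hazard_distance = 2.0 + tick * 0.5
    return {
        "position": [0.0, 0.0, 0.0],
        "nearby_resources": [
            {"type": "berry", "position": [4.0, 0.0, 0.0], "distance": 4.0},
        ],
        "nearby_hazards": [
            {"type": "fire", "position": [hazard_distance, 0.0, 0.0],
             "distance": hazard_distance},
        ],
    }


def send_observation(obs: dict, url: str = BACKEND_URL) -> dict:
    """POST the observation as JSON and return the decoded decision."""
    request = urllib.request.Request(
        url,
        data=json.dumps(obs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())


def run_loop(send: Callable[[dict], dict] = send_observation,
             ticks: int = 10, delay: float = 0.5) -> list[dict]:
    """Send one observation per tick and log (not execute) each decision."""
    decisions = []
    for tick in range(ticks):
        decision = send(build_observation(tick))
        print(f"--- Tick {tick} ---")
        print(f"  Tool: {decision['tool']}")
        print(f"  Reasoning: {decision['reasoning']}")
        decisions.append(decision)
        time.sleep(delay)
    return decisions
```

With the server running, `run_loop()` exercises the real endpoint. Passing a stub as `send` runs the loop without any network access. Because the mock hazard moves 0.5 further away each tick, the decisions should change over the run rather than repeat the same answer ten times.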

### Phase 3: Documentation (30 min)

**File:** `scenes/tests/README.md`

- [ ] Add section for test_observation_loop.tscn
- [ ] Document purpose, how to run, expected output
- [ ] Add to test suite list

## Mock Decision Logic

```python
def make_mock_decision(obs: dict) -> dict:
    """Pick a tool using fixed priorities: avoid hazards, seek resources, else idle."""
    nearby_resources = obs.get("nearby_resources", [])
    nearby_hazards = obs.get("nearby_hazards", [])

    # Priority 1: move away from the first listed hazard closer than 3.0
    for hazard in nearby_hazards:
        if hazard["distance"] < 3.0:
            return {
                "tool": "move_away",
                "params": {"from_position": hazard["position"]},
                "reasoning": f"Avoiding {hazard['type']} hazard"
            }

    # Priority 2: move to the closest resource, if it is closer than 5.0
    if nearby_resources:
        closest = min(nearby_resources, key=lambda r: r["distance"])
        if closest["distance"] < 5.0:
            return {
                "tool": "move_to",
                "params": {"target_position": closest["position"]},
                "reasoning": f"Moving to collect {closest['type']}"
            }

    # Default: nothing in range, so stay idle
    return {
        "tool": "idle",
        "params": {},
        "reasoning": "No immediate actions needed"
    }
```

## Success Criteria

- [ ] `/observe` endpoint responds to POST requests
- [ ] Mock decision logic returns valid actions
- [ ] Test scene runs for 10 ticks without errors
- [ ] Each tick sends observation and receives decision
- [ ] Decisions logged to console with clear formatting
- [ ] Decisions match the mock game state (hazard within 3.0 → `move_away`; otherwise resource within 5.0 → `move_to`; otherwise `idle`)
- [ ] No crashes, memory leaks, or hangs
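
One way to make "valid actions" checkable is a small validator run against every decision the backend returns. This is a sketch, not existing project code: `validate_decision` is a hypothetical helper, and the set of allowed tools is taken only from the mock decision logic in this issue.

```python
# Tool names used by the mock decision logic in this issue.
KNOWN_TOOLS = {"move_away", "move_to", "idle"}


def validate_decision(decision: object) -> list[str]:
    """Return a list of problems; an empty list means the decision is valid."""
    if not isinstance(decision, dict):
        return ["decision is not a dict"]

    problems = []
    tool = decision.get("tool")
    if not isinstance(tool, str) or tool not in KNOWN_TOOLS:
        problems.append(f"unknown tool: {tool!r}")
    if not isinstance(decision.get("params"), dict):
        problems.append("params must be a dict")
    reasoning = decision.get("reasoning")
    if not isinstance(reasoning, str) or not reasoning:
        problems.append("reasoning must be a non-empty string")
    return problems
```

Returning every problem at once, rather than raising on the first, tends to make a failed tick easier to diagnose from the console log.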

## Testing Steps

1. Start Python IPC server: `START_IPC_SERVER.bat`
2. Open `scenes/tests/test_observation_loop.tscn` in Godot
3. Press F6 to run
4. Watch console for 10 ticks of observations and decisions
5. Verify decisions match expected behavior
6. Press Q to quit

## Expected Console Output

```
=== Observation-Decision Loop Test ===
Waiting for backend connection...
✓ Connected to backend!

=== Starting Observation Loop ===
Running 10 ticks...

--- Tick 0 ---
Observation:
  Position: (0, 0, 0)
  Nearby resources: 2
  Nearby hazards: 1
✓ Decision received:
  Tool: move_away
  Reasoning: Avoiding fire hazard

--- Tick 1 ---
...

=== Test Complete ===
All 10 ticks processed successfully!
Press Q to quit
```

## Out of Scope (for this issue)

- ❌ Actual agent movement execution
- ❌ Real LLM integration (using mock logic)
- ❌ Integration with foraging.gd (separate task)
- ❌ Multiple agents
- ❌ Physics/collision

## Follow-Up Tasks

After this test passes:
1. Integrate observation loop into foraging scene
2. Replace mock decisions with real LLM backend
3. Implement actual movement execution
4. Add multi-agent support

## Estimated Time

- Phase 1: 30 minutes
- Phase 2: 1 hour
- Phase 3: 30 minutes
- **Total: ~2 hours**

## Related Files

- `scripts/tests/test_tool_execution.gd` - Reference for test structure
- `scripts/foraging.gd` - Will eventually integrate this pattern
- `python/ipc/server.py` - Backend server to modify
- `python/tools/movement.py` - Example tool implementations
