
fix(pain): propagate agent_id through pain reaction reward chain #223

Merged: dennys246 merged 1 commit into main from bug/pain-reward-chain-investigation on May 4, 2026


@dennys246
Owner

Summary

Fixes the silent failure mode that left _reward_bias empty across every cradle / damage / pain-bearing sim despite pain firing constantly. Root cause: pain reactions emerged with ReactionContext.agent_id=None, the reward-distributor subscriber early-returned on every one, and distribute() was never called.

Investigation

Added a structured trace gated by MAXIM_PAIN_CHAIN_TRACE=1 that emits one event per chain transition:

pain_chain.bus_publish       (PainBus.publish entry)
pain_chain.reaction_emitted  (after compat conversion)
pain_chain.reward_subscriber (subscriber receives reaction)
pain_chain.distribute_returned (after distributor.distribute call)
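Here is a minimal sketch of an env-gated, per-transition trace. The helper name mirrors pain_chain_trace() in proprioception/pain_bus.py, but its signature, the logger name, and the one-JSON-object-per-line output are assumptions, not the shipped implementation.

```python
from __future__ import annotations

import json
import logging
import os
from typing import Any

logger = logging.getLogger("pain_chain")  # hypothetical logger name

TRACE_ENV_VAR = "MAXIM_PAIN_CHAIN_TRACE"


def trace_enabled() -> bool:
    # Read the gate on every call so tests can toggle it at runtime.
    return os.environ.get(TRACE_ENV_VAR) == "1"


def pain_chain_trace(stage: str, **fields: Any) -> dict[str, Any] | None:
    """Emit one structured event for a chain transition; no-op when gated off.

    Hypothetical stand-in: the real helper's signature may differ.
    """
    if not trace_enabled():
        return None
    event = {"event": f"pain_chain.{stage}", **fields}
    logger.info(json.dumps(event, sort_keys=True, default=str))
    return event
```

A publisher would then call something like `pain_chain_trace("bus_publish", ctx_has_agent_id=False)`. When the variable is unset, each call costs a single dict lookup.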

First cradle run with trace (no fix):

  Event                 Count  Note
  bus_publish           11     pain emits (works)
  reaction_emitted      11     compat converts
  reward_subscriber     19     receives, but will_distribute: false on every one
  distribute_returned    0     chain breaks here

bus_publish events showed ctx_has_agent_id: false for every pain. The body publishers weren't propagating agent_id, and even if they had, pain_signal_to_reaction wouldn't have extracted it into ReactionContext.
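Reading the table comes down to finding the first stage whose count is zero. The sketch below automates that check. It assumes, hypothetically, that trace events are logged one JSON object per line with an "event" key. count_stages and first_broken_stage are illustrative names and are not part of the codebase.

```python
from __future__ import annotations

import json
from collections import Counter
from collections.abc import Iterable

# Stage order as listed in this PR.
STAGES = (
    "pain_chain.bus_publish",
    "pain_chain.reaction_emitted",
    "pain_chain.reward_subscriber",
    "pain_chain.distribute_returned",
)


def count_stages(lines: Iterable[str]) -> Counter[str]:
    """Tally trace events, assuming one JSON object per log line."""
    counts: Counter[str] = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore ordinary, non-trace log lines
        event = record.get("event") if isinstance(record, dict) else None
        if event in STAGES:
            counts[event] += 1
    return counts


def first_broken_stage(counts: Counter[str]) -> str | None:
    """Return the first stage that never fired, i.e. where the chain breaks."""
    for stage in STAGES:
        if counts[stage] == 0:
            return stage
    return None
```

Applied to the first cradle run, this would point at pain_chain.distribute_returned. After the fix it would return None.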

Fix (three parts)

  1. Embodiment.__init__ takes agent_id: str = "" and stores it. Both _publish_pain and _publish_drive_pain include it in PainSignal.context. runtime/bootstrap.py passes agent_id=agent_id when constructing the embodiment.
  2. simulation/tools.py::DamageComponentTool (the third pain publisher — fires from orchestrator-side damage calls including reflex_fire) reads agent_id from self._embodiment and adds it to its inline-constructed PainSignal.context.
  3. reactions/compat.py::pain_signal_to_reaction extracts signal.context["agent_id"] and passes it to ReactionContext(agent_id=...). Empty string normalises to None so the subscriber's documented agent_id is None early-return fires correctly for foundry / scene-entity embodiments that aren't learning subjects.
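All three parts enforce one contract. The publisher writes agent_id into PainSignal.context, and the compat layer lifts it into ReactionContext, normalising "" to None. The sketch below uses pared-down stand-in dataclasses. make_pain and the reward = -intensity mapping are hypothetical simplifications and do not reproduce the real _publish_pain or compat logic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Minimal stand-ins; the real classes carry many more fields.
@dataclass
class PainSignal:
    intensity: float
    context: dict[str, Any] = field(default_factory=dict)


@dataclass
class ReactionContext:
    agent_id: str | None = None


@dataclass
class Reaction:
    reward: float
    context: ReactionContext


class Embodiment:
    def __init__(self, agent_id: str = "") -> None:
        self.agent_id = agent_id

    def make_pain(self, intensity: float, **ctx: Any) -> PainSignal:
        # Part 1 (hypothetical helper): every body-published pain carries agent_id.
        return PainSignal(intensity, {**ctx, "agent_id": self.agent_id})


def pain_signal_to_reaction(signal: PainSignal) -> Reaction:
    # Part 3: a missing key and an empty string both become None, so
    # non-learning embodiments still hit the subscriber's early-return.
    agent_id = signal.context.get("agent_id") or None
    # Sign convention (pain -> negative reward) is an assumption here.
    return Reaction(reward=-signal.intensity, context=ReactionContext(agent_id=agent_id))
```

Normalising at the compat boundary means the subscriber only needs to check `agent_id is None`, rather than also guarding against "".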

Verification — third cradle run with all fixes:

  Event                 Count  Note
  bus_publish           17
  reaction_emitted      17
  reward_subscriber     21     will_distribute: true, agent_id="sim_aut"
  distribute_returned   12     11 of 12 credited 1+ nodes

The chain is fully wired. goal_reward_bias accumulates (cradle: -0.16), confirming that the distributor reaches its terminal stage.

On _reward_bias still being empty

Verified: per the existing CLAUDE.md invariant, "_reward_bias clamps to [0, max_reward_bias] — Negative rewards (pain) produce 0.0 bias. Pain avoidance is handled by valence annotation on edges, not by reward bias." Negative pain rewards clamp to 0, and the per-tick decay then prunes the 0.0 entries. This is intended: pain-only sims produce no persistent _reward_bias state. Positive Reactions on the bus will now land correctly, and reviving that dormant part of the chain is what this PR does.
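The following model of the clamp-then-prune behaviour shows why a pain-only run leaves _reward_bias empty. The decay factor, the pruning floor, and the dict-of-floats layout are assumptions made for illustration. apply_reward and decay_tick are hypothetical names.

```python
from __future__ import annotations


def apply_reward(bias: dict[str, float], node: str, reward: float,
                 max_reward_bias: float = 1.0) -> None:
    """Add a reward, then clamp to [0, max_reward_bias] (the documented invariant)."""
    updated = bias.get(node, 0.0) + reward
    bias[node] = min(max(updated, 0.0), max_reward_bias)


def decay_tick(bias: dict[str, float], decay: float = 0.9,
               floor: float = 1e-6) -> None:
    """Per-tick decay; entries at or below `floor` (including 0.0) are pruned."""
    for node in list(bias):  # copy keys so we can delete while iterating
        bias[node] *= decay
        if bias[node] <= floor:
            del bias[node]
```

A negative (pain) reward clamps to 0.0 and disappears on the next tick. A positive reward survives and decays gradually.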

Trace infrastructure

The MAXIM_PAIN_CHAIN_TRACE=1 env gate + pain_chain_trace() helper in proprioception/pain_bus.py ship in this commit. Future debugging of the chain doesn't need to re-instrument it.

Test plan

  • Full fast suite: 6357 passed (was 6336; +3 new tests)
  • ruff check + ruff format on every touched file
  • New tests pin the agent_id propagation contract:
    • test_publish_propagates_agent_id_into_reaction_context — signal context → ReactionContext
    • test_publish_with_empty_agent_id_normalises_to_none — empty string normalisation
    • test_embodiment_includes_agent_id_in_pain_context — Embodiment writes agent_id into pain
  • End-to-end cradle sim with trace events shows the chain transitioning cleanly through all four stages

🤖 Generated with Claude Code

Pain reactions emerge with `ReactionContext.agent_id=None`, so the
`_distribute_reward_from_reaction` subscriber in `runtime/bio_stack.py`
early-returns on every pain. Result across 4 prior cradle/embodiment
sims: pain fired loudly (38 pain_bridge events, 17 perceived_pain
events in cradle), but `_reward_bias` stayed at 0 entries. The whole
substrate-side reward path was dormant.
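For reference, the gate that swallowed every pain reaction looks roughly like the function below. It is a hypothetical sketch of the subscriber's shape, named after `_distribute_reward_from_reaction` but not copied from it. The Distributor protocol and its int return value (number of nodes credited) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class ReactionContext:
    agent_id: str | None = None


@dataclass
class Reaction:
    reward: float
    context: ReactionContext


class Distributor(Protocol):
    def distribute(self, agent_id: str, reward: float) -> int: ...


def distribute_reward_from_reaction(reaction: Reaction,
                                    distributor: Distributor) -> int | None:
    """Hypothetical subscriber shape: without an agent_id, distribute() is never called."""
    agent_id = reaction.context.agent_id
    if agent_id is None:
        return None  # the early-return every pain reaction hit before this fix
    return distributor.distribute(agent_id, reaction.reward)
```

With agent_id always None, the function never reaches distribute(), which matches the zero count for pain_chain.distribute_returned.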

The diagnosis added a structured trace gated by `MAXIM_PAIN_CHAIN_TRACE=1`
that emits one event per chain transition. The first cradle run with the
trace showed:

  pain_chain.bus_publish        11    (body emits pain — works)
  pain_chain.reaction_emitted   11    (compat converts to reaction)
  pain_chain.reward_subscriber  19    (subscriber receives)
  pain_chain.distribute_returned 0    ← chain breaks here

Every subscriber event had `will_distribute: false`. The smoking gun:
ctx_has_agent_id was `false` on every bus_publish. Pain publishers
weren't including agent_id in PainSignal.context, and even if they
had, `pain_signal_to_reaction` didn't extract it into ReactionContext.

Fix has three parts:

1. `Embodiment.__init__` takes a new `agent_id: str = ""` parameter,
   stored as `self.agent_id`. `_publish_pain` and `_publish_drive_pain`
   include it in PainSignal.context. Bootstrap passes `agent_id=agent_id`
   when constructing the Embodiment.

2. `simulation/tools.py::DamageComponentTool` (the third pain
   publisher, missed in pass 1 — fires from orchestrator-side damage
   calls and reflex_fire) reads `agent_id` from `self._embodiment` and
   includes it in its PainSignal.context.

3. `reactions/compat.py::pain_signal_to_reaction` extracts
   `signal.context["agent_id"]` and passes it to
   `ReactionContext(agent_id=...)`. Empty string normalises to None so
   the subscriber's documented `agent_id is None` early-return fires
   correctly for foundry / scene-entity embodiments that aren't
   learning subjects.

Verification — third cradle run with trace:

  pain_chain.bus_publish        17
  pain_chain.reaction_emitted   17
  pain_chain.reward_subscriber  21    (will_distribute: true, agent_id="sim_aut")
  pain_chain.distribute_returned 12   (11 of 12 credited 1+ nodes)

`_reward_bias` is still empty *by design*: per the existing CLAUDE.md
invariant, "_reward_bias clamps to [0, max_reward_bias] — Negative
rewards (pain) produce 0.0 bias. Pain avoidance is handled by valence
annotation on edges, not by reward bias." Negative pain rewards clamp
to 0; the per-tick decay then prunes 0.0 entries. The chain is wired
correctly — pain-only sims simply produce no persistent reward_bias
state, which is intended. Positive Reactions on the bus would now land
correctly.

`goal_reward_bias` does accumulate (cradle: -0.16), confirming that the
distributor actually reached this stage; only the substrate-side clamp
filters out pain.

Trace infrastructure (`MAXIM_PAIN_CHAIN_TRACE=1` env gate +
`pain_chain_trace()` helper in `proprioception/pain_bus.py`) ships in
this commit so future debugging of the chain doesn't need to
re-instrument it.

Tests: +3 (Embodiment includes agent_id in pain context; PainSignal
agent_id propagates into ReactionContext; empty-string agent_id
normalises to None). Total: 6357 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dennys246 merged commit 46f1925 into main on May 4, 2026
5 checks passed
dennys246 deleted the bug/pain-reward-chain-investigation branch on May 4, 2026 at 15:11