fix(pain): propagate agent_id through pain reaction reward chain #223
Merged
Pain reactions emerge with `ReactionContext.agent_id=None`, so the `_distribute_reward_from_reaction` subscriber in `runtime/bio_stack.py` early-returns on every pain. Result across 4 prior cradle/embodiment sims: pain fires loudly (38 pain_bridge events, 17 perceived_pain events in cradle), but `_reward_bias` stayed at 0 entries. The whole substrate-side reward path was dormant.

Diagnosis added a structured trace gated by `MAXIM_PAIN_CHAIN_TRACE=1` that emits one event per chain transition. The first cradle run with the trace showed:

- `pain_chain.bus_publish`: 11 (body emits pain; works)
- `pain_chain.reaction_emitted`: 11 (compat converts to reaction)
- `pain_chain.reward_subscriber`: 19 (subscriber receives)
- `pain_chain.distribute_returned`: 0 ← chain breaks here

Every subscriber event had `will_distribute: false`. The smoking gun: `ctx_has_agent_id` was `false` on every bus_publish. Pain publishers weren't including agent_id in `PainSignal.context`, and even if they had, `pain_signal_to_reaction` didn't extract it into `ReactionContext`.

The fix has three parts:

1. `Embodiment.__init__` takes a new `agent_id: str = ""` parameter, stored as `self.agent_id`. `_publish_pain` and `_publish_drive_pain` include it in `PainSignal.context`. Bootstrap passes `agent_id=agent_id` when constructing the Embodiment.
2. `simulation/tools.py::DamageComponentTool` (the third pain publisher, missed in pass 1; it fires from orchestrator-side damage calls and reflex_fire) reads `agent_id` from `self._embodiment` and includes it in its `PainSignal.context`.
3. `reactions/compat.py::pain_signal_to_reaction` extracts `signal.context["agent_id"]` and passes it to `ReactionContext(agent_id=...)`. An empty string normalises to None, so the subscriber's documented `agent_id is None` early-return fires correctly for foundry / scene-entity embodiments that aren't learning subjects.
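The publisher-to-subscriber chain described above can be sketched as a small, self-contained Python model. This is a hypothetical illustration, not the project's code: `PainSignal`, `ReactionContext`, `Embodiment._publish_pain`, `pain_signal_to_reaction`, `_distribute_reward_from_reaction` and `pain_chain_trace` borrow names from this commit, but their fields, signatures and bodies here are assumptions, while `Reaction`, `bus`, `TRACE_EVENTS`, `distributed` and the `reward = -intensity` mapping are invented purely for illustration. Part 2 (`DamageComponentTool`) would follow the same publisher pattern as `_publish_pain` and is omitted. The sketch shows the two pieces the fix relies on: the publisher puts `agent_id` into the signal context, and the compat layer lifts it into `ReactionContext`, turning `""` into `None` so the subscriber's guard still skips non-learning embodiments.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory sink for trace events (names are illustrative only).
TRACE_EVENTS: List[Tuple[str, Dict[str, Any]]] = []


def pain_chain_trace(stage: str, **fields: Any) -> None:
    """Record one event per chain transition, only when the env gate is on."""
    if os.environ.get("MAXIM_PAIN_CHAIN_TRACE") == "1":
        TRACE_EVENTS.append((f"pain_chain.{stage}", fields))


@dataclass
class PainSignal:
    intensity: float
    context: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ReactionContext:
    agent_id: Optional[str] = None


@dataclass
class Reaction:  # hypothetical container for the converted pain
    reward: float
    context: ReactionContext


def pain_signal_to_reaction(signal: PainSignal) -> Reaction:
    # Part 3: lift agent_id out of the signal context. A missing key or ""
    # becomes None, so the subscriber's `agent_id is None` guard still fires.
    agent_id = signal.context.get("agent_id") or None
    # Assumption: pain maps to a negative reward of equal magnitude.
    return Reaction(reward=-signal.intensity, context=ReactionContext(agent_id=agent_id))


# Hypothetical record of what the distributor credited.
distributed: List[Tuple[str, float]] = []


def _distribute_reward_from_reaction(reaction: Reaction) -> None:
    agent_id = reaction.context.agent_id
    pain_chain_trace("reward_subscriber",
                     will_distribute=agent_id is not None, agent_id=agent_id)
    if agent_id is None:
        return  # foundry / scene-entity embodiments are not learning subjects
    distributed.append((agent_id, reaction.reward))
    pain_chain_trace("distribute_returned", agent_id=agent_id)


def bus(signal: PainSignal) -> None:
    """Hypothetical synchronous bus: convert, then deliver to the subscriber."""
    reaction = pain_signal_to_reaction(signal)
    pain_chain_trace("reaction_emitted")
    _distribute_reward_from_reaction(reaction)


class Embodiment:
    # Part 1: agent_id defaults to "" for embodiments that are not learners.
    def __init__(self, publish: Callable[[PainSignal], None], agent_id: str = "") -> None:
        self.agent_id = agent_id
        self._publish = publish

    def _publish_pain(self, intensity: float) -> None:
        context = {"agent_id": self.agent_id}  # the field that was missing
        pain_chain_trace("bus_publish", ctx_has_agent_id=bool(context.get("agent_id")))
        self._publish(PainSignal(intensity, context=context))
```

For example, with `MAXIM_PAIN_CHAIN_TRACE=1` set, publishing once from `Embodiment(bus, agent_id="sim_aut")` and once from `Embodiment(bus)` should leave a single `("sim_aut", -0.4)`-style entry in `distributed`, two `pain_chain.bus_publish` events, and only one `pain_chain.distribute_returned` event in `TRACE_EVENTS`.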
Verification (third cradle run with trace):

- `pain_chain.bus_publish`: 17
- `pain_chain.reaction_emitted`: 17
- `pain_chain.reward_subscriber`: 21 (`will_distribute: true`, `agent_id="sim_aut"`)
- `pain_chain.distribute_returned`: 12 (11 of 12 credited 1+ nodes)

`_reward_bias` is still empty *by design*: per the existing CLAUDE.md invariant, "_reward_bias clamps to [0, max_reward_bias] — Negative rewards (pain) produce 0.0 bias. Pain avoidance is handled by valence annotation on edges, not by reward bias." Negative pain rewards clamp to 0; the per-tick decay then prunes 0.0 entries. The chain is wired correctly: pain-only sims simply produce no persistent reward_bias state, which is intended. Positive Reactions on the bus would now land correctly. `goal_reward_bias` does accumulate (cradle: -0.16), confirming that the distributor actually reached this stage; the substrate-side clamp just filters out pain.

Trace infrastructure (`MAXIM_PAIN_CHAIN_TRACE=1` env gate + `pain_chain_trace()` helper in `proprioception/pain_bus.py`) ships in this commit so future debugging of the chain doesn't need to re-instrument it.

Tests: +3 (Embodiment includes agent_id in pain context; PainSignal agent_id propagates into ReactionContext; empty-string agent_id normalises to None). Total: 6357 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
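Why `_reward_bias` stays empty after pain-only runs can be reproduced with a toy model of the quoted invariant. This is a sketch under stated assumptions, not the project's implementation: it assumes a reward is added to a node's current bias, the sum is clamped to `[0, max_reward_bias]`, and a per-tick multiplicative decay prunes entries that reach (near) zero. `apply_reward`, `decay`, `MAX_REWARD_BIAS`, the decay rate and the pruning floor are all hypothetical.

```python
from typing import Dict

# Hypothetical stand-in for the configured max_reward_bias.
MAX_REWARD_BIAS = 1.0


def apply_reward(bias: Dict[str, float], node: str, reward: float,
                 max_reward_bias: float = MAX_REWARD_BIAS) -> None:
    """Add reward to a node's bias and clamp the result to [0, max_reward_bias].

    Assumption: a negative reward may lower an existing positive bias, but the
    clamp means it can never push the stored value below 0.0.
    """
    current = bias.get(node, 0.0)
    bias[node] = min(max(current + reward, 0.0), max_reward_bias)


def decay(bias: Dict[str, float], rate: float = 0.9, floor: float = 1e-6) -> None:
    """Hypothetical per-tick decay: scale every entry, prune those at or below floor."""
    for node in list(bias):  # copy keys so entries can be deleted while iterating
        bias[node] *= rate
        if bias[node] <= floor:
            del bias[node]
```

In this model a single pain (`reward=-0.3`) on an empty map stores `0.0`, which the next `decay` call prunes, whereas a positive reward of `0.5` survives the tick as roughly `0.45`. That mirrors the commit's point: pain-only sims leave no persistent state, while positive Reactions would land.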
Summary
Fixes the silent failure mode that left `_reward_bias` empty across every cradle / damage / pain-bearing sim despite pain firing constantly. Root cause: pain reactions emerged with `ReactionContext.agent_id=None`, the reward-distributor subscriber early-returned on every one, and `distribute()` was never called.

Investigation
Added a structured trace gated by `MAXIM_PAIN_CHAIN_TRACE=1` that emits one event per chain transition.

First cradle run with trace (no fix):

- `bus_publish`: 11
- `reaction_emitted`: 11
- `reward_subscriber`: 19 (`will_distribute: false` on every one)
- `distribute_returned`: 0

The `bus_publish` events showed `ctx_has_agent_id: false` for every pain: body publishers weren't propagating `agent_id`, and even if they had, `pain_signal_to_reaction` wouldn't have extracted it into `ReactionContext`.

Fix (three parts)
1. `Embodiment.__init__` takes `agent_id: str = ""` and stores it. Both `_publish_pain` and `_publish_drive_pain` include it in `PainSignal.context`. `runtime/bootstrap.py` passes `agent_id=agent_id` when constructing the embodiment.
2. `simulation/tools.py::DamageComponentTool` (the third pain publisher, which fires from orchestrator-side damage calls including reflex_fire) reads agent_id from `self._embodiment` and adds it to its inline-constructed `PainSignal.context`.
3. `reactions/compat.py::pain_signal_to_reaction` extracts `signal.context["agent_id"]` and passes it to `ReactionContext(agent_id=...)`. An empty string normalises to `None`, so the subscriber's documented `agent_id is None` early-return fires correctly for foundry / scene-entity embodiments that aren't learning subjects.

Verification (third cradle run with all fixes):
- `bus_publish`: 17
- `reaction_emitted`: 17
- `reward_subscriber`: 21 (`will_distribute: true`, `agent_id="sim_aut"`)
- `distribute_returned`: 12 (11 of 12 credited 1+ nodes)

The chain is fully wired.
`goal_reward_bias` accumulates (cradle: -0.16), confirming that the distributor reaches its terminal stage.

On `_reward_bias` still being empty

Verified: per the existing CLAUDE.md invariant, "`_reward_bias` clamps to [0, max_reward_bias] — Negative rewards (pain) produce 0.0 bias. Pain avoidance is handled by valence annotation on edges, not by reward bias." Negative pain rewards clamp to 0; the per-tick decay then prunes 0.0 entries. This is intended: pain-only sims produce no persistent reward_bias state. Positive Reactions on the bus would now land correctly; that part of the chain was the dormancy this PR closes.

Trace infrastructure
The `MAXIM_PAIN_CHAIN_TRACE=1` env gate + `pain_chain_trace()` helper in `proprioception/pain_bus.py` ship in this commit, so future debugging of the chain doesn't need to re-instrument it.

Test plan
- `ruff check` + `ruff format` on every touched file
- `test_publish_propagates_agent_id_into_reaction_context`: signal context → ReactionContext
- `test_publish_with_empty_agent_id_normalises_to_none`: empty-string normalisation
- `test_embodiment_includes_agent_id_in_pain_context`: Embodiment writes agent_id into the pain context

🤖 Generated with Claude Code