fix(pain): propagate agent_id through pain reaction reward chain by dennys246 · Pull Request #223 · dennys246/Maxim

dennys246 · 2026-05-04T03:57:22Z

Summary

Fixes the silent failure mode that left _reward_bias empty across every cradle / damage / pain-bearing sim despite pain firing constantly. Root cause: pain reactions emerged with ReactionContext.agent_id=None, the reward-distributor subscriber early-returned on every one, and distribute() was never called.

Investigation

Added a structured trace gated by MAXIM_PAIN_CHAIN_TRACE=1 that emits one event per chain transition:

pain_chain.bus_publish       (PainBus.publish entry)
pain_chain.reaction_emitted  (after compat conversion)
pain_chain.reward_subscriber (subscriber receives reaction)
pain_chain.distribute_returned (after distributor.distribute call)

First cradle run with trace (no fix):

| Event | Count | Note |
|---|---|---|
| `bus_publish` | 11 | pain emits; works |
| `reaction_emitted` | 11 | compat converts |
| `reward_subscriber` | 19 | receives, but `will_distribute: false` on every one |
| `distribute_returned` | 0 | chain breaks here |

bus_publish events showed ctx_has_agent_id: false for every pain — body publishers weren't propagating agent_id, and even if they had, pain_signal_to_reaction wouldn't have extracted it into ReactionContext.
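The failure mode can be illustrated with a minimal sketch. The `Reaction` class, the `make_reward_subscriber` factory, and the `distribute` callback signature are hypothetical stand-ins, not the code in `runtime/bio_stack.py`; only the `agent_id is None` early-return mirrors what this PR describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ReactionContext:
    # Stand-in for the real ReactionContext; only the field discussed here.
    agent_id: Optional[str] = None


@dataclass
class Reaction:
    # Hypothetical minimal reaction: a signed reward plus its context.
    reward: float
    context: ReactionContext


def make_reward_subscriber(distribute: Callable[[str, float], None]):
    """Build a subscriber that forwards reactions to `distribute`."""

    def on_reaction(reaction: Reaction) -> bool:
        # A reaction without an agent_id has no learning subject, so the
        # subscriber returns before the distributor is ever called.
        if reaction.context.agent_id is None:
            return False
        distribute(reaction.context.agent_id, reaction.reward)
        return True

    return on_reaction


calls: List[Tuple[str, float]] = []
subscriber = make_reward_subscriber(lambda agent_id, reward: calls.append((agent_id, reward)))

# Pre-fix shape: every pain reaction arrives with agent_id=None.
before = [subscriber(Reaction(-0.2, ReactionContext())) for _ in range(3)]
# Post-fix shape: the agent_id is carried through to the subscriber.
after = subscriber(Reaction(-0.2, ReactionContext(agent_id="sim_aut")))
```

In this sketch the three context-less reactions never reach `distribute`, which matches the `distribute_returned: 0` row in the trace.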

Fix (three parts)

1. `Embodiment.__init__` takes `agent_id: str = ""` and stores it. Both `_publish_pain` and `_publish_drive_pain` include it in `PainSignal.context`. `runtime/bootstrap.py` passes `agent_id=agent_id` when constructing the embodiment.
2. `simulation/tools.py::DamageComponentTool` (the third pain publisher; fires from orchestrator-side damage calls, including reflex_fire) reads `agent_id` from `self._embodiment` and adds it to its inline-constructed `PainSignal.context`.
3. `reactions/compat.py::pain_signal_to_reaction` extracts `signal.context["agent_id"]` and passes it to `ReactionContext(agent_id=...)`. Empty string normalises to `None` so the subscriber's documented `agent_id is None` early-return fires correctly for foundry / scene-entity embodiments that aren't learning subjects.
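The propagation contract from parts 1 and 3 can be sketched as follows. The class bodies, the `build_pain_signal` method, and the `pain_signal_to_reaction_context` function are hypothetical simplifications (the real `pain_signal_to_reaction` returns a full reaction, and this sketch uses `dict.get` as an assumption about missing keys); only the `agent_id` handling follows the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class PainSignal:
    # Hypothetical stand-in: an intensity plus a free-form context dict.
    intensity: float
    context: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ReactionContext:
    agent_id: Optional[str] = None


class Embodiment:
    def __init__(self, agent_id: str = "") -> None:
        # Part 1: the embodiment stores the agent_id it was constructed with.
        self.agent_id = agent_id

    def build_pain_signal(self, intensity: float) -> PainSignal:
        # Part 1: every published pain signal carries agent_id in its context.
        return PainSignal(intensity=intensity, context={"agent_id": self.agent_id})


def pain_signal_to_reaction_context(signal: PainSignal) -> ReactionContext:
    # Part 3: extract agent_id; an empty string (or a missing key, in this
    # sketch) normalises to None so the subscriber's early-return still fires.
    raw = signal.context.get("agent_id")
    return ReactionContext(agent_id=raw or None)


# A learning subject keeps its id; a scene entity with no id maps to None.
learner = pain_signal_to_reaction_context(Embodiment(agent_id="sim_aut").build_pain_signal(0.4))
scene = pain_signal_to_reaction_context(Embodiment().build_pain_signal(0.4))
```

Keeping `""` as the constructor default and normalising at the conversion boundary means existing embodiments that never set an id still skip reward distribution, as before.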

Verification — third cradle run with all fixes:

| Event | Count |
|---|---|
| `bus_publish` | 17 |
| `reaction_emitted` | 17 |
| `reward_subscriber` | 21 (`will_distribute: true`, `agent_id="sim_aut"`) |
| `distribute_returned` | 12 (11 of 12 credited 1+ nodes) |

The chain is fully wired. goal_reward_bias accumulates (cradle: -0.16) confirming the distributor reaches its terminal stage.

On `_reward_bias` still being empty

Verified: per the existing CLAUDE.md invariant, "_reward_bias clamps to [0, max_reward_bias] — Negative rewards (pain) produce 0.0 bias. Pain avoidance is handled by valence annotation on edges, not by reward bias." Negative pain rewards clamp to 0; the per-tick decay then prunes 0.0 entries. This is intended. Pain-only sims produce no persistent reward_bias state. Positive Reactions on the bus would now land correctly — that part of the chain was the dormancy this PR closes.
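A small worked sketch of why pain-only sims leave no entries behind. The `apply_reward` and `decay_and_prune` helpers, the additive update, the decay factor of 0.9, and `max_reward_bias=1.0` are all assumptions for illustration; only the clamp to `[0, max_reward_bias]` and the pruning of 0.0 entries come from the invariant quoted above.

```python
from typing import Dict


def apply_reward(
    bias: Dict[str, float], node: str, reward: float, max_reward_bias: float = 1.0
) -> None:
    # Assumed additive update, then clamp into [0, max_reward_bias].
    updated = bias.get(node, 0.0) + reward
    bias[node] = min(max(updated, 0.0), max_reward_bias)


def decay_and_prune(bias: Dict[str, float], decay: float = 0.9) -> None:
    # Hypothetical per-tick decay: scale each entry, then drop entries at 0.0.
    for node in list(bias):
        bias[node] *= decay
        if bias[node] <= 0.0:
            del bias[node]


# Pain only: the negative reward clamps to 0.0 and the next tick prunes it.
pain_only: Dict[str, float] = {}
apply_reward(pain_only, "node_a", -0.3)
decay_and_prune(pain_only)

# A positive reward survives the clamp and decays gradually instead.
with_positive: Dict[str, float] = {}
apply_reward(with_positive, "node_a", 0.5)
decay_and_prune(with_positive)
```

Under these assumptions `pain_only` ends up empty while `with_positive` retains a decayed entry for `node_a`, which is the behaviour the invariant describes.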

Trace infrastructure

The MAXIM_PAIN_CHAIN_TRACE=1 env gate + pain_chain_trace() helper in proprioception/pain_bus.py ship in this commit. Future debugging of the chain doesn't need to re-instrument it.
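For readers unfamiliar with the pattern, an env-gated trace helper might look roughly like this. The signature, the JSON-lines output, and writing to stderr are assumptions; the actual `pain_chain_trace()` in `proprioception/pain_bus.py` may differ in all three.

```python
import json
import os
import sys
from typing import Any, Optional

TRACE_ENV_VAR = "MAXIM_PAIN_CHAIN_TRACE"


def pain_chain_trace(stage: str, **fields: Any) -> Optional[str]:
    """Emit one structured event for `stage` when tracing is enabled."""
    # The gate is checked per call, so tracing costs almost nothing when off.
    if os.environ.get(TRACE_ENV_VAR) != "1":
        return None
    line = json.dumps({"event": f"pain_chain.{stage}", **fields}, sort_keys=True)
    print(line, file=sys.stderr)
    return line


# With the gate on, one JSON line is written per chain transition.
os.environ[TRACE_ENV_VAR] = "1"
emitted = pain_chain_trace("bus_publish", ctx_has_agent_id=False)

# With the gate off, the helper is a no-op.
os.environ.pop(TRACE_ENV_VAR, None)
silent = pain_chain_trace("bus_publish", ctx_has_agent_id=False)
```

Counting emitted lines per `event` value would give a per-stage tally like the tables in this PR.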

Test plan

- Full fast suite: 6357 passed (was 6336; +3 new tests)
- `ruff check` + `ruff format` on every touched file
- New tests pin the agent_id propagation contract:
  - `test_publish_propagates_agent_id_into_reaction_context`: signal context → ReactionContext
  - `test_publish_with_empty_agent_id_normalises_to_none`: empty string normalisation
  - `test_embodiment_includes_agent_id_in_pain_context`: Embodiment writes agent_id into pain
- End-to-end cradle sim with trace events shows the chain transitioning cleanly through all four stages

🤖 Generated with Claude Code

Pain reactions emerge with `ReactionContext.agent_id=None`, so the `_distribute_reward_from_reaction` subscriber in `runtime/bio_stack.py` early-returns on every pain. Result across 4 prior cradle/embodiment sims: pain fires loudly (38 pain_bridge events, 17 perceived_pain events in cradle), but `_reward_bias` stayed at 0 entries. The whole substrate-side reward path was dormant.

Diagnosis added a structured trace gated by `MAXIM_PAIN_CHAIN_TRACE=1` that emits one event per chain transition. First cradle run with the trace showed:

    pain_chain.bus_publish          11  (body emits pain; works)
    pain_chain.reaction_emitted     11  (compat converts to reaction)
    pain_chain.reward_subscriber    19  (subscriber receives)
    pain_chain.distribute_returned   0  ← chain breaks here

Every subscriber event had `will_distribute: false`. The smoking gun: ctx_has_agent_id was `false` on every bus_publish. Pain publishers weren't including agent_id in PainSignal.context, and even if they had, `pain_signal_to_reaction` didn't extract it into ReactionContext.

Fix has three parts:

1. `Embodiment.__init__` takes a new `agent_id: str = ""` parameter, stored as `self.agent_id`. `_publish_pain` and `_publish_drive_pain` include it in PainSignal.context. Bootstrap passes `agent_id=agent_id` when constructing the Embodiment.
2. `simulation/tools.py::DamageComponentTool` (the third pain publisher, missed in pass 1; fires from orchestrator-side damage calls and reflex_fire) reads `agent_id` from `self._embodiment` and includes it in its PainSignal.context.
3. `reactions/compat.py::pain_signal_to_reaction` extracts `signal.context["agent_id"]` and passes it to `ReactionContext(agent_id=...)`. Empty string normalises to None so the subscriber's documented `agent_id is None` early-return fires correctly for foundry / scene-entity embodiments that aren't learning subjects.

Verification, third cradle run with trace:

    pain_chain.bus_publish          17
    pain_chain.reaction_emitted     17
    pain_chain.reward_subscriber    21  (will_distribute: true, agent_id="sim_aut")
    pain_chain.distribute_returned  12  (11 of 12 credited 1+ nodes)

`_reward_bias` is still empty *by design*: per the existing CLAUDE.md invariant, "_reward_bias clamps to [0, max_reward_bias] - Negative rewards (pain) produce 0.0 bias. Pain avoidance is handled by valence annotation on edges, not by reward bias." Negative pain rewards clamp to 0; the per-tick decay then prunes 0.0 entries. The chain is wired correctly: pain-only sims simply produce no persistent reward_bias state, which is intended. Positive Reactions on the bus would now land correctly. `goal_reward_bias` does accumulate (cradle: -0.16), confirming the distributor actually reached this stage, just that the substrate-side clamp filters pain.

Trace infrastructure (`MAXIM_PAIN_CHAIN_TRACE=1` env gate + `pain_chain_trace()` helper in `proprioception/pain_bus.py`) ships in this commit so future debugging of the chain doesn't need to re-instrument it.

Tests: +3 (Embodiment includes agent_id in pain context; PainSignal agent_id propagates into ReactionContext; empty-string agent_id normalises to None). Total: 6357 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dennys246 merged commit 46f1925 into main May 4, 2026
5 checks passed

dennys246 deleted the bug/pain-reward-chain-investigation branch May 4, 2026 15:11

dennys246 merged 1 commit into main from bug/pain-reward-chain-investigation
