feat(gsp-diagnostics): surface label, squared error, and GSP loss per step #11
Adds an --experiment_name argparse argument and calls h5_logger.close() at the end of training to emit the FINAL sentinel to the ingestion worker (PR2).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- algorithm_defaults.py: per-algorithm field merges for DQN/DDQN/DDPG/TD3
- make_config: accepts an algorithm kwarg and calls merge_algorithm_defaults (spec §9)
Part of experiment dispatcher PR5 (spec §§5.5, 9, 14.6)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
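A sketch of what a per-algorithm merge could look like, assuming the defaults live in a dict keyed by algorithm name and that values already present in the config take precedence over the defaults. The field names (`uses_target_policy_noise`, `discrete_actions`) and the `seed` base key are hypothetical; the PR does not list the actual fields in algorithm_defaults.py.

```python
from typing import Any, Dict

# Hypothetical per-algorithm defaults; the real fields in algorithm_defaults.py
# are not shown in the PR.
ALGORITHM_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "DQN": {"uses_target_policy_noise": False, "discrete_actions": True},
    "DDQN": {"uses_target_policy_noise": False, "discrete_actions": True},
    "DDPG": {"uses_target_policy_noise": False, "discrete_actions": False},
    "TD3": {"uses_target_policy_noise": True, "discrete_actions": False},
}


def merge_algorithm_defaults(config: Dict[str, Any], algorithm: str) -> Dict[str, Any]:
    """Return a new config with the algorithm's defaults filled in.

    Assumption: keys already present in `config` override the defaults.
    """
    if algorithm not in ALGORITHM_DEFAULTS:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    merged = dict(ALGORITHM_DEFAULTS[algorithm])  # copy so defaults stay untouched
    merged.update(config)
    merged["algorithm"] = algorithm
    return merged


def make_config(algorithm: str = "DQN", **overrides: Any) -> Dict[str, Any]:
    # Sketch of make_config accepting an `algorithm` kwarg, as the commit describes.
    base: Dict[str, Any] = {"seed": 0}  # hypothetical base config
    base.update(overrides)
    return merge_algorithm_defaults(base, algorithm)
```

Copying the defaults before merging keeps repeated `make_config` calls from leaking one run's overrides into the shared defaults table.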
- Agent.__init__ reads ROBOT_ORDER (fixed|randomized) and SEED from config
- choose_agent_gsp shuffles the per-robot iteration order with a seeded RNG when ROBOT_ORDER=randomized, enabling a controlled bias comparison (spec §14.3)
- Fixed order is the default, preserving existing behavior exactly
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
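A sketch of the ordering logic, assuming the seeded RNG is a `random.Random(SEED)` created once in `__init__`, so successive calls yield different but reproducible orders. `AgentOrderSketch` and `iteration_order` are hypothetical names; the real choose_agent_gsp does more than pick an order.

```python
import random
from typing import Dict, List


class AgentOrderSketch:
    """Hypothetical reduction of Agent to its ROBOT_ORDER / SEED handling."""

    def __init__(self, config: Dict[str, object], num_robots: int) -> None:
        self.robot_order = config.get("ROBOT_ORDER", "fixed")  # fixed is the default
        if self.robot_order not in ("fixed", "randomized"):
            raise ValueError(f"ROBOT_ORDER must be 'fixed' or 'randomized', got {self.robot_order!r}")
        # A dedicated RNG keeps the shuffle reproducible without touching global random state.
        self.rng = random.Random(config.get("SEED", 0))
        self.num_robots = num_robots

    def iteration_order(self) -> List[int]:
        order = list(range(self.num_robots))
        if self.robot_order == "randomized":
            self.rng.shuffle(order)  # in-place, driven only by SEED
        return order
```

Using a private `random.Random` instance rather than the module-level functions means the shuffle sequence depends only on SEED, which is what keeps fixed-versus-randomized runs comparable.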
- Instantiate HDF5Logger at training start (one episodes.h5 per experiment, co-located with Data/ and Models/ under recording_path)
- Replace data_writer.writerow() with h5_logger.writerow() (signatures match)
- Replace data_writer.write_to_file() with h5_logger.write_episode(), which fires the notify_episode sentinel on the ingestion FIFO
- close() at the end of training fires notify_final for test auto-enqueue
The pickle-based data_logger import is kept for backwards-compatible reads but is never used for writes. This eliminates the disk-full risk from 500 episodes × 3 MB pkl files (roughly 1.5 GB).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
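A sketch of the logger call pattern and sentinel sequence, assuming one sentinel message per event on the FIFO. The method names writerow, write_episode, notify_episode, notify_final, and close come from the commit; `HDF5LoggerSketch`, the row format, and the `EPISODE <n>` sentinel text are hypothetical. Plain Python lists stand in for both the HDF5 file and the FIFO so the sketch runs without h5py or a named pipe.

```python
import os
from typing import Any, List, Sequence


class HDF5LoggerSketch:
    """Hypothetical stand-in for HDF5Logger: same call pattern, no real HDF5 I/O."""

    def __init__(self, recording_path: str, fifo: List[str]) -> None:
        # One episodes.h5 per experiment, next to Data/ and Models/ (path only; nothing is written).
        self.path = os.path.join(recording_path, "episodes.h5")
        self.fifo = fifo  # stand-in for the ingestion worker's FIFO
        self.pending_rows: List[Sequence[Any]] = []
        self.episodes: List[List[Sequence[Any]]] = []
        self.closed = False

    def writerow(self, row: Sequence[Any]) -> None:
        # Drop-in for data_writer.writerow(): buffer one step's row.
        self.pending_rows.append(row)

    def write_episode(self) -> None:
        # Flush the buffered rows as one episode, then notify the ingestion worker.
        self.episodes.append(self.pending_rows)
        self.pending_rows = []
        self.notify_episode(len(self.episodes) - 1)

    def notify_episode(self, episode_index: int) -> None:
        self.fifo.append(f"EPISODE {episode_index}")  # sentinel text is an assumption

    def notify_final(self) -> None:
        self.fifo.append("FINAL")

    def close(self) -> None:
        # Assumption: close() is idempotent so FINAL is sent at most once.
        if not self.closed:
            self.closed = True
            self.notify_final()
```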
… step
calculate_gsp_reward now returns (reward, label, squared_errors) so the raw prediction error is available alongside the clipped reward. Main.py forwards label (broadcast as the per-robot gsp_target) and squared_error to the HDF5Logger each tick, and model.last_gsp_loss each learning step. These are the fields needed to detect GSP information collapse: the raw squared error carries the magnitude beyond the reward's -2 saturation, and the GSP network's training loss exposes degenerate learning that the actor/critic loss would hide. See Stelaris docs/specs/2026-04-12-dispatcher-diagnostic-batch.md for the hypothesis. Requires companion changes in GSP-RL (feature/hdf5-gsp-diagnostics, commit e1b138d) and Stelaris HDF5Logger (same branch).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
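A sketch of the new 3-tuple contract and why the raw squared error matters, assuming the reward is the negated squared error clipped at -2 and that predictions are per robot. The clipping rule and `calculate_gsp_reward_sketch` are hypothetical; only the return shape (reward, label, squared_errors) and the -2 floor come from the commit.

```python
from typing import List, Tuple

REWARD_FLOOR = -2.0  # saturation value named in the commit; exact clipping rule assumed


def calculate_gsp_reward_sketch(
    predictions: List[float], label: float
) -> Tuple[List[float], float, List[float]]:
    """Hypothetical reduction of calculate_gsp_reward to its new return shape."""
    squared_errors = [(p - label) ** 2 for p in predictions]
    # Assumption: reward = -squared_error, clipped from below at REWARD_FLOOR.
    rewards = [max(-se, REWARD_FLOOR) for se in squared_errors]
    return rewards, label, squared_errors


# Main.py side (sketch): broadcast the scalar label as a per-robot gsp_target.
rewards, label, squared_errors = calculate_gsp_reward_sketch([1.5, 2.0, 11.0], label=1.0)
gsp_target = [label] * len(squared_errors)
# rewards == [-0.25, -1.0, -2.0]; squared_errors == [0.25, 1.0, 100.0]
```

The third robot's squared error (100) is 50 times the floor, yet its reward reads the same -2 as a squared error of 2 would; only squared_errors preserves that gap.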
Review of c699675 (wiring layer only)
Verdict: approve with one semantic concern and two suggestions. The env.py 3-tuple refactor is clean, the squared-error math is correct, the tests are faithful tuple-unpack updates, and there are no other callers of …
Concerns
1. …
2. …
3. …
Verified non-issues
Suggestion
Consider deleting the commented-out dead code block at …
Overall: ship it after deciding on concern #1 (document or aggregate). The diagnostic data will be correct at the source, modulo that schema-cardinality question.
…ependent mode
In --independent_learning mode, record_gsp_loss previously fired once per robot model inside the per-robot learn loop, so the 1D gsp_loss dataset received num_robots entries per learn tick instead of one. That made the length of the information-collapse diagnostic's gsp_loss axis differ between independent-learning mode (num_learn_steps × num_robots) and shared-model mode (num_learn_steps), breaking cross-mode comparability. Move to one call per tick: run all per-robot learn() steps first, then collect last_gsp_loss from each model and record the mean. The shared-model branch is unchanged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Second-pass review: commit 84c6203
Verdict: fix lands correctly. Ready to merge.
1. Independent-learning fix verified
2. No regression in shared-model branch
Lines 531-535 untouched. Primary diagnostic path intact.
3. Commit scope clean
Red flags checked
Cardinality asymmetry from the first-pass review is resolved. Ship it.
Summary
Wires the GSP information-collapse diagnostic signals from the training loop into the HDF5 logger.
Core change (`c699675`, the review target):
Ancestor commits in this PR (pre-existing work on `feat/learn-every-n-steps` that is not yet on origin; my commit depends on `h5_logger` existing, so they must ship together):
Please focus review on commit `c699675` only. The four ancestor commits are existing in-progress work and should be reviewed separately if they haven't been.
Why
The clipped `gsp_reward` saturates at -2 and hides the magnitude of large prediction errors. The raw squared error carries the signal needed to detect degenerate GSP predictions. Combined with `Actor.last_gsp_loss` from the companion GSP-RL PR, this lets the HDF5 logger capture enough to diagnose GSP information collapse at scale.
See `docs/specs/2026-04-12-dispatcher-diagnostic-batch.md` in Stelaris for the information-collapse hypothesis these fields exist to test.
Companion PRs (must land together)
Test plan
🤖 Generated with Claude Code