
Fix off-by-one, log-zero, and start_loss-zero bugs in reweight()#91

Merged
MaxGhenis merged 1 commit into main from fix/reweight-small on Apr 17, 2026

Conversation

@MaxGhenis
Contributor

Summary

This PR fixes three latent bugs in src/microcalibrate/reweight.py (a combined sketch of the fixed control flow follows the list):

  • CRITICAL (finding #1 in the bug-hunt report): The dense training loop guarded its gradient step with `if i != max_epochs - 1` where `max_epochs = epochs - 1`, which actually skipped the penultimate epoch (`i == epochs - 2`) while still stepping on the final epoch. The returned `final_weights` therefore drifted one step away from the final tracked row. Every epoch now steps, and the tracker always ends on the final epoch, so logged estimates correspond to the pre-step state of the returned weights.
  • HIGH (finding #5): The sparse L0 loop computed `(l.item() - start_loss) / start_loss` unconditionally; when `start_loss` was ~0 (trivial/pre-calibrated data with `l0_lambda=0`) this raised `ZeroDivisionError` inside the tqdm postfix. A magnitude guard now short-circuits the displayed value to 0.0.
  • MED (finding #8): `np.log(original_weights + random_noise)` produced `-inf` (and downstream NaN gradients) whenever an initial weight was zero and `noise_level` was zero, and the L0 branch hit the same issue even with nonzero noise because it logs the raw weights. Both call sites now clamp their inputs to >= 1e-12.
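
For concreteness, here is a minimal sketch of the fixed control flow with all three guards in one place. The loss function, optimizer settings, and the `reweight_sketch` name are illustrative assumptions, not the actual internals of src/microcalibrate/reweight.py:

```python
import numpy as np
import torch


def reweight_sketch(original_weights, epochs=25, tracking_n=2, noise_level=0.0):
    """Illustrative stand-in for reweight(); names and loss are assumptions."""
    rng = np.random.default_rng(0)
    random_noise = rng.uniform(0.0, noise_level, size=original_weights.shape)

    # Finding #8 (MED): clamp before the log so a zero weight with zero noise
    # cannot produce -inf log-weights and NaN gradients downstream.
    log_weights = torch.tensor(
        np.log(np.maximum(original_weights + random_noise, 1e-12)),
        requires_grad=True,
    )
    optimizer = torch.optim.Adam([log_weights])

    tracked, start_loss = {}, None
    for i in range(epochs):
        weights = torch.exp(log_weights)
        loss = (weights.sum() - float(len(weights))) ** 2  # stand-in calibration loss
        if start_loss is None:
            start_loss = loss.item()

        # Finding #1 (CRITICAL): log the pre-step state, and force the final
        # epoch into the tracker even when it is off the tracking_n grid. The
        # final tracked row is the pre-step state of the last epoch; the
        # returned weights are one optimizer step past it.
        is_final_epoch = i == epochs - 1
        if i % tracking_n == 0 or is_final_epoch:
            tracked[i] = loss.item()

        # Finding #5 (HIGH): guard the relative-loss display (the tqdm postfix
        # in the real code) against an exactly or nearly zero start_loss.
        rel_change = (
            0.0 if abs(start_loss) < 1e-12 else (loss.item() - start_loss) / start_loss
        )

        # Every epoch steps: the old `if i != max_epochs - 1` guard is gone.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return torch.exp(log_weights).detach().numpy(), tracked
```

Under this sketch, `reweight_sketch(np.zeros(4))` returns finite weights and the tracker always contains the key `epochs - 1`.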

Test plan

  • Add tests/test_reweight_regression.py covering all three fixes: the tracker includes the final epoch, N vs. N-1 epochs produce different weights (every epoch steps), the sparse loop does not crash when `start_loss == 0`, and zero initial weights in the L0 path do not produce non-finite weights (illustrative stand-ins for two of these checks follow this list).
  • All existing tests pass (`uv run pytest tests -x -q` → 19 passed).
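
As illustrative stand-ins for two of those checks, the numeric guards can be exercised directly; these pytest-style functions test the guard expressions themselves, not the real reweight() end to end:

```python
import numpy as np


def test_clamped_log_is_finite():
    # Finding #8: zero weights plus zero noise must not yield -inf after the clamp.
    weights = np.zeros(8)
    assert np.isfinite(np.log(np.maximum(weights + 0.0, 1e-12))).all()


def test_relative_loss_guard():
    # Finding #5: a ~0 start_loss short-circuits the displayed value to 0.0.
    start_loss, current_loss = 0.0, 0.0
    rel = (
        0.0
        if abs(start_loss) < 1e-12
        else (current_loss - start_loss) / start_loss
    )
    assert rel == 0.0
```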

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Contributor Author

@MaxGhenis MaxGhenis left a comment


LGTM (cannot self-approve; posting as comment).

  • Off-by-one fix: removed the `max_epochs = epochs - 1` guard so every epoch steps. The loop semantics are now clean and documented in the code: the pre-step state of epoch i is logged (when tracked); the post-last-step state is returned. The asymmetry between the final logged row and the returned weights is acknowledged in the inline comment; a reasonable choice vs. running an extra forward pass after the last step.
  • The `is_final_epoch` guard ensures the tracker always contains epoch `epochs - 1`, correctly fixing the diagnostic/returned-state disagreement.
  • `np.maximum(..., 1e-12)` is added on both the dense and sparse log-weight initialisations; consistent.
  • The `start_loss` divide-by-zero guard uses `abs(start_loss) < 1e-12`, which is correct for both exactly-zero and near-zero starts.

Minor (non-blocking) notes:

  • `test_all_epochs_step` compares weights after N and N-1 epochs. Under the original bug the two runs still produced different weights (different epochs were skipped), so this test doesn't strictly regression-test the off-by-one. `test_final_epoch_matches_tracker` also misses it whenever tracking_n divides (epochs - 1), which holds in the default case (epochs=25, tracking_n=2). For a tighter regression test, consider epochs=30 so that (epochs - 1) % tracking_n != 0; see the sketch after these notes. The fix itself is correct; this is just test tightness.
  • The pre-existing `test_evaluate_holdout_robustness` flakiness is unrelated to this PR: this PR doesn't touch RNG seeding (that's #93), so the underlying unseeded-numpy cause is present on main too.
  • Unrelated repo infra: .python-version still pins 3.11 while pyproject.toml requires >=3.13. Worth a follow-up cleanup, but out of scope here.
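
To make the divisibility point concrete, a small self-contained check (`tracked_epochs` is a hypothetical stand-in, not a function in the repo):

```python
def tracked_epochs(epochs: int, tracking_n: int) -> list[int]:
    """Epochs the tracker records: every tracking_n-th one, plus the final epoch."""
    return [i for i in range(epochs) if i % tracking_n == 0 or i == epochs - 1]


# Default case (epochs=25, tracking_n=2): epoch 24 sits on the tracking grid
# anyway, so the test passes even without the is_final_epoch branch.
assert 24 % 2 == 0 and 24 in tracked_epochs(25, 2)

# Tighter case (epochs=30): epoch 29 is off the grid, so only the
# is_final_epoch branch records it -- exactly what the test should pin down.
assert 29 % 2 != 0 and 29 in tracked_epochs(30, 2)
```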

@MaxGhenis MaxGhenis merged commit 71e19de into main Apr 17, 2026
8 of 9 checks passed
@MaxGhenis MaxGhenis deleted the fix/reweight-small branch April 17, 2026 16:07