[codex] fix data indexing and clip metrics #28

Merged
yifanzhang-pro merged 1 commit into main from codex/pr23-safer-fixes on Apr 29, 2026

Conversation

@yifanzhang-pro
Member

Summary

This is a narrower replacement for the useful parts of #23.

  • Preserve the existing README fix on main that points installation at requirements.txt.
  • Update process-data.py to write 0-based repeat indices into the schema consumed by RLHFDataset: extra_info["index"].
  • Fix GRPO score aggregation to use torch.stack, preserving tensor dtype/device and computing the standard deviation over the response group.
  • Make REINFORCE hard-clamp clip fraction metrics report how often w falls outside the clamp bounds instead of always reporting zero.
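For reviewers who want a concrete picture of the indexing bullet, here is a minimal sketch of writing 0-based repeat indices into `extra_info["index"]`. The helper name `add_repeat_indices`, the `n_repeats` parameter, and the row layout are assumptions made for illustration; they are not taken from process-data.py.

```python
from copy import deepcopy


def add_repeat_indices(rows, n_repeats):
    """Repeat each row n_repeats times and tag every copy with a 0-based index.

    Hypothetical helper: the real script may build its rows differently.
    """
    out = []
    for row in rows:
        for repeat_idx in range(n_repeats):  # 0, 1, ..., n_repeats - 1
            item = deepcopy(row)  # do not mutate the caller's row
            extra = item.setdefault("extra_info", {})
            extra["index"] = repeat_idx
            out.append(item)
    return out


rows = [
    {"prompt": "2+2?"},
    {"prompt": "3+3?", "extra_info": {"split": "train"}},
]
expanded = add_repeat_indices(rows, n_repeats=3)
indices = [r["extra_info"]["index"] for r in expanded]
```

Existing `extra_info` keys are kept, and the input rows are left unmodified because each copy is deep-copied before the index is written.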

Notes

The GRPO advantage change affects the training signal because it fixes the normalization used before policy loss computation.
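As a rough illustration of group-wise normalization built on `torch.stack`, the sketch below normalizes each response score by the mean and standard deviation of its own response group. The function name `group_normalize`, the `group_ids` argument, and the `eps` value are assumptions for this example, not the code in verl/trainer/ppo/core_algos.py.

```python
import torch


def group_normalize(scores, group_ids, eps=1e-6):
    """Normalize per-response scores within each prompt group (illustrative only).

    scores: 1-D tensor with one scalar score per response.
    group_ids: list of hashable ids, one per response, naming its group.
    """
    groups = {}
    for i, gid in enumerate(group_ids):
        groups.setdefault(gid, []).append(scores[i])

    stats = {}
    for gid, vals in groups.items():
        stacked = torch.stack(vals)  # same dtype and device as `scores`
        mean = stacked.mean()
        # std of a single element is undefined, so fall back to zero
        std = stacked.std() if stacked.numel() > 1 else torch.zeros_like(mean)
        stats[gid] = (mean, std)

    out = torch.empty_like(scores)
    for i, gid in enumerate(group_ids):
        mean, std = stats[gid]
        out[i] = (scores[i] - mean) / (std + eps)
    return out


scores = torch.tensor([1.0, 3.0, 2.0, 2.0], dtype=torch.float64)
adv = group_normalize(scores, ["a", "a", "b", "b"])
```

Because `torch.stack` operates on the existing tensors, the result keeps the input dtype (`float64` here) and stays on the same device. The standard deviation is taken over the whole group, not over individual elements.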

The REINFORCE pg_clipfrac / pg_clipfrac_lower change only affects logged metrics. It does not change A, pg_losses, or pg_loss.
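A minimal sketch of a metrics-only clip fraction for a hard clamp is shown below. It counts how often the unclamped weight `w` lies outside the bounds among unmasked tokens. The function name `clamp_clip_fractions`, the `low`/`high` parameters, and the mapping of the two return values onto `pg_clipfrac` and `pg_clipfrac_lower` are assumptions for illustration.

```python
import torch


def clamp_clip_fractions(w, mask, low, high):
    """Fraction of unmasked tokens with w above `high` and below `low`.

    Illustrative only: computed from the unclamped w, used purely for logging.
    """
    mask = mask.to(w.dtype)
    denom = mask.sum().clamp(min=1.0)  # avoid division by zero on empty masks
    frac_above = ((w > high).to(w.dtype) * mask).sum() / denom
    frac_below = ((w < low).to(w.dtype) * mask).sum() / denom
    return frac_above, frac_below


w = torch.tensor([[0.5, 1.0, 1.5, 3.0]])
mask = torch.tensor([[1, 1, 1, 0]])  # last token is padding and is ignored
upper, lower = clamp_clip_fractions(w, mask, low=0.8, high=1.2)
```

In this toy batch one of three unmasked tokens exceeds the upper bound and one falls below the lower bound, so both fractions are one third. Nothing here feeds back into the loss, which matches the note that only logged metrics change.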

Validation

  • python -m py_compile process-data.py verl/trainer/ppo/core_algos.py
  • git diff --check
  • uvx ruff check process-data.py verl/trainer/ppo/core_algos.py

Not run: the full torch/pandas tests. The local Python environment in this temporary checkout lacks project dependencies such as torch and pandas.

@yifanzhang-pro yifanzhang-pro marked this pull request as ready for review April 29, 2026 00:34
@yifanzhang-pro
Member Author

/gemini review

@yifanzhang-pro yifanzhang-pro merged commit c0574e4 into main Apr 29, 2026
1 of 18 checks passed