Track the manual verification steps that were skipped on the per-dataset
normalization PR (#336) because they require a GPU runner / multi-GPU
node, beyond what CPU CI can exercise.
What's already covered
CPU CI (1188 passing tests, including new parametrised tests under tests/policies/test_normalize_per_dataset.py, tests/policies/test_save_pretrained_skip_stats.py, and tests/datasets/test_tagged_dataset.py) validates the unit-level
behaviour of:
the stacked (D, *feat_shape) Normalize/Unnormalize buffers and the
per-sample index_select + broadcast path,
the _TaggedDataset wrapper and default-collate batching,
the save-with-stats / save-without-stats round-trip through safetensors.safe_open.
The pytest -m "gpu" -n 0 subset (19 passed / 10 skipped / 1359
deselected) was also run on an internal GPU dev box during PR
preparation and is green on this branch.
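As a rough illustration of the stacked-buffer path listed above, the following is a minimal sketch and not the code from the PR. The class name `PerDatasetNormalize`, the mean/std normalization mode, and the `eps` term are assumptions made for the example; only the `(D, *feat_shape)` buffer layout and the per-sample `index_select` plus broadcast come from the description.

```python
import torch
from torch import nn


class PerDatasetNormalize(nn.Module):
    """Hypothetical stand-in: mean/std normalization with stacked per-dataset stats."""

    def __init__(self, mean: torch.Tensor, std: torch.Tensor, eps: float = 1e-8):
        super().__init__()
        # Both buffers have shape (D, *feat_shape): one row of stats per dataset.
        self.register_buffer("mean", mean)
        self.register_buffer("std", std)
        self.eps = eps

    def forward(self, x: torch.Tensor, dataset_index: torch.Tensor) -> torch.Tensor:
        # x: (B, ..., *feat_shape); dataset_index: (B,) integer tensor.
        mean = self.mean.index_select(0, dataset_index)  # (B, *feat_shape)
        std = self.std.index_select(0, dataset_index)    # (B, *feat_shape)
        # Insert singleton dims after the batch dim so the stats broadcast
        # over any extra leading dims of x (for example a time axis).
        extra = x.dim() - mean.dim()
        shape = (mean.shape[0],) + (1,) * extra + tuple(mean.shape[1:])
        return (x - mean.view(shape)) / (std.view(shape) + self.eps)


# Two datasets (D=2), feature shape (3,), batch of 4 with a time axis of 5.
mean = torch.tensor([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])
std = torch.tensor([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])
norm = PerDatasetNormalize(mean, std)
x = torch.full((4, 5, 3), 10.0)
dataset_index = torch.tensor([0, 1, 1, 0])
out = norm(x, dataset_index)
```

Because the gather is a purely local tensor op, samples from different datasets can share a batch without any extra communication between ranks.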
What still needs a real GPU runner
Smoke training run on a small config (e.g. configs/examples/pi05_training_config.json with steps=40, or configs/dev/dev_config.json with steps=2):
Confirm forward does not trip the new inf-assertion when the
dataloader pipeline emits dataset_index and dataset_repo_id.
Confirm the per-dataset validation dataloaders iterate without KeyError("dataset_index").
Inspect model.safetensors after a save_normalization_stats=false
save and assert no normalize_*.buffer_* / unnormalize_*.buffer_* keys made it to disk:
python -c "from safetensors import safe_open; \
f=safe_open('outputs/.../checkpoints/last/pretrained_model/model.safetensors','pt'); \
keys=list(f.keys()); \
assert not any('normalize_inputs.buffer' in k for k in keys), keys"
Re-load via make_policy(cfg, ds_meta=mixture.meta) and verify _inject_stats repopulates the buffers and forward succeeds.
Determinism check (per CLAUDE.md rule #3): run the smoke config
twice with seed=0 on a single GPU and diff the per-step loss
series — confirm it is bit-identical. Required because this PR
touches policies/normalize.py, every pi policy's forward/sample_actions, and the datasets pipeline.
Distributed sanity under DDP and DeepSpeed ZeRO-2: launch the
smoke config with --num_processes=2 and --num_processes=8 and
watch for any NCCL desync. The new index_select is a local op
with no new collectives, so the risk is low, but it is worth
confirming on the real backend.
Nightly regression suite (regression_test.yml runs on
g6.12xlarge). If pi05 / pi07 short training runs land within their
historical loss envelopes after this change, that's the strongest
signal that the per-dataset path doesn't silently regress a
single-dataset config (which is the most common case today).
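The checkpoint inspection step above checks only for `normalize_inputs.buffer`. A slightly broader check that covers both the `normalize_*.buffer_*` and `unnormalize_*.buffer_*` patterns could look like the sketch below. The helper name `find_stat_keys` and the example key names are hypothetical, and the exact key layout in the checkpoint is an assumption; in practice the key list would come from `safe_open(...).keys()`.

```python
from fnmatch import fnmatchcase

# One glob covers both families from the issue text: the leading "*" matches
# an optional module prefix as well as the "un" in "unnormalize_".
STAT_KEY_GLOB = "*normalize_*.buffer_*"


def find_stat_keys(keys):
    """Return the sorted keys that look like normalization-stat buffers."""
    return sorted(k for k in keys if fnmatchcase(k, STAT_KEY_GLOB))


# Hypothetical key names, for illustration only.
example_keys = [
    "model.layers.0.weight",
    "normalize_inputs.buffer_observation_state.mean",
    "unnormalize_outputs.buffer_action.std",
]
leaked = find_stat_keys(example_keys)

# With a real checkpoint (path elided as in the issue), usage would be roughly:
#   from safetensors import safe_open
#   with safe_open(path, framework="pt") as f:
#       assert not find_stat_keys(f.keys())
```

Returning the offending keys rather than a boolean makes the failure message show exactly which buffers leaked into the file.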
Why this is split out
The CPU CI subset cannot exercise these checks: they need real CUDA
collectives and a real training step. Splitting them out keeps the PR
shippable on green CPU CI while the heavier checks happen on a runner
that has access to a multi-GPU node.
Done when
pytest -m "gpu" -n 0 passes on a CUDA box.
Smoke train on configs/dev/dev_config.json (2 steps) succeeds.
Smoke train with --policy.save_normalization_stats=false produces
a safetensors file with no normalize_*.buffer_* keys; reload via make_policy(..., ds_meta=...) succeeds.
Seeded determinism: two runs with seed=0 produce bit-identical
losses.
DDP --num_processes=2 smoke run completes without NCCL desync.
Nightly regression suite is green on the merged commit.
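For the seeded determinism criterion, one way to compare the two runs is sketched below. It assumes the per-step losses from each run have already been collected into lists of Python floats; how they are logged is not specified here, and the helper name `first_divergence` is hypothetical.

```python
import struct


def first_divergence(losses_a, losses_b):
    """Return the first step where two loss series differ bitwise, else None."""
    for step, (a, b) in enumerate(zip(losses_a, losses_b)):
        # Compare IEEE-754 bit patterns rather than using ==, so that
        # 0.0 vs -0.0 counts as a difference and NaN vs NaN as a match.
        # Widening a float32 loss to a Python float is exact, so this is
        # still a bit-level comparison of the original values.
        if struct.pack("<d", a) != struct.pack("<d", b):
            return step
    if len(losses_a) != len(losses_b):
        # One run logged more steps than the other.
        return min(len(losses_a), len(losses_b))
    return None


run_a = [0.5, 0.25, 0.125]
run_b = [0.5, 0.25, 0.125]
identical = first_divergence(run_a, run_b) is None
```

Reporting the first diverging step, instead of a plain pass/fail, helps narrow down where nondeterminism enters the run.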