Verify per-dataset normalization PR end-to-end on GPU #335

@shuheng-liu

Track the manual verification steps that were skipped on the per-dataset
normalization PR (#336) because they require a GPU runner / multi-GPU
node, beyond what CPU CI can exercise.

What's already covered

CPU CI (1188 passing tests, including new parametrised tests under
tests/policies/test_normalize_per_dataset.py,
tests/policies/test_save_pretrained_skip_stats.py, and
tests/datasets/test_tagged_dataset.py) validates the unit-level
behaviour of:

  • the stacked (D, *feat_shape) Normalize/Unnormalize buffers and the
    per-sample index_select + broadcast path,
  • the _TaggedDataset wrapper and default-collate batching,
  • the save-with-stats / save-without-stats round-trip through
    safetensors.safe_open.
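
For reference while checking outputs by hand, the per-sample lookup can be written out in plain Python. This is only a sketch of the semantics, not the code in policies/normalize.py: the real path uses stacked tensor buffers with index_select and broadcasting, and the function name, the mean/std formula, and the eps guard here are all assumptions made for illustration.

```python
from typing import List, Sequence


def normalize_per_dataset(
    batch: Sequence[Sequence[float]],
    dataset_index: Sequence[int],
    mean: Sequence[Sequence[float]],  # shape (D, F): one row of stats per dataset
    std: Sequence[Sequence[float]],   # shape (D, F)
    eps: float = 1e-8,                # assumed guard against a zero std
) -> List[List[float]]:
    """Normalize each sample with the stats row selected by its dataset index."""
    if len(batch) != len(dataset_index):
        raise ValueError("batch and dataset_index must have the same length")
    out: List[List[float]] = []
    for sample, d in zip(batch, dataset_index):
        if not 0 <= d < len(mean):
            raise IndexError(f"dataset_index {d} is outside [0, {len(mean)})")
        # Row lookup stands in for index_select; the per-feature loop stands in
        # for broadcasting over the feature dimension.
        out.append(
            [(x - mu) / (sigma + eps) for x, mu, sigma in zip(sample, mean[d], std[d])]
        )
    return out
```

Two samples drawn from different datasets should come out normalized with different rows of the stats table, which is the property the GPU smoke run exercises end to end.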

The pytest -m "gpu" -n 0 subset (19 passed / 10 skipped / 1359
deselected) was also run on an internal GPU dev box during PR
preparation and is green on this branch.

What still needs a real GPU runner

  • Smoke training run on a small config (e.g.
    configs/examples/pi05_training_config.json with steps=40, or
    configs/dev/dev_config.json with steps=2):

    • Confirm forward does not trip the new inf-assertion when the
      dataloader pipeline emits dataset_index and dataset_repo_id.
    • Confirm the per-dataset validation dataloaders iterate without
      KeyError("dataset_index").
    • Inspect model.safetensors after a save_normalization_stats=false
      save and assert no normalize_*.buffer_* /
      unnormalize_*.buffer_* keys made it to disk:
      python -c "from safetensors import safe_open; \
                 f=safe_open('outputs/.../checkpoints/last/pretrained_model/model.safetensors','pt'); \
                 keys=list(f.keys()); \
                 bad=[k for k in keys if 'normalize_' in k and '.buffer_' in k]; \
                 assert not bad, bad"
    • Re-load via make_policy(cfg, ds_meta=mixture.meta) and verify
      _inject_stats repopulates the buffers and forward succeeds.
  • Determinism check (per CLAUDE.md rule #3): run the smoke config
    twice with seed=0 on a single GPU and diff the per-step loss
    series; confirm it is bit-identical. Required because this PR
    touches policies/normalize.py, every pi policy's
    forward/sample_actions, and the datasets pipeline.

  • Distributed sanity under DDP and DeepSpeed ZeRO-2: launch the
    smoke config with --num_processes=2 and --num_processes=8,
    watch for any NCCL desync — the new index_select is a local op
    with no new collectives, so the risk is low, but worth confirming
    on the real backend.

  • Nightly regression suite (regression_test.yml runs on
    g6.12xlarge). If pi05 / pi07 short training runs land within their
    historical loss envelopes after this change, that's the strongest
    signal that the per-dataset path doesn't silently regress a
    single-dataset config (which is the most common case today).
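
The checkpoint key check in the smoke-run item can also be expressed as a small helper that covers both the normalize_* and unnormalize_* prefixes. This is a minimal sketch: the function name is hypothetical, and the key layout (a normalize_<name> or unnormalize_<name> module followed by a buffer_<feature> component) is assumed from the patterns quoted above rather than taken from the code.

```python
import re
from typing import Iterable, List

# Matches a module path component such as "normalize_inputs" or
# "unnormalize_outputs" that is directly followed by a "buffer_..." component.
_STAT_BUFFER = re.compile(r"(?:^|\.)(?:un)?normalize_[^.]+\.buffer_")


def find_stat_buffer_keys(keys: Iterable[str]) -> List[str]:
    """Return the keys that look like normalization stat buffers."""
    return [k for k in keys if _STAT_BUFFER.search(k)]
```

With safetensors installed, the keys would come from `with safe_open(path, framework="pt") as f: keys = list(f.keys())`, and a checkpoint saved with save_normalization_stats=false should make `find_stat_buffer_keys(keys)` return an empty list.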

Why this is split out

CPU CI cannot exercise these checks: they need real CUDA collectives
and a real training step. Splitting them out keeps the PR shippable
on green CPU CI while the heavier checks run on a runner that has
access to a multi-GPU node.

Done when

  • pytest -m "gpu" -n 0 passes on a CUDA box.
  • Smoke train on configs/dev/dev_config.json (2 steps) succeeds.
  • Smoke train with --policy.save_normalization_stats=false produces
    a safetensors file with no normalize_*.buffer_* keys; reload via
    make_policy(..., ds_meta=...) succeeds.
  • Seeded determinism: two runs with seed=0 produce bit-identical
    losses.
  • DDP --num_processes=2 smoke run completes without NCCL desync.
  • Nightly regression suite is green on the merged commit.
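
For the seeded determinism item, "bit-identical" is stricter than float equality, so the comparison is worth spelling out. The sketch below assumes the per-step losses from each run have already been collected as full-precision Python floats; how they are extracted from the training logs is not covered here, and the function names are hypothetical.

```python
import struct
from typing import List, Sequence, Tuple


def float_bits(x: float) -> bytes:
    """Return the IEEE 754 double-precision encoding of x."""
    return struct.pack("<d", x)


def diff_loss_series(
    run_a: Sequence[float], run_b: Sequence[float]
) -> List[Tuple[int, float, float]]:
    """Return (step, loss_a, loss_b) for every step whose losses differ bitwise."""
    if len(run_a) != len(run_b):
        raise ValueError(f"step counts differ: {len(run_a)} vs {len(run_b)}")
    return [
        (step, a, b)
        for step, (a, b) in enumerate(zip(run_a, run_b))
        if float_bits(a) != float_bits(b)
    ]
```

An empty result means the two runs match at every step. Comparing the encoded bytes rather than using `==` treats 0.0 and -0.0 as different and treats two NaNs with the same bit pattern as equal, which is the behaviour a bit-identical check needs. If the losses are only available as rounded text in the logs, the check is correspondingly weaker.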
