cp: test: Update on-policy distillation release tests (1363) into r0.4.0#1376
cp: test: Update on-policy distillation release tests (1363) into r0.4.0#1376
test: Update on-policy distillation release tests (1363) into r0.4.0#1376Conversation
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
📝 WalkthroughWalkthroughThis PR updates and restructures multiple distillation example configurations and test scripts for Qwen model distillation. Changes include increasing validation batch sizes, removing training/generation batch size and scheduler block parameters from policy/teacher configurations, updating test success criteria with stricter loss thresholds and validation accuracy checks, and consolidating/removing deprecated configuration files and test entries. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes The changes span multiple files with a consistent pattern of configuration restructuring (removal of batch size and scheduler fields across policy/teacher blocks). While the repetition across files reduces individual reasoning effort per file, the heterogeneity of changes (config updates, new additions, deletions, test metric modifications) and the need to verify test success criteria alignment require moderate review complexity. Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/test_suites/release.txt (1)
46-49: Fix typos and hyphenation in the comment line.Use “20-step” and fix “seqence” -> “sequence”.
-# 20 step functional tests on dynamic batching, non-colocated and seqence packing features +# 20-step functional tests on dynamic batching, non-colocated and sequence packing featuresexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml (1)
12-20: Explicitly set TP=2 for policy to match fsdp2tp2.Currently only context_parallel_size is set. Add either policy.dtensor_cfg.tensor_parallel_size: 2 or generation.vllm_cfg.tensor_parallel_size: 2 (matching the long recipe).
Example (match long recipe):
policy: model_name: Qwen/Qwen3-4B-Base dtensor_cfg: context_parallel_size: 1 + generation: + vllm_cfg: + tensor_parallel_size: 2 dynamic_batching: enabled: false
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yaml(1 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml(1 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml(1 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml(1 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml(0 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml(0 hunks)examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.yaml(0 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh(2 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh(2 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh(2 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh(2 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.sh(0 hunks)tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.sh(0 hunks)tests/test_suites/release.txt(1 hunks)
💤 Files with no reviewable changes (5)
- examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.yaml
- examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml
- examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml
- tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.sh
- tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.sh
🧰 Additional context used
📓 Path-based instructions (7)
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Useuv runto execute Python scripts in shell/driver scripts instead of activating virtualenvs and callingpythondirectly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Files:
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh
tests/test_suites/llm/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension
Files:
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh
tests/test_suites/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place driver shell scripts and common.env under tests/test_suites// and list nightly tests in tests/test_suites/nightly.txt
Files:
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.shtests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.shtests/test_suites/release.txt
examples/configs/recipes/**/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml
examples/configs/recipes/llm/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
LLM recipe YAML filenames must follow: --ng-[-modifiers][-long][.vN].yaml
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml
examples/configs/recipes/**/*.{yaml,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml
examples/configs/recipes/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place recipe YAMLs under examples/configs/recipes//
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yamlexamples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml
🧠 Learnings (1)
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.
Applied to files:
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh
🪛 LanguageTool
tests/test_suites/release.txt
[grammar] ~43-~43: There might be a mistake here.
Context: ... ################ # Long 4b convergence tests/test_suites/llm/distillation-qwen3...
(QB_NEW_EN)
[grammar] ~46-~46: Use a hyphen to join words.
Context: ...-4b-base-2n8g-fsdp2tp2-long.v1.sh # 20 step functional tests on dynamic batchin...
(QB_NEW_EN_HYPHEN)
[grammar] ~46-~46: Ensure spelling is correct
Context: ... on dynamic batching, non-colocated and seqence packing features tests/test_suites/llm/...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~47-~47: There might be a mistake here.
Context: ...4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh tests/test_suites/llm/distillation-qwen3...
(QB_NEW_EN)
[grammar] ~48-~48: There might be a mistake here.
Context: ...4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh tests/test_suites/llm/distillation-qwen3...
(QB_NEW_EN)
[grammar] ~49-~49: There might be a mistake here.
Context: ...b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh
(QB_NEW_EN)
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: CI quality check
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (14)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml (1)
5-7: All checks passed; no changes needed.
val_batch_sizeis used globally by the dataloader andmax_batches = max_val_samples // val_batch_sizeyields one eval batch as intended.examples/configs/distillation_math.yamlprovidesoptimizerandschedulervia thePOLICY_BASEanchor, inherited by both policy and teacher.- Recipe’s
checkpointing.save_period=50versusmax_num_steps=20means no checkpoint is written—acceptable for fast release tests.tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh (2)
10-10: NUM_MINUTES is OK; consumed by infra.SC2034 can be ignored here; standard test infra reads this var from common.env/external tooling.
Based on learnings
38-40: Stricter thresholds look good; verify metric key presence at step 20.Ensure validation runs at step 20 so data["validation/accuracy"]["20"] exists (val_period=10 in the YAML).
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh (2)
10-10: NUM_MINUTES is OK; consumed by infra.SC2034 can be ignored for this suite variable.
Based on learnings
38-40: Tightened loss + added val acc check are reasonable.Confirm val_period ensures a step-20 validation to populate the metric.
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yaml (2)
5-5: val_batch_size=256: LGTM.Matches test cadence; improves validation throughput.
12-22: Add explicit tensor_parallel_size: 2 under policy.dtensor_cfg. The base distillation_math.yaml defines &DTENSOR_BASE with tensor_parallel_size: 2, but policy.dtensor_cfg only overrides context_parallel_size—which won’t inherit root values. Appendtensor_parallel_size: 2to policy.dtensor_cfg to match the fsdp2tp2 naming.tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh (2)
10-10: NUM_MINUTES is OK; consumed by infra.SC2034 can be ignored for this standard suite variable.
Based on learnings
38-40: Added validation accuracy check is good; ensure val at step 20.Confirms training quality; just verify val_period hits step 20 in the YAML.
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh (2)
7-11: Run budget reductions are sensible for release cadence.New STEPS_PER_RUN/MAX_STEPS/NUM_MINUTES align with tightened criteria.
Based on learnings
38-40: Tightened loss and added val acc at step 100: good signal.Works with val_period=50; step 100 will have validation logged.
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml (1)
31-37: Cluster=num_nodes=2 aligns with 2n8g tests.Config/test naming consistency looks good.
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml (2)
5-11: Validation and checkpoint cadence improvements look good.val_batch_size=512 and save_period=10 suit the tightened test criteria.
14-17: Setting generation.vllm_cfg.tensor_parallel_size=2 matches fsdp2tp2.Good alignment with file naming/strategy.
…r0.4.0` (#1376) Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com> Co-authored-by: alexchiu <alexq@nvidia.com>
beep boop [🤖]: Hi @zpqiu 👋,
Summary by CodeRabbit
New Features
Bug Fixes
Chores