cp: `test: Update on-policy distillation release tests (1363)` into `r0.4.0` by chtruong814 · Pull Request #1376 · NVIDIA-NeMo/RL

chtruong814 · 2025-10-16T09:13:49Z

beep boop [🤖]: Hi @zpqiu 👋,

we've cherry-picked #1363 into `r0.4.0` for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

New Features
- Added new distillation configuration example for sequence packing workflows.
Bug Fixes
- Updated validation accuracy thresholds and loss metrics in test suites for improved reliability.
Chores
- Streamlined example distillation configurations by removing redundant batch size and optimizer settings.
- Optimized test runtime parameters and reorganized test suite entries for better maintainability.
- Removed deprecated distillation configuration examples.

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

coderabbitai · 2025-10-16T09:14:17Z

📝 Walkthrough

This PR updates and restructures multiple distillation example configurations and test scripts for Qwen model distillation. Changes include increasing validation batch sizes, removing training/generation batch size and scheduler block parameters from policy/teacher configurations, updating test success criteria with stricter loss thresholds and validation accuracy checks, and consolidating/removing deprecated configuration files and test entries.
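
The removal pattern described above can be checked mechanically. The following is a minimal Python sketch, assuming a recipe has already been loaded into a plain dict; the helper name `leftover_keys` and the sample values other than `val_batch_size` are hypothetical and are not taken from the PR.

```python
# Keys the walkthrough reports as removed from both the policy and teacher sections.
REMOVED_KEYS = ("train_global_batch_size", "generation_batch_size", "scheduler")


def leftover_keys(config):
    """Return dotted paths of removed keys that still appear in the config."""
    found = []
    for section in ("policy", "teacher"):
        block = config.get(section, {})
        for key in REMOVED_KEYS:
            if key in block:
                found.append(f"{section}.{key}")
    return found


# Hypothetical recipe contents, for illustration only.
old_style = {
    "distillation": {"val_batch_size": 32},
    "policy": {"train_global_batch_size": 64, "scheduler": [{"name": "example"}]},
    "teacher": {"generation_batch_size": 64},
}
new_style = {"distillation": {"val_batch_size": 256}, "policy": {}, "teacher": {}}

print(leftover_keys(old_style))
print(leftover_keys(new_style))
```

An empty list means the recipe no longer carries any of the removed fields in either section.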

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Config updates: batch size & scheduler removal<br>`examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yaml`, `examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml` | Increased `distillation.val_batch_size` (32→256), removed `policy.train_global_batch_size`, `policy.generation_batch_size`, `policy.scheduler` block and equivalent fields from `teacher` section. |
| Config major restructuring<br>`examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml` | Increased `val_batch_size` (32→512), removed `max_val_samples`, added `loss_fn` block (kl_type: reverse), reduced `checkpointing.save_period` (50→10), updated `max_total_sequence_length` values, removed optimizer/scheduler/batching configs from policy/teacher, added `generation.vllm_cfg.tensor_parallel_size`. |
| New config file<br>`examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml` | Added complete distillation configuration with sequence packing enabled, tensor parallel settings, and logging paths. |
| Deleted config files<br>`examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml`, `examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml`, `examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.yaml` | Removed entire YAML configuration files, eliminating distillation pipeline configurations. |
| Test script metric & timing updates<br>`tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh`, `tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh`, `tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh`, `tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh` | Reduced time budgets (NUM_MINUTES: 240→120 or 1200→240), tightened training loss thresholds, added `validation/accuracy` checks, removed GPU memory usage constraints. |
| Deleted test scripts<br>`tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.sh`, `tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.sh` | Removed entire Bash test scripts. |
| Release manifest update<br>`tests/test_suites/release.txt` | Removed 8B convergence distillation test entries, updated section heading from "Long 4b and 8b convergence" to "Long 4b convergence", replaced instruct-focused test with base seqpack test entry. |
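
The test-script row above mentions tightened training-loss thresholds and new `validation/accuracy` checks. The following is a minimal Python sketch of that kind of pass/fail gate, assuming metrics are available as a dict mapping metric name to `{step: value}`. The function name `check_run`, the `train/loss` key, and all numbers are hypothetical; the PR's actual thresholds are not shown on this page.

```python
def check_run(metrics, max_final_loss, min_val_accuracy):
    """Return a list of failure messages; an empty list means the run passes."""
    failures = []

    # Gate 1: the loss at the last logged step must be below the threshold.
    loss = metrics.get("train/loss", {})  # hypothetical metric key
    if not loss:
        failures.append("train/loss was never logged")
    else:
        final_loss = loss[max(loss)]
        if final_loss >= max_final_loss:
            failures.append(f"final train/loss {final_loss} is not below {max_final_loss}")

    # Gate 2: the last logged validation accuracy must reach the floor.
    acc = metrics.get("validation/accuracy", {})
    if not acc:
        failures.append("validation/accuracy was never logged")
    else:
        final_acc = acc[max(acc)]
        if final_acc < min_val_accuracy:
            failures.append(f"final validation/accuracy {final_acc} is below {min_val_accuracy}")

    return failures


# Illustrative numbers only.
metrics = {
    "train/loss": {1: 1.2, 20: 0.4},
    "validation/accuracy": {10: 0.1, 20: 0.3},
}
print(check_run(metrics, max_final_loss=0.5, min_val_accuracy=0.25))
print(check_run(metrics, max_final_loss=0.3, min_val_accuracy=0.25))
```

Checking accuracy alongside loss catches runs where the loss falls while validation quality does not improve.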

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes span multiple files with a consistent pattern of configuration restructuring (removal of batch size and scheduler fields across policy/teacher blocks). While the repetition across files reduces individual reasoning effort per file, the heterogeneity of changes (config updates, new additions, deletions, test metric modifications) and the need to verify test success criteria alignment require moderate review complexity.

Possibly related PRs

- feat: add on policy distillation algorithm #1006: Introduces the on-policy distillation feature that these config and test updates directly extend and refactor.
- test: Update on-policy distillation release tests #1363: Applies identical code-level changes to the same YAML and test script files (batch size increases, scheduler/batch parameter removals).
- feat: add config_cli.py and refactor configs + config pre-commit #1024: Performs the same config-level refactor pattern (removal of train_global_batch_size, generation_batch_size, scheduler blocks) as part of configuration minimization.

Suggested labels

r0.4.0

Suggested reviewers

terrykong

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title Check | ⚠️ Warning | The title includes explicit cherry-pick metadata and branch references rather than clearly describing the update to on-policy distillation release tests. It does not succinctly convey the main content of the changeset and adds unnecessary noise. A concise title focusing on the test updates would make the purpose clearer. | Please revise the title to a concise summary of the change, for example “Update on-policy distillation release tests,” and remove the cherry-pick and branch references. |
| Test Results For Major Changes | ⚠️ Warning | The PR makes extensive changes to distillation configurations and test scripts that directly affect convergence criteria and performance thresholds, but its description contains only a generic cherry-pick notice without any evidence of validation, regression tests, or performance benchmarks; therefore it lacks the required documentation of test results to ensure that these major changes do not introduce regressions. | Please update the PR description to include detailed test results or validation data, such as before-and-after loss and accuracy metrics, convergence checks, and the specific configurations used, to demonstrate that the revised batch sizes, scheduler removals, and threshold updates maintain or improve numerical stability and performance. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

tests/test_suites/release.txt (1)
46-49: Fix typos and hyphenation in the comment line.

Use “20-step” and fix “seqence” -> “sequence”.
-# 20 step functional tests on dynamic batching, non-colocated and seqence packing features
+# 20-step functional tests on dynamic batching, non-colocated and sequence packing features

examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml (1)
12-20: Explicitly set TP=2 for policy to match fsdp2tp2.

Currently only `context_parallel_size` is set. Add either `policy.dtensor_cfg.tensor_parallel_size: 2` or `generation.vllm_cfg.tensor_parallel_size: 2` (matching the long recipe).

Example (match long recipe):
 policy:
   model_name: Qwen/Qwen3-4B-Base
   dtensor_cfg:
     context_parallel_size: 1
+  generation:
+    vllm_cfg:
+      tensor_parallel_size: 2
   dynamic_batching:
     enabled: false
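
The suggestion above can be turned into a small lint. The following is a minimal Python sketch, assuming the recipe has been loaded into a nested dict and that `generation` sits under `policy`, as the diff places it. The helper names `get_path` and `explicit_tp_settings` are hypothetical and are not part of NeMo RL.

```python
def get_path(config, dotted_path):
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    node = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# The two locations named in the review comment (generation assumed nested under policy).
TP_PATHS = (
    "policy.dtensor_cfg.tensor_parallel_size",
    "policy.generation.vllm_cfg.tensor_parallel_size",
)


def explicit_tp_settings(config):
    """Return every explicitly set tensor-parallel size, keyed by its dotted path."""
    found = {}
    for path in TP_PATHS:
        value = get_path(config, path)
        if value is not None:
            found[path] = value
    return found


# The recipe as shown in the diff, before and after the suggested addition.
before = {
    "policy": {
        "model_name": "Qwen/Qwen3-4B-Base",
        "dtensor_cfg": {"context_parallel_size": 1},
        "dynamic_batching": {"enabled": False},
    }
}
after = {
    "policy": {
        **before["policy"],
        "generation": {"vllm_cfg": {"tensor_parallel_size": 2}},
    }
}

print(explicit_tp_settings(before))
print(explicit_tp_settings(after))
```

An empty result for a recipe whose filename advertises `tp2` marks the case the reviewer flags: the parallelism relies on defaults rather than an explicit setting.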

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b81f38 and 86df9f1.

📒 Files selected for processing (14)

examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.yaml (1 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml (1 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.yaml (1 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.yaml (1 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml (0 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml (0 hunks)
examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.yaml (0 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-1n8g-fsdp2tp2-dynamicbatch.v1.sh (2 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh (2 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-seqpack.v1.sh (2 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp8-noncolocated.v1.sh (2 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.sh (0 hunks)
tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.sh (0 hunks)
tests/test_suites/release.txt (1 hunks)

💤 Files with no reviewable changes (5)

examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.yaml
examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml
examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml
tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-4n8g-fsdp2tp8-long.v1.sh
tests/test_suites/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.sh

🧰 Additional context used

📓 Path-based instructions (7)

**/*.sh