
feat(distillation): add TransferQueue support for On-Policy Distillation #2580

Open
pthombre wants to merge 4 commits into main from pthombre/opd_tq_support


pthombre (Contributor) commented May 27, 2026

What does this PR do?

Adds TransferQueue-backed on-policy distillation so teacher top-k logits can stay in the data plane instead of being returned through the driver.

This PR extends the data-plane support that already exists for GRPO to the distillation workflow. When data_plane.enabled=true, examples/run_distillation.py now builds the student policy as a TQPolicy and dispatches to a new TQ-aware distillation trainer. The legacy distillation path remains the default when the data plane is disabled.
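The dispatch described above can be pictured as a small selector keyed on `data_plane.enabled`. This is a minimal, hypothetical sketch, not the code in examples/run_distillation.py: `select_distillation_trainer` and the two stub trainers are invented names, and the config is assumed to be a plain nested mapping. It also ignores the TQPolicy construction step and models only the trainer choice.

```python
from typing import Any, Callable, Mapping

Config = Mapping[str, Any]


# Hypothetical stand-ins for the two trainers; the PR's real entry points are not shown here.
def legacy_distillation_train(config: Config) -> str:
    return "legacy"


def tq_distillation_train(config: Config) -> str:
    return "tq"


def select_distillation_trainer(config: Config) -> Callable[[Config], str]:
    """Return the TQ-aware trainer only when data_plane.enabled is truthy (assumed layout)."""
    data_plane = config.get("data_plane") or {}
    if data_plane.get("enabled", False):
        return tq_distillation_train
    # The legacy path stays the default, including when the data_plane block is absent.
    return legacy_distillation_train


print(select_distillation_trainer({"data_plane": {"enabled": True}})({}))  # tq
print(select_distillation_trainer({})({}))  # legacy
```

Falling back to the legacy trainer when the block is missing or disabled mirrors the PR's statement that the legacy distillation path remains the default.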

Key pieces added:

  • A new distillation_train_sync implementation that orchestrates rollout, teacher top-k inference, and student training through TransferQueue metadata.
  • A DistillationRolloutActor that runs generation/rollout processing and seeds the TQ partition with row-aligned distillation training tensors.
  • Worker-side get_topk_logits_presharded support so teacher workers write teacher_topk_logits and teacher_topk_indices back to TQ directly.
  • Distillation-specific data-plane schema fields for teacher top-k seed inputs and train-time top-k columns.
  • TQPolicy extensions for custom partition fields, consumer task names, and custom train fetch fields so the same policy wrapper can serve GRPO and distillation.
  • Transport-volume metrics for teacher top-k payloads, valid-token payloads, padding overhead, TQ write volume, and driver bytes avoided.
  • A TQ exemplar config and a runnable recipe/test-suite script for Qwen3 32B-to-1.7B distillation.
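The padding-overhead metric can be illustrated with a back-of-the-envelope estimate of the teacher top-k payload. This sketch is not the PR's transport_metrics module. The helper name `estimate_topk_transport`, the float32 logits and int64 indices dtypes, and the right-padded `[batch, padded_len, k]` layout are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed dtypes: float32 teacher logits and int64 vocabulary indices.
LOGIT_BYTES = 4
INDEX_BYTES = 8


@dataclass
class TopkTransportEstimate:
    padded_bytes: int  # bytes when every row is padded to padded_len
    valid_bytes: int  # bytes for real (non-padding) token positions only

    @property
    def padding_overhead_bytes(self) -> int:
        return self.padded_bytes - self.valid_bytes


def estimate_topk_transport(seq_lens: list[int], padded_len: int, k: int) -> TopkTransportEstimate:
    """Estimate the teacher top-k payload (logits + indices) for a right-padded batch."""
    if any(n < 0 or n > padded_len for n in seq_lens):
        raise ValueError("each sequence length must lie in [0, padded_len]")
    per_token = k * (LOGIT_BYTES + INDEX_BYTES)
    padded = len(seq_lens) * padded_len * per_token
    valid = sum(seq_lens) * per_token
    return TopkTransportEstimate(padded_bytes=padded, valid_bytes=valid)


est = estimate_topk_transport([100, 60], padded_len=128, k=64)
print(est.padded_bytes, est.valid_bytes, est.padding_overhead_bytes)  # 196608 122880 73728
```

In this toy batch, 37.5% of the padded payload is padding. Whether the PR's valid-token metric is computed exactly this way is an assumption.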

Issues

None.

Usage

Enable the TQ distillation path by using the new exemplar or recipe:

```bash
uv run examples/run_distillation.py \
  --config examples/configs/distillation_math_tq.yaml \
  data_plane.enabled=true
```

The recipe added in this PR can be launched through the test-suite wrapper:

```bash
tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1-tq.v1.sh
```

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

pthombre requested review from a team as code owners May 27, 2026 00:19
copy-pr-bot (Bot) commented May 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

pthombre changed the title from "feat(distillation): add TransferQueue support" to "feat(distillation): add TransferQueue support for On-Policy Distillation" on May 27, 2026
pthombre force-pushed the pthombre/opd_tq_support branch from b0e5794 to 3b9e56b on May 27, 2026 00:34
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
pthombre force-pushed the pthombre/opd_tq_support branch from 3b9e56b to 5ce45ca on June 24, 2026 00:45
pthombre (Contributor, Author) commented:

/ok to test 5ce45ca

pthombre added the labels CI (Relating to CI) and CI:L1 (Run doctests, unit tests, and functional tests) on Jun 24, 2026
pthombre (Contributor, Author) commented:

/ok to test 5ce45ca

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
github-actions (Bot) removed the CI (Relating to CI) label on Jun 24, 2026
pthombre (Contributor, Author) commented:

/ok to test d2827ba

Add the data_plane block to the distillation_math reference config so
test_reference_configs_up_to_date matches the exemplar config that gained
the TransferQueue section.

Add unit tests for the pure helpers in distillation_sync (_dedupe_fields,
_as_row_aligned_tensor, _aggregate_teacher_topk_transport_results,
_packing_args_for_policy, _stamp_policy_pad_seqlen) and transport_metrics
(tensor_nbytes, valid_token_tensor_nbytes) to raise patch coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
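One of the helpers named in the commit above, `_dedupe_fields`, presumably merges field-name lists without repeats. A likely example is TQPolicy's default train fetch fields combined with the distillation-specific top-k columns. The sketch below is a guess at that behavior, not the PR's code. `dedupe_fields` is a hypothetical name, and the order-preserving semantics and the `input_ids`/`token_mask` field names are assumptions. Only `teacher_topk_logits` and `teacher_topk_indices` come from the PR description.

```python
from __future__ import annotations

from typing import Iterable


def dedupe_fields(*field_groups: Iterable[str]) -> list[str]:
    """Concatenate field-name groups, keeping only the first occurrence of each name."""
    merged: dict[str, None] = {}
    for group in field_groups:
        for name in group:
            # dicts preserve insertion order (Python 3.7+), so the first occurrence wins
            merged.setdefault(name, None)
    return list(merged)


base_fields = ["input_ids", "token_mask"]  # hypothetical default train fields
topk_fields = ["teacher_topk_logits", "teacher_topk_indices", "input_ids"]
print(dedupe_fields(base_fields, topk_fields))
# ['input_ids', 'token_mask', 'teacher_topk_logits', 'teacher_topk_indices']
```

Preserving first-seen order keeps the output deterministic, which makes a helper like this easy to unit test.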
pthombre (Contributor, Author) commented:

/ok to test ff3a57c

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

Labels

CI:L1 (Run doctests, unit tests, and functional tests)


1 participant