[megatron] rebuild weight conversion tasks per sync to prevent stale PP-collective caches with bucketing #1345
Merged
erictang000 merged 2 commits into NovaSky-AI:main on Mar 19, 2026
Conversation
devpatelio pushed a commit that referenced this pull request on Mar 20, 2026:
…PP-collective caches with bucketing (#1345)
Summary
- Bucketed weight sync reused `WeightConversionTask` objects (and their mapping caches) across sync cycles, causing incorrect vLLM weight updates for DeepSeek-V3 style models (such as Moonlight-16B-A3B or GLM-4.7-Flash) with PP > 1, for both LoRA and full finetuning. The mapping objects cache PP-collective metadata that becomes stale across train/offload/reload cycles. In practice this manifested as extremely unstable training and reward collapse for DeepSeek-V3 style models with Megatron.
- Fix: store only the bucket index structure (which task indices go in which bucket) once, and rebuild fresh tasks with clean mapping objects on each `extract_weights()` call. This preserves packed-broadcast performance while ensuring correct PP collectives on every sync; a minimal sketch of the pattern follows below.
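To make the caching change concrete, here is a minimal sketch of the pattern, not the actual implementation: only `WeightConversionTask` and `extract_weights()` are named in this PR, while `BucketedWeightSync`, the `mapping` field, and `bucket_size` are illustrative assumptions.

```python
# Illustrative sketch only; names other than WeightConversionTask and
# extract_weights() are hypothetical, not SkyRL's real API.
from dataclasses import dataclass, field


@dataclass
class WeightConversionTask:
    """One parameter conversion; `mapping` lazily caches PP-collective
    metadata that goes stale across train/offload/reload cycles."""
    param_name: str
    mapping: dict = field(default_factory=dict)  # per-task PP-collective cache


class BucketedWeightSync:
    def __init__(self, param_names, bucket_size):
        # Old approach (buggy): build tasks once here and reuse them, so each
        # task's mapping cache survived across sync cycles:
        #   self._tasks = [WeightConversionTask(n) for n in param_names]
        #
        # Fix: persist only the bucket *index* structure -- which task
        # indices belong to which bucket -- computed a single time.
        self._param_names = list(param_names)
        self._bucket_indices = [
            list(range(i, min(i + bucket_size, len(self._param_names))))
            for i in range(0, len(self._param_names), bucket_size)
        ]

    def extract_weights(self):
        # Rebuild fresh tasks (and thus clean mapping caches) on every sync,
        # while reusing the precomputed bucketing for packed broadcasts.
        tasks = [WeightConversionTask(n) for n in self._param_names]
        for bucket in self._bucket_indices:
            yield [tasks[i] for i in bucket]
```

The key design point: bucketing is a pure function of the parameter list, so it is safe to compute once, whereas the per-task mapping caches depend on the current PP-collective state and must be recreated each sync.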
Test plan
- [x] Moonlight-16B with PP=2: reward increases without significant weight sync time regression
Results

Moonlight-16B-A3B GSM8k
Before in purple, after the fix in blue:
<img width="315" height="246" alt="image" src="https://github.com/user-attachments/assets/9a7119af-aea8-4b81-888f-f05cd3865c99" />
<img width="318" height="242" alt="image" src="https://github.com/user-attachments/assets/b899766f-9a7e-43fc-98f9-2f856dc04c3c" />
Weight sync timing (~10s after, ~8s before):
<img width="317" height="246" alt="image" src="https://github.com/user-attachments/assets/ed46cdbc-9b95-4533-bb5f-416db56a2847" />

GLM-4.7-Flash GSM8k
Before in red, after the fix in tan (GLM with LoRA):
<img width="337" height="525" alt="image" src="https://github.com/user-attachments/assets/e647d6f4-42a9-4317-99d4-e3d1d940b038" />
Weight sync timing (~15s after, ~12s before):
<img width="331" height="239" alt="image" src="https://github.com/user-attachments/assets/cc98f4e9-314c-462c-ab7f-80b29bc96f70" />