Drop overlength SFT tensors before backend training #665

Merged
Kovbo merged 5 commits into main from kovbo/sft-truncate-overlength on Apr 29, 2026

@Kovbo (Collaborator) commented Apr 28, 2026

Fixes SFT failures caused by examples longer than the configured training context length.

Changes:

  • Drop overlength SFT trajectories inside tokenize_sft_batch.
  • Emit a console warning when trajectories are dropped.
  • Report num_dropped_trajectories from Unsloth and Megatron SFT metrics.
  • Show dropped trajectory count in ART SFT progress output.
  • Allow empty SFT batches as no-op batches, which can happen when all trajectories in a batch are dropped.
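The filter, warning, metric, and no-op behavior listed above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the code from this PR. `TokenizedExample`, `SFTBatch`, `drop_overlength`, `train_step`, and `max_seq_length` are hypothetical names, trajectories are assumed to be already tokenized into lists of token ids, and the console warning is approximated with Python's standard `warnings` module.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class TokenizedExample:
    """Hypothetical: one already-tokenized SFT trajectory."""

    input_ids: list[int]
    labels: list[int]


@dataclass
class SFTBatch:
    """Hypothetical: kept trajectories plus a count of dropped ones."""

    examples: list[TokenizedExample] = field(default_factory=list)
    num_dropped_trajectories: int = 0


def drop_overlength(
    examples: list[TokenizedExample], max_seq_length: int
) -> SFTBatch:
    """Keep only trajectories that fit in the training context length."""
    kept = [ex for ex in examples if len(ex.input_ids) <= max_seq_length]
    num_dropped = len(examples) - len(kept)
    if num_dropped:
        # Stand-in for the console warning; the PR's actual message is not shown.
        warnings.warn(
            f"Dropped {num_dropped} SFT trajectories longer than "
            f"{max_seq_length} tokens"
        )
    return SFTBatch(examples=kept, num_dropped_trajectories=num_dropped)


def train_step(batch: SFTBatch) -> dict[str, float]:
    """Hypothetical backend step that reports the dropped count as a metric."""
    metrics = {"num_dropped_trajectories": float(batch.num_dropped_trajectories)}
    if not batch.examples:
        # Every trajectory was dropped: treat the batch as a no-op
        # instead of failing on an empty batch.
        metrics["num_tokens"] = 0.0
        return metrics
    # A real backend would run the forward/backward pass here; this sketch
    # only counts the tokens it would have trained on.
    metrics["num_tokens"] = float(sum(len(ex.input_ids) for ex in batch.examples))
    return metrics


short = TokenizedExample(input_ids=[1, 2, 3], labels=[1, 2, 3])
long = TokenizedExample(input_ids=list(range(10)), labels=list(range(10)))
print(train_step(drop_overlength([short, long], max_seq_length=5)))
# {'num_dropped_trajectories': 1.0, 'num_tokens': 3.0}
```

Counting drops at the point where trajectories are filtered lets every backend (Unsloth or Megatron in this PR) report the same number without re-deriving it, and returning early on an empty batch keeps a fully filtered batch from reaching code that assumes at least one sequence.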

Kovbo requested review from FurtherAI and angkywilliam and removed the request for FurtherAI and angkywilliam, April 28, 2026 23:26
Kovbo marked this pull request as draft, April 28, 2026 23:35
Kovbo requested a review from angkywilliam, April 29, 2026 00:03
Kovbo marked this pull request as ready for review, April 29, 2026 00:03
Kovbo merged commit fd36cbc into main, Apr 29, 2026
5 checks passed

2 participants