
Fix streaming validation dataset causing infinite loop #43

Open
vominh1919 wants to merge 1 commit into PrimeIntellect-ai:main from vominh1919:fix-streaming-validation-infinite-loop

@vominh1919

Fixes #42

Problem

In train_diloco_torch.py, the validation split was loaded in the same streaming load_dataset call as the training split, so it was an IterableDataset. The evaluate_model function iterates over eval_dataloader with a for loop, which never terminates: a streaming dataset has no __len__ and never signals end-of-data.
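To make the distinction concrete, here is a toy sketch in plain Python. ToyStream and has_known_length are hypothetical names, not part of the repository; ToyStream only models the behaviour described above (iterable, no __len__, no end), while a list stands in for a non-streaming split.

```python
import itertools
from typing import Iterator


class ToyStream:
    """Hypothetical stand-in for a streaming dataset: iterable, but no __len__."""

    def __iter__(self) -> Iterator[int]:
        # Yields forever, modelling a stream that never signals end-of-data.
        return itertools.count()


def has_known_length(ds: object) -> bool:
    """Return True if len() works on ds, i.e. its size is known up front."""
    try:
        len(ds)
    except TypeError:
        return False
    return True


stream = ToyStream()
finite = list(range(5))  # stand-in for a non-streaming (map-style) split

print(has_known_length(stream))  # False
print(has_known_length(finite))  # True
# `for x in stream:` would never end; islice imposes an explicit bound.
print(list(itertools.islice(stream, 3)))  # [0, 1, 2]
```
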

Fix

  • Removed the validation split from the streaming load_dataset call, which now loads the training split only
  • Loaded the validation dataset separately, with streaming=False, and only when eval_steps is not None
  • Applied the same tokenization and collation to the non-streaming validation set
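The loading change can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from the PR: load_splits, load_fn, and preprocess are hypothetical names. In the real script, load_fn would be datasets.load_dataset (which accepts split= and streaming= keyword arguments), and preprocess would be the shared tokenization step; the actual dataset arguments used in train_diloco_torch.py are not shown here, so they are reduced to a single dataset_name.

```python
from typing import Any, Callable, Optional, Tuple


def load_splits(
    load_fn: Callable[..., Any],
    preprocess: Callable[[Any], Any],
    dataset_name: str,
    eval_steps: Optional[int],
) -> Tuple[Any, Optional[Any]]:
    """Hypothetical sketch: streaming train split, eager validation split."""
    # Training split only: streaming avoids downloading the full corpus.
    train_ds = preprocess(load_fn(dataset_name, split="train", streaming=True))

    # Validation is loaded only when evaluation is enabled, and with
    # streaming=False so it is finite with a known length.
    eval_ds = None
    if eval_steps is not None:
        eval_ds = preprocess(
            load_fn(dataset_name, split="validation", streaming=False)
        )
    return train_ds, eval_ds
```

Injecting load_fn keeps the sketch testable without network access, and passing both splits through the same preprocess callable mirrors the third bullet: training and validation get identical tokenization.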

This keeps training efficient with streaming while ensuring validation has a finite, known length.
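With a finite validation set, an evaluation loop of the kind described above ends once the data is exhausted. The sketch below is hypothetical (evaluate, loss_fn, and max_batches are not names from the repository). The optional max_batches cap is an extra defensive guard that is not part of this PR; it bounds the loop even if an unsized, streaming iterable is passed by mistake.

```python
import itertools
from typing import Callable, Iterable, Optional


def evaluate(
    eval_batches: Iterable,
    loss_fn: Callable[[object], float],
    max_batches: Optional[int] = None,
) -> float:
    """Hypothetical evaluation loop: average loss over the validation batches."""
    if max_batches is not None:
        # Optional cap (not in this PR): stop after max_batches batches.
        eval_batches = itertools.islice(eval_batches, max_batches)
    total, count = 0.0, 0
    for batch in eval_batches:
        total += loss_fn(batch)
        count += 1
    # NaN signals "no batches evaluated" rather than dividing by zero.
    return total / count if count else float("nan")


print(evaluate([1.0, 2.0, 3.0], float))  # 2.0 (finite data: loop ends)
print(evaluate(itertools.count(), float, max_batches=4))  # 1.5 (capped stream)
```
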

When streaming=True, the validation dataset is an IterableDataset with no __len__, causing evaluate_model to loop forever. Fix by loading validation separately with streaming=False while keeping training data streaming.

Fixes PrimeIntellect-ai#42

Development: successfully merging this pull request may close the issue "Streaming validation dataset will lead to infinite loop".