Allow float for repetitions #56

Merged
philipp-fischer merged 14 commits into develop from feature/fractional_repeat
Feb 7, 2025
Conversation

@philipp-fischer
Collaborator

Allow the repeat count to be a float, not just an integer. A dataset with 0.5 repetitions would end after half of its samples, and 2.5 repetitions would run for two and a half epochs.
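
To illustrate the intended behavior, here is a minimal sketch of a fractional repeat over an in-memory sequence. It is not the code from this PR: the function name `fractional_repeat` is hypothetical, and rounding the total sample count down with `math.floor` is an assumption about how a ratio that falls between two samples might be handled.

```python
import math
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def fractional_repeat(samples: Sequence[T], repeats: float) -> Iterator[T]:
    """Yield `samples` for `repeats` epochs, where `repeats` may be fractional.

    Hypothetical helper for illustration only. Assumption: a partial sample
    is dropped, i.e. the total sample count is rounded down.
    """
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    n = len(samples)
    # Total number of samples to emit across all (partial) epochs.
    total = math.floor(n * repeats)
    for i in range(total):
        # Wrap around to the start of the dataset after each full epoch.
        yield samples[i % n]


half_epoch = list(fractional_repeat(list(range(10)), 0.5))
two_and_a_half = list(fractional_repeat(list(range(4)), 2.5))
```

With these inputs, `half_epoch` holds the first five samples and `two_and_a_half` holds ten samples: two full passes followed by the first two samples again. Note that `n * repeats` is a floating-point product, so ratios that are not exactly representable may land one sample short under this rounding rule.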

Collaborator

@voegtlel voegtlel left a comment

Of course, this needs tests. Cases I see:

  • An odd-length dataset, where the ratio does not fall exactly on a sample boundary
  • An even-length dataset, where the ratio is exactly on the boundary (e.g. 10 samples in the dataset, 0.5 repetitions)
  • Saving during iteration, border cases: save just before the last sample of a repetition, and right after.
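
The cases above could be checked along the following lines. This is only a sketch under stated assumptions: `RepeatState`, its `save_state` method, and `resume_matches` are hypothetical stand-ins written for this example, not the wrapper in `repeat_dataset.py`, and the rounding-down rule for partial samples is assumed rather than taken from the PR.

```python
import math


class RepeatState:
    """Hypothetical stand-in for a repeat wrapper with a saveable position."""

    def __init__(self, samples, repeats, position=0):
        self.samples = list(samples)
        # Assumption: a partial sample at the end is dropped (round down).
        self.total = math.floor(len(self.samples) * repeats)
        self.position = position

    def __iter__(self):
        return self

    def __next__(self):
        if self.position >= self.total:
            raise StopIteration
        item = self.samples[self.position % len(self.samples)]
        self.position += 1
        return item

    def save_state(self):
        # The only state needed to resume is the number of samples emitted.
        return self.position


def resume_matches(samples, repeats, save_after):
    """Check that saving after `save_after` samples and resuming from the
    saved state reproduces an uninterrupted run."""
    reference = list(RepeatState(samples, repeats))
    first = RepeatState(samples, repeats)
    head = [next(first) for _ in range(save_after)]
    saved = first.save_state()
    tail = list(RepeatState(samples, repeats, position=saved))
    return head + tail == reference


# Odd-length dataset, ratio between two samples: 7 * 0.5 = 3.5 -> 3 samples.
odd_case = list(RepeatState(range(7), 0.5))
# Even-length dataset, ratio exactly on the boundary: 10 * 0.5 = 5 samples.
even_case = list(RepeatState(range(10), 0.5))
# Save just before and right after the last sample of the first repetition.
before_ok = resume_matches(range(7), 1.5, save_after=6)
after_ok = resume_matches(range(7), 1.5, save_after=7)
```

Keeping the saved state to a single emitted-sample counter makes the border cases easy to reason about, though the real wrapper may track more than that.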

Comment thread src/megatron/energon/wrappers/repeat_dataset.py Outdated
@philipp-fischer philipp-fischer marked this pull request as ready for review February 6, 2025 15:47
Comment thread src/megatron/energon/metadataset/metadataset_v2.py Outdated
Comment thread src/megatron/energon/wrappers/blend_dataset.py Outdated
Comment thread src/megatron/energon/wrappers/blend_dataset.py Outdated
Comment thread src/megatron/energon/wrappers/blend_dataset.py Outdated
Comment thread src/megatron/energon/wrappers/repeat_dataset.py
@philipp-fischer philipp-fischer merged commit cee4d39 into develop Feb 7, 2025
2 participants