Fix/sequence length power of 2#158

Merged
le1nux merged 21 commits into dev_experiments from fix/sequence_length_power_of_2 on Jun 30, 2024
Conversation

@le1nux
Member

@le1nux le1nux commented Jun 19, 2024

What does this PR do?

Previously, the block_size in the dataset was set to a power of two, resulting in the sequence length being block_size - 1. This is not best practice and can impact model training, e.g., throughput-wise.

As a fix, we now specify the sequence_length in the config instead of the block_size. During Dataset instantiation, we choose the block_size to be sequence_length + 1.

Previously, we would also chunk the dataset into block_size-long chunks, each of which was used for training individually. As a result, the last token of a block was only ever used as a target but never as an input. We changed this such that the last token of a block is reused as the first token (i.e., first input token) of the subsequent block.
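The chunking scheme described above can be sketched as follows. This is a minimal illustration, not the actual implementation; the function name `chunk_tokens` and the flat token list are hypothetical.

```python
def chunk_tokens(tokens: list[int], sequence_length: int) -> list[list[int]]:
    """Chunk a flat token sequence into overlapping blocks (illustrative sketch).

    Each block has block_size = sequence_length + 1 tokens, so it yields
    sequence_length inputs (block[:-1]) and sequence_length targets (block[1:]).
    """
    block_size = sequence_length + 1
    blocks = []
    start = 0
    while start + block_size <= len(tokens):
        blocks.append(tokens[start:start + block_size])
        # Advance by sequence_length, not block_size: the last token of this
        # block is reused as the first input token of the subsequent block.
        start += sequence_length
    return blocks
```

With the old scheme (advancing by block_size), the boundary token would appear only as a target; advancing by sequence_length makes every interior token serve both roles.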

General changes

  • nothing apart from points mentioned above

Breaking Changes

  • replaced block_size in Dataset, Model and NumberConversion with sequence_length

Checklist before submitting final PR

  • My PR is minimal and addresses one issue / enhancement in isolation
  • I have merged main into this feature branch
  • I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
  • I have run a sample config for model training
  • I have fixed all failing tests (python tests/tests.py)

@le1nux le1nux changed the base branch from main to dev_experiments June 19, 2024 16:58
@flxst flxst linked an issue Jun 20, 2024 that may be closed by this pull request
…ed as the first token (i.e., first input token) of the subsequent block
@le1nux le1nux added the enhancement New feature or request label Jun 20, 2024
@le1nux le1nux requested review from flxst, fromm-m and lhahn-iis and removed request for flxst and lhahn-iis June 20, 2024 12:00
@le1nux le1nux marked this pull request as ready for review June 20, 2024 12:01
@flxst flxst requested a review from mali-git June 24, 2024 07:57
@le1nux le1nux removed the request for review from lhahn-iis June 24, 2024 10:18
Member

@flxst flxst left a comment

Great work! Added some minor comments.

Comment thread src/modalities/config/config.py Outdated
Comment thread src/modalities/models/gpt2/gpt2_model.py Outdated
Comment thread src/modalities/models/gpt2/gpt2_model.py Outdated
Comment thread tests/checkpointing/test_fsdp_to_disc_checkpointing.py Outdated
mali-git and others added 3 commits June 29, 2024 16:37
@le1nux le1nux merged commit 563d864 into dev_experiments Jun 30, 2024
@le1nux le1nux deleted the fix/sequence_length_power_of_2 branch June 30, 2024 10:51
@le1nux le1nux mentioned this pull request Jun 30, 2024
5 tasks
@flxst flxst mentioned this pull request Jul 2, 2024

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Revision of block size and sequence length

3 participants