@adil-a (Collaborator) commented Jan 23, 2024

Two bug fixes for data preprocessing, plus minor housekeeping:

  • During tokenization, when a sequence was split on a separator for conditional fine-tuning, the BOS token wasn't being counted for backpropagation.

  • During packing, the BOS/EOS token IDs themselves were being appended to the attention mask instead of [1]s.

  • Also updated .gitignore and bumped the version number in setup.py.
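
For reviewers, here is a minimal sketch of the packing fix. It is not the code from this PR: `pack_example`, `pack_sequences`, and their parameters are hypothetical names, and it assumes tokenized examples are plain lists of ints. The point it illustrates is that the attention mask should hold 1 at the BOS/EOS positions rather than the token IDs themselves.

```python
def pack_example(token_ids, bos_id, eos_id):
    """Wrap one tokenized example with BOS/EOS and build its attention mask.

    Hypothetical helper for illustration; not the function changed in this PR.
    """
    input_ids = [bos_id] + list(token_ids) + [eos_id]
    # The mask marks positions to attend to, so BOS/EOS contribute 1,
    # not their token ids (which was the bug described above).
    attention_mask = [1] + [1] * len(token_ids) + [1]
    return input_ids, attention_mask


def pack_sequences(examples, bos_id, eos_id, max_len, pad_id=0):
    """Concatenate wrapped examples into one sequence and pad it to max_len."""
    input_ids, attention_mask = [], []
    for token_ids in examples:
        ids, mask = pack_example(token_ids, bos_id, eos_id)
        if len(input_ids) + len(ids) > max_len:
            break  # sketch only: a real packer would start a new sequence here
        input_ids += ids
        attention_mask += mask
    pad = max_len - len(input_ids)
    # Padding positions are masked out with 0.
    return input_ids + [pad_id] * pad, attention_mask + [0] * pad
```

With `bos_id=1` and `eos_id=2`, the buggy behaviour would have leaked a `2` into the mask at every EOS position; in this sketch the mask only ever contains 0 or 1.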

@adil-a adil-a merged commit 4a1323b into master Jan 23, 2024
@adil-a adil-a deleted the data_processing_bug branch January 23, 2024 16:56