Add configurable train_on_eos for conversation data preparation #535
Closed
jlamypoirier wants to merge 1 commit into
Add a `train_on_eos` flag (default `False`) to `ConversationSourceConfig` controlling whether the end-of-sequence token appended after the final message is included in the training loss. When disabled (the default, unchanged behavior) that token is masked from the loss; when enabled it becomes a training target. Threaded through `tokenize_chat` as a `train_on_eos` parameter.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Authored by Claude Opus 4.8 (with @jlamypoirier).
Splits out the loss-masking change that was bundled into #473 as a standalone, opt-in flag.
What
Add `train_on_eos: bool = False` to `ConversationSourceConfig`. It controls whether the end-of-sequence token that `tokenize_chat` appends after the final message is included in the training loss:
- `false` (default): the appended EOS is masked from the loss (unchanged behavior).
- `true`: the appended EOS becomes a training target.
Threaded through `Tokenizer.tokenize_chat` as a `train_on_eos` parameter. Per-turn terminators emitted by the chat template (e.g. ChatML `<|im_end|>`) are unaffected; they remain governed by the template's `{% generation %}` markers. This flag only touches the single sequence-terminating EOS that the tokenizer appends when no EOS already appears in the conversation.
Why
Masking the terminal EOS means the model never gets a loss signal to emit end-of-sequence, a well-known cause of models that don't stop generating at inference. Training on the final/assistant EOS is the common recommendation, and frameworks expose it as a knob (Axolotl `train_on_eos: turn|all|last`; TRL `assistant_only_loss`, which includes the assistant turn's EOS). It's also an asymmetry within Fast-LLM today: the document path already trains on its appended EOS (unmasked tokens all contribute to the loss), while the conversation path masks it. This flag lets conversation prep opt into the same behavior.
Kept off by default so existing datasets' loss masking is unchanged.
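
For illustration only, the intended effect on the loss mask can be sketched in a few lines of Python. This is not the Fast-LLM implementation: `append_eos` is a hypothetical helper, the token ids are made up, and a plain list of booleans stands in for whatever loss-masking representation `tokenize_chat` actually uses. It only mirrors the behavior described above: the EOS is appended when none is already present, and the flag decides whether that one position contributes to the loss.

```python
from typing import List, Tuple


def append_eos(
    token_ids: List[int],
    loss_mask: List[bool],
    eos_id: int,
    train_on_eos: bool = False,
) -> Tuple[List[int], List[bool]]:
    """Hypothetical helper (not Fast-LLM's tokenize_chat).

    loss_mask[i] is True when token i is a training target.
    """
    if len(token_ids) != len(loss_mask):
        raise ValueError("token_ids and loss_mask must have the same length")
    # Assumption taken from the description: nothing is appended when an
    # EOS already appears in the conversation, so the flag has no effect.
    if eos_id in token_ids:
        return list(token_ids), list(loss_mask)
    # The appended EOS is a training target only when train_on_eos is set.
    return token_ids + [eos_id], loss_mask + [train_on_eos]


# Made-up ids: two prompt tokens (masked), two assistant tokens (trained on).
tokens = [11, 12, 21, 22]
mask = [False, False, True, True]
EOS = 2

default_tokens, default_mask = append_eos(tokens, mask, EOS)
trained_tokens, trained_mask = append_eos(tokens, mask, EOS, train_on_eos=True)

print(default_tokens, default_mask)  # [11, 12, 21, 22, 2] [False, False, True, True, False]
print(trained_tokens, trained_mask)  # [11, 12, 21, 22, 2] [False, False, True, True, True]
```

In this sketch the two calls return identical tokens and differ only in the last mask entry, which is the same property the new test checks for the real code path.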
Testing
`tests/data/test_tokenizer.py` (incl. the new `test_tokenize_chat_train_on_eos`) and `test_preparator.py` pass (41). The new test asserts that `train_on_eos` changes only the appended EOS's loss mask, not the tokens.
Note
Touches the same `tokenize_chat` call site as #534 (no-BOS prep); whichever merges first, the other needs a one-line rebase.
🤖 Generated with Claude Code