Switch testing tokenizer from santacoder to gpt2 #482

Merged
jlamypoirier merged 2 commits into main from jlp_gpt2_tokenizer on Apr 9, 2026
@jlamypoirier
Collaborator

Santacoder is broken in transformers v5

jlamypoirier and others added 2 commits April 9, 2026 16:00
Update TOKENIZER_NAME from "bigcode/santacoder" to "gpt2" and update all
hardcoded token values in data tests to match the gpt2 vocabulary.
Also fix deprecated huggingface_hub.HfFolder.get_token() → get_token().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
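
For reviewers, the first commit can be sketched roughly as follows. This is a minimal illustration and not the repository's code: `load_test_tokenizer`, `expected_token_ids`, and `read_hub_token` are hypothetical helper names, and the `HF_TOKEN` environment fallback is an assumption added only so the sketch runs without `huggingface_hub` installed. Only `TOKENIZER_NAME = "gpt2"` and the move from `HfFolder.get_token()` to `get_token()` come from the commit message.

```python
import os
from typing import Iterable, Optional

# Value named in the commit message; where the constant lives is not shown there.
TOKENIZER_NAME = "gpt2"


def load_test_tokenizer():
    """Load the testing tokenizer (needs `transformers` plus Hub access or a local cache)."""
    # Imported lazily so the other helpers can be used without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(TOKENIZER_NAME)


def expected_token_ids(tokenizer, samples: Iterable[str]) -> dict:
    """Hypothetical helper: map each sample text to the ids from `tokenizer.encode`."""
    return {text: list(tokenizer.encode(text)) for text in samples}


def read_hub_token() -> Optional[str]:
    """Return the Hugging Face token from the environment or local login, else None."""
    try:
        # Module-level function that replaces the deprecated HfFolder.get_token().
        from huggingface_hub import get_token
    except ImportError:
        # Assumption for this sketch only: fall back to the HF_TOKEN variable.
        return os.environ.get("HF_TOKEN")
    return get_token()
```

Calling `expected_token_ids(load_test_tokenizer(), samples)` once and copying the printed ids into the tests is one way to refresh hardcoded values. Token ids generally differ between vocabularies, so values recorded for santacoder cannot be carried over.
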
Add skipif(not _kda_available) to the backup param in test_kda — the
reference KimiDeltaAttention raises ImportError at instantiation when
fla is not installed, so both fast and backup variants need the same
skip guard.

(Stale torch inductor precompiled header cache cleared separately.)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
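
The skip guard from the second commit might look like the sketch below. It assumes availability is detected by checking whether the `fla` package is importable; how the repository actually sets `_kda_available` is not shown in this PR, and the parameter values and test body here are placeholders.

```python
import importlib.util

import pytest

# Assumption: treat KDA as available when the `fla` package can be found.
_kda_available = importlib.util.find_spec("fla") is not None

# One shared mark so the fast and backup variants carry the same guard.
_requires_kda = pytest.mark.skipif(not _kda_available, reason="fla is not installed")


@pytest.mark.parametrize(
    "variant",
    [
        pytest.param("fast", marks=_requires_kda),
        # The backup (reference) variant also needs fla at instantiation,
        # so it gets the same skip guard.
        pytest.param("backup", marks=_requires_kda),
    ],
)
def test_kda(variant):
    # Placeholder body: the real test would build the layer and compare outputs.
    assert variant in {"fast", "backup"}
```

Attaching the mark per parameter rather than to the whole function keeps room for other variants that do not depend on `fla`.
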
@jlamypoirier merged commit 7a7129d into main on Apr 9, 2026
2 checks passed
@jlamypoirier deleted the jlp_gpt2_tokenizer branch on April 9, 2026 21:00