Remove fastokens entirely#95
Merged
Merged
Conversation
ApprovabilityVerdict: Needs human review This PR removes the fastokens dependency which provided ~10x tokenization speedup. While the code simplification is clean, removing a significant performance optimization is an architectural decision that warrants human review to confirm the tradeoff is intentional. You can customize Macroscope's approvability policy. Learn more. |
fastokens is a process-global monkey-patch on transformers.AutoTokenizer that swaps the .backend_tokenizer for a faster Rust BPE shim. The shim has no offset_mapping support, so we kept a parallel vanilla tokenizer per model purely for body/scaffold attribution in attribute_text_segments. Most assistant messages render mixed-label (scaffold+content+scaffold), which means the offset path runs and fastokens doesn't help; fastokens only saves time on homogeneous-label runs and direct _encode/_decode calls. The complexity tax — FASTOKENS_INCOMPATIBLE denylist (DeepSeek-V3), _FASTOKENS_PATCH_LOCK for pool slot races, contextlib.redirect_stdout to swallow [fastokens] prints, unpatch/restore dance in _get_offset_tokenizer, twin tokenizer in memory — was not earning its keep. What's removed: * fastokens dependency in pyproject.toml + its uv.exclude-newer-package exemption * FASTOKENS_INCOMPATIBLE frozenset (DeepSeek-V3 family workaround, which only existed because fastokens lacks Metaspace pre-tokenizer support) * _FASTOKENS_PATCH_LOCK, _FASTOKENS_ANNOUNCED, _patched_load * use_fastokens kwarg on load_tokenizer (signature simplifies to one positional arg) * The fastokens-failure retry-vanilla branch in load_tokenizer * The whole _offset_tokenizers cache, its lock, and the unpatch-and-reload race-safe path in _get_offset_tokenizer * tests/test_load_tokenizer_fastokens.py (211 lines, all asserted fastokens-specific behaviour) What replaces it: _get_offset_tokenizer is now 3 lines of real logic: probe → return, or raise a clear error. The contract is "pass a fast tokenizer or get a loud error." Tokenizers from load_tokenizer are PreTrainedTokenizerFast and satisfy this trivially. BYO tokenizers without return_offsets_mapping support fail at construction time instead of silently triggering a reload-from-name_or_path that only existed to paper over the fastokens shim's missing offsets. The single test that exercised the BYO-no-offsets reload path is replaced with one asserting the new error contract. Perf impact: encode/decode now goes through HuggingFace's stock tokenizers Rust BPE instead of fastokens' (claimed ~10x faster on synthetic benchmarks). The mixed-label path (most assistant turns) was already off fastokens before this change, so the realistic-workload regression is bounded to homogeneous-label runs and direct encode/decode helper calls. Measure if it matters; revisit by adding fastokens back as the offset tokenizer once it supports offset_mapping upstream. Net: -462 lines. Test suite: 2248 passed, 88 skipped, 1 xfailed (2259 → 2248 = 11 fastokens-specific tests deleted; no regressions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
samsja
approved these changes
Jun 26, 2026
Member
Author
|
this is mostly because we currently need offset renderers everywhere. we will switch back to fasttokens once they support it. |
hallerite
added a commit
that referenced
this pull request
Jun 26, 2026
Two related refactors of the emit_text_segments / attribute_text_segments pipeline: 1. ``emit_text_segments`` closures across 8 hand-coded renderers (qwen3, qwen35, glm45, glm5, deepseek_v3, nemotron3, laguna_xs2, minimax_m2) get a "collapse-or-fallback" pattern: adjacent same-label segments are folded into one ``emit_text`` call (preserves internal BPE merges, skips the offset path); only genuinely mixed-label runs go through ``attribute_text_segments``. Most rendering paths end up homogeneous after collapse, so the offset machinery only runs when it actually has to. 2. ``attribute_text_segments`` is rewritten to use the Rust ``tokenizers.Encoding`` API directly — ``.encode().ids`` / ``.encode().offsets`` — instead of going through ``transformers``'s ``return_offsets_mapping=True`` dict API. This unblocks the future ``transformers``-optional path (issue #31): a BYO ``tokenizers.Tokenizer`` works without any ``transformers`` wrapper. ``_get_offset_tokenizer`` becomes a 2-path resolver (direct Rust tokenizer, or extract ``.backend_tokenizer`` from a ``PreTrainedTokenizerFast``); no second tokenizer load, no probe-verify, no AutoTokenizer fallback — all of those existed in the previous version of this PR to coordinate with the fastokens shim, which is gone after #95. ``minimax_m2.emit_token_overlap_body`` and ``qwen3_vl._Emitter._flush`` are updated to call the new ``Encoding``-based offset API directly. ``tokenizers>=0.20`` becomes an explicit core dependency — it was already a transitive of ``transformers``, but the new ``attribute_text_segments`` imports from ``tokenizers`` at the module level so we declare it. Tests: 2248 passed, 88 skipped, 1 xfailed (baseline parity with #95). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hallerite
added a commit
that referenced
this pull request
Jun 26, 2026
Two related refactors of the emit_text_segments / attribute_text_segments pipeline: 1. ``emit_text_segments`` closures across 8 hand-coded renderers (qwen3, qwen35, glm45, glm5, deepseek_v3, nemotron3, laguna_xs2, minimax_m2) get a "collapse-or-fallback" pattern: adjacent same-label segments are folded into one ``emit_text`` call (preserves internal BPE merges, skips the offset path); only genuinely mixed-label runs go through ``attribute_text_segments``. Most rendering paths end up homogeneous after collapse, so the offset machinery only runs when it actually has to. 2. ``attribute_text_segments`` is rewritten to use the Rust ``tokenizers.Encoding`` API directly — ``.encode().ids`` / ``.encode().offsets`` — instead of going through ``transformers``'s ``return_offsets_mapping=True`` dict API. This unblocks the future ``transformers``-optional path (issue #31): a BYO ``tokenizers.Tokenizer`` works without any ``transformers`` wrapper. ``_get_offset_tokenizer`` becomes a 2-path resolver (direct Rust tokenizer, or extract ``.backend_tokenizer`` from a ``PreTrainedTokenizerFast``); no second tokenizer load, no probe-verify, no AutoTokenizer fallback — all of those existed in the previous version of this PR to coordinate with the fastokens shim, which is gone after #95. ``minimax_m2.emit_token_overlap_body`` and ``qwen3_vl._Emitter._flush`` are updated to call the new ``Encoding``-based offset API directly. ``tokenizers>=0.20`` becomes an explicit core dependency — it was already a transitive of ``transformers``, but the new ``attribute_text_segments`` imports from ``tokenizers`` at the module level so we declare it. Tests: 2248 passed, 88 skipped, 1 xfailed (baseline parity with #95). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joanvelja
added a commit
to joanvelja/renderers
that referenced
this pull request
Jun 27, 2026
…ens removal PrimeIntellect-ai#95, dyn-versioning PrimeIntellect-ai#96) + migrate gemma4 (#10) * adopt verifiers-style dynamic versioning (PrimeIntellect-ai#96) * remove fastokens entirely for now (PrimeIntellect-ai#95) * feat(thinking): replace preserve_* bools with thinking_retention, respected by the bridge (PrimeIntellect-ai#88) * migrate gemma4 to thinking_retention (render+bridge, implied=all; debate hot path 1:1, full-render keeps thinking) + fastokens test fix --------- Co-authored-by: hallerite <git@hallerite.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
fastokensis a process-global monkey-patch ontransformers.AutoTokenizerthat swaps.backend_tokenizerfor a faster Rust BPE shim. The shim has no offset-mapping support, so we kept a parallel vanilla tokenizer per model purely for body/scaffold attribution inattribute_text_segments. After the v3 collapse-or-fallback refactor (PR #87), most assistant messages render mixed-label (scaffold+content+scaffold) and run through the offset path where fastokens contributes nothing; fastokens only saves time on homogeneous-label runs and direct_encode/_decodehelpers.The complexity tax it was charging:
FASTOKENS_INCOMPATIBLEfrozenset (DeepSeek-V3 family workaround — only exists because fastokens lacks Metaspace pre-tokenizer support)_FASTOKENS_PATCH_LOCK+_FASTOKENS_ANNOUNCEDglobal state_patched_load()withcontextlib.redirect_stdoutto swallow[fastokens] patch_transformers: ...prints under thread contention_get_offset_tokenizer(PR fix: _get_offset_tokenizer immune to global fastokens patch (concurrent-pool race) #86)…wasn't earning its keep.
What's removed
fastokens>=0.2.0dep +[tool.uv.exclude-newer-package].fastokensexemptionFASTOKENS_INCOMPATIBLE,_FASTOKENS_PATCH_LOCK,_FASTOKENS_ANNOUNCED,_patched_loaduse_fastokenskwarg onload_tokenizer(signature → 1 positional arg)load_tokenizercontextlib,io_offset_tokenizerscache + lock + entire unpatch-and-reload blocktests/test_load_tokenizer_fastokens.py(211 lines, all fastokens-specific)What replaces it
_get_offset_tokenizercollapses to its essential contract — probe and return, or raise:Tokenizers from
load_tokenizerarePreTrainedTokenizerFastand satisfy this trivially. BYO tokenizers withoutreturn_offsets_mappingsupport fail loudly at construction time instead of silently triggering a reload fromname_or_paththat only existed to paper over the fastokens shim's missing offsets.The single test that exercised the BYO-no-offsets reload path is replaced with one asserting the new error contract (
test_get_offset_tokenizer_rejects_offsetless_byo).Perf impact (honest)
Encode/decode now go through HuggingFace's stock
tokenizersRust BPE instead of fastokens (claimed ~10x faster on synthetic benchmarks). The mixed-label path — which is most assistant turns post-PR-#87 — was already off fastokens before this change, so the realistic-workload regression is bounded to homogeneous-label runs and direct encode/decode helper calls.Not measured. If it matters in real training pools, revisit by adding fastokens back as the offset tokenizer once it supports
offset_mappingupstream (filed: TODO link if a fastokens issue exists). Until then, the simpler tree wins.Tests
Stats
Net −462 lines.
Followups
_get_offset_tokenizerrewrite becomes simpler/obsolete since the path-4 AutoTokenizer fallback existed for fastokens-induced reasons.transformersan optional dependency" (issue Is transformers necessary or tokenizers is enough? #31, plan at.claude/plans/enchanted-sniffing-scott.md) becomes simpler —load_tokenizeris now the onlytransformersuser on the construction path.🤖 Generated with Claude Code
Note
Medium Risk
Behavior change for callers passing
use_fastokensor relying on silent offset-capable tokenizer reload; encode performance may regress on homogeneous-label paths. Core rendering security policy is unchanged.Overview
Removes the
fastokensoptional dependency and all tokenizer monkey-patching fromrenderers.base, shrinking the load path to vanillaAutoTokenizer/PreTrainedTokenizerFastonly.load_tokenizerno longer acceptsuse_fastokens; it always goes through_load_tokenizer_via_autowith the existing security policy (pinned Kimi revisions, Llama unsloth mirrors). Deleted surfaces includeFASTOKENS_INCOMPATIBLE,_patched_load, patch locks, stdout suppression, and fastokens failure fallback._get_offset_tokenizeris simplified to a probe-and-return contract: the supplied tokenizer must supportreturn_offsets_mapping=True, or aRuntimeErroris raised. The per-model vanilla offset cache and fastokens race-safe reload path are gone—callers with BYO tokenizers must pass a fast tokenizer explicitly.Tests and lockfile drop
tests/test_load_tokenizer_fastokens.py, updatetest_load_tokenizer.py(includingtest_get_offset_tokenizer_rejects_offsetless_byo), and removefastokensfrompyproject.tomlanduv.lock.Trade-off: encode/decode use stock HuggingFace tokenizers instead of the fastokens shim (~10x encode claim); mixed-label / offset attribution paths were already on vanilla HF for offsets.
Reviewed by Cursor Bugbot for commit 33c3f39. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Remove
fastokensdependency and all related tokenizer patching logicfastokenspackage from dependencies and deletes all integration code, including_patched_load, theuse_fastokensparameter onload_tokenizer, and the_offset_tokenizerscache.load_tokenizernow always loads a vanilla fast tokenizer viaAutoTokenizerwith no retry or fallback logic._get_offset_tokenizernow raisesRuntimeErrorinstead of transparently loading a separate offset-capable tokenizer when the supplied tokenizer lacks offset mapping support.use_fastokensparameter will get aTypeErrororRuntimeErrorat runtime.Macroscope summarized 33c3f39.