Pull requests: huggingface/tokenizers
mismatch in wordpiece vocab to use AHashMap instead of HashMap
#1844 opened Aug 12, 2025 by rinechran
Tokenizer: Add native async bindings, via pyo3-async-runtimes.
#1843 opened Aug 10, 2025 by michaelfeil
chore(trainers): add `__init__` to fix python type check errors in WordLevelTrainer
#1838 opened Jul 31, 2025 by shenxiangzhuang
Adding multiprocessing for sentencepiece_extractor
#1804 opened Jun 19, 2025 by AamodThakur
Expose `Encoding` attributes via the buffer protocol interface
#1789 opened Jun 4, 2025 by mariosasko
Add benchmark for deserializing large added vocab + optimizations
#1782 opened May 27, 2025 by ArthurZucker • Draft
Pre-tokenizers that support multi-word/non-whitespace BPE in single pass
#1753 opened Mar 22, 2025 by mjbommar
Add FxHash and ShortStringOptimization.
#1733 opened Feb 10, 2025 by MeetThePatel • 3 of 4 tasks