Releases: DevaanshPathak/tokenscope
Releases · DevaanshPathak/tokenscope
Release list
tokenscope v0.2.0
tokenscope v0.2.0
Second public release of tokenscope, adding optional HuggingFace Hub tokenizer loading, speed benchmarking, budget alerts, clipboard utilities, configuration file support, and stdin pipe mode.
Highlights
- HuggingFace Hub Tokenizer Loading: Added on-demand hub tokenizer downloads. Users can load tokenizers from HF Hub using the
--hubCLI flag, or by entering a model ID (e.g.gpt2ormeta-llama/Llama-3.1-8B) directly in the TUI folder browser. - Speed Benchmarking: Added a "Benchmark" tab to measure and compare tokenization throughput (tokens/sec and characters/sec) between primary and compare tokenizers.
- Budget Threshold Alerts: Added an ambient budget badge in the TUI header and a dedicated row in the Stats Panel. Both dynamically update and color-code depending on utilization (green < 80%, yellow 80-95%, red > 95%, bold red > 100%).
- Clipboard Integration: Added
Ctrl+Yshortcut to instantly copy space-separated primary token IDs to the system clipboard (integratespyperclip). - Configuration File Support: Added
.tokenscopercandtokenscope.jsonfile parsing (looks for defaults in CWD first, falling back to user's home directory). Added aninit-configsubcommand to generate a default configuration file template. - stdin Pipe Mode: Enhanced headless
analyzemode to read piped text inputs directly from standard input (useful for Unix pipes and automation).
Validation
python -m compileall .python -m unittest discover -v- Headless stdin analysis integration verification.
tokenscope v0.1
tokenscope v0.1.0
Initial public release of tokenscope, an offline terminal tokenizer explorer for local HuggingFace tokenizer files.
Highlights
- Added local tokenizer loading from folders,
tokenizer.json, WordPiece vocab files, and BPE vocab/merge files. - Added interactive token spans, token IDs, offsets, bytes, statistics, selected-token inspection, decode round-trip checks, special-token views, prompt budgets, and BPE merge-tree inspection.
- Added side-by-side compare mode with aligned diff rows, compare metrics, corpus comparison, and tokenizer metadata inspection.
- Added corpus and batch prompt analyzers for local
.txt,.md,.jsonl,.json, and.csvinputs. - Added project save/load, tokenizer diffing, prompt packing simulation, regression suites, Unicode inspection, RAG chunk analysis, token-count distributions, cost estimates, and tokenizer repair suggestions.
- Added JSON, CSV, Markdown, and HTML exports.
- Added headless
analyzemode for scripts and CI checks. - Added PyInstaller packaging for Windows and macOS hosts, plus Docker-based Linux binary builds.
Release Assets
tokenscope-linux-x86_64tokenscope-windows-x86_64.exetokenscope-macos-*
The Linux and Windows binaries were attached manually first. The tag build workflow also builds and uploads the macOS binary when the macOS runner completes successfully.
Validation
python -m compileall .python -m unittest discover -v- Windows binary
--helpand headlessanalyzesmoke test. - Linux binary
--helpand headlessanalyzesmoke test in Docker.