Highlights

HuggingFace Hub Tokenizer Loading: Added on-demand tokenizer downloads from the HuggingFace Hub. Load a tokenizer with the --hub CLI flag, or enter a model ID (e.g. gpt2 or meta-llama/Llama-3.1-8B) directly in the TUI folder browser.

Speed Benchmarking: Added a "Benchmark" tab to measure and compare tokenization throughput (tokens/sec and characters/sec) between primary and compare tokenizers.
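As a rough illustration of how tokens/sec and characters/sec can be measured, here is a minimal sketch. It is not tokenscope's implementation: `measure_throughput` is a hypothetical helper, and `str.split` and `list` are stand-ins for the primary and compare tokenizers.

```python
import time
from typing import Callable, Dict, Sequence


def measure_throughput(
    tokenize: Callable[[str], Sequence], text: str, repeats: int = 5
) -> Dict[str, float]:
    """Time `tokenize` over `text` and report tokens/sec and chars/sec.

    Hypothetical helper for illustration; not part of tokenscope.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total_tokens = 0
    start = time.perf_counter()
    for _ in range(repeats):
        total_tokens += len(tokenize(text))
    # Guard against a zero interval on very fast runs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return {
        "total_tokens": total_tokens,
        "tokens_per_sec": total_tokens / elapsed,
        "chars_per_sec": len(text) * repeats / elapsed,
    }


# Stand-in tokenizers (assumptions): whitespace split vs. one token per character.
sample = "the quick brown fox " * 200
result_primary = measure_throughput(str.split, sample)
result_compare = measure_throughput(list, sample)
```

Running both tokenizers over the same text and the same number of repeats keeps the two throughput figures directly comparable.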

Budget Threshold Alerts: Added an ambient budget badge in the TUI header and a dedicated row in the Stats Panel. Both update dynamically and are color-coded by budget utilization (green below 80%, yellow 80-95%, red above 95% and up to 100%, bold red above 100%).
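The color thresholds can be sketched as a small lookup. This is an illustration only: `budget_style` is a hypothetical name, and the treatment of exact boundary values (80% and 95% shown as yellow, 100% as red) is an assumption read from the ranges above.

```python
def budget_style(used_tokens: int, budget: int) -> str:
    """Map budget utilization to a badge style (hypothetical helper)."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    utilization = used_tokens / budget
    if utilization > 1.0:
        return "bold red"   # over budget
    if utilization > 0.95:
        return "red"        # above 95%, up to and including 100%
    if utilization >= 0.80:
        return "yellow"     # 80% through 95%
    return "green"          # below 80%
```

For example, 3,900 tokens against a 4,096-token budget is about 95.2% utilization, which falls in the red band.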

Clipboard Integration: Added Ctrl+Y shortcut to instantly copy space-separated primary token IDs to the system clipboard (integrates pyperclip).
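A minimal sketch of the copy step, assuming the token IDs are available as a list of integers. `format_token_ids` and `copy_token_ids` are hypothetical names; `pyperclip.copy` is the real pyperclip call, imported lazily here so the formatting helper works without the package installed.

```python
from typing import Iterable


def format_token_ids(token_ids: Iterable[int]) -> str:
    """Join token IDs into one space-separated string."""
    return " ".join(str(token_id) for token_id in token_ids)


def copy_token_ids(token_ids: Iterable[int]) -> str:
    """Copy the formatted IDs to the system clipboard and return the text."""
    import pyperclip  # third-party dependency; requires a clipboard backend

    text = format_token_ids(token_ids)
    pyperclip.copy(text)
    return text
```

With the example IDs `[15496, 995]`, the clipboard text would be `15496 995`.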

Configuration File Support: Added parsing of .tokenscoperc and tokenscope.json configuration files. Defaults are looked up in the current working directory first, then in the user's home directory. Added an init-config subcommand that generates a default configuration file template.
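The lookup order can be sketched as follows. This is not tokenscope's code: the function names are hypothetical, and two details are assumptions, namely that .tokenscoperc is checked before tokenscope.json within a directory and that both files contain JSON.

```python
import json
from pathlib import Path
from typing import Optional

# Assumed order within a single directory; the release notes do not specify it.
CONFIG_NAMES = (".tokenscoperc", "tokenscope.json")


def find_config(cwd: Path, home: Path) -> Optional[Path]:
    """Return the first config file found, checking cwd before home."""
    for directory in (cwd, home):
        for name in CONFIG_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None


def load_config(cwd: Optional[Path] = None, home: Optional[Path] = None) -> dict:
    """Load the first config found, or return {} when none exists."""
    path = find_config(cwd or Path.cwd(), home or Path.home())
    if path is None:
        return {}
    # Assumes JSON content in both file names.
    return json.loads(path.read_text(encoding="utf-8"))
```

Passing `cwd` and `home` explicitly keeps the lookup easy to exercise against temporary directories.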

stdin Pipe Mode: Enhanced headless analyze mode to read piped text directly from standard input (useful for Unix pipes and automation).
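One common way to support piped input is to fall back to standard input when no text argument is given and stdin is not a terminal. The sketch below shows that pattern; `read_input` is a hypothetical helper and may not match how tokenscope resolves its input.

```python
import io
import sys
from typing import Optional, TextIO


def read_input(text_arg: Optional[str], stream: TextIO = sys.stdin) -> str:
    """Prefer an explicit text argument; otherwise read piped stdin."""
    if text_arg is not None:
        return text_arg
    if stream.isatty():
        # Nothing was piped in, so there is no text to analyze.
        raise SystemExit("no input: pass text as an argument or pipe it on stdin")
    return stream.read()


# A StringIO stands in for a pipe here; it reports isatty() == False.
piped_text = read_input(None, io.StringIO("hello world\n"))
```

Checking `isatty()` avoids blocking on an interactive terminal when the user forgot to supply input.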

tokenscope v0.1.0

Initial public release of tokenscope, an offline terminal tokenizer explorer for local HuggingFace tokenizer files.

Highlights

Added local tokenizer loading from folders, tokenizer.json, WordPiece vocab files, and BPE vocab/merge files.
Added interactive token spans, token IDs, offsets, bytes, statistics, selected-token inspection, decode round-trip checks, special-token views, prompt budgets, and BPE merge-tree inspection.
Added side-by-side compare mode with aligned diff rows, compare metrics, corpus comparison, and tokenizer metadata inspection.
Added corpus and batch prompt analyzers for local .txt, .md, .jsonl, .json, and .csv inputs.
Added project save/load, tokenizer diffing, prompt packing simulation, regression suites, Unicode inspection, RAG chunk analysis, token-count distributions, cost estimates, and tokenizer repair suggestions.
Added JSON, CSV, Markdown, and HTML exports.
Added headless analyze mode for scripts and CI checks.
Added PyInstaller packaging for Windows and macOS hosts, plus Docker-based Linux binary builds.

Release Assets

tokenscope-linux-x86_64
tokenscope-windows-x86_64.exe
tokenscope-macos-*

The Linux and Windows binaries were attached manually first. The tag build workflow also builds and uploads the macOS binary when the macOS runner completes successfully.

Validation

python -m compileall .
python -m unittest discover -v
Windows binary --help and headless analyze smoke test.
Linux binary --help and headless analyze smoke test in Docker.

Releases: DevaanshPathak/tokenscope
