Releases: DevaanshPathak/tokenscope

tokenscope v0.2.0

DevaanshPathak released this 03 Jul 06:18

Second public release of tokenscope, adding optional HuggingFace Hub tokenizer loading, speed benchmarking, budget alerts, clipboard utilities, configuration file support, and stdin pipe mode.

Highlights

  • HuggingFace Hub Tokenizer Loading: Added on-demand hub tokenizer downloads. Users can load tokenizers from HF Hub using the --hub CLI flag, or by entering a model ID (e.g. gpt2 or meta-llama/Llama-3.1-8B) directly in the TUI folder browser.
  • Speed Benchmarking: Added a "Benchmark" tab to measure and compare tokenization throughput (tokens/sec and characters/sec) between primary and compare tokenizers.
  • Budget Threshold Alerts: Added an ambient budget badge in the TUI header and a dedicated row in the Stats Panel. Both update dynamically and are color-coded by budget utilization (green below 80%, yellow from 80% to 95%, red above 95% up to 100%, bold red above 100%).
  • Clipboard Integration: Added a Ctrl+Y shortcut that copies the space-separated primary token IDs to the system clipboard (uses pyperclip).
  • Configuration File Support: Added parsing of .tokenscoperc and tokenscope.json files (defaults are looked up in the current working directory first, then in the user's home directory). Added an init-config subcommand that generates a default configuration file template.
  • stdin Pipe Mode: Enhanced headless analyze mode to read piped text directly from standard input (useful for Unix pipes and automation).
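
To make the budget color bands listed above concrete, here is a minimal sketch of how utilization could be mapped to a badge color. The function name budget_color and its parameters are hypothetical and are not taken from the tokenscope source. The handling of the exact boundaries (95% counted as yellow, 100% counted as red) is an assumption read off the ranges in the notes.

```python
def budget_color(used_tokens: int, budget_tokens: int) -> str:
    """Map token utilization to a color band (hypothetical helper, not tokenscope's API)."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    utilization = used_tokens / budget_tokens
    if utilization > 1.0:
        return "bold red"   # over budget
    if utilization > 0.95:
        return "red"        # above 95%, up to and including 100%
    if utilization >= 0.80:
        return "yellow"     # 80% through 95%
    return "green"          # below 80%
```

For example, 3,900 tokens against a 4,096-token budget is about 95.2% utilization, which falls in the red band under these assumptions.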

Validation

  • python -m compileall .
  • python -m unittest discover -v
  • Headless stdin analysis integration verification.
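
As an illustration of what a headless stdin path typically involves, the sketch below reads text from a file when a path is given and otherwise from piped standard input. It is a generic pattern built on the Python standard library, not tokenscope's actual implementation. The name read_input_text and its parameters are hypothetical.

```python
import io
import sys
from typing import Optional, TextIO


def read_input_text(path: Optional[str] = None,
                    stream: Optional[TextIO] = None) -> str:
    """Return text from a file path if given, otherwise from a piped stream.

    Hypothetical helper for illustration; tokenscope's real logic may differ.
    """
    if path is not None:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    stream = sys.stdin if stream is None else stream
    # An interactive terminal means nothing was piped in, so fail early
    # instead of blocking while waiting for keyboard input.
    if stream.isatty():
        raise SystemExit("no input: pass a file path or pipe text on stdin")
    return stream.read()


# Simulate piped input with an in-memory stream.
demo_text = read_input_text(stream=io.StringIO("hello world"))
```

Passing the stream as a parameter keeps the function testable without a real pipe, which suits the kind of integration check listed above.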

tokenscope v0.1

DevaanshPathak released this 27 Jun 14:30

tokenscope v0.1.0

Initial public release of tokenscope, an offline terminal tokenizer explorer for local HuggingFace tokenizer files.

Highlights

  • Added local tokenizer loading from folders, tokenizer.json, WordPiece vocab files, and BPE vocab/merge files.
  • Added interactive token spans, token IDs, offsets, bytes, statistics, selected-token inspection, decode round-trip checks, special-token views, prompt budgets, and BPE merge-tree inspection.
  • Added side-by-side compare mode with aligned diff rows, compare metrics, corpus comparison, and tokenizer metadata inspection.
  • Added corpus and batch prompt analyzers for local .txt, .md, .jsonl, .json, and .csv inputs.
  • Added project save/load, tokenizer diffing, prompt packing simulation, regression suites, Unicode inspection, RAG chunk analysis, token-count distributions, cost estimates, and tokenizer repair suggestions.
  • Added JSON, CSV, Markdown, and HTML exports.
  • Added headless analyze mode for scripts and CI checks.
  • Added PyInstaller packaging for Windows and macOS hosts, plus Docker-based Linux binary builds.

Release Assets

  • tokenscope-linux-x86_64
  • tokenscope-windows-x86_64.exe
  • tokenscope-macos-*

The Linux and Windows binaries were attached manually first. The tag build workflow also builds and uploads the macOS binary when the macOS runner completes successfully.

Validation

  • python -m compileall .
  • python -m unittest discover -v
  • Windows binary --help and headless analyze smoke test.
  • Linux binary --help and headless analyze smoke test in Docker.