TokenLens is a Python‑based web app that helps you visualize how large language models tokenize text right in your browser. Instead of guessing how a prompt is broken up into tokens (which drives cost and behavior in services like OpenAI), TokenLens shows it live with colors and counts.

🔬 TokenLens

Wanna see how AI chops your text into tiny weird pieces? This shows it.

Watch it do its thing 🎥:

*(demo video: Screen.Recording.2026-02-27.at.10.47.03.PM.mov)*

What even is TokenLens?

AI doesn’t read like humans. It breaks stuff into tokens — sometimes words, sometimes weird fragments, sometimes punctuation.

TokenLens lets you:

  • See why prompts cost $$$
  • Figure out why AI freaks out sometimes
  • Compare how different models slice your text

Features (or whatever)

  • Colors for tokens (because why not)
  • Token IDs, indexes, counts, ratios
  • Shows API cost for GPT-4o, GPT-4, GPT-3.5 💸
  • Flip between 4 encoders
  • Updates as you type ⚡

Supported Encoders

| Encoding | Models | Vocab |
|---|---|---|
| `cl100k_base` | GPT-4, GPT-3.5-turbo | ~100k |
| `o200k_base` | GPT-4o | ~200k |
| `p50k_base` | text-davinci-002/003 | ~50k |
| `r50k_base` | GPT-2, GPT-3 | ~50k |

Run the Web App (super easy)

```bash
git clone https://github.com/NinjaOfNeurons/TokenLens.git
cd TokenLens
python -m venv venv
# Mac/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
pip install -r requirements.txt
streamlit run app.py
```

Open http://localhost:8501 and watch the magic happen ✨


Run the CLI (also super easy)

For terminal lovers, TokenLens has a built-in CLI module.

```bash
# Tokenize direct text input
python -m tokenlens.cli "This is a test of the tokenlens engine."

# Pipe data from other tools
echo "Pipe me!" | python -m tokenlens.cli

# Check options
python -m tokenlens.cli --help
```

CLI Flags:

  • `-e, --encoder`: Choose your encoder (e.g., `o200k_base`)
  • `-q, --quiet`: Output just the integer token count for scripts
  • `--no-color`: Disable ANSI color background output
  • `-s, --stats`: Force show detailed stats/costs

Token money stuff

| Model | Cost / 1M tokens |
|---|---|
| GPT-4o | $5 |
| GPT-4 | $30 |
| GPT-3.5-turbo | $0.50 |

Check OpenAI's pricing page if the numbers matter; they change often.
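
The math behind the cost display is just tokens times rate. A sketch with the prices from the table above hard-coded (the function name is mine, not TokenLens's API):

```python
# Rough cost estimate from a token count, using the rates above.
# Prices are illustrative; check OpenAI's pricing page for current numbers.
PRICE_PER_1M = {
    "gpt-4o": 5.00,
    "gpt-4": 30.00,
    "gpt-3.5-turbo": 0.50,
}

def estimate_cost(n_tokens: int, model: str) -> float:
    """Dollar cost of n_tokens at the per-1M-token rate for `model`."""
    return n_tokens / 1_000_000 * PRICE_PER_1M[model]

print(f"${estimate_cost(1500, 'gpt-4'):.4f}")  # → $0.0450
```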


Why tokens even matter

  • Tokens = text chunks (~4 English characters each, on average)
  • BPE (Byte Pair Encoding) = repeatedly merging the most frequent adjacent pairs until the vocab is big enough
  • Costs, context windows, weird splits = all token stuff
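
To make the BPE bullet concrete, here's a toy single merge step. Real tokenizers work on bytes and replay a trained merge table; this stripped-down version (all names mine) just shows the core "merge the most frequent pair" idea:

```python
from collections import Counter

def bpe_merge_step(tokens: list[str]) -> list[str]:
    """One BPE step: merge the most frequent adjacent pair into one token."""
    pairs = Counter(zip(tokens, tokens[1:]))  # count adjacent pairs
    if not pairs:
        return tokens
    (a, b), _ = pairs.most_common(1)[0]       # winning pair
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append(a + b)              # fuse the pair
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

seq = list("abcabcab")        # start from single characters
seq = bpe_merge_step(seq)     # ('a', 'b') occurs most often
print(seq)                    # → ['ab', 'c', 'ab', 'c', 'ab']
```

Run it a few more times and frequent fragments keep fusing into bigger tokens, which is why common words become one token while rare words shatter into weird pieces.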

Project tree (looks organized, kinda)

```
TokenLens/
├── tokenlens/
│   ├── __init__.py   ← package marker
│   ├── core.py       ← tokenization heart
│   └── cli.py        ← the cool terminal tool
├── app.py            ← web app
├── requirements.txt  ← boring dependencies
└── README.md         ← this lazy thing
```

License

MIT. Do what you want. Seriously.


Built to learn. Inspired by tiktokenizer.vercel.app
