UltraToken is a CLI utility that replicates tiktoken's BPE tokenizer and uses OpenAI's o200k Harmony encoding for fast, precise token cost estimation.
- Zero External Dependencies - Entirely self-contained
- Complete BPE Implementation - Full byte-pair encoding algorithm
- Embedded o200k Vocabulary - 200k token vocabulary included
- Accurate Regex Splitting - Matches OpenAI's text segmentation
- Optimized Token Mapping - Fast lookups using Map structures
- Special Token Support - Handles control tokens correctly
- Batch Processing - Appends token counts to each line of a word list file
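To illustrate the byte-pair encoding and Map-based lookup features listed above, here is a minimal sketch of a BPE merge loop. It is not UltraToken's actual source: the function name `bpeMerge` and the tiny `toyRanks` table are invented for illustration, and the sketch works on characters rather than UTF-8 bytes and skips the regex pre-splitting step.

```javascript
// Toy rank table: merged piece -> rank (lower rank merges first).
// Invented for illustration; the real o200k table has roughly 200k entries.
const toyRanks = new Map([
  ["ll", 0],
  ["he", 1],
  ["llo", 2],
  ["hello", 3],
]);

// Repeatedly merge the adjacent pair whose concatenation has the lowest rank.
function bpeMerge(word, ranks) {
  // Start from single characters (a real tokenizer starts from UTF-8 bytes).
  const parts = Array.from(word);
  while (parts.length > 1) {
    let bestIndex = -1;
    let bestRank = Infinity;
    for (let i = 0; i < parts.length - 1; i++) {
      const rank = ranks.get(parts[i] + parts[i + 1]);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        bestIndex = i;
      }
    }
    if (bestIndex === -1) break; // no mergeable pair left
    parts.splice(bestIndex, 2, parts[bestIndex] + parts[bestIndex + 1]);
  }
  return parts;
}

console.log(bpeMerge("hello", toyRanks)); // one piece: "hello"
console.log(bpeMerge("help", toyRanks)); // three pieces: "he", "l", "p"
```

The token count for a word is the length of the returned array. Keeping the ranks in a `Map` makes each pair lookup a constant-time operation on average, which matters because the loop probes every adjacent pair on every pass.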
Install UltraToken with npm:
npm i ultratoken
Or clone and run locally:
git clone https://github.com/TrintechResearch/UltraToken.git
cd UltraToken
npm install -g .

| Command | Description |
|---|---|
| `ultratoken` | Start interactive mode |
| `ultratoken <text>` | Get token count for text |
| `ultratoken economy <file.md>` | Process word list file |
| `ultratoken jump` | Exit the program |
| `ultratoken --help` | Show help information |
| `ultratoken --version` | Show version information |

Start the interactive session:
ultratoken
🚀 UltraToken TikToken Utility
Interactive Mode - Type words to get token counts
Commands: "jump" to exit, "help" for help
ultratoken hello world
"hello world" = 2 tokens
ultratoken programming
"programming" = 2 tokens
ultratoken The quick brown fox
"The quick brown fox" = 4 tokens
ultratoken jump
UltraToken terminated. Goodbye!
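The session above can be modeled as a small function that maps each input line to a reply. This is a hedged sketch rather than UltraToken's real implementation: `handleLine` and the `countTokens` callback are hypothetical names, the stand-in counter simply counts words, and the "help" command is left out.

```javascript
// Map one line of interactive input to a reply, following the transcript format.
// `countTokens` is a hypothetical stand-in for the real tokenizer.
function handleLine(input, countTokens) {
  const text = input.trim();
  if (text === "jump") {
    return { reply: "UltraToken terminated. Goodbye!", exit: true };
  }
  if (text === "") {
    return { reply: "", exit: false }; // ignore blank input
  }
  return { reply: `"${text}" = ${countTokens(text)} tokens`, exit: false };
}

// Demo counter: one "token" per whitespace-separated word (not real BPE).
const fakeCount = (text) => text.split(/\s+/).length;

console.log(handleLine("hello world", fakeCount).reply); // "hello world" = 2 tokens
console.log(handleLine("jump", fakeCount).exit); // true
```

Separating the line handler from the input loop keeps the command logic testable without a terminal.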
Get token count for any text:
ultratoken "machine learning"
# Output:
# Word: machine learning
# Tokens: 2
ultratoken "The quick brown fox jumps over the lazy dog"
# Output:
# Word: The quick brown fox jumps over the lazy dog
# Tokens: 9
Process a file containing a list of words. UltraToken appends the token count to each line:
Input file (words.txt):
hello world
artificial intelligence
machine learning
programming
natural language processing
Command:
ultratoken economy words.txt
Output file (words.txt) after processing:
hello world 2
artificial intelligence 3
machine learning 2
programming 2
natural language processing 4
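The transformation shown above, appending a count to every line, can be sketched as a pure function over the file's text. The names `appendTokenCounts` and `countTokens` are assumptions made for this example and are not taken from UltraToken's code. The demo counter counts words, so its numbers will not always match real BPE token counts.

```javascript
// Append a token count to every non-empty line of a word list.
// `countTokens` is a hypothetical callback standing in for the real tokenizer.
function appendTokenCounts(text, countTokens) {
  return text
    .split(/\r?\n/)
    .map((line) => (line.trim() === "" ? line : `${line} ${countTokens(line)}`))
    .join("\n");
}

// Stand-in counter for the demo: one "token" per whitespace-separated word.
// This is NOT how BPE counts tokens; it only makes the sketch runnable.
const wordCounter = (line) => line.trim().split(/\s+/).length;

console.log(appendTokenCounts("hello world\nprogramming", wordCounter));
// hello world 2
// programming 1
```

To mirror the in-place behavior described above, the result could be written back to the same path with Node's `fs.readFileSync` and `fs.writeFileSync`. Since that overwrites the input file, keeping a backup copy is a sensible precaution.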
For detailed documentation and advanced features, see Documentation.md.
This project is licensed under the MIT License - see the LICENSE file for details.
- Civil McKnight - Project Ultra
- TrinityAI
- Project Ultra
UltraToken is developed by TrinityAI Research, specializing in advanced LLM development.