A lightweight CLI utility for counting tokens in text and files.
cllm-tokens is a command-line tool that helps you quickly count the number of tokens in text. Whether you're analyzing prompts for language models, understanding token consumption, or debugging text processing, cllm-tokens provides a simple, fast way to count tokens from stdin or files.
Vision: Build a simple, intuitive CLI that makes token counting instant and transparent.
Goals:
- Provide fast, accurate token counting from stdin or files
- Support multiple input methods (piping, file paths)
- Display clear, actionable token count results
- Offer a lightweight alternative to heavier token analysis tools
- Document decision-making processes through ADRs to guide AI-assisted development
This project uses Vibe ADR (Architecture Decision Records) to document key decisions and guide development. See docs/decisions/ for decision records.
- Python 3.12 or later
- uv package manager
Install uv (if not already installed):
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then install cllm-tokens:

```bash
uv sync
uv pip install -e .
```

Or in one command:

```bash
uv sync && uv pip install -e .
```

Count tokens from stdin:

```bash
echo "Hello, world!" | cllm-tokens
# Output: 5
```

Count tokens from a file:

```bash
cllm-tokens /path/to/file.txt
# Output: 1,234
```

Count tokens from multiple files:
```bash
cllm-tokens file1.txt file2.txt file3.txt
```

Options:

- `--verbose` or `-v`: Show detailed output including file names
- `--quiet` or `-q`: Show only the token count (no formatting)
- `--total` or `-t`: Show total count when processing multiple files
- `--encoding`: Specify tokenization encoding (default: `cl100k_base`)
Supported encodings:
- `cl100k_base` – GPT-3.5, GPT-4 (default)
- `p50k_base` – Code models, older GPT-3 variants
- `r50k_base` – Legacy models
- `o200k_base` – GPT-4o and newer models
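The same text can produce different counts under different encodings. As a rough illustration, here is a minimal sketch that calls tiktoken directly rather than cllm-tokens (it assumes a tiktoken release recent enough to ship `o200k_base`):

```python
# Compare how one string tokenizes under each supported encoding.
# Exact counts depend on the installed tiktoken version.
import tiktoken

text = "Write a poem about programming"

for name in ("cl100k_base", "p50k_base", "r50k_base", "o200k_base"):
    encoding = tiktoken.get_encoding(name)
    print(f"{name}: {len(encoding.encode(text))} tokens")
```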
Analyze a prompt:
```bash
echo "Write a poem about programming" | cllm-tokens
# Output: 6
```

Check token usage of a document:

```bash
cllm-tokens myessay.txt
# Output: 1,234
```

Count tokens using a different encoding:

```bash
cllm-tokens --encoding p50k_base document.txt
```

Get detailed output for a file:

```bash
cllm-tokens --verbose document.txt
# Output: document.txt: 1,234 tokens
```

Count tokens from multiple files with a total:

```bash
cllm-tokens --total file1.txt file2.txt file3.txt
# Output:
# 100
# 200
# 300
#
# Total: 600 tokens
```

Count tokens from piped content with quiet mode:
```bash
cat document.txt | cllm-tokens --quiet
# Output: 1234
```

Project layout:

- `src/cllm_tokens/` – Main package
  - `counter.py` – Token counting logic using tiktoken
  - `cli.py` – Command-line interface using Click
  - `__init__.py` – Package exports
- `tests/` – Test suite
  - `test_counter.py` – Token counter tests (20+ tests)
  - `test_cli.py` – CLI interface tests (27+ tests; see the example below)
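To give a flavour of the suite, a CLI test might look like the sketch below. It uses Click's `CliRunner`; the import path and command object name (`cllm_tokens.cli.main`) are assumptions, so check `cli.py` for the real names.

```python
# Sketch of a CLI test with Click's CliRunner.
# NOTE: `main` is an assumed name for the Click command object in cli.py.
from click.testing import CliRunner

from cllm_tokens.cli import main


def test_counts_tokens_from_stdin():
    runner = CliRunner()
    # Feed text on stdin, as `echo "Hello, world!" | cllm-tokens` would.
    result = runner.invoke(main, [], input="Hello, world!\n")
    assert result.exit_code == 0
    # The output should contain a (possibly comma-formatted) token count.
    assert any(ch.isdigit() for ch in result.output)
```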
Install the package in development mode and run tests:
```bash
uv sync
uv pip install -e .
python -m pytest tests/ -v
```

Format code with black and lint with ruff:

```bash
uv run --with black black src/ tests/
uv run --with ruff ruff check src/ tests/ --fix
```

The project is built on two main ADRs:
- ADR 0003: CLI interface using Click framework
  - Simple decorator-based API
  - Supports stdin, files, and multiple inputs
  - Built-in help, verbose, and quiet modes
  - Proper error handling and exit codes
- ADR 0004: Token counting using tiktoken
  - OpenAI's official tokenizer for GPT models
  - Support for multiple encodings (`cl100k_base`, `p50k_base`, etc.)
  - Fast Rust-based implementation
  - Accurate token counts matching API usage
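Taken together, the two decisions reduce to a tiktoken-backed counting function wrapped in a Click command. The outline below is an illustrative sketch rather than the project's actual source; names such as `count_tokens` and `main` are assumptions.

```python
# Illustrative sketch of ADR 0003 + ADR 0004 working together (not the real source).
import sys

import click
import tiktoken


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Return the number of tokens `text` occupies under the given encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))


@click.command()
@click.argument("files", nargs=-1, type=click.Path(exists=True))
@click.option("--encoding", "encoding_name", default="cl100k_base", show_default=True)
def main(files, encoding_name):
    """Count tokens in FILES, or on stdin when no files are given."""
    if files:
        for path in files:
            with open(path, encoding="utf-8") as handle:
                click.echo(f"{count_tokens(handle.read(), encoding_name):,}")
    else:
        click.echo(f"{count_tokens(sys.stdin.read(), encoding_name):,}")


if __name__ == "__main__":
    main()
```

The real `cli.py` additionally handles `--verbose`, `--quiet`, and `--total`, plus the error handling and exit codes called out in ADR 0003.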
Both modules are thoroughly tested with 54+ passing tests covering:
- Simple and complex text inputs
- File I/O and error handling
- Multiple encodings
- Unicode and special characters
- CLI options and flags
All significant architectural and implementation decisions are recorded in the docs/decisions/ directory as Architecture Decision Records (ADRs). See 0001-adopt-vibe-adr.md for details on how we structure decisions.
When contributing new features or making architectural decisions:
- Read the relevant ADRs in `docs/decisions/`
- Create a new ADR for significant decisions using `templates/VIBE_ADR_TEMPLATE.md`
- Reference ADR IDs in commit messages (e.g., "ref: 0002-token-counting")
- Keep decisions linked to implementation commits for traceability
- Lightweight: Keep abstractions simple and dependencies minimal
- Transparent: Make token counting logic clear and auditable
- AI-Collaborative: Use ADRs to guide AI agents in understanding project intent
- Intentional: Document the "why" behind decisions, not just the "what"