⚠️ Early Release (v0.1.x): The API is stable enough for use, but may evolve before 1.0.0.
Compact, human-readable serialization format for LLM contexts with strong token savings on mixed and nested data. Uses a minimal syntax and deterministic round-trips. Designed for fast encode/decode and practical prompt efficiency.
Minimal syntax • Tabular arrays for uniform data • Optional auto format selection • Python 3.9+
```bash
pip install p-toon-llm
```

```python
from toon_format import encode, decode

# Simple object
encode({"name": "Alice", "age": 30})
# {name,age|Alice|30}

# Tabular array (uniform objects)
encode([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
# ^csv[id,name|1,Alice|2,Bob]

# Decode back to Python
decode("{items|[apple|banana]}")
# {'items': ['apple', 'banana']}
```

```python
encode({"id": 123})
encode({"id": 123}, {"mode": "auto", "candidates": ("toon", "json", "csv")})
```

Options:

- `mode`: `toon` (default), `hybrid`, or `auto`
- `candidates`: iterable of formats for auto mode
- `metric`: `tokens` or `chars` for auto mode

```python
decode("{id|123}")
```

```python
from toon_format import encode, estimate_savings, compare_formats, count_tokens

data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}

result = estimate_savings(data)
print(f"Saves {result['savings_percent']:.1f}% tokens")
print(compare_formats(data))

toon_str = encode(data)
print(count_tokens(toon_str))
```

Requires `tiktoken` for accurate token counts. Without it, `count_tokens` falls back to character length.

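The fallback behaviour described above can be pictured with a small stand-alone sketch. It is not the library's implementation: the helper name `approx_token_count` and the choice of the `cl100k_base` encoding are assumptions made for illustration only.

```python
def approx_token_count(text: str, encoding_name: str = "cl100k_base") -> int:
    """Hypothetical helper: count tokens with tiktoken, else fall back to len(text)."""
    try:
        import tiktoken  # optional dependency

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text))
    except Exception:
        # tiktoken is missing, or its encoding data could not be loaded:
        # use character length as a rough proxy for the token count.
        return len(text)


print(approx_token_count("{name,age|Alice|30}"))
```

With the fallback in effect, token and character counts come out identical, so comparisons between formats reduce to comparisons of string length.
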
| Type | Example Input | TOON Output |
|---|---|---|
| Object | `{"name": "Alice", "age": 30}` | `{name,age\|Alice\|30}` |
| Primitive Array | `[1, 2, 3]` | `[1\|2\|3]` |
| Tabular Array | `[{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]` | `^csv[id,name\|1,A\|2,B]` |
| Mixed Array | `[{"x": 1}, 42, "hi"]` | `[{x\|1}\|42\|hi]` |

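To make the "uniform objects" condition for tabular arrays concrete, here is a minimal sketch that checks uniformity and lays rows out in the `^csv[...]` shape shown in this README. The function names `is_uniform` and `tabular_sketch` are hypothetical, the layout is inferred only from the example output above, and quoting and escaping are ignored entirely, so this should not be read as the library's encoder.

```python
def is_uniform(rows):
    """True when rows is a non-empty list of dicts that share the same keys in the same order."""
    if not isinstance(rows, list) or not rows:
        return False
    if not all(isinstance(row, dict) for row in rows):
        return False
    first_keys = list(rows[0].keys())
    return all(list(row.keys()) == first_keys for row in rows)


def tabular_sketch(rows):
    """Hypothetical layout: header once, then one comma-joined record per row (no quoting)."""
    if not is_uniform(rows):
        raise ValueError("rows are not uniform objects")
    header = ",".join(rows[0].keys())
    records = [",".join(str(value) for value in row.values()) for row in rows]
    return "^csv[" + "|".join([header, *records]) + "]"


print(tabular_sketch([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
# ^csv[id,name|1,Alice|2,Bob]
```

Writing the keys once in a header is where the savings on uniform data come from: each additional row costs only its values.
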
Quoting: only when necessary (empty, reserved tokens, numeric ambiguity, whitespace, delimiters)
Type Normalization: datetime/date -> ISO 8601 • Decimal -> float • NaN/Inf -> null • -0 -> 0
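
The normalization rules listed above can be sketched with the standard library alone. The `normalize` helper below is hypothetical and only mirrors the stated rules (with Python's `None` standing in for `null`); how the library applies them internally is not shown in this README.

```python
import math
from datetime import date, datetime
from decimal import Decimal


def normalize(value):
    """Hypothetical helper applying the normalization rules to a Python value."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()  # datetime/date -> ISO 8601 string
    if isinstance(value, Decimal):
        value = float(value)  # Decimal -> float, then the float rules below apply
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None  # NaN/Inf -> null
        return 0.0 if value == 0 else value  # -0.0 -> 0.0
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(item) for item in value]
    return value


print(normalize({"when": date(2024, 1, 2), "price": Decimal("19.99"), "bad": float("nan"), "zero": -0.0}))
# {'when': '2024-01-02', 'price': 19.99, 'bad': None, 'zero': 0.0}
```

Because these conversions lose information (a `Decimal` becomes a float, NaN becomes null), decoding returns the normalized value, not the original Python object.
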
```python
from toon_format import encode_best, encode

best = encode_best(data, candidates=("toon", "json", "yaml", "csv"))
print(best["format"], best["text"])

print(encode(data, options={"mode": "auto", "candidates": ("toon", "json", "csv")}))
```

TOON tends to win on mixed and nested data, while CSV usually wins on flat tables. Benchmark on your own datasets for reliable results.

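The idea behind picking a format automatically can be illustrated without the library: encode the same data with each candidate and keep the smallest result. The sketch below uses only the standard-library JSON and CSV encoders as stand-ins and measures by character count; `pick_shortest`, `to_json`, and `to_csv` are hypothetical names, and the library's own selection logic may differ.

```python
import csv
import io
import json


def to_json(rows):
    """Minified JSON."""
    return json.dumps(rows, separators=(",", ":"))


def to_csv(rows):
    """CSV with a header row; assumes every row has the same keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def pick_shortest(rows, encoders, measure=len):
    """Return the name of the encoder with the smallest measured output, plus all sizes."""
    sizes = {name: measure(encode(rows)) for name, encode in encoders.items()}
    return min(sizes, key=sizes.get), sizes


rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
best, sizes = pick_shortest(rows, {"json": to_json, "csv": to_csv})
print(best, sizes)
```

On this flat table CSV comes out ahead of minified JSON, which matches the expectation stated above. Passing a token counter as `measure` would switch the comparison from characters to tokens.
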
```python
from toon_format import compare_formats, estimate_savings

print(compare_formats(data))
print(estimate_savings(data))
```

Example dataset:

```python
data = {
    "users": [
        {"id": 1, "name": "Alice", "role": "admin"},
        {"id": 2, "name": "Bob", "role": "member"},
        {"id": 3, "name": "Cara", "role": "member"},
    ],
    "meta": {"team": "Toon", "active": True, "region": "EU"},
    "events": [
        {"type": "login", "user_id": 1, "ok": True},
        {"type": "purchase", "user_id": 2, "amount": 19.99},
        "note: rollout 1",
    ],
}
```

Example results (tokens and chars will vary by tokenizer and data):

| Format | Tokens | Size (chars) | Notes |
|---|---|---|---|
| TOON | 203 | 203 | Compact mixed/nested |
| JSON compact | 293 | 293 | Minified JSON |
| JSON pretty | 508 | 508 | Indented JSON |
| YAML | 322 | 322 | Simple YAML |
| CSV (users) | 53 | 53 | Table only |
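
For a sense of scale, the token counts in the table can be turned into relative savings. The `savings_percent` helper below is a hypothetical one-liner for this arithmetic; whether `estimate_savings` uses the same formula is an assumption. The CSV row is left out because it covers only the `users` table.

```python
def savings_percent(baseline: int, candidate: int) -> float:
    """Percentage reduction of candidate relative to baseline."""
    return 100.0 * (baseline - candidate) / baseline


toon_tokens = 203
baselines = {"JSON compact": 293, "JSON pretty": 508, "YAML": 322}

for name, tokens in baselines.items():
    print(f"TOON vs {name}: {savings_percent(tokens, toon_tokens):.1f}% fewer tokens")
# TOON vs JSON compact: 30.7% fewer tokens
# TOON vs JSON pretty: 60.0% fewer tokens
# TOON vs YAML: 37.0% fewer tokens
```
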

Run the tests:

```bash
python -m pytest
```

Documentation:

- docs/index.md
- docs/format.md
- docs/api.md
- docs/performance.md
- docs/prompt_header.md

MIT License – see LICENSE