PharenIT/Python-Toon-Schema

TOON Format for Python

⚠️ Early Release (v0.1.x): The API is stable enough for use, but may evolve before 1.0.0.

A compact, human-readable serialization format for LLM contexts that saves tokens on mixed and nested data. It uses a minimal syntax, round-trips deterministically, and is designed for fast encode/decode and practical prompt efficiency.

Key Features

Minimal syntax • Tabular arrays for uniform data • Optional auto format selection • Python 3.9+

Installation

pip install p-toon-llm

Quick Start

from toon_format import encode, decode

# Simple object
encode({"name": "Alice", "age": 30})
# {name,age|Alice|30}

# Tabular array (uniform objects)
encode([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])
# ^csv[id,name|1,Alice|2,Bob]

# Decode back to Python
decode("{items|[apple|banana]}")
# {'items': ['apple', 'banana']}
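
To make the tabular form above concrete, here is a minimal standard-library sketch that reproduces the `^csv[...]` output shown for uniform objects. The function name `encode_table` is hypothetical and this is not the library's implementation: it assumes flat dicts with identical keys and does no quoting or nesting.

```python
def encode_table(rows):
    """Toy encoder for a uniform list of flat dicts (no quoting, no nesting)."""
    if not rows or not all(isinstance(r, dict) for r in rows):
        raise ValueError("expected a non-empty list of dicts")
    header = list(rows[0])  # column order is taken from the first row
    if any(set(r) != set(header) for r in rows):
        raise ValueError("rows are not uniform")
    # Header first, then one comma-joined record per row, all separated by '|'
    parts = [",".join(header)]
    for r in rows:
        parts.append(",".join(str(r[k]) for k in header))
    return "^csv[" + "|".join(parts) + "]"


print(encode_table([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]))
# ^csv[id,name|1,Alice|2,Bob]
```

The header is written once, so every additional row costs only its values; that is where the savings on uniform data come from.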

API Reference

encode(value, options=None) -> str

encode({"id": 123})
encode({"id": 123}, {"mode": "auto", "candidates": ("toon", "json", "csv")})

Options:

  1. mode: toon (default), hybrid, or auto
  2. candidates: iterable of formats for auto mode
  3. metric: tokens or chars for auto mode

decode(input_str, options=None) -> Any

decode("{id|123}")

Token Counting & Comparison

from toon_format import encode, estimate_savings, compare_formats, count_tokens

data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
result = estimate_savings(data)
print(f"Saves {result['savings_percent']:.1f}% tokens")

print(compare_formats(data))

toon_str = encode(data)
print(count_tokens(toon_str))

Requires tiktoken for accurate token counts. Without it, count_tokens falls back to character length.
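
The fallback described above can be sketched as follows. This is an illustration, not the library's code: `char_count` and `make_token_counter` are hypothetical names, and the `cl100k_base` encoding is an assumption. Only `tiktoken.get_encoding` and `Encoding.encode` are real tiktoken calls.

```python
def char_count(text):
    """Fallback metric: one 'token' per character."""
    return len(text)


def make_token_counter(encoding_name="cl100k_base"):
    """Return a tiktoken-based counter, or the character fallback if unavailable."""
    try:
        import tiktoken  # optional dependency
        enc = tiktoken.get_encoding(encoding_name)
    except Exception:
        # tiktoken is missing or the encoding could not be loaded
        return char_count
    return lambda text: len(enc.encode(text))


print(char_count("{id|123}"))
# 8
```

Character length overestimates token counts, so comparisons made without tiktoken are best read as relative rather than absolute.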

Format Specification

Type             Example Input                                      TOON Output
Object           {"name": "Alice", "age": 30}                       {name,age|Alice|30}
Primitive Array  [1, 2, 3]                                          [1...
Tabular Array    [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]   ^csv[id,name...
Mixed Array      [{"x": 1}, 42, "hi"]                               [{x...

Quoting: only when necessary (empty, reserved tokens, numeric ambiguity, whitespace, delimiters)
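
As a rough illustration of the quoting rule, the check below returns True for each condition listed. The reserved words and the delimiter set are assumptions inferred from the examples in this README, not taken from the specification, and `needs_quoting` is a hypothetical name.

```python
RESERVED = {"null", "true", "false"}  # assumed reserved tokens
DELIMITERS = set("{}[]|,^")           # assumed from the examples in this README


def looks_numeric(text):
    """True if the string would parse as a number and so be ambiguous unquoted."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def needs_quoting(text):
    if text == "":
        return True                        # empty string
    if text in RESERVED:
        return True                        # reserved token
    if looks_numeric(text):
        return True                        # numeric ambiguity
    if any(ch.isspace() for ch in text):
        return True                        # whitespace (assumed: anywhere in the string)
    return any(ch in DELIMITERS for ch in text)  # delimiter characters


print(needs_quoting("Alice"), needs_quoting("42"), needs_quoting("a|b"))
# False True True
```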

Type Normalization: datetime/date -> ISO 8601 • Decimal -> float • NaN/Inf -> null • -0 -> 0
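
The normalization rules above could be applied before encoding roughly like this. It is a standard-library sketch, not the library's code: `normalize` is a hypothetical helper, and returning a float `0.0` for negative zero is an assumption about what "-0 -> 0" means.

```python
import math
from datetime import date, datetime
from decimal import Decimal


def normalize(value):
    """Recursively map values to the normalized forms listed above."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()            # ISO 8601 string
    if isinstance(value, Decimal):
        value = float(value)                # Decimal -> float
    if isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            return None                     # NaN/Inf -> null
        if value == 0:
            return 0.0                      # collapses -0.0 to 0.0
        return value
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    return value


print(normalize({"when": date(2024, 1, 2), "price": Decimal("1.5"), "bad": float("nan")}))
# {'when': '2024-01-02', 'price': 1.5, 'bad': None}
```

Because Decimal becomes float and NaN/Inf become null, a round-trip returns the normalized value, not necessarily the original Python object.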

Auto Mode

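from toon_format import encode_best, encode

# Sample input (same shape as the token-counting example)
data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}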

best = encode_best(data, candidates=("toon", "json", "yaml", "csv"))
print(best["format"], best["text"])

print(encode(data, options={"mode": "auto", "candidates": ("toon", "json", "csv")}))
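
Conceptually, auto mode encodes the value with each candidate and keeps the smallest result under the chosen metric. The sketch below shows that idea using only the standard library. `pick_best` and the candidate table are hypothetical, the candidates are two JSON variants instead of TOON, and the result keys mirror the `format` and `text` keys used above.

```python
import json


def pick_best(value, candidates, metric=len):
    """Encode with every candidate and return the one with the smallest metric."""
    results = {name: enc(value) for name, enc in candidates.items()}
    best = min(results, key=lambda name: metric(results[name]))
    return {"format": best, "text": results[best]}


# Hypothetical candidate table: name -> encoder callable
CANDIDATES = {
    "json": lambda v: json.dumps(v, separators=(",", ":")),
    "json_pretty": lambda v: json.dumps(v, indent=2),
}

best = pick_best({"id": 123}, CANDIDATES)
print(best["format"], best["text"])
# json {"id":123}
```

Passing a token counter as `metric` instead of `len` would correspond to choosing `tokens` over `chars`.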

Benchmarks

TOON tends to win on mixed and nested data. CSV will usually win on flat tables. Use your own datasets for reliable results.

from toon_format import compare_formats, estimate_savings

print(compare_formats(data))
print(estimate_savings(data))

Example dataset:

data = {
    "users": [
        {"id": 1, "name": "Alice", "role": "admin"},
        {"id": 2, "name": "Bob", "role": "member"},
        {"id": 3, "name": "Cara", "role": "member"},
    ],
    "meta": {"team": "Toon", "active": True, "region": "EU"},
    "events": [
        {"type": "login", "user_id": 1, "ok": True},
        {"type": "purchase", "user_id": 2, "amount": 19.99},
        "note: rollout 1",
    ],
}

Example results (tokens and chars will vary by tokenizer and data):

Format        Tokens  Size (chars)  Notes
TOON          203     203           Compact mixed/nested
JSON compact  293     293           Minified JSON
JSON pretty   508     508           Indented JSON
YAML          322     322           Simple YAML
CSV (users)   53      53            Table only
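
For reference, the relative savings implied by the table can be worked out with a one-line formula. `savings_percent` is a hypothetical helper. Which baseline `estimate_savings` uses is not stated here, so both JSON variants are shown.

```python
def savings_percent(baseline, candidate):
    """Percentage reduction of candidate relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline * 100.0


# Figures taken from the table above
print(f"vs JSON compact: {savings_percent(293, 203):.1f}%")
# vs JSON compact: 30.7%
print(f"vs JSON pretty:  {savings_percent(508, 203):.1f}%")
# vs JSON pretty:  60.0%
```

The CSV row covers only the users table, so it is not directly comparable with the other rows.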

Development

python -m pytest

Documentation

  1. docs/index.md
  2. docs/format.md
  3. docs/api.md
  4. docs/performance.md
  5. docs/prompt_header.md

License

MIT License – see LICENSE

About

Toon Codec is a Python library for compressing structured data into a compact, human-readable and reversible format optimized for LLMs. It replaces noisy JSON, preserves structure and context clarity, and reduces token usage. Supports objects, arrays, tables and mixed lists with token counting and auto-decode.
