
TheTokenCompany/the-token-company-python


The Token Company Python SDK

Compress LLM prompts to reduce costs and latency. 100K tokens compressed in ~85ms.

Install

pip install the-token-company

Quick start

from thetokencompany import TheTokenCompany

client = TheTokenCompany(api_key="ttc-...")
result = client.compress("Your long prompt text here...", model="bear-2")

print(result.output)           # compressed text
print(result.tokens_saved)     # tokens removed
print(result.compression_ratio)  # e.g. 1.8

SDK wrappers

Drop-in wrappers that automatically compress all non-assistant messages before they are sent to your LLM. Assistant messages pass through unchanged, so the provider's KV cache stays warm.
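
The filtering rule above can be sketched as a plain function. This is a hypothetical illustration, not the SDK's internals: `compress_messages` and the `compress` callable are names introduced here, and non-string content (for example multimodal parts) is simply left alone in this sketch.

```python
from typing import Callable


def compress_messages(messages: list[dict], compress: Callable[[str], str]) -> list[dict]:
    """Return a new message list with every non-assistant string content compressed."""
    out = []
    for msg in messages:
        if msg.get("role") != "assistant" and isinstance(msg.get("content"), str):
            # Copy the dict so the caller's original message list is not mutated.
            msg = {**msg, "content": compress(msg["content"])}
        # Assistant turns (and non-string content) are appended unchanged.
        out.append(msg)
    return out
```

Passing a trivial callable such as `str.upper` as `compress` is enough to see which messages would be rewritten.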

OpenAI / OpenRouter

from openai import OpenAI
from thetokencompany.openai import with_compression

client = with_compression(OpenAI(), compression_api_key="ttc-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant..."},
        {"role": "user", "content": "Summarize these results..."},
    ],
)

Works with AsyncOpenAI too; the wrapper detects async clients automatically.
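
One plausible way to detect async automatically is to check whether the wrapped `create` function is a coroutine function and return a matching wrapper. This is an assumption about the approach, not the SDK's actual code: `wrap_create` and `transform` are hypothetical names, and `transform` is kept synchronous here for brevity (a real async wrapper would likely await its compression call).

```python
import asyncio
import functools
import inspect
from typing import Any, Callable


def wrap_create(create: Callable[..., Any], transform: Callable[[list], list]) -> Callable[..., Any]:
    """Wrap a `create(messages=...)` function so `messages` is rewritten first.

    Returns an async wrapper if `create` is a coroutine function, else a sync one.
    """
    if inspect.iscoroutinefunction(create):
        @functools.wraps(create)
        async def async_wrapper(*args, messages, **kwargs):
            return await create(*args, messages=transform(messages), **kwargs)
        return async_wrapper

    @functools.wraps(create)
    def sync_wrapper(*args, messages, **kwargs):
        return create(*args, messages=transform(messages), **kwargs)
    return sync_wrapper
```

Returning two distinct wrapper types keeps the call style unchanged for the caller: sync clients are called directly, async clients are awaited.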

Anthropic

from anthropic import Anthropic
from thetokencompany.anthropic import with_compression

client = with_compression(Anthropic(), compression_api_key="ttc-...")

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant...",
    messages=[{"role": "user", "content": "Summarize these results..."}],
)

Both messages and the system parameter are compressed.
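
The Anthropic Messages API accepts `system` and each message's `content` either as a plain string or as a list of content blocks, so a wrapper has to handle both shapes. A hypothetical sketch: the helper names are introduced here, and leaving non-text blocks such as images untouched is an assumption about the SDK's behavior.

```python
from typing import Callable, Union

Content = Union[str, list]


def compress_content(content: Content, compress: Callable[[str], str]) -> Content:
    """Compress a string, or the "text" field of each text block in a block list."""
    if isinstance(content, str):
        return compress(content)
    # Non-text blocks (images, tool results, ...) are passed through untouched.
    return [
        {**block, "text": compress(block["text"])} if block.get("type") == "text" else block
        for block in content
    ]


def compress_anthropic_request(kwargs: dict, compress: Callable[[str], str]) -> dict:
    """Copy Messages API kwargs, compressing `system` and every non-assistant message."""
    out = dict(kwargs)
    if "system" in out:
        out["system"] = compress_content(out["system"], compress)
    if "messages" in out:
        out["messages"] = [
            m if m["role"] == "assistant"
            else {**m, "content": compress_content(m["content"], compress)}
            for m in out["messages"]
        ]
    return out
```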

Async

import asyncio
from thetokencompany import AsyncTheTokenCompany

async def main():
    async with AsyncTheTokenCompany(api_key="ttc-...") as client:
        result = await client.compress("Your long prompt text...")

asyncio.run(main())
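
Because the async client's `compress` is awaitable, several documents can be compressed concurrently. A minimal sketch using `asyncio.gather` with a semaphore to cap in-flight requests; `compress_many` and `max_concurrency` are hypothetical names, and the cap of 8 is an arbitrary placeholder, not a documented rate limit.

```python
import asyncio


async def compress_many(client, texts: list[str], *, max_concurrency: int = 8, **options) -> list:
    """Compress several texts concurrently; results come back in input order."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(text: str):
        # At most `max_concurrency` compress calls are awaited at the same time.
        async with sem:
            return await client.compress(text, **options)

    # gather preserves the order of its arguments in the returned list.
    return await asyncio.gather(*(one(t) for t in texts))
```

Inside an `async with AsyncTheTokenCompany(...) as client:` block this would be called as `await compress_many(client, docs, model="bear-2")`.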

Models

Model      Description
bear-2     Latest, recommended
bear-1.2   Previous generation
bear-1.1   Legacy
bear-1     Legacy

Aggressiveness

Control compression intensity with the aggressiveness parameter (0.0–1.0, default 0.5):

result = client.compress(text, model="bear-2", aggressiveness=0.8)
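
To pick a setting for your workload, it can help to compress a representative prompt at a few levels and compare the results. A hedged sketch that relies only on the `compress` signature and response fields shown in this README; `sweep_aggressiveness` is a hypothetical helper, and its range check simply mirrors the documented 0.0–1.0 bounds.

```python
def sweep_aggressiveness(client, text: str, levels=(0.2, 0.5, 0.8), model: str = "bear-2") -> dict:
    """Compress `text` at each level; map level -> (compression_ratio, tokens_saved)."""
    # Validate every level up front so no request is sent if any value is out of range.
    bad = [lv for lv in levels if not 0.0 <= lv <= 1.0]
    if bad:
        raise ValueError(f"aggressiveness must be in [0.0, 1.0], got {bad}")
    report = {}
    for level in levels:
        result = client.compress(text, model=model, aggressiveness=level)
        report[level] = (result.compression_ratio, result.tokens_saved)
    return report
```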

Gzip

Enable gzip compression of request payloads for better performance on large inputs (up to 2.2x faster on 1M+ tokens):

client = TheTokenCompany(api_key="ttc-...", gzip=True)
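
The `gzip=True` flag trades a little client CPU for a smaller upload. The sketch below only illustrates the underlying idea with the standard library (serialize the JSON body, gzip it, and typically send it with a `Content-Encoding: gzip` header); it is not the SDK's transport code, and payload keys such as `input` are hypothetical, not the service's documented request schema.

```python
import gzip
import json


def gzip_payload(payload: dict) -> bytes:
    """Serialize a JSON request body to UTF-8 and gzip it."""
    return gzip.compress(json.dumps(payload).encode("utf-8"))


# Hypothetical payload; repetitive text like long prompts compresses very well.
payload = {"model": "bear-2", "input": "The quick brown fox. " * 5000}
raw = json.dumps(payload).encode("utf-8")
packed = gzip_payload(payload)
print(f"{len(raw)} bytes -> {len(packed)} bytes")
```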

Protect text from compression

Use protect() to wrap content in <ttc_safe> tags; protected text passes through unchanged:

from thetokencompany import protect

prompt = f"{protect('system:')} You are a helpful assistant.\n{protect('user:')} Hello!"
result = client.compress(prompt, model="bear-2")
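
When a prompt contains many identifiers that must survive verbatim (function names, IDs, keywords), wrapping each occurrence by hand gets tedious. The hypothetical helper `protect_terms` does it in a single regex pass, which also avoids re-wrapping text inside tags that were already inserted. It prefers the SDK's `protect()` and falls back to a local stand-in whose `</ttc_safe>` closing-tag form is an assumption.

```python
import re

try:
    from thetokencompany import protect  # the SDK's helper, if installed
except ImportError:
    def protect(text: str) -> str:
        """Local stand-in; the closing </ttc_safe> tag form is an assumption."""
        return f"<ttc_safe>{text}</ttc_safe>"


def protect_terms(text: str, terms: list[str]) -> str:
    """Wrap every occurrence of each term with protect(), in one regex pass."""
    terms = [t for t in terms if t]  # empty strings would match everywhere
    if not terms:
        return text
    # Longest terms first, so "user_id" is matched before its prefix "user".
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.sub(alternation, lambda m: protect(m.group(0)), text)
```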

Response

CompressResponse fields:

Field               Type    Description
output              str     Compressed text
output_tokens       int     Token count after compression
input_tokens        int     Token count before compression
tokens_saved        int     Tokens removed
compression_ratio   float   Compression ratio (e.g. 1.8x)
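
A small helper can turn these fields into a percentage for logging or dashboards. In this sketch, `Response` is a local stand-in with the same fields as CompressResponse (so it runs without an API key), and the sample numbers are illustrative, chosen under the assumption that `tokens_saved = input_tokens - output_tokens` and `compression_ratio = input_tokens / output_tokens`.

```python
from dataclasses import dataclass


@dataclass
class Response:  # local stand-in with the same fields as CompressResponse
    output: str
    output_tokens: int
    input_tokens: int
    tokens_saved: int
    compression_ratio: float


def percent_saved(resp) -> float:
    """Share of input tokens removed, as a percentage (0.0 for empty input)."""
    if resp.input_tokens == 0:
        return 0.0
    return 100.0 * resp.tokens_saved / resp.input_tokens


r = Response("...", output_tokens=550, input_tokens=1000, tokens_saved=450,
             compression_ratio=1000 / 550)
print(f"{percent_saved(r):.1f}% saved, ratio {r.compression_ratio:.2f}x")
# prints "45.0% saved, ratio 1.82x"
```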

License

MIT

About

Python SDK for The Token Company: compress LLM prompts to reduce costs and latency
