Compress LLM prompts to reduce costs and latency. 100K tokens compressed in ~85ms.
```
pip install the-token-company
```

```python
from thetokencompany import TheTokenCompany

client = TheTokenCompany(api_key="ttc-...")
result = client.compress("Your long prompt text here...", model="bear-2")
print(result.output)             # compressed text
print(result.tokens_saved)       # tokens removed
print(result.compression_ratio)  # e.g. 1.8
```

Drop-in wrappers that auto-compress all non-assistant messages before sending to your LLM. Assistant messages pass through unchanged so the provider's KV cache stays warm.

```python
from openai import OpenAI
from thetokencompany.openai import with_compression

client = with_compression(OpenAI(), compression_api_key="ttc-...")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant..."},
        {"role": "user", "content": "Summarize these results..."},
    ],
)
```

Works with `AsyncOpenAI` too; the wrapper detects async clients automatically.
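
How the wrapper tells sync from async clients is not documented here. One common way to build a wrapper that serves both is sketched below, purely as an illustration and not as thetokencompany's implementation: `wrap_with_transform`, `send_sync`, `send_async`, and `shorten` are hypothetical names, and `shorten` is only a placeholder for a real compressor.

```python
import asyncio
import inspect
from typing import Any, Callable


def wrap_with_transform(
    send: Callable[..., Any], transform: Callable[[str], str]
) -> Callable[..., Any]:
    """Apply `transform` to the prompt, then call `send` (sync or async)."""
    if inspect.iscoroutinefunction(send):
        # Async callable: return a coroutine function so callers can await it.
        async def async_wrapper(prompt: str) -> Any:
            return await send(transform(prompt))

        return async_wrapper

    # Plain callable: return a regular function.
    def sync_wrapper(prompt: str) -> Any:
        return send(transform(prompt))

    return sync_wrapper


# Hypothetical stand-ins for a sync and an async client method.
def send_sync(prompt: str) -> str:
    return f"sent: {prompt}"


async def send_async(prompt: str) -> str:
    return f"sent: {prompt}"


shorten = str.strip  # placeholder "compressor": trims surrounding whitespace

sync_result = wrap_with_transform(send_sync, shorten)("  hello  ")
async_result = asyncio.run(wrap_with_transform(send_async, shorten)("  hello  "))
```

Both calls go through the same wrapper factory; only the async one needs to be awaited.
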

```python
from anthropic import Anthropic
from thetokencompany.anthropic import with_compression

client = with_compression(Anthropic(), compression_api_key="ttc-...")
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant...",
    messages=[{"role": "user", "content": "Summarize these results..."}],
)
```

Both `messages` and the `system` parameter are compressed.
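
The behavior the wrappers are described as having (compress every non-assistant message and the `system` text, leave assistant turns untouched) can be pictured with a small stand-alone sketch. It is not the library's code: `compress_request`, `compress_fn`, and `squeeze` are hypothetical names, `squeeze` is a toy stand-in for a real compressor, and the sketch assumes each message's `content` is a plain string.

```python
from typing import Callable, Optional

Message = dict[str, str]


def compress_request(
    messages: list[Message],
    compress_fn: Callable[[str], str],
    system: Optional[str] = None,
) -> tuple[Optional[str], list[Message]]:
    """Return (system, messages) with every non-assistant text compressed."""
    new_system = compress_fn(system) if system is not None else None
    new_messages: list[Message] = []
    for msg in messages:
        if msg["role"] == "assistant":
            # Assistant turns pass through unchanged.
            new_messages.append(msg)
        else:
            new_messages.append({**msg, "content": compress_fn(msg["content"])})
    return new_system, new_messages


def squeeze(text: str) -> str:
    """Toy compressor for the demo: collapses runs of whitespace."""
    return " ".join(text.split())


system, messages = compress_request(
    [
        {"role": "user", "content": "Summarize   these    results..."},
        {"role": "assistant", "content": "Sure,   here is  a summary."},
    ],
    squeeze,
    system="You are  a helpful   assistant...",
)
```

After the call, the user message and `system` text are squeezed while the assistant message is byte-for-byte identical to the input, which is the property that keeps a provider-side cache of earlier assistant turns valid.
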

```python
import asyncio
from thetokencompany import AsyncTheTokenCompany

async def main():
    async with AsyncTheTokenCompany(api_key="ttc-...") as client:
        result = await client.compress("Your long prompt text...")

asyncio.run(main())
```

| Model | Description |
|---|---|
| `bear-2` | Latest, recommended |
| `bear-1.2` | Previous generation |
| `bear-1.1` | Legacy |
| `bear-1` | Legacy |

Control compression intensity with aggressiveness (0.0 – 1.0, default 0.5):

```python
result = client.compress(text, model="bear-2", aggressiveness=0.8)
```

Enable gzip compression of request payloads for better performance on large inputs (up to 2.2x faster on 1M+ tokens):

```python
client = TheTokenCompany(api_key="ttc-...", gzip=True)
```

Use `protect()` to wrap content in `<ttc_safe>` tags; protected text passes through unchanged:

```python
from thetokencompany import protect

prompt = f"{protect('system:')} You are a helpful assistant.\n{protect('user:')} Hello!"
result = client.compress(prompt, model="bear-2")
```

`CompressResponse` fields:

| Field | Type | Description |
|---|---|---|
| `output` | `str` | Compressed text |
| `output_tokens` | `int` | Token count after compression |
| `input_tokens` | `int` | Token count before compression |
| `tokens_saved` | `int` | Tokens removed |
| `compression_ratio` | `float` | Ratio (e.g. 1.8x) |

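
As a rough sketch of how these fields might feed a cost estimate, the snippet below turns `tokens_saved` into an approximate dollar figure. `CompressStats` is a hypothetical stand-in that only mirrors the field names in the table (it is not the library's class), `estimated_savings_usd` is an illustrative helper, and the price and token counts are made-up example values.

```python
from dataclasses import dataclass


@dataclass
class CompressStats:
    """Hypothetical stand-in carrying the documented CompressResponse field names."""
    output: str
    output_tokens: int
    input_tokens: int
    tokens_saved: int
    compression_ratio: float


def estimated_savings_usd(
    stats: CompressStats, usd_per_million_input_tokens: float
) -> float:
    """Input-token cost avoided by sending the compressed prompt."""
    return stats.tokens_saved * usd_per_million_input_tokens / 1_000_000


# Illustrative numbers only, not measured results.
stats = CompressStats(
    output="...",
    output_tokens=5_000,
    input_tokens=9_000,
    tokens_saved=4_000,
    compression_ratio=1.8,
)

# 2.50 USD per million input tokens is an assumed price for the example.
savings = estimated_savings_usd(stats, usd_per_million_input_tokens=2.50)
print(f"{savings:.4f}")  # 0.0100
```

With a real response, the same helper could be called on `result` directly, since it only reads `tokens_saved`.
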
MIT