Compression Language Packs
Caveman compression can load language-specific rule packs in addition to the built-in English rules. This keeps the core engine stable while allowing Portuguese, Spanish, German, French, Japanese, and future language packs to evolve independently.
Language packs live under:
open-sse/services/compression/rules/<language>/
Current shipped packs (verified against rules/ directory contents):
| Language | Directory | Rule categories present |
|---|---|---|
| English | rules/en/ | context, dedup, filler, structural, ultra |
| Spanish | rules/es/ | context, dedup, filler, structural, ultra |
| Portuguese (Brazil) | rules/pt-BR/ | context, filler, structural |
| German | rules/de/ | context, filler, structural |
| French | rules/fr/ | context, filler, structural |
| Japanese | rules/ja/ | context, filler, structural |
Parity note: en and es packs ship all 5 categories; pt-BR, de, fr, and ja ship 3 categories. The missing dedup and ultra categories silently fall back to the English built-ins. Contributions welcome to add dedup.json and ultra.json for the smaller packs.
The canonical category list and per-category schema live in open-sse/services/compression/rules/_schema.json (JSON Schema draft 2020-12).
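The per-category fallback described in the parity note can be sketched as follows. This is a minimal illustration, not the engine's actual loader: the `Category` type, the `SHIPPED` map, and `resolveCategorySources` are hypothetical names, and the category lists are copied from the table of shipped packs.

```typescript
// Hypothetical category type; the authoritative list lives in rules/_schema.json.
type Category = "context" | "dedup" | "filler" | "structural" | "ultra";

const ALL_CATEGORIES: Category[] = ["context", "dedup", "filler", "structural", "ultra"];
const SMALL_PACK: Category[] = ["context", "filler", "structural"];

// Categories shipped per pack, as listed in the table of shipped packs.
const SHIPPED: Record<string, Category[]> = {
  en: ALL_CATEGORIES,
  es: ALL_CATEGORIES,
  "pt-BR": SMALL_PACK,
  de: SMALL_PACK,
  fr: SMALL_PACK,
  ja: SMALL_PACK,
};

// For each category, name the pack that would supply its rules.
// Categories a pack does not ship resolve to the English built-ins.
function resolveCategorySources(language: string): Record<Category, string> {
  const shipped = SHIPPED[language] ?? [];
  const sources = {} as Record<Category, string>;
  for (const category of ALL_CATEGORIES) {
    sources[category] = shipped.includes(category) ? language : "en";
  }
  return sources;
}

// For "de", dedup and ultra resolve to "en"; the other three resolve to "de".
console.log(resolveCategorySources("de"));
```

An unknown language code resolves every category to en, which matches the fallback behavior described for missing packs.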
languageDetector.ts uses lightweight heuristics to infer the language from prompt text. The
configured default language is still respected, and detection can be disabled by config when exact
control is required.
Detection output is used only to choose rule packs. It does not change provider routing, locale selection, or UI language.
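To give a feel for what "lightweight heuristics" can mean here, the sketch below scores a prompt against tiny stopword lists and checks for kana. It is an assumption-laden illustration: `detectLanguage`, the `Lang` type, and the stopword lists are hypothetical and are not taken from languageDetector.ts.

```typescript
type Lang = "en" | "pt-BR" | "es" | "de" | "fr" | "ja";

// Tiny illustrative stopword lists; NOT the lists used by languageDetector.ts.
const STOPWORDS: Record<Exclude<Lang, "ja">, string[]> = {
  en: ["the", "and", "please", "would", "this", "that"],
  "pt-BR": ["por", "favor", "eu", "que", "isso", "gostaria", "você"],
  es: ["por", "favor", "el", "que", "esto", "quisiera", "usted"],
  de: ["der", "die", "und", "bitte", "ich", "das"],
  fr: ["le", "les", "et", "je", "voudrais", "ceci"],
};

function detectLanguage(text: string, defaultLanguage: Lang, autoDetect: boolean): Lang {
  // Detection disabled: the configured default always wins.
  if (!autoDetect) return defaultLanguage;
  // Hiragana or katakana is treated as a strong signal for Japanese.
  if (/[\u3040-\u30ff]/.test(text)) return "ja";
  // Split on anything that is not a Latin letter (including accented ones).
  const words = text.toLowerCase().split(/[^a-z\u00c0-\u00ff]+/).filter(Boolean);
  let best: Lang = defaultLanguage;
  let bestScore = 0;
  for (const [lang, list] of Object.entries(STOPWORDS) as [Lang, string[]][]) {
    const score = words.filter((word) => list.includes(word)).length;
    if (score > bestScore) {
      best = lang;
      bestScore = score;
    }
  }
  // No stopword hits at all: keep the configured default.
  return best;
}
```

The result is only a pack selector. As noted above, it has no bearing on provider routing, locale, or UI language.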
Compression settings can include:
{
  "languageConfig": {
    "enabled": true,
    "defaultLanguage": "en",
    "autoDetect": true,
    "enabledPacks": ["en", "pt-BR", "es", "de", "fr", "ja"]
  },
  "cavemanConfig": {
    "language": "en",
    "autoDetectLanguage": true,
    "enabledLanguagePacks": ["en", "pt-BR", "es", "de", "fr", "ja"]
  }
}
languageConfig controls dashboard/preview defaults. cavemanConfig is the runtime engine config
used when Caveman compresses message text.
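One plausible way the runtime keys could combine is sketched below. The interface mirrors the cavemanConfig keys shown above, but `selectPack` and its precedence order (detected language if enabled, then the configured language, then English) are assumptions for illustration, not confirmed engine behavior.

```typescript
// Mirrors the cavemanConfig language keys shown above; other fields are omitted.
interface CavemanLanguageSettings {
  language: string;
  autoDetectLanguage: boolean;
  enabledLanguagePacks: string[];
}

// Assumed precedence: detected language (if enabled) > configured language > English.
function selectPack(settings: CavemanLanguageSettings, detected?: string): string {
  const enabled = new Set(settings.enabledLanguagePacks);
  if (settings.autoDetectLanguage && detected !== undefined && enabled.has(detected)) {
    return detected;
  }
  return enabled.has(settings.language) ? settings.language : "en";
}

const settings: CavemanLanguageSettings = {
  language: "en",
  autoDetectLanguage: true,
  enabledLanguagePacks: ["en", "pt-BR"],
};

console.log(selectPack(settings, "pt-BR")); // detected pack is enabled, so it is used
console.log(selectPack(settings, "de")); // "de" is not enabled, so the configured language is used
```

Setting autoDetectLanguage to false makes the configured language authoritative, which is the "exact control" case.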
- Create open-sse/services/compression/rules/<language>/<pack>.json.
- Use the Caveman rule format from docs/compression/COMPRESSION_RULES_FORMAT.md.
- Keep replacements conservative and avoid changing code, identifiers, URLs, or JSON.
- Add or update tests for language selection and replacement behavior.
- Expose new dashboard/i18n labels if the language appears in UI selectors.
Available packs can be queried with:
curl http://localhost:20128/api/compression/language-packs
The preview endpoint accepts language config overrides:
curl -X POST http://localhost:20128/api/compression/preview \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "standard",
    "text": "Por favor, eu gostaria que voce basicamente resumisse isso.",
    "config": {
      "languageConfig": {
        "defaultLanguage": "pt-BR",
        "autoDetect": true
      }
    }
  }'
All 6 language packs received a SHARED_BOUNDARIES clause in v3.8.0 that is applied at every
Caveman intensity (LITE, FULL, ULTRA). It instructs the engine to preserve these patterns verbatim,
regardless of surrounding filler removal:
| Pattern type | Example |
|---|---|
| Fenced code blocks | ```python\n...\n``` |
| Inline code | `my_var` |
| URLs | https://example.com/path |
| File paths (absolute + relative) | /etc/hosts, ./src/index.ts |
| Error headers | Error:, TypeError:, SyntaxError: |
| Stack trace lines | at functionName (file.ts:12:3) |
These patterns are populated in DEFAULT_CAVEMAN_CONFIG.preservePatterns (previously []). The
constant lives in open-sse/services/compression/types.ts.
Without SHARED_BOUNDARIES, aggressive Caveman modes could strip content that looked like repetitive prose but was actually a code snippet, file path, or error stack. SHARED_BOUNDARIES acts as a language-agnostic safety net applied before filler rules run.
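The "protect first, then strip filler" ordering can be illustrated with a mask-and-restore pass. This is a sketch under stated assumptions, not the engine's implementation: `compressWithBoundaries` and the `FILLER` rule are hypothetical, only two preserve patterns (inline code and URLs) are used, and the input is assumed not to contain NUL characters.

```typescript
// Inline-code and URL preserve patterns, merged into a single alternation
// so every protected span is found in one pass.
const PROTECTED = /`[^`]+`|https?:\/\/\S+/g;

// Hypothetical filler rule for illustration; real rules come from the packs.
const FILLER = /\b(?:basically|please|just)\s+/gi;

function compressWithBoundaries(text: string): string {
  const saved: string[] = [];
  // 1. Swap each protected span for an indexed NUL-delimited placeholder.
  const masked = text.replace(PROTECTED, (match) => {
    saved.push(match);
    return "\u0000" + (saved.length - 1) + "\u0000";
  });
  // 2. Run filler removal on the masked text only.
  const compressed = masked.replace(FILLER, "");
  // 3. Restore the protected spans verbatim.
  return compressed.replace(/\u0000(\d+)\u0000/g, (_m, index: string) => saved[Number(index)]);
}

const input = "Please just run `please wait` and basically read https://example.com/just now";
console.log(compressWithBoundaries(input));
// prints: run `please wait` and read https://example.com/just now
```

Without the masking step, the same filler rule would rewrite the inline code to `wait` and the URL to https://example.com/now, which is the kind of damage the safety net is meant to prevent.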
Additional patterns can be added at runtime via compression settings:
{
"cavemanConfig": {
"preservePatterns": [
"```[\\s\\S]*?```",
"`[^`]+`",
"https?://\\S+",
"(?:/|\\./)[^\\s]+",
"\\b(?:Error|TypeError|SyntaxError|RangeError):",
"\\s+at\\s+\\S+\\s+\\(\\S+:\\d+:\\d+\\)"
]
}
}
Custom patterns extend (not replace) the 6 defaults.
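The "extend, not replace" semantics might look like the sketch below. `mergePreservePatterns` and `compilePatterns` are hypothetical helpers, the TICKET pattern is an invented custom entry, and dropping exact duplicates is an assumption rather than documented behavior.

```typescript
// Append custom pattern strings after the defaults, dropping exact duplicates.
function mergePreservePatterns(defaults: string[], custom: string[]): string[] {
  return Array.from(new Set([...defaults, ...custom]));
}

// Compile pattern strings (already JSON-unescaped) into global regular expressions.
function compilePatterns(sources: string[]): RegExp[] {
  return sources.map((source) => new RegExp(source, "g"));
}

// Two of the six defaults, as they appear in the JSON example above.
const defaultPatterns = ["`[^`]+`", "https?://\\S+"];
// One duplicate of a default plus one hypothetical project-specific pattern.
const customPatterns = ["https?://\\S+", "\\bTICKET-\\d+\\b"];

const merged = mergePreservePatterns(defaultPatterns, customPatterns);
console.log(merged.length); // 3: both defaults survive, the duplicate is dropped
console.log(compilePatterns(merged)[2].test("see TICKET-42")); // true
```

Keeping the defaults first means a custom list can never remove a built-in protection by omission.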
- English built-in rules remain the fallback when a language pack is missing.
- Invalid built-in JSON packs fail validation so release assets do not silently degrade.
- Rule packs are data-only and should not import code or run arbitrary logic.
- The compression analytics layer records the selected mode and engine, not full prompt text.