28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [1.3.0] - 2026-03-21

### Added

- **Quality benchmark overhaul** — replaced three broken metrics (keywordRetention, factRetention, negationErrors) with five meaningful ones: task-based probes (~70 across 13 scenarios), information density, compressed-only quality score, negative compression detection, and summary coherence checks.
- **Task-based probes** — hand-curated per-scenario checks that verify whether specific critical information (identifiers, code patterns, config values) survives compression. Probe failures surface real quality issues.
- **LLM-as-judge scoring** (`--llm-judge` flag) — optional LLM evaluation of compression quality. Multi-provider support: OpenAI, Anthropic, Gemini (`@google/genai`), Ollama. Display-only, not used for regression testing.
- **Gemini provider** for LLM benchmarks via `GEMINI_API_KEY` env var (default model: `gemini-2.5-flash`).
- **Opt-in feature comparison** (`--features` flag) — runs quality benchmark with each opt-in feature enabled to measure their impact vs baseline.
- **Quality history documentation** (`docs/quality-history.md`) — version-over-version quality tracking across v1.0.0, v1.1.0, v1.2.0 with opt-in feature impact analysis.
- **Min-output-chars probes** to catch over-aggressive compression.
- **Code block language aliases** in benchmarks (typescript/ts, python/py, yaml/yml).
- New npm scripts: `bench:quality:judge`, `bench:quality:features`.
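The task-based probe idea above can be sketched in a few lines. This is a hypothetical shape — the interface and function names are illustrative, not the benchmark's actual API:

```typescript
// Hypothetical probe shape; names are illustrative, not the real API.
interface Probe {
  scenario: string; // which benchmark scenario the probe belongs to
  mustContain: string; // critical token that must survive compression
  minOutputChars?: number; // floor that catches over-aggressive compression
}

// A probe passes when the compressed text still contains the critical
// token and the output is not suspiciously short.
function runProbe(probe: Probe, compressed: string): boolean {
  if (probe.minOutputChars !== undefined && compressed.length < probe.minOutputChars) {
    return false;
  }
  return compressed.includes(probe.mustContain);
}
```

A probe for an environment-config scenario might assert, for example, that `GEMINI_API_KEY` survives compression and that the output keeps a minimum length.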

### Changed

- Coherence and negative compression regression thresholds now track increases from baseline, not just zero-to-nonzero transitions.
- Information density regression check only applies when compression actually occurs (ratio > 1.01).
- Quality benchmark table now shows: `Ratio EntRet CodeOK InfDen Probes Pass NegCp Coher CmpQ`.
- `analyzeQuality()` accepts optional `CompressOptions` for feature testing.
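The two regression-check changes above — baseline-relative tracking and the ratio gate — can be sketched as follows. The sample shape and function name are assumptions for illustration, not the benchmark's real API:

```typescript
// Hypothetical sample shape; field names are illustrative.
interface QualitySample {
  ratio: number; // originalChars / compressedChars
  infoDensity: number; // information density score
  coherenceFailures: number;
  negativeCompressions: number;
}

function findRegressions(baseline: QualitySample, current: QualitySample): string[] {
  const issues: string[] = [];
  // Density check only applies when compression actually occurred.
  if (current.ratio > 1.01 && current.infoDensity < baseline.infoDensity) {
    issues.push('information density dropped');
  }
  // Track any increase from baseline, not just zero-to-nonzero transitions.
  if (current.coherenceFailures > baseline.coherenceFailures) {
    issues.push('coherence failures increased');
  }
  if (current.negativeCompressions > baseline.negativeCompressions) {
    issues.push('negative compressions increased');
  }
  return issues;
}
```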

### Removed

- `keywordRetention` metric (tautological — 100% on 12/13 scenarios).
- `factRetention` and `factCount` metrics (fragile regex-based fact extractor).
- `negationErrors` metric (noisy, rarely triggered).
- `extractFacts()` and `analyzeSemanticFidelity()` functions.

## [1.2.0] - 2026-03-20

### Added
9 changes: 7 additions & 2 deletions CLAUDE.md
@@ -14,6 +14,11 @@ npm run format # Prettier write
npm run format:check # Prettier check
npm run bench # Run benchmark suite
npm run bench:save # Run, save baseline, regenerate docs/benchmark-results.md
npm run bench:quality # Run quality benchmark (probes, coherence, info density)
npm run bench:quality:save # Save quality baseline
npm run bench:quality:check # Compare against quality baseline
npm run bench:quality:judge # Run with LLM-as-judge (requires API key)
npm run bench:quality:features # Compare opt-in features vs baseline
```

Run a single test file:
@@ -65,7 +70,7 @@ main ← develop ← feature branches
- **TypeScript:** ES2020 target, NodeNext module resolution, strict mode, ESM-only
- **Unused params** must be prefixed with `_` (ESLint enforced)
- **Prettier:** 100 char width, 2-space indent, single quotes, trailing commas, semicolons
- **Tests:** Vitest 4, test files in `tests/`, coverage via `@vitest/coverage-v8` (Node 20+ only)
- **Node version:** ≥18 (.nvmrc: 22)
- **Tests:** Vitest 4, test files in `tests/`, coverage via `@vitest/coverage-v8`
- **Node version:** ≥20 (.nvmrc: 22)
- **Always run `npm run format` before committing** — CI enforces `format:check`
- **No author/co-author attribution** in commits, code, or docs
4 changes: 2 additions & 2 deletions README.md
@@ -32,11 +32,11 @@ const { messages: originals } = uncompress(compressed, verbatim);

No API keys. No network calls. Runs synchronously by default. Under 2ms for typical conversations.

The classifier is content-aware, not domain-specific. It preserves structured data (code, JSON, SQL, tables, citations, formulas) and compresses surrounding prose — optimized for LLM conversations and technical documentation.
The classifier is content-aware, not domain-specific. It preserves structured data (code, JSON, SQL, tables, citations, formulas) and compresses surrounding prose — making it useful anywhere dense reference material is mixed with natural language: LLM conversations, legal briefs, medical records, technical documentation, support logs.

## Key findings

The deterministic engine achieves **1.3-6.1x compression with zero latency and zero cost.** It scores sentences, packs a budget, strips filler — and in most scenarios, it compresses tighter than an LLM. LLM summarization is opt-in for cases where semantic understanding improves quality. See [Benchmarks](docs/benchmarks.md) for methodology and [Benchmark Results](docs/benchmark-results.md) for the latest numbers and version history.
The deterministic engine achieves **1.3-6.1x compression with zero latency and zero cost.** It scores sentences, packs a budget, strips filler — and in most scenarios, it compresses tighter than an LLM. LLM summarization is opt-in for cases where semantic understanding improves quality. See [Benchmarks](docs/benchmarks.md) for methodology, [Benchmark Results](docs/benchmark-results.md) for the latest numbers, and [Quality History](docs/quality-history.md) for version-over-version quality tracking.
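The score-pack-strip pipeline described here can be illustrated with a minimal sketch. This is the general technique only — the scoring heuristic and function names below are toy assumptions, not the library's implementation:

```typescript
// Toy heuristic, not the real classifier: reward digits/identifiers,
// penalize filler openers.
function scoreSentence(s: string): number {
  let score = s.match(/[0-9_`]/g)?.length ?? 0;
  if (/^(Basically|Honestly|As mentioned)/.test(s)) score -= 5;
  return score;
}

// Greedy budget packing: keep the highest-scoring sentences that fit,
// then restore original order so the output still reads coherently.
function compressProse(sentences: string[], budgetChars: number): string {
  const scored = sentences
    .map((s, i) => ({ s, i, score: scoreSentence(s) }))
    .sort((a, b) => b.score - a.score);
  const kept: { s: string; i: number }[] = [];
  let used = 0;
  for (const c of scored) {
    if (used + c.s.length <= budgetChars) {
      kept.push(c);
      used += c.s.length;
    }
  }
  return kept.sort((a, b) => a.i - b.i).map((c) => c.s).join(' ');
}
```

Greedy packing is what makes the engine deterministic and fast: no network call, no sampling, just a single scoring pass and a sort.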

## Features
