Merged
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,15 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- **Importance-weighted retention** (`importanceScoring: true`) — per-message importance scoring based on forward-reference density (how many later messages share entities with this one), decision/correction content signals, and recency. Messages scoring above `importanceThreshold` (default 0.35) are preserved even outside the recency window. `forceConverge` truncates low-importance messages first. New stats: `messages_importance_preserved`.
- **Contradiction detection** (`contradictionDetection: true`) — detects later messages that correct or override earlier ones using topic-overlap gating (word-level Jaccard) and correction signal patterns (`actually`, `don't use`, `instead`, `scratch that`, etc.). Superseded messages are compressed with a provenance annotation (`[cce:superseded by ...]`) linking to the correction. New stats: `messages_contradicted`. New decision action: `contradicted`.
- New exports: `computeImportance`, `scoreContentSignals`, `DEFAULT_IMPORTANCE_THRESHOLD`, `analyzeContradictions` for standalone use outside `compress()`.
- New types: `ImportanceMap`, `ContradictionAnnotation`.
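
To make the contradiction-detection entry concrete, here is a minimal sketch of topic-overlap gating followed by a correction-signal check. It is not the code in `src/contradiction.ts`: the names `wordSet`, `wordJaccard`, and `looksLikeCorrection`, the 0.3 overlap threshold, and the exact regular expression are assumptions made for illustration. Only the signal phrases are taken from the entry above.

```typescript
// Illustrative sketch only. Function names and the 0.3 threshold are
// assumptions, not the library's actual implementation.

/** Lowercased set of word tokens in a message. */
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9_']+/g) ?? []);
}

/** Word-level Jaccard similarity: |A ∩ B| / |A ∪ B|. */
function wordJaccard(a: string, b: string): number {
  const left = wordSet(a);
  const right = wordSet(b);
  if (left.size === 0 && right.size === 0) return 0;
  let shared = 0;
  left.forEach((word) => {
    if (right.has(word)) shared++;
  });
  return shared / (left.size + right.size - shared);
}

// Correction phrases quoted in the changelog entry.
const CORRECTION_SIGNALS = /\b(actually|don't use|instead|scratch that)\b/i;

/** True when `later` overlaps `earlier` in topic and carries a correction signal. */
function looksLikeCorrection(earlier: string, later: string, minOverlap = 0.3): boolean {
  return wordJaccard(earlier, later) >= minOverlap && CORRECTION_SIGNALS.test(later);
}
```

The overlap gate runs first so that a stray "actually" in an unrelated message does not mark an earlier message as superseded.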

## [1.1.0] - 2026-03-19

### Added
4 changes: 3 additions & 1 deletion CLAUDE.md
@@ -34,7 +34,9 @@ messages → classify → dedup → merge → summarize → size guard → resul

- **classify** (`src/classify.ts`) — three-tier classification (T0 = preserve verbatim, T2 = compressible prose, T3 = filler/removable). Uses structural pattern detection (code fences, JSON, YAML, LaTeX), SQL/API-key anchors, and prose density scoring.
- **dedup** (`src/dedup.ts`) — exact (djb2 hash + full comparison) and fuzzy (line-level Jaccard similarity) duplicate detection. Earlier duplicates are replaced with compact references.
- **compress** (`src/compress.ts`) — orchestrator. Handles message merging, code-bearing message splitting (prose compressed, fences preserved inline), budget binary search over `recencyWindow`, and `forceConverge` hard-truncation.
- **importance** (`src/importance.ts`) — per-message importance scoring: forward-reference density (how many later messages share entities), decision/correction content signals, and recency bonus. High-importance messages resist compression even outside recency window. Opt-in via `importanceScoring: true`.
- **contradiction** (`src/contradiction.ts`) — detects later messages that correct/override earlier ones (topic-overlap gating + correction signal patterns like "actually", "don't use", "instead"). Superseded messages are compressed with provenance annotations. Opt-in via `contradictionDetection: true`.
- **compress** (`src/compress.ts`) — orchestrator. Handles message merging, code-bearing message splitting (prose compressed, fences preserved inline), budget binary search over `recencyWindow`, and `forceConverge` hard-truncation (importance-aware ordering when `importanceScoring` is on).
- **summarize** (internal in `compress.ts`) — deterministic sentence scoring: rewards technical identifiers (camelCase, snake_case), emphasis phrases, status words; penalizes filler. Paragraph-aware to keep topic boundaries.
- **summarizer** (`src/summarizer.ts`) — LLM-powered summarization. `createSummarizer` wraps an LLM call with a prompt template. `createEscalatingSummarizer` adds three-level fallback: normal → aggressive → deterministic.
- **expand** (`src/expand.ts`) — `uncompress()` restores originals from a `VerbatimMap` or lookup function. Supports recursive expansion for multi-round compression chains (max depth 10).
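
The **importance** bullet above names three inputs: forward-reference density, content signals, and a recency bonus. The sketch below shows one way such a score could be combined and compared against the default threshold of 0.35. Everything in it is hypothetical: the entity regex, the signal word list, the 0.5/0.3/0.2 weights, and the names `entities` and `scoreMessages` are assumptions for illustration, not the contents of `src/importance.ts`.

```typescript
// Illustrative sketch. The weights, the entity regex, the signal words, and
// all names are assumptions, not the contents of src/importance.ts.

const THRESHOLD = 0.35; // default `importanceThreshold` quoted in the changelog

/** Crude entity extraction: camelCase and snake_case identifiers. */
function entities(text: string): Set<string> {
  const pattern = /\b(?:[a-z]+(?:[A-Z][a-z0-9]*)+|[a-z0-9]+(?:_[a-z0-9]+)+)\b/g;
  return new Set(text.match(pattern) ?? []);
}

/** One score per message, in [0, 1]. */
function scoreMessages(messages: string[]): number[] {
  const entitySets = messages.map(entities);
  return messages.map((text, i) => {
    const mine = entities(text);
    const later = entitySets.slice(i + 1);

    // Forward-reference density: fraction of later messages sharing an entity.
    let referencing = 0;
    later.forEach((other) => {
      let shares = false;
      mine.forEach((entity) => {
        if (other.has(entity)) shares = true;
      });
      if (shares) referencing++;
    });
    const density = later.length > 0 ? referencing / later.length : 0;

    // Decision/correction content signal (assumed word list).
    const signal = /\b(decided|must|actually|instead)\b/i.test(text) ? 1 : 0;

    // Recency bonus: 0 for the oldest message, 1 for the newest.
    const recency = messages.length > 1 ? i / (messages.length - 1) : 1;

    return 0.5 * density + 0.3 * signal + 0.2 * recency;
  });
}
```

Under these assumed weights, an old message that records a decision and is referenced again later can clear the threshold, while short acknowledgements stay below it.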
106 changes: 106 additions & 0 deletions bench/baseline.ts
@@ -46,13 +46,23 @@ export interface RetentionResult {
structuralRetention: number;
}

export interface AncsResult {
  baselineRatio: number;
  importanceRatio: number;
  contradictionRatio: number;
  combinedRatio: number;
  importancePreserved: number;
  contradicted: number;
}

export interface BenchmarkResults {
  basic: Record<string, BasicResult>;
  tokenBudget: Record<string, TokenBudgetResult>;
  dedup: Record<string, DedupResult>;
  fuzzyDedup: Record<string, FuzzyDedupResult>;
  bundleSize: Record<string, BundleSizeResult>;
  retention?: Record<string, RetentionResult>;
  ancs?: Record<string, AncsResult>;
}

export interface Baseline {
@@ -413,6 +423,71 @@ export function compareResults(
checkNum(regressions, 'fuzzyDedup', name, 'ratio', exp.ratio, act.ratio, tolerance);
}

  // ANCS
  if (baseline.ancs && current.ancs) {
    for (const [name, exp] of Object.entries(baseline.ancs)) {
      const act = current.ancs[name];
      if (!act) {
        missing(regressions, 'ancs', name);
        continue;
      }
      checkNum(
        regressions,
        'ancs',
        name,
        'baselineRatio',
        exp.baselineRatio,
        act.baselineRatio,
        tolerance,
      );
      checkNum(
        regressions,
        'ancs',
        name,
        'importanceRatio',
        exp.importanceRatio,
        act.importanceRatio,
        tolerance,
      );
      checkNum(
        regressions,
        'ancs',
        name,
        'contradictionRatio',
        exp.contradictionRatio,
        act.contradictionRatio,
        tolerance,
      );
      checkNum(
        regressions,
        'ancs',
        name,
        'combinedRatio',
        exp.combinedRatio,
        act.combinedRatio,
        tolerance,
      );
      checkNum(
        regressions,
        'ancs',
        name,
        'importancePreserved',
        exp.importancePreserved,
        act.importancePreserved,
        tolerance,
      );
      checkNum(
        regressions,
        'ancs',
        name,
        'contradicted',
        exp.contradicted,
        act.contradicted,
        tolerance,
      );
    }
  }

// Bundle size
for (const [name, exp] of Object.entries(baseline.bundleSize ?? {})) {
const act = current.bundleSize?.[name];
@@ -652,6 +727,7 @@ const SHORT_NAMES: Record<string, string> = {
'Technical explanation': 'Technical',
'Structured content': 'Structured',
'Agentic coding session': 'Agentic',
'Iterative design': 'Iterative',
};

function shortName(name: string): string {
@@ -864,6 +940,29 @@ function generateDedupSection(r: BenchmarkResults): string[] {
return lines;
}

function generateAncsSection(r: BenchmarkResults): string[] {
  if (!r.ancs || Object.keys(r.ancs).length === 0) return [];

  const lines: string[] = [];
  lines.push('## ANCS-Inspired Features');
  lines.push('');
  lines.push(
    '> Importance scoring preserves high-value messages outside the recency window. ' +
      'Contradiction detection compresses superseded messages.',
  );
  lines.push('');
  lines.push(
    '| Scenario | Baseline | +Importance | +Contradiction | Combined | Imp. Preserved | Contradicted |',
  );
  lines.push('| --- | ---: | ---: | ---: | ---: | ---: | ---: |');
  for (const [name, v] of Object.entries(r.ancs)) {
    lines.push(
      `| ${name} | ${fix(v.baselineRatio)} | ${fix(v.importanceRatio)} | ${fix(v.contradictionRatio)} | ${fix(v.combinedRatio)} | ${v.importancePreserved} | ${v.contradicted} |`,
    );
  }
  return lines;
}

function generateTokenBudgetSection(r: BenchmarkResults): string[] {
const lines: string[] = [];
const entries = Object.entries(r.tokenBudget);
@@ -1113,6 +1212,13 @@ export function generateBenchmarkDocs(baselinesDir: string, outputPath: string):
lines.push(...generateDedupSection(latest.results));
lines.push('');

  // --- ANCS ---
  const ancsSection = generateAncsSection(latest.results);
  if (ancsSection.length > 0) {
    lines.push(...ancsSection);
    lines.push('');
  }

// --- Token budget ---
lines.push(...generateTokenBudgetSection(latest.results));
lines.push('');
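
Review note on the ANCS block in `compareResults`: the six `checkNum` calls differ only in the field name, so they could be driven from a typed key list. The sketch below illustrates that idea under stated assumptions. `Regression` and this `checkNum` body are hypothetical stand-ins (the real definitions live elsewhere in `bench/baseline.ts` and are not shown in this diff), and the relative-tolerance rule is assumed, not confirmed.

```typescript
// Mirrors the AncsResult interface added in this diff.
interface AncsResult {
  baselineRatio: number;
  importanceRatio: number;
  contradictionRatio: number;
  combinedRatio: number;
  importancePreserved: number;
  contradicted: number;
}

// Hypothetical stand-in: the real regression record type is not shown here.
interface Regression {
  section: string;
  name: string;
  field: string;
  expected: number;
  actual: number;
}

// Hypothetical stand-in for the real checkNum. The relative-tolerance rule
// below is an assumption for the sake of a runnable example.
function checkNum(
  out: Regression[],
  section: string,
  name: string,
  field: string,
  expected: number,
  actual: number,
  tolerance: number,
): void {
  const scale = Math.abs(expected) || 1; // avoid dividing by zero for counts of 0
  if (Math.abs(actual - expected) / scale > tolerance) {
    out.push({ section, name, field, expected, actual });
  }
}

// `keyof AncsResult` keeps this list in sync with the interface at compile time.
const ANCS_FIELDS: ReadonlyArray<keyof AncsResult> = [
  'baselineRatio',
  'importanceRatio',
  'contradictionRatio',
  'combinedRatio',
  'importancePreserved',
  'contradicted',
];

/** Compare every ANCS field of one scenario against its baseline. */
function checkAncs(
  out: Regression[],
  name: string,
  exp: AncsResult,
  act: AncsResult,
  tolerance: number,
): void {
  for (const field of ANCS_FIELDS) {
    checkNum(out, 'ancs', name, field, exp[field], act[field], tolerance);
  }
}
```

The explicit calls in the diff are easier to grep, while the key list catches a misspelled field name at compile time. Either form is reasonable.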
48 changes: 41 additions & 7 deletions bench/baselines/current.json
@@ -1,6 +1,6 @@
{
"version": "1.1.0",
"generated": "2026-03-20T15:50:37.630Z",
"generated": "2026-03-20T18:05:08.551Z",
"results": {
"basic": {
"Coding assistant": {
@@ -200,8 +200,12 @@
"gzipBytes": 4452
},
"compress.js": {
"bytes": 48312,
"gzipBytes": 10901
"bytes": 53439,
"gzipBytes": 11671
},
"contradiction.js": {
"bytes": 7700,
"gzipBytes": 2717
},
"dedup.js": {
"bytes": 10260,
@@ -215,9 +219,13 @@
"bytes": 11923,
"gzipBytes": 2941
},
"importance.js": {
"bytes": 4759,
"gzipBytes": 1849
},
"index.js": {
"bytes": 608,
"gzipBytes": 311
"bytes": 854,
"gzipBytes": 405
},
"summarizer.js": {
"bytes": 2542,
@@ -228,8 +236,8 @@
"gzipBytes": 31
},
"total": {
"bytes": 96252,
"gzipBytes": 26383
"bytes": 114084,
"gzipBytes": 31813
}
},
"retention": {
@@ -273,6 +281,32 @@
"entityRetention": 0.918918918918919,
"structuralRetention": 1
}
},
"ancs": {
"Deep conversation": {
"baselineRatio": 2.3650251770931128,
"importanceRatio": 2.3650251770931128,
"contradictionRatio": 2.3650251770931128,
"combinedRatio": 2.3650251770931128,
"importancePreserved": 0,
"contradicted": 0
},
"Agentic coding session": {
"baselineRatio": 1.4749403341288783,
"importanceRatio": 1.2383115148276784,
"contradictionRatio": 1.4749403341288783,
"combinedRatio": 1.2383115148276784,
"importancePreserved": 4,
"contradicted": 0
},
"Iterative design": {
"baselineRatio": 1.6188055908513341,
"importanceRatio": 1.2567200986436498,
"contradictionRatio": 1.61572606214331,
"combinedRatio": 1.2567200986436498,
"importancePreserved": 6,
"contradicted": 2
}
}
}
}
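
Review note on the baseline numbers: the relative changes are easier to judge than the raw values. The sketch below computes them from figures copied out of this diff. The helper `pctChange` and the variable names are hypothetical and used only for this illustration.

```typescript
/** Signed percentage change from `before` to `after`. */
function pctChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Bundle totals, old vs. new, from the "total" entry.
const bundleBytes = pctChange(96252, 114084).toFixed(1); // "18.5"
const bundleGzip = pctChange(26383, 31813).toFixed(1); // "20.6"

// Compression ratio, baseline vs. importance scoring, from the "ancs" entry.
const agenticRatio = pctChange(1.4749403341288783, 1.2383115148276784).toFixed(1); // "-16.0"
const iterativeRatio = pctChange(1.6188055908513341, 1.2567200986436498).toFixed(1); // "-22.4"
```

The total bundle grows by about 18.5% raw and 20.6% gzipped. With importance scoring enabled, the compression ratio falls by roughly 16% on "Agentic coding session" and 22% on "Iterative design", where 4 and 6 messages respectively are recorded as preserved. "Deep conversation" is unchanged across all four configurations.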