Add two ANCS-inspired features as opt-in options:
- `importanceScoring`: scores messages by forward-reference density, decision/correction content, and recency. High-importance messages are preserved outside the recency window, and `forceConverge` truncates low-importance messages first.
- `contradictionDetection`: detects later messages that correct earlier ones (via topic overlap plus correction-signal patterns). Superseded messages are compressed with a provenance annotation linking to the correction.

Both features are off by default: zero impact on existing behavior. 28 new tests (540 total), zero TS errors.
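A minimal sketch of how the three importance signals could combine, and how that score could drive both preservation outside the recency window and `forceConverge`-style truncation order. Everything here is an assumption for illustration: the `Msg` shape, the signal words, the 0.5/0.3/0.2 weights, and the helpers `identifiers`, `importanceScores`, `preservedOutsideWindow`, and `truncationOrder` are hypothetical, not the PR's implementation.

```typescript
// Sketch only: weights, signal words, and helper names are hypothetical, not the PR's code.
interface Msg {
  id: string;
  content: string;
}

// Hypothetical decision/correction signal words (English-only for brevity).
const DECISION_RE = /\b(?:decided|we will|must|instead|actually|correction)\b/i;

// Identifier-like tokens (camelCase, PascalCase, snake_case) used to detect forward references.
function identifiers(text: string): Set<string> {
  const out = new Set<string>();
  for (const tok of text.split(/[^A-Za-z0-9_]+/)) {
    if (/[a-z][A-Z]|[A-Za-z0-9]_[A-Za-z0-9]/.test(tok)) out.add(tok);
  }
  return out;
}

function importanceScores(messages: Msg[]): Map<string, number> {
  const ids = messages.map((m) => identifiers(m.content));
  const n = messages.length;
  const scores = new Map<string, number>();
  messages.forEach((m, i) => {
    // Forward-reference density: share of later messages that reuse one of this message's identifiers.
    let refs = 0;
    for (let j = i + 1; j < n; j++) {
      if ([...ids[i]].some((id) => ids[j].has(id))) refs++;
    }
    const later = n - i - 1;
    const density = later > 0 ? refs / later : 0;
    const decision = DECISION_RE.test(m.content) ? 1 : 0;
    const recency = n > 1 ? i / (n - 1) : 1; // 0 for the oldest message, 1 for the newest
    scores.set(m.id, 0.5 * density + 0.3 * decision + 0.2 * recency);
  });
  return scores;
}

// Messages older than the recency window are kept only if they score at or above `threshold`.
function preservedOutsideWindow(
  messages: Msg[],
  scores: Map<string, number>,
  recencyWindow: number,
  threshold: number,
): string[] {
  const cutoff = messages.length - recencyWindow;
  return messages
    .filter((m, i) => i < cutoff && (scores.get(m.id) ?? 0) >= threshold)
    .map((m) => m.id);
}

// forceConverge-style ordering: lowest-importance messages are truncated first.
function truncationOrder(messages: Msg[], scores: Map<string, number>): string[] {
  return [...messages]
    .sort((a, b) => (scores.get(a.id) ?? 0) - (scores.get(b.id) ?? 0))
    .map((m) => m.id);
}

const convo: Msg[] = [
  { id: "a", content: "We decided to keep sessions in sessionStore." },
  { id: "b", content: "Unrelated chit-chat about lunch." },
  { id: "c", content: "sessionStore now needs a TTL." },
];
const scores = importanceScores(convo);
console.log(truncationOrder(convo, scores)); // b first, then c, then a
console.log(preservedOutsideWindow(convo, scores, 1, 0.5)); // only "a" survives outside the window
```

Message `a` outranks the newer `c` because a later message reuses its identifier and it carries a decision signal; that is the point of scoring beyond pure recency.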
- CLAUDE.md: add the importance and contradiction modules to the architecture section
- CHANGELOG.md: add an [Unreleased] section covering both features
- api-reference.md: add 4 new `CompressOptions`, 2 new `CompressResult` stats, and a new exports section for importance/contradiction
- compression-pipeline.md: add importance + contradiction to the classification order; add the contradiction output format
- Add an `iterativeDesign` scenario with architectural corrections to exercise contradiction detection and importance scoring
- Add an ANCS Features benchmark section comparing baseline vs importance vs contradiction vs combined, with round-trip verification
- Add the `AncsResult` type, regression comparison, and doc generation

- Replace the hardcoded English stopword list with IDF-weighted filtering (language-agnostic; adapts to message content)
- Switch from Jaccard to Sørensen-Dice similarity (more sensitive to topic overlap between short documents)
- Use smoothed IDF `log(1+N/df)`, falling back to unweighted Dice for fewer than 3 documents
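A minimal sketch of IDF-weighted Sørensen-Dice with the smoothed IDF `log(1+N/df)` and the unweighted fallback for fewer than 3 documents. The weighted form used here, 2·w(A∩B) / (w(A) + w(B)), is the standard set-weighted generalization of Dice and is an assumption about the PR's exact formula; the whitespace tokenizer and the default weight for unseen terms are also assumptions.

```typescript
// Sketch only: tokenization and the unseen-term default are assumptions, not the PR's code.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((t) => t.length > 0));
}

// Smoothed IDF: log(1 + N/df) stays positive even for terms that appear in every document.
function smoothedIdf(docs: Set<string>[]): Map<string, number> {
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of doc) df.set(term, (df.get(term) ?? 0) + 1);
  }
  const idf = new Map<string, number>();
  for (const [term, count] of df) idf.set(term, Math.log(1 + docs.length / count));
  return idf;
}

// Returns an IDF-weighted Sørensen-Dice similarity: 2·w(A∩B) / (w(A) + w(B)).
function makeSimilarity(docs: Set<string>[]): (a: Set<string>, b: Set<string>) => number {
  // With fewer than 3 documents, fall back to unweighted Dice (every term weighs 1).
  const idf = docs.length >= 3 ? smoothedIdf(docs) : null;
  // Terms not seen in `docs` are weighted as if they occurred in a single document.
  const weight = (t: string): number => (idf ? (idf.get(t) ?? Math.log(1 + docs.length)) : 1);
  return (a, b) => {
    let shared = 0;
    let total = 0;
    for (const t of a) {
      total += weight(t);
      if (b.has(t)) shared += weight(t);
    }
    for (const t of b) total += weight(t);
    return total === 0 ? 0 : (2 * shared) / total;
  };
}

const docs = ["the cache uses redis", "the cache uses memcached", "the api returns json"].map(tokenize);
const sim = makeSimilarity(docs);
console.log(sim(docs[0], docs[1])); // ≈ 0.65, below the unweighted 0.75 because the shared terms are common
console.log(sim(docs[0], docs[2])); // lower still: only "the" is shared
console.log(makeSimilarity(docs.slice(0, 2))(docs[0], docs[1])); // 0.75 (unweighted fallback)
```

The IDF weighting plays the role the removed stopword list used to: "the" appears in every document and so contributes the least weight, without any language-specific word list.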
```ts
function extractMessageEntities(content: string): Set<string> {
  const entities = new Set<string>();
  for (const re of [CAMEL_RE, PASCAL_RE, SNAKE_RE, VOWELLESS_RE, FILE_REF_RE]) {
    const matches = content.match(re);
    // … (rest of the function is not shown in this excerpt)
```

Check failure (Code scanning / CodeQL): Polynomial regular expression used on uncontrolled data. Severity: High.
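CodeQL raises this alert when a regex with overlapping quantifiers runs on input an attacker can control (here, message content), so matching time can grow polynomially with input length. One common mitigation, sketched below under assumptions: cap the scanned length, split the text with a linear character-class split, and classify each token with anchored patterns that have only one way to match. `MAX_SCAN_CHARS`, `CAMEL`, `PASCAL`, `SNAKE`, and `extractEntitiesBounded` are hypothetical; this is not the PR's fix, and it does not reproduce `VOWELLESS_RE` or `FILE_REF_RE` (file references contain `/` and `.`, which this split treats as separators).

```typescript
// Sketch only: the cap and the three patterns are hypothetical stand-ins for the PR's regexes.
const MAX_SCAN_CHARS = 20000;

// Anchored patterns applied to one token at a time. Each repeated group starts with a
// character the preceding quantifier cannot match, so there is only one way to match.
const CAMEL = /^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$/;
const PASCAL = /^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+$/;
const SNAKE = /^[a-z][a-z0-9]*(?:_[a-z0-9]+)+$/;

function extractEntitiesBounded(content: string): Set<string> {
  const entities = new Set<string>();
  // Bound the uncontrolled input before any regex work; text past the cap is ignored.
  const text = content.length > MAX_SCAN_CHARS ? content.slice(0, MAX_SCAN_CHARS) : content;
  // Splitting on a single negated character class is linear in the input length.
  for (const tok of text.split(/[^A-Za-z0-9_]+/)) {
    if (CAMEL.test(tok) || PASCAL.test(tok) || SNAKE.test(tok)) entities.add(tok);
  }
  return entities;
}

console.log(extractEntitiesBounded("Call parseConfig from ConfigLoader in load_config.ts"));
```

The trade-off of the length cap is that identifiers appearing only after the first `MAX_SCAN_CHARS` characters are never extracted.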
- Fix unused `_` binding in importance test (use the `.values()` iterator)
- Fix stale JSDoc referencing BM25 when the formula is smoothed IDF
- Fix API docs referencing Jaccard when the similarity is IDF-weighted Dice
- Add camelCase/PascalCase/snake_case extraction to contradiction topic words, since these identifiers carry the most topic signal
- Document the `importanceScoring` + `tokenBudget` interaction in the API reference
Summary
- Importance scoring (`importanceScoring: true`): preserves high-value messages outside the recency window based on forward-reference density, decision-content signals, and a recency bonus
- Contradiction detection (`contradictionDetection: true`): identifies later messages that correct or override earlier ones via IDF-weighted Sørensen-Dice topic overlap plus correction-signal patterns. Superseded messages are compressed with provenance annotations
- Language-agnostic topic overlap: IDF-weighted filtering replaces the hardcoded English stopword list (smoothed IDF `log(1+N/df)`), falling back to unweighted Dice for < 3 messages
- Benchmarks: new `iterativeDesign` scenario with architectural corrections; the ANCS section compares baseline vs importance vs contradiction vs combined across 3 scenarios

Benchmark results (ANCS section)
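A minimal sketch of the contradiction-detection flow described above: a later message that contains a correction signal is linked to the most topically similar earlier message above a threshold, and that earlier message is compressed to a provenance annotation naming the correction. The signal words, the 0.3 threshold, the `[superseded by …]` format, and all function names are hypothetical; the sketch uses plain Dice for brevity where the PR uses IDF-weighted Dice, and the PR's actual output format is documented in compression-pipeline.md rather than reproduced here.

```typescript
// Sketch only: signal words, threshold, and annotation format are hypothetical.
interface Message {
  id: string;
  content: string;
}

const CORRECTION_RE = /\b(?:actually|instead|correction|scratch that|no longer)\b/i;

function terms(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9_]+/).filter((t) => t.length > 2));
}

// Plain Sørensen-Dice for brevity; the PR weights each term by smoothed IDF.
function dice(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  return a.size + b.size === 0 ? 0 : (2 * shared) / (a.size + b.size);
}

function compressSuperseded(messages: Message[], threshold = 0.3): Message[] {
  const bags = messages.map((m) => terms(m.content));
  const supersededBy = new Map<string, string>(); // earlier message id -> correcting message id
  for (let j = 1; j < messages.length; j++) {
    if (!CORRECTION_RE.test(messages[j].content)) continue;
    // Link the correction to the most topically similar earlier message above the threshold.
    let best = -1;
    let bestScore = threshold;
    for (let i = 0; i < j; i++) {
      const s = dice(bags[i], bags[j]);
      if (s >= bestScore) {
        best = i;
        bestScore = s;
      }
    }
    if (best >= 0) supersededBy.set(messages[best].id, messages[j].id);
  }
  // Superseded bodies become a short provenance annotation pointing at the correction.
  return messages.map((m) => {
    const by = supersededBy.get(m.id);
    return by ? { id: m.id, content: `[superseded by ${by}]` } : m;
  });
}

const history: Message[] = [
  { id: "m1", content: "Store session tokens in Redis with a one hour TTL." },
  { id: "m2", content: "The login page uses the blue theme." },
  { id: "m3", content: "Actually, store session tokens in Postgres instead of Redis." },
];
console.log(compressSuperseded(history).map((m) => m.content));
```

Requiring both a correction signal and topic overlap keeps unrelated messages (here `m2`) intact: a correction word alone does not supersede anything unless an earlier message is about the same topic.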
All existing benchmarks unchanged (features are opt-in). 540 tests pass.
Test plan
- `npm test`: 540 tests pass (28 new: contradiction, importance, ANCS integration)
- `npm run bench`: all scenarios PASS round-trip; ANCS section shows expected results
- `npm run lint && npm run format:check`: clean