
fix: strip all accumulated tags in stripTagPrefix to prevent tag accumulation #12

Merged
ualtinok merged 1 commit into cortexkit:master from tomolom:fix/tag-accumulation
Apr 12, 2026

tomolom commented Apr 12, 2026

Summary

Fixes a tag accumulation bug in which stripTagPrefix removed only a single tag, causing §N§ prefixes to accumulate when content is processed multiple times during transform passes, compartment compaction, or message replay.

Changes

Changed TAG_PREFIX_REGEX in tag-content-primitives.ts:

// Before: Only matches ONE tag
const TAG_PREFIX_REGEX = /^§\d+§\s*/;

// After: Matches one or more consecutive tags  
const TAG_PREFIX_REGEX = /^(?:§\d+§\s*)+/;

Problem

The original regex only stripped the outermost tag prefix, leaving previously accumulated tags intact. Each transform pass would prepend a new tag on top of old ones, resulting in unreadable messages like:

§4687§ §4686§ §4685§ ... actual content
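The difference between the two regexes on already-accumulated content can be reproduced in a few lines. This is an illustrative TypeScript sketch; the input string is made up, and only the two regex literals come from the PR.

```typescript
// The two regexes from the PR, side by side.
const OLD_TAG_PREFIX_REGEX = /^§\d+§\s*/;
const NEW_TAG_PREFIX_REGEX = /^(?:§\d+§\s*)+/;

// Content that has already picked up three tag prefixes (sample data).
const accumulated = "§4687§ §4686§ §4685§ actual content";

// Old behavior: only the first tag is removed; the older tags stay in the content.
const oldStripped = accumulated.replace(OLD_TAG_PREFIX_REGEX, "");

// New behavior: the whole run of leading tags is removed.
const newStripped = accumulated.replace(NEW_TAG_PREFIX_REGEX, "");

console.log(oldStripped); // "§4686§ §4685§ actual content"
console.log(newStripped); // "actual content"
```

With the old regex, a fresh tag prepended after stripping would sit on top of the two leftover tags, which is the pattern shown above.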

Solution

The new regex uses a non-capturing group with a + quantifier to match all consecutive tag prefixes at the start of the string. This ensures stripTagPrefix removes all accumulated tags before prependTag adds a single fresh tag.
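A minimal sketch of the strip-then-prepend flow, assuming simplified versions of both helpers. Only the regex is taken from the PR; the signatures of `stripTagPrefix` and `prependTag` and the `§<id>§ ` output format (one trailing space) are assumptions for illustration.

```typescript
// Regex from the PR: one or more consecutive §<digits>§ prefixes, each followed
// by optional whitespace, anchored at the start of the string.
const TAG_PREFIX_REGEX = /^(?:§\d+§\s*)+/;

// Assumed signature: remove every leading tag and return the bare content.
function stripTagPrefix(content: string): string {
  return content.replace(TAG_PREFIX_REGEX, "");
}

// Assumed signature and format: strip old tags, then add exactly one fresh tag.
function prependTag(tagId: number, content: string): string {
  return `§${tagId}§ ${stripTagPrefix(content)}`;
}

// Re-tagging the same content on several passes no longer stacks prefixes.
let message = "actual content";
for (let id = 4685; id <= 4687; id++) {
  message = prependTag(id, message);
}
console.log(message); // "§4687§ actual content"
```

Because `\s*` sits inside the repeated group, whitespace between tags (or none at all, as in `§1§§2§`) is consumed along with them. Leading whitespace of the content itself is dropped too, which the old single-tag regex already did.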

Testing

This change maintains backward compatibility: existing single-tag content is handled identically, and multi-tag content is now normalized to a single tag.
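The compatibility claim can be spot-checked by running both regexes over a handful of inputs. This is a hypothetical check written for illustration, not the repository's test suite; the sample strings are made up.

```typescript
const OLD_REGEX = /^§\d+§\s*/;
const NEW_REGEX = /^(?:§\d+§\s*)+/;

const strip = (re: RegExp, s: string): string => s.replace(re, "");

// Inputs with zero or one leading tag: both regexes should agree.
const singleTagCases = ["plain text", "§7§ hello", "§42§no space", "text §9§ later"];
const sameForSingleTags = singleTagCases.every(
  (s) => strip(OLD_REGEX, s) === strip(NEW_REGEX, s),
);

// Input with several leading tags: only the new regex removes them all.
const multiTag = "§3§ §2§ §1§ body";
const oldResult = strip(OLD_REGEX, multiTag); // "§2§ §1§ body"
const newResult = strip(NEW_REGEX, multiTag); // "body"

console.log(sameForSingleTags, oldResult, newResult);
```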

Fixes #11


The TAG_PREFIX_REGEX was only matching a single tag prefix, causing tags
to accumulate when content is processed multiple times during transform
passes, compartment compaction, or message replay.

Changed the regex from /^§\d+§\s*/ to /^(?:§\d+§\s*)+/ to match one or more
consecutive tags at the start of the string. This ensures stripTagPrefix
removes ALL accumulated tags before prependTag adds a fresh one.

Fixes #11
Copilot AI review requested due to automatic review settings April 12, 2026 18:18

Copilot AI left a comment


Pull request overview

Fixes a tag prefix normalization bug in the magic-context tagging utilities so repeated transform passes don’t accumulate multiple §<id>§ prefixes on the same content.

Changes:

  • Update the tag-prefix stripping regex to remove all consecutive §<id>§ prefixes at the start of a string (not just the first).
  • Ensure prependTag() consistently results in a single tag prefix even when content has already been tagged multiple times.


ualtinok merged commit e76cff1 into cortexkit:master Apr 12, 2026
6 of 7 checks passed
tomolom deleted the fix/tag-accumulation branch April 13, 2026 03:17
ualtinok added a commit that referenced this pull request Apr 18, 2026
Council flagged concrete correctness and observability bugs introduced (or
left untouched) by the recent tokenizer/sidebar work. All fixes apply to
uncommitted changes so no shipped behavior regresses.

Correctness fixes:

- `strip-content.ts` — `stripReasoningFromMergedAssistants` now strips
  `thinking` and `redacted_thinking` part types in addition to OpenCode's
  internal `reasoning`. Opus 4.7 emits wire-format `thinking` parts, and the
  workaround's whole purpose (keep thinking at position 0 in the merged
  Anthropic block) requires handling every reasoning-like type. Without
  this, two consecutive assistants each carrying a `thinking` block pass
  through unchanged and produce the exact "thinking blocks … cannot be
  modified" 400 the function was written to prevent. Adds 3 new tests
  covering `thinking`-typed consecutive runs and mixed type sequences.
- `messages-transform.ts` — distinguish SQLITE_BUSY (transient, log and
  skip) from persistent non-BUSY errors (log with full detail and persist a
  summary into `session_meta.last_transform_error`). The sidebar already
  reads that field, so persistent schema/programming failures now surface
  as a visible failure indicator instead of disabling magic-context
  silently forever.
- `event-handler.ts` + `event-payloads.ts` — invalidate the per-message
  token cache on `message.updated` (per-message, falls back to session-wide
  when the event lacks a message id) and on `session.compacted`
  (session-wide, since native compaction restructures messages).
  `MessageUpdatedAssistantInfo` gains an optional `messageID` sourced from
  `info.id`.
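The invalidation rules in the last bullet can be sketched as follows. The nested-`Map` cache, the `invalidateTokenCache` name, and the event type shapes are hypothetical; only the event names, the per-message vs. session-wide split, and the optional message id come from the commit message.

```typescript
// Hypothetical cache shape: sessionID -> (messageID -> estimated tokens).
type TokenCache = Map<string, Map<string, number>>;

// Hypothetical event shapes; messageID is optional on message.updated.
type PluginEvent =
  | { type: "message.updated"; sessionID: string; messageID?: string }
  | { type: "session.compacted"; sessionID: string };

function invalidateTokenCache(cache: TokenCache, event: PluginEvent): void {
  const perSession = cache.get(event.sessionID);
  if (!perSession) return;

  if (event.type === "message.updated" && event.messageID !== undefined) {
    // Precise path: drop only the updated message's cached count.
    perSession.delete(event.messageID);
    return;
  }

  // Fallback (no message id) and native compaction: drop the whole session,
  // since any message in it may have changed.
  cache.delete(event.sessionID);
}

const cache: TokenCache = new Map([["s1", new Map([["m1", 120], ["m2", 80]])]]);
invalidateTokenCache(cache, { type: "message.updated", sessionID: "s1", messageID: "m1" });
console.log(cache.get("s1")?.size); // 1 (only m2 remains)
```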

Hardening fixes:

- `inject-compartments.ts` — memory trim-to-budget now uses `estimateTokens`
  instead of chars/4, matching the rest of the plugin's token math. Removes
  the last unit-mismatched budget path in the injection pipeline.
- `image-token-estimate.ts` — `readUint32BE` now coerces via `>>> 0` so
  PNG headers with MSB-set bytes produce the correct unsigned value
  instead of a negative int that bypasses the `< 1` fallback. Removes
  dead `|| 0` in the WebP lossy parser; the `& 0x3fff` mask already
  produces a non-negative result.
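The `>>> 0` fix is easy to see in isolation. This is a hypothetical standalone `readUint32BE` over a `Uint8Array`, not the plugin's code; the byte values are made up so that the most significant bit is set.

```typescript
// Read a big-endian unsigned 32-bit integer (the layout PNG uses for IHDR
// width and height). Hypothetical standalone version for illustration.
function readUint32BE(bytes: Uint8Array, offset: number): number {
  // JS bitwise operators produce signed 32-bit ints, so a set MSB in the first
  // byte yields a negative number. `>>> 0` reinterprets the same 32 bits as unsigned.
  return (
    ((bytes[offset] << 24) |
      (bytes[offset + 1] << 16) |
      (bytes[offset + 2] << 8) |
      bytes[offset + 3]) >>> 0
  );
}

// The same computation without the coercion, for comparison.
function readInt32BEUnsafe(bytes: Uint8Array, offset: number): number {
  return (
    (bytes[offset] << 24) |
    (bytes[offset + 1] << 16) |
    (bytes[offset + 2] << 8) |
    bytes[offset + 3]
  );
}

const header = new Uint8Array([0x80, 0x00, 0x00, 0x01]);
console.log(readInt32BEUnsafe(header, 0)); // -2147483647
console.log(readUint32BE(header, 0)); // 2147483649
```

`DataView.prototype.getUint32`, which reads big-endian by default, is a built-in alternative. The WebP remark follows from the same rules: `x & 0x3fff` clears every bit above bit 13, including the sign bit, so the result is always in 0..16383 and a trailing `|| 0` cannot change it.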

Tests:

- `transform-index-staleness.test.ts` — the "clears reasoning before
  dropped messages" regression expected `m-reason-b`'s `thinking` to
  survive after pruning, but that expectation was only valid while
  `stripReasoningFromMergedAssistants` ignored `thinking` parts (the bug
  fixed in #2 above). Updated the assertion and comment to reflect the
  correct interaction: after pruning collapses adjacent assistants, the
  merge-strip correctly removes `thinking` from every assistant past the
  first in the run, even when the watermark wouldn't reach it.
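The behavior the updated test asserts (reasoning-like parts removed from every assistant past the first in a consecutive run) can be sketched like this. The `Message`/`Part` shapes are hypothetical, and the sketch ignores the watermark and pruning logic mentioned above; it only shows which parts get dropped.

```typescript
// Hypothetical message shape; the plugin's real types are not shown here.
interface Part { type: string; text?: string }
interface Message { id: string; role: "user" | "assistant"; parts: Part[] }

// Part types treated as reasoning, per the commit message.
const REASONING_TYPES = new Set(["reasoning", "thinking", "redacted_thinking"]);

// In every run of consecutive assistant messages, keep reasoning parts on the
// first message and remove them from the rest, so the merged Anthropic block
// only carries thinking at position 0.
function stripReasoningFromMergedAssistants(messages: Message[]): Message[] {
  return messages.map((msg, i) => {
    const prev = messages[i - 1];
    const continuesRun = msg.role === "assistant" && prev?.role === "assistant";
    if (!continuesRun) return msg;
    return { ...msg, parts: msg.parts.filter((p) => !REASONING_TYPES.has(p.type)) };
  });
}

const input: Message[] = [
  { id: "u1", role: "user", parts: [{ type: "text", text: "hi" }] },
  { id: "a1", role: "assistant", parts: [{ type: "thinking" }, { type: "text", text: "A" }] },
  { id: "a2", role: "assistant", parts: [{ type: "thinking" }, { type: "text", text: "B" }] },
  { id: "a3", role: "assistant", parts: [{ type: "redacted_thinking" }, { type: "tool" }] },
];
const out = stripReasoningFromMergedAssistants(input);
console.log(out.map((m) => m.parts.map((p) => p.type).join(",")));
// → ["text", "thinking,text", "text", "tool"]
```

The sketch returns new message objects rather than mutating the input, so the original parts remain available to any later pass.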

Verified: 535 plugin tests pass, typecheck clean, build clean, lint clean
(pre-existing Intentional: warnings only).

Skipped findings (documented in synthesis.md):
- #10 self-heal oscillation (sticky-date already stabilizes main variance)
- #11 non-image attachments counted as 0 (would require document tokenization)
- #12 residual clamp masks drift (clamp-to-0 is more user-friendly than negative)
ualtinok added a commit that referenced this pull request Apr 23, 2026
Two definitions of 'cache-busting' coexist:
  - system-prompt-hash.ts + inject-compartments.ts: flush-only
  - transform-postprocess-phase.ts: flush-OR-execute

Intentional by design but undocumented — a maintenance footgun. Add
detailed design comments at both definition sites explaining why the
asymmetry matters:

  - Adjunct state (docs, user profile, sticky date) is disk/config-
    derived and unrelated to pending ops. Flush-only ensures it refreshes
    only on explicit user-driven events.
  - Message-level mutations (pending ops, sentinel registration,
    tool-drop finalization) correctly fire on scheduler 'execute' passes
    because that's when queued user drops get materialized.

Historian publication bridges the two via flushedSessions.add (just
fixed in the previous commit, council Finding #9). No behavioral change.
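The asymmetry between the two definitions can be pictured as two predicates. Everything below is hypothetical (the `PassState` shape, the `"defer"` value, and both function names); it only mirrors the flush-only vs. flush-OR-execute distinction described above.

```typescript
// Hypothetical per-pass state; names are illustrative, not the plugin's.
interface PassState {
  sessionId: string;
  flushedSessions: Set<string>; // sessions with an explicit flush this pass
  schedulerDecision: "execute" | "defer";
}

// Adjunct state (docs, user profile, sticky date): flush-only.
function isAdjunctCacheBusting(s: PassState): boolean {
  return s.flushedSessions.has(s.sessionId);
}

// Message-level mutations (pending ops, sentinels, tool-drop finalization):
// flush OR a scheduler "execute" pass.
function isMessageCacheBusting(s: PassState): boolean {
  return s.flushedSessions.has(s.sessionId) || s.schedulerDecision === "execute";
}

// An execute pass without a flush materializes drops but leaves adjuncts alone.
const executeOnly: PassState = {
  sessionId: "s",
  flushedSessions: new Set<string>(),
  schedulerDecision: "execute",
};
console.log(isAdjunctCacheBusting(executeOnly), isMessageCacheBusting(executeOnly)); // false true
```

Adding the session to `flushedSessions` (as historian publication does) makes both predicates true, which is how the two definitions are bridged.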

Closes council Finding #12 (MEDIUM, 4 members).


Development

Successfully merging this pull request may close these issues.

Tag accumulation bug: stripTagPrefix only removes single tag causing excessive §N§ repetition

3 participants