fix(graph): advance the release watcher's dispatch high-water only on the Update COMMIT path #194

Merged: rbuergi merged 3 commits into main from fix/watcher-highwater-commit on Jul 2, 2026

@rbuergi rbuergi commented Jul 2, 2026

Addresses residual 1 of #185 (the watcher eager high-water advance) — residuals 2 (flake-repro workflow) and 3 remain open, so this does not close the issue.

Verdict: the lost trigger was REAL

Pinned first with a deterministic repro per repo rules — ReleaseRequestWatcherHighWaterTest parks the owner's serialized write queue with a gate write and enqueues two triggers while parked, forcing the exact interleaving with no timing races. Red on unmodified main (frozen at status=Ok, req=T2, handled=T1 — the trigger lost for the life of the process), green with the fix.

The interleaving (bail path C)

Two triggers T1 < T2 enqueue back-to-back:

1. Emission E1 advances the high-water eagerly to T1 and posts U1.
2. E2 (still settled, U1 not yet applied) advances eagerly to T2 and posts U2.
3. U1 commits (Pending, handled=T1).
4. U2 bails on the status guard without stamping.
5. The compile settles.
6. The recovery re-emission (req=T2 > handled=T1) fails the req > dispatchHighWater gate, because T2 > T2 is false.
7. T2 never dispatches.

The in-code claim that "the in-flight compile carries this request to a terminal status" was false: the settled re-emission was the intended recovery, and the eager advance gated it off. Bail A (untyped-degrade) had the same lossy shape; bail B (triggerAt <= handled) is genuinely safe.
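The interleaving can be replayed in a few lines. The following is a minimal single-threaded sketch, not the watcher's actual code: the Node type, the Emit and Update local functions, and the integer triggers are all hypothetical stand-ins, and the real watcher runs on reactive streams rather than direct calls. It only illustrates why an eager advance loses T2 while a commit-time advance recovers it.

```csharp
using System;

// Hypothetical, simplified model of the watcher; all names are illustrative.
enum Status { Ok, Pending }

sealed class Node
{
    public Status Status = Status.Ok;
    public long Req;      // latest release trigger requested
    public long Handled;  // stands in for LastReleaseRequestHandledAt
}

static class LostTriggerReplay
{
    // Returns true if trigger T2 is eventually dispatched and stamped.
    public static bool Replay(bool eagerAdvance)
    {
        const long T1 = 1, T2 = 2;
        var node = new Node();
        long highWater = 0;

        // Emission-side gate: dispatch only triggers above the high-water.
        bool Emit(long req)
        {
            if (req <= highWater) return false;
            if (eagerAdvance) highWater = req; // pre-fix: advance before commit
            return true;
        }

        // Update lambda: either bail without stamping, or commit and stamp.
        void Update(long trigger)
        {
            if (trigger <= node.Handled) return;        // bail B: already handled
            if (node.Status == Status.Pending) return;  // bail C: status guard
            node.Status = Status.Pending;
            node.Handled = trigger;
            if (!eagerAdvance)
                highWater = Math.Max(highWater, trigger); // post-fix: advance on commit
        }

        // Both emissions pass the gate before either Update applies.
        node.Req = T1; bool e1 = Emit(T1);
        node.Req = T2; bool e2 = Emit(T2);
        if (e1) Update(T1); // U1 commits: Pending, handled = T1
        if (e2) Update(T2); // U2 bails on the status guard

        node.Status = Status.Ok;              // the compile settles
        if (Emit(node.Req)) Update(node.Req); // recovery re-emission
        return node.Handled == T2;
    }

    static void Main()
    {
        Console.WriteLine($"eager advance:       T2 handled = {Replay(eagerAdvance: true)}");
        Console.WriteLine($"commit-time advance: T2 handled = {Replay(eagerAdvance: false)}");
    }
}
```

In this model the eager variant reports False (the node stays at req=T2, handled=T1) and the commit-time variant reports True, matching the red/green behavior described for the repro.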

Fix

Re-derives #124's refinement against post-#173 code: the high-water advances only in the Update lambda's commit branch, in the same breath as the LastReleaseRequestHandledAt stamp — the in-memory mark can no longer run ahead of the on-node stamp, so a bailed trigger stays live and re-fires on the next settled emission. Since commit runs on the owner's write path while the Where reads on the emission path, the mark is a per-install MonotonicHighWaterMark (Interlocked over UTC ticks — no torn reads, no static state). Flap-back correctness unchanged (carried trigger + monotonic stamp + DropStaleMonotonicTriggers); the only cost is an occasional redundant Update in the dispatch-to-commit window, which bails on the existing guards.
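For readers unfamiliar with the pattern, an Interlocked-backed monotonic mark over UTC ticks could look roughly like the sketch below. This is an assumption about the general shape, not the PR's MonotonicHighWaterMark: the class name HighWaterMarkSketch and the members Value, IsExceededBy, and Advance are invented for illustration.

```csharp
using System;
using System.Threading;

// Illustrative sketch only; the PR's actual MonotonicHighWaterMark may differ.
public sealed class HighWaterMarkSketch
{
    private long _ticks; // UTC ticks; 0 means "nothing committed yet"

    // Atomic 64-bit read, so a reader on another thread never sees a torn value.
    public DateTimeOffset Value =>
        new DateTimeOffset(Interlocked.Read(ref _ticks), TimeSpan.Zero);

    // Emission-side check: is the candidate strictly above the current mark?
    public bool IsExceededBy(DateTimeOffset candidate) =>
        candidate.UtcTicks > Interlocked.Read(ref _ticks);

    // Commit-side advance: raises the mark to the candidate if it is higher and
    // never lowers it. Returns true only when the mark actually moved.
    public bool Advance(DateTimeOffset candidate)
    {
        long next = candidate.UtcTicks;
        while (true)
        {
            long seen = Interlocked.Read(ref _ticks);
            if (next <= seen) return false;
            // Retry if another writer changed the mark between the read and the swap.
            if (Interlocked.CompareExchange(ref _ticks, next, seen) == seen) return true;
        }
    }
}
```

Under this sketch the emission path would call IsExceededBy in its filter and the commit branch would call Advance next to the stamp. Keeping one instance per install, rather than a static field, matches the "no static state" point above.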

Tests

  • New deterministic repro, ReleaseRequestWatcherHighWaterTest (red on main → green), which documents the guarantee.
  • Regressions: CodeEditRecompileTest 5/5, NodeTypeReleaseGateTest 4/4, NodeTypeReleaseTest (Monolith + Graph) and NodeTypeCompilationHelpersTest 33/33.
  • Full-solution dotnet build -c Release -warnaserror: clean, including after merging current main.
  • No overlap with #190 (the atomic compiled-node-DLL publish fix for the #177 partial-DLL load race): that PR touches only MeshNodeCompilationService.EmitToDiskWithRetry.

🤖 Generated with Claude Code

rbuergi and others added 2 commits July 2, 2026 13:17
… the Update COMMIT path (#185)

InstallReleaseRequestWatcher advanced its process-local dispatchHighWater
EAGERLY in the Subscribe callback — before the stream Update that stamps
LastReleaseRequestHandledAt committed. The Update's bail paths (trigger
already handled; status already Pending/Compiling) return `curr` WITHOUT
stamping, so a trigger whose Update bailed was left with
high-water >= trigger while the on-node stamp stayed below it: the
post-settle re-emission then failed the `req > dispatchHighWater` gate and
the trigger was LOST for the life of the process. The in-code claim that
"an in-flight compile carries this request to a terminal status" did not
hold — the settled re-emission was the intended recovery, and the eager
advance gated it off.

Deterministic repro (ReleaseRequestWatcherHighWaterTest): park the owner's
serialized MeshNode write queue, enqueue two release triggers T1 < T2 while
parked (so both settled-status emissions pass the Where before the
watcher's first Update applies), release. U1 commits Pending + stamps T1;
U2 bails on the status guard; after the compile settles, the node sits at
req=T2 > handled=T1 forever — red on main, green with this fix.

Fix (re-deriving abandoned #124's refinement against post-#173 semantics):
advance the high-water ONLY in the Update lambda's commit branch — the one
path that stamps LastReleaseRequestHandledAt — so the in-memory mark never
runs ahead of the on-node stamp and a bailed trigger re-fires on the next
settled emission. The mark is now Interlocked over UTC ticks
(MonotonicHighWaterMark) because the commit-time advance runs on the
owner's serialized write path while the Where reads it on the
reduced-stream emission path. Flap-back correctness is unchanged: the
carried-trigger + monotonic on-node stamp (and the data layer's
DropStaleMonotonicTriggers) handle it; the eager advance only bought
suppression of redundant no-op Updates in the dispatch-to-commit window,
which now simply bail on the handled/status guards.

Regression-green: CodeEditRecompileTest 5/5, NodeTypeReleaseGateTest 4/4,
NodeTypeReleaseTest (Monolith 1/1, Graph 31/31 incl. helpers), repro 1/1.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Fixes a correctness bug in the Graph release-request watcher where the process-local “dispatch high-water” could advance ahead of the node’s committed LastReleaseRequestHandledAt stamp, allowing a newer trigger to be permanently gated off after an Update bail. Adds a deterministic integration test that reproduces the lost-trigger interleaving and verifies the fix.

Changes:

  • Advance the watcher’s high-water mark only on the Update lambda’s commit branch (in lockstep with stamping LastReleaseRequestHandledAt).
  • Introduce a small MonotonicHighWaterMark helper (Interlocked-backed) to keep cross-context reads/writes tear-free.
  • Add ReleaseRequestWatcherHighWaterTest as a deterministic repro and regression test for issue #185 residual 1.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File: description

  • test/MeshWeaver.Hosting.Monolith.Test/ReleaseRequestWatcherHighWaterTest.cs: New deterministic repro/regression test for the watcher high-water lost-trigger interleaving.
  • src/MeshWeaver.Graph/Configuration/NodeTypeCompilationHelpers.cs: Fix watcher gating by moving high-water advancement to the Update commit path + add Interlocked high-water helper.

github-actions Bot commented Jul 2, 2026


Test Results (shard 3)

792 tests  ±0   685 ✅ ±0   2m 35s ⏱️ -6s
 13 suites ±0   107 💤 ±0 
 13 files   ±0     0 ❌ ±0 

Results for commit 3e1786f. ± Comparison against base commit 9f7b0ea.

♻️ This comment has been updated with latest results.

github-actions Bot commented Jul 2, 2026


Test Results (shard 2)

   15 files  ±0     15 suites  ±0   7m 10s ⏱️ +4s
1 193 tests  - 7  1 190 ✅  - 7  3 💤 ±0  0 ❌ ±0 
1 195 runs   - 9  1 192 ✅  - 9  3 💤 ±0  0 ❌ ±0 

Results for commit 3e1786f. ± Comparison against base commit 9f7b0ea.

This pull request removes 39 and adds 32 tests. Note that renamed tests count towards both.
MeshWeaver.Persistence.Test.ConcurrentRequestsTest ‑ ConcurrentRequests_MultipleNodeTypes_AllLoadWithoutHanging
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.AI.AgenticAI.md")
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.AI.ChatCommands.md")
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.AI.PlatformProviderS"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.ActionB"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.AddingA"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.Agentic"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.EmailIn"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.ErrorPr"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.Extensi"···)
…
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.Debuggi"···)
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalFileCreation_NotifiesObservers
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalFileCreation_ObserveQueryReceivesUpdate
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalFileDeletion_NotifiesObservers
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalFileDeletion_ObserveQueryReceivesRemoval
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalFileModification_NotifiesObservers
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalMarkdownCreation_NotifiesObservers
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ Watcher_AfterStop_DoesNotNotify
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ Watcher_RapidChanges_Debounced
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ Watcher_StoppedByDefault_DoesNotNotify
…

♻️ This comment has been updated with latest results.

github-actions Bot commented Jul 2, 2026


Test Results (shard 1)

1 393 tests  +1   1 392 ✅ +1   6m 35s ⏱️ -7s
   14 suites ±0       1 💤 ±0 
   14 files   ±0       0 ❌ ±0 

Results for commit 3e1786f. ± Comparison against base commit 9f7b0ea.

♻️ This comment has been updated with latest results.

github-actions Bot commented Jul 2, 2026


Test Results (shard 0)

1 478 tests  ±0   1 478 ✅ ±0   3m 58s ⏱️ +3s
   12 suites ±0       0 💤 ±0 
   12 files   ±0       0 ❌ ±0 

Results for commit 3e1786f. ± Comparison against base commit 9f7b0ea.

♻️ This comment has been updated with latest results.

github-actions Bot commented Jul 2, 2026


Test Results

   54 files  ±0     54 suites  ±0   20m 19s ⏱️ -6s
4 856 tests  - 6  4 745 ✅  - 6  111 💤 ±0  0 ❌ ±0 
4 858 runs   - 8  4 747 ✅  - 8  111 💤 ±0  0 ❌ ±0 

Results for commit 3e1786f. ± Comparison against base commit 9f7b0ea.

This pull request removes 39 and adds 33 tests. Note that renamed tests count towards both.
MeshWeaver.Persistence.Test.ConcurrentRequestsTest ‑ ConcurrentRequests_MultipleNodeTypes_AllLoadWithoutHanging
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.AI.AgenticAI.md")
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.AI.ChatCommands.md")
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.AI.PlatformProviderS"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.ActionB"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.AddingA"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.Agentic"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.EmailIn"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.ErrorPr"···)
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.Extensi"···)
…
MeshWeaver.Hosting.Monolith.Test.ReleaseRequestWatcherHighWaterTest ‑ BurstOfTwoReleaseTriggers_SecondTriggerIsNotLost
MeshWeaver.Persistence.Test.DocumentationCodeBlockCompilationTest ‑ ExecutedCsharpBlocks_MustCompile(embeddedResourceName: "MeshWeaver.Documentation.Data.Architecture.Debuggi"···)
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalFileCreation_NotifiesObservers
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalFileCreation_ObserveQueryReceivesUpdate
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalFileDeletion_NotifiesObservers
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalFileDeletion_ObserveQueryReceivesRemoval
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalFileModification_NotifiesObservers
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ ExternalMarkdownCreation_NotifiesObservers
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ Watcher_AfterStop_DoesNotNotify
MeshWeaver.Persistence.Test.FileSystemChangeWatcherTests ‑ Watcher_RapidChanges_Debounced
…

♻️ This comment has been updated with latest results.

…andshake in the high-water repro

1. Replace the IDisposable sync-over-async disposal (base.DisposeAsync()
   .GetAwaiter().GetResult()) with an override of the base's async
   ValueTask DisposeAsync() — xUnit v3 awaits it naturally, the mesh tears
   down exactly once, and the per-test compile cache dir is cleaned after
   the mesh is gone.

2. Close the false-PASS window in the forced interleaving: the gate write
   is meant to park the owner's serialized MeshNode write queue before the
   two release triggers are enqueued, but nothing confirmed the gate lambda
   had actually STARTED executing. On a busy runner the gate could run only
   after gate.Set() — nothing parked, the interleaving degrades to ordinary
   timing, and the test could silently pass over a regressed watcher. Add a
   `parked` ManualResetEventSlim handshake (the project's established
   synchronous Subscribe-side signal — see DeleteNodeBehaviorTest): set at
   the top of the gate lambda, asserted on the test thread (bounded wait +
   Should().BeTrue) before P1/P2 are posted.
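The handshake in item 2 can be sketched in isolation. The example below is an assumption-laden stand-in, not the test's code: a BlockingCollection drained by a single task plays the role of the owner's serialized write queue, the strings P1 and P2 stand for the two posted triggers, and a thrown TimeoutException replaces the bounded wait plus Should().BeTrue assertion.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ParkedHandshakeSketch
{
    // Returns the order in which the queued work items ran.
    public static List<string> Run()
    {
        var log = new List<string>();
        using var queue = new BlockingCollection<Action>();
        using var parked = new ManualResetEventSlim(false);
        using var gate = new ManualResetEventSlim(false);

        // Stand-in for the serialized write queue: one consumer, FIFO order.
        var worker = Task.Run(() =>
        {
            foreach (var work in queue.GetConsumingEnumerable()) work();
        });

        queue.Add(() =>
        {
            parked.Set();  // signal that the gate write has STARTED executing
            gate.Wait();   // hold the queue until the test releases it
            log.Add("gate");
        });

        // Bounded wait: fail loudly instead of silently racing the gate.
        if (!parked.Wait(TimeSpan.FromSeconds(10)))
            throw new TimeoutException("gate write never started; queue is not parked");

        // Only now enqueue the two triggers, so both sit behind the parked gate.
        queue.Add(() => log.Add("P1"));
        queue.Add(() => log.Add("P2"));

        gate.Set();            // release the gate
        queue.CompleteAdding();
        worker.Wait();         // drain the queue before reading the log
        return log;
    }

    static void Main() => Console.WriteLine(string.Join(", ", Run()));
}
```

Without the parked wait, gate.Set() could run before the gate lambda starts, so nothing would be held back and the ordering would depend on scheduling. With it, the sketch always runs gate, then P1, then P2.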

Re-verified end to end after hardening: the repro still FAILS against the
pre-fix watcher (46s, node frozen at req=T2 handled=T1 — same trace as the
original pin) and passes with the fix (3s); CodeEditRecompileTest 5/5;
test project builds Release -warnaserror clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@rbuergi rbuergi merged commit 77df906 into main Jul 2, 2026
11 checks passed