refactor: reactive persistence — IMeshStorage writes return IObservable#92

Merged: rbuergi merged 51 commits into main from bug_fix on Apr 19, 2026

Conversation

rbuergi commented Apr 19, 2026

Summary

  • Flip IMeshStorage write ops (SaveNode, DeleteNode, MoveNode, AddComment, DeleteComment, SavePartitionObjects, DeletePartitionObjects) from Task-returning to IObservable-returning. Subscribe to drive — no more await in hub handlers.
  • HandleCreateNodeRequest, HandleMoveNodeRequest, DeleteSelfFromStorage, post-creation save-extras, ActivityLogBundler, MeshDataSourceLayoutAreas, and MeshService.CreateTransient consume the observable directly.
  • MeshCatalog.CreateTransientNode returns IObservable<MeshNode>; dead UpdateAsync / ConfirmNodeAsync / IMeshCatalog.DeleteNodeAsync removed.
  • DeleteNode returns IObservable<string> (path) — no read-then-delete TOCTOU under concurrent writers.
  • Companion fix (prior commit on this branch): AccessControlLayoutArea now builds from host.Workspace.GetStream(new MeshNodeReference()) instead of meshQuery.QueryAsync, clearing the prod eternal-spinner on Settings/AccessControl.

Why: follows the CLAUDE.md "no await in hub flows / UI flows" rule. Every write to the mesh must compose as an IObservable<T> chain so the hub dispatcher never blocks and the click/handler path never deadlocks under load.
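
To illustrate the "subscribe to drive" contract described above, here is a minimal, language-agnostic sketch in Python. It is not the MeshWeaver code (which is C# and uses `IObservable<T>`); `Observable`, `InMemoryStorage`, `save_node`, and `delete_node` are hypothetical names invented for this example, and the sketch runs synchronously, which a real reactive pipeline would not.

```python
from typing import Any, Callable, Dict


class Observable:
    """Minimal cold observable: the producer runs only on subscribe()."""

    def __init__(self, producer: Callable[[], Any]) -> None:
        self._producer = producer

    def subscribe(
        self,
        on_next: Callable[[Any], None],
        on_error: Callable[[Exception], None] = lambda error: None,
    ) -> None:
        try:
            value = self._producer()
        except Exception as error:  # failures travel through on_error
            on_error(error)
            return
        on_next(value)


class InMemoryStorage:
    """Hypothetical stand-in for a storage whose writes return observables."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Any] = {}

    def save_node(self, path: str, node: Any) -> Observable:
        def write() -> Any:
            self.nodes[path] = node
            return node

        return Observable(write)

    def delete_node(self, path: str) -> Observable:
        def delete() -> str:
            del self.nodes[path]  # raises KeyError if the node is missing
            return path  # emit the path only; no read-before-delete step

        return Observable(delete)
```

Nothing is written until the caller subscribes, and a failed write surfaces through the error callback rather than as an exception in the handler.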

Test plan

  • dotnet test test/MeshWeaver.Graph.Test — 236/236 pass
  • dotnet test test/MeshWeaver.NodeOperations.Test — 73/73 pass (2 skipped by design)
  • Manual: open /Settings/AccessControl in prod after deploy — spinner should clear
  • Manual: create / move / delete nodes through Memex portal

🤖 Generated with Claude Code

rbuergi and others added 30 commits April 16, 2026 21:04
CurrentNamespace now returns PrimaryPath (the main node) instead of the
raw resolved address. Chat input, autocomplete, attachments, and creatable-
types loading all key off CurrentNamespace; on a thread URL like
/PartnerRe/AIConsulting/_Thread/abc, they were treating the satellite path
as the namespace, so @content/foo and @../sibling refs failed to route.

Also folds in two related satellite fixes already in the working tree:
- PathUtils.ResolveRelativePath strips _Thread/_Comment/_Activity segments
  from the base path before applying ../ traversal
- ThreadMessageLayoutAreas emits absolute hrefs for agent-emitted @refs in
  rendered messages (so a reader can interpret them without knowing the
  enclosing thread's context)

Adds three NavigationServiceTest cases covering satellite vs regular nodes
and creatable-type loading on a satellite page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds four GetDataRequest tests that exercise the user-reported scenario:
- content/<file> in default collection
- content/<spaced filename>
- content/<collection>/<spaced filename>
- content/<file> on a hub with no provider — must return error, not hang

All four pass against the current monolith handler, so the prod symptom
(10s AwaitResponse timeout against PartnerRe/AIConsulting) is not a
handler bug. The tests now form a regression net so any future change to
the content-resolver path that breaks slash format or spaces fails locally
before it ships.

Also folds in:
- ThreadMessageLayoutAreas: avoid `//path` when an agent emits an already-
  absolute path
- ToolStatusFormatterTest: update expectation to the new absolute-href
  format introduced in the previous commit

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Agents (and the autocomplete UI) wrap spaced filenames in double quotes
to "protect" them, producing paths like @/PartnerRe/AIConsulting/content/"Diskussion Thomas Final Report.docx".
ResolvePath previously only stripped a wrapping quote pair; embedded
quotes around a single segment survived and the file lookup went after
a literally-quoted name — returning "not found" or hanging on the prod
hub waiting for a routing response that never came.

Fix: drop every double quote from the path. Mesh paths don't contain
quotes legitimately, so this is safe regardless of position.

Two new repro tests (Get_AbsolutePath_QuotedSpacedFilename and
Get_AbsolutePath_QuotesAroundContentSegment) replicate the exact prod
shape — both fail on main, both pass with this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rance tests

Adds 25 unit tests for relative link resolution in satellite contexts
(_Thread, _Comment, _Tracking) plus end-to-end LinkUrlCleanupExtension
pipeline tests. Also adds tolerance matrix tests for spaced-filename
content access via MeshPlugin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extends the quote-stripping ResolvePath fix to handle:
- Single quotes wrapping a segment ('My File.docx')
- Surrounding/leading/trailing whitespace

Adds a Theory tolerance matrix (Get_AgentEmittedShapes_AllReturnFileContent)
covering 9 path shapes observed from agents and autocomplete:
  no quotes / wrapping double quotes / inner double quotes around filename /
  inner double quotes around content/file segment / quote after @ /
  quote after @ no leading slash / trailing whitespace / leading whitespace /
  single quotes around filename
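
A rough sketch of the tolerance rules described in these commits, written in Python for illustration only. `normalize_mesh_path` is a hypothetical name, and the exact order of operations in the real ResolvePath is an assumption: the sketch trims whitespace, drops every double quote, and strips a single-quote pair only when it wraps a whole segment.

```python
def normalize_mesh_path(raw: str) -> str:
    """Normalize an agent-emitted path (illustrative assumptions only)."""
    # Trim surrounding whitespace, then drop every double quote: the commit
    # message states mesh paths never legitimately contain them.
    path = raw.strip().replace('"', "")
    segments = []
    for segment in path.split("/"):
        # Strip a single-quote pair only when it wraps the whole segment,
        # so an apostrophe inside a name is left alone.
        if len(segment) >= 2 and segment[0] == "'" and segment[-1] == "'":
            segment = segment[1:-1]
        segments.append(segment)
    return "/".join(segments)
```

Under these assumptions, the quoted shapes listed above all reduce to the same unquoted path.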

Also adds an autocomplete round-trip test:
  type lowercase "markus" → AutocompleteRequest to node hub →
  ContentAutocompleteProvider returns quoted InsertText for spaced filename →
  feed that InsertText back through MeshPlugin.Get → returns file content

All 18 MeshPluginContentAccessTest cases now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ThreadChatView.SubmitMessageCore force-releases the submission handler
immediately after Submit so the input stays enabled for queueing. Without
a guard, a double-click or Enter+Send race re-entered TryBeginSubmit with
the same text and was wrongly accepted — two user cells were created and
the server watcher dispatched two execution rounds ("Generating response"
appearing twice in the UI).

Fix: text-based debounce in TryBeginSubmit. Same text submitted within
500ms of the previous accepted submission is rejected; different text
goes through (queueing UX preserved).
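
The debounce rule above can be sketched as follows. This is an illustrative Python version, not the C# handler: `SubmissionDebouncer` and its injectable `clock` parameter are hypothetical, and only the 500 ms window and the "same text rejected, different text accepted" behaviour come from the commit message.

```python
import time
from typing import Callable, Optional


class SubmissionDebouncer:
    """Rejects a repeat of the same text inside the debounce window."""

    def __init__(
        self,
        window_seconds: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window = window_seconds
        self._clock = clock
        self._last_text: Optional[str] = None
        self._last_accepted_at: Optional[float] = None

    def try_begin_submit(self, text: str) -> bool:
        now = self._clock()
        is_repeat = (
            self._last_accepted_at is not None
            and text == self._last_text
            and now - self._last_accepted_at < self._window
        )
        if is_repeat:
            return False  # double-click or Enter+Send race: drop it
        # Different text, or same text after the window: accept and remember.
        self._last_text = text
        self._last_accepted_at = now
        return True
```

Keying the check on the text rather than on a busy flag is what keeps queueing possible: a different message is accepted even while the previous one is still in flight.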

Two new tests:
- DoubleClick_SameTextWithinDebounce_RejectsSecondSubmission — fails on main
- ForceRelease_ThenDifferentText_SecondSubmissionAccepted — verifies
  queueing still works

All 17 ChatSubmissionHandler tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The server watcher's reentrancy guard was released the moment the
subscription handler returned — but DispatchRound only POSTS the
CreateNodeRequest and registers a callback; the IsExecuting=true commit
lands later, inside the callback. The window between handler exit and
that commit was wide enough for the user-cell creation emit to re-fire
the watcher, find IsExecuting still false + the same unprocessed user
message, and dispatch a SECOND round — producing a duplicate response
cell ("Generating response" appearing twice in the UI).

Fix: defer Interlocked.Exchange(ref dispatching, 0) until DispatchRound's
RegisterCallback runs (success or failure), via an onCompleted callback
hook. Subsequent watcher emits during the in-flight dispatch see
dispatching=1 and skip; once the response cell exists and IsExecuting is
true, the guard drops back to idle but the IsExecuting check now blocks
new dispatches.
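
A simplified Python sketch of the deferred-release idea, for illustration only. `DispatchGuard` and `try_dispatch` are hypothetical names, and a plain boolean stands in for the `Interlocked.Exchange` flag, so this version is not thread-safe; it only shows that the guard is released by the completion callback rather than when the handler returns.

```python
from typing import Callable


class DispatchGuard:
    """Holds the guard until the dispatched round reports completion."""

    def __init__(self) -> None:
        self._dispatching = False

    def try_dispatch(
        self, dispatch_round: Callable[[Callable[[], None]], None]
    ) -> bool:
        if self._dispatching:
            return False  # a round is still in flight: skip this emit
        self._dispatching = True
        # The round receives the release hook and calls it from its own
        # completion callback (success or failure), not on handler exit.
        dispatch_round(self._release)
        return True

    def _release(self) -> None:
        self._dispatching = False
```

An emit that arrives between the handler returning and the callback firing sees the guard still held and is skipped.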

Test: Submit_SingleSubmit_ProducesExactlyOneResponseCell asserts ONE
submit produces exactly one user + one response cell on the thread and
exactly one assistant cell node. Existing 8/8 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-add MCP SDK's AddMcp() for RFC 9728 OAuth resource metadata discovery,
fixing the root cause of the previous revert (ForwardAuthenticate = "Bearer"
hardcoded in SDK constructor takes priority over ForwardDefaultSelector).

Fix: set ForwardAuthenticate = null, use ForwardDefaultSelector to route
Bearer tokens to ApiToken handler and cookie sessions to Cookie scheme.

Add minimal OAuth authorization server (/connect/authorize, /connect/token,
/.well-known/oauth-authorization-server) implementing authorization code
flow with PKCE. Issues mw_ API tokens as access tokens, reusing existing
ApiTokenService infrastructure. Enables claude.ai Connectors support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…per-message metadata + thread header

Eliminates the "duplicate response cell", "Renderer has been disposed", and
"sub-thread streaming deadlocks" failures.

Pipeline:
- Atomic single-write submission via new ThreadInput.AppendUserInput
  (testable, Blazor-free); client posts one AppendUserMessageRequest
  instead of CreateNodeRequest + AppendUserMessageRequest. Handler runs
  ThreadInput on the thread hub for one local UpdateMeshNode patch.
- Watcher subscribes to MeshNodeReference (not collection-wide), holds
  the reentrancy guard until IsExecuting=true is observed back, materialises
  user satellites server-side from a new Thread.PendingUserMessages map,
  and rechecks idempotency inside the guard before dispatching.
- BlazorView gates its DataBind subscription callbacks on a _viewDisposed
  flag, before scheduling AND after the sync-context dispatch — kills
  late-callback "Renderer has been disposed" errors.

Sub-thread streaming (no awaits on parent):
- ThreadMessageLayoutAreas.Overview replaces the meshService.QueryAsync
  ToListAsync() deadlock with a reactive subscription that emits embedded
  LayoutAreaControls pointing at each delegation's Streaming area. The
  parent never reads sub-thread streams.

GUI enhancements:
- New token + CompletedAt fields on ThreadMessage, captured from
  Microsoft.Extensions.AI UsageContent during streaming. Per-message
  metadata row on assistant cells (timestamp · model · duration · tokens).
- New Header layout area: parent-thread back-link (path-derived for
  delegations) + aggregated UpdatedNodes summary linking to the existing
  VersionLayoutArea compare URL. Rendered above the message list.

Cancellation: queue-don't-cancel confirmed; explicit Cancel button
preserved. Mid-iteration drain of PendingUserMessages within a single
agent turn deferred — would require bypassing Microsoft.Extensions.AI's
auto-tool-invocation and rebuilding the loop manually.

Tests: AI.Test 294/294, Threading.Test 104/104.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Untracks .claude/settings.local.json (the per-user permission allowlist)
and ignores the whole folder so future plans/, scheduled_tasks.lock,
and any other local Claude Code state never accidentally gets committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aspire 13.2.2 (bumped in b949a74) requires project resources to have
a Properties/launchSettings.json with "commandName": "Project" to build
a valid `dotnet <dll>` launch command. Without it, Aspire invoked
dotnet with no DLL path, the migration resource printed the dotnet
usage help and exited, and memex-local (portal) never started because
of WaitForCompletion(dbMigration).

Unignore the file so a fresh clone doesn't hit the same problem; the
template generator already picks it up since Properties/ isn't in its
exclusion list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ream

The area control stream's upstream hub/workspace can throw
ObjectDisposedException during navigation or component teardown — that
is a normal lifecycle event, not a real error. NamedAreaView previously
formatted it as a user-visible "Error loading area: Cannot access a
disposed object" markdown; now it's debug-logged and skipped.

Also gates OnNext/OnError on IsViewDisposed (matching BlazorView's fix)
so late stream emissions can't touch a torn-down renderer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re confirmed created

Root cause of the residual "Cannot access a disposed object" errors: the
GUI iterates Thread.Messages to render one LayoutAreaControl per id, but
ThreadInput.AppendUserInput was adding the id to Messages BEFORE the
satellite ThreadMessage node existed on the hub. The renderer would then
subscribe to a layout area whose hub had no node yet — producing
ObjectDisposedException-style errors as the area stream tore down.

Fix:
- ThreadInput.AppendUserInput stops adding to Messages — it only stashes
  the message in PendingUserMessages + UserMessageIds.
- ThreadSubmissionServer.DispatchRound creates the user satellites FIRST
  (CombineLatest on IMeshService.CreateNode), THEN creates the response
  satellite (CreateNodeRequest + RegisterCallback), and only inside the
  response-cell-success callback does it commit one atomic UpdateMeshNode
  that adds both the user ids AND the response id into Messages.
- Watcher gets a 50 ms throttle on the MeshNodeReference stream so a
  burst of rapid submits coalesces into a single round (preserves the
  Submit_ThreeRapidSubmissions test contract).
- Contains check kept on the user-id append: needed for the resubmit
  case where ApplyResubmit re-queues an id that's still in Messages.

Tests: ThreadSubmissionIntegrationTest 8/8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… header + thread-create UX

Sub-thread display de-duplication:
- ThreadMessageBubbleControl already shows delegation tool-call chips inside
  the assistant bubble. The extra LayoutAreaControl I added in Phase F below
  the bubble rendered the SAME "Delegating to …" line a second time. Removed
  the outer embed; the in-bubble chip stays (click-through to sub-thread for
  full progress).

Token-usage forwarding:
- AgentChatClient.GetStreamingResponseAsync was filtering content types and
  dropped UsageContent. Added UsageContent to the forward list in both the
  main streaming loop and the handoff streaming loop so ThreadExecution can
  record InputTokens / OutputTokens / TotalTokens on the response cell and
  the per-message metadata row can show them.

Header-area skeleton phantom:
- Thread Header layout area uses SpinnerType.None now (was Skeleton), and
  emits an immediate placeholder via StartWith so the LayoutAreaView never
  shows a ghost skeleton before the aggregated UpdatedNodes list is ready.
  Fixes the "phantom You bubble" the user saw at the top of new threads.

Thread-create UX on embedded chat (User Activity dashboard):
- While CreateThreadAndSubmit is in flight the ThreadChatView now replaces
  the Monaco editor with an "Allocating agent…" progress panel instead of
  layering a blurred overlay over the still-visible editor. Clearer
  signal, fewer "text vanished" moments.

Node page header styling:
- Bumped the node title to 2rem / 700 weight / tight letter-spacing and
  widened the icon/title gap to 20 px with a 16 px top margin so the
  title separates cleanly from the parent back-link above it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ed thread message updates

Eliminates the await-on-sub-thread deadlock. Microsoft.Extensions.AI's
FunctionInvokingChatClient used to block on Task<DelegationResult> until
the sub-thread finished — under Orleans the child's completion patch
queued behind the parent hub scheduler and never arrived. Tool signature
is now IAsyncEnumerable<string>, implemented as an async iterator that
streams the sub-thread's response-cell deltas back to the parent via a
Channel fed by the MeshNodeReference remote-stream subscription.

Changes:
- DelegationTool.CreateUnifiedDelegationTool: executeAsync signature
  flipped from Func<..., Task<DelegationResult>> to
  Func<..., IAsyncEnumerable<string>>. Tool body yields each delta
  up to FIC; FIC's aggregated tool result is the concatenated stream.
  Removed the unused CreateDelegationTool single-target variant.
- ChatClientAgentFactory: new ExecuteDelegationAsync private method is
  an async IAsyncEnumerable<string>. Sub-thread node + cells created
  fire-and-forget via IMeshService.CreateNode().Subscribe(); no TCS, no
  ObserveQuery, no 30-second completion timeout. Channel bounded by a
  5-minute safety CTS as a last-resort guard against a stuck sub-thread.
- UpdateThreadMessageContent: new TextDelta field for streaming appends.
  Text stays for final/terminal writes (completion text, errors, cancel).
  HandleUpdateContent prefers TextDelta — appends to current Text.
- ThreadExecution streaming loop: ships deltas (only the new characters
  since lastPushedTextLength) instead of re-sending the growing full text
  on every throttled push. First push still uses Text to clear the
  "Generating response…" placeholder.
- DelegationTest: test lambdas wrapped in a local ToStream helper so the
  existing Task-based delegation bodies adapt to the new streaming
  signature. 294/294 tests pass.
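
The delta-shipping scheme in the last two bullets can be sketched like this. The Python below is an illustration under assumptions, not the ThreadExecution code: `DeltaPusher`, `next_update`, and `apply_update` are hypothetical names, and updates are modelled as plain dictionaries with a `text` or `text_delta` key.

```python
from typing import Dict


class DeltaPusher:
    """Sends the full text once, then only the characters added since."""

    def __init__(self) -> None:
        self._last_pushed_length = 0
        self._first_push = True

    def next_update(self, full_text: str) -> Dict[str, str]:
        if self._first_push:
            # The first push replaces the placeholder, so it carries full text.
            self._first_push = False
            update = {"text": full_text}
        else:
            update = {"text_delta": full_text[self._last_pushed_length:]}
        self._last_pushed_length = len(full_text)
        return update


def apply_update(current_text: str, update: Dict[str, str]) -> str:
    """Receiver side: a delta is appended, a full text replaces."""
    if "text_delta" in update:
        return current_text + update["text_delta"]
    return update["text"]
```

Each throttled push then costs only the size of the new characters instead of the size of the whole response so far.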

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Redesigns the "Modified nodes" section of the thread header:

- Collapsed by default (HTML <details>/<summary>); summary shows the count.
  Uncluttered header for threads that modify many nodes.
- Full-width rows. Each row is a proper bar that fills the panel so the
  path can occupy the available space without squashing the version
  chips or the actions button.
- Clickable node path → opens the current version (the node overview).
- Clickable old-version chip → /{path}/Versions?version={v}.
- Clickable new-version chip → /{path}/Versions?version={v} (styled as
  accent so the post-change state reads at a glance).
- Per-row ⋯ actions menu (itself a <details>) hidden until clicked.
  Offers Diff (old ↔ new), Restore to old, Restore to new, and an
  "All versions…" fallback link that opens the Versions area.

URLs follow the existing VersionLayoutArea shape — ?version= for a
single version, ?from=&to= for compare, ?restore= for restore intent.
VersionLayoutArea can honour these query params directly (compare
view already exists); restore is an auto-prompt on the Versions page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eme-safe colours

UI fixes to the thread-header modified-nodes panel:

- Theme-safe chip colours. Old and new version chips now use
  --neutral-foreground-rest on --neutral-layer-3 with a border; the new
  chip is distinguished by --accent-fill-rest TEXT + matching border
  (outlined, not filled), so there's no white-on-light-blue glitch in
  dark mode. Hover states use --neutral-layer-2.
- CSS-grid tabular layout. Columns: path · old-ver · arrow · new-ver ·
  Diff · Restore v{old} · Restore v{new}. Everything aligns across rows.
- Inline actions. Diff and both Restore links are visible on each row —
  the hidden <details>/⋯ menu is gone.
- Small-screen collapse. @media (max-width: 720px) hides the action
  columns and makes the whole section collapse behind its summary
  (expand with the chevron). Wide screens always show the full table.
- Relative display paths. The thread's root namespace is stripped from
  each displayed path (hrefs stay absolute so links still work). For a
  thread at "Org/_Thread/abc", a modified node at "Org/Contact/john"
  displays as "Contact/john".
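
As an illustration of the relative display paths in the last bullet, here is a small Python sketch. It assumes the thread's root namespace is everything before the first segment that starts with an underscore (such as `_Thread`); that rule and the function names are guesses for this example, not taken from the implementation.

```python
def thread_root_namespace(thread_path: str) -> str:
    """Everything before the first underscore-prefixed (satellite) segment."""
    segments = thread_path.split("/")
    for index, segment in enumerate(segments):
        if segment.startswith("_"):
            return "/".join(segments[:index])
    return thread_path


def display_path(thread_path: str, node_path: str) -> str:
    """Strip the thread's root namespace for display; hrefs stay absolute."""
    root = thread_root_namespace(thread_path)
    prefix = root + "/"
    if root and node_path.startswith(prefix):
        return node_path[len(prefix):]
    return node_path  # outside the namespace: show the path unchanged
```

Only the displayed label is shortened; the link target would still use the full `node_path`.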

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… as h/m/s

Tokens weren't showing in the GUI because AzureClaudeChatClient.GetStreamingResponseAsync
never yielded a UsageContent update — Anthropic emits input-token count on
message_start and cumulative output-token count on message_delta, but the
stream-event model ignored both fields. Fixed:

- Added ClaudeUsage fields to ClaudeStreamMessage and ClaudeStreamEvent so
  the parser sees them.
- message_start handler captures input_tokens; message_delta updates the
  running output_tokens.
- New message_stop handler yields one ChatResponseUpdate carrying a
  UsageContent(InputTokenCount, OutputTokenCount, TotalTokenCount) so
  AgentChatClient forwards it (already added) and ThreadExecution stamps
  the response cell (already added).

Time format in the assistant metadata row: was "1800ms" / "1.8s". Now
"120ms" / "1.8s" / "42s" / "1m 23s" / "1h 5m 30s" — drops zero components.
Matches the h/m/s style the user asked for.
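
One way to produce the formats listed above, sketched in Python for illustration. The function name is hypothetical, and the cut-over points are assumptions inferred from the examples: milliseconds below one second, one decimal place below ten seconds, then h/m/s with zero components dropped.

```python
def format_duration(ms: int) -> str:
    """Format a duration in milliseconds (thresholds are assumptions)."""
    if ms < 1000:
        return f"{ms}ms"
    if ms < 10_000:
        return f"{ms / 1000:.1f}s"  # short durations keep one decimal
    total_seconds = ms // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    if seconds:
        parts.append(f"{seconds}s")
    return " ".join(parts)
```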

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…modifying tools

- Tool-chip text reads cleaner. The old format ("Creating path: Org/Contact/john")
  exposed the raw argument line. The new FormatToolCallDisplay pulls the
  actual value (stripping the "path:" / "url:" / "query:" key prefix) and
  splits each chip into {Verb, Path, IsNodeModifying}. So instead of:
      ✓ Creating path: Org/Contact/john
  you now see:
      ✓ Created  Org/Contact/john
- Path is a clickable link to the node's overview.
- For Create / Update / Patch / Delete, the chip cross-references the
  message's UpdatedNodes list by path and appends inline Diff and Revert
  links with the matching before/after versions — same URL shape as the
  thread-header panel (?from=&to=, ?restore=). Absolute-path hrefs;
  theme-safe colours.
- Verb copy refined: past-tense for completed writes ("Created",
  "Updated", "Deleted"), present for reads ("Reading", "Searching").
- UpdatedNodes is now data-bound on ThreadMessageBubbleControl
  (ThreadMessageViewModel already carried it from the satellite).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Delete click handler and render path are now fully reactive (no await on the
hub thread): Subscribe on the form-data stream, Subscribe on IMeshService.DeleteNode,
CombineLatest of ObservePermissions + FromAsync descendant count. A blocked hub
no longer deadlocks the UI.

DeleteSelfFromStorage posts the success response BEFORE issuing the persistence
delete. Under Orleans (and during monolith disposal) the storage write can tear
this hub down; replying first guarantees the caller's RegisterCallback resolves.
Validators have already passed at this point, so a late storage failure is
logged — the Ok reply cannot be walked back.

Tests exercise the exact production pattern: hub.Post(DeleteNodeRequest) +
hub.RegisterCallback, TaskCompletionSource driven by the callback, WaitAsync(10s)
as deadlock guard. Added recursive variant that would hang if the self-hub
disappeared before posting its reply.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `Sources` property to `NodeTypeDefinition` holding query-syntax lines
that point at the Code nodes to compile with the lambda. The compilation
service expands `$self` to the owning NodeType's path, rebases relative
`namespace:X` values (no `/`) onto that path, and ANDs each query with
`nodeType:Code` so non-Code children cannot leak in. `@path` / `@@path`
shorthand resolves to both a `path:` exact match and a
`namespace:... scope:subtree` folder match. Matches across lines are
de-duplicated. Default is `["namespace:_Source scope:subtree"]` — behaviour
preserved for existing NodeTypes.

Covers the NodeType cross-sharing case the ACME Project/Todo sample works
around today by duplicating Status/Category/Priority.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `GetDiagnostics(path)` tool on `MeshPlugin`/`MeshOperations` returning
`{status, nodeTypePath, error}` for a NodeType or any of its instances.
`Get` additionally wraps its response with a `compilationError` field when
the node's NodeType failed to compile, so callers that only call `Get` still
see the failure. `GetCompilationError` is now public on `INodeTypeService`.

Also fixes two `Post + RegisterCallback` sites in `AgentView` and
`AutocompleteClient` that blindly cast the callback response and would throw
`InvalidCastException` on `DeliveryFailure`.

Updates `Coder.md` to require `GetDiagnostics` verification after every
NodeType create/update — a NodeType is not "done" until `status: "Ok"`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a Recycle menu item (between Move and Delete) that sends DisposeRequest
to the current node's hub and redirects back to Overview after 100ms. Lets
users flush a cached / stuck grain — useful after fixing a compile error on
a NodeType whose hub was already instantiated with the broken configuration.

Reformats the compilation-error overlay as markdown (fenced code block for
the Roslyn diagnostics + a pointer to Recycle / GetDiagnostics) so it
renders legibly in both light and dark themes — the previous HTML used
hardcoded light-mode colours.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flip write ops (SaveNode, DeleteNode, MoveNode, AddComment, DeleteComment,
SavePartitionObjects, DeletePartitionObjects) from Task-returning to
IObservable-returning so handlers can Subscribe without await. Observable
carries state via OnNext and errors via OnError.

- Handlers (HandleCreateNodeRequest, HandleMoveNodeRequest, DeleteSelfFromStorage,
  post-creation save-extras) no longer wrap persistence calls in
  Observable.FromAsync; they consume the observable directly.
- HandleMoveNodeRequest posts its response inside Subscribe so the handler
  returns immediately without awaiting persistence.
- MeshCatalog.CreateTransientNode returns IObservable<MeshNode>; dead
  UpdateAsync / ConfirmNodeAsync / IMeshCatalog.DeleteNodeAsync removed.
- ActivityLogBundler, MeshDataSourceLayoutAreas, MeshService.CreateTransient
  migrated to Subscribe with explicit error logging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a NodeType's compile fails, the CompilationException now carries the
list of source queries that actually ran and the Code-node paths they
matched. The error overlay shows this report under "--- Source discovery ---",
making it obvious when the compile failed because zero code files were
pulled in (the most common cause of "type not found" errors).

Adds compile-in-progress tracking on NodeTypeService: IsCompiling(path) +
GetCompilationStartedAt(path). Exposed on INodeTypeService.
GetDiagnostics MCP tool now returns status:"Compiling" with elapsed ms
while a compile is running, so callers can show "Compiling…" progress
instead of blocking silently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rbuergi and others added 6 commits April 19, 2026 16:59
Change DeleteNode signature from IObservable<MeshNode> (pre-delete state)
to IObservable<string> (path). The previous implementation fetched the
node first, then deleted it — racy under concurrent writers. Call sites
(DeleteSelfFromStorage) already discard the emitted value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MeshPlugin.Recycle posts DisposeRequest to the target hub so agents can
flush a cached / stuck grain over MCP (mirrors the Recycle menu item).
Fire-and-forget via hub.Post; returns immediately. Caller should wait
~100ms before the next access so the grain teardown completes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds GetDiagnostics and Recycle to McpMeshPlugin so MCP clients (including
the Coder agent's MCP tools, not just the in-process MeshPlugin) can verify
compilation status and flush stuck grains.

Coder.md already instructs agents to call GetDiagnostics after every
NodeType create/update; without this commit those calls would fail — the
MCP surface was limited to the Mesh CRUD tools.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- AccessControlLayoutArea now try/catches GetStream(new MeshNodeReference()).
  When the reducer is not registered (minimal test hubs), render the page
  once without a node instead of throwing DeliveryFailureException at the
  layout host. Catch observable errors too and fall back to a stream-less
  render.
- MenuAccessControlTest: add Pin (Permission.None — all authenticated users)
  and Recycle (Permission.Update — Editor/Admin) to the expected label sets.
  These menu items landed in earlier commits on this branch; the test never
  got updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llites

Profile was failing to compile because GatherInputsAsync called
meshStorage.GetChildrenAsync which excludes satellite-pattern nodes
(mainNode != path) — the Code nodes we persist via MCP land with
mainNode set to the parent _Source folder, so GetChildrenAsync skipped
them entirely even though they exist at the right paths.

Two changes:
- Switch to GetDescendantsAsync + a single-node fetch so satellite Code
  nodes are picked up. Dedup via path set.
- Route through ResolveSourcePaths to honor NodeTypeDefinition.Sources
  (the property that was added but unused here). Supports namespace:/path:
  qualifiers, $self macro, @path shorthand, and implicit self-relative
  folder names like "_Source". Defaults to "{nodeTypePath}/_Source".

Adds one structured log line per compile listing the Code node paths
pulled in, so future source-discovery issues are diagnosable without
redeploys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reduces flakiness on CI where the 10s internal deadline was occasionally
hit while the async thread ingest was still in flight. Local runs complete
in ~2s, so 30s is plenty of headroom without slowing green runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot commented Apr 19, 2026

Test Results

2 890 tests (+60): 2 877 passed ✅ (+60), 13 skipped 💤 (±0), 0 failed ❌ (±0)
35 suites (+1), 35 files (+1), run time 5m 59s ⏱️ (+10s)

Results for commit 71d6540. ± Comparison against base commit 453c31f.

This pull request removes 133 and adds 193 tests. Note that renamed tests count towards both.
MeshWeaver.Query.Test.AutocompleteIconTests ‑ Autocomplete_BasePathWithPrefix_ReturnsIcons
MeshWeaver.Query.Test.AutocompleteIconTests ‑ Autocomplete_BasePathWithPrefix_SearchesWithinPath
MeshWeaver.Query.Test.AutocompleteIconTests ‑ Autocomplete_CaseInsensitive_MatchesLowerInput
MeshWeaver.Query.Test.AutocompleteIconTests ‑ Autocomplete_ContainsMatch_FindsBySubstring
MeshWeaver.Query.Test.AutocompleteIconTests ‑ Autocomplete_EmptyPrefix_ReturnsIconsWhereAvailable
MeshWeaver.Query.Test.AutocompleteIconTests ‑ Autocomplete_LimitIsRespected
MeshWeaver.Query.Test.AutocompleteIconTests ‑ Autocomplete_NoMatch_ReturnsEmpty
MeshWeaver.Query.Test.AutocompleteIconTests ‑ Autocomplete_PathMatch_FindsByPathSubstring
MeshWeaver.Query.Test.AutocompleteIconTests ‑ Autocomplete_PrefixMatch_FindsByNameStart
MeshWeaver.Query.Test.AutocompleteIconTests ‑ Autocomplete_RelevanceFirst_PrefixMatchScoresHigher
…
MeshWeaver.AI.Test.MeshPluginContentAccessTest ‑ Autocomplete_RoundTrip_LowercaseQuery_QuotedInsertText_GetsContent
MeshWeaver.AI.Test.MeshPluginContentAccessTest ‑ Get_AbsolutePath_QuoteAfterAtSign_ReturnsFileContent
MeshWeaver.AI.Test.MeshPluginContentAccessTest ‑ Get_AbsolutePath_QuotedSpacedFilename_ReturnsFileContent
MeshWeaver.AI.Test.MeshPluginContentAccessTest ‑ Get_AbsolutePath_QuotesAroundContentSegment_ReturnsFileContent
MeshWeaver.AI.Test.MeshPluginContentAccessTest ‑ Get_AgentEmittedShapes_AllReturnFileContent(template: "   @/{NODE}/content/{FILE}", description: "leading whitespace")
MeshWeaver.AI.Test.MeshPluginContentAccessTest ‑ Get_AgentEmittedShapes_AllReturnFileContent(template: "@/{NODE}/\"content/{FILE}\"", description: "quotes around content/filename together")
MeshWeaver.AI.Test.MeshPluginContentAccessTest ‑ Get_AgentEmittedShapes_AllReturnFileContent(template: "@/{NODE}/content/'{FILE}'", description: "single quotes around filename")
MeshWeaver.AI.Test.MeshPluginContentAccessTest ‑ Get_AgentEmittedShapes_AllReturnFileContent(template: "@/{NODE}/content/\"{FILE}\"", description: "quotes around filename only")
MeshWeaver.AI.Test.MeshPluginContentAccessTest ‑ Get_AgentEmittedShapes_AllReturnFileContent(template: "@/{NODE}/content/{FILE}   ", description: "trailing whitespace")
MeshWeaver.AI.Test.MeshPluginContentAccessTest ‑ Get_AgentEmittedShapes_AllReturnFileContent(template: "@/{NODE}/content/{FILE}", description: "no quotes, absolute")
…

♻️ This comment has been updated with latest results.

rbuergi and others added 14 commits April 19, 2026 20:59
…flakes)

CLAUDE.md documents 60s per method but the runner config was 30s. Query
tests (AutocompleteMultiSourceTest, FanOutQueryOrderingTests) run ~22s
locally and occasionally blow past 30s under CI load. 60s matches the
documented value and leaves headroom without masking genuine hangs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eFeed

Two stale-cache fixes wired together:

1. NodeTypeService.InvalidateCache now also clears `_compilationErrors`
   and `_compilingInProgress`. Previously those survived every Recycle,
   so the user kept seeing an old error even after the hub was disposed
   and source files were fixed.

2. NodeTypeService subscribes to IMeshChangeFeed on construction and
   invalidates whenever an event arrives whose path is already in the
   local caches (or whose NodeType is the NodeType marker). This is the
   existing broadcast channel — in monolith it's in-process; in Orleans
   it's a BroadcastChannel that every silo subscribes to. So invalidation
   reaches all silos automatically.

MeshOperations.Recycle now:
- Calls InvalidateCache locally so the current silo flushes immediately.
- Publishes a synthetic MeshChangeEvent.Updated over IMeshChangeFeed
  with NodeType = "NodeType" — every silo's NodeTypeService picks it up
  and invalidates its local cache too.
- Posts DisposeRequest to the target hub as before.

Contract change: INodeTypeService.InvalidateCache is now public (was
internal on the impl) so MCP tools can trigger cross-silo eviction.
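
The invalidation flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual NodeTypeService: `ChangeEvent`, `ChangeFeed`, and `NodeTypeCache` are hypothetical stand-ins, and a plain C# event replaces the real IMeshChangeFeed, whose API is not shown in this PR. Two cache instances sharing one feed play the role of two silos.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for a mesh change event; only the fields used here.
public sealed record ChangeEvent(string Path, string? NodeType);

// Hypothetical stand-in for the broadcast channel (not the real IMeshChangeFeed).
public sealed class ChangeFeed
{
    public event Action<ChangeEvent>? Changed;
    public void Publish(ChangeEvent evt) => Changed?.Invoke(evt);
}

// Hypothetical per-silo cache holding compile results, errors, and in-flight markers.
public sealed class NodeTypeCache
{
    private readonly ConcurrentDictionary<string, string> _compiled = new();
    private readonly ConcurrentDictionary<string, string> _compilationErrors = new();
    private readonly ConcurrentDictionary<string, bool> _compilingInProgress = new();

    // Subscribe on construction so every instance hears every broadcast.
    public NodeTypeCache(ChangeFeed feed) => feed.Changed += OnChanged;

    public void RecordError(string path, string error) => _compilationErrors[path] = error;
    public bool HasError(string path) => _compilationErrors.ContainsKey(path);

    // Clears all three caches for the path, so stale errors cannot survive.
    public void InvalidateCache(string path)
    {
        _compiled.TryRemove(path, out _);
        _compilationErrors.TryRemove(path, out _);
        _compilingInProgress.TryRemove(path, out _);
    }

    private void OnChanged(ChangeEvent evt)
    {
        bool known = _compiled.ContainsKey(evt.Path)
                     || _compilationErrors.ContainsKey(evt.Path)
                     || _compilingInProgress.ContainsKey(evt.Path);

        // Invalidate when the path is already cached or the event carries the NodeType marker.
        if (known || evt.NodeType == "NodeType")
            InvalidateCache(evt.Path);
    }
}
```

Publishing one event with `NodeType = "NodeType"` on the shared feed clears the recorded error in every `NodeTypeCache` attached to it, which mirrors how a synthetic update is meant to reach all silos.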

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ApplicationPage.razor was showing just 'Looking up <path>...' for the
entire duration of the blocking compile — users had no indication of
what the hub was actually busy with. Now:

- INodeTypeService exposes GetCompilingPaths() (a snapshot of paths
  with a compile task currently running).
- ApplicationPage polls it once per second while IsLoading is true
  and flips the placeholder to 'Compiling <path> (Ns)...' when any
  NodeType is mid-compile. The elapsed-second counter gives
  reassurance on long compiles.
- Timer is disposed with the page.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Expose WithHeartBeatHandler() in MeshExtensions so any hub can register
  the existing HandleHeartBeat handler without pulling in the full set of
  WithNodeOperationHandlers (which includes Create/Update/Delete/Move —
  not appropriate for leaf per-node hubs).
- Call it from MemexConfiguration.ConfigureDefaultNodeHub so every dynamic
  node hub (NodeType instances, threads, _Exec, etc.) acks heartbeats
  silently instead of logging a "No handler found for HeartBeatEvent"
  warning per beat. In monolith mode it's a no-op; in Orleans it walks
  the parent chain to call GrainKeepAliveCallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MeshWeaver was at Warning, which hid NodeType source-discovery
Information logs (e.g. `source discovery: N Code nodes from [...]`).
Only MeshWeaver.AI was at Information. Bumping MeshWeaver default to
Information and pinning MeshWeaver.Graph.Configuration so future
compile-path diagnostics surface without log-level tweaks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ject

Two crash surfaces mopped up:

1. NodeTypeService constructor wraps the IMeshChangeFeed.Subscribe call
   in try/catch. If the feed implementation threw during early
   subscription (timing issues at cluster startup), every silo's DI
   blew up and the whole mesh deadlocked. Handler body is also now
   try/catch'd so one faulted event doesn't kill the subscription.

2. ApplicationPage.razor.cs no longer hard [Inject]s INodeTypeService —
   it lazy-resolves via IServiceProvider.GetService. A hard inject
   threw during component construction when the service wasn't
   registered (e.g. distributed portal startup) which left the user
   with a black screen. Timer tick is also try/catch'd to avoid
   unhandled exceptions killing the circuit.
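
Both defensive patterns above can be sketched with BCL types only. This is an illustration under assumptions: `ICompileStatus`, `SafeResolve`, and `EmptyServiceProvider` are hypothetical names, not types from this repository. The only real API used is `IServiceProvider.GetService(Type)`, which returns null when no service of that type is registered.

```csharp
#nullable enable
using System;

// Hypothetical optional service, standing in for something like INodeTypeService.
public interface ICompileStatus
{
    int CompilingCount { get; }
}

public static class SafeResolve
{
    // Lazy resolution: yields null instead of throwing when the service is missing,
    // so the caller can degrade gracefully rather than fail during construction.
    public static T? TryResolve<T>(IServiceProvider services) where T : class
        => services.GetService(typeof(T)) as T;

    // Wraps a handler so that one faulted invocation is reported but does not
    // propagate to (and tear down) whatever is driving the handler.
    public static Action<T> Guard<T>(Action<T> handler, Action<Exception> onError)
        => value =>
        {
            try { handler(value); }
            catch (Exception ex) { onError(ex); }
        };
}

// Minimal provider with nothing registered, used to show the null path.
public sealed class EmptyServiceProvider : IServiceProvider
{
    public object? GetService(Type serviceType) => null;
}
```

A caller would check the result of `TryResolve` for null before polling, and pass the guarded delegate to the subscription or timer instead of the raw handler.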

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of the 'types not found' compile failure: Code nodes are
persisted as satellites (MainNode = parent _Source folder, Path =
Code node path). InMemoryPersistenceService.GetDescendantsAsync
explicitly excludes every node where MainNode != Path (lines 202-204).
My previous fix still used that storage API, so the compile path
kept seeing zero Code files.

Switch to the local QueryAsync<MeshNode> helper which runs through
IMeshQueryProvider — it has no satellite filter and returns the
Code nodes regardless of how MainNode was set. For each configured
source path, run both a path:X exact-match and a namespace:X subtree
query, so single-file shorthand and folder queries both resolve.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates Anthropic__Models__1 and ModelTier__Heavy from claude-opus-4-6 to
claude-opus-4-7 in the Aspire AppHost configuration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…streams

CombineLatest(permissions, descendantCount) required BOTH sources to emit
before the delete page rendered. If either stream was slow or stuck (hub
saturation, query hang), users saw an eternal spinner and the GUI appeared
frozen.

- StartWith a "Loading…" placeholder so the click-through from the menu
  always renders something.
- 10s Timeout + Catch on each source stream; on failure, deny permission
  and render zero descendants rather than blocking forever.
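
A sketch of this shape using System.Reactive operators (`Timeout`, `Catch`, `CombineLatest`, `StartWith`) is shown below. It assumes the System.Reactive package is referenced; `DeleteViewModel` and `DeletePage.Build` are hypothetical names, and the real layout area may be wired differently. `Take(1)` is an assumption added here so the timeout only governs the first value, because Rx's `Timeout` otherwise applies to every gap between elements.

```csharp
using System;
using System.Reactive.Linq;

// Hypothetical view model for the delete page.
public sealed record DeleteViewModel(bool IsLoading, bool CanDelete, int DescendantCount);

public static class DeletePage
{
    public static IObservable<DeleteViewModel> Build(
        IObservable<bool> permissions,
        IObservable<int> descendantCount,
        TimeSpan timeout)
    {
        // On timeout or any other error, fall back to "permission denied".
        var safePermissions = permissions
            .Take(1)
            .Timeout(timeout)
            .Catch<bool, Exception>(_ => Observable.Return(false));

        // On timeout or any other error, fall back to zero descendants.
        var safeCount = descendantCount
            .Take(1)
            .Timeout(timeout)
            .Catch<int, Exception>(_ => Observable.Return(0));

        return safePermissions
            .CombineLatest(safeCount,
                (canDelete, count) => new DeleteViewModel(false, canDelete, count))
            // Emit a placeholder immediately so the page always renders something.
            .StartWith(new DeleteViewModel(true, false, 0));
    }
}
```

With a permissions stream that never emits, subscribers first receive the loading placeholder and then, once the timeout elapses, a model with `CanDelete = false` and the real descendant count.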

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Query.Test was the heaviest test assembly (~400 tests, samples/Graph +
content-collections + 3 partitions) and a CI runner OOM killed it mid-run.
Moves the five Autocomplete* files (incl. the 800-line MultiSource class
that creates content files on disk) to their own project. Each project
already runs in its own dotnet test invocation in the workflow, so memory
is fully released between them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ContentAutocompleteProvider was registered up to 3 times in chained hub
configs (PortalApplication → PortalNodeType → OrganizationNodeType) because
AddContentCollectionsInfrastructure ran WithServices(AddContentService)
unconditionally on every nested call. The flag guard previously only
protected AddContentCollections (layout areas), not the infrastructure path.

Move the guard down so WithServices(AddContentService) runs at most once
per hub-config chain. Combined with the existing TryAddEnumerable in
AddContentService, this prevents the autocomplete provider from yielding
duplicate items at the consumer side.
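
The guard can be illustrated with a small sketch. Everything here is hypothetical: `HubConfig`, `TrySetFlag`, and `AddContentInfrastructure` only imitate the shape of a chained hub configuration, and a list of strings stands in for the service collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hub configuration: a set of flags plus deferred service setups.
public sealed class HubConfig
{
    private readonly HashSet<string> _flags = new();

    public List<Action<List<string>>> ServiceSetups { get; } = new();

    // HashSet.Add returns false when the flag was already present.
    public bool TrySetFlag(string flag) => _flags.Add(flag);
}

public static class ContentExtensions
{
    private const string InfrastructureFlag = "ContentInfrastructure";

    public static HubConfig AddContentInfrastructure(this HubConfig config)
    {
        // Guard at the infrastructure level: nested calls in the same chain are no-ops.
        if (!config.TrySetFlag(InfrastructureFlag))
            return config;

        config.ServiceSetups.Add(services => services.Add("ContentAutocompleteProvider"));
        return config;
    }

    // Runs all deferred setups and returns the resulting registrations.
    public static List<string> BuildServices(this HubConfig config)
    {
        var services = new List<string>();
        foreach (var setup in config.ServiceSetups)
            setup(services);
        return services;
    }
}
```

Calling `AddContentInfrastructure` three times on the same config, as a chain of nested hub configurations would, leaves a single registration in the built list.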

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Picks up in-flight changes across ThreadLayoutAreas, ThreadMessageBubbleView,
JsonSynchronizationStream, SynchronizationStream, and VersionLayoutArea.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rbuergi
Contributor Author

rbuergi commented Apr 19, 2026

@copilot resolve the merge conflicts in this pull request


Copilot AI commented Apr 19, 2026

> @copilot resolve the merge conflicts in this pull request

Resolved in 1f53d8e. The two conflicts were:

  • .gitignore: used origin/main's targeted ignores (.claude/settings.local.json, .claude/scheduled_tasks.lock) instead of blanket-ignoring .claude/
  • .claude/settings.json: accepted origin/main's renamed file (previously settings.local.json)

@rbuergi rbuergi merged commit 05ba282 into main Apr 19, 2026
1 check failed
@rbuergi rbuergi deleted the bug_fix branch April 19, 2026 21:28
Copilot stopped work on behalf of rbuergi due to an error April 19, 2026 21:29
2 participants