Skip to content

Refine scripting accepted and visibility semantics#181

Merged
louis4li merged 5 commits intodevfrom
fix/2026-04-10_scripting-authority-write-path-cqrs-closure
Apr 14, 2026
Merged

Refine scripting accepted and visibility semantics#181
louis4li merged 5 commits intodevfrom
fix/2026-04-10_scripting-authority-write-path-cqrs-closure

Conversation

@louis4li
Copy link
Copy Markdown
Contributor

What changed

  • narrowed scripting write-path success semantics to accepted for dispatch and removed write-side coupling to read-model catch-up
  • added stable accepted receipts plus save-observation flow for scope/studio save APIs
  • updated Studio web and CLI flows so promotion only shows as applied after the existing catalog read model reflects the target revision
  • removed wall-clock supersede inference from save observation and switched to durable catalog facts
  • aligned the CQRS design doc with the shipped behavior, including reusing existing catalog read APIs instead of adding a dedicated promotion observation endpoint

Why

The write path had drifted back toward read/write coupling in two places: backend services previously waited for query-visible catalog state, and UI flows treated an accepted promotion decision as if the catalog had already switched. That made accepted, applied, and query-visible semantically blurry and introduced a user-visible race.

Impact

  • command endpoints now honestly return accepted handles instead of implying read freshness
  • save and promotion UX now distinguishes "accepted" from "catalog visible"
  • read-side visibility continues to come from the existing read-model interfaces

Root cause

The catalog promote/save paths were mixing write-side dispatch acknowledgement with read-side materialization state. Save observation also used cross-service wall-clock ordering, which is not a reliable authority signal.

Validation

  • dotnet test test/Aevatar.GAgentService.Integration.Tests/Aevatar.GAgentService.Integration.Tests.csproj --filter ScopeScriptApplicationServicesTests --nologo
  • bash tools/ci/test_stability_guards.sh
  • pnpm tsc --noEmit in apps/aevatar-console-web
  • pnpm jest src/modules/studio/scripts/ScriptsWorkbenchPage.test.tsx --runInBand in apps/aevatar-console-web
  • npm run build in tools/Aevatar.Tools.Cli/Frontend

Issues

@louis4li louis4li marked this pull request as ready for review April 13, 2026 13:40
@louis4li louis4li changed the title [codex] Refine scripting accepted and visibility semantics Refine scripting accepted and visibility semantics Apr 13, 2026
@louis4li louis4li requested review from eanzhao and potter-sun and removed request for eanzhao April 13, 2026 13:42
@louis4li louis4li self-assigned this Apr 13, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 95.80838% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.62%. Comparing base (670f6f2) to head (4aa3a76).
⚠️ Report is 6 commits behind head on dev.

Files with missing lines Patch % Lines
...ation/Scripts/ScopeScriptSaveObservationService.cs 91.42% 0 Missing and 6 partials ⚠️
...on/Scripts/ScopeScriptCommandApplicationService.cs 95.00% 0 Missing and 1 partial ⚠️
@@            Coverage Diff             @@
##              dev     #181      +/-   ##
==========================================
- Coverage   81.67%   81.62%   -0.06%     
==========================================
  Files         739      741       +2     
  Lines       46991    47055      +64     
  Branches     6234     6230       -4     
==========================================
+ Hits        38382    38410      +28     
- Misses       5918     5954      +36     
  Partials     2691     2691              
Flag Coverage Δ
ci 81.62% <95.80%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ation/Runtime/ScriptBehaviorRuntimeCapabilities.cs 96.25% <100.00%> (-0.25%) ⬇️
.../Runtime/ScriptBehaviorRuntimeCapabilityFactory.cs 100.00% <100.00%> (ø)
....Scripting.Core/Ports/IScriptCatalogCommandPort.cs 100.00% <ø> (ø)
...ripting.Core/Ports/IScriptDefinitionCommandPort.cs 42.30% <100.00%> (+2.30%) ⬆️
...ting.Core/Ports/ScriptingCommandAcceptedReceipt.cs 100.00% <100.00%> (ø)
...ucture/Ports/RuntimeScriptCatalogCommandService.cs 96.00% <100.00%> (+2.40%) ⬆️
...ure/Ports/RuntimeScriptDefinitionCommandService.cs 95.45% <100.00%> (-0.11%) ⬇️
...re/Ports/ScriptingCommandAcceptedReceiptFactory.cs 100.00% <100.00%> (ø)
...astructure/Ports/ScriptingCommandDispatchModels.cs 90.81% <ø> (+0.62%) ⬆️
...ice.Abstractions/ScopeScripts/ScopeScriptModels.cs 100.00% <100.00%> (ø)
... and 4 more

... and 9 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

eanzhao commented Apr 14, 2026

I don't think the authority-activation dependency has actually been removed yet.

The services were changed to accept IScriptAuthorityReadModelActivationPort?, but the default host still registers that port in AddScriptingProjectionComponents(), and the write/runtime paths still call ActivateAsync(...) when it is present (RuntimeScriptDefinitionCommandService, RuntimeScriptCatalogCommandService, ScriptBehaviorRuntimeCapabilities). In the default production graph this branch still executes, so the write path is still controlling authority projection lifecycle.

To match the design goal here, I think the activation dependency/calls need to be removed from these paths entirely, or the default graph needs to stop wiring the port into them.

Copy link
Copy Markdown
Contributor

eanzhao commented Apr 14, 2026

saveCurrentDraftToScope captures resolvedScopeId, but resolvedScopeId is not part of the callback dependency list in ScriptsWorkbenchPage.tsx.

This page is rendered without a scope-based remount key from pages/studio/index.tsx, so after switching scope the callback can still PUT to the previous scope until something else recreates it. That makes the save path vulnerable to writing into the wrong scope.

I think resolvedScopeId needs to be included in the callback dependencies, or the callback needs to resolve the current scope at call time instead of closing over an old value.

Copy link
Copy Markdown
Contributor

eanzhao commented Apr 14, 2026

The new save-observation polling treats the first transient query error as a hard save failure.

observeAcceptedSave in the web UI and the analogous loop in the CLI both call the observation endpoint without the transient-error retry handling that promotion polling already has. A single timeout / 5xx from /save-observation can therefore surface as "save failed" even though the accepted save becomes visible moments later.

I think this should handle transient observation failures the same way promotion polling does: retry within the observation window, and only fail after the retry budget is exhausted.

Copy link
Copy Markdown
Contributor Author

louis4li commented Apr 14, 2026

Addressed the three scripting review comments; the latest pushed commit is 4aa3a760 after rebasing onto the updated branch tip.

  1. Removed the remaining authority read-model activation coupling from the write/runtime paths. RuntimeScriptDefinitionCommandService, RuntimeScriptCatalogCommandService, and ScriptBehaviorRuntimeCapabilities no longer accept or call IScriptAuthorityReadModelActivationPort, so the write path is no longer controlling authority projection lifecycle.
  2. Fixed the Studio save callback to track the latest resolvedScopeId, so switching scope no longer leaves saveCurrentDraftToScope closing over the previous scope id.
  3. Added transient-error retry handling to save observation polling in both Studio web and the CLI frontend, matching the promotion polling behavior. A temporary timeout / 5xx during /save-observation no longer immediately surfaces as a save failure if the save becomes visible within the observation window.

Validation run after rebasing onto the latest branch state:

  • bash tools/ci/query_projection_priming_guard.sh
  • dotnet test test/Aevatar.Scripting.Core.Tests/Aevatar.Scripting.Core.Tests.csproj --filter "ScriptAgentLifecycleCapabilitiesTests|ScriptingBranchCoverageTests" --nologo
  • pnpm jest src/modules/studio/scripts/ScriptsWorkbenchPage.test.tsx --runInBand
  • pnpm tsc --noEmit in apps/aevatar-console-web

Earlier in the same fix cycle I also ran:

  • bash tools/ci/test_stability_guards.sh
  • npm run build in tools/Aevatar.Tools.Cli/Frontend

Copy link
Copy Markdown
Contributor

@eanzhao eanzhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@louis4li louis4li removed the request for review from potter-sun April 14, 2026 07:25
@louis4li louis4li merged commit 067996e into dev Apr 14, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[P0] 去掉 catalog promote/rollback 成功路径中的 query-port 追平依赖 [P0] 从 scripting 写路径中移除 authority read-model activation

2 participants