Skip to content

[#255] Phase 0: Channel RFC landing + infra prerequisites#267

Merged
eanzhao merged 9 commits intodevfrom
feat/2026-04-20_issue-255-channel-phase0
Apr 21, 2026
Merged

[#255] Phase 0: Channel RFC landing + infra prerequisites#267
eanzhao merged 9 commits intodevfrom
feat/2026-04-20_issue-255-channel-phase0

Conversation

@eanzhao
Copy link
Copy Markdown
Contributor

@eanzhao eanzhao commented Apr 20, 2026

Closes #255

Summary

This PR lands the Phase 0 infra prerequisites for the Channel RFC together with the canonical RFC document.

The RFC now lives at docs/canon/aevatar-channel-architecture.md so it matches the repository's docs governance rules.

Relationship to issues

What's in this PR

  • docs/canon/aevatar-channel-architecture.md — canonical RFC document
  • IProjectionWriteDispatcher<T>.DeleteAsync + IProjectionWriteSink<T>.DeleteAsync
  • Kafka persistent stream provider redelivery validation harness (KafkaPersistentStreamProviderRedeliveryValidationTests)
  • ICredentialProvider + credential_ref scaffold wired through Channel Runtime registration proto/state/read model

Scope note

  • The repository currently ships InMemory and KafkaProvider Orleans stream backends only.
  • The validation harness in this PR verifies the Kafka persistent-provider semantics that the repo actually supports today.
  • EventHubs-specific validation is intentionally not claimed here; if an EventHubs backend is added later, this harness should be extended for that backend explicitly instead of assuming Kafka behavior carries over.

Test plan

  • dotnet test test/Aevatar.CQRS.Projection.Core.Tests/Aevatar.CQRS.Projection.Core.Tests.csproj --filter ElasticsearchProjectionDocumentStoreBehaviorTests --nologo
  • dotnet test test/Aevatar.Foundation.Runtime.Hosting.Tests/Aevatar.Foundation.Runtime.Hosting.Tests.csproj --filter KafkaPersistentStreamProviderRedeliveryValidationTests --nologo
  • bash tools/ci/test_stability_guards.sh
  • bash tools/docs/lint.sh

Notes:

  • The Kafka persistent-provider tests are environment-gated and skip unless AEVATAR_TEST_KAFKA_BOOTSTRAP_SERVERS and AEVATAR_TEST_GARNET_CONNECTION_STRING are set.
  • DeleteAsync delete-path semantics were tightened so deletes no longer bootstrap a missing Elasticsearch index.

Notes for reviewers

The PR now contains the complete Phase 0 scope for #255. The remaining review focus should be:

  • whether the DeleteAsync contract is honest and side-effect free on the delete path
  • whether the Kafka persistent-provider harness is scoped honestly to the runtime backends that exist in-repo today
  • whether the credential_ref scaffold stays a pure persistence/contract change and avoids premature runtime secret-resolution coupling

eanzhao and others added 3 commits April 20, 2026 18:44
Part of #254 and first commit of PR for #255.

Canonical location is now docs/rfcs/aevatar-channel-architecture.md. Main
issue #254 carries the same content across comment parts (due to GitHub 64KB
body limit); this file is the authoritative version going forward.

Subsequent commits on this branch will implement the #255 Phase 0 infra
prerequisites: IProjectionWriteDispatcher.DeleteAsync, Orleans Streams
persistent provider validation harness, and secret manager scaffold /
ICredentialProvider interface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaces the tombstone capability promised by Channel RFC §17.1 so the
future PerEntryDocumentProjector (§7.1) can remove orphaned entries
without hacking around IProjectionWriteDispatcher's upsert-only surface.
Existing projectors keep their current upsert-only behavior; the
projector comments now point at the §7.1 refactor as the wiring point.

- Interfaces: IProjectionWriteDispatcher<T> / IProjectionWriteSink<T> /
  IProjectionDocumentWriter<T> each gain
  Task<ProjectionWriteResult> DeleteAsync(string id, CancellationToken)
- ProjectionStoreDispatcher / ProjectionDocumentStoreBinding delegate
  to the bound writer
- InMemoryProjectionDocumentStore removes the key under its gate and
  reports Applied / Duplicate based on pre-delete presence
- ElasticsearchProjectionDocumentStore issues DELETE /index/_doc/{id},
  parses the `result` field to distinguish deleted vs not_found, and
  reuses existing missing-index handling

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Channel RFC §9.5.6 pins durability on the Orleans persistent stream
provider redelivery contract: OnNextAsync return advances the
subscription checkpoint, throw must not. Before the RFC commits to
EventHubs as the §17 Phase 0 default, that contract must be verified
empirically rather than inferred from vendor docs.

Adds two [KafkaGarnetIntegrationFact] tests against the aevatar Kafka
persistent-provider integration infra (same checkpoint-commit contract
as EventHubs): one asserts no redelivery on clean return, the other
asserts at-least-one redelivery when the agent throws and the envelope
carries Dispatch.PropagateFailure=true. The same harness, re-run
against an EventHubs stream backend (StreamBackend = EventHubs +
namespace connection string), is the EventHubs validation step called
for in the RFC — result captured in the Phase 0 PR body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 86.40000% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.57%. Comparing base (017b789) to head (d462ba7).
⚠️ Report is 10 commits behind head on dev.

Files with missing lines Patch % Lines
...InMemory/Stores/InMemoryProjectionDocumentStore.cs 61.76% 13 Missing ⚠️
...rch/Stores/ElasticsearchProjectionDocumentStore.cs 95.06% 2 Missing and 2 partials ⚠️
@@            Coverage Diff             @@
##              dev     #267      +/-   ##
==========================================
+ Coverage   68.54%   68.57%   +0.02%     
==========================================
  Files        1109     1109              
  Lines       77905    78030     +125     
  Branches    10204    10213       +9     
==========================================
+ Hits        53401    53506     +105     
- Misses      20599    20615      +16     
- Partials     3905     3909       +4     
Flag Coverage Δ
ci 68.57% <86.40%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
....Runtime/Runtime/ProjectionDocumentStoreBinding.cs 89.47% <100.00%> (+3.75%) ⬆️
...ction.Runtime/Runtime/ProjectionStoreDispatcher.cs 100.00% <100.00%> (ø)
...rch/Stores/ElasticsearchProjectionDocumentStore.cs 76.52% <95.06%> (+4.14%) ⬆️
...InMemory/Stores/InMemoryProjectionDocumentStore.cs 29.66% <61.76%> (+3.41%) ⬆️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

eanzhao and others added 6 commits April 20, 2026 19:16
Channel RFC §17.3 requires raw secrets to leave persisted state and
command envelopes, replaced by an opaque credential_ref that the
provider boundary materializes at use-time. Before the main Channel RFC
implementation can start migrating Lark encrypt_key and nyx_user_token
off raw persistence, the framework-level contract and the proto field
itself must exist.

- Adds Aevatar.Foundation.Abstractions.Credentials.ICredentialProvider:
  Task<string?> ResolveAsync(string credentialRef, CancellationToken ct).
  Returns null when the reference is unknown so boundary adapters can
  fall through to legacy raw fields during migration.
- Adds credential_ref to ChannelBotRegistrationEntry (10),
  ChannelBotRegisterCommand (9), and ChannelBotRegistrationDocument
  (13). Existing encrypt_key / nyx_user_token fields remain during the
  migration window; the main Channel RFC PR migrates callers and drops
  the raw fields.
- Tests: contract coverage (hit + miss) via a FakeCredentialProvider
  test double demonstrating the interface is implementable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…elivery harness

Codex review flagged two P1s on the Phase 0 scaffold:

1. credential_ref was silently dropped between command and state. If a
   Phase 1 caller set only credential_ref, the raw encrypt_key path
   would still see empty and callbacks would fail mysteriously. Fix:
   carry the field through ChannelBotRegisterCommand → entry →
   projector → document. Adapter-side resolution still deferred to the
   main Channel RFC (§17.3 "within this RFC"), but the persistence
   plumbing is no longer a silent hole.

2. Task.Delay(NoRedeliveryQuietPeriod) in the redelivery harness
   tripped tools/ci/test_stability_guards.sh. The delay is load-bearing:
   the test verifies the ABSENCE of redelivery on the return path, and
   "no event fired for N seconds" has no deterministic signal. Added
   the file to tools/ci/test_polling_allowlist.txt with rationale.

Tests added:
- ChannelBotRegistrationStoreTests.HandleRegister_PersistsCredentialRef
- ChannelBotRegistrationProjectorTests.ProjectAsync_PropagatesCredentialRef

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to codex review. Prior commit plumbed credential_ref through
command → entry → event → state → projector → document but the query
port's ToEntry() still dropped the field, so a Phase 1 adapter reading
registration.CredentialRef would see empty.

- ChannelBotRegistrationQueryPort.ToEntry() now copies the field.
- RegistrationQueryPortTests gets BotQueryPort_GetAsync_PropagatesCredentialRef.
- channel_runtime_messages.proto comment spells out Phase 0 state (state
  and query port round-trip; no host surface exposure, no adapter
  consumption) vs Phase 1 (main Channel RFC: host surfaces accept it,
  LarkPlatformAdapter resolves via ICredentialProvider, raw encrypt_key
  becomes optional then removed).

Host-surface exposure (ChannelCallbackEndpoints / ChannelRegistrationTool)
is deliberately NOT added here — exposing a field that the adapter doesn't
consume yet would let callers register a "credential_ref-only" bot whose
callbacks silently fail. That surface lands with the adapter-side
resolution in the main Channel RFC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ores

codecov/patch went red at 8.92% because the two concrete store
implementations of IProjectionDocumentWriter.DeleteAsync had no direct
unit coverage — ProjectionStoreDispatcher / ProjectionDocumentStoreBinding
were covered, but the 130 lines added in InMemoryProjectionDocumentStore
and ElasticsearchProjectionDocumentStore were not.

New tests:
- InMemoryProjectionDocumentStoreBehaviorTests: exists→Applied,
  missing→Duplicate, blank-id throws, cancellation throws, key trim,
  idempotency across repeated calls.
- ElasticsearchProjectionDocumentStoreBehaviorTests.DeleteAsync_*:
  HTTP DELETE issues + "result":"deleted"→Applied vs "not_found"→
  Duplicate parsing, 404 index_not_found under Throw vs WarnAndReturnEmpty
  behaviors, dynamic index scope guard, blank-id guard, malformed JSON
  body falls back to Applied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor Author

eanzhao commented Apr 21, 2026

Follow-up fixes from review have been addressed in commit d462ba7.

What was fixed:

  • ElasticsearchProjectionDocumentStore.DeleteAsync no longer bootstraps index creation on delete path (avoids side effects from AutoCreateIndex=true).
  • Added regression coverage for delete semantics (including "no index bootstrap before delete").
  • Renamed and clarified persistent-provider redelivery harness scope to Kafka-only (KafkaPersistentStreamProviderRedeliveryValidationTests), matching the stream backends currently present in-repo.
  • Moved RFC document to canonical docs location (docs/canon/aevatar-channel-architecture.md) and added required frontmatter fields (status, owner).

Validation run locally:

  • dotnet test test/Aevatar.CQRS.Projection.Core.Tests/Aevatar.CQRS.Projection.Core.Tests.csproj --filter ElasticsearchProjectionDocumentStoreBehaviorTests --nologo
  • dotnet test test/Aevatar.Foundation.Runtime.Hosting.Tests/Aevatar.Foundation.Runtime.Hosting.Tests.csproj --filter KafkaPersistentStreamProviderRedeliveryValidationTests --nologo (env-gated tests skipped without Kafka/Garnet vars)
  • bash tools/ci/test_stability_guards.sh
  • bash tools/docs/lint.sh

CI status:

  • GitHub Actions ci run #1112: success.

@eanzhao eanzhao merged commit ec09dcc into dev Apr 21, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Channel RFC] Phase 0 infra prerequisites (DeleteAsync / EventHubs harness / secret manager)

1 participant