Background
Discussion seeded by #498 (Adopt AgentKind + Kind Registry for runtime actor
identity). #498 covers the identity layer — kind ≠ CLR type. This issue
covers the state / event layer: when a refactor changes business model
shape (split, merge, re-key, schema upgrade), what is the prescribed pattern,
and what infrastructure does each pattern need?
The trigger was a proposal to ship a generic
IActorMigration { Migrate(old) → new } interface that runs lazily on
OnActivateAsync. That pattern is partially correct: it is the right
answer for narrow within-actor upgrades and the wrong answer for cross-actor
splits / merges / re-keying. We need the matrix written down before we start
adding interfaces.
Decision matrix
[Matrix table; only the following cell fragments are recoverable, listed in
their original order.]
- ChannelRuntime.X → Scheduled.X
- [LegacyAgentKind]
- reserved 8,10,11,12 on a state proto
- [LegacyProtoFullName]
- IActorStateMigration
- SkillRunner → SkillDefinition + SkillExecution
- skill-runner-{user}-{name} → skill-definition-{team}-{name}
- IActorRedirectSpec
- UserUpdated semantics shift over time
Where lazy on-activation migration is right (narrow, real)
IActorStateMigration is the right tool when the migration is fully within
one actor's boundary:
- […] field's algorithm changed).
These work because:
- […] state_schema_version field carried in the runtime envelope
  (RuntimeActorIdentity.state_schema_version from #498).
Doctrinal test for "is this a lazy-migration case": the new state can be
derived from the old state alone, in pure code, without re-reading any
events. If you need events, it's a projection rebuild, not a lazy migration.
Proposed contract (sketch — not landing now)
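Run from RuntimeActorGrain.OnActivateAsync after state load:
1. Read Identity.state_schema_version from the runtime envelope.
2. While there is a registered migration with FromStateVersion == current,
   apply it; advance.
3. Persist the new state with the new state_schema_version before
   processing any command.
4. If migration throws, fail activation explicitly; do not swallow.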
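
The activation steps above can be sketched in a few lines. This is a language-neutral illustration in Python, not the deferred C# interface: the decorator registry, Envelope, ActivationFailed, target_version, and the SkillDefinitionState fields (timeout_seconds, timeout_ms, retries) are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

MigrationFn = Callable[[dict], dict]

# Hypothetical registry: (state type name, from_version) -> (to_version, fn).
_REGISTRY: Dict[Tuple[str, int], Tuple[int, MigrationFn]] = {}


class ActivationFailed(Exception):
    """Activation is refused rather than serving commands on stale state."""


def state_migration(state_type: str, from_version: int, to_version: int):
    """Register one pure step of the chain; version skipping is rejected."""
    if to_version != from_version + 1:
        raise ValueError("migrations must form a chain (v1->v2, v2->v3)")

    def register(fn: MigrationFn) -> MigrationFn:
        _REGISTRY[(state_type, from_version)] = (to_version, fn)
        return fn

    return register


@dataclass
class Envelope:
    state_type: str
    state_schema_version: int  # carried on the envelope, not inside the state
    state: dict


def activate(envelope: Envelope, target_version: int, persist) -> Envelope:
    """Run the chain after state load and before any command is processed."""
    version, state = envelope.state_schema_version, envelope.state
    while version < target_version:
        step = _REGISTRY.get((envelope.state_type, version))
        if step is None:
            raise ActivationFailed(f"no migration registered from v{version}")
        to_version, migrate = step
        try:
            state = migrate(dict(state))  # each step works on a top-level copy
        except Exception as exc:
            raise ActivationFailed(f"v{version}->v{to_version} threw") from exc
        version = to_version
    if version == envelope.state_schema_version:
        return envelope  # already current, nothing to persist
    migrated = Envelope(envelope.state_type, version, state)
    persist(migrated)  # persisted once, with the new version
    return migrated


@state_migration("SkillDefinitionState", 1, 2)
def seconds_to_millis(old: dict) -> dict:
    new = {k: v for k, v in old.items() if k != "timeout_seconds"}
    new["timeout_ms"] = old["timeout_seconds"] * 1000
    return new


@state_migration("SkillDefinitionState", 2, 3)
def add_retries(old: dict) -> dict:
    return {**old, "retries": 0}


saved = []
loaded = Envelope("SkillDefinitionState", 1, {"name": "demo", "timeout_seconds": 3})
result = activate(loaded, target_version=3, persist=saved.append)
print(result.state_schema_version, result.state)
```

Both migration steps read only the old state, which is the doctrinal test in miniature. A stale version with no registered step raises instead of activating.
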
Constraints (locked at the contract level, enforced by CI guard):
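- Pure function of input state. No I/O, no other-actor calls, no
  random / time-dependent inputs.
- Idempotent: applying twice must yield the same result.
- Total: must not throw on any well-formed historical state.
- Migrations form a chain (v1→v2, v2→v3); skipping is forbidden.
- Zero-dependency constructor: implementations may not depend on
  IServiceProvider, any IClient*, any *Async* service, ITimeService,
  IRandom, or anything that performs I/O. CI guard scans constructor
  parameters of IActorStateMigration implementations and fails the build
  on violations. This is the structural defense against drift toward a
  "general-purpose data transformation framework".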
state_schema_version placement (resolved)
Resolved via companion ADR actor-state-version-placement (co-issued with
#498):
- The version lives on the runtime envelope
  (RuntimeActorIdentity.state_schema_version per #498), not on business
  state protos.
- Business state protos remain pure domain artifacts. Migration concern does
  not leak into them.
- Migration registration keys on (state_proto_descriptor, from, to);
  runtime reads version from the envelope.
YAGNI: the interface is deferred
Lazy-migration applies to exactly two row types in the matrix and there is
no concrete case driving either today. Per CLAUDE.md ("Don't design for
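hypothetical future requirements" / "once an abstraction can be abused, the
design is unfinished"):
- This issue ships doctrine + matrix + ADRs, not the interface.
- IActorStateMigration<TState> is sketched here for future reference.
- The first real within-actor migration case implements the interface
  alongside its concrete migration. Until then, no empty foundation.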
This avoids the slippery slope toward a "general-purpose data transformation
framework" (the explicit non-goal below).
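
The zero-dependency constructor guard named in the constraints is what keeps the interface from sliding down that slope. As a rough illustration of such a scan, here is a Python sketch that inspects constructor parameter types against a deny-list. The pattern list is an assumption derived from the constraint wording, and the three classes are hypothetical stand-ins; the real guard would scan IActorStateMigration implementations in the C# build.

```python
import inspect
import re

# Hypothetical deny-list derived from the constraint text.
FORBIDDEN_TYPE_PATTERNS = [
    r"^IServiceProvider$",
    r"^IClient",
    r"Async",
    r"^ITimeService$",
    r"^IRandom$",
]


def constructor_violations(cls) -> list:
    """List constructor parameters whose annotated type name is forbidden."""
    violations = []
    for param in inspect.signature(cls).parameters.values():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            continue  # unannotated parameters are not checked in this sketch
        if isinstance(annotation, str):
            type_name = annotation
        else:
            type_name = getattr(annotation, "__name__", repr(annotation))
        if any(re.search(p, type_name) for p in FORBIDDEN_TYPE_PATTERNS):
            violations.append(f"{cls.__name__}({param.name}: {type_name})")
    return violations


class ITimeService:  # stand-in for a time-dependent service
    pass


class SecondsToMillis:  # pure: only plain values in the constructor
    def __init__(self, factor: int = 1000):
        self.factor = factor


class StampWithNow:  # impure: depends on wall-clock time
    def __init__(self, clock: ITimeService):
        self.clock = clock


failures = [
    violation
    for cls in (SecondsToMillis, StampWithNow)
    for violation in constructor_violations(cls)
]
print(failures)
```

A CI script built on this idea would exit non-zero whenever the failures list is non-empty.
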
Where lazy migration is wrong — use the projection pipeline
The lazy on-activation interface cannot safely support:
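- Actor split (one → many): actor A would have to spawn / initialize A''
  during its own activation. A would be mutating another actor's
  authoritative state, violating the "single source of truth" rule.
- Actor merge (many → one): needs reads across multiple actor streams
  during a single activation, outside any one actor's boundary.
- Identity re-keying: requires global awareness that the same business
  fact moved key; not solvable from inside one activation.
- Mixed-version safety: migration that mutates state on activation breaks
  pods running older code that still expect the pre-migration shape.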
For all of these, the architecturally correct path is projection-pipeline
driven migration, using infrastructure that already exists plus one new
capability (bootstrap-from-projection — see below):
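1. A's committed events are already in the projection main pipeline
   (per docs/canon/event-sourcing.md: "committed domain events must be
   observable").
2. Stand up new projection consumers for A' / A'' that consume A's committed
   events and materialize A' / A'' state into a dedicated readmodel.
3. New actor A' / A'' bootstraps its initial state from that readmodel via
   IActorBootstrapPort (one-time import on first activation), then becomes
   authoritative.
4. Write commands progressively cut over from A to A' / A''. A keeps running
   as source-of-truth during the transition window.
5. […] AgentKind from #498) once read paths are migrated.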
This is the "Strangler Fig" pattern at the actor level. It is gradual,
distributed-safe, replayable, and reversible.
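
The consumer and bootstrap steps above can be sketched as follows. This is a minimal in-memory illustration using the issue's hypothetical Foo → FooConfig split. The event TypeUrls, the class names, and the idea that the readmodel object itself plays the bootstrap-port role are assumptions for the example; the issue does not define the shape of IActorBootstrapPort.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CommittedEvent:
    type_url: str
    actor_id: str
    payload: dict


class FooConfigReadModel:
    """Dedicated readmodel: A's committed events folded into A' state."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def apply(self, event: CommittedEvent) -> None:
        # Only events belonging to the "config" half of Foo are folded.
        if event.type_url == "demo/FooConfigured":
            self._rows.setdefault(event.actor_id, {}).update(event.payload)

    def read(self, source_actor_id: str) -> Optional[dict]:
        row = self._rows.get(source_actor_id)
        return None if row is None else dict(row)


class FooConfigActor:
    """New actor A': imports once from the readmodel, then owns its state."""

    def __init__(self, source_actor_id: str, bootstrap_port: FooConfigReadModel) -> None:
        self._source_actor_id = source_actor_id
        self._bootstrap_port = bootstrap_port
        self.state: Optional[dict] = None  # stands in for the persisted state slot

    def activate(self) -> dict:
        if self.state is None:  # first activation: one-time import
            self.state = self._bootstrap_port.read(self._source_actor_id) or {}
        return self.state


readmodel = FooConfigReadModel()
for event in [
    CommittedEvent("demo/FooConfigured", "foo-1", {"region": "eu"}),
    CommittedEvent("demo/FooRunStarted", "foo-1", {"run": 1}),
    CommittedEvent("demo/FooConfigured", "foo-1", {"tier": "gold"}),
]:
    readmodel.apply(event)

actor = FooConfigActor("foo-1", readmodel)
first = dict(actor.activate())

# After bootstrap the actor is authoritative: later projection updates
# no longer flow into its state.
readmodel.apply(CommittedEvent("demo/FooConfigured", "foo-1", {"tier": "basic"}))
second = actor.activate()
print(first, second)
```

Because the readmodel is derived purely from A's committed events, it can be rebuilt by replay, and A stays untouched until the write cut-over.
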
Missing infrastructure: bootstrap-from-projection
The split / merge cookbook glosses over "stand up new projection consumers
for A' and A''", but RuntimeActorGrain today only initializes from its own
persisted state slot (AgentStateSnapshot). There is no contract for a new
actor to bootstrap from projected state derived from another actor's
events. Without this, the strangler-fig pattern at the actor level cannot
work end-to-end.
This bootstrap contract is filed as a separate prerequisite issue. The
split / merge / re-key cookbook in this issue remains documentation-only
until that issue lands. Operational deliverables (worked example, CI gates
that depend on the cookbook) wait on it.
Re-keying: separate spec, not extension of retired-actor spec
IRetiredActorSpec retires a kind ("this kind no longer exists"). Re-keying
preserves the kind but moves the actor id. Different semantics; conflating
them pollutes the #495 contract.
Re-keying gets its own spec, IActorRedirectSpec. It uses the same
hosted-service entrypoint as IRetiredActorSpec, is executed once at
startup, and is idempotent. It will be filed as a separate issue when the
first concrete re-keying case arrives.
Doctrine: events are append-only, semantics are immutable
The matrix's "event semantic change" row is doctrine, not infrastructure.
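Concrete rules (recorded in ADR event-immutability-policy):
- A committed event's TypeUrl pins its semantics forever.
- New semantics → new event type with a new TypeUrl. Old type stays for
  history; projectors handle both during the transition window.
- Adding optional fields to an event proto is permitted (proto3 evolution
  rules). Adding fields whose absence implies a different semantic is
  forbidden: that is a semantic change, not a shape change.
- Backfilling history by replaying old events under new semantic assumptions
  is forbidden in normal operation. If a projection has to be rebuilt, it
  rebuilds under the original semantics of each event.
This row exists in the matrix because the most common silent failure mode
in event-sourced systems is "we tweaked what UserUpdated means"; the
matrix should refuse that path explicitly.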
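
To make the "new semantics → new TypeUrl" rule concrete, here is a small Python sketch of a projector that handles both the old and the new event type during a transition window. The TypeUrls, the UserProfileReplaced event, and the patch-versus-replace semantics are invented for illustration; only the UserUpdated name comes from the text above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical TypeUrls. The old one keeps its original meaning forever.
USER_UPDATED = "type.example.com/demo.UserUpdated"
USER_PROFILE_REPLACED = "type.example.com/demo.UserProfileReplaced"


def apply_user_updated(view: dict, payload: dict) -> None:
    # Original semantics (assumed): the payload is a partial patch.
    view.update(payload)


def apply_profile_replaced(view: dict, payload: dict) -> None:
    # New semantics (assumed): the payload replaces the whole profile.
    view.clear()
    view.update(payload)


HANDLERS: Dict[str, Callable[[dict, dict], None]] = {
    USER_UPDATED: apply_user_updated,
    USER_PROFILE_REPLACED: apply_profile_replaced,
}


def project(events: List[Tuple[str, dict]]) -> dict:
    """Rebuild the view, applying each event under its own TypeUrl's semantics."""
    view: dict = {}
    for type_url, payload in events:
        handler = HANDLERS.get(type_url)
        if handler is None:
            raise KeyError(f"no projector registered for {type_url}")
        handler(view, payload)
    return view


history = [
    (USER_UPDATED, {"name": "Ada", "email": "ada@example.com"}),
    (USER_UPDATED, {"email": "ada@new.example.com"}),
    (USER_PROFILE_REPLACED, {"name": "Ada L."}),
]
view = project(history)
print(view)
```

A rebuild replays the same history through the same handlers, so old UserUpdated events are never reinterpreted under the replacement semantics.
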
Open design questions (resolved or deferred)
- Where does state_schema_version live? Resolved: runtime envelope
  (RuntimeActorIdentity.state_schema_version), not business state proto.
  See companion ADR.
- Failure mode when no migration is registered for a stale
  state_schema_version: Resolved: fail activation hard. Silent data
  drift is worse than a visible startup error.
- Re-keying mechanism: Resolved: separate IActorRedirectSpec,
  separate issue. Not a flavor of retire and not a flavor of state migration.
- Migration registration shape: Deferred until first concrete case.
  Default position: attribute-based
  ([StateMigration(typeof(SkillDefinitionState), from: 1, to: 2)]) for
  discoverability; DI registration acceptable for tests.
- Projection-driven split protocol: Cookbook documented in
  docs/canon/projection-driven-actor-split.md, but the operational
  deliverable blocks on the bootstrap-from-projection issue.
Deliverables
- docs/canon/actor-model-evolution.md: the matrix above plus a
  one-paragraph example for each cell. Ship first.
- ADR docs/adr/NNNN-actor-evolution-pattern-decision-matrix.md: the
  decision matrix locked; supersedes any ad-hoc framing.
- ADR docs/adr/NNNN-actor-state-version-placement.md: co-issued with #498;
  locks placement on runtime envelope.
- ADR docs/adr/NNNN-event-immutability-policy.md: events append-only,
  semantics immutable; new TypeUrl on semantic change.
- CI / review skill check: every refactor PR that deletes / renames /
  moves an actor type or *State proto must declare in the description
  which row of the matrix it falls under.
- docs/canon/projection-driven-actor-split.md: split / merge cookbook
  with one worked example end-to-end (write commands cut-over phases,
  retire timing, projection consumer wiring, bootstrap port import).
  Blocks on bootstrap-from-projection issue landing.
- IActorStateMigration<TState> interface: deferred to first real
  case. Sketch retained in this issue for reference; CI guard
  (zero-dependency constructor) lands together with the interface, not
  before.
- Worked split-cookbook exemplar (e.g., a hypothetical Foo → FooConfig +
  FooRun). Blocks on bootstrap port.
Relationship to other issues
- […] state_schema_version lives in the RuntimeActorIdentity sub-message
  landed in #498 Phase 1. Cross-assembly rename and identity-only refactors
  collapse to kind-alias and never touch this issue.
- […] Retire half of any split / merge. Re-keying is not covered by #495
  (Fix retired ChannelRuntime startup cleanup); it gets a separate
  IActorRedirectSpec spec.
- […] SkillRunner split): currently blocks on #498. Its split is not a
  state-migration case: SkillExecutionGAgent is brand-new and
  session-scoped; no historical execution data needs migration. If we later
  want to back-fill historical execution actors from SkillRunner event
  history, that becomes the worked example for the split cookbook (and
  exercises bootstrap-from-projection).
- (new) Bootstrap-from-projection contract: prerequisite for the split /
  merge / re-key rows of this matrix to be operational. To be filed
  separately.
- (new) IActorRedirectSpec: prerequisite for the re-keying row. To be
  filed separately when first concrete case appears.
Non-goals
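- A general-purpose "data transformation framework". The lazy migration
  interface is intentionally narrow; purity is enforced by CI guard on
  constructor dependencies.
- Replacing proto3 / [LegacyProtoFullName] for payload codec compatibility.
  Those layers stay untouched.
- Online schema migration tooling for state stores other than Aevatar's own
  event store + actor state.
- Mutating event semantics in place. Always new TypeUrl; old type retained
  for history.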