Refactor workflow architecture to separate definition and run actors #29

Closed
loning wants to merge 24 commits into chore/actor-runtime-stream-async-blueprint-20260305 from refactor/workflow-actorized-run-persistent-state

Conversation

loning (Contributor) commented Mar 6, 2026

  • Introduced `WorkflowRunGAgent` to manage the state and execution of individual workflow runs, distinct from the `WorkflowGAgent` which now solely handles workflow definitions and bindings.
  • Updated documentation to reflect the new architecture, emphasizing the separation of concerns between definition and run actors.
  • Enhanced event handling to ensure that all run-related facts are persisted within `WorkflowRunState`, improving reliability and recovery during reactivations.
  • Modified existing message types to include new fields for better tracking of run states and events, such as `resume_token` and `wait_token`.
  • Adjusted application services and query interfaces to accommodate the new actor model, ensuring compatibility with the updated workflow execution flow.
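
For readers unfamiliar with the split, here is a minimal, language-neutral sketch of the idea in Python (the PR itself is .NET/Orleans code, and none of this is taken from it). `DefinitionActor`, `RunActor`, `RunState`, the dict-based store, and the `wait:` step prefix are all hypothetical names chosen for illustration. Only `wait_token` echoes a field named in the description, and its exact semantics here are an assumption.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class RunState:
    """Hypothetical stand-in for the per-run facts that get persisted."""
    run_id: str
    workflow_name: str
    steps: List[str]
    next_step: int = 0
    wait_token: Optional[str] = None  # assumed: set while the run is suspended
    completed: bool = False


class DefinitionActor:
    """Holds only the workflow definition; it never tracks run progress."""

    def __init__(self, name: str, steps: List[str]) -> None:
        self.name = name
        self.steps = list(steps)

    def start_run(self, store: Dict[str, str]) -> "RunActor":
        state = RunState(uuid.uuid4().hex, self.name, list(self.steps))
        run = RunActor(state, store)
        run._persist()
        return run


class RunActor:
    """Owns one run and writes every state change to the store."""

    def __init__(self, state: RunState, store: Dict[str, str]) -> None:
        self.state = state
        self._store = store

    @classmethod
    def activate(cls, store: Dict[str, str], run_id: str) -> "RunActor":
        # Reactivation rebuilds the actor purely from persisted facts.
        return cls(RunState(**json.loads(store[run_id])), store)

    def _persist(self) -> None:
        self._store[self.state.run_id] = json.dumps(asdict(self.state))

    def advance(self) -> Optional[str]:
        """Run steps until one must wait; return its token, or None when done."""
        s = self.state
        while s.next_step < len(s.steps):
            if s.steps[s.next_step].startswith("wait:"):
                s.wait_token = uuid.uuid4().hex
                self._persist()
                return s.wait_token
            s.next_step += 1
        s.completed = True
        self._persist()
        return None

    def resume(self, token: str) -> bool:
        """Continue a suspended run only if the token matches the persisted one."""
        s = self.state
        if s.wait_token is None or token != s.wait_token:
            return False
        s.wait_token = None
        s.next_step += 1
        self.advance()
        return True
```

In this sketch a run that suspends on a `wait:` step can be dropped from memory and rebuilt with `RunActor.activate(store, run_id)`. Because the token lives in the persisted state, a resume after reactivation still validates against it.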

loning added 24 commits March 7, 2026 05:28
- Introduced `WorkflowRunGAgent` to manage the state and execution of individual workflow runs, distinct from the `WorkflowGAgent` which now solely handles workflow definitions and bindings.
- Updated documentation to reflect the new architecture, emphasizing the separation of concerns between definition and run actors.
- Enhanced event handling to ensure that all run-related facts are persisted within `WorkflowRunState`, improving reliability and recovery during reactivations.
- Modified existing message types to include new fields for better tracking of run states and events, such as `resume_token` and `wait_token`.
- Adjusted application services and query interfaces to accommodate the new actor model, ensuring compatibility with the updated workflow execution flow.
- Deleted the `maker_recursive` module from the workflow, transitioning to a simplified architecture that retains only the `maker_vote` primitive.
- Updated the `README.md` to reflect the current state of the demo, emphasizing that the recursive version is now archived and no longer operational.
- Adjusted the `maker_analysis.yaml` to serve as an archived placeholder, removing references to the recursive workflow steps.
- Refactored the `MakerRunProjectionAccumulator` to eliminate checks and logging related to the removed recursive steps.
- Enhanced documentation across various files to clarify the new workflow structure and the implications of the changes.
- Deleted the entire `Aevatar.Demos.CaseProjection` module, including all associated files and projects, to streamline the solution.
- Updated the solution file to remove references to the deleted projects.
- Revised the README documentation to reflect the removal of the Case Projection demo and its components.
- Ensured that all related dependencies and configurations were cleaned up to maintain a tidy project structure.
- Revised the README.md in the Workflow Primitives Demo to clarify the focus on built-in primitives and role routing capabilities.
- Updated references from "Role Event Modules" to "Role Routed Extensions" across various files for consistency.
- Enhanced the Scripting Architecture documentation to reflect changes in the core actor structure, including the introduction of the `ScriptEvolutionSessionGAgent`.
- Adjusted the Workflow Capability API documentation to accurately describe the new endpoint structure and functionality.
- Improved clarity in the Workflow Application README regarding the in-memory definition catalog and its purpose.
- Ensured all changes align with the updated architecture and improve overall documentation coherence.
…cumentation

- Introduced a new document outlining the proposed architecture for phase-4 core decomposition and final cleanup, detailing the current state and remaining structural issues.
- Highlighted key architectural decisions, including the need to narrow down the `WorkflowRunGAgent` and eliminate unnecessary index actors.
- Provided a comprehensive overview of the target architecture, emphasizing the separation of concerns and the minimization of persisted facts.
- Documented the goals and structural changes necessary to achieve a more stable and efficient workflow runtime environment.
- Ensured alignment with previous architectural revisions and established clear acceptance criteria for future refactoring efforts.
…fficiency

- Updated workflow classification to reflect changes in human interaction grouping, renaming "human-interaction-legacy" to "human-interaction-auto" for consistency.
- Removed the `ScriptEvolutionManagerGAgent` and associated fallback mechanisms to streamline the script evolution process, consolidating responsibilities within the `ScriptEvolutionSessionGAgent`.
- Eliminated legacy timeout callback handling and unnecessary state fields, enhancing the clarity of the script definition query states.
- Revised documentation to align with architectural changes, ensuring accurate representation of the current system structure and functionality.
- Improved overall code quality by removing deprecated components and refining event handling across various modules.
…grade feature

- Clarified the purpose and functionality of the mixed-version rolling upgrade feature in the architecture documentation, emphasizing its role in production environments.
- Added detailed comments in the `CompatibilityFailureInjectionPolicy` and `RuntimeActorGrain` classes to explain the test-only fault injection mechanism and its relation to the production feature.
- Updated the smoke test script to validate the production mixed-version feature by enabling compatibility failure injection for testing purposes.
- Ensured that all changes align with the goal of maintaining service availability during rolling upgrades while old and new binaries coexist.
…ntation

- Introduced a comprehensive document outlining the proposed architecture for separating production capabilities from testing validation in the Orleans mixed-version context.
- Defined the scope, background, and current issues related to the integration of production and testing paths, emphasizing the need for clear boundaries.
- Established core architectural decisions and goals, including the introduction of typed options for production configurations and the abstraction of testing mechanisms.
- Provided detailed implementation design and acceptance criteria to guide future development efforts, ensuring alignment with the overall architectural vision.
- Added `AddInMemoryWorkflowDefinitionCatalog()` to the service configuration for in-memory workflow definitions, improving flexibility for development and testing environments.
- Updated documentation to clarify the assembly rules for the definition catalog, emphasizing the need for explicit registration in development and production scenarios.
- Introduced a new architectural document detailing the runtime phase-5 changes, focusing on the core decomposition and the removal of the `RunManager` to streamline run lifecycle management.
- Removed the `RunManager` interface and its implementation, consolidating run context management responsibilities to improve code clarity and reduce complexity.
- Enhanced the `RuntimeActorGrain` with a new envelope processing pipeline, incorporating compatibility checks, deduplication, and forwarding guards to ensure robust event handling.
- Replaced `IConnectorRegistry` with `IConnectorCatalog` to improve the immutability and clarity of connector management.
- Introduced `AddConfiguredConnectorCatalog()` for streamlined connector configuration during application startup.
- Updated documentation to reflect changes in connector handling, emphasizing the transition from mutable to immutable catalog structures.
- Removed the `ConnectorBootstrapHostedService` to simplify the service registration process and enhance clarity in connector loading.
- Enhanced architectural documentation to outline the new connector management approach and its implications for workflow execution.
…documentation

- Introduced a comprehensive document outlining the proposed architecture for the retirement of the event module and the transition to a thin owner model in the workflow system.
- Defined the scope, best practices, and current issues related to the existing architecture, emphasizing the need for a clear separation of concerns and the elimination of outdated components.
- Established core architectural goals and constraints for phase-7, focusing on the reduction of complexity and the enhancement of system clarity.
- Documented the target architecture and the necessary changes to achieve a streamlined workflow runtime environment, ensuring alignment with previous architectural revisions.
- Removed deprecated role-level event modules in favor of explicit workflow steps, enhancing clarity and maintainability.
- Introduced new YAML workflows that replace old role-event-module configurations with deterministic step definitions.
- Updated documentation to reflect the transition from role-based routing to explicit workflow definitions, ensuring consistency across examples.
- Enhanced the README to clarify the focus on built-in workflow primitives and explicit orchestration patterns.
…n handling

- Removed the `dispatchToSelfAsync` function from `OrleansGrainEventPublisher` to streamline event dispatching.
- Updated the event publishing logic to directly use the stream provider for self-dispatching.
- Enhanced the `ScriptEvolutionSessionGAgent` to include detailed logging and improved error handling during script evolution sessions.
- Introduced request and reply identifiers in various script-related requests to facilitate better tracking and response management.
- Added new event types and updated existing ones in the protocol buffer definitions to support enhanced functionality in script evolution and definition management.
…ack refactor blueprint documentation

- Introduced a new document outlining the proposed architecture for the unified execution kernel and its interaction with workflow orchestration and scripting capabilities.
- Defined the scope, core principles, and best practices for integrating scripting as a dynamic capability implementation layer alongside traditional workflow orchestration.
- Clarified the roles of `workflow` and `scripting`, emphasizing their coexistence without becoming parallel runtime systems.
- Documented current issues and architectural decisions to guide future development and ensure alignment with overarching system goals.
- Updated the documentation to reflect the transition of the scripting architecture to version 9, clarifying the roles of scripting and workflow as distinct yet complementary components.
- Enhanced the `ScriptEvolutionSessionGAgent` to streamline command acknowledgment and snapshot querying, improving the handling of script evolution sessions.
- Introduced new event types and refined existing ones in the protocol buffer definitions to support enhanced functionality in script execution and evolution.
- Removed deprecated decision query handling to simplify the architecture and improve clarity in the script evolution process.
- Added new query application services and updated existing interfaces to facilitate better access to script evolution and execution snapshots.
- Improved overall code quality by consolidating responsibilities and enhancing event handling across various modules.
- Updated documentation to reflect the addition of `WorkflowRunStepRequestFactory` and `WorkflowRunSupport` classes, which encapsulate step request construction and various helper functions.
- Refactored existing code to utilize the new support classes, improving clarity and reducing complexity in the `WorkflowRunGAgent`.
- Renamed several methods and variables for consistency and clarity, particularly in relation to the handling of workflow steps and execution states.
- Documented the transition from deprecated methods to the new architecture, ensuring alignment with the overall workflow system goals.
- Introduced new runtime classes: `WorkflowRunAIRuntime`, `WorkflowRunAsyncPolicyRuntime`, `WorkflowRunCallbackRuntime`, `WorkflowRunCompositionRuntime`, `WorkflowRunControlFlowRuntime`, and `WorkflowRunDispatchRuntime` to improve modularity and clarity in workflow execution.
- Updated documentation to reflect the restructuring of workflow components, emphasizing the separation of concerns and the new runtime architecture.
- Renamed and reorganized existing classes and methods to align with the new architecture, enhancing readability and maintainability.
- Documented the transition from previous implementations to the new runtime structure, ensuring clarity for future development and integration efforts.
- Introduced new runtime classes: `WorkflowRunFanOutRuntime`, `WorkflowRunSubWorkflowRuntime`, `WorkflowRunAggregationCompletionRuntime`, and `WorkflowRunProgressionCompletionRuntime` to improve modularity and clarity in workflow execution.
- Updated existing classes and methods to align with the new architecture, enhancing readability and maintainability.
- Refactored the `WorkflowRunGAgent` to integrate the new runtime components, ensuring a streamlined workflow execution process.
- Enhanced documentation to reflect the restructuring of workflow components and the introduction of new runtime functionalities, providing clarity for future development.
…management

- Introduced `WorkflowRunRuntimeContext` to centralize access to shared runtime state and effects across various runtime classes.
- Updated multiple runtime classes (`WorkflowRunAggregationCompletionRuntime`, `WorkflowRunAIRuntime`, `WorkflowRunAsyncPolicyRuntime`, `WorkflowRunCallbackRuntime`, `WorkflowRunControlFlowRuntime`, `WorkflowRunDispatchRuntime`) to utilize the new context for state management, improving clarity and reducing redundancy.
- Enhanced documentation to reflect the restructuring of workflow components and the integration of the new runtime context, ensuring better understanding for future development.
- Consolidated `WorkflowRunGAgent` into a thin owner model, reducing its direct responsibility for runtime fields and improving modularity.
- Introduced new runtime classes: `WorkflowRunAIResponseRuntime`, `WorkflowRunCacheRuntime`, and updated existing classes to streamline workflow execution and enhance clarity.
- Removed deprecated classes such as `WorkflowRunAIRuntime` and `WorkflowRunCallbackRuntime`, ensuring a cleaner architecture.
- Updated documentation to reflect the restructuring of workflow components, emphasizing the new runtime architecture and the integration of shared runtime context.
- Introduced new interfaces: `IWorkflowChildRunCompletionHandler`, `IWorkflowInternalSignalHandler`, `IWorkflowResponseHandler`, and `IWorkflowStatefulCompletionHandler` to improve modularity and clarity in handling workflow events and responses.
- Updated existing runtime classes to utilize the new handler interfaces, enhancing the separation of concerns and reducing direct dependencies on specific implementations.
- Consolidated the `WorkflowAsyncOperationReconciler` and other runtime classes to leverage the new registries for handling stateful completions, internal signals, and responses, streamlining the workflow execution process.
- Enhanced documentation to reflect the restructuring of workflow components and the introduction of new handler functionalities, providing clarity for future development.
- Introduced a comprehensive document outlining the proposed architecture for the capability-oriented refactor of Workflow.Core, addressing structural issues and design goals.
- Defined key objectives such as aggregating code by business capabilities, simplifying the owner model, and establishing a unified routing mechanism.
- Highlighted existing structural problems within the current implementation, including the fragmentation of business capabilities and excessive dependency exposure.
- Emphasized strict constraints and guidelines to ensure adherence to the new architectural principles, promoting modularity and clarity in future development.
- Updated the `WORKFLOW.md` to reflect the new capability-oriented model and removed references to the old `RuntimeSuite` structure.
- Consolidated the responsibilities of `WorkflowGAgent` and `WorkflowRunGAgent`, clarifying their roles in managing workflow definitions and run states.
- Deleted obsolete interfaces and classes, including `IWorkflowChildRunCompletionHandler`, `IWorkflowInternalSignalHandler`, and others, to streamline the codebase and enhance modularity.
- Enhanced documentation to provide a clearer understanding of the current architecture and its components, ensuring alignment with the new design principles.
…nd envelope dispatching

- Introduced `IActorEnvelopeDispatcher` and `IActorStateSnapshotReader` interfaces to define contracts for dispatching envelopes and reading actor state snapshots.
- Implemented `OrleansActorEnvelopeDispatcher` and `OrleansActorStateSnapshotReader` to provide Orleans-specific functionality for the new abstractions.
- Enhanced `GAgentBase` to implement `IAgentStateSnapshotSource`, allowing agents to expose their state snapshots.
- Updated `RuntimeStreamRequestReplyClient` to improve response handling and readiness checks.
- Added new tests for the state snapshot reader and runtime stream client to ensure functionality and reliability.
- Updated dependency injection to register new services and ensure proper integration within the Orleans runtime environment.
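
One of the commits above describes an envelope processing pipeline in `RuntimeActorGrain` that incorporates compatibility checks, deduplication, and forwarding guards. The following is a rough Python sketch of how such a staged pipeline could be ordered. It is not the grain's actual logic: `Envelope`, `EnvelopePipeline`, the version set, the hop limit, and the returned status strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class Envelope:
    """Hypothetical envelope shape; field names are illustrative."""
    envelope_id: str
    schema_version: int
    hops: int  # how many times the envelope has been forwarded so far
    payload: str


class EnvelopePipeline:
    """Applies three guards in order before handing off to the handler."""

    def __init__(
        self,
        handler: Callable[[Envelope], None],
        supported_versions: Set[int],
        max_hops: int = 8,
    ) -> None:
        self._handler = handler
        self._supported = set(supported_versions)
        self._max_hops = max_hops
        self._seen: Set[str] = set()

    def process(self, envelope: Envelope) -> str:
        # 1. Compatibility: reject envelopes this binary cannot interpret.
        if envelope.schema_version not in self._supported:
            return "incompatible"
        # 2. Deduplication: a redelivered envelope is handled at most once.
        if envelope.envelope_id in self._seen:
            return "duplicate"
        # 3. Forwarding guard: stop envelopes that have been forwarded too often.
        if envelope.hops > self._max_hops:
            return "forward-guard"
        self._seen.add(envelope.envelope_id)
        self._handler(envelope)
        return "handled"
```

Running the cheap rejection checks before recording the envelope id means a rejected envelope is not remembered as seen. A real implementation would also need to bound or persist the dedup set, which this sketch leaves unbounded.
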
loning closed this Mar 8, 2026
eanzhao added a commit that referenced this pull request May 8, 2026
Address review batch on PR #562 (10 inline comments). All in files I have
recent ownership of and require no architectural shifts:

- #16 (blocker, security): ssh_exec is now opt-in via NyxIdToolOptions.
  EnableSshExecTool. Hosts that haven't wired the approval middleware no
  longer see the tool by default. Mainnet host opts in (Lark bot needs it).
- #21 (major, bug): code_execute keeps the modern /execute + {language,
  script} contract, but on a NyxID-proxy upstream 404 it retries the legacy
  /run + {language, code} contract so deployments still pinned to old
  chrono-sandbox-service builds keep working.
- #22 (major, bug): SkillRegistry.IsFresh now exempts SkillSource != Remote
  from TTL — local skills are baked in at registration and don't need
  expiring; prior behavior dropped them from use_skill after the first 5min.
- #18 (major, bug): TurnRunner.TryResolveSenderBindingAsync narrows the
  catch to transient infra errors (Http/Timeout/IO/JSON) and surfaces
  non-transient (logic, NRE, serialization) at Error level so ops can
  distinguish "sender unbound" from "binding store broken".
- #19 (major, bug): ConversationReplyGenerator narrows the
  sender-route-fallback catch to transient errors via
  IsRetryableSenderRouteFailure. Programmer errors no longer cost an LLM
  round on retry.
- #29 + #30 (minor): inbox runtime gives metadata enrichment its own 15s
  budget separate from the LLM run, surfacing
  errorCode=llm_reply_metadata_timeout when scope/UserConfig lookup is
  slow. ResolveFallbackTimeout treats ResponseTimeoutSeconds<=0 as "no
  timeout" rather than silently snapping back to 120s.
- #12 (minor): ConversationGAgent's stream-chunk and final-stream-chunk
  edits run under a 10s CTS now; the failure path already uses one. A hung
  relay can no longer pin the actor turn forever.
- #27 (minor, security): ConstantTimeEquals docstring tightened — removed
  the "for future callers" line and added a SCOPE comment that this helper
  is rebuild-admin-only and shouldn't be promoted to internal/public
  without replacing its length-leak with a length-padding scheme.
- #23 (major, bug): CLI ornn skills slug default → ornn-api (matches the
  registered slug; bare "ornn" is the SPA frontend that returns HTML).

Build clean (NyxId / Skills / NyxidChat / Mainnet hosts), 30 AI tests +
15 inbox runtime tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
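
To illustrate the `SkillRegistry.IsFresh` change described in the commit message above, here is a small Python sketch of the rule that non-remote skills are exempt from TTL expiry. `SkillEntry`, `is_fresh`, and the field names are hypothetical. The 300-second TTL is inferred from the "first 5min" wording, and the real method's signature may differ.

```python
from dataclasses import dataclass
from enum import Enum


class SkillSource(Enum):
    LOCAL = "local"
    REMOTE = "remote"


@dataclass(frozen=True)
class SkillEntry:
    """Hypothetical registry entry; field names are illustrative."""
    name: str
    source: SkillSource
    registered_at: float  # seconds on an arbitrary clock


TTL_SECONDS = 300.0  # assumed from the "first 5min" remark in the commit message


def is_fresh(entry: SkillEntry, now: float, ttl: float = TTL_SECONDS) -> bool:
    """Only remote skills expire; anything else is fixed at registration."""
    if entry.source is not SkillSource.REMOTE:
        return True
    return (now - entry.registered_at) <= ttl
```

With this rule a local skill registered at time 0 is still fresh at any later time, while a remote skill stops being fresh once more than `ttl` seconds have passed.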