Skip to content

refactor(onboard): add FSM runner shell#4453

Draft
cv wants to merge 1 commit into
stack/onboard-fsm-provider-inference-resultfrom
stack/onboard-fsm-runner
Draft

refactor(onboard): add FSM runner shell#4453
cv wants to merge 1 commit into
stack/onboard-fsm-provider-inference-resultfrom
stack/onboard-fsm-runner

Conversation

@cv
Copy link
Copy Markdown
Collaborator

@cv cv commented May 28, 2026

Summary

Add a reusable onboarding FSM runner shell that dispatches handlers until the machine reaches a terminal state. The runner applies handler results through OnboardRuntime, supporting advance, retry, branch, complete, and failure paths without wiring it into live onboarding yet.

Changes

  • Add src/lib/onboard/machine/runner.ts with handler dispatch, context update, terminal-state, and missing-handler support.
  • Add runner tests covering a full path with retry and branch transitions.
  • Add tests for terminal failure and missing non-terminal handlers.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv self-assigned this May 28, 2026
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 28, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0cb8ca32-9a96-4dc6-8b1e-a6e5424d93e2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch stack/onboard-fsm-runner

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 2 nice ideas
Top item: Consider documenting or bounding retry-loop responsibility

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • None.

🌱 Nice ideas

  • Consider documenting or bounding retry-loop responsibility (src/lib/onboard/machine/runner.ts:57): The runner loops until the machine reaches a terminal state, and the transition graph intentionally supports retry cycles. A buggy handler can therefore keep returning retry/advance transitions indefinitely and hang onboarding. This is not a security issue in this shell because result application remains delegated to OnboardRuntime, but it is an availability/caller-contract edge case.
    • Recommendation: Either document that handlers must bound retries, or add an optional max-transition/abort guard before wiring this runner into live onboarding.
    • Evidence: runOnboardMachine uses `while (!isTerminalOnboardMachineState(session.machine.state))`; existing transitions include an inference/provider_selection retry cycle; current tests cover a finite retry but not an unbounded handler.
  • Add contract tests for terminal starts and runtime error propagation (src/lib/onboard/machine/runner.test.ts:73): The tests cover the main happy path, failed result, and missing handler behavior. Additional small tests would lock down important runner contracts before live onboarding depends on it: immediate return for already-terminal sessions, propagation of invalid transition errors from OnboardRuntime, and no updateContext call after a runtime failure.
    • Recommendation: Add focused unit tests for already-complete/already-failed initial sessions and for invalid transition errors propagating from the runtime without invoking updateContext.
    • Evidence: Current test cases exercise retry/branch completion, failed terminal result, and missing non-terminal handler, but do not cover terminal initial sessions or runtime validation failures.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@github-actions
Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: cloud-onboard-e2e, onboard-resume-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/stack/onboard-fsm-provider-inference-result
Head: HEAD
Confidence: high

Required E2E

  • None. No merge-blocking E2E is required for this PR as shown: the production runtime path does not import the new runner yet, and the behavioral changes are covered by the newly added unit tests. Run optional onboarding E2Es only for broader adjacent confidence.

Optional E2E

  • cloud-onboard-e2e (high): Optional adjacent confidence for the current cloud OpenClaw onboarding happy path, including installer, sandbox health, policy presets, credential handling, and inference.local. Not merge-blocking because the new machine runner is not currently wired into production onboarding.
  • onboard-resume-e2e (high): Optional adjacent confidence for interrupted onboarding and resume/session behavior, which is conceptually close to a state-machine runner. Not merge-blocking because this PR only adds an isolated runner module and unit tests.

New E2E recommendations

  • onboarding-state-machine (high): Existing E2E jobs validate the currently wired onboarding scripts, but they do not directly cover this new FSM runner while it remains unimported. When the runner is connected to the onboard CLI, add E2E coverage for retry/branch transitions across provider_selection, inference, sandbox, policies, and post_verify, including terminal failure handling and session state persistence.
    • Suggested test: Add an onboarding FSM scenario or script E2E that uses failure injection to force provider/inference retry, branch to OpenClaw setup, resume from an interrupted non-terminal state, and verify final machine/session state.

@github-actions
Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/stack/onboard-fsm-provider-inference-result
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@wscurran wscurran added enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. refactor This is a refactor of the code and/or architecture. labels May 28, 2026
@cv cv added the onboarding Making the onboarding experience better label May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement: feature Use this label to identify requests for new capabilities in NemoClaw. onboarding Making the onboarding experience better refactor This is a refactor of the code and/or architecture.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants