Skip to content

refactor(onboard): support FSM runner stop states#4480

Draft
cv wants to merge 1 commit into
stack/onboard-fsm-sequence-adapterfrom
stack/onboard-fsm-runner-stop-states
Draft

refactor(onboard): support FSM runner stop states#4480
cv wants to merge 1 commit into
stack/onboard-fsm-sequence-adapterfrom
stack/onboard-fsm-runner-stop-states

Conversation

@cv
Copy link
Copy Markdown
Collaborator

@cv cv commented May 29, 2026

Summary

Add stop-state support to the onboarding FSM runner so partial live-flow migrations can stop at a handoff state without requiring that state's handler. This enables incremental runner adoption by slices.

Changes

  • Add stopStates to runOnboardMachine() options.
  • Stop the runner loop before configured non-terminal states.
  • Add runner coverage proving a prefix flow can stop at provider_selection without a provider handler.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv self-assigned this May 29, 2026
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 29, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fb8a52ea-1722-4472-93b7-39a22eca55e7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch stack/onboard-fsm-runner-stop-states

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: cloud-onboard-e2e
Optional E2E: onboard-resume-e2e, onboard-negative-paths-e2e

Dispatch hint: cloud-onboard-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/stack/onboard-fsm-sequence-adapter
Head: HEAD
Confidence: high

Required E2E

  • cloud-onboard-e2e (high): Runs the real cloud onboarding path through install, sandbox creation, policy selection, and health/security checks. This is the most targeted existing E2E for ensuring the default onboarding FSM runner behavior still reaches completion after the core loop change.

Optional E2E

  • onboard-resume-e2e (medium): Useful adjacent coverage for interrupted onboarding and resume semantics, which depend on persisted onboarding machine/session state and correct continuation through later non-terminal states.
  • onboard-negative-paths-e2e (medium): Provides additional confidence that non-interactive onboarding validation and friendly failure paths still behave correctly around the onboarding state machine, but the changed stopStates option is not directly exercised by this job.

New E2E recommendations

  • onboarding-fsm-runner-stop-states (medium): No existing user-facing E2E appears to exercise the new stopStates option directly. If this option becomes part of a CLI/sequence-adapter flow, add an E2E that intentionally stops at a configured non-terminal onboarding state, verifies persisted machine state, then resumes/continues to completion without requiring a handler for that stop state.
    • Suggested test: Add an onboarding sequence-adapter E2E covering stopStates pause at provider_selection and subsequent completion/resume.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: cloud-onboard-e2e

@github-actions
Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/stack/onboard-fsm-sequence-adapter
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@github-actions
Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Top item: Check stop states between multi-result transitions

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: Onboard FSM runner `stopStates` compatibility handoff: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `stopStates` is a localized compatibility/control-flow option added to the generic runner. The positive test covers one handoff, but no test or inline contract explains whether stop states must be honored between results returned by the same handler.
  • Stop states are only checked between handlers, not between multi-result transitions (src/lib/onboard/machine/runner.ts:62): The runner now stops when the current session state is in `stopStates`, but that check only occurs at the top of the outer loop. Existing runner behavior explicitly allows a handler to return a list of results. If one result transitions into a configured stop state and a later result in the same list transitions onward, the runner will pass through the handoff state before observing the stop condition. That makes the new stop-state contract ambiguous for the multi-result path already covered by this runner.
    • Recommendation: Clarify the intended contract and, if stop states mean the runner must not advance past a handoff state, check `stopStates.includes(session.machine.state)` after each `runtime.applyResult(result)` and break/return immediately. Add a regression test where a handler returns multiple results that enter a stop state before a later result.
    • Evidence: `while (!isTerminalOnboardMachineState(session.machine.state) && !stopStates.includes(session.machine.state))` gates only the outer loop, while `for (const result of results) { session = await runtime.applyResult(result); ... }` applies every result in the list without rechecking `stopStates`. The new test only covers single-result transitions into `provider_selection`.

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@cv cv added the onboarding Making the onboarding experience better label May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

onboarding Making the onboarding experience better

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant