
fix(vm): restore sandboxes after gateway restart #1407

Merged
drew merged 3 commits into main from vm-restart-durabaility
May 15, 2026


drew (Collaborator) commented May 15, 2026

Summary

Restore VM sandboxes when a standalone gateway restarts by persisting driver launch metadata and preserving the writable overlay disk.
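To make the restore path more concrete, here is a minimal Rust sketch of what a driver could do at startup: walk a state directory, pick up each persisted sandbox.pb, and decide whether an existing overlay.ext4 can be reattached or a fresh one must be created. The per-sandbox layout (<state_dir>/<sandbox_id>/ holding sandbox.pb and overlay.ext4 side by side) and the names plan_restore, RestorePlan, and OverlayAction are assumptions for illustration, not the openshell-driver-vm API. The request is treated as opaque bytes rather than a decoded protobuf, and actually relaunching the VM launcher process is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// What to do with a sandbox's writable overlay on restore (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum OverlayAction {
    /// overlay.ext4 survived the restart: reattach it so guest writes are kept.
    Reuse(PathBuf),
    /// No overlay on disk: a fresh one must be created before launch.
    Create(PathBuf),
}

/// A persisted launch request found on disk (hypothetical shape).
#[derive(Debug)]
struct RestorePlan {
    sandbox_id: String,
    request_bytes: Vec<u8>, // opaque here; the real driver would decode it
    overlay: OverlayAction,
}

/// Scan `state_dir/<sandbox_id>/sandbox.pb` (assumed layout) and plan each relaunch.
fn plan_restore(state_dir: &Path) -> io::Result<Vec<RestorePlan>> {
    let mut plans = Vec::new();
    for entry in fs::read_dir(state_dir)? {
        let dir = entry?.path();
        let request_path = dir.join("sandbox.pb");
        if !request_path.is_file() {
            continue; // not a sandbox directory, or nothing was persisted
        }
        let sandbox_id = match dir.file_name().and_then(|n| n.to_str()) {
            Some(id) => id.to_string(),
            None => continue, // skip names that are not valid UTF-8
        };
        let overlay_path = dir.join("overlay.ext4");
        let overlay = if overlay_path.is_file() {
            OverlayAction::Reuse(overlay_path)
        } else {
            OverlayAction::Create(overlay_path)
        };
        plans.push(RestorePlan {
            sandbox_id,
            request_bytes: fs::read(&request_path)?,
            overlay,
        });
    }
    // read_dir order is unspecified; sort so restores are deterministic.
    plans.sort_by(|a, b| a.sandbox_id.cmp(&b.sandbox_id));
    Ok(plans)
}

fn main() -> io::Result<()> {
    let state_dir = std::env::temp_dir().join(format!("vm-restore-demo-{}", std::process::id()));
    let _ = fs::remove_dir_all(&state_dir);
    // Sandbox "a" kept its overlay across the restart; sandbox "b" did not.
    fs::create_dir_all(state_dir.join("a"))?;
    fs::create_dir_all(state_dir.join("b"))?;
    fs::write(state_dir.join("a/sandbox.pb"), b"req-a")?;
    fs::write(state_dir.join("a/overlay.ext4"), b"")?;
    fs::write(state_dir.join("b/sandbox.pb"), b"req-b")?;
    for plan in plan_restore(&state_dir)? {
        println!("{} -> {:?} ({} request bytes)", plan.sandbox_id, plan.overlay, plan.request_bytes.len());
    }
    fs::remove_dir_all(&state_dir)
}
```

Keeping the overlay decision separate from the launch step means a sandbox whose overlay went missing can still be relaunched with a fresh disk, while one whose overlay survived keeps its prior writes.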

Related Issue

N/A

Changes

  • Persist VM driver sandbox requests to sandbox.pb with owner-only permissions and scan them on driver startup
  • Recreate VM launcher processes during driver startup while preserving existing overlay.ext4 state
  • Add dedicated VM gateway-resume e2e coverage in vm_gateway_resume, gated by the new e2e-vm feature
  • Keep Docker gateway-resume coverage in gateway_resume under e2e-docker
  • Configure the VM e2e wrapper to use e2e-vm by default and document restart behavior
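The first bullet (persisting each request to an owner-only sandbox.pb that is read back on startup) can be sketched with the Rust standard library alone. This is a minimal, Unix-only sketch under stated assumptions: the request is already encoded to bytes, each sandbox has its own directory, and both the temp-file-plus-rename pattern and the name persist_request are illustrative choices, not necessarily what the driver does.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Persist an already-encoded launch request as `<sandbox_dir>/sandbox.pb`,
/// readable and writable only by the owning user (mode 0o600).
/// Hypothetical helper; the real driver's function name and layout may differ.
fn persist_request(sandbox_dir: &Path, encoded: &[u8]) -> io::Result<()> {
    fs::create_dir_all(sandbox_dir)?;
    let final_path = sandbox_dir.join("sandbox.pb");
    let tmp_path = sandbox_dir.join("sandbox.pb.tmp");
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // applied (minus umask) only when the file is newly created
        .open(&tmp_path)?;
    // Force 0o600 even if a stale temp file already existed with looser bits.
    file.set_permissions(fs::Permissions::from_mode(0o600))?;
    file.write_all(encoded)?;
    file.sync_all()?; // flush contents to disk before the rename publishes them
    drop(file);
    // rename() within one directory replaces the target atomically on POSIX,
    // so readers see either the old sandbox.pb or the new one, never a partial write.
    fs::rename(&tmp_path, &final_path)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("vm-persist-demo-{}", std::process::id()));
    persist_request(&dir, b"encoded-request")?;
    let mode = fs::metadata(dir.join("sandbox.pb"))?.permissions().mode() & 0o777;
    println!("sandbox.pb mode = {:o}", mode);
    fs::remove_dir_all(&dir)
}
```

Writing to a temporary file and renaming it means a gateway crash mid-write leaves the previous sandbox.pb intact instead of a truncated file that the startup scan would then fail to decode.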

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated
  • cargo test -p openshell-driver-vm
  • cargo test --manifest-path e2e/rust/Cargo.toml --features e2e-docker --test gateway_resume -- --nocapture
  • cargo test --manifest-path e2e/rust/Cargo.toml --features e2e-vm --test vm_gateway_resume -- --nocapture
  • cargo test --manifest-path e2e/rust/Cargo.toml --features e2e-vm --test smoke --no-run
  • bash -n e2e/rust/e2e-vm.sh
  • Not run locally: the live VM e2e (OPENSHELL_E2E_VM_TEST=vm_gateway_resume e2e/rust/e2e-vm.sh), because target/vm-runtime-compressed is absent

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

drew requested review from a team, derekwaynecarr, maxamillion and mrunalp as code owners May 15, 2026 17:48
copy-pr-bot Bot commented May 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

drew changed the base branch from main to fix/vm-sandbox-workdir-image May 15, 2026 17:55
drew force-pushed the fix/vm-sandbox-workdir-image branch 2 times, most recently from 9de17f9 to 7d0c92d May 15, 2026 20:12
Base automatically changed from fix/vm-sandbox-workdir-image to main May 15, 2026 20:23
drew added 2 commits May 15, 2026 13:31
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
drew force-pushed the vm-restart-durabaility branch from dee2733 to a7c22c4 May 15, 2026 20:32
drew merged commit f819f7d into main May 15, 2026
27 checks passed
drew deleted the vm-restart-durabaility branch May 15, 2026 22:09

Labels

None yet

Projects

None yet

2 participants