nvrc: supervise kata-agent in a child PID namespace #157

Closed
fidencio wants to merge 4 commits into NVIDIA:main from fidencio:topic/nvrc-supervise-agent

@fidencio
Collaborator

NVRC currently forks and exec()s kata-agent in the parent, so kata-agent inherits PID 1 and becomes the guest init. When kata-agent processes destroy_sandbox it then calls reboot(RB_POWER_OFF) from inside the guest, which races the host shim: qemu halts before the shim has finished its post-StopVM cleanup (stopping monitor / Cleanup agent / TaskExit), the shim catches SIGTERM from systemd ending the per-container scope, and /run/vc/sbs/<id> is left behind. The follow-up cleanup-shim then dials a dead vsock and surfaces a fatal "ttrpc: closed" to the runtime.

Hand the actual VM power-off to NVRC without changing kata-agent:

  • unshare(CLONE_NEWPID) before forking so the child enters a fresh PID namespace, where it is pid 1. kata-agent in that namespace still has init_mode = true, so its init_agent_as_init setup (cgroups mount, /dev/ptmx symlink, setsid, sethostname) is performed exactly as today.
  • In a non-initial PID namespace the kernel reinterprets reboot(RB_POWER_OFF) as SIGINT to the namespace's init process, kata-agent itself, so it terminates instead of halting the VM.
  • NVRC remains pid 1 in the initial namespace and polls waitpid in a 500ms loop. On each pass it opportunistically drains /dev/log via syslog::try_poll, which replaces the previous syslog_loop child. Once kata-agent exits, NVRC issues the real reboot(RB_POWER_OFF) from the initial namespace, where it actually halts the guest.

The handover is purely kernel-mediated: kata-agent code is unchanged and still believes it owns shutdown.
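
The flow described above can be sketched roughly as follows. This is not the code from this PR: it is a minimal illustration using hand-declared libc bindings, and the names `supervise`, `decode_wait_status`, `power_off`, and `ChildExit` are hypothetical. The constants are the Linux values, the log draining via syslog::try_poll is left out, and the `unshare` call only succeeds with CAP_SYS_ADMIN.

```rust
use std::io;
use std::os::raw::c_int;
use std::os::unix::process::CommandExt;
use std::process::Command;
use std::thread;
use std::time::Duration;

// Hand-declared libc bindings so the sketch needs no external crate.
unsafe extern "C" {
    fn unshare(flags: c_int) -> c_int;
    fn fork() -> c_int;
    fn waitpid(pid: c_int, status: *mut c_int, options: c_int) -> c_int;
    fn reboot(cmd: c_int) -> c_int;
}

// Linux constant values.
const CLONE_NEWPID: c_int = 0x2000_0000;
const WNOHANG: c_int = 1;
const RB_POWER_OFF: c_int = 0x4321_fedc;

/// How the supervised child ended (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum ChildExit {
    Exited(i32),
    Signaled(i32),
    Other(i32),
}

/// Decode a raw Linux wait status, mirroring WIFEXITED / WIFSIGNALED.
pub fn decode_wait_status(status: i32) -> ChildExit {
    let low = status & 0x7f;
    if low == 0 {
        ChildExit::Exited((status >> 8) & 0xff)
    } else if low != 0x7f {
        ChildExit::Signaled(low)
    } else {
        ChildExit::Other(status)
    }
}

/// Run `agent_path` as pid 1 of a fresh PID namespace and wait for it to end.
pub fn supervise(agent_path: &str) -> io::Result<ChildExit> {
    // The caller stays in its current namespace; only children forked
    // after this call land in the new one.
    if unsafe { unshare(CLONE_NEWPID) } != 0 {
        return Err(io::Error::last_os_error());
    }
    let pid = unsafe { fork() };
    if pid < 0 {
        return Err(io::Error::last_os_error());
    }
    if pid == 0 {
        // Child: pid 1 in the new namespace. exec() only returns on failure.
        let err = Command::new(agent_path).exec();
        eprintln!("exec {} failed: {}", agent_path, err);
        std::process::exit(127);
    }
    let mut status: c_int = 0;
    loop {
        let r = unsafe { waitpid(pid, &mut status, WNOHANG) };
        if r == 0 {
            // Child still running; a real supervisor would drain logs here.
            thread::sleep(Duration::from_millis(500));
        } else if r == pid {
            return Ok(decode_wait_status(status));
        } else {
            return Err(io::Error::last_os_error());
        }
    }
}

/// Power off from the initial namespace; returns only if the call failed.
pub fn power_off() -> io::Error {
    unsafe { reboot(RB_POWER_OFF) };
    io::Error::last_os_error()
}
```

A caller would invoke `supervise` with the agent path and then `power_off()`. According to reboot(2), when the namespace init calls reboot(RB_POWER_OFF) the parent's wait status should report termination by SIGINT, which this sketch would surface as `ChildExit::Signaled(2)`.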

Signed-off-by: Fabiano Fidencio <ffidencio@nvidia.com>
@fidencio fidencio added the ok-to-test Ok to test label May 25, 2026
fidencio added 3 commits May 25, 2026 20:20
Emit info/warn lifecycle logs and NSpid snapshots around unshare/fork/wait so CI can conclusively confirm namespace handoff and child-exit behavior.
Delay the final VM power-off briefly after kata-agent exits so host-side
shim/ttrpc teardown can complete without racing into ttrpc: closed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
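
For the NSpid snapshots mentioned in the commit messages above, one possible way to confirm the namespace handoff is to parse the `NSpid:` line of /proc/<pid>/status, which lists the process's PID in each nested PID namespace, outermost first. The helpers below are a hypothetical sketch, not the logging code added in these commits.

```rust
/// Extract the NSpid entries from the text of a /proc/<pid>/status file.
/// Returns None if the line is missing or any entry is not a number.
pub fn parse_nspid(status: &str) -> Option<Vec<u32>> {
    let line = status.lines().find(|l| l.starts_with("NSpid:"))?;
    line["NSpid:".len()..]
        .split_whitespace()
        .map(|field| field.parse::<u32>().ok())
        .collect()
}

/// More than one entry means the process lives in a nested PID namespace
/// relative to the namespace of the procfs mount being read.
pub fn in_child_pid_namespace(nspid: &[u32]) -> bool {
    nspid.len() > 1
}

/// True if the process is pid 1 of its innermost PID namespace.
pub fn is_namespace_init(nspid: &[u32]) -> bool {
    nspid.last() == Some(&1)
}
```

Read from the initial namespace, a supervised agent would be expected to show two entries ending in 1, while the supervisor itself shows the single entry 1.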
@fidencio
Collaborator Author

This just adds extra complication for something that must be solved on the Kata Containers side, so I'm closing it and focusing on fixing it properly in Kata.

@fidencio fidencio closed this May 29, 2026