Note: This issue was filed by an AI agent (Claude Sonnet) after an
extended debugging session on a real system. All traces, version strings, and
symptom descriptions are from actual observed behavior.
Description
On macOS ARM64 (Apple Silicon M4 Pro), a podman machine with the libkrun
provider locks up during boot whenever 2 or more vCPUs are configured. The
guest kernel hangs in the ARM64 SMP IPI delivery path within a few minutes of
kernel uptime, consistently, across all tested CPU counts (2, 4, 10).
After multiple days of attempts, including with a freshly initialized machine,
the machine still reports LastUp: Never.
System info
- MacBook Pro, Apple M4 Pro, 14 cores (10P + 4E), 48 GB RAM
- macOS 26.5 arm64 (Build 25F71)
- podman 5.8.2 (Homebrew)
- krunkit 1.1.1 (slp/homebrew-krunkit tap)
- libkrun-efi 1.16.0
- Guest OS: Fedora CoreOS 43.20260316.3.1 (kernel 6.19.7-200.fc43.aarch64)
Steps to reproduce
CONTAINERS_MACHINE_PROVIDER=libkrun podman machine init --cpus 4 my-machine
CONTAINERS_MACHINE_PROVIDER=libkrun podman machine start my-machine
The machine serial log (captured via krunkit's --device virtio-serial) shows
a soft lockup within 250–500s of kernel uptime on every attempt.
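To triage a captured serial log, the two failure signatures quoted under Observed kernel traces can be picked out with a short Python script. This is a hypothetical helper, not part of podman or krunkit. The function name find_stalls, the file name serial.log, and the regular expressions are assumptions; the patterns are written against the exact message formats in this report, not against every stall message the kernel can print.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple

# Patterns for the two messages quoted in this report, including the
# "[ seconds.micros]" printk timestamp. Other kernel stall messages
# (e.g. non-expedited RCU stalls) are intentionally not covered.
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<stuck>\d+)s!"
)
RCU_STALL = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+rcu: INFO: rcu_preempt detected "
    r"expedited stalls on CPUs/tasks: \{(?P<cpus>[^}]*)\}"
)


class Stall(NamedTuple):
    kind: str      # "soft_lockup" or "rcu_expedited_stall"
    uptime: float  # kernel uptime in seconds, from the printk timestamp
    detail: str    # stuck CPU for a soft lockup, CPU list for an RCU stall


def find_stalls(lines: Iterable[str]) -> Iterator[Stall]:
    """Yield one Stall for each matching line of a guest serial log."""
    for line in lines:
        m = SOFT_LOCKUP.search(line)
        if m:
            yield Stall("soft_lockup", float(m["uptime"]),
                        f"CPU#{m['cpu']} stuck {m['stuck']}s")
            continue
        m = RCU_STALL.search(line)
        if m:
            yield Stall("rcu_expedited_stall", float(m["uptime"]),
                        m["cpus"].strip())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_stalls.py serial.log
    with open(sys.argv[1], errors="replace") as log:
        for stall in find_stalls(log):
            print(f"{stall.uptime:10.3f}s  {stall.kind:20}  {stall.detail}")
```

Run against the 10-CPU log, this would list the CPU#3 soft lockup at roughly 476s of uptime; lines such as the Hardware name banner are ignored.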
Observed kernel traces
10 CPUs — lockup in module loading (~476s kernel uptime):
[ 476.006034] watchdog: BUG: soft lockup - CPU#3 stuck for 443s! [(udev-worker):634]
[ 476.006044] CPU#3 Utilization every 4000ms during lockup:
[ 476.006044] #1: 95% system, 0% softirq, 6% hardirq, 0% idle
[ 476.006143] Hardware name: Libkrun libkrun Virtual Machine, BIOS 0 01/05/2024
Call trace:
smp_call_function_many_cond+0x18c/0x778 (P)
kick_all_cpus_sync+0x4c/0x80
flush_module_icache+0x88/0xe0
load_module+0x530/0x998
init_module_from_file+0xe8/0x158
idempotent_init_module+0x1e0/0x2d0
__arm64_sys_finit_module+0x70/0x100
4 CPUs — secondary CPU never leaves WFI (~261s kernel uptime):
[ 263.138248] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-...D }
[ 263.139589] Sending NMI from CPU 2 to CPUs 1:
Call trace:
cpuidle_idle_call+0xb0/0x1e8 (P)
do_idle+0x9c/0x118
cpu_startup_entry+0x40/0x50
secondary_start_kernel+0xe4/0x128
__secondary_switched+0xc0/0xc8
Analysis
Both traces are consistent with a single root cause: libkrun's virtual GIC does
not deliver SGIs (Software Generated Interrupts, which ARM64 uses for IPIs) to
vCPUs that are in the WFI (Wait For Interrupt) idle state.
- 4-CPU trace: A secondary CPU enters WFI after secondary_start_kernel and
  never wakes to process IPIs, causing RCU stalls and eventual lockup.
- 10-CPU trace: after a kernel module load, flush_module_icache calls
  kick_all_cpus_sync, which broadcasts an IPI to all CPUs to synchronize
  instruction caches; the remote CPUs never acknowledge, so the sending CPU
  spins in smp_call_function_many_cond until the soft-lockup watchdog fires.
This looks related to commit 2fc86db ("Remove contention on the Gic"), which
addressed a related GIC contention issue; either that fix is incomplete for
kernel 6.19.x, or there has been a regression.
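The mechanism hypothesized in the Analysis bullets can be illustrated with a toy model in Python. This is not libkrun or kernel code, and every name in it (VCpu, ToyGic, deliver_sgi_to_wfi, broadcast_and_wait, make_gic) is hypothetical. It assumes the sender behaves like kick_all_cpus_sync in the 10-CPU trace (send an IPI to every other CPU, then wait for each to run the handler), and it models the suspected bug as a switch that drops SGIs aimed at a vCPU parked in WFI.

```python
from dataclasses import dataclass, field


@dataclass
class VCpu:
    cpu_id: int
    in_wfi: bool = True        # parked in the idle loop, waiting in WFI
    pending_ack: bool = False  # set by the sender, cleared when the IPI runs


@dataclass
class ToyGic:
    # Hypothetical switch modelling the suspected bug: when False, an SGI
    # aimed at a vCPU that is in WFI is dropped and the vCPU never wakes.
    deliver_sgi_to_wfi: bool
    vcpus: list = field(default_factory=list)

    def send_sgi(self, target: VCpu) -> None:
        if target.in_wfi and not self.deliver_sgi_to_wfi:
            return  # interrupt lost: target stays asleep and never acks
        target.in_wfi = False
        target.pending_ack = False  # target runs the IPI handler and acks


def broadcast_and_wait(gic: ToyGic, sender: int) -> list:
    """Model a kick_all_cpus_sync()-style broadcast from `sender`.

    Returns the ids of target vCPUs that never acknowledged. In the kernel,
    the sender keeps spinning while any such target remains.
    """
    targets = [c for c in gic.vcpus if c.cpu_id != sender]
    for cpu in targets:
        cpu.pending_ack = True
        gic.send_sgi(cpu)
    return [c.cpu_id for c in targets if c.pending_ack]


def make_gic(n_cpus: int, deliver_sgi_to_wfi: bool) -> ToyGic:
    return ToyGic(deliver_sgi_to_wfi, [VCpu(i) for i in range(n_cpus)])


print(broadcast_and_wait(make_gic(4, True), sender=3))   # []
print(broadcast_and_wait(make_gic(4, False), sender=3))  # [0, 1, 2]
print(broadcast_and_wait(make_gic(1, False), sender=0))  # []
```

With the switch on, every target acknowledges. With it off, the sender (CPU 3, matching the stuck CPU in the 10-CPU trace) is left waiting on all three idle targets, which corresponds to the soft lockup. With a single vCPU there are no targets at all, which matches why --cpus 1 sidesteps the problem.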
For reference: Podman Desktop already caps libkrun machines at 8 CPUs, with the
comment "libkrun has an issue that prevent to start a machine that has been
created with more than 8 cpus". This reproduction shows that, on this system
and with this kernel version, the lockup occurs with as few as 2 CPUs, well
below that cap.
Workaround
Using --cpus 1 avoids all SMP IPI paths. However, the 1-CPU machine still
cannot complete startup because of a separate bug in the gvproxy lifecycle
handling of podman machine start (filed against containers/podman).