Windows guest PnP rebalance deadlocks boot when reprogrammed BAR overlaps another live device

**Describe the bug**

Windows 11 25H2 guest boots intermittently deadlock (~50% in our tests) when the Windows PnP arbiter rebalances PCI resources at driver-load time and the new BAR address it picks for one virtio device collides with another live virtio device's BAR. CH's `move_bar` correctly rejects the conflicting allocation, but the guest kernel keeps re-issuing the same write. After a handful of retries the guest's internal PCI state is out of sync with what CH actually has mapped, which wedges early boot before `cocoon-agent` (or any user-mode service) can start.

This is distinct from #7938 (driver bug → BAR programmed outside allocator range): here the BAR write is legitimate, addresses a valid MMIO region, and is exactly what `pci.sys` is *documented* to do.

**Root cause: Windows PnP "resource rebalance"**

`pci.sys` implements the PnP Manager's resource rebalance protocol. When a new FDO loads, the arbiter is allowed to recompute and rewrite the BARs of peer devices on the same bus. Microsoft documents this explicitly:

> If a user adds a device to a system, and if the device requires system resources that the PnP manager has already assigned to another device, the PnP manager attempts to reassign resources. … It then delivers new resource lists to the devices so that they can restart, using the new resources.
> — [The PnP Manager Redistributes System Resources](https://learn.microsoft.com/en-us/windows-hardware/drivers/wdf/the-pnp-manager-redistributes-system-resources)

The full sequence is `IRP_MN_QUERY_STOP_DEVICE` → `IRP_MN_STOP_DEVICE` → arbiter recomputes → `IRP_MN_START_DEVICE` with a new resource list ([Stopping a Device to Rebalance Resources](https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/stopping-a-device-to-rebalance-resources)).

Confirming this isn't virtio-win's doing: `VirtIO/WDF/PCI.c` only *consumes* the resource list WDF passes it (`PCIAllocBars`, `MmMapIoSpace`) — there is no rewrite or veto path in any virtio Windows driver. The rewrite originates above virtio-win, in `pci.sys`'s arbiter.

Linux guests never trigger this — Linux only re-assigns BARs in `pci_assign_unassigned_resources` (hotplug, rescan), not on every driver bind.

**Why CH wedges where QEMU doesn't**

QEMU's `pci_update_mappings` simply unmaps the old `MemoryRegion` and remaps at the new address on every BAR config write — no allocator check, no failure path. Two BARs targeting the same address would stomp each other on real hardware, and QEMU follows that.

CH instead routes the new address through `mmio64_allocator` via `move_bar`. If the new range is occupied, the allocator returns `Overlap` and the move fails. Even with #7938's rollback fix in place — which correctly restores `bars[i].addr` and the config register — the guest's view diverges:

1. Guest writes new BAR (`B`) via `IRP_MN_START_DEVICE`'s resource list.
2. CH detects reprogramming, `move_bar(A → B)` fails, `restore_bar_addr` writes config back to `A`.
3. Guest reads back `A`, doesn't reconcile with the resource list it just installed, re-issues the write of `B`. Loop.
4. After ~6 retries `pci.sys` gives up. The guest kernel still believes the BAR is at `B` (cached in its DEVICE_OBJECT), while CH has it mapped at `A`. Subsequent virtio queue setup writes target `B`, which actually maps to *another live virtio device's BAR* (the one we refused to evict for the rebalance). The other device's worker thread receives garbage notifications, the originating driver waits forever for a ring response, and boot deadlocks at ~65 s VM uptime.
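
The divergence in steps 1–4 can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyAllocator` and `replay_retries` are hypothetical names, not CH's `mmio64_allocator` or its real `move_bar` signature, and the guest side is reduced to "keeps believing `B`", as described in step 4.

```rust
use std::collections::BTreeMap;

/// Toy MMIO allocator (hypothetical, not CH's real one): BAR base -> size.
#[derive(Default)]
struct ToyAllocator {
    ranges: BTreeMap<u64, u64>,
}

impl ToyAllocator {
    /// True if [base, base + size) intersects any range other than `ignore`.
    fn overlaps(&self, base: u64, size: u64, ignore: Option<u64>) -> bool {
        self.ranges
            .iter()
            .any(|(&b, &s)| Some(b) != ignore && base < b + s && b < base + size)
    }

    fn allocate(&mut self, base: u64, size: u64) -> Result<(), &'static str> {
        if self.overlaps(base, size, None) {
            return Err("Overlap");
        }
        self.ranges.insert(base, size);
        Ok(())
    }

    /// Move a BAR from `old` to `new`; on conflict the old mapping is kept.
    fn move_bar(&mut self, old: u64, new: u64) -> Result<(), &'static str> {
        let size = *self.ranges.get(&old).ok_or("NoSuchBar")?;
        if self.overlaps(new, size, Some(old)) {
            return Err("Overlap");
        }
        self.ranges.remove(&old);
        self.ranges.insert(new, size);
        Ok(())
    }
}

/// Replays the guest's retries of the write `a -> b`.
/// Returns (address the guest ends up believing, address the host has mapped).
fn replay_retries(alloc: &mut ToyAllocator, a: u64, b: u64, retries: usize) -> (u64, u64) {
    let mut host = a;
    for _ in 0..retries {
        if alloc.move_bar(host, b).is_ok() {
            host = b;
            break;
        }
        // On failure the rollback leaves the host mapping at `a`; the guest retries.
    }
    // Per step 4, the guest keeps `b` cached whether or not the move succeeded.
    (b, host)
}
```

With two 512 KiB BARs allocated at `0x3fffffd00000` and `0x3fffffd80000`, replaying six retries of the move from the first address to the second returns a guest view of `0x3fffffd80000` and a host view of `0x3fffffd00000`, which is the mismatch described above.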

**Reproducer**

Layout that triggers the deadlock on ~50% of Win11 25H2 boots:

- 5 virtio devices on bus 0 with 512 KiB BARs packed at the top of MMIO64 (the CH default with --disk + --net + --rng + --vsock + --watchdog): slots 1–5 at `0x3fffffe00000`, `0x3fffffd80000`, `0x3fffffd00000`, `0x3fffffc80000`, `0x3fffffc00000`.
- Win11 25H2 with virtio-win 0.1.285.
- After cold boot, around 65 s into VM uptime, the arbiter in the failing case rewrites vsock's BAR from `0x3fffffd00000` to `0x3fffffd80000` (rng's slot), and `move_bar` rejects the move.
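
As a quick check of the arithmetic, the five slot addresses listed above are exactly a 512 KiB stride downward from the top slot, so the rewritten vsock address covers rng's slot byte for byte. A minimal sketch, where `packed_layout` and `ranges_overlap` are hypothetical helper names and not CH functions:

```rust
/// 512 KiB BAR size, as in the layout above.
const BAR_SIZE: u64 = 0x8_0000;
/// Address of the topmost slot from the reproducer.
const TOP_SLOT: u64 = 0x3fff_ffe0_0000;

/// Base addresses of `count` BARs packed downward from TOP_SLOT.
fn packed_layout(count: u64) -> Vec<u64> {
    (0..count).map(|i| TOP_SLOT - i * BAR_SIZE).collect()
}

/// True if two `size`-byte ranges starting at `a` and `b` intersect.
fn ranges_overlap(a: u64, b: u64, size: u64) -> bool {
    a < b + size && b < a + size
}
```

`packed_layout(5)` yields the five addresses in the list, and `ranges_overlap(0x3fffffd80000, 0x3fffffd80000, BAR_SIZE)` is true for the rejected move, while the same target does not overlap the neighbouring slot at `0x3fffffe00000`.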

Observed:
- ~50% of boots: arbiter happens to pick non-conflicting targets, boot proceeds.
- ~50% of boots: target collides → 6 retries in ~350 ms → guest wedged.
- Any of the 5 virtio devices on the bus can be involved, not just NICs; even 0-NIC configs deadlock.

**Error log**

```
cloud-hypervisor: 65.940720s: <vcpu0> WARN: pci/src/bus.rs:471 -- Failed moving device BAR: failed allocating new MMIO range: 0x3fffffd00000->0x3fffffd80000(0x80000), keeping old BAR
cloud-hypervisor: 66.007469s: <vcpu1> WARN: ...same...
cloud-hypervisor: 66.076797s: <vcpu1> WARN: ...same...
cloud-hypervisor: 66.143539s: <vcpu0> WARN: ...same...
cloud-hypervisor: 66.212357s: <vcpu0> WARN: ...same...
cloud-hypervisor: 66.285027s: <vcpu1> WARN: ...same...
(no further log; CH consumes 90+ % CPU indefinitely, guest never reaches login)
```

**Version**

```
cloud-hypervisor v51.0.0 + #7938 fix (PR #7950)
```

**VM configuration**

```
cloud-hypervisor \
  --firmware CLOUDHV.fd \
  --disk path=windows.qcow2,image_type=qcow2,backing_files=on \
  --net tap=tap0,mac=...,num_queues=4 \
  --rng src=/dev/urandom \
  --vsock cid=3,socket=/path/to/vsock.uds \
  --watchdog \
  --cpus boot=2,kvm_hyperv=on \
  --memory size=4G
```

Guest: Windows 11 25H2 (build 26100), virtio-win 0.1.285
Host: Linux 6.17.0-1009-gcp, KVM, 46-bit phys addressing

**Related issues**

- #326 — original "BAR reprogramming" baseline; PR #385 added `detect_bar_reprogramming` based on the PCI spec note that the OS may program BARs at addresses different from the initial assignment.
- #7938 — same code path, different trigger (driver bug → unallocatable address). Fixed in #7950 with config-register rollback. That rollback is correct but not sufficient for the rebalance case, because the rebalance is legitimate and the guest *will* keep retrying.
- #7010 — confirms CH's allocator policy is stricter than QEMU.

**Proposed fixes**

Three options, increasing in invasiveness:

1. **Sparse initial layout (mitigation, low risk).** Increase the allocator alignment for the virtio capability BAR from `CAPABILITY_BAR_SIZE` (512 KiB) to e.g. 8 MiB. Initial devices then land 8 MiB apart in MMIO64, leaving ample slack for any address the arbiter might compute. On a 46-bit-phys host the alignment-induced waste is negligible (~64 TiB available). We have this on a defensive branch and Win11 boots to login cleanly with it.

2. **Coordinated swap on conflicting move (proper fix).** When `move_bar(A → B)` finds `B` occupied by device `D2`, allocate a fresh address `C` for `D2`, move `D2`'s mapping to `C`, write `C` into `D2`'s BAR config register, and only then complete the original `A → B` move. The guest's subsequent reads of `D2`'s BAR will see `C` and reconcile. Mirrors what real PCI bus rebalance does on hardware.

3. **QEMU-style accept-and-stomp.** Make `mmio_bus.insert` an upsert and let two BARs temporarily overlap. Simplest to implement but loses CH's allocator invariants and risks silent data corruption if a device worker hits the overlapped range before the guest finishes its rebalance.
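
To make option 2 concrete, here is a rough sketch of the ordering it implies: relocate the occupant first, then complete the requested move. Everything in it is an assumption for illustration. `ToyBus`, `occupant`, `move_bar_with_swap` and the `free` pool are hypothetical, a single map entry stands in for both the MMIO mapping and the config register, and all BARs are assumed to be the same size and aligned so that a target range has at most one occupant.

```rust
use std::collections::BTreeMap;

const BAR_SIZE: u64 = 0x8_0000; // 512 KiB, same size for every BAR in this toy

/// Toy bus (hypothetical): device name -> BAR base address.
struct ToyBus {
    bars: BTreeMap<&'static str, u64>,
    /// Assumed pool of spare, non-overlapping, BAR_SIZE-aligned addresses.
    free: Vec<u64>,
}

impl ToyBus {
    /// Device other than `except` whose BAR intersects [base, base + BAR_SIZE).
    fn occupant(&self, base: u64, except: &str) -> Option<&'static str> {
        for (name, b) in &self.bars {
            if *name != except && base < *b + BAR_SIZE && *b < base + BAR_SIZE {
                return Some(*name);
            }
        }
        None
    }

    /// Move `dev` to `new`; if `new` is occupied, relocate the occupant first.
    fn move_bar_with_swap(&mut self, dev: &'static str, new: u64) -> Result<(), &'static str> {
        if !self.bars.contains_key(dev) {
            return Err("NoSuchDevice");
        }
        if let Some(victim) = self.occupant(new, dev) {
            // Fresh address C for the occupant D2; fail before touching anything.
            let spare = self.free.pop().ok_or("NoSpareRange")?;
            // Stands in for both remapping D2 and rewriting its config register.
            self.bars.insert(victim, spare);
        }
        // Only now complete the original A -> B move.
        self.bars.insert(dev, new);
        Ok(())
    }
}
```

In this model, moving `vsock` onto `rng`'s address succeeds: `rng` ends up at the spare address and `vsock` at the requested one. A real implementation would also have to handle partial overlap, multiple occupants, and a device that is mid-I/O when it is relocated, none of which this sketch attempts.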

**Defensive branch**

Mitigation 1 (sparse initial layout) is on a dedicated branch off `cloud-hypervisor/main`, ready as a PR if useful. We suspect the proper fix is option 2, but option 1 is what we ship on our dev fork today since it deterministically avoids the deadlock in our reproducer:

- Branch: https://github.com/cocoonstack/cloud-hypervisor/tree/fix/win-bar-rebalance
- `virtio: 8 MiB-aligned initial BAR placement` — sparse initial layout
- `virtio: relax BAR alignment on restore` — snapshot/restore needs the natural BAR alignment, since the guest may have rewritten BARs to non-aligned addresses post-rebalance
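
For a sense of what the 8 MiB alignment does to the initial placement, the sketch below places BARs top-down with a configurable alignment. It is not the code on the branch: `top_down_layout` is a hypothetical helper, and the `top_end` value used in the example is an assumption derived from the reproducer (end of the topmost packed BAR), not a value read from CH.

```rust
const BAR_SIZE: u64 = 512 * 1024; // 512 KiB capability BAR
const SPARSE_ALIGN: u64 = 8 * 1024 * 1024; // 8 MiB alignment from mitigation 1

/// Place `count` BARs downward from `top_end` (exclusive), each base aligned
/// down to `align`, which must be a power of two.
fn top_down_layout(top_end: u64, count: u64, align: u64) -> Vec<u64> {
    let mut cursor = top_end;
    let mut bases = Vec::new();
    for _ in 0..count {
        let base = (cursor - BAR_SIZE) & !(align - 1);
        bases.push(base);
        cursor = base; // the next BAR must end at or below this base
    }
    bases
}
```

With `top_end = 0x3fffffe80000`, an alignment of `BAR_SIZE` reproduces the packed layout from the reproducer, while `SPARSE_ALIGN` places the first BAR at `0x3fffff800000` and each following one 8 MiB lower, leaving 7.5 MiB of unused address space between neighbours.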

**Logs**

The 6-retry pattern in the error log above is deterministic when the rebalance hits a conflict. The deadlock is observable as: (a) CH process at 90+ % CPU continuously, (b) `vm.info` shows `pci_devices_down == 0` and all 5 BARs at their original addresses, (c) no further log entries after the 6th retry, (d) no progress on vsock or guest agent indefinitely.
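
For anyone triaging their own logs, the retry pattern can be detected mechanically. The sketch below assumes the WARN line format shown in the error log above stays stable; `parse_failed_move` and `repeated_failures` are hypothetical helpers, not part of CH.

```rust
/// Extracts (old, new, size) from a line containing
/// "failed allocating new MMIO range: 0xOLD->0xNEW(0xSIZE)".
fn parse_failed_move(line: &str) -> Option<(u64, u64, u64)> {
    let rest = line.split("failed allocating new MMIO range: ").nth(1)?;
    let (old, rest) = rest.split_once("->")?;
    let (new, rest) = rest.split_once('(')?;
    let (size, _) = rest.split_once(')')?;
    let hex = |s: &str| u64::from_str_radix(s.trim().trim_start_matches("0x"), 16).ok();
    Some((hex(old)?, hex(new)?, hex(size)?))
}

/// Counts log lines that report the same rejected move as the first one found.
fn repeated_failures(lines: &[&str]) -> usize {
    let mut moves = lines.iter().filter_map(|l| parse_failed_move(l));
    match moves.next() {
        Some(first) => 1 + moves.filter(|m| *m == first).count(),
        None => 0,
    }
}
```

A count of 6 for a single old/new pair, followed by silence in the log, matches the signature described above.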


