Unable to boot VM on ARM64 mounting filesystem through virtiofsd #4314
Comments
Hi @DennisHung, good catch. I'm trying to debug this issue and will give an update later.
@jongwu let me know if you have any findings or additional experiments you want me to try.
Hello @DennisHung, I raised a PR to fix this issue, see this and give it a try.
The correct fix might be to make the Cloud Hypervisor side work with fewer queues. I think we did this before. See a105089
Set the default number of queues to 1. Which matches with the spec and also aligns with our testing. Fixes: cloud-hypervisor#4314 Signed-off-by: Rob Bradford <robert.bradford@intel.com>
@rbradford based on the spec, there are at least 2 queues, the hiprio queue and the request queue.
Yes, I can see the 1.3 spec was reworded vs the 1.2 spec, which makes it obvious that at least two queues are needed.
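For reference, the queue layout the spec describes puts the high-priority queue first and at least one request queue after it. The short Rust sketch below only restates that layout with illustrative constants; the names are not identifiers from Cloud Hypervisor or virtiofsd.

```rust
// Illustrative constants only, not taken from either codebase. Per the
// virtio-fs section of the spec, queue index 0 is the high-priority
// (hiprio) queue and request queues start at index 1, so a conforming
// device exposes at least two queues.
const HIPRIO_QUEUE_INDEX: usize = 0;
const FIRST_REQUEST_QUEUE_INDEX: usize = 1;
const MIN_VIRTIO_FS_QUEUES: usize = 2;

fn main() {
    assert_eq!(FIRST_REQUEST_QUEUE_INDEX, HIPRIO_QUEUE_INDEX + 1);
    println!("virtio-fs needs at least {MIN_VIRTIO_FS_QUEUES} queues");
}
```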
@rbradford @sboeuf @jongwu, can you point me to which fix I should try first?
I tried @rbradford's change but the firmware boot gets stuck at virtio-fs probing. @DennisHung you can check it as well.
As a workaround for @DennisHung, it's totally acceptable to build the FW without the virtio-fs driver. But long term we need to understand where the issue comes from and fix it in the right repo.
@rbradford after doing a bit of research, I think the problem comes from the fact that the FW only enables one queue, the request one on queue index 1.
I think setting …
Because the backend expects queue index 0 to be the hiprio queue.
@rbradford I think the current design we have in Cloud Hypervisor only covers the case where we ask for multiple queues but only get a subset enabled by the guest. And that's because we don't expect some queues to be disabled in the middle of the enabled ones.
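To make the failure mode concrete, here is a simplified sketch with placeholder types (not Cloud Hypervisor's actual queue handling): filtering the disabled queues out of a plain vector loses the original queue index, so the only enabled queue, the request queue, ends up looking like queue 0 to the backend.

```rust
// Placeholder types for illustration only. Per virtio-fs, queue 0 is the
// hiprio queue and queue 1 is the request queue; the firmware only enables
// the request queue.
struct Queue {
    ready: bool,
    name: &'static str,
}

fn main() {
    let all_queues = vec![
        Queue { ready: false, name: "hiprio" },  // index 0, left disabled
        Queue { ready: true, name: "requestq" }, // index 1, enabled by the FW
    ];

    // Dropping disabled queues without remembering their original index...
    let enabled: Vec<Queue> = all_queues.into_iter().filter(|q| q.ready).collect();

    // ...leaves the request queue at slot 0 of the vector, so a backend that
    // assumes "slot 0 == hiprio" processes requests on the wrong queue.
    println!("slot 0 now holds: {}", enabled[0].name);
}
```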
Some more questions: …
I'm wondering exactly the same thing! I have no idea if the backend is meant to work without the hiprio queue.
(Architecturally I think preserving the queue index in a vector of tuples of (queue index, queue) would be the cleaner approach.)
Yes, I agree, I like this approach better since we know the list of queues only contains enabled queues. About the tuple, are you thinking about passing it everywhere? Or would you rather use a wrapping structure/type?
A tuple is cleaner as you can continue to nicely iterate over the vector.
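A minimal sketch of that idea, again with placeholder types rather than the real virtio-queue or EventFd ones: pairing each queue with its original index before filtering keeps the information the backend needs, and the resulting vector still iterates nicely.

```rust
// Placeholder types for illustration; the real tuple would also carry the
// actual virtio Queue and its eventfd.
struct Queue {
    ready: bool,
}

fn main() {
    let all_queues = vec![
        Queue { ready: false }, // index 0: hiprio, left disabled by the FW
        Queue { ready: true },  // index 1: request queue
    ];

    // Keep the original queue index alongside each enabled queue.
    let enabled: Vec<(usize, Queue)> = all_queues
        .into_iter()
        .enumerate()
        .filter(|(_, q)| q.ready)
        .collect();

    // The backend can now be told "this is queue 1", not "this is queue 0".
    for (index, _queue) in &enabled {
        println!("enabling queue at original index {index}");
    }
}
```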
Just a note that I've been investigating this issue all day and there are multiple aspects that make it not work. The way the FW works, it polls to check if a descriptor has been used, but unfortunately if we don't provide an eventfd to …
@sboeuf @rbradford @jongwu I'm able to reproduce the same outcomes as @jongwu. The only working workaround is disabling the virtio-fs driver in the edk2 firmware. The workaround is good enough to unblock my application, so I can use it for now and migrate to the correct solution later.
I've been able to boot with the EDK2 firmware (with the VirtioFs driver) by fixing things both in Cloud Hypervisor and virtiofsd.
@sboeuf @jongwu, I saw another new issue after pulling the workaround from @jongwu. Pasting the steps and outcome from the experiment below and looking for suggestions. Thanks. Experiment steps:
sudo ./virtiofsd --socket-path /tmp/virtiofsd.socket --shared-dir /tmp/test1
9. VM stuck <-- This is an issue since I have to mount multiple volumes.
Hi @DennisHung, I have not reproduced this new issue. I think it has no relationship with the firmware, as the kernel controls the whole thing at that point. Can you test it with direct kernel boot, using the same Linux kernel as for the firmware boot?
@DennisHung I can't reproduce it either (I'm using the setup I've described), and here is the output I get: $ df -aTh -t virtiofs
Filesystem Type Size Used Avail Use% Mounted on
myfs virtiofs 468G 296G 149G 67% /mnt
The 5.4.0 kernel is pretty old. It's possible that its virtio-fs support is not as mature as in current kernels.
@rbradford @sboeuf @jongwu I found the issue is gone after updating the virtiofsd driver. Let's focus on the original issue now. What's the plan for releasing the fix?
@DennisHung, if you mean "removing the virtio-fs driver" in edk2 for aarch64/CloudHv, I will send the change upstream later.
Here is the status:
diff --git a/Cargo.toml b/Cargo.toml
index c2a72f4..f049f46 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -20,13 +20,14 @@ libc = "~0.2.120"
log = "0.4"
libseccomp-sys = "0.2"
structopt = "0.3"
-vhost-user-backend = "0.5"
+vhost-user-backend = { git = "https://github.com/rust-vmm/vhost-user-backend" }
vhost = "0.4"
virtio-bindings = { version = "0.1", features = ["virtio-v5_0_0"] }
vm-memory = { version = ">=0.7", features = ["backend-mmap", "backend-atomic"] }
virtio-queue = "0.4"
-vmm-sys-util = "0.9"
+vmm-sys-util = "0.10"
syslog = "6.0"
[profile.release]
lto = true
and running …
OK, thanks @sboeuf
@sboeuf is there a virtiofsd branch that has pulled in this change? If so, it would make it much easier to point our CI at another commit/branch rather than at a new private fork. I have forked a private repo to try out this change with private builds of virtiofsd & cloud-hypervisor, and it works. I am not seeing the queues issue anymore.
Not yet. We need a release to be cut on vhost-user-backend first.
Just a heads up that …
https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/129 has been merged, therefore the main branch of virtiofsd now has the change.
Thank you @sboeuf, @rbradford, @jongwu, your efforts are much appreciated!
Hi – I am unable to boot a VM with Cloud-Hypervisor with the filesystem mounted by virtiofsd on an ARM64 device using EDK2 following the instructions at:
cloud-hypervisor/arm64.md at main · cloud-hypervisor/cloud-hypervisor (github.com)
Note: The VM boots fine without trying to mount the filesystem, i.e. without specifying the --fs option to cloud-hypervisor. I also tried with different VM images (ubuntu, debian, archlinux) and got the same error when using the --fs option.
Versions:
Cloud-hypervisor: v25.0
Virtiofsd: v1.3.0
Linux Kernel: 5.17
I built cloud-hypervisor, virtiofsd and EDK2 (CLOUDHV_EFI.fd) on the ARM64 device, and am trying to boot a VM image (focal-server-cloudimg-arm64.raw) using the following steps:
Files used: cloud-hypervisor, virtiofsd, CLOUDHV_EFI.fd, focal-server-cloudimg-arm64.raw
sudo ./virtiofsd --socket-path /tmp/virtiofsd.socket --shared-dir /tmp/test1
Output:
[2022-07-14T18:22:07Z INFO virtiofsd] Waiting for vhost-user socket connection...
Below is the output after step 3 is executed:
[2022-07-14T18:22:21Z INFO virtiofsd] Client connected, servicing requests
[2022-07-14T18:22:25Z INFO virtiofsd] Client disconnected, shutting down
sudo RUST_BACKTRACE=1 ./cloud-hypervisor -v \
--api-socket /tmp/cloud-hypervisor.socket \
--kernel ./CLOUDHV_EFI.fd \
--disk path=./focal-server-cloudimg-arm64.raw \
--cpus boot=1 \
--fs tag=ip_tag,socket=/tmp/virtiofsd.socket \
--memory size=1024M,shared=true \
--serial tty --console off
Below is the output and the error that I see (the error is near the end):
cloud-hypervisor: 11.720533ms: INFO:vmm/src/lib.rs:1611 -- API request event: VmCreate(Mutex { data: VmConfig { cpus: CpusConfig { boot_vcpus: 1, max_vcpus: 1, topology: None, kvm_hyperv: false, max_phys_bits: 46, affinity: None, features: CpuFeatures }, memory: MemoryConfig { size: 1073741824, mergeable: false, hotplug_method: Acpi, hotplug_size: None, hotplugged_size: None, shared: true, hugepages: false, hugepage_size: None, prefault: false, zones: None }, kernel: Some(KernelConfig { path: "./CLOUDHV_EFI.fd" }), initramfs: None, cmdline: CmdlineConfig { args: "" }, disks: Some([DiskConfig { path: Some("./focal-server-cloudimg-arm64.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: None, disable_io_uring: false, pci_segment: 0 }]), net: None, rng: RngConfig { src: "/dev/urandom", iommu: false }, balloon: None, fs: Some([FsConfig { tag: "ip_file_path", socket: "/tmp/virtiofsd.socket", num_queues: 1, queue_size: 1024, id: None, pci_segment: 0 }]), pmem: None, serial: ConsoleConfig { file: None, mode: Null, iommu: false }, console: ConsoleConfig { file: None, mode: Tty, iommu: false }, devices: None, user_devices: None, vdpa: None, vsock: None, iommu: false, numa: None, watchdog: false, platform: None }, poisoned: false, .. }, Sender { .. })
cloud-hypervisor: 15.084085ms: INFO:vmm/src/lib.rs:1611 -- API request event: VmBoot(Sender { .. })
cloud-hypervisor: 17.236596ms: INFO:vmm/src/memory_manager.rs:1487 -- Creating userspace mapping: 40000000 -> ffff60000000 40000000, slot 0
cloud-hypervisor: 18.856473ms: INFO:vmm/src/memory_manager.rs:1521 -- Created userspace mapping: 40000000 -> ffff60000000 40000000
cloud-hypervisor: 18.987041ms: INFO:vmm/src/vm.rs:524 -- Booting VM from config: Mutex { data: VmConfig { cpus: CpusConfig { boot_vcpus: 1, max_vcpus: 1, topology: None, kvm_hyperv: false, max_phys_bits: 46, affinity: None, features: CpuFeatures }, memory: MemoryConfig { size: 1073741824, mergeable: false, hotplug_method: Acpi, hotplug_size: None, hotplugged_size: None, shared: true, hugepages: false, hugepage_size: None, prefault: false, zones: None }, kernel: Some(KernelConfig { path: "./CLOUDHV_EFI.fd" }), initramfs: None, cmdline: CmdlineConfig { args: "" }, disks: Some([DiskConfig { path: Some("./focal-server-cloudimg-arm64.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: None, disable_io_uring: false, pci_segment: 0 }]), net: None, rng: RngConfig { src: "/dev/urandom", iommu: false }, balloon: None, fs: Some([FsConfig { tag: "ip_file_path", socket: "/tmp/virtiofsd.socket", num_queues: 1, queue_size: 1024, id: None, pci_segment: 0 }]), pmem: None, serial: ConsoleConfig { file: None, mode: Null, iommu: false }, console: ConsoleConfig { file: None, mode: Tty, iommu: false }, devices: None, user_devices: None, vdpa: None, vsock: None, iommu: false, numa: None, watchdog: false, platform: None }, poisoned: false, .. }
cloud-hypervisor: 20.676707ms: INFO:vmm/src/pci_segment.rs:90 -- Adding PCI segment: id=0, PCI MMIO config address: 0x30000000, device area [0x100000000-0xfeffffffff
cloud-hypervisor: 23.750323ms: INFO:vmm/src/device_manager.rs:2042 -- Creating virtio-block device: DiskConfig { path: Some("./focal-server-cloudimg-arm64.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk0"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 51.19658ms: INFO:vmm/src/device_manager.rs:2115 -- Using asynchronous RAW disk file (io_uring)
cloud-hypervisor: 51.277514ms: INFO:virtio-devices/src/block.rs:445 -- Disk topology: DiskTopology { logical_block_size: 512, physical_block_size: 512, minimum_io_size: 512, optimal_io_size: 0 }
cloud-hypervisor: 51.362406ms: INFO:vmm/src/device_manager.rs:2347 -- Creating virtio-rng device: RngConfig { src: "/dev/urandom", iommu: false }
cloud-hypervisor: 51.488807ms: INFO:vmm/src/device_manager.rs:2395 -- Creating virtio-fs device: FsConfig { tag: "ip_file_path", socket: "/tmp/virtiofsd.socket", num_queues: 1, queue_size: 1024, id: Some("_fs1"), pci_segment: 0 }
cloud-hypervisor: 63.856343ms: INFO:vmm/src/vm.rs:2089 -- Booting VM
cloud-hypervisor: 73.926999ms: INFO:vmm/src/cpu.rs:768 -- Request to create new vCPUs: desired = 1, max = 1, allocated = 0, present = 0
cloud-hypervisor: 73.976372ms: INFO:vmm/src/cpu.rs:734 -- Creating vCPU: cpu_id = 0
cloud-hypervisor: 74.060691ms: INFO:vmm/src/cpu.rs:323 -- Configuring vCPU: cpu_id = 0
cloud-hypervisor: 74.44505ms: INFO:vmm/src/acpi.rs:772 -- Generated ACPI tables: took 355µs size = 5631
cloud-hypervisor: 74.476455ms: INFO:vmm/src/vm.rs:2066 -- Created ACPI tables: rsdp_addr = 0x40200000
cloud-hypervisor: 107.60736ms: INFO:vmm/src/cpu.rs:1082 -- Starting vCPUs: desired = 1, allocated = 1, present = 0
cloud-hypervisor: 107.696262ms: INFO:vmm/src/cpu.rs:871 -- Starting vCPU: cpu_id = 0
cloud-hypervisor: 186.764965ms: INFO:pci/src/configuration.rs:850 -- Detected BAR reprogramming: (BAR 4) 0x2ff80000->0x10000000
cloud-hypervisor: 191.851932ms: INFO:pci/src/configuration.rs:876 -- Detected BAR reprogramming: (BAR 5) 0xfe->0x2
cloud-hypervisor: 193.113908ms: INFO:pci/src/configuration.rs:876 -- Detected BAR reprogramming: (BAR 5) 0xfe->0x2
cloud-hypervisor: 193.951737ms: INFO:pci/src/configuration.rs:876 -- Detected BAR reprogramming: (BAR 5) 0xfe->0x2
cloud-hypervisor: 3.216324121s: INFO:virtio-devices/src/transport/pci_device.rs:1096 -- _virtio-pci-_disk0: Needs activation; writing to activate event fd
cloud-hypervisor: 3.216439689s: INFO:virtio-devices/src/transport/pci_device.rs:1101 -- _virtio-pci-_disk0: Needs activation; returning barrier
cloud-hypervisor: 3.216474583s: INFO:vmm/src/vm.rs:396 -- Waiting for barrier
cloud-hypervisor: 3.216530726s: INFO:vmm/src/lib.rs:1596 -- Trying to activate pending virtio devices: count = 1
cloud-hypervisor: 3.216630253s: INFO:virtio-devices/src/block.rs:526 -- Changing cache mode to writeback
cloud-hypervisor: 3.232045365s: INFO:virtio-devices/src/transport/pci_device.rs:312 -- _virtio-pci-_disk0: Waiting for barrier
cloud-hypervisor: 3.232326187s: INFO:virtio-devices/src/transport/pci_device.rs:314 -- _virtio-pci-_disk0: Barrier released
cloud-hypervisor: 3.232791271s: INFO:vmm/src/vm.rs:398 -- Barrier released
cloud-hypervisor: 3.25131682s: INFO:virtio-devices/src/transport/pci_device.rs:1096 -- _virtio-pci-__rng: Needs activation; writing to activate event fd
cloud-hypervisor: 3.251357444s: INFO:virtio-devices/src/transport/pci_device.rs:1101 -- _virtio-pci-__rng: Needs activation; returning barrier
cloud-hypervisor: 3.251368277s: INFO:vmm/src/vm.rs:396 -- Waiting for barrier
cloud-hypervisor: 3.251399577s: INFO:vmm/src/lib.rs:1596 -- Trying to activate pending virtio devices: count = 1
cloud-hypervisor: 3.252102307s: INFO:virtio-devices/src/transport/pci_device.rs:312 -- _virtio-pci-__rng: Waiting for barrier
cloud-hypervisor: 3.252220323s: INFO:virtio-devices/src/transport/pci_device.rs:314 -- _virtio-pci-__rng: Barrier released
cloud-hypervisor: 3.252391461s: INFO:vmm/src/vm.rs:398 -- Barrier released
cloud-hypervisor: 3.253086796s: INFO:virtio-devices/src/transport/pci_device.rs:1096 -- _virtio-pci-_fs1: Needs activation; writing to activate event fd
cloud-hypervisor: 3.253130856s: INFO:virtio-devices/src/transport/pci_device.rs:1101 -- _virtio-pci-_fs1: Needs activation; returning barrier
cloud-hypervisor: 3.253142887s: INFO:vmm/src/vm.rs:396 -- Waiting for barrier
cloud-hypervisor: 3.253155178s: INFO:vmm/src/lib.rs:1596 -- Trying to activate pending virtio devices: count = 1
cloud-hypervisor: 3.25327684s: ERROR:virtio-devices/src/device.rs:268 -- Number of enabled queues lower than min: 1 vs 2
cloud-hypervisor: 3.253821347s: <_disk0_q0> INFO:virtio-devices/src/epoll_helper.rs:135 -- KILL_EVENT received, stopping epoll loop
cloud-hypervisor: 3.253920666s: <__rng> INFO:virtio-devices/src/epoll_helper.rs:135 -- KILL_EVENT received, stopping epoll loop
VMM thread exited with error: Error activating virtio devices: ActivateVirtioDevices(VirtioActivate(BadActivate))
Any suggestion is appreciated. Thanks!