
Unable to boot VM on ARM64 mounting filesystem through virtiofsd #4314

Closed
DennisHung opened this issue Jul 14, 2022 · 34 comments

@DennisHung

Hi, I am unable to boot a VM with Cloud Hypervisor with a filesystem mounted via virtiofsd on an ARM64 device using EDK2, following the instructions at:

cloud-hypervisor/arm64.md at main · cloud-hypervisor/cloud-hypervisor (github.com)

Note: The VM boots fine without trying to mount the filesystem, i.e. without specifying the --fs option to cloud-hypervisor. I also tried different VM images (Ubuntu, Debian, Arch Linux) and got the same error when using the --fs option.

Versions:

Cloud-hypervisor: v25.0
Virtiofsd: v1.3.0
Linux Kernel: 5.17

I built cloud-hypervisor, virtiofsd and EDK2 (CLOUDHV_EFI.fd) on the ARM64 device, and am trying to boot a VM image (focal-server-cloudimg-arm64.raw) using the following steps:

  1. Copy the following binaries to a folder:

cloud-hypervisor, virtiofsd, CLOUDHV_EFI.fd, focal-server-cloudimg-arm64.raw

  2. Start virtiofsd:

sudo ./virtiofsd --socket-path /tmp/virtiofsd.socket --shared-dir /tmp/test1

Output:

[2022-07-14T18:22:07Z INFO virtiofsd] Waiting for vhost-user socket connection...

The following appears after step 3 is executed:

[2022-07-14T18:22:21Z INFO virtiofsd] Client connected, servicing requests
[2022-07-14T18:22:25Z INFO virtiofsd] Client disconnected, shutting down

  3. Use Cloud Hypervisor to boot the VM:

sudo RUST_BACKTRACE=1 ./cloud-hypervisor -v \
--api-socket /tmp/cloud-hypervisor.socket \
--kernel ./CLOUDHV_EFI.fd \
--disk path=./focal-server-cloudimg-arm64.raw \
--cpus boot=1 \
--fs tag=ip_tag,socket=/tmp/virtiofsd.socket \
--memory size=1024M,shared=true \
--serial tty --console off

Below is the output and the error that I see (the error appears in the last few lines):

cloud-hypervisor: 11.720533ms: INFO:vmm/src/lib.rs:1611 -- API request event: VmCreate(Mutex { data: VmConfig { cpus: CpusConfig { boot_vcpus: 1, max_vcpus: 1, topology: None, kvm_hyperv: false, max_phys_bits: 46, affinity: None, features: CpuFeatures }, memory: MemoryConfig { size: 1073741824, mergeable: false, hotplug_method: Acpi, hotplug_size: None, hotplugged_size: None, shared: true, hugepages: false, hugepage_size: None, prefault: false, zones: None }, kernel: Some(KernelConfig { path: "./CLOUDHV_EFI.fd" }), initramfs: None, cmdline: CmdlineConfig { args: "" }, disks: Some([DiskConfig { path: Some("./focal-server-cloudimg-arm64.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: None, disable_io_uring: false, pci_segment: 0 }]), net: None, rng: RngConfig { src: "/dev/urandom", iommu: false }, balloon: None, fs: Some([FsConfig { tag: "ip_file_path", socket: "/tmp/virtiofsd.socket", num_queues: 1, queue_size: 1024, id: None, pci_segment: 0 }]), pmem: None, serial: ConsoleConfig { file: None, mode: Null, iommu: false }, console: ConsoleConfig { file: None, mode: Tty, iommu: false }, devices: None, user_devices: None, vdpa: None, vsock: None, iommu: false, numa: None, watchdog: false, platform: None }, poisoned: false, .. }, Sender { .. })
cloud-hypervisor: 15.084085ms: INFO:vmm/src/lib.rs:1611 -- API request event: VmBoot(Sender { .. })
cloud-hypervisor: 17.236596ms: INFO:vmm/src/memory_manager.rs:1487 -- Creating userspace mapping: 40000000 -> ffff60000000 40000000, slot 0
cloud-hypervisor: 18.856473ms: INFO:vmm/src/memory_manager.rs:1521 -- Created userspace mapping: 40000000 -> ffff60000000 40000000
cloud-hypervisor: 18.987041ms: INFO:vmm/src/vm.rs:524 -- Booting VM from config: Mutex { data: VmConfig { cpus: CpusConfig { boot_vcpus: 1, max_vcpus: 1, topology: None, kvm_hyperv: false, max_phys_bits: 46, affinity: None, features: CpuFeatures }, memory: MemoryConfig { size: 1073741824, mergeable: false, hotplug_method: Acpi, hotplug_size: None, hotplugged_size: None, shared: true, hugepages: false, hugepage_size: None, prefault: false, zones: None }, kernel: Some(KernelConfig { path: "./CLOUDHV_EFI.fd" }), initramfs: None, cmdline: CmdlineConfig { args: "" }, disks: Some([DiskConfig { path: Some("./focal-server-cloudimg-arm64.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: None, disable_io_uring: false, pci_segment: 0 }]), net: None, rng: RngConfig { src: "/dev/urandom", iommu: false }, balloon: None, fs: Some([FsConfig { tag: "ip_file_path", socket: "/tmp/virtiofsd.socket", num_queues: 1, queue_size: 1024, id: None, pci_segment: 0 }]), pmem: None, serial: ConsoleConfig { file: None, mode: Null, iommu: false }, console: ConsoleConfig { file: None, mode: Tty, iommu: false }, devices: None, user_devices: None, vdpa: None, vsock: None, iommu: false, numa: None, watchdog: false, platform: None }, poisoned: false, .. }
cloud-hypervisor: 20.676707ms: INFO:vmm/src/pci_segment.rs:90 -- Adding PCI segment: id=0, PCI MMIO config address: 0x30000000, device area [0x100000000-0xfeffffffff
cloud-hypervisor: 23.750323ms: INFO:vmm/src/device_manager.rs:2042 -- Creating virtio-block device: DiskConfig { path: Some("./focal-server-cloudimg-arm64.raw"), readonly: false, direct: false, iommu: false, num_queues: 1, queue_size: 128, vhost_user: false, vhost_socket: None, poll_queue: true, rate_limiter_config: None, id: Some("_disk0"), disable_io_uring: false, pci_segment: 0 }
cloud-hypervisor: 51.19658ms: INFO:vmm/src/device_manager.rs:2115 -- Using asynchronous RAW disk file (io_uring)
cloud-hypervisor: 51.277514ms: INFO:virtio-devices/src/block.rs:445 -- Disk topology: DiskTopology { logical_block_size: 512, physical_block_size: 512, minimum_io_size: 512, optimal_io_size: 0 }
cloud-hypervisor: 51.362406ms: INFO:vmm/src/device_manager.rs:2347 -- Creating virtio-rng device: RngConfig { src: "/dev/urandom", iommu: false }
cloud-hypervisor: 51.488807ms: INFO:vmm/src/device_manager.rs:2395 -- Creating virtio-fs device: FsConfig { tag: "ip_file_path", socket: "/tmp/virtiofsd.socket", num_queues: 1, queue_size: 1024, id: Some("_fs1"), pci_segment: 0 }
cloud-hypervisor: 63.856343ms: INFO:vmm/src/vm.rs:2089 -- Booting VM
cloud-hypervisor: 73.926999ms: INFO:vmm/src/cpu.rs:768 -- Request to create new vCPUs: desired = 1, max = 1, allocated = 0, present = 0
cloud-hypervisor: 73.976372ms: INFO:vmm/src/cpu.rs:734 -- Creating vCPU: cpu_id = 0
cloud-hypervisor: 74.060691ms: INFO:vmm/src/cpu.rs:323 -- Configuring vCPU: cpu_id = 0
cloud-hypervisor: 74.44505ms: INFO:vmm/src/acpi.rs:772 -- Generated ACPI tables: took 355µs size = 5631
cloud-hypervisor: 74.476455ms: INFO:vmm/src/vm.rs:2066 -- Created ACPI tables: rsdp_addr = 0x40200000
cloud-hypervisor: 107.60736ms: INFO:vmm/src/cpu.rs:1082 -- Starting vCPUs: desired = 1, allocated = 1, present = 0
cloud-hypervisor: 107.696262ms: INFO:vmm/src/cpu.rs:871 -- Starting vCPU: cpu_id = 0
cloud-hypervisor: 186.764965ms: INFO:pci/src/configuration.rs:850 -- Detected BAR reprogramming: (BAR 4) 0x2ff80000->0x10000000
cloud-hypervisor: 191.851932ms: INFO:pci/src/configuration.rs:876 -- Detected BAR reprogramming: (BAR 5) 0xfe->0x2
cloud-hypervisor: 193.113908ms: INFO:pci/src/configuration.rs:876 -- Detected BAR reprogramming: (BAR 5) 0xfe->0x2
cloud-hypervisor: 193.951737ms: INFO:pci/src/configuration.rs:876 -- Detected BAR reprogramming: (BAR 5) 0xfe->0x2
cloud-hypervisor: 3.216324121s: INFO:virtio-devices/src/transport/pci_device.rs:1096 -- _virtio-pci-_disk0: Needs activation; writing to activate event fd
cloud-hypervisor: 3.216439689s: INFO:virtio-devices/src/transport/pci_device.rs:1101 -- _virtio-pci-_disk0: Needs activation; returning barrier
cloud-hypervisor: 3.216474583s: INFO:vmm/src/vm.rs:396 -- Waiting for barrier
cloud-hypervisor: 3.216530726s: INFO:vmm/src/lib.rs:1596 -- Trying to activate pending virtio devices: count = 1
cloud-hypervisor: 3.216630253s: INFO:virtio-devices/src/block.rs:526 -- Changing cache mode to writeback
cloud-hypervisor: 3.232045365s: INFO:virtio-devices/src/transport/pci_device.rs:312 -- _virtio-pci-_disk0: Waiting for barrier
cloud-hypervisor: 3.232326187s: INFO:virtio-devices/src/transport/pci_device.rs:314 -- _virtio-pci-_disk0: Barrier released
cloud-hypervisor: 3.232791271s: INFO:vmm/src/vm.rs:398 -- Barrier released
cloud-hypervisor: 3.25131682s: INFO:virtio-devices/src/transport/pci_device.rs:1096 -- _virtio-pci-__rng: Needs activation; writing to activate event fd
cloud-hypervisor: 3.251357444s: INFO:virtio-devices/src/transport/pci_device.rs:1101 -- _virtio-pci-__rng: Needs activation; returning barrier
cloud-hypervisor: 3.251368277s: INFO:vmm/src/vm.rs:396 -- Waiting for barrier
cloud-hypervisor: 3.251399577s: INFO:vmm/src/lib.rs:1596 -- Trying to activate pending virtio devices: count = 1
cloud-hypervisor: 3.252102307s: INFO:virtio-devices/src/transport/pci_device.rs:312 -- _virtio-pci-__rng: Waiting for barrier
cloud-hypervisor: 3.252220323s: INFO:virtio-devices/src/transport/pci_device.rs:314 -- _virtio-pci-__rng: Barrier released
cloud-hypervisor: 3.252391461s: INFO:vmm/src/vm.rs:398 -- Barrier released
cloud-hypervisor: 3.253086796s: INFO:virtio-devices/src/transport/pci_device.rs:1096 -- _virtio-pci-_fs1: Needs activation; writing to activate event fd
cloud-hypervisor: 3.253130856s: INFO:virtio-devices/src/transport/pci_device.rs:1101 -- _virtio-pci-_fs1: Needs activation; returning barrier
cloud-hypervisor: 3.253142887s: INFO:vmm/src/vm.rs:396 -- Waiting for barrier
cloud-hypervisor: 3.253155178s: INFO:vmm/src/lib.rs:1596 -- Trying to activate pending virtio devices: count = 1
cloud-hypervisor: 3.25327684s: ERROR:virtio-devices/src/device.rs:268 -- Number of enabled queues lower than min: 1 vs 2
cloud-hypervisor: 3.253821347s: <_disk0_q0> INFO:virtio-devices/src/epoll_helper.rs:135 -- KILL_EVENT received, stopping epoll loop
cloud-hypervisor: 3.253920666s: <__rng> INFO:virtio-devices/src/epoll_helper.rs:135 -- KILL_EVENT received, stopping epoll loop
VMM thread exited with error: Error activating virtio devices: ActivateVirtioDevices(VirtioActivate(BadActivate))

Any suggestion is appreciated. Thanks!

@jongwu
Contributor

jongwu commented Jul 15, 2022

Hi @DennisHung, good catch. I'm trying to debug this issue and will give an update later.

@DennisHung
Author

Hi @DennisHung, good catch. I'm trying to debug this issue and will give an update later.

@jongwu, let me know if you have any findings or additional experiments you want me to try.

@jongwu
Contributor

jongwu commented Jul 18, 2022

Hello @DennisHung, I've raised a PR to fix this issue; see this and give it a try.

@rbradford
Member

The correct fix might be to make the Cloud Hypervisor side work with fewer queues. I think we did this before. See a105089

rbradford added a commit to rbradford/cloud-hypervisor that referenced this issue Jul 18, 2022
Set the default number of queues to 1, which matches the spec and also aligns with our testing.

Fixes: cloud-hypervisor#4314

Signed-off-by: Rob Bradford <robert.bradford@intel.com>
@sboeuf
Member

sboeuf commented Jul 18, 2022

@rbradford, based on the spec there are at least 2 queues: the hiprio queue and one request queue.
Based on https://github.com/oasis-tcs/virtio-spec/blob/virtio-1.3/virtio-fs.tex

@rbradford
Member

Yes, I can see the 1.3 spec was reworded vs the 1.2 spec, which makes it obvious that at least two queues are needed.

@DennisHung
Author

@rbradford @sboeuf @jongwu, can you point me to which fix I should try first?

@jongwu
Contributor

jongwu commented Jul 19, 2022

I tried @rbradford's change, but the firmware boot gets stuck at virtio-fs probing. @DennisHung, you can check it as well.
For now, I don't know where the bug resides, CLH or EDK2. Since we don't need virtio-fs during firmware boot, removing the driver on the EDK2 side seems reasonable to me, but we also need to find the root cause.

@sboeuf
Member

sboeuf commented Jul 19, 2022

As a workaround for @DennisHung, it's totally acceptable to build the FW without the virtio-fs driver. But long term we need to understand where the issue comes from and fix it in the right repo.

@sboeuf
Member

sboeuf commented Jul 19, 2022

@rbradford after doing a bit of research, I think the problem comes from the fact that the FW only enables one queue, the request queue on queue index 1. The hiprio queue on queue index 0 isn't enabled, therefore we end up with the error posted by @DennisHung.
We can easily fix one part of the problem by expecting min_queues to be equal to the number of requested request queues rather than the requested request queues plus one hiprio queue.
But then we have another problem: we must be able to tell which queue is enabled and which one isn't. That means we can't pass only the list of enabled queues, because we would lose the information about the queue index. If we go down this path, we would end up modifying every virtio device implementation we have, since we would check whether a queue is enabled before using it.
This issue clearly arises in this context where queue index 0 isn't enabled but queue index 1 is.

@rbradford
Member

rbradford commented Jul 19, 2022

@rbradford after doing a bit of research, I think the problem comes from the fact that the FW only enables one queue, the request queue on queue index 1. The hiprio queue on queue index 0 isn't enabled, therefore we end up with the error posted by @DennisHung. We can easily fix one part of the problem by expecting min_queues to be equal to the number of requested request queues rather than the requested request queues plus one hiprio queue. But then we have another problem: we must be able to tell which queue is enabled and which one isn't. That means we can't pass only the list of enabled queues, because we would lose the information about the queue index. If we go down this path, we would end up modifying every virtio device implementation we have, since we would check whether a queue is enabled before using it. This issue clearly arises in this context where queue index 0 isn't enabled but queue index 1 is.

I think setting min_queues (which is really the minimum number of enabled queues that the backend supports) to 1 is fine. But why do we, in Cloud Hypervisor, care about which queue is which? Don't we just pass them to the backend... for it to decide?

@sboeuf
Member

sboeuf commented Jul 19, 2022

I think setting min_queues (which is really the minimum number of enabled queues that the backend supports) to 1 is fine. But why do we, in Cloud Hypervisor, care about which queue is which? Don't we just pass them to the backend... for it to decide?

Because the backend expects queue index 0 to be hiprio and queue index 1 to be the request queue. And the vhost-user spec expects the VMM to provide the queue index as well.

@sboeuf
Member

sboeuf commented Jul 19, 2022

@rbradford I think the current design we have in Cloud Hypervisor only covers the case where we can ask for multiple queues but only get a subset enabled by the guest. And that's because we don't expect some queues to be disabled in the middle of the enabled ones.

@rbradford
Member

I think setting min_queues (which is really the minimum number of enabled queues that the backend supports) to 1 is fine. But why do we, in Cloud Hypervisor, care about which queue is which? Don't we just pass them to the backend... for it to decide?

Because the backend expects queue index 0 to be hiprio and queue index 1 to be the request queue. And the vhost-user spec expects the VMM to provide the queue index as well.

Some more questions:

  1. Does the spec actually support just having a request queue?
  2. By backend I meant virtiofsd. If the daemon doesn't support having just 1 queue enabled then when would this edk2 implementation ever work? I always struggle with vhost-user related code but I couldn't see any logic in our code that cared about the meaning of specific queues.

@sboeuf
Member

sboeuf commented Jul 19, 2022

I think setting min_queues (which is really the minimum number of enabled queues that the backend supports) to 1 is fine. But why do we, in Cloud Hypervisor, care about which queue is which? Don't we just pass them to the backend... for it to decide?

Because the backend expects queue index 0 to be hiprio and queue index 1 to be the request queue. And the vhost-user spec expects the VMM to provide the queue index as well.

Some more questions:

1. Does the spec actually support just having a request queue?

2. By backend I meant virtiofsd. If the daemon doesn't support having just 1 queue enabled then when would this edk2 implementation ever work?  I always struggle with vhost-user related code but I couldn't see any logic in our code that cared about the meaning of specific queues.

I'm wondering exactly the same thing! I have no idea if the backend is meant to work without the hiprio queue, and I don't think the spec gives any details about this.

@rbradford
Member

(Architecturally I think preserving the queue index in a vector of tuples of (index, queue) is neat and might be useful elsewhere.)

@sboeuf
Member

sboeuf commented Jul 19, 2022

(Architecturally I think preserving the queue index in a vector of tuples of (index, queue) is neat and might be useful elsewhere.)

Yes, I agree, I like this approach better since we know the list of queues only contains enabled queues. About the tuple, are you thinking about passing it everywhere? Or would you rather use a wrapping structure/type?

@rbradford
Member

rbradford commented Jul 19, 2022

A tuple is cleaner as you can continue to nicely iterate over the vector.
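
(For illustration, a minimal sketch of that idea using hypothetical types rather than the actual Cloud Hypervisor code: only the queues the guest enabled are kept, and each entry carries its original virtqueue index, so a vhost-user backend such as virtiofsd can still tell hiprio (index 0) apart from the request queue (index 1).)

// Sketch only: hypothetical types, not the actual Cloud Hypervisor code.
#[derive(Debug)]
struct Queue {
    ready: bool, // set once the guest has enabled this virtqueue
}

// Keep only the queues the guest enabled, preserving their virtqueue indices.
fn enabled_queues(queues: Vec<Queue>) -> Vec<(usize, Queue)> {
    queues
        .into_iter()
        .enumerate()
        .filter(|(_, q)| q.ready)
        .collect()
}

fn main() {
    // virtio-fs in this scenario: index 0 = hiprio (left disabled by the firmware),
    // index 1 = request queue (enabled).
    let queues = vec![Queue { ready: false }, Queue { ready: true }];

    for (index, queue) in enabled_queues(queues) {
        // A vhost-user VMM would send the SET_VRING_* messages using `index`,
        // so the backend still sees this as virtqueue 1, not virtqueue 0.
        println!("activating virtqueue {index}: {queue:?}");
    }
}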

@sboeuf
Member

sboeuf commented Jul 19, 2022

Just a note that I've been investigating this issue all day and there are multiple aspects that make it not work. The way the FW works, it polls to check whether a descriptor has been used, but unfortunately, if we don't provide an eventfd to virtiofsd for the interrupt, the vring is never activated. So this is something to fix in virtiofsd. There's also the piece in Cloud Hypervisor about making sure we know the queue index associated with a queue (as mentioned previously in this thread).
And even after all these things, I still can't get the FW to complete its boot. I might spend more time on this issue, but in the short term, I really advise anyone using the FW with virtio-fs to simply remove the VirtioFs driver from both the .dsc and .fdf files.
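
(For anyone applying that short-term workaround, it amounts to deleting the VirtioFs driver entries from the CloudHv platform files and rebuilding the firmware. A rough sketch follows; the file names and the driver path, assumed here to be ArmVirtPkg/ArmVirtCloudHv.dsc, ArmVirtPkg/ArmVirtCloudHv.fdf and OvmfPkg/VirtioFsDxe/VirtioFsDxe.inf, may differ in your EDK2 tree.)

--- a/ArmVirtPkg/ArmVirtCloudHv.dsc
+++ b/ArmVirtPkg/ArmVirtCloudHv.dsc
@@
-  OvmfPkg/VirtioFsDxe/VirtioFsDxe.inf

--- a/ArmVirtPkg/ArmVirtCloudHv.fdf
+++ b/ArmVirtPkg/ArmVirtCloudHv.fdf
@@
-  INF OvmfPkg/VirtioFsDxe/VirtioFsDxe.inf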

@DennisHung
Author

I tried @rbradford's change, but the firmware boot gets stuck at virtio-fs probing. @DennisHung, you can check it as well. For now, I don't know where the bug resides, CLH or EDK2. Since we don't need virtio-fs during firmware boot, removing the driver on the EDK2 side seems reasonable to me, but we also need to find the root cause.

@sboeuf @rbradford @jongwu, I'm able to reproduce the same outcome as @jongwu. The only working workaround is disabling the virtio-fs driver in the EDK2 firmware. That workaround is good enough to unblock my application, so I can use it now and migrate to the correct solution later.

@sboeuf
Member

sboeuf commented Jul 20, 2022

I've been able to boot with the EDK2 firmware (with the VirtioFs driver) by fixing things both in Cloud Hypervisor and vhost-user-backend.
The patches for Cloud Hypervisor have been submitted through #4341 and the patches for vhost-user-backend have been submitted through rust-vmm/vhost-user-backend#77.
One more thing to note: once the vhost-user-backend PR is merged, the virtiofsd component will have to be updated so that it points to the latest version of the vhost-user-backend crate.

sboeuf self-assigned this Jul 20, 2022
@DennisHung
Author

I tried @rbradford's change, but the firmware boot gets stuck at virtio-fs probing. @DennisHung, you can check it as well. For now, I don't know where the bug resides, CLH or EDK2. Since we don't need virtio-fs during firmware boot, removing the driver on the EDK2 side seems reasonable to me, but we also need to find the root cause.

@sboeuf @rbradford @jongwu, I'm able to reproduce the same outcome as @jongwu. The only working workaround is disabling the virtio-fs driver in the EDK2 firmware. That workaround is good enough to unblock my application, so I can use it now and migrate to the correct solution later.

@sboeuf @jongwu, I'm seeing another new issue after pulling in the workaround from @jongwu. I'm pasting the steps and the outcome from my experiment below and am looking for suggestions. Thanks:

Experiment Steps:

  1. Start virtiofsd:

sudo ./virtiofsd --socket-path /tmp/virtiofsd.socket --shared-dir /tmp/test1

  2. Boot the VM:
    sudo RUST_BACKTRACE=1 ./cloud-hypervisor -v \
    --api-socket /tmp/cloud-hypervisor.socket \
    --kernel ./CLOUDHV_EFI.fd \
    --disk path=./focal-server-cloudimg-arm64.raw \
    --cpus boot=1 \
    --fs tag=ip_tag,socket=/tmp/virtiofsd.socket \
    --memory size=1024M,shared=true \
    --serial tty --console off

  3. Log in to the VM.

  4. Check whether any virtiofs volume is mounted:
    df -aTh -t virtiofs
    df: no file systems processed

  5. Create a mount point and mount the volume:
    mountPath=/mnt/test
    mkdir -p $mountPath
    mount -t virtiofs ip_tag $mountPath

  6. Write a dummy message into mountPath:
    echo "Hello" > $mountPath/test.log

  7. Check that test.log shows up on the host:
    ls -alh /tmp/test1
    -rw-r--r-- 1 root root xxx test.log

  8. Prepare to mount the next volume; check whether the target tag is mounted again:
    df -aTh -t virtiofs

  9. The VM gets stuck <-- This is an issue since I have to mount multiple volumes.
  10. Pressing Ctrl+C to terminate does not help.
  11. Some new logs appear after waiting hundreds of seconds:
INFO: task df:1614 blocked for more than 120 seconds.
Not tainted 5.4.0-122-generic 138-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

  12. Log in to the VM again from a different terminal and dump the call trace:
    Call trace:
    __switch_to+0x100/0x150
    __schedule+0x31c/0x7e0
    schedule+0x40/0xb8
    request_wait_answer+0x14c/0x288
    fuse_simple_request+0x1a8/0x2c0
    fuse_statfs+0xdc/0x120
    statfs_by_dentry+0x74/0xa0
    vfs_statfs+0x2c/0xd0
    user_statfs+0x68/0xc8
    __do_sys_statfs+0x28/0x68
    __arm64_sys_statfs+0x24/0x30
    el0_svc_common.constprop.0+0xf4/0x200
    el0_svc_handler+0x38/0xa8
    el0_svc+0x10/0x180

@jongwu
Contributor

jongwu commented Jul 21, 2022

Hi @DennisHung, I have not reproduced this new issue. I think it has no relationship with the firmware, as the kernel controls everything at that point. Can you test it with a direct kernel boot, using the same Linux kernel as with the firmware boot?
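
(For reference, a direct kernel boot with the same virtio-fs setup might look like the sketch below. The kernel Image path and the root/console cmdline parameters are assumptions and depend on how the guest kernel and disk image were built.)

sudo ./cloud-hypervisor -v \
--api-socket /tmp/cloud-hypervisor.socket \
--kernel ./Image \
--cmdline "console=ttyAMA0 root=/dev/vda1 rw" \
--disk path=./focal-server-cloudimg-arm64.raw \
--cpus boot=1 \
--fs tag=ip_tag,socket=/tmp/virtiofsd.socket \
--memory size=1024M,shared=true \
--serial tty --console off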

@sboeuf
Member

sboeuf commented Jul 21, 2022

@DennisHung I can't reproduce it either (I'm using the setup I've described), and here is the output I get:

$ df -aTh -t virtiofs
Filesystem     Type      Size  Used Avail Use% Mounted on
myfs           virtiofs  468G  296G  149G  67% /mnt

@rbradford
Member

A 5.4.0 kernel is pretty old. It's possible that its virtio-fs support is not as mature as in current kernels.

@DennisHung
Author

@DennisHung I can't reproduce it either (I'm using the setup I've described), and here is the output I get:

$ df -aTh -t virtiofs
Filesystem     Type      Size  Used Avail Use% Mounted on
myfs           virtiofs  468G  296G  149G  67% /mnt

@rbradford @sboeuf @jongwu, I found the issue is gone after updating the virtiofsd driver. Let's focus on the original issue now. What's the plan for releasing the solution?

@jongwu
Contributor

jongwu commented Jul 22, 2022

@DennisHung, if you mean "removing the virtio-fs driver" in EDK2 for aarch64/CloudHv, I will send the change upstream later.

@sboeuf
Member

sboeuf commented Jul 22, 2022

@DennisHung @jongwu

Here is the status: virtiofsd can be fixed by applying the following diff to its Cargo.toml

diff --git a/Cargo.toml b/Cargo.toml
index c2a72f4..f049f46 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -20,13 +20,14 @@ libc = "~0.2.120"
 log = "0.4"
 libseccomp-sys = "0.2"
 structopt = "0.3"
-vhost-user-backend = "0.5"
+vhost-user-backend = { git = "https://github.com/rust-vmm/vhost-user-backend" }
 vhost = "0.4"
 virtio-bindings = { version = "0.1", features = ["virtio-v5_0_0"] }
 vm-memory = { version = ">=0.7", features = ["backend-mmap", "backend-atomic"] }
 virtio-queue = "0.4"
-vmm-sys-util = "0.9"
+vmm-sys-util = "0.10"
 syslog = "6.0"
 
 [profile.release]
 lto = true

and running cargo build --release.

  • @jongwu please don't send your patch to upstream EDK2 since the driver works fine with the patches I mentioned

@jongwu
Copy link
Contributor

jongwu commented Jul 22, 2022

OK, thanks @sboeuf

@maburlik

Here is the status: virtiofsd can be fixed by applying the Cargo.toml diff above and running cargo build --release.

@sboeuf is there a virtiofsd branch that has pulled in this change? If so, it would make it much easier to divert our CI to another commit/branch rather than to a new private fork. I have forked a private repo to try out this change with private builds of virtiofsd and Cloud Hypervisor, and it works. I am no longer seeing the queues issue.

@sboeuf
Member

sboeuf commented Jul 29, 2022


@sboeuf is there a virtiofsd branch that has pulled in this change? If so, it would make it much easier to divert our CI to another commit/branch rather than to a new private fork. I have forked a private repo to try out this change with private builds of virtiofsd and Cloud Hypervisor, and it works. I am no longer seeing the queues issue.

Not yet. We need a release to be cut for the vhost-user-backend crate before we can send a patch to bump the virtiofsd dependency.

@sboeuf
Member

sboeuf commented Aug 1, 2022

Just a heads up that virtiofsd will be fixed as soon as https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/129 gets merged.

@sboeuf
Member

sboeuf commented Aug 4, 2022

https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/129 has been merged, so the main branch of virtiofsd now contains all the patches you need.
I'm closing this issue now; feel free to reopen it if you think something is still not resolved.
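
(For anyone picking this up, a minimal sketch of building virtiofsd from its main branch to get the fix; the build dependencies, assumed here to be a Rust toolchain plus the libseccomp and libcap-ng development packages, vary by distribution.)

git clone https://gitlab.com/virtio-fs/virtiofsd.git
cd virtiofsd
cargo build --release
# Then point Cloud Hypervisor's --fs socket at the freshly built daemon:
sudo ./target/release/virtiofsd --socket-path /tmp/virtiofsd.socket --shared-dir /tmp/test1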

sboeuf closed this as completed Aug 4, 2022
rbradford added this to Done in Release 26.0 Aug 4, 2022
@maburlik

https://gitlab.com/virtio-fs/virtiofsd/-/merge_requests/129 has been merged, so the main branch of virtiofsd now contains all the patches you need. I'm closing this issue now; feel free to reopen it if you think something is still not resolved.

Thank you @sboeuf, @rbradford, @jongwu, your efforts are much appreciated!
