Xie-Yongji/Int…

Commits on Jul 13, 2021

  1. Documentation: Add documentation for VDUSE

    VDUSE (vDPA Device in Userspace) is a framework to support
    implementing software-emulated vDPA devices in userspace. This
    document is intended to clarify the VDUSE design and usage.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  2. vduse: Introduce VDUSE - vDPA Device in Userspace

    This VDUSE driver enables implementing software-emulated vDPA
    devices in userspace. The vDPA device is created by
    ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device
    interface (/dev/vduse/$NAME) is exported to userspace for device
    emulation.
    
    To make device emulation more secure, the device's
    control path is handled in the kernel. A message mechanism is introduced
    to forward some dataplane-related control messages to userspace.
    
    In the data path, the DMA buffer is mapped into the userspace
    address space in different ways depending on the vDPA bus to
    which the vDPA device is attached. In the virtio-vdpa case, the MMU-based
    IOMMU driver is used to achieve that. In the vhost-vdpa case, the
    DMA buffer resides in a userspace memory region that can be shared
    with the VDUSE userspace process by transferring the shmfd.
    
    For more details on VDUSE design and usage, please see the follow-on
    Documentation commit.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  3. vduse: Implement an MMU-based IOMMU driver

    This implements an MMU-based IOMMU driver to support mapping
    kernel DMA buffers into userspace. The basic idea is to
    treat the MMU (VA->PA) as an IOMMU (IOVA->PA). The driver sets
    up an MMU mapping instead of an IOMMU mapping for the DMA transfer so
    that the userspace process can use its virtual address to
    access the DMA buffer in the kernel.
    
    To avoid security issues, a bounce-buffering mechanism is
    introduced to prevent userspace from accessing the original buffer
    directly.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
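
The bounce-buffering idea in the commit above can be modeled in userspace as follows. This is only an illustrative sketch, not the driver's code: `BounceDomain`, `Dir`, and the page-granular IOVA allocation are hypothetical names and simplifications, and the copy-on-map / copy-on-unmap policy is the conventional bounce-buffer behavior assumed here.

```python
from enum import Enum


class Dir(Enum):
    TO_DEVICE = 1       # driver filled the buffer, device reads it
    FROM_DEVICE = 2     # device fills the buffer, driver reads it
    BIDIRECTIONAL = 3


class BounceDomain:
    """Toy model: the userspace device only ever sees bounce copies."""

    PAGE = 0x1000

    def __init__(self):
        self._next_iova = self.PAGE
        self._maps = {}  # iova -> (original buffer, bounce copy, direction)

    def map(self, orig: bytearray, direction: Dir) -> int:
        bounce = bytearray(len(orig))  # zero-filled, so nothing leaks
        if direction in (Dir.TO_DEVICE, Dir.BIDIRECTIONAL):
            bounce[:] = orig           # copy in so the device can read it
        iova = self._next_iova
        pages = max(1, (len(orig) + self.PAGE - 1) // self.PAGE)
        self._next_iova += pages * self.PAGE
        self._maps[iova] = (orig, bounce, direction)
        return iova

    def userspace_view(self, iova: int) -> bytearray:
        # What would be mapped into the userspace process: the bounce copy.
        return self._maps[iova][1]

    def unmap(self, iova: int) -> None:
        orig, bounce, direction = self._maps.pop(iova)
        if direction in (Dir.FROM_DEVICE, Dir.BIDIRECTIONAL):
            orig[:] = bounce           # copy device output back to the driver
```

In this model, a write by the userspace device into a `TO_DEVICE` mapping never reaches the original buffer, and a `FROM_DEVICE` mapping never exposes the original buffer's prior contents.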
  4. vdpa: Support transferring virtual addressing during DMA mapping

    This patch introduces an attribute for a vDPA device to indicate
    whether virtual addresses can be used. If the vDPA device driver sets
    it, the vhost-vdpa bus driver will not pin user pages and will transfer
    the userspace virtual address instead of the physical address during
    DMA mapping. The corresponding vma->vm_file and offset are
    also passed as an opaque pointer.
    
    Suggested-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
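
The two mapping paths described in the commit above can be sketched as a small decision function. This is a userspace model under stated assumptions: `Mapping`, `build_mapping`, and the `pin_page` callback are hypothetical names introduced for illustration, and the opaque context is modeled as a plain `(file, offset)` tuple.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Mapping:
    iova: int
    addr: int                          # PA if pinned, userspace VA otherwise
    size: int
    opaque: Optional[Tuple[str, int]]  # (backing file, offset) for VA maps


def build_mapping(use_va: bool, iova: int, uaddr: int, size: int,
                  vma_file: str, vma_offset: int,
                  pin_page: Callable[[int], int]) -> Mapping:
    if use_va:
        # The device driver asked for virtual addresses: no pinning, and
        # the backing file plus offset ride along as opaque context.
        return Mapping(iova, uaddr, size, (vma_file, vma_offset))
    # Default path: pin the user page and hand over its physical address.
    return Mapping(iova, pin_page(uaddr), size, None)
```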
  5. vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()

    The upcoming patch is going to support VA mapping/unmapping,
    so let's first factor out the logic of PA mapping/unmapping
    to make the code more readable.
    
    Suggested-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  6. vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()

    Add an opaque pointer for DMA mapping.
    
    Suggested-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  7. vhost-iotlb: Add an opaque pointer for vhost IOTLB

    Add an opaque pointer for vhost IOTLB and introduce
    vhost_iotlb_add_range_ctx() to accept it.
    
    Suggested-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  8. virtio: Handle device reset failure in register_virtio_device()

    The device reset may now fail in the virtio-vdpa case, so check
    its return value and fail register_virtio_device() accordingly.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  9. virtio-vdpa: Handle the failure of vdpa_reset()

    The vdpa_reset() may fail now. This adds a check of its return
    value and fails virtio_vdpa_reset() accordingly.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  10. virtio_config: Add a return value to reset function

    This adds a return value to reset function so that we can
    handle the reset failure later. No functional changes.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  11. virtio: Don't set FAILED status bit on device index allocation failure

    We don't need to set FAILED status bit on device index allocation
    failure since the device initialization hasn't been started yet.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  12. vhost-vdpa: Handle the failure of vdpa_reset()

    The vdpa_reset() may fail now. This adds a check of its return
    value and fails vhost_vdpa_open() accordingly.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  13. vhost-vdpa: Fail the vhost_vdpa_set_status() on reset failure

    Re-read the device status to ensure it's set to zero during
    resetting. Otherwise, fail the vhost_vdpa_set_status() after timeout.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  14. vdpa: Fail the vdpa_reset() if fail to set device status to zero

    Re-read the device status to ensure it's set to zero during
    resetting. Otherwise, fail the vdpa_reset() after timeout.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
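
The re-read-until-zero behavior described in the two commits above amounts to a bounded polling loop. A minimal sketch, assuming hypothetical `set_status`/`get_status` callbacks and an arbitrary timeout and error code (the commit messages do not state which values the kernel uses):

```python
import errno
import time
from typing import Callable


def reset_device(set_status: Callable[[int], None],
                 get_status: Callable[[], int],
                 timeout_s: float = 1.0,
                 poll_interval_s: float = 0.001) -> int:
    """Write status 0, then re-read until the device reports 0 or time runs out."""
    set_status(0)
    deadline = time.monotonic() + timeout_s
    while get_status() != 0:
        if time.monotonic() >= deadline:
            return -errno.ETIMEDOUT  # illustrative choice of error code
        time.sleep(poll_interval_s)
    return 0
```

Callers then propagate a nonzero return value instead of assuming that the reset succeeded, which is what the surrounding commits add to the virtio-vdpa and vhost-vdpa paths.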
  15. vdpa: Fix code indentation

    Use tabs to indent the code instead of spaces.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  16. file: Export receive_fd() to modules

    Export receive_fd() so that modules can use
    it to pass file descriptors between processes without
    skipping any security checks.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021
  17. iova: Export alloc_iova_fast() and free_iova_fast()

    Export alloc_iova_fast() and free_iova_fast() so that
    modules can use them to improve IOVA allocation efficiency.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Jul 13, 2021

Commits on Jul 8, 2021

  1. virtio-mem: prioritize unplug from ZONE_MOVABLE in Big Block Mode

    Let's handle unplug in Big Block Mode similar to Sub Block Mode --
    prioritize memory blocks onlined to ZONE_MOVABLE.
    
    We won't care further about big blocks with mixed zones, as it's
    rather a corner case that won't matter in practice.
    
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Link: https://lore.kernel.org/r/20210602185720.31821-8-david@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    davidhildenbrand authored and mstsirkin committed Jul 8, 2021
  2. virtio-mem: simplify high-level unplug handling in Big Block Mode

    Let's simplify high-level big block selection when unplugging in
    Big Block Mode.
    
    Combine handling of offline and online blocks. We can get rid of
    virtio_mem_bbm_bb_is_offline() and simply use
    virtio_mem_bbm_offline_remove_and_unplug_bb(), as that already tolerates
    offline parts.
    
    We can race with concurrent onlining/offlining either way, so we don't
    have to be super correct by failing if an offline big block we'd like to
    unplug just got (partially) onlined.
    
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Link: https://lore.kernel.org/r/20210602185720.31821-7-david@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    davidhildenbrand authored and mstsirkin committed Jul 8, 2021
  3. virtio-mem: prioritize unplug from ZONE_MOVABLE in Sub Block Mode

    Until now, memory provided by a single virtio-mem device was usually
    either onlined completely to ZONE_MOVABLE (online_movable) or to
    ZONE_NORMAL (online_kernel); however, that will change in the future.
    
    There are two reasons why we want to track which zone a memory block
    belongs to and prioritize ZONE_MOVABLE blocks:
    
    1) Memory managed by ZONE_MOVABLE is more likely to get unplugged,
       resulting in a faster memory hotunplug process. Further, we can more
       reliably unplug and remove complete memory blocks, removing metadata
       allocated for the whole memory block.
    
    2) We want to avoid corner cases where unplugging with the current scheme
       (highest to lowest address) could result in accidental zone imbalances,
       whereby we remove too much ZONE_NORMAL memory for ZONE_MOVABLE memory
       of the same device.
    
    Let's track the zone via memory block states and try to unplug from
    ZONE_MOVABLE first. Rename VIRTIO_MEM_SBM_MB_ONLINE* to
    VIRTIO_MEM_SBM_MB_KERNEL* to avoid even longer state names.
    
    In commit 27f8527 ("virtio-mem: don't special-case ZONE_MOVABLE"),
    we removed slightly similar tracking for fully plugged memory blocks to
    support unplugging from ZONE_MOVABLE at all -- as we didn't allow partially
    plugged memory blocks in ZONE_MOVABLE before that. That commit already
    mentioned "In the future, we might want to remember the zone again and use
    the information when (un)plugging memory."
    
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Link: https://lore.kernel.org/r/20210602185720.31821-6-david@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    davidhildenbrand authored and mstsirkin committed Jul 8, 2021
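
The selection policy described in the commit above (ZONE_MOVABLE blocks first, then the existing highest-to-lowest address order) can be sketched as a sort key. The `Zone`, `MemoryBlock`, and `unplug_order` names are hypothetical and exist only for this illustration; the driver itself tracks the zone through memory block states rather than sorting a list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Zone(Enum):
    KERNEL = "kernel"     # e.g. onlined to ZONE_NORMAL
    MOVABLE = "movable"   # onlined to ZONE_MOVABLE


@dataclass(frozen=True)
class MemoryBlock:
    addr: int
    zone: Zone


def unplug_order(blocks: List[MemoryBlock]) -> List[MemoryBlock]:
    """Movable blocks first; within a zone, highest address first."""
    return sorted(blocks, key=lambda b: (b.zone is not Zone.MOVABLE, -b.addr))
```

With blocks at 0x1000 (kernel), 0x2000 (movable), 0x3000 (kernel) and 0x4000 (movable), this yields 0x4000, 0x2000, 0x3000, 0x1000, so kernel-zone memory is only touched once the movable blocks are exhausted.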
  4. virtio-mem: simplify high-level unplug handling in Sub Block Mode

    Let's simplify by introducing a new virtio_mem_sbm_unplug_any_sb(),
    similar to virtio_mem_sbm_plug_any_sb(), to simplify high-level memory
    block selection when unplugging in Sub Block Mode.
    
    Rename existing virtio_mem_sbm_unplug_any_sb() to
    virtio_mem_sbm_unplug_any_sb_raw().
    
    The only change is that we now temporarily unlock the hotplug mutex around
    cond_resched() when processing offline memory blocks, which doesn't
    make a real difference as we already have to temporarily unlock in
    virtio_mem_sbm_unplug_any_sb_offline() when removing a memory block.
    
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Link: https://lore.kernel.org/r/20210602185720.31821-5-david@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    davidhildenbrand authored and mstsirkin committed Jul 8, 2021
  5. virtio-mem: simplify high-level plug handling in Sub Block Mode

    Let's simplify high-level memory block selection when plugging in Sub
    Block Mode.
    
    No need for two separate loops when selecting memory blocks for plugging
    memory. Avoid passing the "online" state by simply obtaining the state
    in virtio_mem_sbm_plug_any_sb().
    
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Link: https://lore.kernel.org/r/20210602185720.31821-4-david@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    davidhildenbrand authored and mstsirkin committed Jul 8, 2021
  6. virtio-mem: use page_zonenum() in virtio_mem_fake_offline()

    Let's use page_zonenum() instead of zone_idx(page_zone()).
    
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Link: https://lore.kernel.org/r/20210602185720.31821-3-david@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    davidhildenbrand authored and mstsirkin committed Jul 8, 2021
  7. virtio-mem: don't read big block size in Sub Block Mode

    We are reading a Big Block Mode value while in Sub Block Mode
    when initializing. Fortunately, vm->bbm.bb_size maps to some counter
    in the vm->sbm.mb_count array, which is 0 at that point in time.
    
    No harm done; still, this was unintended and is not future-proof.
    
    Fixes: 4ba50cd ("virtio-mem: Big Block Mode (BBM) memory hotplug")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Link: https://lore.kernel.org/r/20210602185720.31821-2-david@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    davidhildenbrand authored and mstsirkin committed Jul 8, 2021
  8. virtio/vdpa: clear the virtqueue state during probe

    Clear the available index as part of the initialization process to
    clear any values that might be left from previous usage of the device.
    For example, if the device was previously used by vhost_vdpa and is now
    probed by virtio_vdpa, you want to start with clean indices.
    
    Fixes: c043b4a ("virtio: introduce a vDPA based transport")
    Signed-off-by: Eli Cohen <elic@nvidia.com>
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210602021536.39525-5-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Eli Cohen <elic@nvidia.com>
    Eli Cohen authored and mstsirkin committed Jul 8, 2021
  9. vp_vdpa: allow set vq state to initial state after reset

    We used to fail the set_vq_state() since it was not supported yet by
    the virtio spec. But if the bus tries to set the state which is equal
    to the device initial state after reset, we can let it go.
    
    This is a must for virtio_vdpa() to set vq state during probe which is
    required for some vDPA parents.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210602021536.39525-4-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Eli Cohen <elic@nvidia.com>
    jasowang authored and mstsirkin committed Jul 8, 2021
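
The rule in the commit above (accept a requested state only if it equals the post-reset state) can be sketched for the split virtqueue case as follows. This is a hedged illustration: the function name and the choice of `EOPNOTSUPP` as the error are assumptions, and the post-reset available index is taken to be 0.

```python
import errno


def set_vq_state_split(avail_index: int) -> int:
    """Accept only the post-reset state, since that needs no device write."""
    if avail_index == 0:
        return 0                  # already the device's initial state
    return -errno.EOPNOTSUPP      # setting any other state is unsupported
```

A packed virtqueue would need an analogous comparison against its own initial indices and wrap counters.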
  10. virtio-pci library: introduce vp_modern_get_driver_features()

    This patch introduces a helper to get driver/guest features from the
    device.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210602021536.39525-3-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Eli Cohen <elic@nvidia.com>
    jasowang authored and mstsirkin committed Jul 8, 2021
  11. vdpa: support packed virtqueue for set/get_vq_state()

    This patch extends vdpa_vq_state to support packed virtqueue
    state, which is basically the device/driver ring wrap counters and the
    avail and used indices. This will be used for virtio-vdpa support
    of packed virtqueues and for future vhost/vhost-vdpa support of
    packed virtqueues.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210602021536.39525-2-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Eli Cohen <elic@nvidia.com>
    jasowang authored and mstsirkin committed Jul 8, 2021
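
To make the packed virtqueue state in the commit above concrete, the sketch below models a ring position as an index plus a one-bit wrap counter that flips each time the index wraps around the queue. The field names, the 15-bit index width, and the bit placement in `encode` are assumptions made for illustration, not a statement of the kernel structure's layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PackedVqState:
    # Illustrative fields: one wrap counter and one index per direction.
    last_avail_counter: int = 1
    last_avail_idx: int = 0
    last_used_counter: int = 1
    last_used_idx: int = 0


def advance(idx: int, counter: int, n: int, queue_size: int) -> Tuple[int, int]:
    """Advance a ring index by n slots (n <= queue_size), flipping the
    wrap counter when the index wraps past the end of the ring."""
    idx += n
    if idx >= queue_size:
        idx -= queue_size
        counter ^= 1
    return idx, counter


def encode(idx: int, counter: int) -> int:
    """Pack a 15-bit index and a 1-bit wrap counter into one 16-bit value."""
    assert 0 <= idx < (1 << 15) and counter in (0, 1)
    return idx | (counter << 15)
```

Saving and restoring both the index and the wrap counter is what lets a bus driver resume a packed ring at the right position, since the index alone is ambiguous about which lap of the ring it refers to.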
  12. virtio-ring: store DMA metadata in desc_extra for split virtqueue

    For split virtqueue, we used to depend on the address, length and
    flags stored in the descriptor ring for DMA unmapping. This is unsafe
    because the device can overwrite them and thereby manipulate the behavior
    of the virtio driver, IOMMU drivers and swiotlb.
    
    For safety, maintain the DMA address, DMA length, descriptor flags and
    next field of the non-indirect descriptors in vring_desc_state_extra
    when the DMA API is used for virtio, as we did for packed virtqueue, and use
    that metadata for performing DMA operations. Indirect descriptors
    should be safe since they are using streaming mappings.
    
    With this, the descriptor ring is write-only from the view of the
    driver.
    
    This slightly increases the footprint of the driver, but the difference is not
    noticeable in the pktgen (64B) test and the netperf test in the case of virtio-net.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210604055350.58753-8-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    jasowang authored and mstsirkin committed Jul 8, 2021
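
The shadow-copy approach in the commit above can be modeled as a ring that is shared with the device plus a driver-private array that is consulted at unmap time. This is a toy model: `Desc`, `SplitRing`, and `unmap_one` are hypothetical names, and "unmapping" is reduced to recording which address and length would be unmapped.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Desc:
    addr: int
    length: int
    flags: int
    next: int


class SplitRing:
    def __init__(self, size: int):
        # Shared with the device, which may overwrite it at any time.
        self.desc: List[Desc] = [Desc(0, 0, 0, 0) for _ in range(size)]
        # Driver-private shadow copy that the device cannot reach.
        self.extra: List[Desc] = [Desc(0, 0, 0, 0) for _ in range(size)]
        self.unmapped: List[Tuple[int, int]] = []

    def add(self, i: int, addr: int, length: int, flags: int, nxt: int) -> None:
        self.desc[i] = Desc(addr, length, flags, nxt)   # write-only for the driver
        self.extra[i] = Desc(addr, length, flags, nxt)  # trusted metadata

    def unmap_one(self, i: int) -> int:
        e = self.extra[i]          # never read back from self.desc
        self.unmapped.append((e.addr, e.length))
        return e.next              # chain walk also uses the trusted copy
```

Even if the device scribbles over `desc[i]` after `add()`, `unmap_one()` still unmaps the address and length that the driver originally mapped.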
  13. virtio: use err label in __vring_new_virtqueue()

    Use an error label for unwinding in __vring_new_virtqueue(). This is useful
    for future refactoring.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210604055350.58753-7-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    jasowang authored and mstsirkin committed Jul 8, 2021
  14. virtio_ring: introduce virtqueue_desc_add_split()

    This patch introduces a helper for storing a descriptor in the
    descriptor table for split virtqueue.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210604055350.58753-6-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    jasowang authored and mstsirkin committed Jul 8, 2021
  15. virtio_ring: secure handling of mapping errors

    We should not depend on the DMA address, length and flags in the descriptor
    table since they could be written with arbitrary values by the device. So
    this patch switches to using the ones stored in desc_extra.
    
    Note that the indirect descriptors are fine since they are read-only
    streaming mappings.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210604055350.58753-5-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    jasowang authored and mstsirkin committed Jul 8, 2021
  16. virtio-ring: factor out desc_extra allocation

    A helper is introduced for the logic of allocating the descriptor
    extra data. This will be reused by split virtqueue.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210604055350.58753-4-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    jasowang authored and mstsirkin committed Jul 8, 2021
  17. virtio_ring: rename vring_desc_extra_packed

    Rename vring_desc_extra_packed to vring_desc_extra since the structure
    is fairly generic and could be reused by split virtqueue as well.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210604055350.58753-3-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    jasowang authored and mstsirkin committed Jul 8, 2021
  18. virtio-ring: maintain next in extra state for packed virtqueue

    This patch moves next from vring_desc_state_packed to
    vring_desc_extra_packed. This makes it simpler for the extra state
    to be reused by split virtqueue.
    
    Signed-off-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210604055350.58753-2-jasowang@redhat.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    jasowang authored and mstsirkin committed Jul 8, 2021