Xie-Yongji/Int…

Commits on Aug 18, 2021

  1. Documentation: Add documentation for VDUSE

    VDUSE (vDPA Device in Userspace) is a framework to support
    implementing software-emulated vDPA devices in userspace. This
    document is intended to clarify the VDUSE design and usage.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
  2. vduse: Introduce VDUSE - vDPA Device in Userspace

    This VDUSE driver enables implementing software-emulated vDPA
    devices in userspace. The vDPA device is created by
    ioctl(VDUSE_CREATE_DEV) on /dev/vduse/control. Then a char device
    interface (/dev/vduse/$NAME) is exported to userspace for device
    emulation.
    
    In order to make the device emulation more secure, the device's
    control path is handled in the kernel. A message mechanism is introduced
    to forward some dataplane related control messages to userspace.
    
    And in the data path, the DMA buffer will be mapped into userspace
    address space through different ways depending on the vDPA bus to
    which the vDPA device is attached. In virtio-vdpa case, the MMU-based
    software IOTLB is used to achieve that. And in vhost-vdpa case, the
    DMA buffer resides in a userspace memory region which can be shared
    with the VDUSE userspace process by transferring the shmfd.
    
    For more details on VDUSE design and usage, please see the follow-on
    Documentation commit.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
  3. vduse: Implement an MMU-based software IOTLB

    This implements an MMU-based software IOTLB to support mapping
    kernel dma buffer into userspace dynamically. The basic idea
    behind it is treating MMU (VA->PA) as IOMMU (IOVA->PA). The
    software IOTLB will set up MMU mapping instead of IOMMU mapping
    for the DMA transfer so that the userspace process is able to
    use its virtual address to access the dma buffer in kernel.
    
    To avoid security issues, a bounce-buffering mechanism is
    introduced to prevent userspace from accessing the original buffer
    directly, which may contain other kernel data. During mapping and
    unmapping, the software IOTLB will copy the data from the original
    buffer to the bounce buffer and back, depending on the direction
    of the transfer. And the bounce-buffer addresses will be mapped
    into the user address space instead of the original one.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
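
The bounce-buffering idea described in the commit above can be modeled in plain userspace C. This is only a sketch under stated assumptions: the names `bounce_map`, `bounce_map_buf()`, `bounce_unmap_buf()` and the `xfer_dir` enum are hypothetical, the buffer size is arbitrary, and nothing here is taken from the VDUSE sources. It shows the copy direction on map and unmap, and that the party on the "device" side only ever touches the bounce copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; a userspace model, not the VDUSE implementation. */
enum xfer_dir { TO_DEVICE, FROM_DEVICE, BIDIRECTIONAL };

#define BOUNCE_SIZE 4096

struct bounce_map {
    unsigned char bounce[BOUNCE_SIZE]; /* the only memory the device side sees */
    void *orig;                        /* original buffer, never exposed */
    size_t len;
    enum xfer_dir dir;
};

/* Map: copy original -> bounce if the device is going to read the data. */
static unsigned char *bounce_map_buf(struct bounce_map *m, void *orig,
                                     size_t len, enum xfer_dir dir)
{
    assert(len <= BOUNCE_SIZE);
    m->orig = orig;
    m->len = len;
    m->dir = dir;
    memset(m->bounce, 0, BOUNCE_SIZE); /* do not leak stale contents */
    if (dir == TO_DEVICE || dir == BIDIRECTIONAL)
        memcpy(m->bounce, orig, len);
    return m->bounce;
}

/* Unmap: copy bounce -> original if the device may have written data. */
static void bounce_unmap_buf(struct bounce_map *m)
{
    if (m->dir == FROM_DEVICE || m->dir == BIDIRECTIONAL)
        memcpy(m->orig, m->bounce, m->len);
    m->orig = NULL;
}
```

In this model, a write to the bounce copy of a `TO_DEVICE` mapping never reaches the original buffer, while a `FROM_DEVICE` mapping is copied back only at unmap time.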
  4. vdpa: Support transferring virtual addressing during DMA mapping

    This patch introduces an attribute for vDPA device to indicate
    whether virtual addresses can be used. If the vDPA device driver sets
    it, the vhost-vdpa bus driver will not pin user pages and will transfer
    userspace virtual address instead of physical address during
    DMA mapping. And corresponding vma->vm_file and offset will be
    also passed as an opaque pointer.
    
    Suggested-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
  5. vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()

    The upcoming patch is going to support VA mapping/unmapping.
    So let's factor out the logic of PA mapping/unmapping first
    to make the code more readable.
    
    Suggested-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
  6. vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()

    Add an opaque pointer for DMA mapping.
    
    Suggested-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
  7. vhost-iotlb: Add an opaque pointer for vhost IOTLB

    Add an opaque pointer for vhost IOTLB. And introduce
    vhost_iotlb_add_range_ctx() to accept it.
    
    Suggested-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
  8. vhost-vdpa: Handle the failure of vdpa_reset()

    vdpa_reset() may fail now. This adds a check of its return
    value and fails vhost_vdpa_open() on error.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
  9. vdpa: Add reset callback in vdpa_config_ops

    This adds a new callback to support device specific reset
    behavior. The vdpa bus driver will call the reset function
    instead of setting status to zero during resetting if device
    driver supports the new callback.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
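
The fallback described in the reset-callback commit, together with the error propagation described in the vdpa_reset() commit before it, can be sketched as follows. All type and function names (`dev`, `dev_ops`, `bus_reset()`, the demo callbacks) are hypothetical stand-ins chosen for illustration and are not the kernel's `vdpa_config_ops` definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's vdpa structures. */
struct dev;

struct dev_ops {
    void (*set_status)(struct dev *d, unsigned char status);
    int (*reset)(struct dev *d); /* optional, may be NULL */
};

struct dev {
    const struct dev_ops *ops;
    unsigned char status;
    int reset_calls;
};

/* Bus-side reset: prefer the device-specific callback when it exists,
 * otherwise fall back to writing status 0. The result is returned so the
 * caller (for example an open path) can fail when the reset fails. */
static int bus_reset(struct dev *d)
{
    assert(d && d->ops && d->ops->set_status);
    if (d->ops->reset)
        return d->ops->reset(d);
    d->ops->set_status(d, 0);
    return 0;
}

/* Demo callbacks used to exercise the three cases. */
static void demo_set_status(struct dev *d, unsigned char s) { d->status = s; }
static int demo_reset(struct dev *d) { d->status = 0; d->reset_calls++; return 0; }
static int failing_reset(struct dev *d) { (void)d; return -EIO; }
```

A device without a `reset` callback keeps the old behavior, while a device whose callback fails reports the error instead of silently continuing.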
  10. vdpa: Fix some coding style issues

    Fix some code indentation issues and the following checkpatch warning:
    
    WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
    371: FILE: include/linux/vdpa.h:371:
    +static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned offset,
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
  11. file: Export receive_fd() to modules

    Export receive_fd() so that some modules can use
    it to pass file descriptor between processes without
    skipping any security checks.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021
  12. iova: Export alloc_iova_fast() and free_iova_fast()

    Export alloc_iova_fast() and free_iova_fast() so that
    some modules can make use of the per-CPU cache to get
    rid of rbtree spinlock in alloc_iova() and free_iova()
    during IOVA allocation.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    YongjiXie authored and intel-lab-lkp committed Aug 18, 2021

Commits on Aug 11, 2021

  1. vdpa/mlx5: Fix queue type selection logic

    get_queue_type() comments that split virtqueue is preferred, however,
    the actual logic preferred packed virtqueues. Since firmware has not
    supported packed virtqueues we ended up using split virtqueues as was
    desired.
    
    Since we do not advertise support for packed virtqueues, we add a check
    to verify split virtqueues are indeed supported.
    
    Fixes: 1a86b37 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
    Signed-off-by: Eli Cohen <elic@nvidia.com>
    Link: https://lore.kernel.org/r/20210811053759.66752-1-elic@nvidia.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Eli Cohen authored and mstsirkin committed Aug 11, 2021
  2. vdpa/mlx5: Avoid destroying MR on empty iotlb

    The current code treats an empty iotlb provided in set_map() as a
    special case and destroys the memory region object. This must not be done
    since the virtqueue objects reference this MR. Doing so will cause the
    driver unload to emit errors and log timeouts caused by the firmware
    complaining on busy resources.
    
    This patch treats an empty iotlb as any other change of mapping. In this
    case, mlx5_vdpa_create_mr() will fail, causing the entire set_map() call to
    fail.
    
    This issue has not been encountered before but was seen to occur in a
    non-official version of qemu. Since qemu is a userspace program, the
    driver must protect against such case.
    
    Fixes: 94abbcc ("vdpa/mlx5: Add shared memory registration code")
    Signed-off-by: Eli Cohen <elic@nvidia.com>
    Link: https://lore.kernel.org/r/20210811053713.66658-1-elic@nvidia.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Eli Cohen authored and mstsirkin committed Aug 11, 2021
  3. tools/virtio: fix build

    We use a spinlock now so add a stub.
    Ignore bogus uninitialized variable warnings.
    
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    mstsirkin committed Aug 11, 2021
  4. virtio_ring: pull in spinlock header

    We use a spinlock now, so pull in the correct header to
    make virtio_ring.c self-sufficient.
    
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    mstsirkin committed Aug 11, 2021
  5. vringh: pull in spinlock header

    We use a spinlock now, so pull in the correct header to
    make vring.h self-sufficient.
    
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    mstsirkin committed Aug 11, 2021
  6. virtio-blk: Add validation for block size in config space

    An untrusted device might present an invalid block size
    in configuration space. This tries to add validation for it
    in the validate callback and clear the VIRTIO_BLK_F_BLK_SIZE
    feature bit if the value is out of the supported range.
    
    And we also double check the value in virtblk_probe() in
    case that it's changed after the validation.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Link: https://lore.kernel.org/r/20210809101609.148-1-xieyongji@bytedance.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    YongjiXie authored and mstsirkin committed Aug 11, 2021
  7. vringh: Use wiov->used to check for read/write desc order

    As __vringh_iov() traverses a descriptor chain, it populates
    each descriptor entry into either read or write vring iov
    and increments that iov's ->used member. So, as we iterate
    over a descriptor chain, at any point, (riov/wriov)->used
    value gives the number of descriptor entries available,
    which are to be read or written by the device. As all read
    iovs must precede the write iovs, wiov->used should be zero
    when we are traversing a read descriptor. Current code checks
    for wiov->i, to figure out whether any previous entry in the
    current descriptor chain was a write descriptor. However,
    iov->i is only incremented, when these vring iovs are consumed,
    at a later point, and remain 0 in __vringh_iov(). So, correct
    the check for read and write descriptor order, to use
    wiov->used.
    
    Acked-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Link: https://lore.kernel.org/r/1624591502-4827-1-git-send-email-neeraju@codeaurora.org
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Neeraj Upadhyay authored and mstsirkin committed Aug 11, 2021
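
The difference between checking `->i` and `->used` during chain traversal can be illustrated with a small model. This is a sketch with hypothetical types (`iov_state`, `walk_chain()`), not the `__vringh_iov()` code; the `check_used` parameter exists only to compare the two checks side by side. The write flag value 2 matches VRING_DESC_F_WRITE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DESC_F_WRITE 2u /* same value as VRING_DESC_F_WRITE */

/* Hypothetical model of an iov: 'used' counts populated entries,
 * 'i' counts entries consumed later on (always 0 during traversal). */
struct iov_state {
    unsigned used;
    unsigned i;
};

/* Walk a chain given as an array of descriptor flags. A readable
 * descriptor that appears after a writable one is rejected. */
static int walk_chain(const unsigned *flags, size_t n,
                      struct iov_state *riov, struct iov_state *wiov,
                      int check_used)
{
    assert(flags && riov && wiov);
    riov->used = riov->i = 0;
    wiov->used = wiov->i = 0;

    for (size_t k = 0; k < n; k++) {
        if (flags[k] & DESC_F_WRITE) {
            wiov->used++;
        } else {
            /* wiov->i is still 0 here, so testing it detects nothing */
            unsigned seen_write = check_used ? wiov->used : wiov->i;

            if (seen_write)
                return -EINVAL;
            riov->used++;
        }
    }
    return 0;
}
```

With the chain read, write, read, the `used`-based check returns `-EINVAL`, while the `i`-based check lets the misordered chain through.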
  8. virtio_vdpa: reject invalid vq indices

    Do not call vDPA drivers' callbacks with vq indices larger than what
    the drivers indicate that they support.  vDPA drivers do not bounds
    check the indices.
    
    Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
    Link: https://lore.kernel.org/r/20210701114652.21956-1-vincent.whitchurch@axis.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    vwax authored and mstsirkin committed Aug 11, 2021
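
A minimal sketch of the bounds check described above, assuming a hypothetical `backend` type whose callback indexes an array without checking, as the commit says the vDPA drivers do. The names, the fixed array size, and the `-ENOENT` return value are illustrative choices, not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_VQS 4

/* Hypothetical backend that advertises how many vqs it supports. */
struct backend {
    unsigned nvqs;
    int ready[MAX_VQS];
};

/* Driver callback: trusts the index, performs no bounds check. */
static void backend_set_ready(struct backend *b, unsigned idx)
{
    b->ready[idx] = 1;
}

/* Bus layer: reject indices the backend did not advertise before
 * the unchecked callback is ever reached. */
static int bus_setup_vq(struct backend *b, unsigned idx)
{
    assert(b && b->nvqs <= MAX_VQS);
    if (idx >= b->nvqs)
        return -ENOENT;
    backend_set_ready(b, idx);
    return 0;
}
```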
  9. vdpa: Add documentation for vdpa_alloc_device() macro

    The return value of vdpa_alloc_device() macro is not very
    clear, so most callers did the wrong check. Let's
    add some comments to better document it.
    
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Link: https://lore.kernel.org/r/20210715080026.242-4-xieyongji@bytedance.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    YongjiXie authored and mstsirkin committed Aug 11, 2021
  10. vDPA/ifcvf: Fix return value check for vdpa_alloc_device()

    The vdpa_alloc_device() returns an error pointer upon
    failure, not NULL. To handle the failure correctly, this
    replaces the NULL check with an IS_ERR() check and propagates the
    error upwards.
    
    Fixes: 5a2414b ("virtio: Intel IFC VF driver for VDPA")
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Link: https://lore.kernel.org/r/20210715080026.242-3-xieyongji@bytedance.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    YongjiXie authored and mstsirkin committed Aug 11, 2021
  11. vp_vdpa: Fix return value check for vdpa_alloc_device()

    The vdpa_alloc_device() returns an error pointer upon
    failure, not NULL. To handle the failure correctly, this
    replaces the NULL check with an IS_ERR() check and propagates the
    error upwards.
    
    Fixes: 64b9f64 ("vdpa: introduce virtio pci driver")
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Link: https://lore.kernel.org/r/20210715080026.242-2-xieyongji@bytedance.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    YongjiXie authored and mstsirkin committed Aug 11, 2021
  12. vdpa_sim: Fix return value check for vdpa_alloc_device()

    The vdpa_alloc_device() returns an error pointer upon
    failure, not NULL. To handle the failure correctly, this
    replaces the NULL check with an IS_ERR() check and propagates the
    error upwards.
    
    Fixes: 2c53d0f ("vdpasim: vDPA device simulator")
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Link: https://lore.kernel.org/r/20210715080026.242-1-xieyongji@bytedance.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    YongjiXie authored and mstsirkin committed Aug 11, 2021
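
The three fixes above all address the same pattern: an allocator that returns an error pointer cannot be checked with a NULL test. The following userspace sketch shows why. The helpers `err_ptr()`, `is_err()` and `ptr_err()` are simplified stand-ins modeled on the kernel helpers with similar names, and `alloc_device()` and `probe()` are hypothetical; none of this is the vdpa code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)(-MAX_ERRNO);
}

struct device { int id; };

/* Hypothetical allocator: returns an error pointer on failure, never NULL. */
static struct device *alloc_device(int fail)
{
    struct device *d;

    if (fail)
        return err_ptr(-ENOMEM);
    d = malloc(sizeof(*d));
    if (!d)
        return err_ptr(-ENOMEM);
    d->id = 1;
    return d;
}

/* Caller: test with is_err() and propagate the encoded error upwards. */
static int probe(int fail)
{
    struct device *d = alloc_device(fail);

    if (is_err(d))
        return (int)ptr_err(d);
    assert(d != NULL);
    free(d);
    return 0;
}
```

A failed allocation here yields a non-NULL pointer, so a `if (!d)` check would pass and the caller would go on to dereference an invalid address.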
  13. vhost: Fix the calculation in vhost_overflow()

    This fixes the incorrect calculation for integer overflow
    when the last address of iova range is 0xffffffff.
    
    Fixes: ec33d03 ("vhost: detect 32 bit integer wrap around")
    Reported-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210728130756.97-2-xieyongji@bytedance.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    YongjiXie authored and mstsirkin committed Aug 11, 2021
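
The off-by-one described above can be reproduced with a small model of a 32-bit address space. Both functions below are illustrative sketches written from the commit description, not copies of vhost_overflow(); treating a zero-sized range as invalid is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Check that treats the end as exclusive: a range whose last byte is
 * exactly 0xffffffff is wrongly reported as overflowing. */
static int overflow_exclusive(uint64_t uaddr, uint64_t size)
{
    return uaddr > UINT32_MAX || size > UINT32_MAX ||
           uaddr > UINT32_MAX - size;
}

/* Check based on the last address (uaddr + size - 1), which may equal
 * 0xffffffff. Zero-sized ranges are rejected (assumption). */
static int overflow_inclusive(uint64_t uaddr, uint64_t size)
{
    return uaddr > UINT32_MAX || size > UINT32_MAX || size == 0 ||
           uaddr > UINT32_MAX - size + 1;
}

/* Last byte of a range that has already passed the inclusive check. */
static uint64_t last_addr(uint64_t uaddr, uint64_t size)
{
    assert(!overflow_inclusive(uaddr, size));
    return uaddr + size - 1;
}
```

For `uaddr = 0xffff0000` and `size = 0x10000`, the last address is 0xffffffff: the exclusive variant rejects the range, the inclusive variant accepts it.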

Commits on Aug 10, 2021

  1. vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()

    The "msg->iova + msg->size" addition can have an integer overflow
    if the iotlb message is from a malicious user space application.
    So let's fix it.
    
    Fixes: 1b48dc0 ("vhost: vdpa: report iova range")
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20210728130756.97-1-xieyongji@bytedance.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    YongjiXie authored and mstsirkin committed Aug 10, 2021
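
The wrap-around in an "iova + size" style check can be demonstrated as below. This is a sketch under assumptions: `iova_range`, `check_naive()` and `check_safe()` are hypothetical names, and the safe variant is one possible way to avoid the addition, not necessarily the form used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical inclusive range of allowed I/O virtual addresses. */
struct iova_range {
    uint64_t first;
    uint64_t last;
};

/* Naive check: iova + size can wrap past 2^64 and look small. */
static int check_naive(const struct iova_range *r, uint64_t iova, uint64_t size)
{
    if (iova < r->first || iova + size - 1 > r->last)
        return -EINVAL;
    return 0;
}

/* Safe check: compare sizes by subtraction so nothing can wrap. */
static int check_safe(const struct iova_range *r, uint64_t iova, uint64_t size)
{
    assert(r && r->first <= r->last);
    if (size == 0)
        return -EINVAL;
    if (iova < r->first || iova > r->last)
        return -EINVAL;
    if (size - 1 > r->last - iova)
        return -EINVAL;
    return 0;
}
```

With the range 0x1000..0xffff, `iova = 0x2000` and `size = 0xffffffffffffe001`, the sum wraps to 0, so the naive check accepts the request while the safe check returns `-EINVAL`.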
  2. virtio_pci: Support surprise removal of virtio pci device

    When a virtio pci device undergoes surprise removal (aka async removal in
    PCIe spec), mark the device as broken so that any upper layer drivers can
    abort any outstanding operation.
    
    When a virtio net pci device used by NetworkManager undergoes surprise
    removal, the call trace below was observed.
    
    kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
    watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
    CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S      W I  L    5.13.0-hotplug+ #8
    Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
    Workqueue: events linkwatch_event
    RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
    Call Trace:
     virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
     ? __hw_addr_create_ex+0x85/0xc0
     __dev_mc_add+0x72/0x80
     igmp6_group_added+0xa7/0xd0
     ipv6_mc_up+0x3c/0x60
     ipv6_find_idev+0x36/0x80
     addrconf_add_dev+0x1e/0xa0
     addrconf_dev_config+0x71/0x130
     addrconf_notify+0x1f5/0xb40
     ? rtnl_is_locked+0x11/0x20
     ? __switch_to_asm+0x42/0x70
     ? finish_task_switch+0xaf/0x2c0
     ? raw_notifier_call_chain+0x3e/0x50
     raw_notifier_call_chain+0x3e/0x50
     netdev_state_change+0x67/0x90
     linkwatch_do_dev+0x3c/0x50
     __linkwatch_run_queue+0xd2/0x220
     linkwatch_event+0x21/0x30
     process_one_work+0x1c8/0x370
     worker_thread+0x30/0x380
     ? process_one_work+0x370/0x370
     kthread+0x118/0x140
     ? set_kthread_struct+0x40/0x40
     ret_from_fork+0x1f/0x30
    
    Hence, add the ability to abort the command on surprise removal
    which prevents infinite loop and system lockup.
    
    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Link: https://lore.kernel.org/r/20210721142648.1525924-5-parav@nvidia.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    paravmellanox authored and mstsirkin committed Aug 10, 2021
  3. virtio: Protect vqs list access

    VQs may be accessed to mark the device broken while they are
    created/destroyed. Hence protect the access to the vqs list.
    
    Fixes: e2dcdfe ("virtio: virtio_break_device() to mark all virtqueues broken.")
    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Link: https://lore.kernel.org/r/20210721142648.1525924-4-parav@nvidia.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    paravmellanox authored and mstsirkin committed Aug 10, 2021
  4. virtio: Keep vring_del_virtqueue() mirror of VQ create

    Keep the vring_del_virtqueue() mirror of the create routines.
    i.e. to delete list entry first as it is added last during the create
    routine.
    
    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Link: https://lore.kernel.org/r/20210721142648.1525924-3-parav@nvidia.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    paravmellanox authored and mstsirkin committed Aug 10, 2021
  5. virtio: Improve vq->broken access to avoid any compiler optimization

    Currently vq->broken field is read by virtqueue_is_broken() in busy
    loop in one context by virtnet_send_command().
    
    vq->broken is set to true in other process context by
    virtio_break_device(). Reader and writer are accessing it without any
    synchronization. This may allow the compiler to optimize the loop so that
    vq->broken is read only once.
    
    Hence, force reading vq->broken on each invocation of
    virtqueue_is_broken() and also force writing it so that such
    update is visible to the readers.
    
    It is a theoretical fix that isn't yet encountered in the field.
    
    Signed-off-by: Parav Pandit <parav@nvidia.com>
    Link: https://lore.kernel.org/r/20210721142648.1525924-2-parav@nvidia.com
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    paravmellanox authored and mstsirkin committed Aug 10, 2021

Commits on Aug 8, 2021

  1. Linux 5.14-rc5

    torvalds committed Aug 8, 2021
  2. Merge tag 'timers-urgent-2021-08-08' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull timer fix from Thomas Gleixner:
     "A single timer fix:
    
       - Prevent a memory ordering issue in the timer expiry code which
         makes it possible to observe falsely that the callback has been
         executed already while that's not the case, which violates the
         guarantee of del_timer_sync()"
    
    * tag 'timers-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      timers: Move clearing of base::timer_running under base::lock
    torvalds committed Aug 8, 2021
  3. Merge tag 'sched-urgent-2021-08-08' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/tip/tip
    
    Pull scheduler fix from Thomas Gleixner:
     "A single scheduler fix:
    
       - Prevent a double enqueue caused by rt_effective_prio() being
         invoked twice in __sched_setscheduler()"
    
    * tag 'sched-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      sched/rt: Fix double enqueue caused by rt_effective_prio
    torvalds committed Aug 8, 2021
  4. Merge tag 'perf-urgent-2021-08-08' of git://git.kernel.org/pub/scm/li…

    …nux/kernel/git/tip/tip
    
    Pull perf fixes from Thomas Gleixner:
     "A set of perf fixes:
    
       - Correct the permission checks for perf event which send SIGTRAP to
         a different process and clean up that code to be more readable.
    
       - Prevent an out of bound MSR access in the x86 perf code which
         happened due to an incomplete limiting to the actually available
         hardware counters.
    
       - Prevent access to the AMD64_EVENTSEL_HOSTONLY bit when running
         inside a guest.
    
       - Handle small core counter re-enabling correctly by issuing an ACK
         right before reenabling it to prevent a stale PEBS record being
         kept around"
    
    * tag 'perf-urgent-2021-08-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      perf/x86/intel: Apply mid ACK for small core
      perf/x86/amd: Don't touch the AMD64_EVENTSEL_HOSTONLY bit inside the guest
      perf/x86: Fix out of bound MSR access
      perf: Refactor permissions check into perf_check_permission()
      perf: Fix required permissions if sigtrap is requested
    torvalds committed Aug 8, 2021
  5. Merge tag 'char-misc-5.14-rc5' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/gregkh/char-misc
    
    Pull char/misc driver fixes from Greg KH:
     "Here are some small char/misc driver fixes for 5.14-rc5.
    
      They resolve a few regressions that people reported:
    
       - acrn driver fix
    
       - fpga driver fix
    
       - interconnect tiny driver fixes
    
      All have been in linux-next for a while with no reported issues"
    
    * tag 'char-misc-5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
      interconnect: Fix undersized devress_alloc allocation
      interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate
      interconnect: qcom: icc-rpmh: Ensure floor BW is enforced for all nodes
      fpga: dfl: fme: Fix cpu hotplug issue in performance reporting
      virt: acrn: Do hcall_destroy_vm() before resource release
      interconnect: Always call pre_aggregate before aggregate
      interconnect: Zero initial BW after sync-state
    torvalds committed Aug 8, 2021