Xuan-Zhuo/virt…

Commits on Jun 16, 2021

  1. virtio-net: xsk zero copy xmit kick by threshold

    Testing shows that kicking after every packet gives unstable
    performance, and kicking only once after all packets have been sent
    performs poorly. So add a module parameter that specifies how many
    packets are sent before a kick is issued.
    
    8 is a relatively stable value with the best performance.
    
    Here is the pps of the test of xsk_kick_thr under different values (from
    1 to 64).
    
    thr  PPS           | thr PPS           | thr PPS
    1    2924116.74247 | 23  3683263.04348 | 45  2777907.22963
    2    3441010.57191 | 24  3078880.13043 | 46  2781376.21739
    3    3636728.72378 | 25  2859219.57656 | 47  2777271.91304
    4    3637518.61468 | 26  2851557.9593  | 48  2800320.56575
    5    3651738.16251 | 27  2834783.54408 | 49  2813039.87599
    6    3652176.69231 | 28  2847012.41472 | 50  3445143.01839
    7    3665415.80602 | 29  2860633.91304 | 51  3666918.01281
    8    3665045.16555 | 30  2857903.5786  | 52  3059929.2709
    9    3671023.2401  | 31  2835589.98963 | 53  2831515.21739
    10   3669532.23274 | 32  2862827.88706 | 54  3451804.07204
    11   3666160.37749 | 33  2871855.96696 | 55  3654975.92385
    12   3674951.44813 | 34  3434456.44816 | 56  3676198.3188
    13   3667447.57331 | 35  3656918.54177 | 57  3684740.85619
    14   3018846.0503  | 36  3596921.16722 | 58  3060958.8594
    15   2792773.84505 | 37  3603460.63903 | 59  2828874.57191
    16   3430596.3602  | 38  3595410.87666 | 60  3459926.11027
    17   3660525.85806 | 39  3604250.17819 | 61  3685444.47599
    18   3045627.69054 | 40  3596542.28428 | 62  3049959.0809
    19   2841542.94177 | 41  3600705.16054 | 63  2806280.04013
    20   2830475.97348 | 42  3019833.71191 | 64  3448494.3913
    21   2845655.55789 | 43  2752951.93264 |
    22   3450389.84365 | 44  2753107.27164 |
    
    When xsk_kick_thr is very small (1 or 2), the performance is not good,
    and when it is greater than 13, the performance becomes irregular and
    unstable. The results from 3 to 13 look similar, so I chose 8 as the
    default value.
    
    The test environment is qemu + vhost-net. I modified vhost-net to drop
    the packets sent by the vm directly, so that the cpu usage in the vm
    can go higher. By default, the cpu usage of the processes in the vm and
    of softirqd is too low, and there is no obvious difference in the test
    data.
    
    During the test, the cpu of softirq reached 100%. Each xsk_kick_thr was
    run for 300s, the pps of every second was recorded, and the average of
    the pps was finally taken. The vhost process cpu on the host has also
    reached 100%.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
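
The batching idea in the commit above can be sketched in userspace as follows. This is only an illustration: `xsk_xmit_batch()` and `kick()` are hypothetical names, not functions from the patch, and the stub `kick()` merely counts notifications.

```c
#include <stddef.h>

/* Hypothetical stand-in for the virtqueue notification; it only counts calls. */
static size_t kick_count;

static void kick(void)
{
	kick_count++;
}

/*
 * Send `packets` packets, kicking once every `thr` packets and once more
 * at the end if some packets were queued but not yet kicked.
 * Returns the number of kicks issued by this call.
 */
static size_t xsk_xmit_batch(size_t packets, size_t thr)
{
	size_t before = kick_count;
	size_t pending = 0;

	for (size_t i = 0; i < packets; i++) {
		/* Queueing the packet on the ring would happen here. */
		pending++;
		if (thr && pending >= thr) {
			kick();
			pending = 0;
		}
	}
	if (pending)
		kick();

	return kick_count - before;
}
```

With a threshold of 8, sending 64 packets issues 8 kicks in this sketch, while a threshold of 1 would issue 64.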
  2. virtio-net: xsk direct xmit inside xsk wakeup

    Calling virtqueue_napi_schedule() in wakeup results in napi running on
    the current cpu. If the application is not busy, then there is no
    problem. But if the application itself is busy, it will cause a lot of
    scheduling.
    
    If the application is continuously sending data packets, the constant
    rescheduling between the application and napi makes packet
    transmission uneven, and there is an obvious delay in the transmission
    (you can see it with tcpdump). When one channel is driven to 100%
    (vhost reaches 100%), the cpu where the application is located also
    reaches 100%.
    
    This patch sends a small amount of data directly in wakeup. The purpose
    of this is to trigger the tx interrupt. The tx interrupt is raised on
    the cpu of its affinity and then triggers napi, which continues to
    consume the xsk tx queue. Two cpus are then running: cpu0 runs the
    application and cpu1 runs napi to consume the data. With one channel
    again driven to 100%, the utilization of cpu0 is 12.7% and the
    utilization of cpu1 is 2.9%.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  3. virtio-net: support AF_XDP zc rx

    Compared to the case of xsk tx, the case of xsk zc rx is more
    complicated.
    
    When we process the buf received by vq, we may encounter ordinary
    buffers, or xsk buffers. What makes the situation more complicated is
    that in the case of mergeable, when num_buffer > 1, we may still
    encounter the case where xsk buffer is mixed with ordinary buffer.
    
    Another thing that makes the situation more complicated is that when we
    get an xsk buffer from vq, the xsk bound to this xsk buffer may have
    been unbound.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  4. virtio-net: support AF_XDP zc tx

    AF_XDP (xdp socket, xsk) is a high-performance technology for
    receiving and sending packets.
    
    This patch implements the binding and unbinding operations of xsk and
    the virtio-net queue for xsk zero copy xmit.
    
    The xsk zero copy xmit depends on tx napi, because the actual sending
    of data is done in tx napi. If tx napi is not enabled, the data in the
    xsk tx queue will not be sent, so binding xsk reports an error in that
    case.
    
    If xsk is active, it will prevent ethtool from modifying tx napi.
    
    A new type of ptr is added for reclaiming. The types are distinguished
    by the two lowest bits of ptr:
    00: skb
    01: xdp frame
    10: xsk xmit ptr
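
A minimal sketch of this kind of low-bit pointer tagging is shown here. The tag values follow the list above, but the enum, macro, and function names are hypothetical and are not taken from the patch. It assumes the tagged objects are at least 4-byte aligned, so the two lowest bits are free.

```c
#include <stdint.h>

/* Tag values follow the commit message; the names are hypothetical. */
enum xmit_type {
	XMIT_TYPE_SKB = 0x0, /* 00: skb */
	XMIT_TYPE_XDP = 0x1, /* 01: xdp frame */
	XMIT_TYPE_XSK = 0x2, /* 10: xsk xmit ptr */
};

#define XMIT_TYPE_MASK ((uintptr_t)0x3)

/* Store the type in the two low bits; ptr must be at least 4-byte aligned. */
static void *xmit_ptr_tag(void *ptr, enum xmit_type type)
{
	return (void *)((uintptr_t)ptr | (uintptr_t)type);
}

/* Read the type back from the two low bits. */
static enum xmit_type xmit_ptr_type(const void *tagged)
{
	return (enum xmit_type)((uintptr_t)tagged & XMIT_TYPE_MASK);
}

/* Clear the tag bits to recover the original pointer. */
static void *xmit_ptr_untag(void *tagged)
{
	return (void *)((uintptr_t)tagged & ~XMIT_TYPE_MASK);
}
```

The reclaim path can then switch on `xmit_ptr_type()` and free each kind of buffer in its own way.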
    
    All sent xsk packets share the virtio-net header of xsk_hdr. If xsk
    needs to support csum and other functions later, consider assigning xsk
    hdr separately for each sent packet.
    
    Physical network cards can reinitialize the channel when xsk is bound.
    virtio, however, does not support resetting a single channel; only the
    entire device can be reset. I think it is not appropriate to reset the
    entire device directly, so the situation becomes a bit more
    complicated. We have to consider how to deal with the buffers still
    referenced in the vq after xsk is unbound.
    
    When xsk is bound, I allocate one struct virtnet_xsk_ctx per ring
    entry. Each xsk buffer added to the vq corresponds to a ctx. The ctx
    records the page where the xsk buffer is located and takes a page
    reference. When the buffer is recycled, the page reference is dropped.
    When xsk has been unbound and all related xsk buffers have been
    recycled, all ctx are released.
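
The lifetime rule in the paragraph above can be modelled with a small userspace sketch. All structures and function names here are hypothetical simplifications; the real struct virtnet_xsk_ctx and its helpers are defined by the patch and are not reproduced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a page and the ctx bookkeeping. */
struct page_ref {
	int refcount;
};

struct xsk_ctx_pool {
	size_t inflight; /* xsk buffers still referenced by the vq */
	bool bound;      /* is an xsk still bound to the queue? */
	bool released;   /* have all ctx been freed? */
};

/* Free everything only once xsk is unbound and no buffer is in flight. */
static void pool_try_release(struct xsk_ctx_pool *pool)
{
	if (!pool->bound && pool->inflight == 0)
		pool->released = true;
}

/* Adding an xsk buffer to the vq takes a reference on its page. */
static void ctx_get(struct xsk_ctx_pool *pool, struct page_ref *page)
{
	page->refcount++;
	pool->inflight++;
}

/* Recycling the buffer drops the page reference. */
static void ctx_put(struct xsk_ctx_pool *pool, struct page_ref *page)
{
	page->refcount--;
	pool->inflight--;
	pool_try_release(pool);
}

static void pool_unbind(struct xsk_ctx_pool *pool)
{
	pool->bound = false;
	pool_try_release(pool);
}
```

In this model, unbinding while a buffer is still in the vq does not release anything; the release happens on the final `ctx_put()`.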
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  5. virtio-net: move to virtio_net.h

    Move some structure definitions and inline functions into the
    virtio_net.h file.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  6. virtio-net: independent directory

    Create a separate directory for virtio-net. AF_XDP support will be added
    later, and a separate xsk.c file will be added.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  7. virtio-net: virtnet_poll_tx support budget check

    virtnet_poll_tx() now checks the work done, like other network card
    drivers.
    
    When work < budget, napi_poll() in dev.c exits directly, and
    virtqueue_napi_complete() is called to close napi. If closing napi
    fails or there is still data to be processed,
    virtqueue_napi_complete() schedules napi again, which does not
    conflict with the logic of napi_poll().
    
    When work == budget, virtnet_poll_tx() returns 'work', and napi_poll()
    in dev.c re-adds napi to the queue.
    
    The purpose of this patch is to support xsk xmit in virtnet_poll_tx()
    for a subsequent patch.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
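
The budget contract described in the commit above can be illustrated with a toy userspace model. `struct txq`, `poll_tx()`, and `run_napi()` are hypothetical names; the sketch only mirrors the rule that a poll returning less than the budget completes, while a poll returning exactly the budget is polled again.

```c
#include <stdbool.h>

/* Hypothetical queue state: number of items waiting to be processed. */
struct txq {
	int pending;
	bool napi_completed;
};

/*
 * Process at most `budget` items and return the work done. When the work
 * done is below the budget, mark polling as completed, which mirrors the
 * role virtqueue_napi_complete() plays in the commit message.
 */
static int poll_tx(struct txq *q, int budget)
{
	int work = 0;

	while (work < budget && q->pending > 0) {
		q->pending--;
		work++;
	}
	if (work < budget)
		q->napi_completed = true;

	return work;
}

/* Caller side: keep polling while the full budget was consumed. */
static int run_napi(struct txq *q, int budget)
{
	int rounds = 0;

	do {
		q->napi_completed = false;
		rounds++;
	} while (poll_tx(q, budget) == budget);

	return rounds;
}
```

With 150 pending items and a budget of 64, this model polls three times: two full rounds and a final partial round that completes.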
  8. virtio-net: split the receive_mergeable function

    receive_mergeable() is too complicated, so it is split here. This makes
    the function more readable, and the two resulting functions will be
    called separately in subsequent patches.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  9. virtio-net: standalone virtnet_aloc_frag function

    This logic is used by both the small and mergeable modes when adding
    buf, and a subsequent patch will also use it, so it is separated into
    an independent function.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  10. virtio-net: unify the code for recycling the xmit ptr

    Two types, "skb" and "xdp frame", are currently handled when recycling
    old xmit.
    
    The two implementations are nearly identical but independent, which
    makes it inconvenient to add new types later. So extract a function
    from this code and call it uniformly to recycle old xmit ptrs.
    
    Rename free_old_xmit_skbs() to free_old_xmit().
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  11. virtio: support virtqueue_detach_unused_buf_ctx

    Support returning ctx while recycling an unused buf, so that different
    kinds of buf can be released in different ways.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  12. xsk: XDP_SETUP_XSK_POOL support option IFF_NOT_USE_DMA_ADDR

    Some devices, such as virtio-net, do not directly use dma addr. These
    devices do not initialize dma after completing the xsk setup, so the dma
    check is skipped here.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
    Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
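
A rough sketch of the kind of check the commit above describes follows. Everything in it is hypothetical: the bit position of the flag, the `toy_` structures, and `toy_pool_check()` are illustrative assumptions and do not reproduce the xsk pool code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real value is defined by the patch series. */
#define TOY_IFF_NOT_USE_DMA_ADDR (1ULL << 32)

struct toy_netdev {
	uint64_t priv_flags;
};

struct toy_pool {
	bool dma_mapped;
};

/* Returns 0 on success, -1 if a DMA mapping is required but missing. */
static int toy_pool_check(const struct toy_netdev *dev,
			  const struct toy_pool *pool)
{
	/* Devices that never use dma addr directly skip the DMA check. */
	if (dev->priv_flags & TOY_IFF_NOT_USE_DMA_ADDR)
		return 0;

	return pool->dma_mapped ? 0 : -1;
}
```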
  13. virtio-net: add priv_flags IFF_NOT_USE_DMA_ADDR

    virtio-net does not use dma addr directly, so set the priv_flag
    IFF_NOT_USE_DMA_ADDR.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  14. netdevice: add priv_flags IFF_NOT_USE_DMA_ADDR

    Some drivers, such as virtio-net, do not directly use dma addr.
    Upper-level frameworks such as xdp socket need to be aware of this,
    so add a new priv_flag IFF_NOT_USE_DMA_ADDR.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
  15. netdevice: priv_flags extend to 64bit

    The size of priv_flags is 32 bits, and the number of flags currently
    available has reached 32. It is time to expand the size of priv_flags to
    64 bits.
    
    priv_flags is changed to 8 bytes here, but the size of struct
    net_device does not change; it is still 2176 bytes, because _tx is
    aligned to the cache line. There is still a 4-byte hole left here.
    
    Since the fields before and after priv_flags are read mostly, I did not
    adjust the order of the fields here.
    
    Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
    fengidri authored and intel-lab-lkp committed Jun 16, 2021
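
The two points in the commit above, that a 33rd flag needs a wider field and that widening need not grow a structure with a cache-line-aligned member, can be illustrated with toy layouts. These structures are hypothetical and far smaller than struct net_device; the 64-byte alignment is an assumed cache-line size.

```c
#include <stdint.h>

/* Hypothetical flag occupying bit 32, which no longer fits in 32 bits. */
#define TOY_FLAG_BIT32 (1ULL << 32)

/* Toy layouts; the fields and sizes are not those of struct net_device. */
struct toy_dev_32 {
	uint32_t priv_flags;
	_Alignas(64) unsigned char tx[64]; /* cache-line aligned member */
};

struct toy_dev_64 {
	uint64_t priv_flags;
	_Alignas(64) unsigned char tx[64];
};

/* Returns nonzero when the flag survives being stored in a 32-bit field. */
static int fits_in_32(uint64_t flag)
{
	return (uint64_t)(uint32_t)flag == flag;
}
```

Both toy structures have the same size, because the padding in front of the aligned member absorbs the extra 4 bytes.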
  16. net: chelsio: cxgb4: use eth_zero_addr() to assign zero address

    Use eth_zero_addr() to assign the zero address instead of an
    inefficient copy from an array.
    
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Yang Yingliang authored and davem330 committed Jun 16, 2021
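
For readers outside the kernel, the change above can be pictured with a userspace analogue. The `toy_` functions are hypothetical; the first one mimics what the kernel's eth_zero_addr() does (a memset over the 6-byte address), and the second shows the older copy-from-array pattern.

```c
#include <string.h>

#define TOY_ETH_ALEN 6 /* Ethernet address length in bytes */

/* Userspace analogue of eth_zero_addr(): zero the address in place. */
static void toy_eth_zero_addr(unsigned char *addr)
{
	memset(addr, 0, TOY_ETH_ALEN);
}

/* The older pattern: copy from a zero-filled array. */
static void toy_copy_zero_addr(unsigned char *addr)
{
	static const unsigned char zero[TOY_ETH_ALEN] = { 0 };

	memcpy(addr, zero, TOY_ETH_ALEN);
}
```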
  17. Merge branch 'cosa-cleanups'

    Peng Li says:
    
    ====================
    net: cosa: clean up some code style issues
    
    This patchset cleans up some code style issues.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jun 16, 2021
  18. net: cosa: remove redundant spaces

    According to checkpatch.pl,
    no spaces are necessary at the start of a line,
    and no space is necessary after a cast.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  19. net: cosa: remove trailing whitespaces

    This patch removes trailing whitespaces.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  20. net: cosa: add some required spaces

    Add space required before the open parenthesis '(' and '{'.
    Add space required after that close brace '}' and ','
    Add spaces required around that '=' , '&', '*', '|', '+', '/' and '-'.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  21. net: cosa: fix the code style issue about trailing statements

    Trailing statements should be on next line.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  22. net: cosa: fix the alignment issue

    Alignment should match open parenthesis.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  23. net: cosa: use BIT macro

    This patch uses the BIT macro for setting individual bits,
    to fix the following checkpatch.pl issue:
    CHECK: Prefer using the BIT macro.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
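
As an illustration of the style change above, here is a userspace equivalent of the BIT macro next to the open-coded shift it replaces. The `TOY_STATUS_READY` names are hypothetical and are not taken from the cosa driver.

```c
/* Userspace equivalent of the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Before: open-coded shift. */
#define TOY_STATUS_READY_OLD (1 << 3)

/* After: the form checkpatch.pl prefers. */
#define TOY_STATUS_READY BIT(3)
```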
  24. net: cosa: add necessary () to macro argument

    Macro argument 'cosa' may be better as '(cosa)' to avoid
    precedence issues.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
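
The precedence issue mentioned above is easy to reproduce with a generic example. The macros below are hypothetical and do not come from the cosa driver; they only show how an unparenthesized argument can change the result.

```c
/* Hypothetical macros; the real cosa macros are not reproduced here. */
#define TOY_DOUBLE_BAD(x)  (x * 2)   /* argument used bare */
#define TOY_DOUBLE_GOOD(x) ((x) * 2) /* argument parenthesized */
```

`TOY_DOUBLE_BAD(1 + 2)` expands to `(1 + 2 * 2)`, whereas `TOY_DOUBLE_GOOD(1 + 2)` expands to `((1 + 2) * 2)`.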
  25. net: cosa: remove redundant braces {}

    This patch removes redundant braces {}, to fix the
    checkpatch.pl warning:
    "braces {} are not necessary for single statement blocks".
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  26. net: cosa: add braces {} to all arms of the statement

    Braces {} should be used on all arms of this statement.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  27. net: cosa: fix the comments style issue

    Networking block comments don't use an empty /* line,
    use /* Comment...
    
    Block comments use * on subsequent lines.
    Block comments use a trailing */ on a separate line.
    
    This patch fixes the comments style issues.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  28. net: cosa: move out assignment in if condition

    Should not use assignment in if condition.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
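
A small sketch of the pattern this commit removes, using a hypothetical `toy_alloc()` helper in userspace rather than the driver code. The old form is kept in a comment for comparison.

```c
#include <stdlib.h>

/* Allocate n bytes and hand them back through *out; returns 0 or -1. */
static int toy_alloc(size_t n, char **out)
{
	char *buf;

	/*
	 * Before (assignment inside the condition):
	 *     if ((buf = malloc(n)) == NULL)
	 *             return -1;
	 */

	/* After: assign first, then test. */
	buf = malloc(n);
	if (!buf)
		return -1;

	*out = buf;
	return 0;
}
```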
  29. net: cosa: replace comparison to NULL with "!chan->rx_skb"

    According to checkpatch.pl, comparison to NULL could
    be written "!chan->rx_skb".
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  30. net: cosa: fix the code style issue about "foo* bar"

    Fix the checkpatch error as "foo* bar" should be "foo *bar".
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  31. net: cosa: add blank line after declarations

    This patch fixes the checkpatch error about missing a blank line
    after declarations.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  32. net: cosa: remove redundant blank lines

    This patch removes some redundant blank lines.
    
    Signed-off-by: Peng Li <lipeng321@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    321lipeng authored and davem330 committed Jun 16, 2021
  33. net: iosm: add missing MODULE_DEVICE_TABLE

    This patch adds missing MODULE_DEVICE_TABLE definition which generates
    correct modalias for automatic loading of this driver when it is built
    as an external module.
    
    Reported-by: Hulk Robot <hulkci@huawei.com>
    Signed-off-by: Zou Wei <zou_wei@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    SamuelZOU authored and davem330 committed Jun 16, 2021
  34. qlcnic: Use list_for_each_entry() to simplify code in qlcnic_main.c

    Convert list_for_each() to list_for_each_entry() where
    applicable. This simplifies the code.
    
    Reported-by: Hulk Robot <hulkci@huawei.com>
    Signed-off-by: Wang Hai <wanghai38@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Wang Hai authored and davem330 committed Jun 16, 2021
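
The simplification above can be shown with a userspace re-creation of the two iteration styles. The `toy_` macros below are simplified imitations of the kernel list helpers, not the kernel definitions, and the entry iterator relies on the GNU `__typeof__` extension.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Iterate over raw nodes; the caller converts each node by hand. */
#define toy_list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

/* Iterate directly over the containing entries. */
#define toy_list_for_each_entry(pos, head, member) \
	for (pos = toy_container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = toy_container_of(pos->member.next, __typeof__(*pos), member))

struct toy_item {
	int value;
	struct list_head node;
};

static void toy_list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Old style: walk the nodes and call container_of in the loop body. */
static int toy_sum_old(struct list_head *head)
{
	struct list_head *pos;
	int sum = 0;

	toy_list_for_each(pos, head) {
		struct toy_item *it = toy_container_of(pos, struct toy_item, node);

		sum += it->value;
	}
	return sum;
}

/* New style: the iterator yields the entries directly. */
static int toy_sum_new(struct list_head *head)
{
	struct toy_item *it;
	int sum = 0;

	toy_list_for_each_entry(it, head, node)
		sum += it->value;
	return sum;
}
```

Both functions visit the same entries; the second simply drops the temporary node pointer and the explicit conversion.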
  35. ethtool: add a stricter length check

    There have been a few errors in the ethtool reply size calculations.
    Most of them are hard to trigger during basic testing because skb
    sizes are rounded up and netdev names are shorter than the maximum.
    Add a more precise check.
    
    This change will affect the value of payload length displayed in
    case of -EMSGSIZE but that should be okay, "payload length" isn't
    a well defined term here.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Jakub Kicinski authored and davem330 committed Jun 16, 2021