Xuan-Zhuo/virt…
Commits on Jun 16, 2021
-
virtio-net: xsk zero copy xmit kick by threshold
In testing, kicking after every packet gave unstable performance, and kicking only once after all the packets had been sent performed poorly. So add a module parameter, xsk_kick_thr, that specifies how many packets are sent before a kick is issued.

8 is a relatively stable value with the best performance.

Here is the pps measured for xsk_kick_thr values from 1 to 64:

thr  PPS             thr  PPS             thr  PPS
1    2924116.74247 | 23   3683263.04348 | 45   2777907.22963
2    3441010.57191 | 24   3078880.13043 | 46   2781376.21739
3    3636728.72378 | 25   2859219.57656 | 47   2777271.91304
4    3637518.61468 | 26   2851557.9593  | 48   2800320.56575
5    3651738.16251 | 27   2834783.54408 | 49   2813039.87599
6    3652176.69231 | 28   2847012.41472 | 50   3445143.01839
7    3665415.80602 | 29   2860633.91304 | 51   3666918.01281
8    3665045.16555 | 30   2857903.5786  | 52   3059929.2709
9    3671023.2401  | 31   2835589.98963 | 53   2831515.21739
10   3669532.23274 | 32   2862827.88706 | 54   3451804.07204
11   3666160.37749 | 33   2871855.96696 | 55   3654975.92385
12   3674951.44813 | 34   3434456.44816 | 56   3676198.3188
13   3667447.57331 | 35   3656918.54177 | 57   3684740.85619
14   3018846.0503  | 36   3596921.16722 | 58   3060958.8594
15   2792773.84505 | 37   3603460.63903 | 59   2828874.57191
16   3430596.3602  | 38   3595410.87666 | 60   3459926.11027
17   3660525.85806 | 39   3604250.17819 | 61   3685444.47599
18   3045627.69054 | 40   3596542.28428 | 62   3049959.0809
19   2841542.94177 | 41   3600705.16054 | 63   2806280.04013
20   2830475.97348 | 42   3019833.71191 | 64   3448494.3913
21   2845655.55789 | 43   2752951.93264 |
22   3450389.84365 | 44   2753107.27164 |

When the value of xsk_kick_thr is relatively small, the performance is not good, and when it is greater than 13, the performance becomes irregular and unstable. Values from 3 to 13 perform similarly, so I chose 8 as the default value.

The test environment is qemu + vhost-net. I modified vhost-net to drop the packets sent by the vm directly, so that the cpu of the vm can run higher. By default, the cpu usage of the processes in the vm and of softirqd is too low, and there is no obvious difference in the test data.

During the test, the cpu of softirq reached 100%. Each xsk_kick_thr value was run for 300s, the pps of every second was recorded, and the average of the pps was finally taken. The vhost process cpu on the host also reached 100%.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
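The threshold idea can be illustrated with a small userspace sketch. This is not the driver code: `fake_vq`, `fake_vq_kick()` and `xmit_batch()` are hypothetical names, the kick is modeled as a counter, and the final flush of leftover packets is an assumption of the sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue: it only counts what happens. */
struct fake_vq {
    size_t queued;  /* packets added since the last kick */
    size_t kicks;   /* notifications sent to the host side */
    size_t sent;    /* total packets handed to the queue */
};

static void fake_vq_kick(struct fake_vq *vq)
{
    vq->kicks++;
    vq->queued = 0;
}

/* Queue n packets, kicking once every kick_thr packets, and once more
 * at the end if some packets are still waiting for a kick (assumption). */
static void xmit_batch(struct fake_vq *vq, size_t n, size_t kick_thr)
{
    assert(kick_thr > 0);

    for (size_t i = 0; i < n; i++) {
        vq->sent++;
        vq->queued++;
        if (vq->queued >= kick_thr)
            fake_vq_kick(vq);
    }

    if (vq->queued > 0)
        fake_vq_kick(vq);
}
```

With a threshold of 8, sending 20 packets in this model issues three kicks (after packets 8 and 16, plus the final flush) instead of 20.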
-
virtio-net: xsk direct xmit inside xsk wakeup
Calling virtqueue_napi_schedule() in wakeup results in napi running on the current cpu. If the application is not busy, this is not a problem. But if the application itself is busy, it causes a lot of scheduling.

If the application is continuously sending data packets, the constant scheduling between the application and napi makes the transmission uneven, and there is an obvious delay in the transmission (you can use tcpdump to see it). When a channel is pushed to 100% (vhost reaches 100%), the cpu where the application runs also reaches 100%.

This patch sends a small amount of data directly in wakeup in order to trigger the tx interrupt. The tx interrupt is raised on the cpu of its affinity and then triggers napi, which can continue to consume the xsk tx queue. Two cpus are then running: cpu0 runs the application and cpu1 runs napi to consume the data. With a channel again pushed to 100%, the utilization of cpu0 is 12.7% and the utilization of cpu1 is 2.9%.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
-
virtio-net: support AF_XDP zc rx
Compared to the case of xsk tx, the case of xsk zc rx is more complicated. When we process the buf received by vq, we may encounter ordinary buffers, or xsk buffers. What makes the situation more complicated is that in the case of mergeable, when num_buffer > 1, we may still encounter the case where xsk buffer is mixed with ordinary buffer. Another thing that makes the situation more complicated is that when we get an xsk buffer from vq, the xsk bound to this xsk buffer may have been unbound. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
-
virtio-net: support AF_XDP zc tx
AF_XDP (xdp socket, xsk) is a high-performance packet receiving and sending technology. This patch implements the binding and unbinding operations of xsk and the virtio-net queue for xsk zero copy xmit.

The xsk zero copy xmit depends on tx napi, because the actual sending of data is done in tx napi. If tx napi does not work, the data in the xsk tx queue will not be sent. So if tx napi is not enabled, an error is reported when binding xsk. While xsk is active, ethtool is prevented from modifying tx napi.

When reclaiming ptr, a new type of ptr is added. The types are distinguished by the last two bits of ptr:

00: skb
01: xdp frame
10: xsk xmit ptr

All sent xsk packets share the virtio-net header of xsk_hdr. If xsk needs to support csum and other functions later, consider assigning xsk hdr separately for each sent packet.

Physical network cards can reinitialize the channel when xsk is bound. virtio is different: it does not support resetting a channel independently, only resetting the entire device. I think it is not appropriate for us to directly reset the entire device. So the situation becomes a bit more complicated. We have to consider how to deal with the buffers referenced in vq after xsk is unbound.

When xsk is bound, I add as many struct virtnet_xsk_ctx as the ring size. Each xsk buffer added to vq corresponds to a ctx. This ctx records the page where the xsk buffer is located and takes a reference on that page. When the buffer is recycled, the page reference is dropped. When xsk has been unbound and all related xsk buffers have been recycled, all ctx are released.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
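The pointer-type encoding described in this message can be sketched in userspace C. The tag values (00, 01, 10) come from the commit message; the enum and helper names below are hypothetical and are not taken from the patch, and the sketch assumes the tagged objects are at least 4-byte aligned so the two low bits are free.

```c
#include <assert.h>
#include <stdint.h>

/* Tag values as listed in the commit message (low two bits of ptr). */
enum xmit_ptr_type {
    XMIT_PTR_SKB       = 0x0,  /* 00: skb */
    XMIT_PTR_XDP_FRAME = 0x1,  /* 01: xdp frame */
    XMIT_PTR_XSK       = 0x2,  /* 10: xsk xmit ptr */
};

#define XMIT_PTR_MASK ((uintptr_t)0x3)

/* Hypothetical helper: store the type in the two low bits of the pointer. */
static void *xmit_ptr_tag(void *ptr, enum xmit_ptr_type type)
{
    /* The low bits must be zero, i.e. the object is 4-byte aligned. */
    assert(((uintptr_t)ptr & XMIT_PTR_MASK) == 0);
    return (void *)((uintptr_t)ptr | (uintptr_t)type);
}

/* Hypothetical helper: read the type back from a tagged pointer. */
static enum xmit_ptr_type xmit_ptr_type_of(const void *tagged)
{
    return (enum xmit_ptr_type)((uintptr_t)tagged & XMIT_PTR_MASK);
}

/* Hypothetical helper: recover the original pointer. */
static void *xmit_ptr_untag(void *tagged)
{
    return (void *)((uintptr_t)tagged & ~XMIT_PTR_MASK);
}
```

A reclaim path built on this idea would call something like `xmit_ptr_type_of()` first and then free the untagged pointer in the way that matches its type.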
-
virtio-net: move to virtio_net.h
Move some structure definitions and inline functions into the virtio_net.h file. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
-
virtio-net: independent directory
Create a separate directory for virtio-net. AF_XDP support will be added later, and a separate xsk.c file will be added. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
-
virtio-net: virtnet_poll_tx support budget check
virtnet_poll_tx() now checks the work done, like other network card drivers. When work < budget, napi_poll() in dev.c exits directly, and virtqueue_napi_complete() is called to close napi. If closing napi fails or there is still data to be processed, virtqueue_napi_complete() schedules napi again, which does not conflict with the logic of napi_poll(). When work == budget, virtnet_poll_tx() returns the variable 'work', and napi_poll() in dev.c re-adds napi to the queue. The purpose of this patch is to support xsk xmit in virtnet_poll_tx() in a subsequent patch. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
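The budget rule can be modeled outside the kernel as follows. This is a simplified sketch under stated assumptions: `fake_txq` and `fake_poll_tx()` are hypothetical, and clearing `napi_active` merely stands in for the call to virtqueue_napi_complete().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a tx queue that is polled NAPI-style. */
struct fake_txq {
    int pending;       /* packets still waiting to be sent */
    bool napi_active;  /* false once polling has been completed */
};

/* Handle at most budget packets and return the work done.
 * work <  budget: the queue is drained, so polling is completed.
 * work == budget: stay active so that the caller polls again. */
static int fake_poll_tx(struct fake_txq *q, int budget)
{
    int work = 0;

    assert(budget > 0);

    while (work < budget && q->pending > 0) {
        q->pending--;
        work++;
    }

    if (work < budget)
        q->napi_active = false;  /* stands in for virtqueue_napi_complete() */

    return work;
}
```

With 100 pending packets and a budget of 64, the first call returns 64 and stays active; the second returns 36 and completes polling.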
-
virtio-net: split the receive_mergeable function
receive_mergeable() is too complicated, so this function is split here. One reason is to make the function more readable. The other is that the two resulting functions will be called separately in subsequent patches. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
-
virtio-net: standalone virtnet_aloc_frag function
This logic is used by the small and mergeable modes when adding buf, and a subsequent patch will also use it, so it is separated into an independent function. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
-
virtio-net: unify the code for recycling the xmit ptr
Currently two types, "skb" and "xdp frame", are handled when recycling old xmit, by two nearly identical but independent implementations. This is inconvenient for adding new types later. So extract a function from this piece of code and call it uniformly to recover old xmit ptr. Rename free_old_xmit_skbs() to free_old_xmit(). Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
-
virtio: support virtqueue_detach_unused_buf_ctx
Supports returning ctx while recycling unused buf, which helps to release buf in different ways for different bufs. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
-
xsk: XDP_SETUP_XSK_POOL support option IFF_NOT_USE_DMA_ADDR
Some devices, such as virtio-net, do not directly use dma addr. These devices do not initialize dma after completing the xsk setup, so the dma check is skipped here. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
-
virtio-net: add priv_flags IFF_NOT_USE_DMA_ADDR
virtio-net does not use dma addr directly. So add the priv_flags IFF_NOT_USE_DMA_ADDR. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
-
netdevice: add priv_flags IFF_NOT_USE_DMA_ADDR
Some driver devices, such as virtio-net, do not directly use dma addr. Upper-level frameworks such as xdp socket need to be aware of this. So add a new priv_flag IFF_NOT_USE_DMA_ADDR. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
-
netdevice: priv_flags extend to 64bit
The size of priv_flags is 32 bits, and the number of flags currently available has reached 32. It is time to expand the size of priv_flags to 64 bits. Here the priv_flags is modified to 8 bytes, but the size of struct net_device has not changed, it is still 2176 bytes. It is because _tx is aligned based on the cache line. But there is a 4-byte hole left here. Since the fields before and after priv_flags are read mostly, I did not adjust the order of the fields here. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
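Why the structure size can stay the same is easier to see on a miniature layout. The two structs below are hypothetical and far smaller than struct net_device; they only assume that the flags field is followed, after some padding, by a member aligned to a 64-byte cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature layout with a 32-bit flags field. */
struct dev_flags32 {
    uint32_t priv_flags;
    uint16_t small_field;
    _Alignas(64) unsigned char tx[64];  /* cache-line aligned member */
};

/* Same layout with the flags field widened to 64 bits. */
struct dev_flags64 {
    uint64_t priv_flags;
    uint16_t small_field;
    _Alignas(64) unsigned char tx[64];  /* cache-line aligned member */
};

/* The wider field only consumes padding in front of the aligned member. */
static_assert(sizeof(struct dev_flags32) == sizeof(struct dev_flags64),
              "padding absorbs the wider flags field");
static_assert(offsetof(struct dev_flags32, tx) ==
              offsetof(struct dev_flags64, tx),
              "aligned member does not move");
```

In this toy layout the aligned member starts at offset 64 in both versions, so the extra 4 bytes of flags come out of the padding.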
-
net: chelsio: cxgb4: use eth_zero_addr() to assign zero address
Use eth_zero_addr() to assign the zero address instead of an inefficient copy from an array. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
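The two patterns can be compared in a userspace sketch. `zero_addr()` and `zero_addr_by_copy()` are hypothetical stand-ins written for this illustration; the first mirrors the idea of the kernel helper, which clears the address in place.

```c
#include <assert.h>
#include <string.h>

#define ETH_ALEN 6  /* length of an Ethernet address in bytes */

/* Stand-in for the helper: clear the address in place. */
static void zero_addr(unsigned char *addr)
{
    assert(addr != NULL);
    memset(addr, 0, ETH_ALEN);
}

/* The older pattern: keep a zeroed array around and copy from it. */
static void zero_addr_by_copy(unsigned char *addr)
{
    static const unsigned char zero[ETH_ALEN] = { 0 };

    assert(addr != NULL);
    memcpy(addr, zero, ETH_ALEN);
}
```

Both functions leave the same bytes behind; the first needs no source array.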
-
Peng Li says: ==================== net: cosa: clean up some code style issues This patchset cleans up some code style issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: remove redundant spaces
According to checkpatch.pl, no spaces are necessary at the start of a line, and no space is necessary after a cast. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: remove trailing whitespaces
This patch removes trailing whitespaces. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: add some required spaces
Add space required before the open parenthesis '(' and '{'. Add space required after that close brace '}' and ',' Add spaces required around that '=' , '&', '*', '|', '+', '/' and '-'. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: fix the code style issue about trailing statements
Trailing statements should be on next line. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: fix the alignment issue
Alignment should match open parenthesis. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
This patch uses the BIT macro for setting individual bits, to fix the following checkpatch.pl issue: CHECK: Prefer using the BIT macro. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
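As an illustration of the checkpatch.pl suggestion, here is a userspace sketch. The `BIT()` definition reproduces the idea of the kernel macro for use outside the kernel, and the flag names are hypothetical, not taken from the cosa driver.

```c
#include <assert.h>

/* Userspace copy of the idea behind the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical flags written with open-coded shifts... */
#define OLD_FLAG_RX (1 << 0)
#define OLD_FLAG_TX (1 << 1)

/* ...and the same flags written with BIT(), naming the bit position. */
#define FLAG_RX BIT(0)
#define FLAG_TX BIT(1)

/* The values are unchanged; only the spelling differs. */
static_assert(FLAG_RX == OLD_FLAG_RX, "bit 0 has the same value");
static_assert(FLAG_TX == OLD_FLAG_TX, "bit 1 has the same value");
```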
-
net: cosa: add necessary () to macro argument
Macro argument 'cosa' may be better as '(cosa)' to avoid precedence issues. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
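The precedence problem that this warning guards against can be shown with a generic example. The macros below are hypothetical and unrelated to the cosa driver; they only show how an unparenthesized argument changes the result.

```c
#include <assert.h>

/* Hypothetical macros that scale a value by two. */
#define SCALE_UNSAFE(x) (x * 2)    /* argument not parenthesized */
#define SCALE_SAFE(x)   ((x) * 2)  /* argument parenthesized */

/* SCALE_UNSAFE(1 + 2) expands to (1 + 2 * 2), which is 5. */
static_assert(SCALE_UNSAFE(1 + 2) == 5, "multiplication binds first");

/* SCALE_SAFE(1 + 2) expands to ((1 + 2) * 2), which is 6. */
static_assert(SCALE_SAFE(1 + 2) == 6, "argument evaluated as a unit");
```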
-
net: cosa: remove redundant braces {}
This patch removes redundant braces {}, to fix the checkpatch.pl warning: "braces {} are not necessary for single statement blocks". Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: add braces {} to all arms of the statement
Braces {} should be used on all arms of this statement. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: fix the comments style issue
Networking block comments don't use an empty /* line, use /* Comment... Block comments use * on subsequent lines. Block comments use a trailing */ on a separate line. This patch fixes the comments style issues. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: move out assignment in if condition
Should not use assignment in if condition. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: replace comparison to NULL with "!chan->rx_skb"
According to checkpatch.pl, comparison to NULL could be written "!chan->rx_skb". Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: fix the code style issue about "foo* bar"
Fix the checkpatch error as "foo* bar" should be "foo *bar". Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: add blank line after declarations
This patch fixes the checkpatch error about missing a blank line after declarations. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: cosa: remove redundant blank lines
This patch removes some redundant blank lines. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: iosm: add missing MODULE_DEVICE_TABLE
This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this driver when it is built as an external module. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
qlcnic: Use list_for_each_entry() to simplify code in qlcnic_main.c
Convert list_for_each() to list_for_each_entry() where applicable. This simplifies the code. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
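The kind of simplification involved can be sketched in userspace. The list helpers below are a minimal re-creation of the kernel's intrusive list macros written for this example (using `__typeof__`, a GCC/Clang extension); `struct item` and the two sum functions are hypothetical and are not code from qlcnic_main.c.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace re-creation of the intrusive list helpers. */
struct list_head {
    struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) container_of(ptr, type, member)

#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

#define list_for_each_entry(pos, head, member)                      \
    for (pos = list_entry((head)->next, __typeof__(*pos), member);  \
         &pos->member != (head);                                    \
         pos = list_entry(pos->member.next, __typeof__(*pos), member))

static void list_add_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Hypothetical element type embedding a list node. */
struct item {
    int value;
    struct list_head node;
};

/* Before: walk list_head pointers and convert each one by hand. */
static int sum_with_list_for_each(struct list_head *head)
{
    struct list_head *pos;
    int sum = 0;

    assert(head != NULL);
    list_for_each(pos, head) {
        struct item *it = list_entry(pos, struct item, node);

        sum += it->value;
    }
    return sum;
}

/* After: the macro yields the containing structure directly. */
static int sum_with_list_for_each_entry(struct list_head *head)
{
    struct item *it;
    int sum = 0;

    assert(head != NULL);
    list_for_each_entry(it, head, node)
        sum += it->value;
    return sum;
}
```

Both functions visit the same elements; the second drops the temporary `struct list_head *` cursor and the explicit `list_entry()` call.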
-
ethtool: add a stricter length check
There have been a few errors in the ethtool reply size calculations; most of them are hard to trigger during basic testing because of skb size rounding up and netdev names being shorter than max. Add a more precise check. This change will affect the value of payload length displayed in case of -EMSGSIZE but that should be okay, "payload length" isn't a well defined term here. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>