Ong-Boon-Leong…
Commits on Apr 12, 2021
-
net: stmmac: Add TX via XDP zero-copy socket
We add the support of XDP ZC TX submission and cleaning into stmmac_tx_clean(). The function is made to clean as many TX complete frames as possible, i.e. limit by priv->dma_tx_size instead of NAPI budget. For TX ring that is associated with XSK pool, the function stmmac_xdp_xmit_zc() is introduced to TX frame buffers from XSK pool by using xsk_tx_peek_desc(). To make stmmac_tx_clean() support the cleaning of XSK TX frames, STMMAC_TXBUF_T_XSK_TX TX buffer type is introduced. As stmmac_tx_clean() uses the return value to cue whether NAPI function should continue to poll, we augment the caller of stmmac_tx_clean() to pass NAPI budget instead of priv->dma_tx_size through 'budget' input and made stmmac_tx_clean() to always clean up-to the TX ring size instead. This allows us to use the return boolean status of stmmac_xdp_xmit_zc() to decide if XSK TX work is done or not: If true, set 'xmits' to return 'budget - 1' so that NAPI poll may exit. Else, set 'xmits' to return 'budget' to make NAPI poll continue to poll since XSK TX work is not done. Finally, at the end of stmmac_tx_clean(), the function now take a maximum value between 'count' and 'xmits' so that status from both TX cleaning and XSK TX (only for XDP ZC) is considered. This patch adds a new NAPI poll called stmmac_napi_poll_rxtx() that is meant to be enabled/disabled for RX and TX ring that are bound to XSK pool. This NAPI poll function starts with cleaning TX ring, then submits XSK TX frames to TX ring before proceed to perform RX operations, i.e. , receiving RX frames and replenishing RX ring with RX free buffers obtained from XSK pool. Therefore, during XSK RX and TX setup, the driver enables stmmac_napi_poll_rxtx() for RX and TX operations, then during XSK RX and TX pool tear-down, the driver reenables the exisiting independent NAPI poll functions accordingly: stmmac_napi_poll_rx() and stmmac_napi_poll_tx(). Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
-
net: stmmac: Enable RX via AF_XDP zero-copy
This patch adds the support for receiving packet via AF_XDP zero-copy mechanism. XDP ZC uses 1:1 mapping of XDP buffer to receive packet, therefore the use of split header is not used currently. The 'xdp_buff' is declared as union together with a struct that contains 'page', 'addr' and 'page_offset' that are associated with primary buffer. RX buffers are now allocated either via page_pool or xsk pool. For RX buffers from xsk_pool they are allocated and deallocated using below functions: * stmmac_alloc_rx_buffers_zc(struct stmmac_priv *priv, u32 queue) * dma_free_rx_xskbufs(struct stmmac_priv *priv, u32 queue) With above functions now available, we then extend the following driver functions to support XDP ZC: * stmmac_reinit_rx_buffers() * __init_dma_rx_desc_rings() * init_dma_rx_desc_rings() * __free_dma_rx_desc_resources() Note: stmmac_alloc_rx_buffers_zc() may return -ENOMEM due to RX XDP buffer pool is not allocated (e.g. samples/bpf/xdpsock TX-only). But, it is still ok to let TX XDP ZC to continue, therefore, the -ENOMEM is silently ignored to let the driver succcessfully transition to XDP ZC mode for the said RX and TX queue. As XDP ZC buffer size is different, the DMA buffer size is required to be reprogrammed accordingly for RX DMA/Queue that is populated with XDP buffer from XSK pool. Next, to add or remove per-queue XSK pool, stmmac_xdp_setup_pool() will call stmmac_xdp_enable_pool() or stmmac_xdp_disable_pool() that in-turn coordinates the tearing down and setting up RX ring via RX buffers and descriptors removal and reallocation through stmmac_disable_rx_queue() and stmmac_enable_rx_queue(). In addition, stmmac_xsk_wakeup() is added to initiate XDP RX buffer replenishing by signalling user application to add available XDP frames back to FILL queue. For RX processing using XDP zero-copy buffer, stmmac_rx_zc() is introduced which is implemented with the assumption that RX split header is disabled. For XDP verdict is XDP_PASS, the XDP buffer is copied into a sk_buff allocated through stmmac_construct_skb_zc() and sent to Linux network GRO inside stmmac_dispatch_skb_zc(). Free RX buffers are then replenished using stmmac_rx_refill_zc() Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
-
net: stmmac: Refactor __stmmac_xdp_run_prog for XDP ZC
Prepare stmmac_xdp_run_prog() for AF_XDP zero-copy support which will be added by upcoming patches by splitting out the XDP verdict processing into __stmmac_xdp_run_prog() and it callable for XDP ZC path which does not need to verify bpf_prog is not NULL. The stmmac_xdp_run_prog() is used for regular XDP Rx path which requires bpf_prog to be verified. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
-
net: stmmac: rearrange RX and TX desc init into per-queue basis
Below functions are made to be per-queue in preparation of XDP ZC: __init_dma_rx_desc_rings(struct stmmac_priv *priv, u32 queue, gfp_t flags) __init_dma_tx_desc_rings(struct stmmac_priv *priv, u32 queue) The original functions below are stay maintained for all queue usage: init_dma_rx_desc_rings(struct net_device *dev, gfp_t flags) init_dma_tx_desc_rings(struct net_device *dev) Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
-
net: stmmac: refactor stmmac_init_rx_buffers for stmmac_reinit_rx_buf…
…fers The per-queue RX buffer allocation in stmmac_reinit_rx_buffers() can be made to use stmmac_alloc_rx_buffers() by merging the page_pool alloc checks for "buf->page" and "buf->sec_page" in stmmac_init_rx_buffers(). This is in preparation for XSK pool allocation later. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
-
net: stmmac: introduce dma_recycle_rx_skbufs for stmmac_reinit_rx_buf…
…fers Rearrange RX buffer page_pool recycling logics into dma_recycle_rx_skbufs, so that we prepare stmmac_reinit_rx_buffers() for XSK pool expansion. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
-
net: stmmac: rearrange RX buffer allocation and free functions
This patch restructures the per RX queue buffer allocation from page_pool to stmmac_alloc_rx_buffers(). We also rearrange dma_free_rx_skbufs() so that it can be used in init_dma_rx_desc_rings() during freeing of RX buffer in the event of page_pool allocation failure to replace the more efficient method earlier. The replacement is needed to make the RX buffer alloc and free method scalable to XDP ZC xsk_pool alloc and free later. Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Commits on Apr 11, 2021
-
Alex Elder says: ==================== net: ipa: support two more platforms This series adds IPA support for two more Qualcomm SoCs. The first patch updates the DT binding to add compatible strings. The second temporarily disables checksum offload support for IPA version 4.5 and above. Changes are required to the RMNet driver to support the "inline" checksum offload used for IPA v4.5+, and once those are present this capability will be enabled for IPA. The third and fourth patches add configuration data for IPA versions 4.5 (used for the SDX55 SoC) and 4.11 (used for the SD7280 SoC). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: ipa: add IPA v4.11 configuration data
Add support for the SC7280 SoC, which includes IPA version 4.11. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: ipa: add IPA v4.5 configuration data
Add support for the SDX55 SoC, which includes IPA version 4.5. Starting with IPA v4.5, a few of the memory regions have a different number of "canary" values; update comments in the where the region identifers are defined to accurately reflect that. I'll note three differences in SDX55 versus the other two existing platforms (SDM845 and SC7180): - SDX55 uses a 32-bit Linux kernel - SDX55 has four interconnects rather than three - SDX55 uses IPA v4.5, which uses inline checksum offload Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: ipa: disable checksum offload for IPA v4.5+
Checksum offload for IPA v4.5+ is implemented differently, using "inline" offload (which uses a common header format for both upload and download offload). The IPA hardware must be programmed to enable MAP checksum offload, but the RMNet driver is responsible for interpreting checksum metadata supplied with messages. Currently, the RMNet driver does not support inline checksum offload. This support is imminent, but until it is available, do not allow newer versions of IPA to specify checksum offload for endpoints. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
dt-bindings: net: qcom,ipa: add some compatible strings
Add existing supported platform "qcom,sc7180-ipa" to the set of IPA compatible strings. Also add newly-supported "qcom,sdx55-ipa", "qcom,sc7280-ipa". Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
ehea: add missing MODULE_DEVICE_TABLE
This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this driver when it is built as an external module. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Qiheng Lin <linqiheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Paolo Abeni says: ==================== veth: allow GRO even without XDP This series allows the user-space to enable GRO/NAPI on a veth device even without attaching an XDP program. It does not change the default veth behavior (no NAPI, no GRO), except that the GRO feature bit on top of this series will be effectively off by default on veth devices. Note that currently the GRO bit is on by default, but GRO never takes place in absence of XDP. On top of this series, setting the GRO feature bit enables NAPI and allows the GRO to take place. The TSO features on the peer device are preserved. The main goal is improving UDP forwarding performances for containers in a typical virtual network setup: (container) veth -> veth peer -> bridge/ovs -> vxlan -> NIC Enabling the NAPI threaded mode, GRO the NETIF_F_GRO_UDP_FWD feature on the veth peer improves the UDP stream performance with not void netfilter configuration by 2x factor with no measurable overhead for TCP traffic: some heuristic ensures that TCP will not go through the additional NAPI/GRO layer. Some self-tests are added to check the expected behavior in the default configuration, with XDP and with plain GRO enabled. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Add some basic veth tests, that verify the expected flags and aggregation with different setups (default, xdp, etc...) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
After the previous patch, when enabling GRO, locally generated TCP traffic experiences some measurable overhead, as it traverses the GRO engine without any chance of aggregation. This change refine the NAPI receive path admission test, to avoid unnecessary GRO overhead in most scenarios, when GRO is enabled on a veth peer. Only skbs that are eligible for aggregation enter the GRO layer, the others will go through the traditional receive path. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
veth: allow enabling NAPI even without XDP
Currently the veth device has the GRO feature bit set, even if no GRO aggregation is possible with the default configuration, as the veth device does not hook into the GRO engine. Flipping the GRO feature bit from user-space is a no-op, unless XDP is enabled. In such scenario GRO could actually take place, but TSO is forced to off on the peer device. This change allow user-space to really control the GRO feature, with no need for an XDP program. The GRO feature bit is now cleared by default - so that there are no user-visible behavior changes with the default configuration. When the GRO bit is set, the per-queue NAPI instances are initialized and registered. On xmit, when napi instances are available, we try to use them. Some additional checks are in place to ensure we initialize/delete NAPIs only when needed in case of overlapping XDP and GRO configuration changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
veth: use skb_orphan_partial instead of skb_orphan
As described by commit 9c4c325 ("skbuff: preserve sock reference when scrubbing the skb."), orphaning a skb in the TX path will cause OoO. Let's use skb_orphan_partial() instead of skb_orphan(), so that we keep the sk around for queue's selection sake and we still avoid the problem fixed with commit 4bf9ffa ("veth: Orphan skb before GRO") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Moshe Shemesh says: ==================== ethtool: Extend module EEPROM dump API Ethtool supports module EEPROM dumps via the `ethtool -m <dev>` command. But in current state its functionality is limited - offset and length parameters, which are used to specify a linear desired region of EEPROM data to dump, is not enough, considering emergence of complex module EEPROM layouts such as CMIS 4.0. Moreover, CMIS 4.0 extends the amount of pages that may be accessible by introducing another parameter for page addressing - banks. Besides, currently module EEPROM is represented as a chunk of concatenated pages, where lower 128 bytes of all pages, except page 00h, are omitted. Offset and length are used to address parts of this fake linear memory. But in practice drivers, which implement get_module_info() and get_module_eeprom() ethtool ops still calculate page number and set I2C address on their own. This series tackles these issues by adding ethtool op, which allows to pass page number, bank number and I2C address in addition to offset and length parameters to the driver, adds corresponding netlink infrastructure and implements the new interface in mlx5 driver. This allows to extend userspace 'ethtool -m' CLI by adding new parameters - page, bank and i2c. New command line format: ethtool -m <dev> [hex on|off] [raw on|off] [offset N] [length N] [page N] [bank N] [i2c N] The consequence of this series is a possibility to dump arbitrary EEPROM page at a time, in contrast to dumps of concatenated pages. Therefore, offset and length change their semantics and may be used only to specify a part of data within half page boundary, which size is currently limited to 128 bytes. As for drivers that support legacy get_module_info() and get_module_eeprom() pair, the series addresses it by implementing a fallback mechanism. As mentioned earlier, such drivers derive a page number from 'global' offset, so this can be done vice versa without their involvement thanks to standardization. If kernel netlink handler of 'ethtool -m' command detects that new ethtool op is not supported by the driver, it calculates offset from given page number and page offset and calls old ndos, if they are available. ==================== \Signed-off-by: David S. Miller <davem@davemloft.net>
-
ethtool: wire in generic SFP module access
If the device has a sfp bus attached, call its sfp_get_module_eeprom_by_page() function, otherwise use the ethtool op for the device. This follows how the IOCTL works. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
phy: sfp: add netlink SFP support to generic SFP code
The new netlink API for reading SFP data requires a new op to be implemented. The idea of the new netlink SFP code is that userspace is responsible to parsing the EEPROM data and requesting pages, rather than have the kernel decide what pages are interesting and returning them. This allows greater flexibility for newer formats. Currently the generic SFP code only supports simple SFPs. Allow i2c address 0x50 and 0x51 to be accessed with page and bank must always be 0. This interface will later be extended when for example QSFP support is added. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
ethtool: Add fallback to get_module_eeprom from netlink command
In case netlink get_module_eeprom_by_page() callback is not implemented by the driver, try to call old get_module_info() and get_module_eeprom() pair. Recalculate parameters to get_module_eeprom() offset and len using page number and their sizes. Return error if this can't be done. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: ethtool: Export helpers for getting EEPROM info
There are two ways to retrieve information from SFP EEPROMs. Many devices make use of the common code, and assign the sfp_bus pointer in the netdev to point to the bus holding the SFP device. Some MAC drivers directly implement ops in there ethool structure. Export within net/ethtool the two helpers used to call these methods, so that they can also be used in the new netlink code. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net/mlx5: Add support for DSFP module EEPROM dumps
Allow the driver to recognise DSFP transceiver module ID and therefore allow its EEPROM dumps using ethtool. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net/mlx5: Implement get_module_eeprom_by_page()
Implement ethtool_ops::get_module_eeprom_by_page() to enable support of new SFP standards. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net/mlx5: Refactor module EEPROM query
Prepare for ethtool_ops::get_module_eeprom_data() implementation by extracting common part of mlx5_query_module_eeprom() into a separate function. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
ethtool: Allow network drivers to dump arbitrary EEPROM data
Define get_module_eeprom_by_page() ethtool callback and implement netlink infrastructure. get_module_eeprom_by_page() allows network drivers to dump a part of module's EEPROM specified by page and bank numbers along with offset and length. It is effectively a netlink replacement for get_module_info() and get_module_eeprom() pair, which is needed due to emergence of complex non-linear EEPROM layouts. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commits on Apr 10, 2021
-
Merge branch 'net-ipa-a-few-small-fixes'
Alex Elder says: ==================== net: ipa: a few small fixes This series implements some minor bug fixes or improvements. The first patch removes an apparently unnecessary restriction, which results in an error on a 32-bit ARM build. The second makes a definition used for SDM845 match what is used in the downstream code. The third just ensures two netdev pointers are only non-null when valid. The fourth simplifies a little code, knowing that a called function never returns an error. The fifth and sixth just remove some empty/place holder functions. And the last patch fixes a comment, makes a function private, and removes an unnecessary double-negation of a Boolean variable. This patch produces a warning from checkpatch, indicating that a pair of parentheses is unnecessary. I agree with that advice, but it conflicts with a suggestion from the compiler. I left the "problem" in place to avoid the compiler warning. ==================== Link: https://lore.kernel.org/r/20210409180722.1176868-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski committedApr 10, 2021 -
Some time ago changes were made to stop referring to clearing the hardware pipeline as a "tag process." Fix a comment to use the newer terminology. Get rid of a pointless double-negation of the Boolean toward_ipa flag in ipa_endpoint_config(). make ipa_endpoint_exit_one() private; it's only referenced inside "ipa_endpoint.c". Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net: ipa: get rid of empty GSI functions
There are place holder functions in the GSI code that do nothing. Remove these, knowing we can add something back in their place if they're really needed someday. Some of these are inverse functions (such as teardown to match setup). Explicitly comment that there is no inverse in these cases. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net: ipa: get rid of empty IPA functions
There are place holder functions in the IPA code that do nothing. For the most part these are inverse functions, for example, once the routing or filter tables are set up there is no need to perform any matching teardown activity at shutdown, or in the case of an error. These can be safely removed, resulting in some code simplification. Add comments in these spots making it explicit that there is no inverse. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net: ipa: ipa_stop() does not return an error
In ipa_modem_stop(), if the modem netdev pointer is non-null we call ipa_stop(). We check for an error and if one is returned we handle it. But ipa_stop() never returns an error, so this extra handling is unnecessary. Simplify the code in ipa_modem_stop() based on the knowledge no error handling is needed at this spot. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net: ipa: only set endpoint netdev pointer when in use
In ipa_modem_start(), we set endpoint netdev pointers before the network device is registered. If registration fails, we don't undo those assignments. Instead, wait to assign the netdev pointer until after registration succeeds. Set these endpoint netdev pointers to NULL in ipa_modem_stop() before unregistering the network device. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net: ipa: update sequence type for modem TX endpoint
On IPA v3.5.1, the sequencer type for the modem TX endpoint does not define the replication portion in the same way the downstream code does. This difference doesn't affect the behavior of the upstream code, but I'd prefer the two code bases use the same configuration value here. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net: ipa: relax pool entry size requirement
I no longer know why a validation check ensured the size of an entry passed to gsi_trans_pool_init() was restricted to be a multiple of 8. For 32-bit builds, this condition doesn't always hold, and for DMA pools, the size is rounded up to a power of 2 anyway. Remove this restriction. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>