Alex-Elder/net…
Commits on Mar 15, 2021
-
net: qualcomm: rmnet: don't use C bit-fields in rmnet checksum header
Replace the use of C bit-fields in the rmnet_map_ul_csum_header structure with a single two-byte (big endian) structure member, and use masks to encode or get values within it. The content of these fields can be accessed using simple bitwise AND and OR operations on the (host byte order) value of the new structure member. Previously rmnet_map_ipv4_ul_csum_header() would update C bit-field values in host byte order, then forcibly fix their byte order using a combination of byte swap operations and types. Instead, just compute the value that needs to go into the new structure member and save it with a simple byte-order conversion. Make similar simplifications in rmnet_map_ipv6_ul_csum_header(). Finally, in rmnet_map_checksum_uplink_packet() a set of assignments zeroes every field in the upload checksum header. Replace that with a single memset() operation. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> -- v5: - Assign checksum start offset a little later (with csum_info) v4: - Don't use u16_get_bits() to access the checksum field offset v3: - Use BIT(x) and don't use u16_get_bits() for single-bit flags v2: - Fixed to use u16_encode_bits() instead of be16_encode_bits() .../ethernet/qualcomm/rmnet/rmnet_map_data.c | 38 +++++++------------ include/linux/if_rmnet.h | 21 +++++----- 2 files changed, 23 insertions(+), 36 deletions(-) iff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c ndex c336c17e01fe4..0ac2ff828320c 100644 -- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c ++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c @ -197,20 +197,16 @@ rmnet_map_ipv4_ul_csum_header(void *iphdr, struct rmnet_map_ul_csum_header *ul_header, struct sk_buff *skb) { __be16 *hdr = (__be16 *)ul_header; struct iphdr *ip4h = iphdr; u16 val; ul_header->csum_start_offset = htons(skb_network_header_len(skb)); ul_header->csum_insert_offset = skb->csum_offset; ul_header->csum_enabled = 1; val = MAP_CSUM_UL_ENABLED_FLAG; if (ip4h->protocol == IPPROTO_UDP) ul_header->udp_ind = 1; else ul_header->udp_ind = 0; val |= MAP_CSUM_UL_UDP_FLAG; val |= skb->csum_offset & MAP_CSUM_UL_OFFSET_MASK; - /* Changing remaining fields to network order */ - hdr++; - *hdr = htons((__force u16)*hdr); + ul_header->csum_start_offset = htons(skb_network_header_len(skb)); + ul_header->csum_info = htons(val); skb->ip_summed = CHECKSUM_NONE; @@ -237,21 +233,16 @@ rmnet_map_ipv6_ul_csum_header(void *ip6hdr, struct rmnet_map_ul_csum_header *ul_header, struct sk_buff *skb) { - __be16 *hdr = (__be16 *)ul_header; struct ipv6hdr *ip6h = ip6hdr; + u16 val; - ul_header->csum_start_offset = htons(skb_network_header_len(skb)); - ul_header->csum_insert_offset = skb->csum_offset; - ul_header->csum_enabled = 1; - + val = MAP_CSUM_UL_ENABLED_FLAG; if (ip6h->nexthdr == IPPROTO_UDP) - ul_header->udp_ind = 1; - else - ul_header->udp_ind = 0; + val |= MAP_CSUM_UL_UDP_FLAG; + val |= skb->csum_offset & MAP_CSUM_UL_OFFSET_MASK; - /* Changing remaining fields to network order */ - hdr++; - *hdr = htons((__force u16)*hdr); + ul_header->csum_start_offset = htons(skb_network_header_len(skb)); + ul_header->csum_info = htons(val); skb->ip_summed = CHECKSUM_NONE; @@ -419,10 +410,7 @@ void rmnet_map_checksum_uplink_packet(struct sk_buff *skb, } sw_csum: - ul_header->csum_start_offset = 0; - ul_header->csum_insert_offset = 0; - ul_header->csum_enabled = 0; - ul_header->udp_ind = 0; + memset(ul_header, 0, sizeof(*ul_header)); priv->stats.csum_sw++; } -
net: qualcomm: rmnet: don't use C bit-fields in rmnet checksum trailer
Replace the use of C bit-fields in the rmnet_map_dl_csum_trailer structure with a single one-byte field, using constant field masks to encode or get at embedded values. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
-
net: qualcomm: rmnet: use masks instead of C bit-fields
The actual layout of bits defined in C bit-fields (e.g. int foo : 3) is implementation-defined. Structures defined in <linux/if_rmnet.h> address this by specifying all bit-fields twice, to cover two possible layouts. I think this pattern is repetitive and noisy, and I find the whole notion of compiler "bitfield endianness" to be non-intuitive. Stop using C bit-fields for the command/data flag and the pad length fields in the rmnet_map structure, and define a single-byte flags field instead. Define a mask for the single-bit "command" flag, and another mask for the encoded pad length. The content of both fields can be accessed using a simple bitwise AND operation. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
-
net: qualcomm: rmnet: kill RMNET_MAP_GET_*() accessor macros
The following macros, defined in "rmnet_map.h", assume a socket buffer is provided as an argument without any real indication this is the case. RMNET_MAP_GET_MUX_ID() RMNET_MAP_GET_CD_BIT() RMNET_MAP_GET_PAD() RMNET_MAP_GET_CMD_START() RMNET_MAP_GET_LENGTH() What they hide is pretty trivial accessing of fields in a structure, and it's much clearer to see this if we do these accesses directly. So rather than using these accessor macros, assign a local variable of the map header pointer type to the socket buffer data pointer, and derereference that pointer variable. In "rmnet_map_data.c", use sizeof(object) rather than sizeof(type) in one spot. Also, there's no need to byte swap 0; it's all zeros irrespective of endianness. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> -
net: qualcomm: rmnet: simplify some byte order logic
In rmnet_map_ipv4_ul_csum_header() and rmnet_map_ipv6_ul_csum_header() the offset within a packet at which checksumming should commence is calculated. This calculation involves byte swapping and a forced type conversion that makes it hard to understand. Simplify this by computing the offset in host byte order, then converting the result when assigning it into the header field. Signed-off-by: Alex Elder <elder@linaro.org>
-
net: qualcomm: rmnet: mark trailer field endianness
The fields in the checksum trailer structure used for QMAP protocol RX packets are all big-endian format, so define them that way. It turns out these fields are never actually used by the RMNet code. The start offset is always assumed to be zero, and the length is taken from the other packet headers. So making these fields explicitly big endian has no effect on the behavior of the code. Signed-off-by: Alex Elder <elder@linaro.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Commits on Mar 14, 2021
-
Merge branch 'psample-Add-additional-metadata-attributes'
Ido Schimmel says: ==================== psample: Add additional metadata attributes This series extends the psample module to expose additional metadata to user space for packets sampled via act_sample. The new metadata (e.g., transit delay) can then be consumed by applications such as hsflowd [1] for better network observability. netdevsim is extended with a dummy psample implementation that periodically reports "sampled" packets to the psample module. In addition to testing of the psample module, it enables the development and demonstration of user space applications (e.g., hsflowd) that are interested in the new metadata even without access to specialized hardware (e.g., Spectrum ASIC) that can provide it. mlxsw is also extended to provide the new metadata to psample. A Wireshark dissector for psample netlink packets [2] will be submitted upstream after the kernel patches are accepted. In addition, a libpcap capture module for psample is currently in the works. Eventually, users should be able to run: # tshark -i psample In order to consume sampled packets along with their metadata. Series overview: Patch #1 makes it easier to extend the metadata provided to psample Patch #2 adds the new metadata attributes to psample Patch #3 extends netdevsim to periodically report "sampled" packets to psample. Various debugfs knobs are added to control the reporting Patch #4 adds a selftest over netdevsim Patches #5-torvalds#10 gradually add support for the new metadata in mlxsw Patch torvalds#11 adds a selftest over mlxsw [1] https://sflow.org/draft4_sflow_transit.txt [2] https://gitlab.com/amitcohen1/wireshark/-/commit/3d711143024e032aef1b056dd23f0266c54fab56 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
selftests: mlxsw: Add tc sample tests
Test that packets are sampled when tc-sample is used and that reported metadata is correct. Two sets of hosts (with and without LAG) are used, since metadata extraction in mlxsw is a bit different when LAG is involved. # ./tc_sample.sh TEST: tc sample rate (forward) [ OK ] TEST: tc sample rate (local receive) [ OK ] TEST: tc sample maximum rate [ OK ] TEST: tc sample group conflict test [ OK ] TEST: tc sample iif [ OK ] TEST: tc sample lag iif [ OK ] TEST: tc sample oif [ OK ] TEST: tc sample lag oif [ OK ] TEST: tc sample out-tc [ OK ] TEST: tc sample out-tc-occ [ OK ] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
mlxsw: spectrum: Report extra metadata to psample module
Make use of the previously added metadata and report it to the psample module. The metadata is read from the skb's control block, which was initialized by the bus driver (i.e., 'mlxsw_pci') after decoding the packet's Completion Queue Element (CQE). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
mlxsw: spectrum: Remove mlxsw_sp_sample_receive()
The function resolves the psample sampling group from the Rx port because this is the only form of sampling the driver currently supports. Subsequent patches are going to add support for Tx-based and policy-based sampling, in which case the sampling group would not be resolved from the Rx port. Therefore, move this code to the Rx-specific sampling listener. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
mlxsw: spectrum: Remove unnecessary RCU read-side critical section
Since commit 7d8e8f3 ("mlxsw: core: Increase scope of RCU read-side critical section"), all Rx handlers are called from an RCU read-side critical section. Remove the unnecessary rcu_read_lock() / rcu_read_unlock(). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
mlxsw: pci: Set extra metadata in skb control block
Packets that are mirrored / sampled to the CPU have extra metadata encoded in their corresponding Completion Queue Element (CQE). Retrieve this metadata from the CQE and set it in the skb control block so that it could be accessed by the switch driver (i.e., 'mlxsw_spectrum'). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
mlxsw: Create dedicated field for Rx metadata in skb control block
Next patch will need to encode more Rx metadata in the skb control block, so create a dedicated field for it and move the cookie index there. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
mlxsw: pci: Add more metadata fields to CQEv2
The Completion Queue Element version 2 (CQEv2) includes various metadata fields for packets that are mirrored / sampled to the CPU. Add these fields so that they could be used by a later patch. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
selftests: netdevsim: Test psample functionality
Test various aspects of psample functionality over netdevsim and in particular test that the psample module correctly reports the provided metadata. Example: # ./psample.sh TEST: psample enable / disable [ OK ] TEST: psample group number [ OK ] TEST: psample metadata [ OK ] Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
netdevsim: Add dummy psample implementation
Allow netdevsim to report "sampled" packets to the psample module by periodically generating packets from a work queue. The behavior can be enabled / disabled (default) and the various meta data attributes can be controlled via debugfs knobs. This implementation enables both testing of the psample module with all the optional attributes as well as development of user space applications on top of psample such as hsflowd and a Wireshark dissector for psample generic netlink packets. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
psample: Add additional metadata attributes
Extend psample to report the following attributes when available: * Output traffic class as a 16-bit value * Output traffic class occupancy in bytes as a 64-bit value * End-to-end latency of the packet in nanoseconds resolution * Software timestamp in nanoseconds resolution (always available) * Packet's protocol. Needed for packet dissection in user space (always available) Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
psample: Encapsulate packet metadata in a struct
Currently, callers of psample_sample_packet() pass three metadata attributes: Ingress port, egress port and truncated size. Subsequent patches are going to add more attributes (e.g., egress queue occupancy), which also need an indication whether they are valid or not. Encapsulate packet metadata in a struct in order to keep the number of arguments reasonable. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Merge branch 'skbuff-micro-optimize-flow-dissection'
Alexander Lobakin says: ==================== skbuff: micro-optimize flow dissection This little number makes all of the flow dissection functions take raw input data pointer as const (1-5) and shuffles the branches in __skb_header_pointer() according to their hit probability. The result is +20 Mbps per flow/core with one Flow Dissector pass per packet. This affects RPS (with software hashing), drivers that use eth_get_headlen() on their Rx path and so on. From v2 [1]: - reword some commit messages as a potential fix for NIPA; - no functional changes. From v1 [0]: - rebase on top of the latest net-next. This was super-weird, but I double-checked that the series applies with no conflicts, and then on Patchwork it didn't; - no other changes. [0] https://lore.kernel.org/netdev/20210312194538.337504-1-alobakin@pm.me [1] https://lore.kernel.org/netdev/20210313113645.5949-1-alobakin@pm.me ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
skbuff: micro-optimize {,__}skb_header_pointer()
{,__}skb_header_pointer() helpers exist mainly for preventing accesses-beyond-end of the linear data. In the vast majorify of cases, they bail out on the first condition. All code going after is mostly a fallback. Mark the most common branch as 'likely' one to move it in-line. Also, skb_copy_bits() can return negative values only when the input arguments are invalid, e.g. offset is greater than skb->len. It can be safely marked as 'unlikely' branch, assuming that hotpath code provides sane input to not fail here. These two bump the throughput with a single Flow Dissector pass on every packet (e.g. with RPS or driver that uses eth_get_headlen()) on 20 Mbps per flow/core. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net> -
ethernet: constify eth_get_headlen()'s data argument
It's used only for flow dissection, which now takes constant data pointers. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
linux/etherdevice.h: misc trailing whitespace cleanup
Caught by the text editor. Fix it separately from the actual changes. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
flow_dissector: constify raw input data argument
Flow Dissector code never modifies the input buffer, neither skb nor raw data. Make 'data' argument const for all of the Flow dissector's functions. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
skbuff: make __skb_header_pointer()'s data argument const
The function never modifies the input buffer, so 'data' argument can be marked as const. This implies one harmless cast-away. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
flow_dissector: constify bpf_flow_dissector's data pointers
BPF Flow dissection programs are read-only and don't touch input buffers. Mark 'data' and 'data_end' in struct bpf_flow_dissector as const in preparation for global input constifying. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Merge branch 'gro-micro-optimize-dev_gro_receive'
Alexander Lobakin says: ==================== gro: micro-optimize dev_gro_receive() This random series addresses some of suboptimal constructions used in the main GRO entry point. The main body is gro_list_prepare() simplification and pointer usage optimization in dev_gro_receive() itself. Being mostly cosmetic, it gives like +10 Mbps on my setup to both TCP and UDP (both single- and multi-flow). Since v1 [0]: - drop the replacement of bucket index calculation with reciprocal_scale() since it makes absolutely no sense (Eric); - improve stack usage in dev_gro_receive() (Eric); - reverse the order of patches to avoid changes superseding. [0] https://lore.kernel.org/netdev/20210312162127.239795-1-alobakin@pm.me ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
gro: give 'hash' variable in dev_gro_receive() a less confusing name
'hash' stores not the flow hash, but the index of the GRO bucket corresponding to it. Change its name to 'bucket' to avoid confusion while reading lines like '__set_bit(hash, &napi->gro_bitmask)'. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
gro: consistentify napi->gro_hash[x] access in dev_gro_receive()
GRO bucket index doesn't change through the entire function. Store a pointer to the corresponding bucket instead of its member and use it consistently through the function. It is performance-safe since &gro_list->list == gro_list. Misc: remove superfluous braces around single-line branches. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
gro: simplify gro_list_prepare()
gro_list_prepare() always returns &napi->gro_hash[bucket].list, without any variations. Moreover, it uses 'napi' argument only to have access to this list, and calculates the bucket index for the second time (firstly it happens at the beginning of dev_gro_receive()) to do that. Given that dev_gro_receive() already has an index to the needed list, just pass it as the first argument to eliminate redundant calculations, and make gro_list_prepare() return void. Also, both arguments of gro_list_prepare() can be constified since this function can only modify the skbs from the bucket list. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
-
net: dsa: bcm_sf2: Fill in BCM4908 CFP entries
The BCM4908 switch has 256 CFP entrie, update that setting so CFP can be used. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
hv_netvsc: Add a comment clarifying batching logic
The batching logic in netvsc_send is non-trivial, due to a combination of the Linux API and the underlying hypervisor interface. Add a comment explaining why the code is written this way. Signed-off-by: Shachar Raindel <shacharr@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Merge branch 'pktgen-scripts-improvements'
Igor Russkikh says: ==================== pktgen: scripts improvements Please consider small improvements to pktgen scripts we use in our environment. Adding delay parameter through command line, Adding new -a (append) parameter to make flex runs v3: change us to ns in docs v2: Review comments from Jesper CC: Jesper Dangaard Brouer <brouer@redhat.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
samples: pktgen: new append mode
To configure various complex flows we for sure can create custom pktgen init scripts, but sometimes thats not that easy. New "-a" (append) option in all the existing sample scripts allows to append more "devices" into pktgen threads. The most straightforward usecases for that are: - using multiple devices. We have to generate full linerate on all physical functions (ports) of our multiport device. - pushing multiple flows (with different packet options) Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
samples: pktgen: allow to specify delay parameter via new opt
DELAY may now be explicitly specified via common parameter -w Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
docs: net: add missing devlink health cmd - trigger
Documentation is missing and it's not very clear what this callback is for - presumably testing the recovery? Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>