Ansuel-Smith/A…
Commits on Dec 8, 2021
-
net: dsa: qca8k: cache lo and hi for mdio write
According to the documentation, lo and hi can be cached the same way the page already is. This greatly reduces the number of mdio writes, because 3/4 of the time only the lo or the hi part has to be written for an mdio write. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
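A minimal sketch of the caching idea, not the driver's code. The struct, the function name and the simulated bus counter are hypothetical, and the sketch assumes (as the commit message states) that a half whose value did not change does not need to be rewritten.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache of the last 16-bit halves sent over the bus. */
struct mdio_cache {
    uint16_t lo, hi;
    bool lo_valid, hi_valid;
    unsigned int bus_writes; /* counts simulated bus transactions */
};

/* Write a 32-bit value, skipping any half that already matches the cache. */
static void cached_write32(struct mdio_cache *c, uint32_t val)
{
    uint16_t lo = val & 0xffff;
    uint16_t hi = val >> 16;

    if (!c->lo_valid || c->lo != lo) {
        c->lo = lo;
        c->lo_valid = true;
        c->bus_writes++; /* stands in for the real lo write */
    }
    if (!c->hi_valid || c->hi != hi) {
        c->hi = hi;
        c->hi_valid = true;
        c->bus_writes++; /* stands in for the real hi write */
    }
}
```

In this sketch, two consecutive writes that differ only in the low half cost three simulated bus writes instead of four.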
-
net: dsa: qca8k: Add support for mdio read/write in Ethernet packet
Add qca8k side support for mdio read/write in Ethernet packets. qca8k supports specially crafted Ethernet packets that can be used for mdio read/write instead of the legacy uart/internal mdio method. This adds support on the qca8k side for crafting the packet and enqueueing it. Each port and the qca8k_priv have a dedicated struct that holds the data. The completion API is used to wait for the packet to be received back with the requested data.
The steps are:
1. Craft the special packet with the qca hdr set to mdio read/write mode.
2. Take the lock in the dedicated mdio struct.
3. Reinit the completion.
4. Enqueue the packet.
5. Wait for the packet to be received.
6. Use the data set by the tagger to complete the mdio operation.
If the completion times out or the ack value is not true, the legacy mdio method is used. Note that mdio is still used during the initial setup, and it remains in use until DSA is ready to accept and tag packets. tag_proto_connect() is used to fill in the handler the tagger needs to correctly parse and process the special Ethernet mdio packet. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
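The fallback rule described in this commit (use the in-band result only if the wait completed and the ack is set, otherwise use legacy mdio) can be sketched in userspace as follows. Every name here is hypothetical, and the two transports are stubs; no locking, completion or packet crafting is modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of one in-band (Ethernet) mdio attempt. */
struct eth_mdio_result {
    bool completed; /* false models a completion timeout */
    bool ack;       /* ack flag reported by the tagger */
    uint32_t data;  /* data filled in by the tagger */
};

typedef struct eth_mdio_result (*eth_mdio_fn)(uint32_t reg);
typedef uint32_t (*legacy_mdio_fn)(uint32_t reg);

/* Stub transports, for illustration only. */
static struct eth_mdio_result eth_ok(uint32_t reg)
{
    return (struct eth_mdio_result){ true, true, reg + 0x100 };
}

static struct eth_mdio_result eth_timeout(uint32_t reg)
{
    (void)reg;
    return (struct eth_mdio_result){ false, false, 0 };
}

static uint32_t legacy_read(uint32_t reg)
{
    return reg + 0x200;
}

/* Prefer the in-band path; fall back on timeout, missing ack, or no handler. */
static uint32_t mdio_read(uint32_t reg, eth_mdio_fn eth, legacy_mdio_fn legacy)
{
    if (eth != NULL) {
        struct eth_mdio_result r = eth(reg);

        if (r.completed && r.ack)
            return r.data;
    }
    return legacy(reg);
}
```

Passing a NULL in-band handler models the early setup phase, in which only the legacy path is available.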
-
net: dsa: tag_qca: Add support for handling Ethernet mdio and MIB packet
Add connect/disconnect helpers to assign the private struct to the cpu port dsa priv. Add support for Ethernet mdio packets and MIB packets if the dsa driver provides a handler to correctly parse and process the data. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
-
net: dsa: tag_qca: add define for mdio read/write in ethernet packet
Add all the defines required to prepare support for mdio read/write in Ethernet packets. Any packet of this type has to be dropped, as the only use of these special packets is to receive the ack for an mdio write request or the data for an mdio read request. A struct that mimics the Ethernet header is used, but it serves a different purpose. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
-
net: dsa: tag_qca: move define to include linux/dsa
Move the tag_qca defines to the include dir linux/dsa, as qca8k requires access to the tagger defines to support in-band mdio read/write using Ethernet packets. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
-
net: dsa: tag_qca: convert to FIELD macro
Convert the driver to the FIELD macros to drop redundant defines. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
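For readers unfamiliar with the pattern, here is a hedged userspace sketch of mask-only field access. The SK_* macros are simplified stand-ins for the kernel's FIELD_PREP()/FIELD_GET() from <linux/bitfield.h> (they rely on the GCC/Clang builtin __builtin_ctzll and skip the kernel's compile-time checks), and the header layout is hypothetical rather than copied from tag_qca.

```c
#include <stdint.h>

/* Simplified stand-ins: the shift is derived from the mask itself. */
#define SK_FIELD_SHIFT(mask)     __builtin_ctzll(mask)
#define SK_FIELD_PREP(mask, val) \
    (((uint64_t)(val) << SK_FIELD_SHIFT(mask)) & (mask))
#define SK_FIELD_GET(mask, reg) \
    (((uint64_t)(reg) & (mask)) >> SK_FIELD_SHIFT(mask))

/* Hypothetical 16-bit header layout, for illustration only. */
#define HDR_VERSION 0xc000u /* bits 15:14 */
#define HDR_TYPE    0x07c0u /* bits 10:6  */
#define HDR_PORT    0x0007u /* bits 2:0   */

/* Only one mask per field is needed: no separate *_S shift defines. */
static uint16_t hdr_build(unsigned int version, unsigned int type,
                          unsigned int port)
{
    return (uint16_t)(SK_FIELD_PREP(HDR_VERSION, version) |
                      SK_FIELD_PREP(HDR_TYPE, type) |
                      SK_FIELD_PREP(HDR_PORT, port));
}
```

Because the shift comes from the mask, a field definition cannot drift out of sync with a separately maintained shift constant.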
-
net: dsa: Permit dsa driver to configure additional tagger data
Permit a dsa driver to configure additional tagger data for the currently active tagger. A new op, tag_proto_connect(), is introduced; it is called on every tagger bind event using the DSA_NOTIFIER_TAG_PROTO_CONNECT event. It is used when the driver needs to set additional data or a handler that the tagger should use to handle special packets. The dsa driver has to provide explicit support for the current tagger and understand the private data currently set in the dsa ports. tag_proto_connect() should parse the tagger proto, check whether it is supported, and perform the required setup on each port. An example is a dsa driver that supports Ethernet mdio and has to provide the tagger with a handler function to parse these packets. Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
-
net: dsa: Introduce support for tagger private data control
Introduce 2 new functions in the tagger ops to permit the tagger to allocate private data. This is useful for cases where the tagger receives data that should be handled by the switch driver, or where special packets (for example Ethernet mdio packets) require special handling. The tagger uses the dsa port priv to store its private data. connect() is used to allocate the private data; how the data is allocated across the different ports is up to the tagger. disconnect() frees the priv data in the dsa port. On switch setup the connect() op is called. On tagger change, disconnect() is called so that the tagger frees the priv data in the dsa port, and then connect() is called to allocate the new priv data (if the new tagger requires it). Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
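The connect()/disconnect() lifecycle described above can be sketched in userspace like this. All types and names are hypothetical (they are not the DSA structures), and allocating one private struct per port is just one possible choice a tagger could make.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a port and a tagger's private data. */
struct demo_port {
    void *priv;
};

struct demo_tagger_priv {
    int placeholder; /* e.g. handlers supplied later by the driver */
};

struct demo_tagger_ops {
    int (*connect)(struct demo_port *ports, size_t n);
    void (*disconnect)(struct demo_port *ports, size_t n);
};

/* Free the private data of the first n ports. */
static void demo_disconnect(struct demo_port *ports, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(ports[i].priv);
        ports[i].priv = NULL;
    }
}

/* Allocate private data per port, rolling back on failure. */
static int demo_connect(struct demo_port *ports, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ports[i].priv = calloc(1, sizeof(struct demo_tagger_priv));
        if (ports[i].priv == NULL) {
            demo_disconnect(ports, i);
            return -1;
        }
    }
    return 0;
}

/* Tagger change: the old tagger frees, the new one allocates if it needs to. */
static int demo_change_tagger(const struct demo_tagger_ops *old_ops,
                              const struct demo_tagger_ops *new_ops,
                              struct demo_port *ports, size_t n)
{
    if (old_ops != NULL && old_ops->disconnect != NULL)
        old_ops->disconnect(ports, n);
    if (new_ops != NULL && new_ops->connect != NULL)
        return new_ops->connect(ports, n);
    return 0;
}

static const struct demo_tagger_ops demo_ops = { demo_connect, demo_disconnect };
static const struct demo_tagger_ops plain_ops = { NULL, NULL };
```

Here `plain_ops` models a tagger that needs no private data, so switching to it only runs the old tagger's disconnect().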
Commits on Dec 7, 2021
-
Merge branch 'mptcp-new-features-for-mptcp-sockets-and-netlink-pm'
Mat Martineau says:
====================
mptcp: New features for MPTCP sockets and netlink PM
This collection of patches adds MPTCP socket support for a few socket options, ioctls, and one ancillary data type (specifics for each are listed below). There's also a patch modifying the netlink MPTCP path manager API to allow setting the backup flag on a configured interface using the endpoint ID instead of the full IP address.
Patches 1 & 2: TCP_INQ cmsg and selftests.
Patches 3 & 4: SIOCINQ, OUTQ, and OUTQNSD ioctls and selftests.
Patch 5: Change backup flag using endpoint ID.
Patches 6 & 7: IP_TOS socket option and selftests.
Patches 8-10: TCP_CORK and TCP_NODELAY socket options. Includes a tcp change to expose __tcp_sock_set_cork() and __tcp_sock_set_nodelay() for use by MPTCP.
====================
Link: https://lore.kernel.org/r/20211203223541.69364-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski committed Dec 7, 2021
-
mptcp: support TCP_CORK and TCP_NODELAY
First, add cork and nodelay fields to the mptcp_sock structure so they can be used in sync_socket_options(), and fill them on setsockopt while holding the msk socket lock. Then, on setsockopt set proper tcp_sk(ssk)->nonagle values for subflows by calling __tcp_sock_set_cork() or __tcp_sock_set_nodelay() on the ssk while holding the ssk socket lock. tcp_push_pending_frames() will be invoked on the ssk if a cork was cleared or nodelay was set. Also set MPTCP_PUSH_PENDING bit by calling mptcp_check_and_set_pending(). This will lead to __mptcp_push_pending() being called inside mptcp_release_cb() with new tcp_sk(ssk)->nonagle. Also add getsockopt support for TCP_CORK and TCP_NODELAY. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
mptcp: expose mptcp_check_and_set_pending
Expose the mptcp_check_and_set_pending() function for use inside MPTCP sockopt code. The next patch will call it when TCP_CORK is cleared or TCP_NODELAY is set on the MPTCP socket in order to push pending data from mptcp_release_cb(). Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
tcp: expose __tcp_sock_set_cork and __tcp_sock_set_nodelay
Expose __tcp_sock_set_cork() and __tcp_sock_set_nodelay() for use in MPTCP setsockopt code -- namely for syncing MPTCP socket options with subflows inside sync_socket_options() while already holding the subflow socket lock. Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
selftests: mptcp: check IP_TOS in/out are the same
Check that getsockopt(IP_TOS) returns what setsockopt(IP_TOS) did set right before. Also check that socklen_t == 0 and -1 input values match those of normal tcp sockets. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
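The set-then-get check can be sketched in userspace as below. This is not the selftest itself: the function name is hypothetical, the fallback to plain TCP when an MPTCP socket cannot be created is an assumption made so the sketch runs on kernels without MPTCP, and -1 is returned on any failure. The IPPROTO_MPTCP fallback define uses the protocol number 262.

```c
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/*
 * Set IP_TOS and read it back on the same socket.
 * Returns the value read back, or -1 if any step failed.
 */
static int tos_roundtrip(int tos_in)
{
    int tos_out = -1;
    socklen_t len = sizeof(tos_out);
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

    if (fd < 0) /* assumption: fall back to plain TCP without MPTCP */
        fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd < 0)
        return -1;

    if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos_in, sizeof(tos_in)) < 0 ||
        getsockopt(fd, IPPROTO_IP, IP_TOS, &tos_out, &len) < 0)
        tos_out = -1;

    close(fd);
    return tos_out;
}
```

A value such as 0x10 leaves the two low-order ECN bits clear, which matters because Linux does not let IP_TOS change those bits on stream sockets.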
Florian Westphal authored and Jakub Kicinski committed Dec 7, 2021
-
mptcp: getsockopt: add support for IP_TOS
An earlier patch added IP_TOS setsockopt support; this patch allows reading back the value set by an earlier setsockopt. Extend mptcp_put_int_option to handle u8 input/output by adding the required cast. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal authored and Jakub Kicinski committed Dec 7, 2021
-
mptcp: allow changing the "backup" bit by endpoint id
A non-zero 'id' is sufficient to identify MPTCP endpoints: allow changing the value of the 'backup' bit by simply specifying the endpoint id. Link: multipath-tcp/mptcp_net-next#158 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
selftests: mptcp: add inq test case
Client and server use a unix socket connection to communicate outside of the mptcp connection. This lets the consumer know in advance how many bytes have been (or will be) sent by the peer, which allows stricter checks on the byte counts reported by the TCP_INQ cmsg. Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal authored and Jakub Kicinski committed Dec 7, 2021
-
mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
Allow querying the in-sequence data ready for read(), the total bytes in the write queue, and the total bytes in the write queue that have not yet been sent.
v2: remove unneeded READ_ONCE() (Paolo Abeni)
v3: check for new data unconditionally in SIOCINQ ioctl (Mat Martineau)
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal authored and Jakub Kicinski committed Dec 7, 2021
-
selftests: mptcp: add TCP_INQ support
Do checks on the returned inq counter. Fail on:
1. Huge value (> 1 kbyte, test case files are 1 kb)
2. Last hint larger than returned bytes when read was short
3. Erroneous indication of EOF
Case 3 happens when a hint of X bytes reads X-1 on next call but next recvmsg returns more data (instead of EOF). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal authored and Jakub Kicinski committed Dec 7, 2021
-
mptcp: add TCP_INQ cmsg support
Support the TCP_INQ setsockopt. This is a boolean that tells the recvmsg path to include the remaining in-sequence bytes in the cmsg data.
v2: do not use CB(skb)->offset, increment map_seq instead (Paolo Abeni)
v3: adjust CB(skb)->map_seq when taking skb from ofo queue (Paolo Abeni)
Closes: multipath-tcp/mptcp_net-next#224 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Florian Westphal authored and Jakub Kicinski committed Dec 7, 2021
-
vrf: use dev_replace_track() for better tracking
vrf_rt6_release() and vrf_rtable_release() change dst->dev.
Instead of
    dev_hold(ndev);
    dev_put(odev);
we should use
    dev_replace_track(odev, ndev, &dst->dev_tracker, GFP_KERNEL);
If we do not transfer dst->dev_tracker to the new device, we will get warnings from ref_tracker_dir_exit() when odev is finally dismantled.
Fixes: 9038c32 ("net: dst: add net device refcount tracking to dst_entry") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20211207055603.1926372-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net/smc: Clear memory when release and reuse buffer
Currently, buffers are cleared when smc connections are created and buffers are reused. This slows down the establishment of new connections. In most cases, applications want to establish connections as quickly as possible. This patch moves memset() from the connection creation path to the release and buffer unuse path, trading release speed for connection establishment speed.
Test environments:
- CPU Intel Xeon Platinum 8 core, mem 32 GiB, nic Mellanox CX4
- socket sndbuf / rcvbuf: 16384 / 131072 bytes
- w/o first round, 5 rounds, avg, 100 conns batch per round
- smc_buf_create() use bpftrace kprobe, introduces extra latency
Latency benchmarks for smc_buf_create():
  w/o patch : 19040.0 ns
  w/ patch  : 1932.6 ns
  ratio     : 10.2% (-89.8%)
Latency benchmarks for socket create and connect:
  w/o patch : 143.3 us
  w/ patch  : 102.2 us
  ratio     : 71.3% (-28.7%)
The latency of establishing connections is reduced by 28.7%.
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20211203113331.2818873-1-kgraul@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Lu authored and Jakub Kicinski committed Dec 7, 2021
-
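The trade-off can be illustrated with a tiny userspace buffer pool that clears on release instead of on acquire. This is only a sketch of the idea: the pool, slot size and function names are hypothetical and unrelated to the smc data structures, and the pool is assumed to start zero-initialised.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_SIZE 64

/* Hypothetical reusable buffer slot. */
struct demo_buf {
    unsigned char data[DEMO_BUF_SIZE];
    bool used;
};

/* Fast path: hand out a free slot without touching its contents. */
static struct demo_buf *demo_buf_acquire(struct demo_buf *pool, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!pool[i].used) {
            pool[i].used = true;
            return &pool[i];
        }
    }
    return NULL;
}

/* Slow path: pay for the memset() when the buffer is given back. */
static void demo_buf_release(struct demo_buf *buf)
{
    memset(buf->data, 0, sizeof(buf->data));
    buf->used = false;
}
```

A reused slot is still handed out zeroed, but the cost of clearing it has moved out of the acquire path.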
Revert "net: hns3: add void before function which don't receive ret"
This reverts commit 5ac4f18. Sorry for not noticing that the function devlink_register() is already declared as void, so this patch needs to be reverted. Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Link: https://lore.kernel.org/r/20211204012448.51360-1-huangguangbin2@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guangbin Huang authored and Jakub Kicinski committed Dec 7, 2021
-
net: prestera: replace zero-length array with flexible-array member
One-element and zero-length arrays are deprecated and should be replaced with flexible-array members: https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Replace zero-length array with flexible-array member and make use of the struct_size() helper. Link: KSPP#78 Signed-off-by: José Expósito <jose.exposito89@gmail.com> Reviewed-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Tested-by: Volodymyr Mytnyk <vmytnyk@marvell.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20211204171349.22776-1-jose.exposito89@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
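A userspace sketch of the same conversion, under stated assumptions: the message struct is hypothetical (it is not a prestera structure), and `demo_msg_size()` is only a stand-in for what the kernel's struct_size() helper computes, including saturating to SIZE_MAX on overflow.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message with a trailing variable-length payload. */
struct demo_msg {
    uint32_t len;
    uint8_t data[]; /* flexible-array member (previously: data[0]) */
};

/* Header plus n payload bytes, saturating to SIZE_MAX on overflow. */
static size_t demo_msg_size(size_t n)
{
    if (n > SIZE_MAX - sizeof(struct demo_msg))
        return SIZE_MAX;
    return sizeof(struct demo_msg) + n * sizeof(uint8_t);
}

/* Allocate a message and copy n payload bytes into it. */
static struct demo_msg *demo_msg_alloc(const void *payload, uint32_t n)
{
    size_t size = demo_msg_size(n);
    struct demo_msg *msg;

    if (size == SIZE_MAX)
        return NULL;
    msg = malloc(size);
    if (msg == NULL)
        return NULL;
    msg->len = n;
    memcpy(msg->data, payload, n);
    return msg;
}
```

Keeping the size computation in one helper avoids open-coded `sizeof(*msg) + n` arithmetic at every allocation site.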
-
net: wwan: iosm: select CONFIG_RELAY
The iosm driver started using relayfs, but is missing the Kconfig logic to ensure it's built into the kernel:
x86_64-linux-ld: drivers/net/wwan/iosm/iosm_ipc_trace.o: in function `ipc_trace_create_buf_file_handler':
iosm_ipc_trace.c:(.text+0x16): undefined reference to `relay_file_operations'
x86_64-linux-ld: drivers/net/wwan/iosm/iosm_ipc_trace.o: in function `ipc_trace_subbuf_start_handler':
iosm_ipc_trace.c:(.text+0x31): undefined reference to `relay_buf_full'
x86_64-linux-ld: drivers/net/wwan/iosm/iosm_ipc_trace.o: in function `ipc_trace_ctrl_file_write':
iosm_ipc_trace.c:(.text+0xd5): undefined reference to `relay_flush'
x86_64-linux-ld: drivers/net/wwan/iosm/iosm_ipc_trace.o: in function `ipc_trace_port_rx':
Fixes: 00ef325 ("net: wwan: iosm: device trace collection using relayfs") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Link: https://lore.kernel.org/r/20211204174033.950528-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Vladimir reported csum issues after my recent change in skb_postpull_rcsum().
The issue is the following: the initial skb->csum is the csum of [part to be pulled][rest of packet].
Old code:
    skb->csum = csum_sub(skb->csum, csum_partial(pull, pull_length, 0));
New code:
    skb->csum = ~csum_partial(pull, pull_length, ~skb->csum);
This is broken if the csum of [pulled part] happens to be equal to skb->csum, because the end result of skb->csum is 0 in the new code, instead of being 0xffffffff.
David Laight suggested to use
    skb->csum = -csum_partial(pull, pull_length, -skb->csum);
I based my patches on existing code present in include/net/seg6.h, update_csum_diff4() and update_csum_diff16(), which might need a similar fix.
I guess that my tests, mostly pulling 40 bytes of IPv6 header, were not providing enough entropy to hit this bug.
v2: added wsum_negate() to make sparse happy.
Fixes: 29c3002 ("net: optimize skb_postpull_rcsum()") Fixes: 0bd2847 ("gro: optimize skb_gro_postpull_rcsum()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Suggested-by: David Laight <David.Laight@ACULAB.COM> Cc: David Lebrun <dlebrun@google.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211204045356.3659278-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
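The corner case can be reproduced with a small userspace model. This is a sketch under stated assumptions: `sketch_partial()` sums 16-bit words with end-around carry and is only a stand-in for the kernel's csum_partial(), `sketch_csum_add()` models a 32-bit one's-complement add, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit one's-complement add: wrap the carry back into the sum. */
static uint32_t sketch_csum_add(uint32_t a, uint32_t b)
{
    uint32_t res = a + b;

    return res + (res < b);
}

/* Stand-in for csum_partial(): add n 16-bit words to init. */
static uint32_t sketch_partial(const uint16_t *w, size_t n, uint32_t init)
{
    uint64_t acc = init;

    for (size_t i = 0; i < n; i++)
        acc += w[i];
    acc = (acc & 0xffffffffu) + (acc >> 32); /* fold carries */
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/* Old code: csum_sub(csum, partial(pull)) == csum_add(csum, ~partial). */
static uint32_t pull_old(uint32_t csum, const uint16_t *w, size_t n)
{
    return sketch_csum_add(csum, ~sketch_partial(w, n, 0));
}

/* Broken variant: ~partial(pull, ~csum). */
static uint32_t pull_broken(uint32_t csum, const uint16_t *w, size_t n)
{
    return ~sketch_partial(w, n, ~csum);
}

/* Suggested fix: -partial(pull, -csum), using unsigned negation. */
static uint32_t pull_fixed(uint32_t csum, const uint16_t *w, size_t n)
{
    return 0u - sketch_partial(w, n, 0u - csum);
}
```

With the pulled words {0x1234, 0x5678} and a starting csum equal to their sum (0x68ac), this model gives 0xffffffff for the old and fixed variants and 0 for the broken one, matching the description in the commit message.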
-
Merge branch 'net-add-preliminary-netdev-refcount-tracking'
Eric Dumazet says:
====================
net: add preliminary netdev refcount tracking
The first two patches add a generic infrastructure that will be used to track refcount increments/decrements. The general idea is to be able to precisely pair each decrement with a corresponding prior increment. Both share a cookie, basically a pointer to private data storing stack traces.
The third patch adds dev_hold_track() and dev_put_track() helpers (CONFIG_NET_DEV_REFCNT_TRACKER).
Then a series of 20 patches converts some dev_hold()/dev_put() pairs to the new helpers: dev_hold_track() and dev_put_track().
Hopefully this will be used by developers and syzbot to root cause bugs that cause netdevice dismantle freezes.
With the CONFIG_PCPU_DEV_REFCNT=n option, we were able to detect some classes of bugs, but too late (when too many dev_put() were happening).
Another series will be sent after this one is merged.
v3: moved NET_DEV_REFCNT_TRACKER to net/Kconfig.debug
    added "depends on DEBUG_KERNEL && STACKTRACE_SUPPORT" to hopefully get rid of kbuild reports for ARCH=nios2
    Reworded patch 3 changelog.
    Added missing htmldocs (Jakub)
v2: added four additional patches,
    added netdev_tracker_alloc() and netdev_tracker_free()
    addressed build error (kernel bots),
    use GFP_ATOMIC in test_ref_tracker_timer_func()
====================
Link: https://lore.kernel.org/r/20211205042217.982127-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski committed Dec 7, 2021
-
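The pairing idea (each put must present the cookie returned by a prior hold) can be sketched in userspace as follows. Everything here is hypothetical and much simpler than the kernel infrastructure: the cookie records a location string instead of a stack trace, and there is no locking.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cookie: one per outstanding reference. */
struct demo_tracker {
    const char *where; /* stands in for a stored stack trace */
    struct demo_tracker *next;
};

struct demo_ref {
    int refcnt;
    struct demo_tracker *live; /* cookies of references still held */
};

/* Take a reference and return the cookie that must be used to drop it. */
static struct demo_tracker *demo_hold(struct demo_ref *r, const char *where)
{
    struct demo_tracker *t = malloc(sizeof(*t));

    if (t == NULL)
        return NULL;
    t->where = where;
    t->next = r->live;
    r->live = t;
    r->refcnt++;
    return t;
}

/* Drop a reference; returns -1 if the cookie has no matching hold. */
static int demo_put(struct demo_ref *r, struct demo_tracker **cookie)
{
    struct demo_tracker **pp;

    if (cookie == NULL || *cookie == NULL)
        return -1;
    for (pp = &r->live; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == *cookie) {
            *pp = (*cookie)->next;
            free(*cookie);
            *cookie = NULL; /* a second put with this cookie now fails */
            r->refcnt--;
            return 0;
        }
    }
    return -1;
}

/* Number of references that were taken but never dropped. */
static size_t demo_leaks(const struct demo_ref *r)
{
    size_t n = 0;

    for (const struct demo_tracker *t = r->live; t != NULL; t = t->next)
        n++;
    return n;
}
```

Walking the `live` list at teardown shows which holds were never paired with a put, instead of only revealing that the count is wrong.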
netpoll: add net device refcount tracker to struct netpoll
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
ipmr, ip6mr: add net device refcount tracker to struct vif_device
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net: failover: add net device refcount tracker
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net: linkwatch: add net device refcount tracker
Add a netdevice_tracker inside struct net_device, to track the self reference when a device is in lweventlist. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net/sched: add net device refcount tracker to struct Qdisc
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
ipv4: add net device refcount tracker to struct in_device
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
ipv6: add net device refcount tracker to struct inet6_dev
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net: add net device refcount tracker to struct netdev_adjacent
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
net: add net device refcount tracker to struct neigh_parms
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>