Add IPv6 BIG TCP support #20349
Conversation
/test
I accidentally pushed the wrong auto-generated model file, sorry about that.
/test
Force-pushed from c5a2d6e to 672ebad.
/test
/test
Job 'Cilium-PR-K8s-1.23-kernel-4.19' failed.
/test
/test
Job 'Cilium-PR-K8s-GKE' failed.
/test
Job 'Cilium-PR-K8s-GKE' failed.
LGTM for the vendor and API changes 👍
Update vishvananda/netlink to support setting IFLA_GRO_MAX_SIZE and IFLA_GSO_MAX_SIZE, which are needed for BIG TCP. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
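For context only, the same per-device attributes can be poked from the shell with a recent iproute2 (the device name is just an example; this illustrates what the new netlink attributes control, not how Cilium applies them):
$ ip link set dev eth0 gso_max_size 196608 gro_max_size 196608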
Add basic IPv6 BIG TCP infrastructure which allows changing a device's GRO and GSO max sizes depending on their current values and the new enable-ipv6-big-tcp option. BIG TCP allows the network stack to build bigger TSO/GRO packets (the maximum grows from 64k to 512k); it can be used only with IPv6 and improves latency and throughput because fewer packets traverse the stack. It works by adding a temporary Hop-by-Hop header right after the IPv6 header, which may break eBPF programs that assume the TCP header immediately follows the IPv6 header. The defaults (65536) are set only if necessary (i.e. if the GSO/GRO max sizes were previously changed); when the option is enabled they are set to 196608. The bpf_dynptr_data helper is used to probe for a 5.19+ kernel. BIG TCP can be used only in BPF host routing mode, without encryption and tunneling; there are checks that enforce compatibility. If BIG TCP cannot tune all external interfaces, it reverts back to the defaults. Currently it is supported only on mlx4/mlx5 and veth devices, plus any device that inherits GRO/GSO maximum sizes from others (e.g. bonding should work).
If BIG TCP is enabled and initialized it can be seen in the logs:
level=info msg="Setting up IPv6 BIG TCP" subsys=big-tcp
level=info msg="Setting gso_max_size to 196608 and gro_max_size to 196608" device=eth0 subsys=big-tcp
Benchmarks:
- TCP_RR
                    MIN_LATENCY  P90_LATENCY  P99_LATENCY  THROUGHPUT (trans/sec)
  BIG TCP Disabled  60           91           136          13087.53
  BIG TCP Enabled   38           76           109          15629.73
- TCP_STREAM (throughput)
  BIG TCP Disabled  16386.21
  BIG TCP Enabled   23181.41
Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
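As a quick sanity check, the resulting sizes can be read back with iproute2 (eth0 is an example device; gro_max_size is only printed by newer iproute2 releases):
$ ip -d link show dev eth0 | grep -o 'gso_max_size [0-9]*'
gso_max_size 196608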
Add new GRO/GSO max size fields to the daemon configuration status API. We need to expose them so they can be configured on pod veth devices. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Add daemon support for the new option, initialize it in its NewDaemon call and expose the initialized GRO/GSO max sizes (BIG TCP config) through the daemon configuration status. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
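For illustration, when running the agent directly the option would be passed as the daemon flag named in the patch overview below, alongside the rest of the agent configuration (most deployments set it through Helm instead):
$ cilium-agent --enable-ipv6-big-tcp=true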
Allow configuring GRO/GSO max sizes when setting up veth devices. These are needed to enable BIG TCP support. They are configured only if > 0. Pass the configured values when setting up veths. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Add Helm setting for IPv6 BIG TCP (enableIPv6BIGTCP) which defaults to false. Used "make -C install/kubernetes cilium/values.yaml" to autogenerate the values.yaml file. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
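A sketch of enabling it on an existing installation; only the enableIPv6BIGTCP value name comes from this patch, the chart reference and namespace are assumptions:
$ helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values --set enableIPv6BIGTCP=true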
Add current BIG TCP state to the Status Response model and expose it in "cilium status". The struct naming (IPV6BigTCP) is due to the automatic generation. Output looks like: $ kubectl -n kube-system exec cilium-rmxzw -- cilium status ... IPv6 BIG TCP: Enabled ... Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Add a new entry explaining the BIG TCP feature, its requirements, how to enable it and how to validate that it was successfully enabled. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Add BIG TCP's kernel requirements to "Required Kernel Versions for Advanced Features" Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
Add standalone BIG TCP tests. They use a Kind cluster and set up Cilium with BIG TCP enabled, then verify that gso_max_size is set properly (gro_max_size is not verified because older iproute2 versions cannot show it; gso_max_size has been supported for a long time, and if it was set properly then the GRO max size was also set). Lastly they perform a netperf TCP_RR test between the client and server netperf pods. The tests need veth devices for BIG TCP support, so Kind is the natural choice; when we add e2e Kind testing infra these tests can be moved and integrated (a TODO note is left in the test as well). Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com>
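For reference, a TCP_RR run of the kind the test performs could look roughly like this (pod name, server address and output selectors are illustrative; the exact invocation in the test may differ):
$ kubectl exec netperf-client -- netperf -t TCP_RR -H $NETPERF_SERVER_IP -- -o MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT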
/test
Job 'Cilium-PR-K8s-GKE' failed.
The ConformanceAKS test failure is unrelated to BIG TCP. It fails at Cilium Cleanup, details:
Add a new IPv6 BIG TCP[1] option and infrastructure which allows changing a device's GRO and GSO max sizes depending on their current values and the new option (enable-ipv6-big-tcp). BIG TCP allows the network stack to build bigger TSO/GRO packets (the maximum grows from 64k to 512k); it can be used only with IPv6 and improves latency and throughput because fewer packets traverse the stack. It works by adding a temporary Hop-by-Hop header right after the IPv6 header, which is stripped before the packet is sent on the wire.
The defaults (65536) are set only if necessary (i.e. if the GSO/GRO max sizes were previously changed); when the option is enabled they are set to 196608. The bpf_dynptr_data helper is used to probe for a 5.19+ kernel. BIG TCP can be used only in BPF host routing mode, without encryption and tunneling; there are checks that enforce compatibility. If BIG TCP cannot tune all external interfaces, it reverts back to the defaults. Currently it is supported only on mlx4/mlx5 and veth devices, plus any device that inherits GRO/GSO maximum sizes from others (e.g. bonding should work).
LWN also has a nice BIG TCP summary available here
If the new BIG TCP Cilium option is enabled and initialized it can be seen in the logs:
level=info msg="Setting up IPv6 BIG TCP" subsys=big-tcp
level=info msg="Setting gso_max_size to 196608 and gro_max_size to 196608" device=eth0 subsys=big-tcp
Netperf benchmarks:
- TCP_RR
                    MIN_LATENCY  P90_LATENCY  P99_LATENCY  THROUGHPUT (trans/sec)
  BIG TCP Disabled  60           91           136          13087.53
  BIG TCP Enabled   38           76           109          15629.73
- TCP_STREAM (throughput)
  BIG TCP Disabled  16386.21
  BIG TCP Enabled   23181.41
Standalone tests which use a Kind cluster are added. BIG TCP currently only works with veth and Mellanox devices, so we need a Kind cluster to test it properly; I've left a TODO note that the tests should be migrated to the new e2e infra once it lands.
Note that pods need to be restarted when the option is changed, because the veth configuration is applied when the devices are created, before they're moved into the target netns.
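For example, existing workloads can be recreated so their veths pick up the new sizes (namespace and deployment name are placeholders):
$ kubectl -n <namespace> rollout restart deployment <deployment-name>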
Patch-set overview:
Patch 01 - adds basic BIG TCP infrastructure which allows enabling/disabling it and checks for requirements
Patch 02 - adds GRO/GSO max sizes to the daemon configuration status so they can be exposed for veth device configuration
Patch 03 - adds the new "--enable-ipv6-big-tcp" option to the daemon
Patch 04 - uses the exposed GRO/GSO max sizes to configure the newly created pod veth devices
Patch 05 - adds helm support for the new option (enableIPv6BIGTCP)
Patch 06 - exposes the IPv6 BIG TCP state in "cilium status" (IPv6 BIG TCP: Enabled/Disabled)
Patch 07 - documents the new feature in the tuning docs
Patch 08 - adds BIG TCP tests which provision a Kind cluster, verify the option is properly enabled and run netperf
Requirements to enable:
- Kernel 5.19 or newer (probed via the bpf_dynptr_data helper)
- BPF host routing mode
- Encryption disabled
- Tunneling disabled
- Supported devices: currently mlx4/mlx5 and veth, plus devices that inherit GRO/GSO max sizes from others (e.g. bonding)
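A few hedged spot checks for these requirements; the cilium status field names may vary slightly between versions:
$ uname -r
$ kubectl -n kube-system exec <cilium-pod> -- cilium status | grep -E 'Host Routing|Encryption|IPv6 BIG TCP'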
Thanks,
Nik
[1] https://lore.kernel.org/netdev/20220513183408.686447-3-eric.dumazet@gmail.com/T/