v1.10 backports 2021-06-22 #16614
[ upstream commit 286a900 ] The script got broken with the introduction of CRD alphav1 which contains another occurrence of the schema version. To handle this, the script will take into account the first occurrence of the schema version under 'pkg/k8s'. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 0c9d55e ] Following cilium/metallb#4, Cilium is now tracking the code from the v0.9.6 branch of cilium/metallb: https://github.com/cilium/metallb/tree/v0.9.6 This was done in a backwards-compatible way to ensure that older versions of Cilium can still build by avoiding the invalidation of the previous commit SHA (40d425d20241). Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 8b3f009 ] Fixes: cilium#16549 Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit db06a64 ] Log the correct field for HostIP. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 27122d4 ]

Example trace seen in dmesg:

  [...]
  [ 7710.165608] enp10s0f0np0: hw csum failure
  [ 7710.165621] skb len=84 headroom=78 headlen=84 tailroom=30 mac=(64,14) net=(78,20) trans=98 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 ip_summed=2 complete_sw=0 valid=0 level=0) hash(0x14006e3a sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
  [ 7710.165631] dev name=enp10s0f0np0 feat=0x0x0032b18217514ba9
  [ 7710.165635] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165638] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165641] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165644] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165646] skb headroom: 00000040: b8 ce f6 05 e7 62 b8 ce f6 05 e7 76 08 00
  [ 7710.165649] skb linear: 00000000: 45 00 00 54 8a 07 00 00 40 01 84 e8 c0 a8 a0 04
  [ 7710.165652] skb linear: 00000010: 0a 9a 00 73 00 00 23 57 00 f8 15 db cd 74 d0 60
  [ 7710.165654] skb linear: 00000020: 00 00 00 00 5c 2d 0d 00 00 00 00 00 10 11 12 13
  [ 7710.165657] skb linear: 00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23
  [ 7710.165660] skb linear: 00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33
  [ 7710.165663] skb linear: 00000050: 34 35 36 37
  [ 7710.165665] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165668] skb tailroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165672] CPU: 26 PID: 0 Comm: swapper/26 Not tainted 5.13.0-rc3+ #174
  [ 7710.165674] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS MASTER/X570 AORUS MASTER, BIOS F22 08/20/2020
  [ 7710.165676] Call Trace:
  [ 7710.165677] <IRQ>
  [ 7710.165680] dump_stack+0x7d/0x9c
  [ 7710.165683] netdev_rx_csum_fault.part.0+0x41/0x45
  [ 7710.165686] netdev_rx_csum_fault.cold+0xb/0x10
  [ 7710.165687] __skb_checksum_complete+0xdd/0xf0
  [ 7710.165690] ? skb_send_sock_locked+0x20/0x20
  [ 7710.165692] ? reqsk_fastopen_remove+0x190/0x190
  [ 7710.165693] nf_ip_checksum+0x5b/0x120
  [ 7710.165697] nf_conntrack_icmpv4_error+0x112/0x160 [nf_conntrack]
  [ 7710.165706] nf_conntrack_in.cold+0x1d/0x74 [nf_conntrack]
  [ 7710.165714] ? nft_do_chain_inet_ingress+0x280/0x2e0 [nf_tables]
  [ 7710.165722] ipv4_conntrack_in+0x14/0x20 [nf_conntrack]
  [ 7710.165731] nf_hook_slow+0x44/0xb0
  [ 7710.165733] nf_hook_slow_list+0x71/0xf0
  [ 7710.165735] ip_sublist_rcv+0x1d1/0x1f0
  [ 7710.165737] ? ip_sublist_rcv+0x1f0/0x1f0
  [ 7710.165739] ip_list_rcv+0xf5/0x120
  [ 7710.165741] __netif_receive_skb_list_core+0x228/0x250
  [ 7710.165745] netif_receive_skb_list_internal+0x1a1/0x2b0
  [ 7710.165747] napi_complete_done+0x7a/0x1b0
  [ 7710.165749] mlx5e_napi_poll+0x16e/0x730 [mlx5_core]
  [ 7710.165795] __napi_poll+0x31/0x170
  [ 7710.165796] net_rx_action+0x22f/0x280
  [ 7710.165798] __do_softirq+0xce/0x281
  [ 7710.165800] irq_exit_rcu+0xa2/0xd0
  [ 7710.165803] common_interrupt+0x8d/0xa0
  [ 7710.165805] </IRQ>
  [ 7710.165806] asm_common_interrupt+0x1e/0x40
  [ 7710.165808] RIP: 0010:cpuidle_enter_state+0xcc/0x360
  [...]

The trace was only reproducible with NICs using CHECKSUM_COMPLETE as csum type for inbound packets. It has been observed with mlx5, for example.

The hw csum failure was only reproducible under the following conditions:

- Protocol is ICMP, e.g. triggered by Cilium health probe packets
- Pod from one node was pinging a remote node address
- BPF based masquerading was used to SNAT Pod IP to node IP
- BPF NAT engine found a collision in the NAT table such that it was forced to select a different ICMP id, and hence caused L4 rewrites

In the case of ICMPv4 the bug was that BPF_F_PSEUDO_HDR was used for updating the L4 checksum. However, ICMPv4 does not have a pseudo header; only ICMPv6 does. The packet based csum was okay either way, but the flag caused skb->csum to be updated incorrectly. Setting the flag to 0 for ICMPv4 stopped the hw csum traces.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 128f0f8 ] As for some cases WaitUntil() is a DoS tool. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 8260f9d ] The test became notoriously flaky. It seems that some goroutines were lagging behind with the updates and they were overwriting the new MAC addr entry with the obsolete one. To fix this, retry multiple times until the correct entry is found. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 4c4a5dc ] The change is probably a no-op, but it should improve the last ping timestamp precision. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit d42614e ] Five minutes after IPsec key rotations, we clean up the old IPsec state and print the following message: level=info msg="New encryption keys reclaiming SPI" spi=0 subsys=ipsec Unfortunately, due to a bug the SPI was always 0 in that log message. This commit fixes it and also logs the old SPI value if we have it: level=info msg="New encryption keys reclaiming SPI" SPI=7 oldSPI=0 subsys=ipsec Fixes: 3f12fb6 ("cilium: ipsec, add cleanup xfrm routine") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit a7d73e4 ] Previously, we were restoring the original clusterIP service even when the service was deleted. Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit 92d851d ] The `deletePolicyService` function was previously common to both delete policy and delete service callbacks. Refactor the logic to pass the policy config directly, thereby skipping the config lookup. Signed-off-by: Aditi Ghag <aditi@cilium.io>
[ upstream commit a75599d ] Make IdentitySelectionUpdated() callbacks lock-free by queueing them while still holding selectorcache lock (to keep FIFO order) and calling from a goroutine not holding any locks. This prevents deadlocks caused by the implementation of IdentitySelectionUpdated() taking locks such as endpoint or selectorcache locks. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io>
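The pattern in this commit (enqueue notifications while the cache lock is held to preserve FIFO order, then deliver them from a goroutine that holds no locks) can be sketched as follows. This is an illustrative sketch under assumptions, not Cilium's selector cache: the `cache` and `notification` types and the `Update`, `Len`, and `Close` methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// notification is a queued callback argument (hypothetical shape).
type notification struct {
	selector string
	added    []int
}

type cache struct {
	mu    sync.Mutex // protects state
	state map[string][]int

	qmu    sync.Mutex // protects queue and closed; never held during callbacks
	qcond  *sync.Cond
	queue  []notification
	closed bool
	done   chan struct{}
}

func newCache(onUpdate func(notification)) *cache {
	c := &cache{state: make(map[string][]int), done: make(chan struct{})}
	c.qcond = sync.NewCond(&c.qmu)
	go c.deliver(onUpdate)
	return c
}

// deliver drains the queue in FIFO order and invokes the callback with no
// locks held, so the callback is free to take c.mu or any other lock.
func (c *cache) deliver(onUpdate func(notification)) {
	defer close(c.done)
	for {
		c.qmu.Lock()
		for len(c.queue) == 0 && !c.closed {
			c.qcond.Wait()
		}
		if len(c.queue) == 0 {
			c.qmu.Unlock()
			return // closed and fully drained
		}
		batch := c.queue
		c.queue = nil
		c.qmu.Unlock()
		for _, n := range batch {
			onUpdate(n)
		}
	}
}

// Update mutates state and enqueues the notification while still holding
// c.mu, so queue order always matches the order of state changes.
func (c *cache) Update(selector string, ids ...int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.state[selector] = append(c.state[selector], ids...)
	c.qmu.Lock()
	c.queue = append(c.queue, notification{selector: selector, added: ids})
	c.qmu.Unlock()
	c.qcond.Signal()
}

// Len takes the cache lock; calling it from the callback would deadlock
// if callbacks were invoked while Update still held c.mu.
func (c *cache) Len(selector string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.state[selector])
}

// Close stops the delivery goroutine after pending notifications are sent.
func (c *cache) Close() {
	c.qmu.Lock()
	c.closed = true
	c.qmu.Unlock()
	c.qcond.Signal()
	<-c.done
}

func main() {
	var c *cache
	c = newCache(func(n notification) {
		fmt.Println("update for", n.selector, "ids now:", c.Len(n.selector))
	})
	c.Update("a", 1)
	c.Update("b", 2)
	c.Close()
}
```

The queue here is an unbounded slice guarded by its own mutex rather than a buffered channel, so `Update` never blocks on a slow callback while holding the cache lock.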
[ upstream commit 876e9db ] If hubble-ca-secret already exists, then certgen is going to update it. To let certgen do its job, we need to configure the update verb in the bound ClusterRole, otherwise it will fail with a "cannot update resource \"secrets\" in API group" message. Fixes: cilium#16508 Signed-off-by: Alex Szakaly <alex.szakaly@gmail.com> Signed-off-by: Aditi Ghag <aditi@cilium.io>
test-backport-1.10 |
My changes look good 👍
Looks good for my commit
LGTM for my PR, thanks.
test-backport-1.10 https://jenkins.cilium.io/job/Cilium-PR-K8s-1.19-kernel-5.4/440/console
test-backport-1.10 |
test-backport-1.10
Marking as ready to merge given tests are passing and we have several reviews already. |
@errordeveloper Don't forget to run -
@aditighag thanks for the reminder, I did this just now, sorry for taking so long... I wonder if we can teach @ciliumbot to do this somehow, any thoughts?
Skipped due to conflicts -
Once this PR is merged, you can update the PR labels via: