
v1.11 backports 2022-04-19 #19481

Merged 6 commits into cilium:v1.11 on Apr 20, 2022
Conversation

tklauser (Member)

Once this PR is merged, you can update the PR labels via:

```
$ for pr in 19308 19207 19375 19458 18941 19194; do contrib/backporting/set-labels.py $pr done 1.11; done
```

liuyuan10 and others added 6 commits April 19, 2022 15:13
[ upstream commit 2876483 ]

When the skb is non-linear, revalidate_data drops the packet
immediately. This can happen, for example, when pods are using AF_PACKET
+ mmap ring buffers.

Calling revalidate_data_pull (bpf_skb_pull_data) helps to "pull in non-linear data in case the skb is non-linear and not all of len are part of the linear section."

Fixes cilium#18951

Signed-off-by: Yuan Liu <liuyuan@google.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit a0dabcd ]

I've added several documentation steps on how to properly install Cilium in overlay mode on EKS, since removing the `aws-node` DaemonSet does not clear the iptables rules, and the leftover rules will affect routing and filtering decisions.

The issue I'm having is that when the source pod is in `hostNetwork` mode, the destination pod is unable to send reply packets through `cilium_vxlan` because of the following residual AWS CNI iptables rules. Nodes also cannot reach pods on other nodes.

```
-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
-A POSTROUTING -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-0
-A AWS-CONNMARK-CHAIN-0 ! -d 10.2.0.0/16 -m comment --comment "AWS CONNMARK CHAIN, VPC CIDR" -j AWS-CONNMARK-CHAIN-1
-A AWS-CONNMARK-CHAIN-1 -m comment --comment "AWS, CONNMARK" -j CONNMARK --set-xmark 0x80/0x80
-A AWS-SNAT-CHAIN-0 ! -d 10.2.0.0/16 -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-1
-A AWS-SNAT-CHAIN-1 ! -o vlan+ -m comment --comment "AWS, SNAT" -m addrtype ! --dst-type LOCAL -j SNAT --to-source 10.2.2.245 --random-fully
```
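As an illustration, leftover AWS CNI rules like the ones above are recognisable in `iptables-save` output by their AWS-prefixed chains and "AWS" comments. The following Go sketch (a hypothetical helper for illustration, not Cilium code) filters a dump down to those residual rules:

```go
package main

import (
	"fmt"
	"strings"
)

// awsResidualRules filters iptables-save output down to the rules left
// behind by the AWS VPC CNI plugin, identified by their AWS-prefixed
// chain names and "AWS" rule comments.
func awsResidualRules(iptablesSave string) []string {
	var residual []string
	for _, line := range strings.Split(iptablesSave, "\n") {
		if strings.Contains(line, "AWS-SNAT-CHAIN") ||
			strings.Contains(line, "AWS-CONNMARK-CHAIN") ||
			strings.Contains(line, `"AWS`) {
			residual = append(residual, line)
		}
	}
	return residual
}

func main() {
	dump := `-A PREROUTING -i eni+ -m comment --comment "AWS, outbound connections" -m state --state NEW -j AWS-CONNMARK-CHAIN-0
-A POSTROUTING -m comment --comment "AWS SNAT CHAIN" -j AWS-SNAT-CHAIN-0
-A CILIUM_PRE_nat -m comment --comment "cilium-feeder: CILIUM_PRE_nat" -j CILIUM_PRE_nat`
	// Only the two AWS rules are reported; the Cilium rule is kept.
	for _, r := range awsResidualRules(dump) {
		fmt.Println(r)
	}
}
```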

Signed-off-by: Oliver Wang <a0924100192@gmail.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
[ upstream commit 47e936e ]

This commit ensures that any update to the labels of a k8s Node
object is reflected in the corresponding CiliumNode KVStore entry.

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
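A minimal sketch of the idea, assuming hypothetical names (the real implementation lives in Cilium's node watcher): compare the old and new label sets on a Node update and push the CiliumNode KVStore entry only when they differ.

```go
package main

import (
	"fmt"
	"reflect"
)

// nodeLabelsChanged reports whether a k8s Node update carries a label
// change that should be propagated to the corresponding CiliumNode
// KVStore entry. (Hypothetical helper; not the actual Cilium code.)
func nodeLabelsChanged(oldLabels, newLabels map[string]string) bool {
	return !reflect.DeepEqual(oldLabels, newLabels)
}

func main() {
	old := map[string]string{"topology.kubernetes.io/zone": "eu-west-1a"}
	updated := map[string]string{"topology.kubernetes.io/zone": "eu-west-1b"}
	if nodeLabelsChanged(old, updated) {
		fmt.Println("labels changed: update the CiliumNode entry in the KVStore")
	}
}
```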
[ upstream commit cfec27a ]

This commit increases the VM boot timeout while decreasing the overall
timeout :mindblown:

We currently run the vagrant-ci-start.sh script with a 15m timeout and
retry twice if it fails. That takes up to 45m in total if all attempts
fail, as is frequently happening in CI right now. In particular, if the
script fails simply because it is taking more than 15m on average, then
it is likely to fail all three times.

This commit instead increases the timeout from 15m to 25m and removes
the retries. The goal is obviously to succeed on the first try :p

Ideally, we would investigate why it is now taking longer to start the
VM. But this issue has been happening for a long time. And because of
the retries, we probably didn't even notice the increase at the
beginning: if it takes 15m on average, it might fail half the time and
the test might still succeed most of the time. That is, the retries
helped to hide the increase.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
… are missing

[ upstream commit 976e1c0 ]

Before this change, when Cilium was running with the --enable-ipv6
option, we were only logging a warning, but the rest of the iptables.go
module was inserting ip6tables rules anyway. That resulted in errors,
because inserting such rules is impossible without IPv6 netfilter
support in the kernel.

This change fixes that by raising a fatal error when IPv6 is enabled in
Cilium but not supported by the kernel. In such situations, users
should either disable IPv6 in Cilium or load the needed kernel modules.

Fixes: cilium#18904
Signed-off-by: Michal Rostecki <vadorovsky@gmail.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
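The fixed behaviour can be sketched as follows. `checkIPv6Support` and the probe via `/proc/net/ip6_tables_names` are assumptions for illustration, not the actual Cilium code: the point is that the mismatch now produces a fatal error instead of a warning followed by doomed ip6tables calls.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// checkIPv6Support returns a fatal error when IPv6 is enabled in the
// Cilium config but the kernel lacks IPv6 netfilter support. Kernel
// support is probed here (as an assumption) by the presence of
// /proc/net/ip6_tables_names, which only exists when ip6_tables is loaded.
func checkIPv6Support(enableIPv6 bool) error {
	if !enableIPv6 {
		return nil
	}
	if _, err := os.Stat("/proc/net/ip6_tables_names"); err != nil {
		return errors.New("IPv6 is enabled but the kernel lacks IPv6 netfilter support; disable IPv6 in Cilium or load the needed kernel modules")
	}
	return nil
}

func main() {
	for _, enabled := range []bool{false, true} {
		if err := checkIPv6Support(enabled); err != nil {
			fmt.Println("would log.Fatal:", err)
		} else {
			fmt.Println("ok: enableIPv6 =", enabled)
		}
	}
}
```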
[ upstream commit 30478b7 ]

With the introduction of Cilium Endpoint Slices (CES), pod labels are no
longer in the CiliumEndpoint struct, breaking the matching logic for the
egress NAT gateway. This change fetches the labels from the identity
cache and uses the identity labels to do the same matching. It makes the
egress NAT gateway compatible with CES.

FIXES cilium#17669

Signed-off-by: Bolun Zhao <blzhao@google.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
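The matching change above can be sketched like this, with hypothetical simplified types (the real code uses Cilium's identity cache and label selectors): instead of reading pod labels off the endpoint, look up the endpoint's security identity and match the policy selector against the identity labels.

```go
package main

import "fmt"

// identityCache maps a security identity ID to its identity labels.
type identityCache map[uint32]map[string]string

// endpointMatchesPolicy resolves the endpoint's identity in the cache
// and checks every key/value pair of the egress gateway policy selector
// against the identity labels.
func endpointMatchesPolicy(cache identityCache, identityID uint32, selector map[string]string) bool {
	labels, ok := cache[identityID]
	if !ok {
		return false // unknown identity: no match
	}
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	cache := identityCache{
		4321: {"app": "backend", "io.kubernetes.pod.namespace": "default"},
	}
	fmt.Println(endpointMatchesPolicy(cache, 4321, map[string]string{"app": "backend"}))  // prints true
	fmt.Println(endpointMatchesPolicy(cache, 4321, map[string]string{"app": "frontend"})) // prints false
}
```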
@tklauser tklauser requested a review from a team as a code owner April 19, 2022 13:23
@tklauser tklauser added backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Apr 19, 2022

tklauser commented Apr 19, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-1.20-kernel-4.9' failed:


Test Name

K8sPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output

FAIL: cannot install connectivity-check

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.20-kernel-4.9 so I can create one.

@jibi (Member) left a comment

My PR looks good, thanks!

@pchaigno (Member) left a comment

My changes look good. Thanks!


tklauser commented Apr 19, 2022

/test-gke

Looks like CI images were not ready.

@tklauser (Member, Author)

/test-1.20-4.9

@tklauser tklauser merged commit 029c089 into cilium:v1.11 Apr 20, 2022
@tklauser tklauser deleted the pr/v1.11-backport-2022-04-19 branch April 20, 2023 14:30

7 participants