-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.11 backports 2022-01-26 #18630
v1.11 backports 2022-01-26 #18630
Conversation
[ upstream commit 9708e5d ] Commit 49cb220 ("iptables: Don't masquerade traffic to cluster nodes") introduced ipsets in our iptables rules, to skip masquerading traffic to cluster nodes when iptables-based masquerading is used. BPF-based masquerading already implemented this logic. In practice, traffic to both Internal and External node IPs would skip masquerading. However, Kubernetes doesn't state that External Node IPs are routable within the cluster. Masquerading may thus be required for External Node IPs, as reported by several users. Hence, this commit changes the logic to only skip masquerading in iptables for Internal node IPs. This commit doesn't affect the BPF-based masquerading logic, as unfortunately we can't easily distinguish between Internal and External node IPs in our datapath today (both are identified as host or remote-node). [ Backport notes: This commit introduces a typo that is going to be addresses by a backport of 9b5b964 in the same series ] Fixes: 49cb220 ("iptables: Don't masquerade traffic to cluster nodes") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit aaf0a75 ] This commit documents the difference in handling traffic to External IPs of cluster nodes between the BPF-based and iptables-based implementations of masquerading in Cilium. Suggested-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 87bdb30 ] Commit 2bede03 ("release: Generate helm values docs") broke the release scripts for branches v1.10 and earlier, as it introduced a dependency on a new Makefile target that has not been backported. Grep for the dependency and skip it silently if it's not available. Fixes: 2bede03 ("release: Generate helm values docs") Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 8f1979d ] Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 966cc4c ] This commit adds the kernel requirements for advanced features such as IPsec, iptables-based masquerading, and the bandwidth manager. Those kernel configuration options were determined by making a somewhat-minimal kernel configuration for our CI. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 2fdfd91 ] The db5300d commit normalized the KPR initialisation routines by making them to return an error. Unfortunately, in some error returns it forgot to add actual errors. This makes debugging of KPR init difficult. For instance: "Detected devices" devices="[enp0s9 eth0]" <..> "failed to finalise LB initialization: Cannot retrieve enp0s9 link" The actual error for the link retrieval is missing. Fixes: db5300d ("choir: normalize error handling in kube_proxy_replacement.go") Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit ffe59b3 ] When host firewall feature is enable, to-stack trace can be generated even if the packets are rejected by host firewall policies. Generate trace only when the packets are actually delivered to the stack. Fixes: cilium#12562 Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 97c473f ] When we use JSON output of cilium monitor, it would be useful if we could filter the output by the tools like JQ. However, since some of the non-JSON messages are generated through stdout, it ends up to the JSON parse error. Fix `cilium monitor` command to generate non-JSON messages to stderr. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 92bf6c6 ] Previously, the PCAP recorder [1] was enabled only when running the lb-only mode. However, there was an ask from users, who are running KPR in the XDP mode, to have means to observe the LB traffic (tcpdump cannot be used for XDP progs). Why we didn't allow it before? Our main concern was potential verifier complexity issues. But considering that the opt is disabled by default, it's up to a user to take the potential risk. Tested manually on the 5.16 kernel with the following cmds: ./daemon/cilium-agent --debug=true --enable-ipv4=true --enable-ipv6=false --disable-envoy-version-check=true --tunnel=disabled --k8s-kubeconfig-path=/home/vagrant/.kube/config --kube-proxy-replacement=strict --native-routing-cidr=10.154.0.0/16 --enable-ipv4-masquerade=true --auto-direct-node-routes=true --enable-hubble=true --enable-recorder=true --bpf-lb-acceleration=native --bpf-lb-acceleration=testing-only # create nginx pod and expose it via NodePort service hubble record --server=unix:///var/run/cilium/hubble.sock "0.0.0.0/0 0 192.168.34.11/32 31894 TCP" # access the svc from outside 2022-01-24T10:25:03Z Status: 14 packets (1106 bytes) written [1]: https://cilium.io/blog/2021/05/20/cilium-110#pcap Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit f9a6011 ] In lookupDefaultRoute(), to determine the default route for a given IP family we list all routes filtering just by the 0.0.0.0 destination. On some EKS setups where multilple NICs are attached to an instance, this is not enough to identify a single route, since each interface has its own default route. The following example illustrates the problem: $ ks exec ds/cilium -- ip route list default via 10.100.1.1 dev eth0 default via 10.100.255.65 dev eth1 metric 10001 169.254.169.254 dev eth0 This will cause the agent to report a fatal error, as it can't find a single gateway: cilium-agent level=fatal msg="Error while creating daemon" error="unable to install ip rule for ENI multi-node NodePort: failed to find interface with default route: Found default routes with different netdev ifindices: 2 vs 3" subsys=daemon To address this issue, this commit changes the logic of lookupDefaultRoute() to pick the one with the lowest priority. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 4145b6a ] In the current implementation of HostToNetwork{16,32} and NetworkToHost{16,32} for big endian architectures the Go compiler isn't able to convert these operations into native assembly instructions, e.g. BSWAP on x86_64. Use ReverseBytes{16,32} from the math/bits stdlib package to implement these functions which allows the Go compiler to emit the respective assembly instructions which generally yield better performance. The Go toolchain test suite checks that the correct native assembly instructions are generated for the respective math/bits functions: https://github.com/golang/go/blob/16d6a5233a183be7264295c66167d35c689f9372/test/codegen/mathbits.go#L194-L208 Example output of `go tool compile -S byteorder_littleendian.go` on x86_64 for the HostToNetwork{16,32} functions below. NetworkToHost{16,32} look the same. Note the ROLW and BSWAPL instructions emitted after this change. Before: ``` "".HostToNetwork16 STEXT nosplit size=20 args=0x8 locals=0x0 funcid=0x0 0x0000 00000 (byteorder_littleendian.go:13) TEXT "".HostToNetwork16(SB), NOSPLIT|ABIInternal, $0-8 0x0000 00000 (byteorder_littleendian.go:13) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (byteorder_littleendian.go:13) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (byteorder_littleendian.go:13) FUNCDATA $5, "".HostToNetwork16.arginfo1(SB) 0x0000 00000 (<unknown line number>) NOP 0x0000 00000 (byteorder_littleendian.go:13) MOVL AX, CX 0x0002 00002 (byteorder_littleendian.go:19) ANDL $-256, AX 0x0007 00007 (byteorder_littleendian.go:19) SHRW $8, AX 0x000b 00011 (byteorder_littleendian.go:19) MOVBLZX CL, CX 0x000e 00014 (byteorder_littleendian.go:19) SHLL $8, CX 0x0011 00017 (byteorder_littleendian.go:19) ORL CX, AX 0x0013 00019 (byteorder_littleendian.go:13) RET 0x0000 89 c1 25 00 ff ff ff 66 c1 e8 08 0f b6 c9 c1 e1 ..%....f........ 0x0010 08 09 c8 c3 .... "".HostToNetwork32 STEXT nosplit size=45 args=0x8 locals=0x0 funcid=0x0 0x0000 00000 (byteorder_littleendian.go:14) TEXT "".HostToNetwork32(SB), NOSPLIT|ABIInternal, $0-8 0x0000 00000 (byteorder_littleendian.go:14) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (byteorder_littleendian.go:14) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (byteorder_littleendian.go:14) FUNCDATA $5, "".HostToNetwork32.arginfo1(SB) 0x0000 00000 (<unknown line number>) NOP 0x0000 00000 (byteorder_littleendian.go:14) MOVL AX, CX 0x0002 00002 (byteorder_littleendian.go:23) ANDL $-16777216, AX 0x0007 00007 (byteorder_littleendian.go:23) SHRL $24, AX 0x000a 00010 (byteorder_littleendian.go:23) MOVL CX, DX 0x000c 00012 (byteorder_littleendian.go:23) ANDL $16711680, CX 0x0012 00018 (byteorder_littleendian.go:23) SHRL $8, CX 0x0015 00021 (byteorder_littleendian.go:23) ORL CX, AX 0x0017 00023 (byteorder_littleendian.go:23) MOVL DX, CX 0x0019 00025 (byteorder_littleendian.go:23) ANDL $65280, DX 0x001f 00031 (byteorder_littleendian.go:23) SHLL $8, DX 0x0022 00034 (byteorder_littleendian.go:23) ORL DX, AX 0x0024 00036 (byteorder_littleendian.go:23) MOVBLZX CL, CX 0x0027 00039 (byteorder_littleendian.go:23) SHLL $24, CX 0x002a 00042 (byteorder_littleendian.go:23) ORL CX, AX 0x002c 00044 (byteorder_littleendian.go:14) RET 0x0000 89 c1 25 00 00 00 ff c1 e8 18 89 ca 81 e1 00 00 ..%............. 0x0010 ff 00 c1 e9 08 09 c8 89 d1 81 e2 00 ff 00 00 c1 ................ 0x0020 e2 08 09 d0 0f b6 c9 c1 e1 18 09 c8 c3 ............. ``` After: ``` "".HostToNetwork16 STEXT nosplit size=5 args=0x8 locals=0x0 funcid=0x0 0x0000 00000 (byteorder_littleendian.go:16) TEXT "".HostToNetwork16(SB), NOSPLIT|ABIInternal, $0-8 0x0000 00000 (byteorder_littleendian.go:16) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (byteorder_littleendian.go:16) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (byteorder_littleendian.go:16) FUNCDATA $5, "".HostToNetwork16.arginfo1(SB) 0x0000 00000 (<unknown line number>) NOP 0x0000 00000 (byteorder_littleendian.go:16) ROLW $8, AX 0x0004 00004 (byteorder_littleendian.go:16) RET 0x0000 66 c1 c0 08 c3 f.... "".HostToNetwork32 STEXT nosplit size=3 args=0x8 locals=0x0 funcid=0x0 0x0000 00000 (byteorder_littleendian.go:17) TEXT "".HostToNetwork32(SB), NOSPLIT|ABIInternal, $0-8 0x0000 00000 (byteorder_littleendian.go:17) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (byteorder_littleendian.go:17) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB) 0x0000 00000 (byteorder_littleendian.go:17) FUNCDATA $5, "".HostToNetwork32.arginfo1(SB) 0x0000 00000 (byteorder_littleendian.go:17) BSWAPL AX 0x0002 00002 (byteorder_littleendian.go:17) RET 0x0000 0f c8 c3 ... ``` Fixes: 332b16f ("byteorder: Simplify byte order functions") Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 9b5b964 ] Due to a typo in the condition for the removal of IP addresses from the node ipset in commit 9708e5d ("node: Don't skip masquerading for External node IPs"), the ipsets would have never been cleaned up. This commit fixes it. Fixes: 9708e5d ("node: Don't skip masquerading for External node IPs") Reported-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit c7d9836 ] This commit enables CI for all feature branches based on the master one. The naming convention for the base feature branch is: ft/master/<feature> [ Backkport note: for v1.11, the feature branch is ft/v1.11/<feature> ] Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit cc85af8 ] Currently CI will run only on the base cilium/cilium repo, due to a check in most workflows in the form of: if: ${{ github.repository == 'cilium/cilium' }} The original intent of this check was to avoid running CI on forks, since workflow secrets are usually not configured, resulting in multiple failures. It turned out that the ability to run CI on forks is something useful to have, and as a bonus point it would ease the development and maintenance of new and existing workflows. Because of this, this commit removes the aforementioned check. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 7818a5f ] This commit is to expose xfrm stats via prometheus metrics if IPSec is enabled. Fixes: cilium#14725 Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit b1940f2 ] Calls to OpenMap register the resulting map in a global array. This behaviour is only intended in some cases, so implement our own version of LoadPinnedMap that takes care of populating the pin path field in our custom Map struct. Signed-off-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 3b00e46 ] Wrap errors from InitMaglevMaps to make the code easier to navigate. Signed-off-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit ca134d4 ] While working on a solution to cilium#18037, I encountered a few minor issues with the Maglev code. This commit aims to make the code a bit more idiomatic. - Fix a missing ipv6 (or ipv4) Maglev map from breaking`bpf lb maglev list` by always performing both dumps and ignoring missing maps. - Reduced the length of the 'get' handler in particular by carving the logic out into separate functions. - Implemented more logic as methods instead of free functions wherever it made sense. - Added sanity checks to prevent e.g. double-initialization of the Maglev maps, since their handles are stored in global variables. (this should be addressed at some point) - Store the Maglev table size intended by the user in a global var to avoid having multiple copies littered throughout outer/inner map structs. - Use errors to convey error states to the caller instead of opaque catch-all 'UnknownMaglevTableSize' return. - Provide helpers like MaglevOuterMap.UpdateService() that are friendlier to work with and somewhat easier to reason about. - Provide dump methods on both outer and inner map types. Signed-off-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 8789c95 ] To better reflect what it does (loading a map from a bpffs map pin and registering it in the 'mapRegister' global), rename this to something that doesn't look as innocuous as 'open a map'. This can leak (potentially closed) map handles if not used with care, and should probably be changed later on. Signed-off-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
/test-backport-1.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes look good 👍 Thanks Kornilios!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good for my change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My changes LGTM, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️ for my changes, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good for my commits 💯
@@ -14,6 +14,7 @@ | |||
/pkg/byteorder/ @cilium/bpf @cilium/api | |||
/pkg/client @cilium/api | |||
/pkg/k8s/apis/cilium.io/v2/ @cilium/api | |||
/pkg/datapath/linux/ipsec/xfrm_collector* @cilium/metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: seems like the original list is following alphabetical order.
Thanks Kornilios, looks good! |
Thanks Kornilios my changes looks good to me! |
/test-backport-1.11 |
CODEOWNERS
bpf lb maglev list
#18469 -- Ignore empty Maglev map from breakingbpf lb maglev list
(@ti-mo)Once this PR is merged, you can update the PR labels via:
Conflicts
CODEOWNERS