-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.9 backports 2020-11-17 #14060
v1.9 backports 2020-11-17 #14060
Conversation
[ upstream commit 238262f ] Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 6e4b475 ] When executing ginkgo according to the instructions[0], the only error output was: FAIL: Cannot connect to vmName 'k8s1-1.18' After this change, the logs show the underlying error: cat: ssh-config: No such file or directory [0] https://docs.cilium.io/en/v1.8/contributing/testing/e2e/#running-end-to-end-tests-in-other-environments-via-ssh Signed-off-by: Manuel Buil <mbuil@suse.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 3ca315d ] Update the tag for the checkpatch image in order to benefit from the latest changes when running the GitHub action: - Fix and update the issues reported by checkpatch. - Allow for passing arguments directly to checkpatch.pl (not just the bash script), such as --fix-inplace. See cilium/image-tools#85. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 871e7e1 ] Only count the number of IPs for each FQDN selector/rule when storing rules for restoration, rather than ignoring later rules on a port after previous rules have hit the maximum number of IPs. Make the maximum number of IPs per restored rule configurable with the new option '--tofqdns-max-ips-per-restored-rule' (default 1000). Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit c9810bf ] This commit adds a mechanism to remove orphan SNAT entries. We call an SNAT entry orphan if it does not have either a corresponding CT entry or an SNAT entry in a reverse order. Both cases can happen due to LRU eviction heuristics (both CT and NAT maps are of the LRU type). The mechanism for the removal is based on the GC signaling in the datapath. When the datapath SNAT routine fails to find a free mapping after SNAT_SIGNAL_THRES attempts, it sends the signal via the perf ring buffer. The consumer of the buffer is the daemon. After receiving the signal it invokes the CT GC. The newly implemented GC addition iterates over all SNAT entries and checks whether a corresponding CT entry is found, and if not, it tries to remove both SNAT entries (for original and reverse flows). For now, I didn't add GC of orphan SNAT entries created by DSR to keep complexity of changes as low as possible. This will come as a follow up. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 0c83c28 ] Previously, after receiving the signal from the datapath, we iterated NAT map twice: first to compare against CT TCP map, second - against CT any map. Obviously, doing the iterations two times was inefficient. This commit fixes that by passing both CT {TCP,any} maps to the NAT GC routine. This allows the NAT GC to iterate once. Suggested-by: Joe Stringer <joe@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit b115c54 ] GetRules() will not process more than 1000 rules per port. Print how many are the total rules in the message. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 0d514ea ] README.md contained outdated instructions, so replace its contents with a link to the more up-to-date backporting guide. Signed-off-by: Tom Payne <tom@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit df9f66e ] PR-12865 attempted to accomplish this by proxying the context object without modifying it, which is incorrect. The incoming and outgoing metadata keys are actually different and must be explicitly set in order for the metadata to be properly proxied. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit d69045a ] This commit was necessary to prevent an import cycle in a future commit. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 6d9cfe4 ] This commit refactors the limits package. The biggest change is consolidating the limits map and its mutex into a single type. Functions with "limit" in the name have been removed because it is now redundant since these functions live under the limits package. The reason why this commit wasn't squashed with the previous is because git wouldn't consider the new file as a rename from the old, likely because of too many refactoring changes. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 623dcd3 ] In (*Node).syncToAPIServer(), there are two main loops (operations) which update the CiliumNode status and spec, respectively. The code was exactly the same except for this difference. This commit refactors the logic to consolidate the code when syncing to the CiliumNode resource to the apiserver. In addition, this refactor allows these two operations to have the same erorr handling flow [see commit a7451f8 ("ipam: Warn when failing to update CN status")]. The main motivation for that commit is to fix the swallowing of the error when update fails and the subsequent Get succeeds. Before this commit, the error handling flow was different between the two operations (update spec & status). Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 379ce1e ] Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit f0f948d ] This commit fixes the ENI IPAM mode when the instance types are very limited [1]. The problem was that Cilium by default was attempting to allocate 8 IPs (PreAllocate). However, on smaller instance types such as "t3a.micro" or similar, the maximum number of IPs that can be attached to one ENI device is 4. On top of that, Cilium by default does not use eth0 as an ENI device (default interface index is 1), which could further restrict itself from having enough ENIs for pods in the cluster. This commit fixes this by checking the instance type limits and adjusting the PreAllocate value. Additionally, we likely also need to include eth0 as an ENI device to give more buffer for Cilium to allocate IPs by setting the first interface index to 0. [1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html?shortFooter=true#AvailableIpPerENI Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit ada413f ] Similarly to what is being done in upstream kube-proxy [1], but unfortunately without explaining why, loadBalancerSourceRanges might contain spaces which prevents the CIDR from being parsed correctly. [1] kubernetes/kubernetes#94107 Fixes: 3195681 ("k8s: Add and parse LoadBalancerSourceRanges field") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for my trivial changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for my changes, thanks.
6e8e2d2
to
cd26b38
Compare
[ upstream commit 659da4b ] Related to https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/ Signed-off-by: Tam Mach <sayboras@yahoo.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 7dc5a57 ] Hubble Relay is no more in-progress and is declared stable with the release of Cilium v1.9 thus remove the "work in progress" note Also update the Hubble internals doc section for Hubble Relay and complete the section with technical information. While here, fix a few typos in the whole doc section. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 7ffff79 ] In the new version [1], the client sends a request to the server every 0.5s over the long-lived connection, and it expects a reply from the server in 1s. The change can help to catch intermittent connection failures during Cilium upgrades. [1]: cilium/test-connection-disruption@edc628b Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 6b53c86 ] Commit a695f53 ("Endpoint for host") changed the logic for the HOST_ID check to ... [...] #ifndef ENABLE_EXTRA_HOST_DEV - if (sec_label != HOST_ID) + if (from_host && sec_label != HOST_ID) #endif [...] ... with the rationale of: [...] ENABLE_EXTRA_HOST_DEV was only defined in init.sh for the from_host path. It is now defined from the Go side for all paths, and we check whether from_host is true in the C code instead. [...] Lets review the situation before a695f53: The reason we added the ENABLE_EXTRA_HOST_DEV define came from back when ipvlan was added via 7fbfe1c (cilium, ipvlan: implement base host connectivity). The logic was placed in handle_ipv4() of bpf_netdev and invoked via from-netdev. When in direct routing or ipvlan mode, we would load the latter via 'bpf_load $NATIVE_DEV "$OPTS" "ingress" bpf_netdev.c bpf_netdev.o from-netdev $CALLS_MAP' to the native netdev facing external world and 'bpf_load $HOST_DEV1 "$OPTS" "egress" bpf_netdev.c bpf_host.o from-netdev $CALLS_MAP' to the cilium_host dev and in case of ipvlan where cilium_host is a ipvlan slave dev in hostns, the latter additionally has ENABLE_EXTRA_HOST_DEV set. This means for the veth case 'sec_label != HOST_ID' was always present, and for the ipvlan case 'sec_label != HOST_ID' was only compiled in for cilium_host / egress path. The change into 'from_host && sec_label != HOST_ID' has two issues after a695f53 transformation: i) The 'srcid_from_ipcache = *sec_label' assignment under this check is never invoked for traffic in case of veth datapath which is !from_host, that is, for outside world traffic arriving on the node. ii) Given after a695f53 the ENABLE_EXTRA_HOST_DEV is now defined from the Go side for all paths for ipvlan, the 'from_host && sec_label != HOST_ID' test is now compiled out and external traffic could wrongly assign HOST_ID. Meaning, for external traffic arriving on the node, the 'sec_label != HOST_ID' guard needs to stay intact. Fixes: a695f53 ("Endpoint for host") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
…irect [ upstream commit 8b0c892 ] We're in a similar situation as ipvlan datapath here in that we must derive the secid for policy enforcement via SECCTX_FROM_IPCACHE. This is needed as now we do not push the packet up the stack anymore where it will take the tc egress path of the bpf_lxc dev where it would resolve the secid, but instead we pass it onwards via ipv{4,6}_local_delivery() from bpf_host given we do not have skip_redirect. So in the latter this gets encoded via CB_SRC_LABEL before tail calling into ep->lxc_id. In bpf_host resolve_srcid_ipv4() was always picking WORLD (2) which will fail CIDR-based enforcement, e.g. hubble logs revealed this: {"time":"2020-11-13T13:53:09.636444980Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"0a:4b:c4:b6:2d:4b","destination":"92:79:4f:8e:96:4f"},"IP":{"source":"192.168.36.13","destination":"10.0.1.190","ipVersion":"IPv4"},"l4":{"TCP":{"source_port":56228,"destination_port":80,"flags":{"SYN":true}}},"source":{"identity":2,"labels":["reserved:world"]},"destination":{"ID":1091,"identity":41849,"namespace":"default","labels":["k8s:io.cilium.k8s.policy.cluster=default","k8s:io.cilium.k8s.policy.serviceaccount=default","k8s:io.kubernetes.pod.namespace=default","k8s:zgroup=testDS"],"pod_name":"testds-944zc"},"Type":"L3_L4","node_name":"k8s2","event_type":{"type":5},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"TCP Flags: SYN"} The source identity in this case should have been 16777217 as per ipcache dump of ... [...] 192.168.36.13/32 16777217 0 0.0.0.0 [...] ... and thus it failed CI test 'Suite-k8s-1.12.K8sPolicyTest Multi-node policy test validates ingress CIDR-dependent L4 connectivity is restored after importing ingress policy' where a non-Cilium managed node (192.168.36.13) made a request to the backend Pod directly via curl. It succeeded before policy to allow 192.168.36.13 to port 80 was installed but failed after the latter was set in place due to secid mismatch. Fix it by enabling SECCTX_FROM_IPCACHE so that it can assign correct identity in resolve_srcid_ipv{4,6}(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 92edccf ] Refactor current redirect_neigh() code into redirect_direct_{v4,v6}() and add multi-device support. The latter performs a route lookup and only calls into redirect_neigh() if L2 addresses must be resolved. It also passes the GW information from the fib_lookup() to redirect_neigh() to avoid a second lookup for the latter. This now enabled to use the fast redirect in the more general case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit bbd6886 ] ... since both need to go up the stack for packet handling. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 3ce28c0 ] Allow this knob to be configured for Helm users. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit d170ce8 ] Dump info on whether we use ktime or jiffies in BPF datapath. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 7436d3c ] Add a Cilium status dump on this datapath feature in order to allow for easier introspection. Clock source will show up under --verbose. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 9153a87 ] Add note to upgrade guide with regards to host routing probing. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for my commits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for my changes
test-backport-1.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for my commit
Ignored by this PR:
Once this PR is merged, you can update the PR labels via: