v1.9 backports 2020-11-17 #14060

kaworu · 2020-11-17T15:18:12Z

ci: log in to docker in vagrant boxes #13969 -- ci: log in to docker in vagrant boxes (@nebril)
test: Switch from Cilium test logger to Ginkgo #13754 -- test: Switch from Cilium test logger to Ginkgo (@manuelbuil)
checkpatch: update image tag to latest #13976 -- checkpatch: update image tag to latest (@qmonnet)
fqdn: Make maximum number of IPs per restored rule configurable #13992 -- fqdn: Make maximum number of IPs per restored rule configurable (@jrajahalme)
ctmap: GC orphan SNAT entries #13912 -- ctmap: GC orphan SNAT entries (@brb)
dnsproxy: print total number of rules if too many #13991 -- dnsproxy: print total number of rules if too many (@kkourt)
docs: Replace outdated backporting docs with link #13986 -- docs: Replace outdated backporting docs with link (@twpayne)
Hubble-Relay: proxy metadata from originating client #13994 -- Hubble-Relay: proxy metadata from originating client (@nathanjsweet)
helm/hubble-ui: fixed ingress configuration on EKS clusters #14023 -- helm/hubble-ui: fixed ingress configuration on EKS clusters (@mvisonneau)
ipam: Fix ENI IPAM on smaller instance types #13865 -- ipam: Fix ENI IPAM on smaller instance types (@christarazi)
pkg/k8s: trim spaces from loadBalancerSourceRanges #13996 -- pkg/k8s: trim spaces from loadBalancerSourceRanges (@aanm)
ci/github: Replace set-env command by echo #14053 -- ci/github: Replace set-env command by echo (@sayboras)
doc/hubble-internals: update Hubble Relay section to reflect current state #14042 -- doc/hubble-internals: update Hubble Relay section to reflect current state (@rolinh)
test: Bump migrate-svc-test image #14044 -- test: Bump migrate-svc-test image (@brb)
cilium: fix redirect limits on multi dev case #13884 -- cilium: fix redirect limits on multi dev case (@borkmann)

Ignored by this PR:

bpf: add metrics for fragmented ipv4 packets #13347 -- bpf: add metrics for fragmented ipv4 packets (@jibi)

Once this PR is merged, you can update the PR labels via:

$ for pr in 13969 13754 13976 13992 13912 13991 13986 13994 14023 13865 13996 14053 14042 14044 13884; do contrib/backporting/set-labels.py $pr done 1.9; done

[ upstream commit 238262f ] Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 6e4b475 ] When executing ginkgo according to the instructions[0], the only error output was: FAIL: Cannot connect to vmName 'k8s1-1.18' After this change, the logs show the underlying error: cat: ssh-config: No such file or directory [0] https://docs.cilium.io/en/v1.8/contributing/testing/e2e/#running-end-to-end-tests-in-other-environments-via-ssh Signed-off-by: Manuel Buil <mbuil@suse.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 3ca315d ] Update the tag for the checkpatch image in order to benefit from the latest changes when running the GitHub action: - Fix and update the issues reported by checkpatch. - Allow for passing arguments directly to checkpatch.pl (not just the bash script), such as --fix-inplace. See cilium/image-tools#85. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 871e7e1 ] Only count the number of IPs for each FQDN selector/rule when storing rules for restoration, rather than ignoring later rules on a port after previous rules have hit the maximum number of IPs. Make the maximum number of IPs per restored rule configurable with the new option '--tofqdns-max-ips-per-restored-rule' (default 1000). Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit c9810bf ] This commit adds a mechanism to remove orphan SNAT entries. We call an SNAT entry orphan if it does not have either a corresponding CT entry or an SNAT entry in a reverse order. Both cases can happen due to LRU eviction heuristics (both CT and NAT maps are of the LRU type). The mechanism for the removal is based on the GC signaling in the datapath. When the datapath SNAT routine fails to find a free mapping after SNAT_SIGNAL_THRES attempts, it sends the signal via the perf ring buffer. The consumer of the buffer is the daemon. After receiving the signal it invokes the CT GC. The newly implemented GC addition iterates over all SNAT entries and checks whether a corresponding CT entry is found, and if not, it tries to remove both SNAT entries (for original and reverse flows). For now, I didn't add GC of orphan SNAT entries created by DSR to keep complexity of changes as low as possible. This will come as a follow up. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 0c83c28 ] Previously, after receiving the signal from the datapath, we iterated NAT map twice: first to compare against CT TCP map, second - against CT any map. Obviously, doing the iterations two times was inefficient. This commit fixes that by passing both CT {TCP,any} maps to the NAT GC routine. This allows the NAT GC to iterate once. Suggested-by: Joe Stringer <joe@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit b115c54 ] GetRules() will not process more than 1000 rules per port. Print how many are the total rules in the message. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 0d514ea ] README.md contained outdated instructions, so replace its contents with a link to the more up-to-date backporting guide. Signed-off-by: Tom Payne <tom@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit df9f66e ] PR-12865 attempted to accomplish this by proxying the context object without modifying it, which is incorrect. The incoming and outgoing metadata keys are actually different and must be explicitly set in order for the metadata to be properly proxied. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit f65bee5 ] fixes #14018 Signed-off-by: Maxime VISONNEAU <maxime.visonneau@gmail.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit d69045a ] This commit was necessary to prevent an import cycle in a future commit. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 6d9cfe4 ] This commit refactors the limits package. The biggest change is consolidating the limits map and its mutex into a single type. Functions with "limit" in the name have been removed because it is now redundant since these functions live under the limits package. The reason why this commit wasn't squashed with the previous is because git wouldn't consider the new file as a rename from the old, likely because of too many refactoring changes. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 623dcd3 ] In (*Node).syncToAPIServer(), there are two main loops (operations) which update the CiliumNode status and spec, respectively. The code was exactly the same except for this difference. This commit refactors the logic to consolidate the code when syncing to the CiliumNode resource to the apiserver. In addition, this refactor allows these two operations to have the same erorr handling flow [see commit a7451f8 ("ipam: Warn when failing to update CN status")]. The main motivation for that commit is to fix the swallowing of the error when update fails and the subsequent Get succeeds. Before this commit, the error handling flow was different between the two operations (update spec & status). Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 379ce1e ] Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit f0f948d ] This commit fixes the ENI IPAM mode when the instance types are very limited [1]. The problem was that Cilium by default was attempting to allocate 8 IPs (PreAllocate). However, on smaller instance types such as "t3a.micro" or similar, the maximum number of IPs that can be attached to one ENI device is 4. On top of that, Cilium by default does not use eth0 as an ENI device (default interface index is 1), which could further restrict itself from having enough ENIs for pods in the cluster. This commit fixes this by checking the instance type limits and adjusting the PreAllocate value. Additionally, we likely also need to include eth0 as an ENI device to give more buffer for Cilium to allocate IPs by setting the first interface index to 0. [1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html?shortFooter=true#AvailableIpPerENI Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit ada413f ] Similarly to what is being done in upstream kube-proxy [1], but unfortunately without explaining why, loadBalancerSourceRanges might contain spaces which prevents the CIDR from being parsed correctly. [1] kubernetes/kubernetes#94107 Fixes: 3195681 ("k8s: Add and parse LoadBalancerSourceRanges field") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

kaworu · 2020-11-17T15:22:46Z

@jibi please take a careful look at your patch set from #13347, as I had to resolve conflicts. More specifically theses commits (I've updated the commit message to locate the conflicts):

938b494 via 4b95484 ("bpf: add metrics for fragmented IPv4 packets")
a78f75e via 217ebb5 ("bpf: reduce complexity of logic to handle IPv4 fragments")

twpayne

LGTM for my trivial changes.

qmonnet

#13976 -- checkpatch: update image tag to latest (@qmonnet)

Looks good, thank you!

brb

LGTM for my changes, thanks.

kaworu · 2020-11-17T16:58:40Z

Removed #13347 patch set at @jibi's request to be backported later on.

[ upstream commit 659da4b ] Related to https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/ Signed-off-by: Tam Mach <sayboras@yahoo.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 7dc5a57 ] Hubble Relay is no more in-progress and is declared stable with the release of Cilium v1.9 thus remove the "work in progress" note Also update the Hubble internals doc section for Hubble Relay and complete the section with technical information. While here, fix a few typos in the whole doc section. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 7ffff79 ] In the new version [1], the client sends a request to the server every 0.5s over the long-lived connection, and it expects a reply from the server in 1s. The change can help to catch intermittent connection failures during Cilium upgrades. [1]: cilium/test-connection-disruption@edc628b Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 6b53c86 ] Commit a695f53 ("Endpoint for host") changed the logic for the HOST_ID check to ... [...] #ifndef ENABLE_EXTRA_HOST_DEV - if (sec_label != HOST_ID) + if (from_host && sec_label != HOST_ID) #endif [...] ... with the rationale of: [...] ENABLE_EXTRA_HOST_DEV was only defined in init.sh for the from_host path. It is now defined from the Go side for all paths, and we check whether from_host is true in the C code instead. [...] Lets review the situation before a695f53: The reason we added the ENABLE_EXTRA_HOST_DEV define came from back when ipvlan was added via 7fbfe1c (cilium, ipvlan: implement base host connectivity). The logic was placed in handle_ipv4() of bpf_netdev and invoked via from-netdev. When in direct routing or ipvlan mode, we would load the latter via 'bpf_load $NATIVE_DEV "$OPTS" "ingress" bpf_netdev.c bpf_netdev.o from-netdev $CALLS_MAP' to the native netdev facing external world and 'bpf_load $HOST_DEV1 "$OPTS" "egress" bpf_netdev.c bpf_host.o from-netdev $CALLS_MAP' to the cilium_host dev and in case of ipvlan where cilium_host is a ipvlan slave dev in hostns, the latter additionally has ENABLE_EXTRA_HOST_DEV set. This means for the veth case 'sec_label != HOST_ID' was always present, and for the ipvlan case 'sec_label != HOST_ID' was only compiled in for cilium_host / egress path. The change into 'from_host && sec_label != HOST_ID' has two issues after a695f53 transformation: i) The 'srcid_from_ipcache = *sec_label' assignment under this check is never invoked for traffic in case of veth datapath which is !from_host, that is, for outside world traffic arriving on the node. ii) Given after a695f53 the ENABLE_EXTRA_HOST_DEV is now defined from the Go side for all paths for ipvlan, the 'from_host && sec_label != HOST_ID' test is now compiled out and external traffic could wrongly assign HOST_ID. Meaning, for external traffic arriving on the node, the 'sec_label != HOST_ID' guard needs to stay intact. Fixes: a695f53 ("Endpoint for host") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

…irect [ upstream commit 8b0c892 ] We're in a similar situation as ipvlan datapath here in that we must derive the secid for policy enforcement via SECCTX_FROM_IPCACHE. This is needed as now we do not push the packet up the stack anymore where it will take the tc egress path of the bpf_lxc dev where it would resolve the secid, but instead we pass it onwards via ipv{4,6}_local_delivery() from bpf_host given we do not have skip_redirect. So in the latter this gets encoded via CB_SRC_LABEL before tail calling into ep->lxc_id. In bpf_host resolve_srcid_ipv4() was always picking WORLD (2) which will fail CIDR-based enforcement, e.g. hubble logs revealed this: {"time":"2020-11-13T13:53:09.636444980Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"0a:4b:c4:b6:2d:4b","destination":"92:79:4f:8e:96:4f"},"IP":{"source":"192.168.36.13","destination":"10.0.1.190","ipVersion":"IPv4"},"l4":{"TCP":{"source_port":56228,"destination_port":80,"flags":{"SYN":true}}},"source":{"identity":2,"labels":["reserved:world"]},"destination":{"ID":1091,"identity":41849,"namespace":"default","labels":["k8s:io.cilium.k8s.policy.cluster=default","k8s:io.cilium.k8s.policy.serviceaccount=default","k8s:io.kubernetes.pod.namespace=default","k8s:zgroup=testDS"],"pod_name":"testds-944zc"},"Type":"L3_L4","node_name":"k8s2","event_type":{"type":5},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"TCP Flags: SYN"} The source identity in this case should have been 16777217 as per ipcache dump of ... [...] 192.168.36.13/32 16777217 0 0.0.0.0 [...] ... and thus it failed CI test 'Suite-k8s-1.12.K8sPolicyTest Multi-node policy test validates ingress CIDR-dependent L4 connectivity is restored after importing ingress policy' where a non-Cilium managed node (192.168.36.13) made a request to the backend Pod directly via curl. It succeeded before policy to allow 192.168.36.13 to port 80 was installed but failed after the latter was set in place due to secid mismatch. Fix it by enabling SECCTX_FROM_IPCACHE so that it can assign correct identity in resolve_srcid_ipv{4,6}(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 92edccf ] Refactor current redirect_neigh() code into redirect_direct_{v4,v6}() and add multi-device support. The latter performs a route lookup and only calls into redirect_neigh() if L2 addresses must be resolved. It also passes the GW information from the fib_lookup() to redirect_neigh() to avoid a second lookup for the latter. This now enabled to use the fast redirect in the more general case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit bbd6886 ] ... since both need to go up the stack for packet handling. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 3ce28c0 ] Allow this knob to be configured for Helm users. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit d170ce8 ] Dump info on whether we use ktime or jiffies in BPF datapath. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 7436d3c ] Add a Cilium status dump on this datapath feature in order to allow for easier introspection. Clock source will show up under --verbose. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

[ upstream commit 9153a87 ] Add note to upgrade guide with regards to host routing probing. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

christarazi

LGTM for my commits

aanm

LGTM for my changes

kaworu · 2020-11-17T19:00:32Z

test-backport-1.9

sayboras

Looks good for my commit

nebril and others added 16 commits November 17, 2020 16:07

ci: log in to docker in vagrant boxes

3869995

[ upstream commit 238262f ] Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

helm/hubble-ui: fixed ingress configuration on EKS clusters

dcb35ce

[ upstream commit f65bee5 ] fixes #14018 Signed-off-by: Maxime VISONNEAU <maxime.visonneau@gmail.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

eni: Move limits to seperate package

c870ef3

[ upstream commit d69045a ] This commit was necessary to prevent an import cycle in a future commit. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

ipam: Clarify purpose of (*Node).syncToAPIServer()

7890dd6

[ upstream commit 379ce1e ] Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

kaworu requested review from a team as code owners November 17, 2020 15:18

kaworu requested review from ti-mo and jibi November 17, 2020 15:18

maintainer-s-little-helper bot added backport/1.9 kind/backports This PR provides functionality previously merged into master. labels Nov 17, 2020

maintainer-s-little-helper bot assigned jibi and ti-mo and unassigned jibi Nov 17, 2020

kaworu requested review from christarazi, qmonnet, aanm, borkmann and brb November 17, 2020 15:20

rolinh approved these changes Nov 17, 2020

View reviewed changes

borkmann approved these changes Nov 17, 2020

View reviewed changes

kkourt approved these changes Nov 17, 2020

View reviewed changes

twpayne approved these changes Nov 17, 2020

View reviewed changes

qmonnet approved these changes Nov 17, 2020

View reviewed changes

brb approved these changes Nov 17, 2020

View reviewed changes

kaworu force-pushed the pr/v1.9-backport-2020-11-17 branch from 6e8e2d2 to cd26b38 Compare November 17, 2020 16:58

kaworu removed the request for review from jibi November 17, 2020 16:59

kaworu unassigned jibi Nov 17, 2020

sayboras and others added 11 commits November 17, 2020 18:00

ci/github: Replace set-env command by echo

25ca158

[ upstream commit 659da4b ] Related to https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/ Signed-off-by: Tam Mach <sayboras@yahoo.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

cilium: reenable host routing on endpoint routes and ipsec

06a22e0

[ upstream commit bbd6886 ] ... since both need to go up the stack for packet handling. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

cilium, helm: add helm option for host routing setting

a150063

[ upstream commit 3ce28c0 ] Allow this knob to be configured for Helm users. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

cilium, api: extend openapi to dump clock source status

f93bba4

[ upstream commit d170ce8 ] Dump info on whether we use ktime or jiffies in BPF datapath. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

cilium, docs: add note to upgrade guide on host routing

cd26b38

[ upstream commit 9153a87 ] Add note to upgrade guide with regards to host routing probing. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexandre Perrin <alex@kaworu.ch>

christarazi approved these changes Nov 17, 2020

View reviewed changes

aanm approved these changes Nov 17, 2020

View reviewed changes

nathanjsweet approved these changes Nov 17, 2020

View reviewed changes

sayboras approved these changes Nov 18, 2020

View reviewed changes

borkmann merged commit a01784d into v1.9 Nov 18, 2020

borkmann deleted the pr/v1.9-backport-2020-11-17 branch November 18, 2020 08:10

aanm mentioned this pull request Dec 4, 2020

Prepare for release v1.9.1 #14280

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v1.9 backports 2020-11-17 #14060

v1.9 backports 2020-11-17 #14060

kaworu commented Nov 17, 2020 •

edited

kaworu commented Nov 17, 2020

twpayne left a comment

qmonnet left a comment

brb left a comment

kaworu commented Nov 17, 2020

christarazi left a comment

aanm left a comment

kaworu commented Nov 17, 2020

sayboras left a comment

v1.9 backports 2020-11-17 #14060

v1.9 backports 2020-11-17 #14060

Conversation

kaworu commented Nov 17, 2020 • edited

kaworu commented Nov 17, 2020

twpayne left a comment

Choose a reason for hiding this comment

qmonnet left a comment

Choose a reason for hiding this comment

brb left a comment

Choose a reason for hiding this comment

kaworu commented Nov 17, 2020

christarazi left a comment

Choose a reason for hiding this comment

aanm left a comment

Choose a reason for hiding this comment

kaworu commented Nov 17, 2020

sayboras left a comment

Choose a reason for hiding this comment

kaworu commented Nov 17, 2020 •

edited