Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.9 backports 2020-11-17 #14060

Merged
merged 27 commits into from
Nov 18, 2020
Merged

v1.9 backports 2020-11-17 #14060

merged 27 commits into from
Nov 18, 2020

Conversation

kaworu
Copy link
Member

@kaworu kaworu commented Nov 17, 2020

Ignored by this PR:

Once this PR is merged, you can update the PR labels via:

$ for pr in 13969 13754 13976 13992 13912 13991 13986 13994 14023 13865 13996 14053 14042 14044 13884; do contrib/backporting/set-labels.py $pr done 1.9; done

nebril and others added 16 commits November 17, 2020 16:07
[ upstream commit 238262f ]

Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 6e4b475 ]

When executing ginkgo according to the instructions[0], the only error
output was:

  FAIL: Cannot connect to vmName 'k8s1-1.18'

After this change, the logs show the underlying error:

  cat: ssh-config: No such file or directory

[0] https://docs.cilium.io/en/v1.8/contributing/testing/e2e/#running-end-to-end-tests-in-other-environments-via-ssh

Signed-off-by: Manuel Buil <mbuil@suse.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 3ca315d ]

Update the tag for the checkpatch image in order to benefit from the
latest changes when running the GitHub action:

- Fix and update the issues reported by checkpatch.
- Allow for passing arguments directly to checkpatch.pl (not just the
  bash script), such as --fix-inplace.

See cilium/image-tools#85.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 871e7e1 ]

Only count the number of IPs for each FQDN selector/rule when storing
rules for restoration, rather than ignoring later rules on a port
after previous rules have hit the maximum number of IPs.

Make the maximum number of IPs per restored rule configurable with the
new option '--tofqdns-max-ips-per-restored-rule' (default 1000).

Signed-off-by: Jarno Rajahalme <jarno@covalent.io>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit c9810bf ]

This commit adds a mechanism to remove orphan SNAT entries.

We call an SNAT entry orphan if it does not have either a corresponding
CT entry or an SNAT entry in a reverse order. Both cases can happen due
to LRU eviction heuristics (both CT and NAT maps are of the LRU type).

The mechanism for the removal is based on the GC signaling in the
datapath. When the datapath SNAT routine fails to find a free mapping
after SNAT_SIGNAL_THRES attempts, it sends the signal via the perf ring
buffer. The consumer of the buffer is the daemon. After receiving the
signal it invokes the CT GC.

The newly implemented GC addition iterates over all SNAT entries and
checks whether a corresponding CT entry is found, and if not, it tries
to remove both SNAT entries (for original and reverse flows).

For now, I didn't add GC of orphan SNAT entries created by DSR to keep
complexity of changes as low as possible. This will come as a follow up.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 0c83c28 ]

Previously, after receiving the signal from the datapath, we iterated
NAT map twice: first to compare against CT TCP map, second - against CT
any map.

Obviously, doing the iterations two times was inefficient. This commit
fixes that by passing both CT {TCP,any} maps to the NAT GC routine. This
allows the NAT GC to iterate once.

Suggested-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit b115c54 ]

GetRules() will not process more than 1000 rules per port.
Print how many are the total rules in the message.

Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 0d514ea ]

README.md contained outdated instructions, so replace its contents with
a link to the more up-to-date backporting guide.

Signed-off-by: Tom Payne <tom@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit df9f66e ]

PR-12865 attempted to accomplish this by proxying the
context object without modifying it, which is incorrect.
The incoming and outgoing metadata keys are actually
different and must be explicitly set in order for the
metadata to be properly proxied.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit f65bee5 ]

fixes #14018

Signed-off-by: Maxime VISONNEAU <maxime.visonneau@gmail.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit d69045a ]

This commit was necessary to prevent an import cycle in a future commit.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 6d9cfe4 ]

This commit refactors the limits package. The biggest change is
consolidating the limits map and its mutex into a single type. Functions
with "limit" in the name have been removed because it is now redundant
since these functions live under the limits package.

The reason why this commit wasn't squashed with the previous is because
git wouldn't consider the new file as a rename from the old, likely
because of too many refactoring changes.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 623dcd3 ]

In (*Node).syncToAPIServer(), there are two main loops (operations)
which update the CiliumNode status and spec, respectively. The code was
exactly the same except for this difference. This commit refactors the
logic to consolidate the code when syncing to the CiliumNode resource to
the apiserver.

In addition, this refactor allows these two operations to have the same
erorr handling flow [see commit a7451f8 ("ipam: Warn when failing to
update CN status")]. The main motivation for that commit is to fix the
swallowing of the error when update fails and the subsequent Get
succeeds. Before this commit, the error handling flow was different
between the two operations (update spec & status).

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 379ce1e ]

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit f0f948d ]

This commit fixes the ENI IPAM mode when the instance types are very
limited [1].

The problem was that Cilium by default was attempting to allocate 8 IPs
(PreAllocate). However, on smaller instance types such as "t3a.micro" or
similar, the maximum number of IPs that can be attached to one ENI
device is 4. On top of that, Cilium by default does not use eth0 as an
ENI device (default interface index is 1), which could further restrict
itself from having enough ENIs for pods in the cluster.

This commit fixes this by checking the instance type limits and
adjusting the PreAllocate value. Additionally, we likely also need to
include eth0 as an ENI device to give more buffer for Cilium to allocate
IPs by setting the first interface index to 0.

[1]:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html?shortFooter=true#AvailableIpPerENI

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit ada413f ]

Similarly to what is being done in upstream kube-proxy [1], but
unfortunately without explaining why, loadBalancerSourceRanges might
contain spaces which prevents the CIDR from being parsed correctly.

[1] kubernetes/kubernetes#94107

Fixes: 3195681 ("k8s: Add and parse LoadBalancerSourceRanges field")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
@kaworu kaworu requested review from a team as code owners November 17, 2020 15:18
@kaworu kaworu requested review from ti-mo and jibi November 17, 2020 15:18
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.9 kind/backports This PR provides functionality previously merged into master. labels Nov 17, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot assigned jibi and ti-mo and unassigned jibi Nov 17, 2020
@kaworu
Copy link
Member Author

kaworu commented Nov 17, 2020

@jibi please take a careful look at your patch set from #13347, as I had to resolve conflicts. More specifically theses commits (I've updated the commit message to locate the conflicts):

  • 938b494 via 4b95484 ("bpf: add metrics for fragmented IPv4 packets")
  • a78f75e via 217ebb5 ("bpf: reduce complexity of logic to handle IPv4 fragments")

Copy link
Contributor

@twpayne twpayne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for my trivial changes.

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#13976 -- checkpatch: update image tag to latest (@qmonnet)

Looks good, thank you!

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for my changes, thanks.

@kaworu
Copy link
Member Author

kaworu commented Nov 17, 2020

Removed #13347 patch set at @jibi's request to be backported later on.

@kaworu kaworu removed the request for review from jibi November 17, 2020 16:59
@kaworu kaworu unassigned jibi Nov 17, 2020
sayboras and others added 11 commits November 17, 2020 18:00
[ upstream commit 659da4b ]

Related to https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/

Signed-off-by: Tam Mach <sayboras@yahoo.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 7dc5a57 ]

Hubble Relay is no more in-progress and is declared stable with the
release of Cilium v1.9 thus remove the "work in progress" note
Also update the Hubble internals doc section for Hubble Relay and
complete the section with technical information.
While here, fix a few typos in the whole doc section.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 7ffff79 ]

In the new version [1], the client sends a request to the server every
0.5s over the long-lived connection, and it expects a reply from the
server in 1s. The change can help to catch intermittent connection
failures during Cilium upgrades.

[1]: cilium/test-connection-disruption@edc628b

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 6b53c86 ]

Commit a695f53 ("Endpoint for host") changed the logic for the
HOST_ID check to ...

  [...]
   #ifndef ENABLE_EXTRA_HOST_DEV
  -                               if (sec_label != HOST_ID)
  +                               if (from_host && sec_label != HOST_ID)
   #endif
  [...]

... with the rationale of:

  [...]
  ENABLE_EXTRA_HOST_DEV was only defined in init.sh for the from_host path.
  It is now defined from the Go side for all paths, and we check whether
  from_host is true in the C code instead.
  [...]

Lets review the situation before a695f53:

The reason we added the ENABLE_EXTRA_HOST_DEV define came from back when
ipvlan was added via 7fbfe1c (cilium, ipvlan: implement base host
connectivity). The logic was placed in handle_ipv4() of bpf_netdev and
invoked via from-netdev. When in direct routing or ipvlan mode, we would
load the latter via 'bpf_load $NATIVE_DEV "$OPTS" "ingress" bpf_netdev.c
bpf_netdev.o from-netdev $CALLS_MAP' to the native netdev facing external
world and 'bpf_load $HOST_DEV1 "$OPTS" "egress" bpf_netdev.c bpf_host.o
from-netdev $CALLS_MAP' to the cilium_host dev and in case of ipvlan where
cilium_host is a ipvlan slave dev in hostns, the latter additionally has
ENABLE_EXTRA_HOST_DEV set. This means for the veth case 'sec_label != HOST_ID'
was always present, and for the ipvlan case 'sec_label != HOST_ID' was only
compiled in for cilium_host / egress path.

The change into 'from_host && sec_label != HOST_ID' has two issues after
a695f53 transformation:

 i) The 'srcid_from_ipcache = *sec_label' assignment under this check is
    never invoked for traffic in case of veth datapath which is !from_host,
    that is, for outside world traffic arriving on the node.

ii) Given after a695f53 the ENABLE_EXTRA_HOST_DEV is now defined from
    the Go side for all paths for ipvlan, the 'from_host && sec_label != HOST_ID'
    test is now compiled out and external traffic could wrongly assign HOST_ID.
    Meaning, for external traffic arriving on the node, the 'sec_label != HOST_ID'
    guard needs to stay intact.

Fixes: a695f53 ("Endpoint for host")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
…irect

[ upstream commit 8b0c892 ]

We're in a similar situation as ipvlan datapath here in that we must
derive the secid for policy enforcement via SECCTX_FROM_IPCACHE. This
is needed as now we do not push the packet up the stack anymore where
it will take the tc egress path of the bpf_lxc dev where it would resolve
the secid, but instead we pass it onwards via ipv{4,6}_local_delivery()
from bpf_host given we do not have skip_redirect. So in the latter this
gets encoded via CB_SRC_LABEL before tail calling into ep->lxc_id. In
bpf_host resolve_srcid_ipv4() was always picking WORLD (2) which will
fail CIDR-based enforcement, e.g. hubble logs revealed this:

{"time":"2020-11-13T13:53:09.636444980Z","verdict":"DROPPED","drop_reason":133,"ethernet":{"source":"0a:4b:c4:b6:2d:4b","destination":"92:79:4f:8e:96:4f"},"IP":{"source":"192.168.36.13","destination":"10.0.1.190","ipVersion":"IPv4"},"l4":{"TCP":{"source_port":56228,"destination_port":80,"flags":{"SYN":true}}},"source":{"identity":2,"labels":["reserved:world"]},"destination":{"ID":1091,"identity":41849,"namespace":"default","labels":["k8s:io.cilium.k8s.policy.cluster=default","k8s:io.cilium.k8s.policy.serviceaccount=default","k8s:io.kubernetes.pod.namespace=default","k8s:zgroup=testDS"],"pod_name":"testds-944zc"},"Type":"L3_L4","node_name":"k8s2","event_type":{"type":5},"traffic_direction":"INGRESS","drop_reason_desc":"POLICY_DENIED","Summary":"TCP Flags: SYN"}

The source identity in this case should have been 16777217 as per ipcache
dump of ...

  [...]
  192.168.36.13/32               16777217   0   0.0.0.0
  [...]

... and thus it failed CI test 'Suite-k8s-1.12.K8sPolicyTest Multi-node policy
test validates ingress CIDR-dependent L4 connectivity is restored after
importing ingress policy' where a non-Cilium managed node (192.168.36.13) made
a request to the backend Pod directly via curl. It succeeded before policy
to allow 192.168.36.13 to port 80 was installed but failed after the latter
was set in place due to secid mismatch. Fix it by enabling SECCTX_FROM_IPCACHE
so that it can assign correct identity in resolve_srcid_ipv{4,6}().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 92edccf ]

Refactor current redirect_neigh() code into redirect_direct_{v4,v6}() and
add multi-device support. The latter performs a route lookup and only calls
into redirect_neigh() if L2 addresses must be resolved. It also passes the
GW information from the fib_lookup() to redirect_neigh() to avoid a second
lookup for the latter. This now enabled to use the fast redirect in the more
general case.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit bbd6886 ]

... since both need to go up the stack for packet handling.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 3ce28c0 ]

Allow this knob to be configured for Helm users.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit d170ce8 ]

Dump info on whether we use ktime or jiffies in BPF datapath.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 7436d3c ]

Add a Cilium status dump on this datapath feature in order to allow
for easier introspection. Clock source will show up under --verbose.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
[ upstream commit 9153a87 ]

Add note to upgrade guide with regards to host routing probing.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for my commits

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for my changes

@kaworu
Copy link
Member Author

kaworu commented Nov 17, 2020

test-backport-1.9

Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for my commit

@borkmann borkmann merged commit a01784d into v1.9 Nov 18, 2020
@borkmann borkmann deleted the pr/v1.9-backport-2020-11-17 branch November 18, 2020 08:10
@aanm aanm mentioned this pull request Dec 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet