
v1.11 backports 2022-01-31 #18669

Merged: 24 commits into cilium:v1.11 on Feb 7, 2022

Conversation

@glibsm (Member) commented Jan 31, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 18612 18527 18627 18585 18448 18636 18647 18608; do contrib/backporting/set-labels.py $pr done 1.11; done

@glibsm glibsm requested a review from a team as a code owner January 31, 2022 01:31
@glibsm glibsm added backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Jan 31, 2022
@tklauser (Member) left a comment:

Thanks

@pchaigno (Member) left a comment:

My changes look good. Thanks!

@pchaigno (Member) commented Jan 31, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-1.18-kernel-4.9' hit: #17981 (89.69% similarity)

@joestringer (Member) left a comment:

Thanks!

@YutaroHayakawa (Member) left a comment:

LGTM 👍

Thanks!

@brb (Member) left a comment:

My changes LGTM, thanks.

christarazi and others added 22 commits February 3, 2022 16:41
[ upstream commit c3ef235 ]

Add sanity-check coverage that the kube-apiserver policy matching
feature still works when a duplicate policy is added and the previous
one is removed.

Updates: cilium#17829

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit bd7ee73 ]

Now that the limitation has been fixed, remove it from the docs.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit b6a41d2 ]

The init container name doesn't change during tests, so define it as a const.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit f49dda7 ]

Ensure that all endpoints exist and are ready before attempting to
extract a specific endpoint from GetAllEndpointsIds results.

Follows commit 3738680 ("test/runtime: Fix flake on reserved:init
endpoints") which introduced a similar check before calls to
GetEndpointIds.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 96152dc ]

Previously, the bpf_fib_lookup() helper was invoked in nodeport.h with
the BPF_FIB_LOOKUP_{OUTPUT,DIRECT} flags set. This commit drops both
flags to make the NodePort BPF multihoming possible.

The former flag instructed the helper to do the FIB lookup from the egress
perspective, which required us to set the correct ifindex for the lookup.
In the NodePort BPF code we used DIRECT_ROUTING_DEV_IFINDEX, which
corresponds to the iface used to communicate between k8s nodes
(i.e., the iface with the k8s Node InternalIP or ExternalIP). However, the
reliance on DIRECT_ROUTING_DEV_IFINDEX meant that when more than a single
iface was used to connect the nodes, only one of them could be used for
the NodePort BPF traffic (forwarded service requests).

To fix this, we can drop BPF_FIB_LOOKUP_OUTPUT and set the ingress
iface in the FIB lookup parameters. This performs the FIB lookup from the
perspective of the device which received the packet (before forwarding to
the egress iface) and eliminates the need for DIRECT_ROUTING_DEV_IFINDEX
in the NodePort BPF code.

The latter flag was meant to speed up FIB lookups. The kernel commit which
introduced it [1] says:

    BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
    straight to the table associated with the device (expert setting for
    those looking to maximize throughput)

This means that we bypass IP rules. However, cilium-agent installs IP
rules for some setups (e.g., EKS, WireGuard). Therefore, to avoid any
potential pitfalls, we drop the latter flag so that the IP rules can be
respected.

To measure a potential perf regression, I provisioned [2] scripts on
three AWS EC2 c5a.8xlarge machines (10Gbit, Ubuntu 20.04 LTS, 5.11
kernel). The netperf client was sending requests to the LB node which
was forwarding them to the destination node. The only Helm setting for
Cilium was KPR=strict. The results from three runs of each setup are the
following:

---------------------+-----------------+-------------------------
                     | v1.11.1         | v1.11.1 with FIB changes
---------------------+-----------------+-------------------------
TCP_RR MEAN_LATENCY  | 358.74 +- 27.08 | 359.56 +- 30.50
with STDDEV_LATENCY  | 360.28 +- 26.01 | 361.76 +- 26.71
                     | 368.18 +- 28.22 | 365.33 +- 35.83
---------------------+-----------------+-------------------------

As you can see, the increase in latency is negligible. Therefore, this
commit drops the latter flag unconditionally, i.e., without checking
whether there are any non-default IP rules or whether Cilium is running
in a multi-homing setup.

[1]: torvalds/linux@87f5fc7
[2]: https://github.com/brb/cilium-without-netfilter

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 6c3bf64 ]

According to the Ginkgo e2e test docs, all "AfterEach" statements are
executed before all "AfterAll" statements [0].

Previously, this code assumed that the AfterAll inside a Context would
run first (to delete pods), then the AfterEach outside the Context would
run (to wait for the pods to terminate), and finally the AfterAll outside
the Context would clean up the Cilium pods.

The result was that in issue 18447, the next test to run could
hit a race condition where pods were deleted but did not fully terminate
before Cilium was removed. Cilium ends up getting deleted before all the
pods, which means that there is no longer any way to execute the CNI DEL
operation for those pods, so they get stuck in Terminating state.

Fix it by moving the check that the test pods are fully terminated into
the AfterAll statement where we initiate the deletion of those pods, so
that the check runs before that AfterAll returns.

[0]: https://docs.cilium.io/en/latest/contributing/testing/e2e/#example-test-layout

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
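
For illustration, a minimal Ginkgo-style sketch of the layout described above, using the AfterAll/AfterEach wrappers from Cilium's test/ginkgo-ext package; treat the import path, package name, and exact helper names as assumptions rather than the real test code.

```go
package k8sTest

import (
	. "github.com/cilium/cilium/test/ginkgo-ext"
)

var _ = Describe("K8sExampleTest", func() {
	AfterEach(func() {
		// Runs after each spec and, per the e2e docs, before ANY AfterAll.
		// Waiting for pod termination here is therefore too early: the
		// pods are only deleted in the Context's AfterAll below.
	})

	AfterAll(func() {
		// Outer AfterAll: remove Cilium from the cluster.
	})

	Context("with test pods deployed", func() {
		It("runs a test against the pods", func() {})

		AfterAll(func() {
			// The fix: delete the test pods AND wait for them to fully
			// terminate here, before the outer AfterAll removes Cilium.
		})
	})
})
```
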
[ upstream commit 2ae89c1 ]

This test seems to always be skipped, so it is unclear whether it still
works. But this should at least mitigate issues with cleaning up state
at the end of the test, similar to the issues in the adjacent commits.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit cd7df97 ]

Previously this code was mismatched: the BeforeEach would attempt to
re-create the graceful termination app before each of the It statements,
but the cleanup would only occur once.

While preparing PR 18448, I hit issue cilium#18634, where if we didn't
wait sufficiently long between individual tests in the graceful
termination test Context, the test would fail during the following
BeforeEach because the previous test's client was not yet cleaned up.

Fix this by ensuring that the cleanup and the wait for pod termination
occur in an AfterEach for each It statement in this Context, to match
the behaviour of the overall test. The next commit will then shift the
cleanup logic out of a Describe-level AfterEach and into the individual
Contexts.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 589c215 ]

The library in the tree that we use to interact with kubectl doesn't
automatically wait for all pods to be deleted when we delete them.
Depending on which test runs next, this can mean that we occasionally
hit a race condition like the one in Issue 18447:

* Delete Pods
* Delete Cilium
* Pods can't be fully deleted because Cilium is down
* BeforeAll for next test waits to make sure everything is ready so that
  we can install Cilium
* The next test times out while attempting this process

This commit goes through all of the existing suites, removes cases where
we rely on incorrect scheduling between AfterEach/AfterAll statements,
and adds a wait for pod termination after every place in the suites
where we delete pods.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
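
A rough sketch of the "delete, then wait" pattern this commit adds: the suite itself goes through its kubectl helpers, so the client-go version below is only illustrative, and the function name, poll interval, and timeout are assumptions.

```go
package example

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// deletePodsAndWait deletes every pod in the namespace and then polls
// until none are left, so the next test cannot start while pods are
// still terminating.
func deletePodsAndWait(ctx context.Context, cs kubernetes.Interface, ns string) error {
	if err := cs.CoreV1().Pods(ns).DeleteCollection(ctx,
		metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
		return err
	}
	return wait.PollImmediate(5*time.Second, 5*time.Minute, func() (bool, error) {
		pods, err := cs.CoreV1().Pods(ns).List(ctx, metav1.ListOptions{})
		if err != nil {
			return false, err
		}
		return len(pods.Items) == 0, nil // done once everything is gone
	})
}
```
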
[ upstream commit 323cb87 ]

When the test fails because it is waiting for containers to terminate,
add this information into the error that is reported so that developers
can more easily debug the issue that's occurring.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 216bcb1 ]

Previously the skip was on the Context, which meant that the
BeforeAll/AfterAll for the K8sBandwidthTest Describe would still run and
could deploy pods and fail to clean them up properly. Skip these steps
as they can cause problems in test runs where this test is skipped, and
because they are unnecessary work.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 119653b ]

Previously the skip was on the Context, which meant that the
BeforeAll/AfterAll for the K8sCustomCalls Describe would still run and
could deploy pods and fail to clean them up properly. Skip these steps
as they can cause problems in test runs where this test is skipped, and
because they are unnecessary work.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit fb2d11c ]

Deleting all pods after deleting Cilium is a sure-fire way to introduce
flakiness where pods hang in Terminating state during deletion, because
the Cilium pods have been cleaned up and kubelet can't call into the CNI
to properly clean up the pods.

Delete the pods first and wait for termination to finish.

CC: Quentin Monnet <quentin@isovalent.com>
Fixes: 9d4e99d ("tests: rework custom calls's AfterEach/AfterAll blocks to skip if needed")
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 95a46c6 ]

Use the spec.replicas field in the kube-dns deployment to reconfigure
the number of DNS pods in the cluster during pod restart. This should
restart the kube-dns pods correctly, similar to just deleting the pods.

A subsequent patch intends to split this usage out depending on the
status of Cilium deployments, in order to mitigate issues where kube-dns
pods are deployed differently between test runs due to Cilium
configuration (like endpoint routes mode).

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 8302f0e ]

As of the writing of this patch, Cilium does not support transitioning
between enable-endpoint-routes=true and enable-endpoint-routes=false.

Commit c18cfc8 ("test: Redeploy DNS pods in AfterAll for datapath tests")
attempted to ensure that DNS pods are properly deleted from nodes prior
to completing a particular test file. This was to ensure that when
making changes to Cilium configuration like toggling endpoint routes
mode, the pods are properly switched over to the new mode.

Previously, the order of the test cleanup could cause problems with
terminating the kube-dns pods: Cilium was deleted from the cluster
first, so the kube-dns pods would get stuck in the "Terminating" state.
This is believed to be because the nodes would still attempt to call the
cilium-cni binary in order to remove the DNS pods, and that binary would
attempt to reach out to the cilium-agent on the node, which was no
longer present.

The motivation behind this patch, then, is to first scale the pods down
to zero so that no kube-dns pods are present in the cluster during test
file cleanup, and then shut down Cilium. Finally, we restore the number
of replicas to the original value, so that when the next test sets
Cilium back up, the kube-dns pods are ready to be deployed afresh.

Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
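
A hedged client-go sketch of the scale-down/scale-up approach described in the two commits above; the test suite drives this through its kubectl helpers instead, so the function below and its use of the Scale subresource are illustrative assumptions (the kube-dns name and kube-system namespace are the usual defaults).

```go
package example

import (
	"context"

	autoscalingv1 "k8s.io/api/autoscaling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleKubeDNS sets spec.replicas on the kube-dns Deployment. The cleanup
// flow sketched here: scale to 0, wait for the DNS pods to terminate,
// remove Cilium, then scale back to the original replica count before the
// next test brings Cilium back up.
func scaleKubeDNS(ctx context.Context, cs kubernetes.Interface, replicas int32) error {
	scale := &autoscalingv1.Scale{
		ObjectMeta: metav1.ObjectMeta{Name: "kube-dns", Namespace: "kube-system"},
		Spec:       autoscalingv1.ScaleSpec{Replicas: replicas},
	}
	_, err := cs.AppsV1().Deployments("kube-system").
		UpdateScale(ctx, "kube-dns", scale, metav1.UpdateOptions{})
	return err
}
```
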
[ upstream commit 6f9e1df ]

Currently, we determine whether an XDP program is attached to a device
by checking if `link.Attrs().Xdp == nil`. We need to check
`Xdp.Attached` as well; otherwise, the XDP unloading will be executed on
unrelated devices every time the agent restarts, which might further
cause network interruption for some NIC drivers, e.g. Mellanox mlx5.

Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
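
A minimal sketch of the corrected check, assuming the github.com/vishvananda/netlink package that exposes the `link.Attrs().Xdp` field mentioned above; the function name and device lookup are illustrative.

```go
package example

import "github.com/vishvananda/netlink"

// hasXDPAttached reports whether an XDP program is actually attached to
// the device: Xdp being non-nil alone is not enough, Xdp.Attached must
// also be true, otherwise unrelated devices would have XDP "unloaded"
// on every agent restart.
func hasXDPAttached(name string) (bool, error) {
	link, err := netlink.LinkByName(name)
	if err != nil {
		return false, err
	}
	xdp := link.Attrs().Xdp
	return xdp != nil && xdp.Attached, nil
}
```
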
[ upstream commit 3cd68af ]

Similarly to other guides, we can put incompatibilities between
different features into a Limitations section, at the end.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 411da57 ]

Reported-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 622eb86 ]

Currently, inherit_identity_from_host cannot inherit the identity from
encrypted packets (packets with MARK_MAGIC_ENCRYPT coming from the host
stack). Extend it to handle that case.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 49b31e7 ]

Currently, the return value of inherit_identity_from_host indicates
whether the packet is coming from the proxy. However, in some cases we
also want to know whether the packet was encrypted (and it is impossible
to know after the call, since inherit_identity_from_host wipes the
mark). To cover both cases, we can simply extend it to return the magic
value extracted from the mark.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 8420939 ]

Currently, when packets are passed from the from-container program
(bpf_lxc) to the host stack, src_id is not encoded into the mark. This
is inconvenient for the rest of the datapath, especially when we
generate the trace.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 144e9e0 ]

When IPsec packets come from the host (xfrm) stack to the from-host
program (bpf_host), they bypass send_trace_notify (<- stack). As a
result, we only see the trace for the inner packet (ICMP Request/Reply
with <- endpoint and -> stack). Fix the bpf_host program not to bypass
the trace.

Note that since we moved do_netdev_encrypt after
inherit_identity_from_host, we cannot extract the identity from
ctx->mark anymore, because inherit_identity_from_host wipes the mark.
That is why we need to pass the inherited identity to do_netdev_encrypt.
Otherwise, in tunneling mode, get_identity doesn't work correctly.

Trace on the source node before the fix:

```
<- endpoint ... EchoRequest
-> stack ... EchoRequest
```

Trace on the source node after the fix:

```
<- endpoint ... EchoRequest
-> stack ... EchoRequest
<- stack encrypted ...
```

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
Signed-off-by: Glib Smaga <code@gsmaga.com>
@joamaki force-pushed the pr/v1.11-backport-2022-01-31 branch from 1536d5e to ca4db19 on February 3, 2022 15:41
@joamaki (Contributor) commented Feb 4, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-1.22-kernel-4.19' failed and has not been observed before, so may be related to your PR:

Test Name: K8sServicesTest Checks service across nodes Tests NodePort BPF Tests with vxlan Tests with secondary NodePort device

Failure Output: FAIL: Request from k8s1 to service tftp://192.168.57.12:31959/hello failed

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.22-kernel-4.19 so I can create a new GitHub issue to track it.

@joamaki (Contributor) commented Feb 4, 2022

/test-upstream-k8s

k8s-122-kernel-4.19 failed due to #12690 and that test was disabled in master, so going to ignore that. @twpayne shall I backport the commit to disable the flaky test?

k8s-upstream failed due to box download timeout so kicking it off again.

@twpayne (Contributor) commented Feb 4, 2022

> k8s-122-kernel-4.19 failed due to #12690 and that test was disabled in master, so going to ignore that. @twpayne shall I backport the commit to disable the flaky test?

Yes please.

@joamaki (Contributor) commented Feb 4, 2022

/test-upstream-k8s

@joamaki (Contributor) commented Feb 7, 2022

k8s-upstream is still hitting the download issue. I'll merge this and then open new backports that will include the fix for that issue.

@joamaki joamaki merged commit da010b8 into cilium:v1.11 Feb 7, 2022