
v1.10 backports 2022-03-07 #19062

Merged
merged 6 commits into cilium:v1.10 from pr/v1.10-backport-2022-03-07 on Mar 17, 2022

Conversation

sayboras
Member

@sayboras sayboras commented Mar 7, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 17840 17856 18897 19017 19036; do contrib/backporting/set-labels.py $pr done 1.10; done

@sayboras sayboras requested a review from a team as a code owner March 7, 2022 11:15
@sayboras sayboras added the backport/1.10 and kind/backports labels Mar 7, 2022
Member

@gandro gandro left a comment


Thanks!

Member

@tklauser tklauser left a comment


My changes look good, thanks Tam!

@sayboras
Member Author

sayboras commented Mar 7, 2022

/test-backport-1.10

@sayboras
Member Author

sayboras commented Mar 8, 2022

/test-runtime

@gandro gandro requested a review from aanm March 8, 2022 08:47
@gandro
Member

gandro commented Mar 8, 2022

Runtime is failing with:

    05:54:23 runtime: Mar 08 05:54:19 runtime cilium-agent[20397]: level=fatal msg="Envoy: Binary \"cilium-envoy\" cannot be executed" error="exit status 1" subsys=envoy-manager

Unclear what commit would have caused that. Have we seen similar issues on the v1.10 branch before?

@sayboras
Member Author

sayboras commented Mar 8, 2022

Thanks to Paul for pointing out #18919

@sayboras
Member Author

sayboras commented Mar 9, 2022

Runtime failure is tracked in the linked issue above, #18919.
CodeQL failure is not related to any change in this PR.

@aanm aanm added the dont-merge/needs-rebase label Mar 9, 2022
@aanm
Member

aanm commented Mar 9, 2022

/test-runtime

tklauser and others added 6 commits March 10, 2022 23:22
[ upstream commit 71452d5 ]

Building the BPF datapath with LLVM 14 leads to the following errors:

    bpf_lxc.c:101:16: error: variable 'daddr' set but not used [-Werror,-Wunused-but-set-variable]
            union v6addr *daddr, orig_dip;
                          ^
    bpf_lxc.c:103:7: error: variable 'encrypt_key' set but not used [-Werror,-Wunused-but-set-variable]
            __u8 encrypt_key = 0;
                 ^
    bpf_lxc.c:102:8: error: variable 'tunnel_endpoint' set but not used [-Werror,-Wunused-but-set-variable]
            __u32 tunnel_endpoint = 0;
                  ^
    bpf_lxc.c:526:7: error: variable 'encrypt_key' set but not used [-Werror,-Wunused-but-set-variable]
            __u8 encrypt_key = 0;
                 ^
    bpf_lxc.c:525:8: error: variable 'tunnel_endpoint' set but not used [-Werror,-Wunused-but-set-variable]
            __u32 tunnel_endpoint = 0;
                  ^

These are normally warnings, but become errors in this case due to the
use of -Werror when compiling Cilium's bpf programs. Fix these by
marking the affected variables as __maybe_unused.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
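
For illustration, a minimal self-contained C sketch of the pattern the fix uses. The variable names come from the error output above; the typedefs and the macro definition are scaffolding for the example (in Cilium the macro comes from the datapath's compiler header):

    /* Stand-ins for the kernel typedefs used by bpf_lxc.c. */
    typedef unsigned int __u32;
    typedef unsigned char __u8;

    #ifndef __maybe_unused
    # define __maybe_unused __attribute__((__unused__))
    #endif

    int handle_traffic(void)
    {
            /* These variables are only read inside conditionally
             * compiled (e.g. IPsec) blocks. Without __maybe_unused,
             * LLVM 14's -Wunused-but-set-variable plus -Werror rejects
             * builds where those blocks are compiled out. */
            __u32 tunnel_endpoint __maybe_unused = 0;
            __u8 encrypt_key __maybe_unused = 0;

            return 0;
    }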
[ upstream commit 3cb9ba1 ]

In the bpf_lxc program's functions ipv6_l3_from_lxc and
handle_ipv4_from_lxc, currently encrypt_key is always looked up in the
encrypt map, regardless of whether IPsec is enabled or not. However, its
value is only actually used when IPsec is enabled. Thus, the lookup can
be avoided when IPsec is disabled.

This also slightly reduces program size if !defined(ENABLE_IPSEC).

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
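
A sketch of the shape of that change; encrypt_map_lookup is a hypothetical stand-in for the real map lookup helper, not the actual bpf_lxc.c code:

    typedef unsigned char __u8;

    #ifdef ENABLE_IPSEC
    /* Hypothetical stand-in for the encrypt-map lookup helper. */
    static __u8 encrypt_map_lookup(void)
    {
            return 7;
    }
    #endif

    __u8 maybe_lookup_encrypt_key(void)
    {
    #ifdef ENABLE_IPSEC
            /* Look up the key only when IPsec can consume the result. */
            return encrypt_map_lookup();
    #else
            /* !ENABLE_IPSEC: the lookup is compiled out entirely, which
             * also slightly reduces the generated program size. */
            return 0;
    #endif
    }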
[ upstream commit 18b10b4 ]

When running in CRD-based IPAM modes (Alibaba, Azure, ENI, CRD), it is
possible to observe spurious "Unable to update CiliumNode custom
resource" failures in the cilium-agent.

The full error message is as follows: "Operation cannot be fulfilled on
ciliumnodes.cilium.io <node>: the object has been modified; please apply
your changes to the latest version and try again".

It means that the Kubernetes `UpdateStatus` call has failed because the
local `ObjectMeta.ResourceVersion` of submitted CiliumNode version is
out of date. In the presence of races, this error is expected and will
resolve itself once the agent receives a more recent version of the
object with the new resource version.

However, it is possible that the resource version of a `CiliumNode`
object is bumped even though the `Spec` or `Status` of the `CiliumNode`
remains the same. This happens, for example, when
`ObjectMeta.ManagedFields` is updated by the Kubernetes apiserver.

Unfortunately, `CiliumNode.DeepEqual` does _not_ consider any
`ObjectMeta` fields (including the resource version). Therefore two
objects with different resource versions are considered the same by the
`CiliumNode` watcher used by IPAM.

But to be able to successfully call `UpdateStatus` we need to know the
most recent resource version. Otherwise, `UpdateStatus` will always fail
until the `CiliumNode` object is updated externally for some reason.

Therefore, this commit modifies the logic to always store the most
recent version of the `CiliumNode` object, even if `Spec` or `Status`
has not changed.  This in turn allows `nodeStore.refreshNode` (which
invokes `UpdateStatus`) to always work on the most recently observed
resource version.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
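
The rule the commit establishes, sketched in C for illustration only (the real logic is Go code in the IPAM watcher; the names and types here are made up):

    /* resourceVersion is an opaque token assigned by the apiserver. */
    struct cilium_node {
            char resource_version[64];
            /* Spec and Status fields elided. */
    };

    static struct cilium_node own_node; /* the agent's cached copy */

    void on_cilium_node_update(const struct cilium_node *observed)
    {
            /* Old behaviour (sketched): skip the store when Spec/Status
             * compared deep-equal. Because DeepEqual ignores ObjectMeta,
             * the cached resource version went stale and UpdateStatus
             * kept failing until something else modified the object.
             *
             * New behaviour: store unconditionally, so refreshNode
             * always submits the most recently observed version. */
            own_node = *observed;
    }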
[ upstream commit 6c15df9 ]

Checking for the existence of the .Values.nodeinit.bootstrapFile
file is a no-op because the file is created by kubelet if it does
not exist. Instead, we should check whether the file has any
contents, which is when we can be sure the node-init DaemonSet has
started.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
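
The exists-versus-non-empty distinction in a standalone C sketch; the real check lives in the Helm chart, and the path below is only an example value for .Values.nodeinit.bootstrapFile:

    #include <stdio.h>
    #include <sys/stat.h>

    /* Returns 1 only when the file exists AND has contents. An
     * existence check alone is a no-op, because kubelet creates the
     * file before node-init has written anything to it. */
    static int node_init_started(const char *bootstrap_file)
    {
            struct stat st;

            if (stat(bootstrap_file, &st) != 0)
                    return 0;            /* does not exist (yet) */
            return st.st_size > 0;       /* exists *and* is non-empty */
    }

    int main(void)
    {
            printf("%d\n", node_init_started("/tmp/cilium-bootstrap-time"));
            return 0;
    }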
[ upstream commit ea9fd6f ]

Turns out that in GKE's 'cos' images the 'containerd' binary is still
present even though '/etc/containerd/config.toml' is not. Hence, the
kubelet wrapper would still be installed for these images according to
the current check, even though it's not necessary. What's worse,
starting the kubelet would fail because the 'sed' command targeting the
aforementioned file would fail.

This PR changes the check to rely on the presence of the
'--container-runtime-endpoint' flag in the kubelet invocation, which is
probably a more reliable way of detecting '*_containerd' flavours, so
that the fix is only applied in those cases.

Fixes cilium#19015.

Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com>
Co-authored-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
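
As a sketch, the new detection reduces to a substring test on the kubelet invocation; the command lines below are invented for the example:

    #include <stdio.h>
    #include <string.h>

    /* The kubelet wrapper is only needed for '*_containerd' flavours,
     * detected by the presence of --container-runtime-endpoint. */
    static int needs_kubelet_wrapper(const char *kubelet_cmdline)
    {
            return strstr(kubelet_cmdline,
                          "--container-runtime-endpoint") != NULL;
    }

    int main(void)
    {
            printf("%d\n", needs_kubelet_wrapper(
                    "kubelet --container-runtime-endpoint="
                    "unix:///run/containerd/containerd.sock"));       /* 1 */
            printf("%d\n",
                   needs_kubelet_wrapper("kubelet --cloud-provider=gce")); /* 0 */
            return 0;
    }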
[ upstream commit e396b5e ]

The documentation page about setting up Hubble observability wrongly
states that TCP port 4245 needs to be open on all nodes running Cilium
to allow Hubble Relay to operate correctly. This is incorrect. Port 4245
is actually the default port used by Hubble Relay, which is a regular
deployment and doesn't require any particular action from the user.

However, the Hubble server uses port 4244 by default and, because it is
embedded in the Cilium agent and exposed via a host port, that port
must be open on all nodes to allow Hubble Relay to connect to each
Hubble server instance.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Tam Mach <tam.mach@isovalent.com>
@sayboras sayboras force-pushed the pr/v1.10-backport-2022-03-07 branch from fca8dd1 to f64a507 Compare March 10, 2022 12:23
@sayboras
Member Author

Force pushed to resolve the conflict with the v1.10 branch.

@ti-mo
Contributor

ti-mo commented Mar 11, 2022

/test-backport-1.10

Job 'Cilium-PR-K8s-1.19-kernel-5.4' hit: #17353 (91.08% similarity)

Job 'Cilium-PR-K8s-1.19-kernel-4.9' failed:


Test Name: K8sPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output: FAIL: cannot install connectivity-check

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-4.9 so I can create one.

Job 'Cilium-PR-K8s-1.18-kernel-4.9' failed:


Test Name: K8sPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output: FAIL: cannot install connectivity-check

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.18-kernel-4.9 so I can create one.

@sayboras sayboras removed the dont-merge/needs-rebase label Mar 14, 2022
@ti-mo
Contributor

ti-mo commented Mar 15, 2022

Both 4.9 tests are failing with the following error:

    The Service "echo-b" is invalid: spec.ports[0].nodePort: Invalid value: 31414: provided port is already allocated

Could this be related to the changes in the PR or is this a flake?


Edit: this is #13071.

@ti-mo
Contributor

ti-mo commented Mar 15, 2022

/ci-multicluster-1.10


Multicluster workflow had a node preemption during the test, re-running.

@ti-mo
Contributor

ti-mo commented Mar 15, 2022

/test-1.19-5.4


Connectivity test to the internet failed to connect.

@ti-mo
Contributor

ti-mo commented Mar 15, 2022

/test-1.18-4.9

@ti-mo
Contributor

ti-mo commented Mar 15, 2022

/test-1.19-4.9

@joestringer
Member

Reviews are in, tests are passing except for CodeQL (not required; the new reports are due to updates in CodeQL) and the known runtime job breakage #18919. Merging, thanks!

@joestringer joestringer merged commit 302c805 into cilium:v1.10 Mar 17, 2022
@sayboras sayboras deleted the pr/v1.10-backport-2022-03-07 branch March 17, 2022 21:15
@qmonnet qmonnet mentioned this pull request Mar 30, 2022