Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.11 backports 2022-03-30 #19277

Merged
merged 22 commits into from Apr 4, 2022
Merged

v1.11 backports 2022-03-30 #19277

merged 22 commits into from Apr 4, 2022

Conversation

qmonnet
Copy link
Member

@qmonnet qmonnet commented Mar 30, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 18011 18336 18857 18869 19104 19152 19071 19118 19216 19228 19229 19172 19165 19193 18609 19249 19162 19042; do contrib/backporting/set-labels.py $pr done 1.11; done

or with

$ make add-label BRANCH=v1.11 ISSUES=18011,18336,18857,18869,19104,19152,19071,19118,19216,19228,19229,19172,19165,19193,18609,19249,19162,19042

[ upstream commit 04c29ba ]

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
@qmonnet qmonnet requested review from a team as code owners March 30, 2022 15:00
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Mar 30, 2022
Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks and looks good for my changes 💯

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My changes look good. Thanks!

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My changes and db3a9af looks good to me. Thanks!

Copy link
Member

@jibi jibi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My PR looks good, thanks!

Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My changes look good. Thanks!

@qmonnet qmonnet force-pushed the pr/v1.11-backport-2022-03-30 branch from d80b2bb to 8fe003a Compare March 30, 2022 15:58
@qmonnet
Copy link
Member Author

qmonnet commented Mar 30, 2022

@jibi: I pushed again one of your commits, after renaming test/k8s/Node.go as test/k8sT/Node.go (note the T).

@qmonnet qmonnet requested a review from jibi March 30, 2022 16:00
@qmonnet
Copy link
Member Author

qmonnet commented Mar 30, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-1.23-kernel-4.9' failed:

Click to show.

Test Name

K8sServicesTest Checks E/W loadbalancing (ClusterIP, NodePort from inside cluster, etc) with L4 policy Tests NodePort with L4 Policy

Failure Output

FAIL: Request from testclient-9m9hf pod to service tftp://[fd04::12]:31100/hello failed

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.23-kernel-4.9 so I can create one.

Copy link
Member

@jibi jibi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@brb brb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My changes LGTM, thanks.

@qmonnet
Copy link
Member Author

qmonnet commented Apr 1, 2022

Failure seems to be #18977; Runtime tests are currently broken (#18919). ConformanceAKS is currently disabled.
/test-1.23-4.9

@qmonnet
Copy link
Member Author

qmonnet commented Apr 1, 2022

Cilium-PR-K8s-1.23-kernel-4.9 passed this time. Marking as ready-to-merge.

brb and others added 20 commits April 4, 2022 10:58
[ upstream commit a63e02e ]

While debugging a CI flake in which pod2pod@same host via ClusterIP
done by bpf_lxc TCP SYN was not making into the dst pod, I've noticed
that the corresponding CT entry had ProxyRedirect flag set. The test
case in which the flag happened didn't have any L7 CNP, but some
previous test case had it. So it's very likely that the CT entry was
reused for by the connection which failed.

Paul Chaignon pointed to a very similar issue [1].

For now, to avoid such flakes, clear the CT tables to avoid stale CT
entries reuse.

[1]: #17459

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit ab7ff52 ]

There is no need to redeploy Cilium in AfterAll of the "With ClusterIP
external access" test suite, as the next test case will deploy it
anyway.

It should improve the K8sServices test suite run time by a bit.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 6e2ce4a ]

Use the same condition in defining the label 'skip_policy_enforcement' as
for jumping to it. Otherwise we could jump to a nonexisting label given a
suitable datapath configuration. This does not seem possible now, as
ENABLE_HOST_SERVICES_FULL should not be defined if
ENABLE_SOCKET_LB_HOST_ONLY is defined. However, as the conditions for
defining ENABLE_PER_PACKET_LB evolve, this would become a compilation
bug.

Fixes: #17154
Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit d53795e ]

For this condition:

 !defined(ENABLE_ROUTING) && defined(TUNNEL_MODE) && !defined(ENABLE_NODEPORT)

the endif comment should be:

 /* !ENABLE_ROUTING && TUNNEL_MODE && !ENABLE_NODEPORT */

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 93bc479 ]

Switch from gopkg.in/natefinch/lumberjack.v2 to github.com/cilium/lumberjack/v2

This fork fixes log compression by using a temp file and then atomically
renaming to the final destination.

[ Backport notes: Some minor conflicts in pkg/ files on the includes, as
    well as in go.mod and go.sum. File
    pkg/logging/hooks/file_rotation.go is not present in this branch. ]

Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 75f597b ]

The Clustermesh-APIServer creates a CiliumEndPoint and sets
a node as its ownerReference, also setting blockOwnerDeletion
to "true". If the OwnerReferencesPermissionEnforcement admission
controller is enabled (such as in environments like Openshift) then
the Clustermesh-APIServer will fail to create the CiliumEndPoint
as it has insufficient privileges to set blockOwnerDeletion of
a node. It needs to be able to "update" "nodes/finalizers" in order
to do this. See #19053 for more details and references.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 085874c ]

This commit modifies the correct CIDR mask when wireguard agent init.
Fixes: #19106

Signed-off-by: Ye Sijun <junnplus@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit eb3662e ]

Signed-off-by: Sigurd Spieckermann <sigurd.spieckermann@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 8b67e74 ]

Quay changed its API on the 22nd. This commit updates the step that
checks for the Docker images in all workflows to use the new API.

[ Backport notes: Only update the few files that are present on this
    branch. Original commit would touch more workflows, but they're on
    the current branch. ]

Suggested-by: André Martins <andre@cilium.io>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 277ed8c ]

This is a follow up to 8b67e74 ("workflows: Update call to Quay API").
We also need to change the Quay API calls we're doing in the
Jenkinsfiles.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 96e7fa6 ]

Because string map options are configured `flags.Var`, they are assigned
twice: Once in `flags.Var`, and then again in `option.Config.Populate()`
with `GetStringMapString`. The latter is needed because it works around
a viper bug where ConfigMap options were not properly parsed
(#18328). This commit ensures we're always using the result
of the fallback parser, which fixes a bug where a ConfigMap value of
`{}` was parsed as `map["{}":""]`.

Fixes: 768659f ("cmd: Fix issue reading string map type via config map")

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 1a34bc5 ]

This test already passes, but it's a good corner case to test to avoid
running into regressions.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit f7bb659 ]

One of the main callers of the K8s cache sync checker is the logic of
the kube-apiserver policy feature. However, if K8s is disabled, this is
all moot and the kube-apiserver policy feature, well, isn't relevant.
Bypass the check on the cache channel if K8s is disabled.

The reason why this commit chooses to return true here instead of at the
ipcache.InjectLabels() call site is because the ipcache package would
have an import cycle with the k8s package. The logic implemented in this
commit isn't in the ideal location, but until the k8s & ipcache packages
are reasonably well isolated from each other, then this is the only
approach to workaround the problem.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 2efbdd6 ]

Local Redirect Policy (LRP) namespace needs to match
with the backend pods selected by the LRP.

This check was missing in the case where backend
pods are deployed after an LRP that selects them
was applied.

Added unit tests.

Reported-by: Joe Stringer <joe@covalent.io>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 01cbc59 ]

This commit ensures that any update to the labels of a k8s Node object
is reflected in the corresponding CiliumNode one.

[ Backport notes:
    - Minor conflicts in pkg/k8s/watchers/node.go.
    - test/k8s/Node.go renamed as test/k8sT/Node.go. ]

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit f5a8eea ]

[ Backport notes: Minor conflicts in pkg/k8s/watchers/cilium_node.go. ]

Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit ffa5d48 ]

Signed-off-by: ivan <i.makarychev@tinkoff.ru>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 4e422e3 ]

The IP rules for ENI can have two forms, with or without the to
x.x.x.x/x part. There is a bug between the creation and deletion of
rules, where the conditions to decide which form to use aren't the same.
As the result, we would attempt to remove the wrong IP rules on pod
deletion, causing a warning message in logs and leaving a stale IP rule
behind:

    level=warning msg="No rule matching found" rule="111: from 100.69.11.94/32 to 10.244.0.0/16 lookup 0" subsys=linux-routing

In case a new rule is later added for a new pod with the same IP, we can
end up with a conflict if the new pod is on a different ENI. We will
then have two rules for the same IP with lookups in different rouing
tables.

    111: from 10.19.192.168 lookup 10
    111: from 10.19.192.168 lookup 11

This conflict can then cause connectivity issues because of pod traffic
leaving through the wrong interface.

Flag --aws-release-excess-ips needs to be enabled for the bug to occur.
Otherwise the IP address stays attached to the same ENI and we don't end
up with an IP rule conflict.

Reported-by: Sebastian Wicki <gandro@gmx.net>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 298ece3 ]

In addition to the inconsistency fixed in the previous commit, there is
another we need to fix. On the deletion of ip rules with the form "to
x.x.x.x/x", we check that we are running in ENI mode. We don't perform
the same check on the creation of those rules. Thus, on AKS and
AlibabaCloud, we could theoretically create rules of one form and
attempt to delete the other form.

In practice, this isn't causing a bug because:
- The CIDR used in the "to x.x.x.x/x" form appears to always be
  0.0.0.0/0 and is therefore equivalent to not having a to CIDR
  specified.
- We don't support equivalent flags to --aws-release-excess-ips on AKS
  and AlibabaCloud and, as detailed in the previous commit, that flag is
  necessary to trigger the bug.

Thus, this commit is fixing an inconsistency in the code that doesn't
have any consequence today.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit f046521 ]

The changes in previous PR #18766 was assumed shared service annotation
is for local cluster, rather than for remote cluster.

This commit is to perform the adjustment. It's easier to explain with
concrete example. Let's say we have one clusters A and B, if the service
in cluster B is having shared-service: false annotation added, below
behavior will happen:

- requests to cluster A service will LB to local endpoint only, as
  service B is no longer shared.
- requests to cluster B service will LB to local and remote endpoints
  as usual.

Relates #18766

Signed-off-by: Tam Mach <tam.mach@isovalent.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
@qmonnet qmonnet force-pushed the pr/v1.11-backport-2022-03-30 branch from 8fe003a to 885f97f Compare April 4, 2022 09:59
@qmonnet
Copy link
Member Author

qmonnet commented Apr 4, 2022

/test-only --focus="K8sServicesTest.*With ClusterIP external access"

Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backport of #18336 looks good to me.

@pchaigno
Copy link
Member

pchaigno commented Apr 4, 2022

/test-backport-1.11

Copy link
Member

@jrajahalme jrajahalme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on my part :-)

Copy link
Contributor

@chancez chancez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backport of #19152 7342c4b looks good to me

@pchaigno
Copy link
Member

pchaigno commented Apr 4, 2022

All backports with conflcits have been reviewed. All required tests are passing except for Runtime which is broken on all stable branches (cf. #18919). A bunch of other tests are failing but that's because I mistakenly triggered the master tests with /test. Merging.

@pchaigno pchaigno merged commit b3d4e4c into v1.11 Apr 4, 2022
@pchaigno pchaigno deleted the pr/v1.11-backport-2022-03-30 branch April 4, 2022 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet