Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.10 backports 2022-12-06 #22582

Merged
merged 14 commits into from
Dec 14, 2022
Merged

Conversation

gandro
Copy link
Member

@gandro gandro commented Dec 6, 2022

🛑 Skipped due to conflicts:

Once this PR is merged, you can update the PR labels via:

$ for pr in 22002 22296 22304 20366 21990 22393 22461 22508; do contrib/backporting/set-labels.py $pr done 1.10; done

[ upstream commit d5ada37 ]

Since k8s had remove support for extensions/v1beta1 API version after 1.16, we should update the docs to the latest and stable version.

Signed-off-by: cleverhu <shouping.hu@daocloud.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro requested review from a team as code owners December 6, 2022 17:14
@gandro gandro added backport/1.10 kind/backports This PR provides functionality previously merged into master. labels Dec 6, 2022
christarazi and others added 11 commits December 7, 2022 10:37
[ upstream commit 366b968 ]

The error check handling should be done immediately after the
GetProxyPort() call, in order to error out as soon as possible.

This unchecked error can cascade to code integrations with the Agent and
cause potentially difficult to track down behavior.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit f3a22e8 ]

When opening a PR to update the base images from external forks, the bot
does not have necessary permissions to push the changes into the fork.
For those cases the developer should amend the commit locally and push
the changes themselves.

Fixes: c5a7787 ("add auto-commit capability to build base images GH workflow")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 1b33ead ]

The number of returned CiliumClusterwideNetworkPolicies is known in
advance, so the preallocation of the backing array will avoid
reallocations after the append.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 22ba23e ]

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit fdc6d39 ]

Use the context from the GC controller to execute the update queries.
Doing so, possible pending queries will be cancelled as soon as the
controller context is cancelled.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 20bd519 ]

When the option `disable-cnp-status-updates` is set to true, no policy
enforcement update is tracked in CiliumNetworkPolicies.
However, if the option was previously set to false, the field
status.nodes still contains the last status of each node when the
feature was turned off.
Currently, the GC in the cilium operator removes status entries only if
the relative node has been turned off.

Given that these stale updates may hinder scalability for large
clusters, we clean up all those entries at startup if
`disable-cnp-status-updates` is set to true.

Fixes cilium#20231

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit c2c66a8 ]

When the option `disable-cnp-status-updates` is set to true, the
operator, at startup, will garbage collect all stale status nodes
updates in CNPs and CCNPs.

This new option `skip-cnp-status-startup-cleaning` may be used to skip
this clean up so to speed up the operator startup.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit f4ff2ce ]

When the option `disable-cnp-status-updates` is set to true, the
operator, at startup, will garbage collect all stale status nodes
updates in CNPs and CCNPs.

To avoid an excessive requests rate to the API server, the clean up is
rate limited. The requests rate per second and the maximum allowed burst
of requests is controlled, respectively, by the two new options
`cnp-status-cleanup-qps` and `cnp-status-cleanup-burst`.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 14babaf ]

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
…ctor

[ upstream commit c2d6908 ]

Previously, 'validate-cnp' preflight check would log a verbose warning
if it detected a CCNP with an empty toEndpoints/fromEndpoints selector and pass
the check with the following output:

time="2022-11-03T15:50:04Z" level=info msg="Validation OK!" CiliumClusterwideNetworkPolicy=test-empty-endpointselector
time="2022-11-03T15:50:04Z" level=info msg="All CCNPs and CNPs valid!"

This could be misleading and tempt the user to ignore the warning. The
preflight check will now fail with the following output:

time="2022-11-03T16:05:30Z" level=error msg="Unexpected validation error" CiliumClusterwideNetworkPolicy=test-empty-endpointselector error="use of empty toEndpoints/fromEndpoints selector"
time="2022-11-03T16:05:30Z" level=error msg="Start hook failed" error="Found invalid CiliumClusterwideNetworkPolicy" function="cilium/cmd.validateCNPCmd.func1.1 (preflight_k8s_valid_cnp.go:41)" subsys=hive
time="2022-11-03T16:05:30Z" level=info msg="Stop hook executed" duration="21.858µs" function="pkg/k8s/client.(*compositeClientset).onStop-fm (<autogenerated>:1)" subsys=hive
time="2022-11-03T16:05:30Z" level=fatal msg="failed to start: Found invalid CiliumClusterwideNetworkPolicy"

Fixes: cilium#17471

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit ffef1a8 ]

While a 404 Not Found or a 409 Conflict can be considered successful
interactions with the k8s API, a blanket accept for all 4xx codes is
problematic. Since LastSuccessInteraction is exclusively used as an
optimisation, we should err on the cautious side: accept the potential
increase in heartbeats to avoid missing being unable to effecticely
communicate with the k8s API.

As an example of how this can go wrong, in cilium#20915 we have an issue
around receiving 401 Unauthorized from the EKS control plane. At
sufficient scale, we never see a need to run the heartbeat. Running the
heartbeat, however, would close and reopen the connections on receiving
a 401, and thus restore connectivity to the k8s API.

We currently only use the LastSuccessInteraction to as an optimisation
to not perform unnecessary k8s API heartbeats, this "metric" (possibly a
misnomer) is not used or exposed and changing its semantics is
acceptable.

Fixes: f2998b0

Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
Co-authored-by: Sebastian Wicki <gandro@gmx.net>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 473ddda ]

Filtering PRs from external contributors will allow committers of the
Cilium project to give more attention to those PRs and avoid them to get
stale.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit baeb276 ]

A pull_request_target workflow generates pull_request events and not
issue events. This was causing all PRs to have the label
'kind/community-contribution' added to them.

Also remove the 'synchronize' event type since it's not necessary to
re-run this workflow for every push the PR author does.

Fixes: 473ddda (".github: add PR labeler for external contributions")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/v1.10-backport-2022-12-06 branch from ee39088 to b6d1fcd Compare December 7, 2022 09:38
@gandro
Copy link
Member Author

gandro commented Dec 7, 2022

Fixed a missing import in f40079c and had to remove the following PR, as it broke the documentation build:

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for PR #20366 🚀

@gandro
Copy link
Member Author

gandro commented Dec 7, 2022

/test-backport-1.10

@gandro
Copy link
Member Author

gandro commented Dec 13, 2022

@gandro
Copy link
Member Author

gandro commented Dec 13, 2022

/ci-gke-1.10

@gandro
Copy link
Member Author

gandro commented Dec 13, 2022

/ci-multicluster-1.10

@gandro
Copy link
Member Author

gandro commented Dec 13, 2022

Discussed on Slack - the GKE is currently broken due to the version being too new for v1.10. Will be fixed outside of this PR.

@nbusseneau
Copy link
Member

Tentative fix PR is here: #22724

@gandro
Copy link
Member Author

gandro commented Dec 14, 2022

I'm marking this ready-to-merge. As mentioned above, the GKE pipelines (i.e. all 4 red ones) failures are unrelated to the PR and will be fixed in a separate PR.

@gandro gandro added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 14, 2022
@pchaigno pchaigno merged commit 38718a5 into cilium:v1.10 Dec 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants