Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

operator: Fix logic used to sync Cilium's IngressClass on startup #28663

Merged

Conversation

learnitall
Copy link
Contributor

@learnitall learnitall commented Oct 17, 2023

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Please see the commit for more details. The TL;DR here is that the Ingress Controller in the Cilium Operator will always assume that Cilium is the non-default IngressClass on startup. This is because the Ingress Controller processes events from its Ingress resource Informer before processing events from its IngressClass Informer (which is a part of the ingressClassManager).

This bug may cause two full reconciliations of Ingress resources on the startup of the operator: one after the Ingress resource Informer is synced, and one after the ingress controller learns Cilium should act as the default IngressClass in the cluster.

Fix incorrect logic used by the Ingress Controller to sync Cilium's IngressClass on startup.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 17, 2023
@learnitall learnitall force-pushed the pr/learnitall/ingress-operator-fixes branch 2 times, most recently from 186fc04 to d1f8d79 Compare October 18, 2023 20:04
@learnitall learnitall changed the title operator: Fix issues impacting IngressClass synching operator: Fix logic used to sync Cilium's IngressClass on startup Oct 18, 2023
@learnitall learnitall added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Oct 18, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 18, 2023
@learnitall learnitall added area/operator Impacts the cilium-operator component dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. area/servicemesh GH issues or PRs regarding servicemesh labels Oct 18, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 18, 2023
@learnitall learnitall added needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Oct 18, 2023
@learnitall learnitall force-pushed the pr/learnitall/ingress-operator-fixes branch 2 times, most recently from 0e2d78d to b116bc7 Compare October 18, 2023 20:52
@learnitall learnitall marked this pull request as ready for review October 19, 2023 16:17
@learnitall learnitall requested review from a team as code owners October 19, 2023 16:17
@learnitall
Copy link
Contributor Author

/test

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
I've done a first pass (skipped the tests, for now) and left a question to better understand the reason behind the changes.

operator/pkg/ingress/ingress.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
@learnitall learnitall force-pushed the pr/learnitall/ingress-operator-fixes branch from b116bc7 to 9844695 Compare October 20, 2023 21:16
Copy link
Member

@mhofstetter mhofstetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this @learnitall

Some initial questions & proposals.

operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/k8s/resource_ctors.go Show resolved Hide resolved
@learnitall learnitall force-pushed the pr/learnitall/ingress-operator-fixes branch from 9844695 to 2b49098 Compare October 23, 2023 18:29
@learnitall
Copy link
Contributor Author

@pippolo84 @mhofstetter I made some changes to address your feedback. Let me know what you think!

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I like the way you changed the Run method in the ingress controller, I think this is effective and simpler!

I've left some nits about the test, but nothing blocking. Only thing left to address is the check on the ingressClassEvents channel.

operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class_test.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class_test.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class_test.go Show resolved Hide resolved
operator/pkg/ingress/ingress_class_test.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class.go Outdated Show resolved Hide resolved
operator/pkg/ingress/ingress_class_test.go Outdated Show resolved Hide resolved
This commit introduces changes to the ingress class manager piece of the
ingress controller, in order to address bugs impacting the proper
synching of Cilium's IngressClass during start up. The following changes
are made:

* Replace use of an Informer with Resource[T] for IngressClass. This
  helps simplify the logic used to perform the initial sync.
* Move the responsibility of tracking if Cilium should act as the default
  IngressClass into the ingress class manager, rather than having the
  ingress controller track this itself when processing IngressClass
  events.

After the ingress class manager is constructed, the ingress controller
will be able to determine if Cilium is the default IngressClass for a
cluster through the ingress class manager. The ingress controller no
longer has to wait to process an event for Cilium's IngressClass to
learn if Cilium should be the default.

Before this commit, the ingress controller would process all Ingress
resources before processing IngressClass resources. This is because the
Ingress resource informer would be started before the ingress class
manager, so all events related to Ingress resources would appear in the
ingress controller's event queue before events relating to IngressClass
resources. This presented a problem, because the ingress controller
would always believe that it was not the default IngressClass for a
cluster on startup while processing each Ingress resource for the first
time. This could lead to the following situation:

1. The ingress controller processes all Ingress resources.
2. The ingress controller processes IngressClass resources, and learns
   that it should act as the default IngressClass for the cluster.
3. A resync of Ingress resources is triggered.

This double-sync overhead can act as a problem for large-scale clusters.

Signed-off-by: Ryan Drew <ryan.drew@isovalent.com>
Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! 💯

Copy link
Member

@mhofstetter mhofstetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks again!

@mhofstetter
Copy link
Member

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.4 Oct 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.9 Oct 25, 2023
@dylandreimerink dylandreimerink merged commit 2cfc825 into cilium:main Oct 25, 2023
61 of 62 checks passed
@pippolo84 pippolo84 mentioned this pull request Oct 30, 2023
9 tasks
@pippolo84 pippolo84 added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Oct 30, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.4 Oct 30, 2023
@pippolo84 pippolo84 mentioned this pull request Oct 30, 2023
6 tasks
@learnitall learnitall removed the needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch label Oct 30, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed this from Needs backport from main in 1.13.9 Oct 30, 2023
learnitall added a commit to learnitall/cilium that referenced this pull request Nov 3, 2023
The ingressClassManager emits a warning log while handling an Upsert
event on IngressClasses, however, this log should be at the debug level.

This log was set to the warning level while testing cilium#28663 and never
changed back to debug.

Signed-off-by: Ryan Drew <ryan.drew@isovalent.com>
@jibi jibi added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Nov 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.14 to Backport done to v1.14 in 1.14.4 Nov 7, 2023
learnitall added a commit to learnitall/cilium that referenced this pull request Nov 21, 2023
The ingressClassManager emits a warning log while handling an Upsert
event on IngressClasses, however, this log should be at the debug level.

This log was set to the warning level while testing cilium#28663 and never
changed back to debug.

Signed-off-by: Ryan Drew <ryan.drew@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch area/operator Impacts the cilium-operator component area/servicemesh GH issues or PRs regarding servicemesh backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.14.4
Backport done to v1.14
Development

Successfully merging this pull request may close these issues.

None yet

5 participants