daemon: don't go ready until CNI configuration has been written #32168
Merged
squeed added labels on Apr 24, 2024:
kind/bug: This is a bug in the Cilium logic.
area/cni: Impacts the Container Networking Interface between Cilium and the orchestrator.
release-note/bug: This PR fixes an issue in a previous release of Cilium.
If the daemon is configured to write a CNI configuration file, we should not go ready until that CNI configuration file has been written. This prevents a race condition where the controller removes the taint from a node too early, meaning pods may be created with a different CNI provider.

In cilium#29405, Cilium was configured in chaining mode, but the "primary" CNI provider hadn't written its configuration yet. This caused the not-ready taint to be removed from the node too early, and pods were created in a bad state.

By hooking in the CNI cell's status in the daemon's Status type, we prevent the daemon's healthz endpoint from returning a successful response until the CNI cell has been successful.

Fixes: cilium#29405
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
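The gating described in the commit message can be illustrated with a minimal, self-contained Go sketch. This is not the code from the PR: the names `cniStatus`, `MarkWritten`, `healthzHandler`, and `probe` are hypothetical, and the real daemon reports readiness through its Status type rather than a bare HTTP handler. The sketch only shows the idea that the health endpoint stays unsuccessful until the CNI configuration file has been written.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// errCNINotWritten is a hypothetical error used only in this sketch.
var errCNINotWritten = errors.New("CNI configuration file not yet written")

// cniStatus is a hypothetical stand-in for the status reported by the CNI cell.
type cniStatus struct {
	mu      sync.Mutex
	enabled bool // daemon is configured to write a CNI configuration file
	written bool // the file has been written successfully
}

// MarkWritten records that the CNI configuration file has been written.
func (c *cniStatus) MarkWritten() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.written = true
}

// Err returns nil when the CNI side no longer blocks readiness.
func (c *cniStatus) Err() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.enabled && !c.written {
		return errCNINotWritten
	}
	return nil
}

// healthzHandler answers 503 until the CNI status reports success, then 200.
func healthzHandler(cni *cniStatus) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := cni.Err(); err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})
}

// probe issues an in-memory GET /healthz and returns the HTTP status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	cni := &cniStatus{enabled: true}
	h := healthzHandler(cni)
	fmt.Println("before write:", probe(h)) // 503: node should keep its not-ready taint
	cni.MarkWritten()
	fmt.Println("after write:", probe(h)) // 200: safe to report ready
}
```

When the daemon is not configured to write a CNI configuration file (`enabled: false` in this sketch), the check never blocks readiness, which matches the "if the daemon is configured to write" condition in the description.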
squeed force-pushed the cni-chained-readiness branch from f03e628 to ec776ff on April 24, 2024 15:08.
squeed added a label on Apr 24, 2024:
needs-backport/1.15: This PR / issue needs backporting to the v1.15 branch
(removed some API changes I realized I didn't want)
nebril approved these changes on Apr 24, 2024.
Nice!
/test
maintainer-s-little-helper (bot) added a label on Apr 25, 2024:
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
Awesome! Will this be backported to 1.14 too?
1.15 for sure, I’m not sure about 1.14 yet
squeed added a label on Apr 26, 2024:
needs-backport/1.14: This PR / issue needs backporting to the v1.14 branch
Looks like this applies mostly cleanly to v1.14 too; marked as such.
gandro changed labels on Apr 29, 2024:
added backport-pending/1.15: The backport for Cilium 1.15.x for this PR is in progress.
removed needs-backport/1.15: This PR / issue needs backporting to the v1.15 branch
gandro changed labels on Apr 30, 2024:
added backport-pending/1.14: The backport for Cilium 1.14.x for this PR is in progress.
removed needs-backport/1.14: This PR / issue needs backporting to the v1.14 branch
github-actions (bot) changed labels on May 2, 2024:
added backport-done/1.14: The backport for Cilium 1.14.x for this PR is done.
added backport-done/1.15: The backport for Cilium 1.15.x for this PR is done.
removed backport-pending/1.14: The backport for Cilium 1.14.x for this PR is in progress.
removed backport-pending/1.15: The backport for Cilium 1.15.x for this PR is in progress.
This was referenced May 10, 2024
Labels
area/cni: Impacts the Container Networking Interface between Cilium and the orchestrator.
backport-done/1.14: The backport for Cilium 1.14.x for this PR is done.
backport-done/1.15: The backport for Cilium 1.15.x for this PR is done.
kind/bug: This is a bug in the Cilium logic.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/bug: This PR fixes an issue in a previous release of Cilium.