k8s: Defer marking node as ready to just API is served #10767

Merged
merged 1 commit into from Mar 31, 2020

Conversation

tgraf
Member

@tgraf tgraf commented Mar 30, 2020

The Kubernetes node was marked as ready as soon as the daemon finished
initializing. Several operations that run after that point could still
fail, so a node could be marked ready and the Cilium agent could then
error out, leaving a node that is ready but failing.

Move the marking of node readiness to the very end of
bootstrapping.

Fixes: #10762

Signed-off-by: Thomas Graf <thomas@cilium.io>
@tgraf tgraf added pending-review release-note/bug This PR fixes an issue in a previous release of Cilium. labels Mar 30, 2020
@tgraf tgraf requested review from a team March 30, 2020 21:40
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.8.0 Mar 30, 2020
@coveralls

Coverage Status

Coverage decreased (-0.03%) to 45.491% when pulling b3a28c1 on pr/tgraf/move-node-marking into d933cfe on master.

@tgraf
Member Author

tgraf commented Mar 30, 2020

test-me-please

All tests passed except for known flake:

  • Suite-k8s-1.18.K8sPolicyTest Basic Test Denies traffic with k8s default-deny ingress policy

@tgraf tgraf requested a review from aanm March 31, 2020 09:10
@tgraf tgraf added kind/bug This is a bug in the Cilium logic. needs-backport/1.6 labels Mar 31, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.7.2 Mar 31, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.6.9 Mar 31, 2020
@pchaigno
Member

pchaigno commented Mar 31, 2020

test-me-please

CI was hitting #10760 (specifically Suite-k8s-1.18.K8sPolicyTest Basic Test Denies traffic with k8s default-deny ingress policy).

@aanm aanm merged commit 948659f into master Mar 31, 2020
1.8.0 automation moved this from In progress to Merged Mar 31, 2020
@aanm aanm deleted the pr/tgraf/move-node-marking branch March 31, 2020 14:37
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.7 in 1.7.2 Apr 1, 2020
@joestringer joestringer moved this from Backport pending to v1.7 to Backport done to v1.7 in 1.7.2 Apr 7, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.6 in 1.6.9 Apr 30, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.6 to Backport done to v1.6 in 1.6.9 May 13, 2020
Labels
kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.6.9
Backport done to v1.6
1.7.2
Backport done to v1.7
1.8.0
  
Merged
Development

Successfully merging this pull request may close these issues.

NetworkUnavailable is set to false too early
6 participants