Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

endpoint: trigger k8s sync controller on identity update #16381

Merged
merged 2 commits into from Jun 2, 2021

Conversation

jibi
Copy link
Member

@jibi jibi commented Jun 1, 2021

When an endpoint's identity is updated, Cilium does not sync immediately
the new state with k8s, but rather waits up to 10 seconds for the
sync-to-k8s-ciliumendpoint controller to run, meaning that the the new
identity can remain unannounced for up to 10 seconds.

This commit fixes this by explicitly triggering the k8s sync controller
whenever an endpoint's identity is updated.

Fixes #15097

@jibi jibi added sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. release-note/bug This PR fixes an issue in a previous release of Cilium. labels Jun 1, 2021
@jibi jibi requested review from gandro, aanm and a team June 1, 2021 13:43
@jibi jibi requested a review from a team as a code owner June 1, 2021 13:43
@jibi jibi requested a review from a team June 1, 2021 13:43
@maintainer-s-little-helper maintainer-s-little-helper bot assigned aanm and gandro and unassigned aanm Jun 1, 2021
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton for taking care of this! Looks good overall. I have some minor nits (none of which need to be blocking)

@@ -135,7 +136,7 @@ func (m *Manager) removeController(ctrl *Controller) {
ctrl.getLogger().Debug("Removed controller")
}

func (m *Manager) lookup(name string) *Controller {
func (m *Manager) Lookup(name string) *Controller {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No action required, just a thought:

One thing that I just noticed is that this export here is the first export that gives access to a Controller struct outside of the package. All other methods of the Manager do not allow direct access to the Controller. While it seems safe (i.e. it seems that m.mutex is not protecting it), I wonder if it might be cleaner and more future-proof to expose a TriggerController(name string) method instead (similar to TerminationChannel), instead of allowing a a direct lookup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense, that's where (e *Endpoint) TriggerController(name string) should actually live 👍

pkg/endpoint/endpoint.go Outdated Show resolved Hide resolved
pkg/endpoint/endpoint.go Outdated Show resolved Hide resolved
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
@jibi jibi force-pushed the pr/jibi/ipcache-propagation-delay branch from a3e881a to 26d3fca Compare June 1, 2021 14:25
@jibi
Copy link
Member Author

jibi commented Jun 1, 2021

test-me-please

@aanm
Copy link
Member

aanm commented Jun 1, 2021

@jibi can you add the GH issue ID this fixes?

@jibi
Copy link
Member Author

jibi commented Jun 1, 2021

@jibi can you add the GH issue ID this fixes?

@aanm it's in the second commit (I only linked it to #15097 as given the latest comment I don't think this fully fixes also #16302)

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I have one suggestion below, but LMK what you think on it.

pkg/endpoint/endpoint.go Outdated Show resolved Hide resolved
@aanm
Copy link
Member

aanm commented Jun 1, 2021

@jibi can you add the GH issue ID this fixes?

@aanm it's in the second commit (I only linked it to #15097 as given the latest comment I don't think this fully fixes also #16302)

@jibi thank you, I meant in the PR description so that GH will close them automatically as soon this PR is merged.

When an endpoint's identity is updated, Cilium does not sync immediately
the new state with k8s, but rather waits up to 10 seconds for the
sync-to-k8s-ciliumendpoint controller to run, meaning that the the new
identity can remain unannounced for up to 10 seconds.

This commit fixes this by explicitly triggering the k8s sync controller
whenever an endpoint's identity is updated.

Fixes: #15097
Suggested-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
@jibi jibi force-pushed the pr/jibi/ipcache-propagation-delay branch from 26d3fca to 35b2ed8 Compare June 2, 2021 06:34
@jibi
Copy link
Member Author

jibi commented Jun 2, 2021

@jibi can you add the GH issue ID this fixes?

@aanm it's in the second commit (I only linked it to #15097 as given the latest comment I don't think this fully fixes also #16302)

@jibi thank you, I meant in the PR description so that GH will close them automatically as soon this PR is merged.

My bad, I thought GH would look for that also in commit messages 🤦‍♂️

@jibi
Copy link
Member Author

jibi commented Jun 2, 2021

test-me-please

@jibi
Copy link
Member Author

jibi commented Jun 2, 2021

retest-5.4

@nbusseneau
Copy link
Member

Added for backporting to 1.10, otherwise we'll continue having these flakes in 1.10 automated jobs ad vitam æternam.

This was referenced Jun 16, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.9 Jul 7, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.9 Jul 14, 2021
@joestringer joestringer added this to Backport pending to v1.9 in 1.9.10 Jul 19, 2021
@joestringer joestringer removed this from Backport pending to v1.9 in 1.9.9 Jul 19, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Backport done to v1.9 in 1.9.9 Jul 19, 2021
@joestringer joestringer removed this from Backport pending to v1.9 in 1.9.10 Jul 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.
Projects
No open projects
1.9.9
Backport done to v1.9
Development

Successfully merging this pull request may close these issues.

CI: AutoDirectNodeRoutes Check connectivity with automatic direct nodes routes
8 participants