Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove deadlock from AuthMap Endpoint GC #29082

Merged
merged 1 commit into from
Dec 6, 2023

Conversation

meyskens
Copy link
Member

@meyskens meyskens commented Nov 9, 2023

This changes that a create and delete event from the EndpointManager will also give a "state of the world" update of all current endpoints it has.
This allows the subscribers to check against other endpoints without calling a function that issues a lock when the event was send inside an RLock.

This is then used in the AuthMap GC code instead of doing a call that caused a deadlock.

Fixes: #29078

Fix potential deadlock that results in stale authentication entries in Cilium

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 9, 2023
@meyskens meyskens added kind/bug This is a bug in the Cilium logic. area/servicemesh GH issues or PRs regarding servicemesh feature/authentication labels Nov 9, 2023
Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of suggestions; but yep I believe this fixes the deadlock 👍

pkg/endpointmanager/subscribe.go Outdated Show resolved Hide resolved
pkg/endpointmanager/manager.go Outdated Show resolved Hide resolved
pkg/endpointmanager/subscribe.go Show resolved Hide resolved
@christarazi
Copy link
Member

For the release note, typically we'd want to describe the impact of the bug on the user, rather than a description of what the bug is. For example, we'd want to say something like "Fix potential deadlock that results in stale authentication entries in Cilium".

@meyskens meyskens force-pushed the meyskens/auth-ep-gc-locks branch 3 times, most recently from b44c74a to 55a2d96 Compare November 22, 2023 12:28
@meyskens meyskens added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Nov 22, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 22, 2023
@meyskens meyskens force-pushed the meyskens/auth-ep-gc-locks branch 2 times, most recently from 1837248 to 1e26bbf Compare November 22, 2023 13:50
@meyskens
Copy link
Member Author

/test

@meyskens meyskens marked this pull request as ready for review November 22, 2023 16:34
@meyskens meyskens requested review from a team as code owners November 22, 2023 16:34
@meyskens
Copy link
Member Author

/test

Copy link
Member

@mhofstetter mhofstetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Glad we ended up no longer having a dependency on the endpointrepo from the authmap cache at all. 🙏

Only some non-blocking suggestions that i like to discuss.

pkg/auth/authmap_gc.go Outdated Show resolved Hide resolved
pkg/auth/authmap_gc.go Show resolved Hide resolved
pkg/auth/authmap_gc.go Show resolved Hide resolved
Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also solves the deadlock as far as I can tell 👍

pkg/auth/authmap_gc.go Show resolved Hide resolved
pkg/auth/authmap_gc.go Outdated Show resolved Hide resolved
This change keeps a copy of the endpoints map inside the Autentication
GC code. It will be updated on subscribe events,

This is then used in the AuthMap GC code instead of doing a call that
caused a deadlock.

Signed-off-by: Maartje Eyskens <maartje.eyskens@isovalent.com>
Copy link
Member

@mhofstetter mhofstetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go from my side! Thanks @meyskens

@mhofstetter
Copy link
Member

/test

@@ -10,11 +10,15 @@ type Subscriber interface {
// EndpointCreated is called at the end of endpoint creation.
// Implementations must not attempt write operations on the
// EndpointManager from this callback.
// This function is being called inside a RLock, so it must not attempt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just an fyi: We typically follow the convention - EndpointCreatedLocked.

@brb
Copy link
Member

brb commented Dec 6, 2023

@aanm Could we merge this PR? I've just tried running it with the latest main - #29658. The failing ci-l4lb is resolved.

@aanm aanm merged commit 9e76d20 into cilium:main Dec 6, 2023
60 of 61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/servicemesh GH issues or PRs regarding servicemesh feature/authentication kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EndpointManager self-deadlocks via AuthMap GC
7 participants