Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential deadlock in pod identity updates #16801

Merged

Conversation

jrajahalme
Copy link
Member

@jrajahalme jrajahalme commented Jul 6, 2021

Make IdentitySelectionUpdated() callbacks lock-free by queueing them
while still holding selectorcache lock (to keep FIFO order) and
calling from a goroutine not holding any locks. This prevents
deadlocks caused by the implementation of IdentitySelectionUpdated()
taking locks such as endpoint or selectorcache locks.

Potential deadlock in pod identity updates has been fixed.

Fixes: #16236
Fixes: #16300

@jrajahalme jrajahalme added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.8 labels Jul 6, 2021
@jrajahalme jrajahalme requested a review from a team July 6, 2021 04:32
@jrajahalme jrajahalme requested a review from a team as a code owner July 6, 2021 04:32
@jrajahalme jrajahalme requested a review from a team July 6, 2021 04:32
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.10.3 Jul 6, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.9.9 Jul 6, 2021
@jrajahalme jrajahalme requested review from jibi and aditighag July 6, 2021 04:32
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.8.11 Jul 6, 2021
@jrajahalme jrajahalme marked this pull request as draft July 6, 2021 04:32
@jrajahalme jrajahalme requested a review from kkourt July 6, 2021 04:32
@jrajahalme jrajahalme force-pushed the pr/jrajahalme/policy-selectorcache-deadlock branch from 8faac41 to 9b5af47 Compare July 6, 2021 05:04
@jrajahalme
Copy link
Member Author

test-me-please

@aanm
Copy link
Member

aanm commented Jul 6, 2021

the unit tests have failed:

 	 # github.com/cilium/cilium/pkg/fqdn/dnsproxy [github.com/cilium/cilium/pkg/fqdn/dnsproxy.test]
	 pkg/fqdn/dnsproxy/proxy_test.go:184:36: not enough arguments in call to testSelectorCache.UpdateIdentities
	 	have ("github.com/cilium/cilium/pkg/identity/cache".IdentityCache, nil)
	 	want ("github.com/cilium/cilium/pkg/identity/cache".IdentityCache, "github.com/cilium/cilium/pkg/identity/cache".IdentityCache, *sync.WaitGroup)
	 make: *** [tests-privileged] Error 1

@jrajahalme
Copy link
Member Author

test-me-please

@jrajahalme
Copy link
Member Author

the unit tests have failed:

 	 # github.com/cilium/cilium/pkg/fqdn/dnsproxy [github.com/cilium/cilium/pkg/fqdn/dnsproxy.test]
	 pkg/fqdn/dnsproxy/proxy_test.go:184:36: not enough arguments in call to testSelectorCache.UpdateIdentities
	 	have ("github.com/cilium/cilium/pkg/identity/cache".IdentityCache, nil)
	 	want ("github.com/cilium/cilium/pkg/identity/cache".IdentityCache, "github.com/cilium/cilium/pkg/identity/cache".IdentityCache, *sync.WaitGroup)
	 make: *** [tests-privileged] Error 1

Fixed, thanks!

@jrajahalme jrajahalme force-pushed the pr/jrajahalme/policy-selectorcache-deadlock branch from b9354d3 to c45d00f Compare July 9, 2021 05:46
@jrajahalme
Copy link
Member Author

test-me-please

@jrajahalme
Copy link
Member Author

Test flake #14598 hit in test-1.21-4.9, seems unrelated (deathstar.default.svc.cluster.local not resolved by kube-dns even though the service exists and has 3 backends).

@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport done to v1.10 in 1.10.3 Jul 15, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.9 in 1.9.9 Jul 15, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.9 to Backport done to v1.9 in 1.9.9 Jul 16, 2021
@joestringer joestringer added this to Needs backport from master in 1.8.12 Jul 22, 2021
@joestringer joestringer removed this from Needs backport from master in 1.8.11 Jul 22, 2021
@joestringer joestringer moved this from Needs backport from master to Backport done to v1.8 in 1.8.12 Sep 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
1.10.3
Backport done to v1.10
1.8.12
Backport done to v1.8
1.9.9
Backport done to v1.9
6 participants