[v1.12] Backport of #26242 (don't hold EP lock while generating policy) #29408

bimmlerd · 2023-11-27T15:18:39Z

endpoint: don't hold the endpoint lock while generating policy #26242 (@squeed)

Once this PR is merged, a GitHub action will update the labels of these PRs:

Rationale

In the following, the notation SelectorCache > NameManager means that the selector cache lock is held while the NameManager lock is being acquired. When a lock is held when another is acquired, it creates an edge in the graph of lock dependencies. If there is a cycle in lock dependencies, a wait cycle (or deadlock) is possible.

In this case, the wait cycle in question is the following:

IPCache > EP > NameManager > IPCache

The IPCache > EP link stems from the following stack. Note that EP here stands for any endpoint, as they are all locked in sequence. I believe this is related to replacing the kube API server.

github.com/cilium/cilium/pkg/endpointmanager.(*EndpointManager).UpdatePolicyMaps() <-- takes _all_ EP locks
github.com/cilium/cilium/pkg/ipcache.(*IPCache).UpdatePolicyMaps() <-- t
github.com/cilium/cilium/pkg/ipcache.(*IPCache).removeLabelsFromIPs(...) <-- takes IPCache lock
github.com/cilium/cilium/pkg/ipcache.(*IPCache).RemoveLabelsExcluded(...)
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).handleKubeAPIServerServiceEPChanges(...)
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).addKubeAPIServerServiceEPSliceV1(...)
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).updateK8sEndpointSliceV1(...)

The next link in the chain, EP > NameManager, occurs in the path of EP regeneration when dealing with FQDN selectors. Specifically, the SelectorCache wants to lock NameManager early (due to lock ordering constraints, cf 0a07c9b).

github.com/cilium/cilium/pkg/fqdn.(*NameManager).Lock() <--- NameManager lock
github.com/cilium/cilium/pkg/policy.(*SelectorCache).AddFQDNSelector(...)
github.com/cilium/cilium/pkg/policy.(*L4Filter).cacheFQDNSelector(...)
github.com/cilium/cilium/pkg/policy.(*L4Filter).cacheFQDNSelectors(...)
github.com/cilium/cilium/pkg/policy.createL4Filter(...)
github.com/cilium/cilium/pkg/policy.createL4EgressFilter(...)
github.com/cilium/cilium/pkg/policy.mergeEgressPortProto(...)
github.com/cilium/cilium/pkg/policy.mergeEgress.func1(...)
github.com/cilium/cilium/pkg/policy/api.PortRules.Iterate(...)
github.com/cilium/cilium/pkg/policy.mergeEgress(...)
github.com/cilium/cilium/pkg/policy.(*rule).resolveEgressPolicy(...)
github.com/cilium/cilium/pkg/policy.ruleSlice.resolveL4EgressPolicy(...)
github.com/cilium/cilium/pkg/policy.(*Repository).resolvePolicyLocked(...)
github.com/cilium/cilium/pkg/policy.(*PolicyCache).updateSelectorPolicy(...)
github.com/cilium/cilium/pkg/policy.(*PolicyCache).UpdatePolicy(...)
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regeneratePolicy(...)
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps(...) <--- takes EP lock
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF(...)

Finally, NameManager > IPCache stems from the incremental policy path used by FQDN. UpdateGenerateDNS locks the NameManager for the duration of the method, and calls into IPCache for CIDR allocation.

github.com/cilium/cilium/pkg/ipcache.(*IPCache).AllocateCIDRs(...) <--- rlocks metadata, locks IPCache
github.com/cilium/cilium/pkg/ipcache.(*IPCache).AllocateCIDRsForIPs(...)
github.com/cilium/cilium/daemon/cmd.cachingIdentityAllocator.AllocateCIDRsForIPs(...)
github.com/cilium/cilium/daemon/cmd.identitiesForFQDNSelectorIPs(...)
github.com/cilium/cilium/daemon/cmd.(*Daemon).updateSelectors(...)
github.com/cilium/cilium/pkg/fqdn.(*NameManager).UpdateGenerateDNS(...) <--- locks the NameManager
github.com/cilium/cilium/daemon/cmd.(*Daemon).notifyOnDNSMsg(...)

[ upstream commit 3ca309d ] [ backporter's notes: 1. adjust types.NamedPortMap -> policy.NamedPortMap 2. EndpointInfoSource is in proxy/logger/epinfo.go instead of proxy/endpoint/endpoint.go 3. Go1.18 compat: atomic.Pointer -> atomic.Value ] This function is called deep within the policy generation hierarchy, and is at a risk of causing deadlocks. Given that it's just reading a pointer to a never-mutated map, we can safely stash this behind an atomic Pointer and remove the lock. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com>

[ upstream commit e20b16d ] [ backporter's notes: 1. manager_test: we don't yet have ipcache test infra structure, use the real thing instead (as other tests do) 2. go1.18 compat: atomic.Pointer -> atomic.Value ] It turns out that most of the endpoint identities, e.g. pod name / namespace, are actually immutable. So, there's no need to grab a lock before reading them. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com>

[ upstream commit b63115b ] These methods are no longer used; remove them from the EndpointInfoSource interface. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com>

[ upstream commit f048a6a ] As preparation for other refactors of the policy engine, no longer hold the endpoint lock while calculating policy. This is safe to do, since the only input is the endpoint's security identity. Furthermore, if, somehow, policy were to be calculated in parallel, we can reject an update if its revision is too old. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com>

[ upstream commit 9623641 ] This ensures the generated ID works like IDs allocated normally - that their LabelArray is set. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com>

[ upstream commit 8e163a9 ] This adds a small test that ensures incremental updates are never lost, even in the face of significant identity churn. It simulates a churning ipcache flinging identities in to the policy engine, and similarly recalculates policy constantly. Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com>

bimmlerd · 2023-11-28T10:36:05Z

/test-backport-1.12

bimmlerd · 2023-11-28T14:15:54Z

CI triage:

test-1.16-4.9 failed with endpoint 732 SecurityIdentity changed during policy regeneration logged at error severity. This, AFAICT, is now expected to happen and shouldn't fail the test. Rerunning, might look into fixing this upstream.
test-1.24-net-next failed in K8sServicesTest Checks N/S loadbalancing Tests with XDP, direct routing, Hybrid and Random with a curl TFTP request failing. The service state of cilium looks as expected, and there is no policy drop of the flow in the hubble logs. I'm chalking this up to kubectl exec 'curl' failing, and will rerun.

bimmlerd · 2023-11-28T14:16:06Z

/test-1.24-net-next

bimmlerd · 2023-11-28T14:16:18Z

/test-1.16-4.9

squeed · 2023-11-28T15:59:45Z

I recall seeing the error message SecurityIdentity changed during policy regeneration during development of this PR, and I did something to prevent it from happening. It is escaping me, however, what exactly the fix was.

squeed · 2023-11-29T12:43:53Z

I did a lot of debugging, and I determined that the error message popped up because the endpoint in question (actually the local node's endpoint) changed labels precisely during regeneration. Bad luck. I believe it is safe to ignore this error message in the tests.

bimmlerd added kind/backports This PR provides functionality previously merged into master. backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. labels Nov 27, 2023

bimmlerd requested a review from squeed November 27, 2023 15:18

bimmlerd changed the title ~~v1.12 Backports 2023-11-27~~ [v1.12] Backport of #26242 (don't hold EP lock while generating policy) Nov 27, 2023

bimmlerd force-pushed the pr/v1.12-backport-2023-11-27 branch from def22ad to 9a3081f Compare November 27, 2023 15:40

squeed added 5 commits November 28, 2023 11:09

bimmlerd force-pushed the pr/v1.12-backport-2023-11-27 branch from 9a3081f to defb483 Compare November 28, 2023 10:13

bimmlerd marked this pull request as ready for review November 29, 2023 09:35

bimmlerd requested a review from a team as a code owner November 29, 2023 09:35

bimmlerd added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. and removed release-note/bug This PR fixes an issue in a previous release of Cilium. labels Nov 29, 2023

squeed approved these changes Nov 29, 2023

View reviewed changes

youngnick approved these changes Nov 29, 2023

View reviewed changes

maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Nov 29, 2023

youngnick merged commit a40e6fc into v1.12 Nov 29, 2023
106 checks passed

youngnick deleted the pr/v1.12-backport-2023-11-27 branch November 29, 2023 23:42

nebril mentioned this pull request Dec 11, 2023

Prepare for release v1.12.17 #29791

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[v1.12] Backport of #26242 (don't hold EP lock while generating policy) #29408

[v1.12] Backport of #26242 (don't hold EP lock while generating policy) #29408

bimmlerd commented Nov 27, 2023 •

edited by squeed

bimmlerd commented Nov 28, 2023

bimmlerd commented Nov 28, 2023

bimmlerd commented Nov 28, 2023

bimmlerd commented Nov 28, 2023

squeed commented Nov 28, 2023

squeed commented Nov 29, 2023

[v1.12] Backport of #26242 (don't hold EP lock while generating policy) #29408

[v1.12] Backport of #26242 (don't hold EP lock while generating policy) #29408

Conversation

bimmlerd commented Nov 27, 2023 • edited by squeed

Rationale

bimmlerd commented Nov 28, 2023

bimmlerd commented Nov 28, 2023

bimmlerd commented Nov 28, 2023

bimmlerd commented Nov 28, 2023

squeed commented Nov 28, 2023

squeed commented Nov 29, 2023

bimmlerd commented Nov 27, 2023 •

edited by squeed