Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.12] Backport of #26242 (don't hold EP lock while generating policy) #29408

Merged
merged 6 commits into from
Nov 29, 2023

Conversation

bimmlerd
Copy link
Member

@bimmlerd bimmlerd commented Nov 27, 2023

Once this PR is merged, a GitHub action will update the labels of these PRs:

 26242

Rationale

In the following, the notation SelectorCache > NameManager means that the selector cache lock is held while the NameManager lock is being acquired. When a lock is held when another is acquired, it creates an edge in the graph of lock dependencies. If there is a cycle in lock dependencies, a wait cycle (or deadlock) is possible.

In this case, the wait cycle in question is the following:

IPCache > EP > NameManager > IPCache

The IPCache > EP link stems from the following stack. Note that EP here stands for any endpoint, as they are all locked in sequence. I believe this is related to replacing the kube API server.

github.com/cilium/cilium/pkg/endpointmanager.(*EndpointManager).UpdatePolicyMaps() <-- takes _all_ EP locks
github.com/cilium/cilium/pkg/ipcache.(*IPCache).UpdatePolicyMaps() <-- t
github.com/cilium/cilium/pkg/ipcache.(*IPCache).removeLabelsFromIPs(...) <-- takes IPCache lock
github.com/cilium/cilium/pkg/ipcache.(*IPCache).RemoveLabelsExcluded(...)
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).handleKubeAPIServerServiceEPChanges(...)
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).addKubeAPIServerServiceEPSliceV1(...)
github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).updateK8sEndpointSliceV1(...)

The next link in the chain, EP > NameManager, occurs in the path of EP regeneration when dealing with FQDN selectors. Specifically, the SelectorCache wants to lock NameManager early (due to lock ordering constraints, cf 0a07c9b).

github.com/cilium/cilium/pkg/fqdn.(*NameManager).Lock() <--- NameManager lock
github.com/cilium/cilium/pkg/policy.(*SelectorCache).AddFQDNSelector(...)
github.com/cilium/cilium/pkg/policy.(*L4Filter).cacheFQDNSelector(...)
github.com/cilium/cilium/pkg/policy.(*L4Filter).cacheFQDNSelectors(...)
github.com/cilium/cilium/pkg/policy.createL4Filter(...)
github.com/cilium/cilium/pkg/policy.createL4EgressFilter(...)
github.com/cilium/cilium/pkg/policy.mergeEgressPortProto(...)
github.com/cilium/cilium/pkg/policy.mergeEgress.func1(...)
github.com/cilium/cilium/pkg/policy/api.PortRules.Iterate(...)
github.com/cilium/cilium/pkg/policy.mergeEgress(...)
github.com/cilium/cilium/pkg/policy.(*rule).resolveEgressPolicy(...)
github.com/cilium/cilium/pkg/policy.ruleSlice.resolveL4EgressPolicy(...)
github.com/cilium/cilium/pkg/policy.(*Repository).resolvePolicyLocked(...)
github.com/cilium/cilium/pkg/policy.(*PolicyCache).updateSelectorPolicy(...)
github.com/cilium/cilium/pkg/policy.(*PolicyCache).UpdatePolicy(...)
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regeneratePolicy(...)
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps(...) <--- takes EP lock
github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF(...)

Finally, NameManager > IPCache stems from the incremental policy path used by FQDN. UpdateGenerateDNS locks the NameManager for the duration of the method, and calls into IPCache for CIDR allocation.

github.com/cilium/cilium/pkg/ipcache.(*IPCache).AllocateCIDRs(...) <--- rlocks metadata, locks IPCache
github.com/cilium/cilium/pkg/ipcache.(*IPCache).AllocateCIDRsForIPs(...)
github.com/cilium/cilium/daemon/cmd.cachingIdentityAllocator.AllocateCIDRsForIPs(...)
github.com/cilium/cilium/daemon/cmd.identitiesForFQDNSelectorIPs(...)
github.com/cilium/cilium/daemon/cmd.(*Daemon).updateSelectors(...)
github.com/cilium/cilium/pkg/fqdn.(*NameManager).UpdateGenerateDNS(...) <--- locks the NameManager
github.com/cilium/cilium/daemon/cmd.(*Daemon).notifyOnDNSMsg(...)

@bimmlerd bimmlerd added kind/backports This PR provides functionality previously merged into master. backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. labels Nov 27, 2023
@bimmlerd bimmlerd requested a review from squeed November 27, 2023 15:18
@bimmlerd bimmlerd changed the title v1.12 Backports 2023-11-27 [v1.12] Backport of #26242 (don't hold EP lock while generating policy) Nov 27, 2023
[ upstream commit 3ca309d ]

[ backporter's notes:
 1. adjust types.NamedPortMap -> policy.NamedPortMap
 2. EndpointInfoSource is in proxy/logger/epinfo.go instead of proxy/endpoint/endpoint.go
 3. Go1.18 compat: atomic.Pointer -> atomic.Value
]

This function is called deep within the policy generation hierarchy, and
is at a risk of causing deadlocks.

Given that it's just reading a pointer to a never-mutated map, we can
safely stash this behind an atomic Pointer and remove the lock.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
[ upstream commit e20b16d ]

[ backporter's notes:
1. manager_test: we don't yet have ipcache test infra structure, use the
   real thing instead (as other tests do)
2. go1.18 compat: atomic.Pointer -> atomic.Value
]

It turns out that most of the endpoint identities, e.g. pod name /
namespace, are actually immutable. So, there's no need to grab a lock
before reading them.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
[ upstream commit b63115b ]

These methods are no longer used; remove them from the
EndpointInfoSource interface.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
[ upstream commit f048a6a ]

As preparation for other refactors of the policy engine, no longer hold
the endpoint lock while calculating policy. This is safe to do, since
the only input is the endpoint's security identity. Furthermore, if,
somehow, policy were to be calculated in parallel, we can reject an
update if its revision is too old.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
[ upstream commit 9623641 ]

This ensures the generated ID works like IDs allocated normally - that
their LabelArray is set.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
[ upstream commit 8e163a9 ]

This adds a small test that ensures incremental updates are never lost,
even in the face of significant identity churn.

It simulates a churning ipcache flinging identities in to the policy
engine, and similarly recalculates policy constantly.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: David Bimmler <david.bimmler@isovalent.com>
@bimmlerd
Copy link
Member Author

/test-backport-1.12

@bimmlerd
Copy link
Member Author

CI triage:

  • test-1.16-4.9 failed with endpoint 732 SecurityIdentity changed during policy regeneration logged at error severity. This, AFAICT, is now expected to happen and shouldn't fail the test. Rerunning, might look into fixing this upstream.
  • test-1.24-net-next failed in K8sServicesTest Checks N/S loadbalancing Tests with XDP, direct routing, Hybrid and Random with a curl TFTP request failing. The service state of cilium looks as expected, and there is no policy drop of the flow in the hubble logs. I'm chalking this up to kubectl exec 'curl' failing, and will rerun.

@bimmlerd
Copy link
Member Author

/test-1.24-net-next

@bimmlerd
Copy link
Member Author

/test-1.16-4.9

@squeed
Copy link
Contributor

squeed commented Nov 28, 2023

I recall seeing the error message SecurityIdentity changed during policy regeneration during development of this PR, and I did something to prevent it from happening. It is escaping me, however, what exactly the fix was.

@bimmlerd bimmlerd marked this pull request as ready for review November 29, 2023 09:35
@bimmlerd bimmlerd requested a review from a team as a code owner November 29, 2023 09:35
@bimmlerd bimmlerd added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. and removed release-note/bug This PR fixes an issue in a previous release of Cilium. labels Nov 29, 2023
@squeed
Copy link
Contributor

squeed commented Nov 29, 2023

I did a lot of debugging, and I determined that the error message popped up because the endpoint in question (actually the local node's endpoint) changed labels precisely during regeneration. Bad luck. I believe it is safe to ignore this error message in the tests.

@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Nov 29, 2023
@youngnick youngnick merged commit a40e6fc into v1.12 Nov 29, 2023
106 checks passed
@youngnick youngnick deleted the pr/v1.12-backport-2023-11-27 branch November 29, 2023 23:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.12 This PR represents a backport for Cilium 1.12.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants