New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional FQDN selector identity tracking fixes #17788
Additional FQDN selector identity tracking fixes #17788
Conversation
0935b6d
to
2a1ae35
Compare
2a1ae35
to
9b6a852
Compare
/test |
9b6a852
to
6f192c1
Compare
/test Job 'Cilium-PR-Runtime-net-next' failed and has not been observed before, so may be related to your PR: Click to show.Test Name
Failure Output
If it is a flake, comment Job 'Cilium-PR-K8s-1.21-kernel-4.19' failed and has not been observed before, so may be related to your PR: Click to show.Test Name
Failure Output
If it is a flake, comment Job 'Cilium-PR-K8s-1.22-kernel-4.9' failed and has not been observed before, so may be related to your PR: Click to show.Test Name
Failure Output
If it is a flake, comment Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR: Click to show.Test Name
Failure Output
If it is a flake, comment |
Looking into failures:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work!
Jarno and I discussed some of the CI failures out-of-band, and came to the conclusion that the first commit changes too much behaviour for the current model of how we handle pod identities: Specifically, it assumes that endpoint add will update the SelectorCache and therefore the endpoint identity change or endpoint delete should also notify the SelectorCache. This is not true, the selectorcache references to global identities is intended to be tied to their allocation/release in the kvstore/apiserver. I will remove the behaviour change on that side from the first commit and split out the changes a bit more + document the intent behind those changes. |
The AllocateIdentity() method accepts a 'notifyOwner' boolean to determine whether to notify the SelectorCache about the change in identities. To balance the APIs into this structure, this commit adds the corresponding notifyOwner flag to the Release function. No other changes are made at this time; all existing cases which notify the owner will continue to notify the owner, and all cases which do not notify the owner will continue to not notify the owner. The next commit will change the behaviour and argue why that is correct. Signed-off-by: Joe Stringer <joe@cilium.io>
Document why the endpoint's AllocateIdentity() function triggers notification into the SelectorCache and yet the Release does not. Signed-off-by: Joe Stringer <joe@cilium.io>
While attempting to address identity garbage collection issues in upcoming commits, I hit the stack trace below. This is not currently possible to hit, because CIDR identity references from the SelectorCache are never released. However, after introducing the proper identity reference count release implementation, we observe that the SelectorCache attempts to release a CIDR identity, which then triggers a call back into the SelectorCache to notify it about the release of the identities. Since the outer calling code must hold the SelectorCache mutex in order to protect the writes against the SelectorCache itself, when the release is executed and attempts to call back into the SelectorCache, UpdateIdentities() attempts to re-acquire the mutex that the goroutine is already holding. This causes deadlocks which then propagate through to other subsystems, for instance locking Endpoints out from applying policy changes and preventing CLI calls for commands such as `cilium policy selectors list` from gathering and returning the internal state of the system. Upon further investigation, the notification of the "owner", ie SelectorCache, is also disbalanced here between the AllocateIdentity() call and the Release() call. The AllocateIdentity() call will only perform the notification when it is called from endpoint (restore or identity update) or clustermesh-apiserver packages. It does not notify the SelectorCache when called from CIDR allocation routines. On the other hand, the Release function inexplicably only notifies the SelectorCache when CIDR identities are released. When this functionality was initially introduced, it seems that we did not recognize and address this disbalance. The CIDR identity allocation routines would not notify the owner (because the callers take on that responsibility), and yet the release _would_ notify the owner. To fix this, this commit disables the owner notification for the Release() function so that it matches the AllocateIdentity() function. This will also prevent the caller from notifying itself again and triggering a deadlock when we apply the fixes in upcoming commits. sync.runtime_SemacquireMutex(0x44, 0x18, 0xc000a9a2d8) /usr/local/go/src/runtime/sema.go:71 +0x25 sync.(*Mutex).lockSlow(0xc000532d90) /usr/local/go/src/sync/mutex.go:138 +0x165 sync.(*Mutex).Lock(...) /usr/local/go/src/sync/mutex.go:81 sync.(*RWMutex).Lock(0x580) /usr/local/go/src/sync/rwmutex.go:111 +0x36 github.com/cilium/cilium/pkg/policy.(*SelectorCache).UpdateIdentities(0xc000532d90, 0x0, 0xc0016540f0, 0x24bc800) /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:950 +0x65 github.com/cilium/cilium/daemon/cmd.(*Daemon).UpdateIdentities(0xc0000c5440, 0xc0016540f0, 0xc001000002) /go/src/github.com/cilium/cilium/daemon/cmd/policy.go:88 +0x5a github.com/cilium/cilium/pkg/identity/cache.(*CachingIdentityAllocator).Release(0xc0007a18c0, {0x2b86420, 0xc0009e7140}, 0xc001c42540) /go/src/github.com/cilium/cilium/pkg/identity/cache/allocator.go:423 +0x324 github.com/cilium/cilium/pkg/ipcache.releaseCIDRIdentities({0x2b86420, 0xc0009e7140}, 0xc0006b08c5) /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:141 +0x296 github.com/cilium/cilium/pkg/ipcache.ReleaseCIDRIdentitiesByID({0x2b86420, 0xc0009e7140}, {0xc001597450, 0x41de7a0, 0x1bf08eb000}) /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:200 +0x4af github.com/cilium/cilium/daemon/cmd.cachingIdentityAllocator.ReleaseCIDRIdentitiesByID({0x2b863e8}, {0x2b86420, 0xc0009e7140}, {0xc001597450, 0x3, 0x10}) /go/src/github.com/cilium/cilium/daemon/cmd/identity.go:117 +0x37 github.com/cilium/cilium/pkg/policy.(*fqdnSelector).releaseIdentityMappings(0xc001c24e10, {0x7f0588917fb8, 0xc0007a18c0}) /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:526 +0x1aa github.com/cilium/cilium/pkg/policy.(*SelectorCache).removeSelectorLocked(0xc000532d90, {0x2ba5660, 0xc001c24e10}, {0x2b33100, 0xc000953880}) /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:901 +0xd7 github.com/cilium/cilium/pkg/policy.(*SelectorCache).RemoveSelectors(0xc000532d90, {0xc000f255c0, 0x2, 0xc0026e65c8}, {0x2b33100, 0xc000953880}) /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:919 +0xb4 github.com/cilium/cilium/pkg/policy.(*L4Filter).removeSelectors(0xc000953880, 0xc0026e6690) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:624 +0x185 github.com/cilium/cilium/pkg/policy.(*L4Filter).detach(0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:631 +0x1e github.com/cilium/cilium/pkg/policy.L4PolicyMap.Detach(0x0, 0xc000a80160) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:801 +0x7e github.com/cilium/cilium/pkg/policy.(*L4Policy).Detach(0xc000635c80, 0x1) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:1011 +0x45 github.com/cilium/cilium/pkg/policy.(*selectorPolicy).Detach(...) /go/src/github.com/cilium/cilium/pkg/policy/resolve.go:107 github.com/cilium/cilium/pkg/policy.(*cachedSelectorPolicy).setPolicy(0xc000532e70, 0xc0009e63c0) /go/src/github.com/cilium/cilium/pkg/policy/distillery.go:188 +0x3b github.com/cilium/cilium/pkg/policy.(*PolicyCache).updateSelectorPolicy(0xc000131410, 0xc0009e63c0) /go/src/github.com/cilium/cilium/pkg/policy/distillery.go:124 +0x195 github.com/cilium/cilium/pkg/policy.(*PolicyCache).UpdatePolicy(...) /go/src/github.com/cilium/cilium/pkg/policy/distillery.go:153 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regeneratePolicy(0xc00078b500) /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:230 +0x22b github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps(0xc00078b500, 0xc000363400) /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:815 +0x2dd github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF(0xc00078b500, 0xc000363400) /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:584 +0x19d github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate(0xc00078b500, 0xc000363400) /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:405 +0x7b3 github.com/cilium/cilium/pkg/endpoint.(*EndpointRegenerationEvent).Handle(0xc000d8e060, 0x40568a) /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:53 +0x32c github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run.func1() /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:245 +0x13b sync.(*Once).doSlow(0x1, 0x43f325) /usr/local/go/src/sync/once.go:68 +0xd2 sync.(*Once).Do(...) /usr/local/go/src/sync/once.go:59 github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run(0x0) /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:233 +0x45 created by github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:229 +0x7b Fixes: 83e576e ("identity: create `CachingIdentityAllocator` type") Signed-off-by: Joe Stringer <joe@cilium.io>
This branch includes the backport: https://github.com/joestringer/cilium/tree/submit/v1.10-identity-fix-backport . I'm waiting on #17861 to get merged before posting a PR for them. |
On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * cilium#17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding ones own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io>
On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * #17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding ones own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io>
[ upstream commit 1b42f7a ] On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * #17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding ones own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 1b42f7a ] On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * #17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding ones own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
[ upstream commit 1b42f7a ] On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * cilium#17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding ones own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 1b42f7a ] On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * #17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding ones own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 1b42f7a ] On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * cilium#17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding ones own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
[ upstream commit 1b42f7a ] On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * #17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding ones own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com>
Review commit-by-commit.
Here's the top bugfix commit message: