Fix for service backend entry leak in cilium_lb4_backends_v2 #23749
Conversation
/test
@hemanthmalla Nice catch!
I am struggling a bit to understand how this can lead to duplicate backend entries. Let's assume that a backend exists within another service, while a given service doesn't have it. The
@brb That's correct. I missed accounting for this behavior in
@brb Spent some more time looking into this. The race is between removal of a backend in terminating state and deletion of the entire service.
Lines 801 to 807 in 85615c1
We start seeing issues when this service with a terminating backend gets deleted:
Now the ref count for the orphan backend's hash is 0 and the backend ID is released. If this backend is now added to any service, we'll see a duplicate entry.
And we have a duplicate backend entry.
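To make the failure mode above concrete, here is a minimal, self-contained Go sketch (hypothetical types and helper names, not Cilium's actual code) of the ordering being described: the ref count reaches zero and the backend ID is treated as released even though the bpf map entry may survive, so re-adding the same backend later allocates a fresh ID and leaves two map entries for one backend.

```go
package main

import "fmt"

// Simplified stand-ins for the real bookkeeping: a per-hash reference count,
// a userspace backends-by-hash map, and a map playing the role of
// cilium_lb4_backends_v2.
type backend struct {
	ID   int
	Hash string
}

type state struct {
	refCount map[string]int
	byHash   map[string]backend
	bpfMap   map[int]string
	nextID   int
}

func (s *state) addBackend(hash string) backend {
	if b, ok := s.byHash[hash]; ok { // known backend: just take a reference
		s.refCount[hash]++
		return b
	}
	s.nextID++ // unknown backend: allocate a new ID
	b := backend{ID: s.nextID, Hash: hash}
	s.refCount[hash] = 1
	s.byHash[hash] = b
	s.bpfMap[b.ID] = hash // datapath update
	return b
}

// removeBackend mirrors the problematic ordering: the userspace entry is
// dropped and the ID considered free even if the bpf map delete fails.
func (s *state) removeBackend(hash string, bpfDeleteFails bool) {
	s.refCount[hash]--
	if s.refCount[hash] > 0 {
		return
	}
	b := s.byHash[hash]
	delete(s.byHash, hash) // backend ID is now treated as released
	if !bpfDeleteFails {
		delete(s.bpfMap, b.ID)
	}
}

func main() {
	s := &state{refCount: map[string]int{}, byHash: map[string]backend{}, bpfMap: map[int]string{}}
	s.addBackend("10.0.0.1:80")          // terminating backend still referenced
	s.removeBackend("10.0.0.1:80", true) // service deleted, bpf delete fails: entry leaks
	s.addBackend("10.0.0.1:80")          // backend comes back under a new ID
	fmt.Println(s.bpfMap)                // two datapath entries for the same backend
}
```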
@hemanthmalla Good catch! You might be onto something here! Thanks for the debugging notes. There is a difference in the logic around how we treat terminating backends in v1.11 (terminating backends are removed from the BPF maps) vs. v1.12 (terminating backends are moved to the end of a service entry in the BPF maps). What version are you running (based on the linked code above, it looks like v1.11)? Was there a cilium upgrade/restart involved?
I was also able to reproduce this with the following:
After creation:
While one of the pods is in terminating state (513 goes away)
All 4 backends leaked after service delete.
@aditighag we're observing this in
@hemanthmalla Thanks! I have a fix + unit test in progress for v1.11 changes. I'll need to take a closer look at >= v1.12 to see if the problem exists.
@hemanthmalla Can you give this fix a shot - #23858?
@aditighag thank you for the fix and unit test. I think we need the commit from this PR as well to avoid duplicates from other failure reasons in the future. I need to understand why the integration tests are failing here, though. Do you think any of these could be flakes?
I'm not sure I follow: As Martynas pointed out above, the existing code (
Thanks for testing the fix. The PR explains why we could have duplicate backend entries in some cases.
The Travis failure is related. As for the Jenkins failures, here are the documented steps - https://docs.cilium.io/en/v1.13/contributing/testing/ci/. Anyway, you may not need this PR?
@aditighag in a scenario where a backend belongs only to one service, when the k8s service is deleted and cleanup fails halfway through,
During the next
So actually we have two problems: backend entry leaks caused by service lbmap failures, and a failure to reuse existing backends. Am I right?
After ... The issue with your changes is that if a backend is common across multiple services, you'll skip adding the backend to subsequent services. And when the first service is deleted, you would end up deleting the backend.
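As an illustration of the invariant being discussed, here is a small Go sketch (the counter type and its semantics are simplified assumptions, loosely modeled on backendRefCount.Add) showing why every service that shares a backend must take its own reference: otherwise deleting the first service tears the backend away from the second.

```go
package main

import "fmt"

// refCount is a simplified per-hash counter; Add reports whether this is the
// first reference (i.e. the backend still needs to be allocated), Delete
// reports whether the last reference is gone (i.e. it may now be removed).
type refCount map[string]int

func (r refCount) Add(hash string) bool {
	r[hash]++
	return r[hash] == 1
}

func (r refCount) Delete(hash string) bool {
	r[hash]--
	return r[hash] == 0
}

func main() {
	rc := refCount{}
	hash := "10.0.0.1:80"

	fmt.Println(rc.Add(hash)) // true: service A sees the backend first, allocates it
	fmt.Println(rc.Add(hash)) // false: service B reuses it, but must still take a reference

	// Had service B skipped Add entirely, the next Delete (service A removed)
	// would already report "last reference gone" and the shared backend would
	// be deleted while service B still points at it.
	fmt.Println(rc.Delete(hash)) // false: service B still holds a reference
	fmt.Println(rc.Delete(hash)) // true: only now is it safe to remove the backend
}
```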
@aditighag I've been OOO for the last week. If not sooner, I'll check and get back on the 8th. Sorry for the delay.
@hemanthmalla Any update on the PR? We can turn it into a draft in the meantime.
@aditighag apologies for the delay.
Thanks for catching this. What if we moved adding the backend to
I intended this PR only to avoid the leak in the first place. If we also plan on handling leaked / duplicate entries, can we do that in a different PR?
Force-pushed from f221f4a to 2286073.
The leak was fixed in this PR. We have a PR to not skip backend restore on failures, and Jared also backported it. So I'm not sure what this PR is trying to address?
Primarily I wanted to address the issue that if we don't look up backends in the global in-memory map, there's a possibility of hitting the same bug again in the future, especially because a ref count of zero for a backend doesn't mean that the entry was deleted from the bpf map.
I realize the current logic is flawed as well since the ref. count doesn't get updated. Pushing an update again to fix that.
Follow up for cilium#23858 . During backend deletion, reference count decrement and backend ID release operations could go through, but deletion of entries from bpf map might fail. This commit adds additional defenses to avoid potential leaks or duplicates from similar scenarios in the future. Also see cilium#23749 (comment) for more details on past race conditions. Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com>
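A rough sketch of the defensive lookup this commit describes, with made-up types (manager, acquireBackend) standing in for the service manager: before allocating a fresh ID for a backend unknown to the current service, consult the global by-hash map and reuse the existing entry if one is found. It only illustrates the idea behind the change, not the actual diff.

```go
package main

import "fmt"

type Backend struct {
	ID   int
	Hash string
}

// manager is a hypothetical stand-in for the service manager; backendByHash
// is the global (all services) view of known backends.
type manager struct {
	backendByHash map[string]*Backend
	nextID        int
}

// acquireBackend reuses a globally known backend when possible and only
// allocates a new ID when the hash is unknown everywhere.
func (m *manager) acquireBackend(svcBackends map[string]*Backend, hash string) *Backend {
	if b, found := svcBackends[hash]; found {
		return b // this service already references the backend
	}
	if b, existsGlobally := m.backendByHash[hash]; existsGlobally {
		svcBackends[hash] = b // reuse the existing ID instead of minting a duplicate
		return b
	}
	m.nextID++
	b := &Backend{ID: m.nextID, Hash: hash}
	m.backendByHash[hash] = b
	svcBackends[hash] = b
	return b
}

func main() {
	m := &manager{backendByHash: map[string]*Backend{}}
	svcA := map[string]*Backend{}
	svcB := map[string]*Backend{}

	b1 := m.acquireBackend(svcA, "10.0.0.1:80")
	b2 := m.acquireBackend(svcB, "10.0.0.1:80") // found globally, same ID reused
	fmt.Println(b1.ID == b2.ID)                 // true: no duplicate entry
}
```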
Force-pushed from 2286073 to d1f6da3.
if b, found := svc.backendByHash[hash]; !found {
	if s.backendRefCount.Add(hash) {
		globalBackend, existsGlobally := s.backendByHash[hash]
Umm... When there is a leak, datapath state goes out of sync with userspace. Will this work when a backend is associated with only one service?
My understanding was that s.backendByHash would still have the entry because it's deleted only after the entry is removed from the lbmap. Since the backend ID might be released, I'm attempting to restore it if an entry exists in s.backendByHash.
Lines 1391 to 1399 in 9629343
if s.backendRefCount[hash] == 0 {
	log.WithField(logfields.BackendID, b.ID).
		Debug("Removing orphan backend")
	// The b.ID is unique across IPv4/6, hence attempt
	// to clean it from both maps, and ignore errors.
	DeleteBackendID(b.ID)
	s.lbmap.DeleteBackendByID(b.ID)
	delete(s.backendByHash, hash)
}
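For illustration only, here is a hedged Go sketch (hypothetical lbmap interface and helper, not the code quoted above) of the ordering the discussion hinges on: perform the datapath delete first and drop the in-memory bookkeeping only once it succeeds, so a failed map delete keeps the hash-to-ID mapping around for retry or reuse.

```go
package main

import (
	"errors"
	"fmt"
)

// lbmap is a stand-in for the datapath map wrapper; only the delete matters here.
type lbmap interface {
	DeleteBackendByID(id int) error
}

type fakeLBMap struct{ fail bool }

func (f fakeLBMap) DeleteBackendByID(id int) error {
	if f.fail {
		return errors.New("bpf map delete failed")
	}
	return nil
}

// deleteOrphanBackend performs the datapath delete first; the in-memory entry
// (and with it the backend ID) is released only after the delete succeeds, so
// the hash can still be resolved to the existing ID on the next update.
func deleteOrphanBackend(m lbmap, backendByHash map[string]int, hash string) error {
	id, ok := backendByHash[hash]
	if !ok {
		return nil
	}
	if err := m.DeleteBackendByID(id); err != nil {
		return err // keep the mapping so a retry or re-add reuses the same ID
	}
	delete(backendByHash, hash) // only now drop the userspace bookkeeping
	return nil
}

func main() {
	byHash := map[string]int{"10.0.0.1:80": 42}
	if err := deleteOrphanBackend(fakeLBMap{fail: true}, byHash, "10.0.0.1:80"); err != nil {
		fmt.Println("delete failed, mapping kept:", byHash)
	}
}
```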
I completely agree with the motivation to prevent such bugs from happening in the future. The fix looks fine based on the current logic, but we are relying on the ordering of the backend ID release and the deletion from the lbmap. Could you add unit tests?
Sorry for the delay! Thanks for the revised fix.
This pull request has been automatically marked as stale because it
This pull request has not seen any activity since it was marked stale.
Backends are meant to be reused across services. When service backends are updated, the existence of every backend is looked up by its hash. Currently, the check is made only against the current service's backends but not against global backends.
This can result in duplicate backend entries. Without this commit, whether a duplicate is created or not depends on a race between releasing the old backend ID during deletion and updating the ref. count during creation. Currently, on agent startup, when the service restoration process finds a duplicate entry, restoration stops and the rest of the entries are leaked. Over time, with enough restarts, we could max out the backend bpf map.
This commit makes the lookup global, and entries in the backendByHash map are deleted only after the data is deleted from the lbmap. This should prevent duplicate entries in the backend map.
Fixes #23551
Signed-off-by: Hemanth Malla hemanth.malla@datadoghq.com
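Related to the restore behaviour described in the commit message above (restoration stopping at the first duplicate and leaking the rest), here is a minimal Go sketch with made-up types showing the difference between aborting the whole loop and skipping just the offending entry; it is an illustration of the idea, not the restore logic from the linked PR.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is a made-up restored-backend record; only the hash matters here.
type entry struct {
	ID   int
	Hash string
}

// restoreBackends skips a bad entry instead of aborting the whole loop, so a
// single duplicate no longer causes every later entry to be leaked.
func restoreBackends(entries []entry) (restored []entry, errs []error) {
	seen := map[string]bool{}
	for _, e := range entries {
		if seen[e.Hash] {
			errs = append(errs, errors.New("duplicate backend: "+e.Hash))
			continue // aborting here would leak every remaining entry
		}
		seen[e.Hash] = true
		restored = append(restored, e)
	}
	return restored, errs
}

func main() {
	entries := []entry{{1, "10.0.0.1:80"}, {2, "10.0.0.1:80"}, {3, "10.0.0.2:80"}}
	restored, errs := restoreBackends(entries)
	fmt.Println(len(restored), len(errs)) // 2 1: the entry after the duplicate is still restored
}
```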