-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Identity based batching race condition Issues #17520
Conversation
// If CEP identity is changed, remove the old CEP from cache and insert | ||
// the CEP in a new CEB | ||
if c.cebToIdentity[cebName] != cep.IdentityID { | ||
c.RemoveCepFromCache(GetCepNameFromCCEP(cep)) | ||
} else { | ||
// add a cep into the ceb | ||
c.addCEPtoCEB(cep, cebName) | ||
return cebName | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment does not match what the code should be doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated comment.
// If CEP identity is changed, remove the old CEP from cache and insert | ||
// the CEP in a new CEB | ||
if c.cebToIdentity[cebName] != cep.IdentityID { | ||
c.RemoveCepFromCache(GetCepNameFromCCEP(cep)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we also need to remove the CEP from the actual CEB (not just from the cache)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment. RemoveCepFromCache
will remove the CEP from local CEB cache and will trigger an UPDATE to k8s-apiserver to sync local CEB with k8s-apiserver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where in the code will the CEB be created? What I meant with my previous review was that the else on line 532 seems to be a mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where in the code will the CEB be created?
Here is what we find or create a CEB for this CEP,
https://github.com/krishgobinath/cilium/blob/ceb-race/operator/pkg/ciliumendpointbatch/manager.go#L539-L564
I'm not following you, what issue you see it on line #532
If the given CEP and CEP in local cache are same, then we should update the local cache, to have the latest value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the given CEP and CEP in local cache are same, then we should update the local cache, to have the latest value.
oh I thought this function ended on line 538 I missed it was being added on line 570. Great!
// Queue the CEP in CEB
c.addCEPtoCEB(cep, cb.ceb.GetName())
c4f6cbd
to
ea71ca7
Compare
// If CEP identity is changed, remove the old CEP from cache and insert | ||
// the CEP in a new CEB | ||
if c.cebToIdentity[cebName] != cep.IdentityID { | ||
c.RemoveCepFromCache(GetCepNameFromCCEP(cep)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where in the code will the CEB be created? What I meant with my previous review was that the else on line 532 seems to be a mistake.
// check the given cep is already exists in any of the CEB. | ||
// if yes, Update a ceb with the given cep object. | ||
// if yes, compare the given CEP Identity with the CEPs stored in CEB. | ||
// If they are same UPDATE the CEP in the CEB. This will trigger CEB UPDATE to k8s-apiserver. | ||
// If the Identities differ, remove the CEP from the existing CEB | ||
// and find a new CEB to batch the given CEP in a CEB. This will trigger following actions, | ||
// 1) CEB UPDATE to k8s-apiserver, removing CEP in old CEB | ||
// 2) CEB CREATE to k8s-apiserver, inserting the given CEP in a new CEB or | ||
// 3) CEB UPDATE to k8s-apiserver, inserting the given CEP in existing CEB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment worries me a little bit. It says the CEP is removed from a CEB and added in a new CEB, won't this means that the cilium-agents receiving the CEB events will receive a "delete" followed by a "add/update" for that CEP? This results in the cilium-agent from removing the IP address of that CEP from the ipcache and then re-adding it back which can result in packet drops in the source node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately for Identity based batching mode, if the Identity for the CEP is changed for whatever the reason. This would trigger couple of CEB updates as you described. On the source node we don't delete anything on Ipcache, because source of truth for the cilium-agent is source node. Check here.
https://github.com/krishgobinath/cilium/blob/ceb-race/pkg/k8s/watchers/cilium_endpoint_batch_subscriber.go#L143-L148
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true for ipcache entries that belong to the local agent but what about for ipcache entries that were populated from information of other nodes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On remote nodes, yes we may see packet drop issues for a short period of time, eventually that should resolve. That is something we can't avoid here. I think, We should document this behavior in our docs.
@Weil0ng any thoughts here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synced with @krishgobinath offline, I think there are two racing conditions:
Issue 1: similar to #16565. Essentially if the remove from old CEB
event arrives after the reinsert into new CEB
, we might end up deleting the ip from ipcache, this can be solved in the same way as #17161, aka checking the "owner" CEB at time of deletion. @krishgobinath will add a new commit in this PR to address this racing issue.
Issue 2: this I believe is what's being discussed above. So essentially with batching, a CEP update
event is broken into a delete
(triggered by removal from old CEB) and an add
(trigger by insertion into new CEB) as @aanm initially pointed out. Since we cannot enforce strong guarantee of ordering for different objects (2 different CEBs), I don't think this is easily solvable. But maybe @aanm have some good ideas?
OTOH, since this only happens when you "change"/"add"/"remove" a label of some running resources, I'd say it's relatively rare and I'm okay with documenting it as a known issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible on the source node, but there's no guarantee that ADDs
will arrive before DELETEs
on other nodes, so I'm not sure how much this (sending ADDs
before DELETEs
) would help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Weil0ng couldn't we guarantee it if we would only send the DELETE
from the operator after receiving the ADD
from the operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good mitigation method but still, in theory, that only guarantees DELETE
will arrive after ADD
for the operator node, for other node, it still could be possible that DELETE
arrives before ADD
but I think that should be very unlikely. On the other hand, I'm not sure how complicated would it be in the code to have such a dependency check between CEP
and CEB
events? @krishgobinath , if it would greatly complicates the code then maybe just add a default delay in sending DELETE
could achieve similar effects.
I'm all for having a mitigation method in place but I still feel we should document the possibility of connection disruption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Weil0ng sounds good to me. One other mitigation that just occurred me would be to postpone deletes in the ipcache for 1 minute? This could also allow follow up ADDs to be received by cilium-agent. If ADDs would be received the deletes would be canceled. Might be a little bit too hacky.
Documenting this behavior sounds good, since this will be a beta feature I'm up for documenting this behavior and we can think on a better approach when we switch the feature to GA, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm :) I'll document this caveat in #17430.
// If CEP identity is changed, remove the old CEP from cache and insert | ||
// the CEP in a new CEB | ||
if c.cebToIdentity[cebName] != cep.IdentityID { | ||
c.RemoveCepFromCache(GetCepNameFromCCEP(cep)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the given CEP and CEP in local cache are same, then we should update the local cache, to have the latest value.
oh I thought this function ended on line 538 I missed it was being added on line 570. Great!
// Queue the CEP in CEB
c.addCEPtoCEB(cep, cb.ceb.GetName())
// check the given cep is already exists in any of the CEB. | ||
// if yes, Update a ceb with the given cep object. | ||
// if yes, compare the given CEP Identity with the CEPs stored in CEB. | ||
// If they are same UPDATE the CEP in the CEB. This will trigger CEB UPDATE to k8s-apiserver. | ||
// If the Identities differ, remove the CEP from the existing CEB | ||
// and find a new CEB to batch the given CEP in a CEB. This will trigger following actions, | ||
// 1) CEB UPDATE to k8s-apiserver, removing CEP in old CEB | ||
// 2) CEB CREATE to k8s-apiserver, inserting the given CEP in a new CEB or | ||
// 3) CEB UPDATE to k8s-apiserver, inserting the given CEP in existing CEB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true for ipcache entries that belong to the local agent but what about for ipcache entries that were populated from information of other nodes?
a8b1bb4
to
f0843ef
Compare
f0843ef
to
6dca307
Compare
6dca307
to
367cdb8
Compare
} | ||
} | ||
|
||
// cepToCEBmap is used to map CiliumEndpoint name to CiliumEndpointBatch name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by this piece, isn't this the same as the one you already introduced in https://github.com/cilium/cilium/pull/17543/files#diff-72fe1386c41a6b05ace9b6247fb3ab9d28b2e37cd7e10f6dfce40a226f115a64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those helper function aren't exported to use outside the package. It's better off with separate helper functions.
367cdb8
to
4aca46f
Compare
994418d
to
a71da8f
Compare
a71da8f
to
058834b
Compare
@@ -56,6 +56,11 @@ const ( | |||
CESControllerWorkQueueQPSLimit = 10 | |||
// default burst limit value for workqueues. | |||
CESControllerWorkQueueBurstLimit = 100 | |||
// Delayed CES Synctime, CES's are synced with k8s-apiserver after certain delay | |||
// Some CES's are delayed to sync with k8s-apiserver. | |||
DelayedCESSyncTime = 15 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be cleaner if this variable name DelayedCESSyncTime
would be changed to DelayedCESDeleteSyncTime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, Done.
Fix race condition issues in Identity based batching At runtime if CEP Identity is changed, based on batching mode, in case of Identity batching mode. we may remove the CEP from CEB, re-insert the CEP in new CEB. In this case, change in CEP Identity translates into 1) Remove the CEP from a CEB 2) Insert the CEP in a new CEB Signed-off-by: Gobinath Krishnamoorthy <gobinathk@google.com>
058834b
to
4aa7473
Compare
Fix race condition issues in Identity based batching
Signed-off-by: Gobinath Krishnamoorthy gobinathk@google.com