Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identity based batching race condition Issues #17520

Merged
merged 1 commit into from
Oct 22, 2021

Conversation

krishgobinath
Copy link
Contributor

@krishgobinath krishgobinath commented Oct 4, 2021

Fix race condition issues in Identity based batching

Signed-off-by: Gobinath Krishnamoorthy gobinathk@google.com

@krishgobinath krishgobinath requested a review from a team as a code owner October 4, 2021 06:26
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 4, 2021
@krishgobinath krishgobinath requested review from errordeveloper and removed request for a team October 4, 2021 06:26
Comment on lines 524 to 539
// If CEP identity is changed, remove the old CEP from cache and insert
// the CEP in a new CEB
if c.cebToIdentity[cebName] != cep.IdentityID {
c.RemoveCepFromCache(GetCepNameFromCCEP(cep))
} else {
// add a cep into the ceb
c.addCEPtoCEB(cep, cebName)
return cebName
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment does not match what the code should be doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comment.

// If CEP identity is changed, remove the old CEP from cache and insert
// the CEP in a new CEB
if c.cebToIdentity[cebName] != cep.IdentityID {
c.RemoveCepFromCache(GetCepNameFromCCEP(cep))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we also need to remove the CEP from the actual CEB (not just from the cache)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment. RemoveCepFromCache will remove the CEP from local CEB cache and will trigger an UPDATE to k8s-apiserver to sync local CEB with k8s-apiserver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where in the code will the CEB be created? What I meant with my previous review was that the else on line 532 seems to be a mistake.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where in the code will the CEB be created?
Here is what we find or create a CEB for this CEP,
https://github.com/krishgobinath/cilium/blob/ceb-race/operator/pkg/ciliumendpointbatch/manager.go#L539-L564

I'm not following you, what issue you see it on line #532
If the given CEP and CEP in local cache are same, then we should update the local cache, to have the latest value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the given CEP and CEP in local cache are same, then we should update the local cache, to have the latest value.

oh I thought this function ended on line 538 I missed it was being added on line 570. Great!

	// Queue the CEP in CEB
	c.addCEPtoCEB(cep, cb.ceb.GetName())

// If CEP identity is changed, remove the old CEP from cache and insert
// the CEP in a new CEB
if c.cebToIdentity[cebName] != cep.IdentityID {
c.RemoveCepFromCache(GetCepNameFromCCEP(cep))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where in the code will the CEB be created? What I meant with my previous review was that the else on line 532 seems to be a mistake.

Comment on lines 521 to 531
// check the given cep is already exists in any of the CEB.
// if yes, Update a ceb with the given cep object.
// if yes, compare the given CEP Identity with the CEPs stored in CEB.
// If they are same UPDATE the CEP in the CEB. This will trigger CEB UPDATE to k8s-apiserver.
// If the Identities differ, remove the CEP from the existing CEB
// and find a new CEB to batch the given CEP in a CEB. This will trigger following actions,
// 1) CEB UPDATE to k8s-apiserver, removing CEP in old CEB
// 2) CEB CREATE to k8s-apiserver, inserting the given CEP in a new CEB or
// 3) CEB UPDATE to k8s-apiserver, inserting the given CEP in existing CEB
Copy link
Member

@aanm aanm Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment worries me a little bit. It says the CEP is removed from a CEB and added in a new CEB, won't this means that the cilium-agents receiving the CEB events will receive a "delete" followed by a "add/update" for that CEP? This results in the cilium-agent from removing the IP address of that CEP from the ipcache and then re-adding it back which can result in packet drops in the source node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately for Identity based batching mode, if the Identity for the CEP is changed for whatever the reason. This would trigger couple of CEB updates as you described. On the source node we don't delete anything on Ipcache, because source of truth for the cilium-agent is source node. Check here.
https://github.com/krishgobinath/cilium/blob/ceb-race/pkg/k8s/watchers/cilium_endpoint_batch_subscriber.go#L143-L148

Copy link
Member

@aanm aanm Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true for ipcache entries that belong to the local agent but what about for ipcache entries that were populated from information of other nodes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On remote nodes, yes we may see packet drop issues for a short period of time, eventually that should resolve. That is something we can't avoid here. I think, We should document this behavior in our docs.
@Weil0ng any thoughts here ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced with @krishgobinath offline, I think there are two racing conditions:

Issue 1: similar to #16565. Essentially if the remove from old CEB event arrives after the reinsert into new CEB, we might end up deleting the ip from ipcache, this can be solved in the same way as #17161, aka checking the "owner" CEB at time of deletion. @krishgobinath will add a new commit in this PR to address this racing issue.

Issue 2: this I believe is what's being discussed above. So essentially with batching, a CEP update event is broken into a delete (triggered by removal from old CEB) and an add (trigger by insertion into new CEB) as @aanm initially pointed out. Since we cannot enforce strong guarantee of ordering for different objects (2 different CEBs), I don't think this is easily solvable. But maybe @aanm have some good ideas?
OTOH, since this only happens when you "change"/"add"/"remove" a label of some running resources, I'd say it's relatively rare and I'm okay with documenting it as a known issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible on the source node, but there's no guarantee that ADDs will arrive before DELETEs on other nodes, so I'm not sure how much this (sending ADDs before DELETEs) would help?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Weil0ng couldn't we guarantee it if we would only send the DELETE from the operator after receiving the ADD from the operator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good mitigation method but still, in theory, that only guarantees DELETE will arrive after ADD for the operator node, for other node, it still could be possible that DELETE arrives before ADD but I think that should be very unlikely. On the other hand, I'm not sure how complicated would it be in the code to have such a dependency check between CEP and CEB events? @krishgobinath , if it would greatly complicates the code then maybe just add a default delay in sending DELETE could achieve similar effects.

I'm all for having a mitigation method in place but I still feel we should document the possibility of connection disruption.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Weil0ng sounds good to me. One other mitigation that just occurred me would be to postpone deletes in the ipcache for 1 minute? This could also allow follow up ADDs to be received by cilium-agent. If ADDs would be received the deletes would be canceled. Might be a little bit too hacky.

Documenting this behavior sounds good, since this will be a beta feature I'm up for documenting this behavior and we can think on a better approach when we switch the feature to GA, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm :) I'll document this caveat in #17430.

// If CEP identity is changed, remove the old CEP from cache and insert
// the CEP in a new CEB
if c.cebToIdentity[cebName] != cep.IdentityID {
c.RemoveCepFromCache(GetCepNameFromCCEP(cep))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the given CEP and CEP in local cache are same, then we should update the local cache, to have the latest value.

oh I thought this function ended on line 538 I missed it was being added on line 570. Great!

	// Queue the CEP in CEB
	c.addCEPtoCEB(cep, cb.ceb.GetName())

Comment on lines 521 to 531
// check the given cep is already exists in any of the CEB.
// if yes, Update a ceb with the given cep object.
// if yes, compare the given CEP Identity with the CEPs stored in CEB.
// If they are same UPDATE the CEP in the CEB. This will trigger CEB UPDATE to k8s-apiserver.
// If the Identities differ, remove the CEP from the existing CEB
// and find a new CEB to batch the given CEP in a CEB. This will trigger following actions,
// 1) CEB UPDATE to k8s-apiserver, removing CEP in old CEB
// 2) CEB CREATE to k8s-apiserver, inserting the given CEP in a new CEB or
// 3) CEB UPDATE to k8s-apiserver, inserting the given CEP in existing CEB
Copy link
Member

@aanm aanm Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true for ipcache entries that belong to the local agent but what about for ipcache entries that were populated from information of other nodes?

}
}

// cepToCEBmap is used to map CiliumEndpoint name to CiliumEndpointBatch name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this piece, isn't this the same as the one you already introduced in https://github.com/cilium/cilium/pull/17543/files#diff-72fe1386c41a6b05ace9b6247fb3ab9d28b2e37cd7e10f6dfce40a226f115a64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those helper function aren't exported to use outside the package. It's better off with separate helper functions.

@@ -56,6 +56,11 @@ const (
CESControllerWorkQueueQPSLimit = 10
// default burst limit value for workqueues.
CESControllerWorkQueueBurstLimit = 100
// Delayed CES Synctime, CES's are synced with k8s-apiserver after certain delay
// Some CES's are delayed to sync with k8s-apiserver.
DelayedCESSyncTime = 15 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cleaner if this variable name DelayedCESSyncTime would be changed to DelayedCESDeleteSyncTime

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, Done.

Fix race condition issues in Identity based batching
At runtime if CEP Identity is changed, based on batching
mode, in case of Identity batching mode.
we may remove the CEP from CEB, re-insert the CEP in new CEB.
In this case, change in CEP Identity translates into
1) Remove the CEP from a CEB
2) Insert the CEP in a new CEB

Signed-off-by: Gobinath Krishnamoorthy <gobinathk@google.com>
@aanm aanm merged commit 77232c4 into cilium:feature/cep-scalability Oct 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dont-merge/needs-release-note-label The author needs to describe the release impact of these changes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants