New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug with toServices
policy where service backend churn left stale CIDR identities
#25687
Conversation
toServices
policy where backend churn left stale CIDR identities in the ipcache
toServices
policy where backend churn left stale CIDR identities in the ipcachetoServices
policy where service backend churn left stale CIDR identities in the ipcache
6da6251
to
f9adfdf
Compare
toServices
policy where service backend churn left stale CIDR identities in the ipcachetoServices
policy where service backend churn left stale CIDR identities
f9adfdf
to
f301050
Compare
/test |
cc @squeed @joestringer for opening #20477 |
#25559 has merged, so this needs a rebase? |
77d3561
to
647f901
Compare
/test Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-kernel-4.19/482/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue. |
/mlh new-flake Cilium-PR-K8s-1.16-kernel-4.19 👍 created #25964 |
647f901
to
5d37c69
Compare
/test Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-kernel-4.19/529/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments/questions about the OldEndpoints propagation, while the rest of the changes looks good to me.
ID: esID.ServiceID, | ||
Service: svc, | ||
Endpoints: endpoints, | ||
OldEndpoints: oldEPs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should OldEndpoints
be set also in the other cases in which an UpdateService
event is emitted? IMO it is mainly necessary in the DeleteEndpoints
method below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given #25687 (comment), can you elaborate why it's necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm referring in particular to the event generated here. Essentially, that happens when one endpointslice gets deleted, triggering an update event with the remaining endpoints for that service. Since OldEndpoints
is not propagated, the corresponding PrefixesToRelease
rules will not be eventually generated by the K8sTranslator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also another aspect that I didn't realize earlier, OldEndpoints
will contain the previous list of endpoints from a given endpointslice, while Endpoints
the ones merged from all endpointslices. It seems to me that it should not create problems though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end, I've set the OldEndpoints
for all UpdateService
events. Please take another look. Thanks for double checking!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I see that OldEndpoints
and Endpoints
are always the same for updates from remote service backends. Shouldn't they be different? (I'm not 100% sure about how this feature should play with global services)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toServices
(currently) only supports backends that are outside of the cluster (world), and not pod-based backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same logic is also used in the external workloads case, and IIUC in that case the backends processed through that function might be outside of the cluster. Still that is most likely an extremely rare condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@giorio94 Could you file a followup issue so that we don't forget this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5d37c69
to
ae516b4
Compare
/test |
This exposes the old endpoints object so that subsequent commits can make use of it to perform proper diff logic for rule translation (`pkg/k8s/rule_translate.go`). This commit should have no functional impact as the code merely references the new endpoint object and does not touch the newly exposed old object. Signed-off-by: Chris Tarazi <chris@isovalent.com>
Previously, toServices-based rules did not properly cleanup CIDR identities. When service backends were removed or changed, the deletion logic acted on the new object rather than on the old object, thus the entries that were supposed to be deleted were simply added back in generateToCidrFromEndpoint(). Fix this by passing the deletion logic the old endpoint object state and performing a diff between the old and new states. Signed-off-by: Chris Tarazi <chris@isovalent.com>
The previous commit fixed toServices diffing logic. This commit adds a test that was used to validate the fix. Signed-off-by: Chris Tarazi <chris@isovalent.com>
This variable is no longer necessary because it doesn't actually prevent ipcache interaction as of commit 4b87ccc ("pkg/k8s/watcher: fix deadlock with service event handler & CES watcher."). Remove it as provides no functional impact. Signed-off-by: Chris Tarazi <chris@isovalent.com>
ae516b4
to
776d773
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks for the very thorough review @giorio94! |
Fixes: #20477
Main commit "k8s: Fix toServices rule translation cleanup" will be pasted below for ease of review: