New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sort identities by id/name to avoid random results #23329
Conversation
Commit 2151c8a68e152f7c3dd155f856f3a78a946ea56a does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
pkg/k8s/identitybackend/identity.go
Outdated
return false | ||
} | ||
|
||
return left.Name < right.Name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL: identities aren't namespaced :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase on latest master and remove your merge commit.
/test Job 'Cilium-PR-K8s-1.16-kernel-4.9' hit: #22578 (98.61% similarity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks and LGTM 💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! :)
While the fix looks pretty straight forward, I'm not sure if sorting is strictly required. It's not clear if the test that this PR is fixing was added with a wrong assumption. We probably don't expect a list of identities mapped to the same set of labels to be a large number, so it might be okay to sort the list. Please update the function description, it doesn't return the "first" identity found anymore.
// get returns the first identity found for the given set of labels as we might
// have duplicated entries identities for the same set of labels.
Also, please add fixes: https://github.com/cilium/cilium/pull/23064
to the commit/PR description, so we have context as to why we decided to sort the identities.
Edit: Looking closely, the identities, err := c.Store.ByIndex(byKeyIndex, key.GetKey())
is now returning a list which is supposed to be ordered, so why do we need to sort again? 😕
/cc @alan-kut as the author of the original PR. |
Thanks for the review. I will update accordingly.
From what I can track, the implementation ends up here: where And then this gets converted to the list with:
And this is where the unpredictable order of the result comes from. IMHO this ordering is needed and provides a more stable API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes just to hold this PR on potentially introducing unintended behavior. I think something that we missed during the review of #23064 is how should we handle the duplicated identity case. I don't think we've previously given it much thought, but now that we are changing the behavior, it's unclear if something depends on it.
I'd like to pull in folks who may have more context here, @cilium/sig-policy might be our best bet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that sorting is not required.
Sorry about the test, the test was intended to check only that the code still works when there are duplicated identities, it's not required to be deterministic which one should be returned.
However I like the sorting here. In huge majority of cases there will one or just a few returned identities here so sorting will be cheap and deterministic results are in general much better IMO
@@ -228,6 +230,20 @@ func (c *crdBackend) get(ctx context.Context, key allocator.AllocatorKey) *v2.Ci | |||
return nil | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest adding comment to tell why it is sorted here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the function comment, is that not enough? I mean "duplicates, therefore we are selecting the 'lexicographically smalles' name" implies to certain extent we will 'sort' the results before deciding which one to return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cilium identity name is just a numeric. I would suggest something along these lines "In the case of duplicate entries, return an identity entry from a sorted list."
/test Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed: Click to show.Test Name
Failure Output
If it is a flake and a GitHub issue doesn't already exist to track it, comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I don't need to review the change again if it's just the function description that was updated.
@@ -228,6 +230,20 @@ func (c *crdBackend) get(ctx context.Context, key allocator.AllocatorKey) *v2.Ci | |||
return nil | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cilium identity name is just a numeric. I would suggest something along these lines "In the case of duplicate entries, return an identity entry from a sorted list."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon further review, I think the gist of the PR is fine regarding changing the semantics of handling duplicated identities. Read below.
The only callers of Get()
on the backend is
pkg/allocator/allocator.go|704 col 19| GetNoCache
pkg/allocator/allocator.go|739 col 26| GetIncludeRemoteCaches
pkg/k8s/identitybackend/identity.go|262 col 11| GetIfLocked
What I see is it's ultimately checking if an identity already exists and acquire a reference on it if it does. It doesn't really matter what the actual identity value is that's assigned to the endpoint, as long as the labels association is correct. Policy selectors use labels and not the identity value, so that's why the identity value doesn't matter.
We can ultimately rely on the identity GC to remove any duplicated identities which are no longer in use (based on refcount). By sorting the list of identities to get a stable output on Get()
, then this ensure we always get the same one. So in theory, the "unselected", duplicated identity will have 0 refcount and be a candidate for GC anyway.
With that said, I would +1 for changing the sort logic to handle either the timestamp or whichever identity value is less. Ideally, we don't need to do string to int conversions as we already have the creation timestamp as a time.Time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holding for
With that said, I would +1 for changing the sort logic to handle either the timestamp or whichever identity value is less. Ideally, we don't need to do string to int conversions as we already have the creation timestamp as a time.Time.
OK but - using the timestamps make the test indeterministic again, as there are no guaranties of the order of creation of the test entries. I am updating the PR with the latest comment suggested and will leave it as it is. If someone has a better idea of how this sorting should work, please take over. |
Why would they become nondeterministic? If they are initialized like identities: []v2.CiliumIdentity{
createCiliumIdentity(10, duplicateMap1),
createCiliumIdentity(11, duplicateMap1),
}, then we are instantiating a Go slice which is ordered. We just need to ensure that |
@nickolaev Please check the go linter warning. Travis failure is related.
@christarazi I'm not sure I follow your proposal. In general, we don't seem to set the |
Well, we only need to set the timestamp in the tests because we can't rely on K8s doing that for us. In non-test code, K8s will definitely set the |
Thanks for the context. If it's guaranteed to be always set by the system, then yes, it makes sense to use the timestamp to sort the identities. |
/test Edit: oops, accidentally closed. |
/test Job 'Cilium-PR-K8s-1.24-kernel-5.4' failed: Click to show.Test Name
Failure Output
If it is a flake and a GitHub issue doesn't already exist to track it, comment |
Fixes cilium#23314 Signed-off-by: Nikolay Nikolaev <nikolay.nikolaev@isovalent.com>
/test |
Fixes #23314
Fixes: #23064
Signed-off-by: Nikolay Nikolaev nikolay.nikolaev@isovalent.com