sort identities by id/name to avoid random results #23329

Merged
merged 1 commit into from Feb 6, 2023

Conversation

nickolaev
Contributor

@nickolaev nickolaev commented Jan 25, 2023

Fixes #23314

Fixes: #23064

Signed-off-by: Nikolay Nikolaev <nikolay.nikolaev@isovalent.com>

@nickolaev nickolaev requested a review from a team as a code owner January 25, 2023 08:16
@nickolaev nickolaev requested a review from squeed January 25, 2023 08:16
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 25, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jan 25, 2023
@maintainer-s-little-helper

Commit 2151c8a68e152f7c3dd155f856f3a78a946ea56a does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jan 25, 2023
return false
}

return left.Name < right.Name
Contributor

TIL: identities aren't namespaced :-)

@squeed squeed removed the kind/community-contribution This was a contribution made by a community member. label Jan 25, 2023
Contributor

@squeed squeed left a comment

Please rebase on latest master and remove your merge commit.

@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jan 25, 2023
@tklauser tklauser added the release-note/misc This PR makes changes that have no direct user impact. label Jan 25, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 25, 2023
@sayboras
Member

sayboras commented Jan 25, 2023

/test

Job 'Cilium-PR-K8s-1.16-kernel-4.9' hit: #22578 (98.61% similarity)

Member

@sayboras sayboras left a comment

Thanks and LGTM 💯

@sayboras sayboras added the needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch label Jan 25, 2023
Member

@aditighag aditighag left a comment

Thanks for the fix! :)
While the fix looks pretty straightforward, I'm not sure sorting is strictly required. It's not clear whether the test this PR fixes was added under a wrong assumption. We probably don't expect many identities to map to the same set of labels, so sorting the list might be okay. Please update the function description; it doesn't return the "first" identity found anymore.

// get returns the first identity found for the given set of labels as we might
// have duplicated entries identities for the same set of labels.

Also, please add fixes: https://github.com/cilium/cilium/pull/23064 to the commit/PR description, so we have context as to why we decided to sort the identities.

Edit: Looking closely, the identities, err := c.Store.ByIndex(byKeyIndex, key.GetKey()) is now returning a list which is supposed to be ordered, so why do we need to sort again? 😕

@aditighag
Member

/cc @alan-kut as the author of the original PR.

@nickolaev
Contributor Author

Thanks for the review. I will update accordingly.

> Edit: Looking closely, the identities, err := c.Store.ByIndex(byKeyIndex, key.GetKey()) is now returning a list which is supposed to be ordered, so why do we need to sort again? 😕

From what I can track, the implementation ends up here:
https://github.com/kubernetes/client-go/blob/v0.26.0/tools/cache/thread_safe_store.go#L298

where set, err := c.index.getKeysByIndex(indexName, indexedValue) is returning a map type for set, namely https://github.com/kubernetes/apimachinery/blob/v0.26.0/pkg/util/sets/string.go#L25

And then this gets converted to the list with:

	for key := range set {
		list = append(list, c.items[key])
	}

And this is where the unpredictable order of the result comes from.
In fact, when I debugged the failing unit test by printing the slice at the point where I added the sort, the entries usually came back as "10, 11", but in the failing runs they came back as "11, 10".

IMHO this ordering is needed and provides a more stable API.
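
To make the argument above concrete, here is a minimal, self-contained sketch. It is not the client-go or Cilium code: the `identity` type and the helpers `listFromSet` and `sortByName` are hypothetical stand-ins. It only relies on the fact that Go leaves map iteration order unspecified, so a list built by ranging over a map-backed set has no stable order until it is sorted.

```go
package main

import (
	"fmt"
	"sort"
)

// identity is a hypothetical stand-in for v2.CiliumIdentity; only the name matters here.
type identity struct {
	Name string
}

// listFromSet mimics the conversion quoted above: ranging over a Go map
// visits keys in an unspecified order, so the resulting slice is unordered.
func listFromSet(set map[string]struct{}, items map[string]identity) []identity {
	list := make([]identity, 0, len(set))
	for key := range set {
		list = append(list, items[key])
	}
	return list
}

// sortByName orders identities by name so callers always see the same first element.
func sortByName(list []identity) {
	sort.Slice(list, func(i, j int) bool {
		return list[i].Name < list[j].Name
	})
}

func main() {
	set := map[string]struct{}{"10": {}, "11": {}}
	items := map[string]identity{"10": {Name: "10"}, "11": {Name: "11"}}

	list := listFromSet(set, items)
	sortByName(list)
	fmt.Println(list[0].Name) // always "10" once the list is sorted
}
```

Without the `sortByName` call, `list[0]` could be either entry from run to run, which matches the "10, 11" versus "11, 10" behaviour seen in the flaky test.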

Member

@christarazi christarazi left a comment

Requesting changes just to hold this PR, since it could introduce unintended behavior. I think something we missed during the review of #23064 is how we should handle the duplicated identity case. I don't think we've previously given it much thought, but now that we are changing the behavior, it's unclear whether something depends on it.

I'd like to pull in folks who may have more context here, @cilium/sig-policy might be our best bet.

@joestringer
Member

@sayboras the original PR (#23064) was not marked for backport to 1.13, so I don't think we need to backport this to 1.13.

@joestringer joestringer removed the needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch label Jan 25, 2023
Contributor

@alan-kut alan-kut left a comment

I agree that sorting is not required.
Sorry about the test. It was only intended to check that the code still works when there are duplicated identities; it does not need to be deterministic which one is returned.

However, I like the sorting here. In the vast majority of cases only one or a few identities will be returned, so sorting will be cheap, and deterministic results are generally much better IMO.

@@ -228,6 +230,20 @@ func (c *crdBackend) get(ctx context.Context, key allocator.AllocatorKey) *v2.Ci
return nil
}

Contributor

I'd suggest adding a comment explaining why the list is sorted here.

Contributor Author

I have updated the function comment; is that not enough? I mean, "duplicates, therefore we are selecting the 'lexicographically smallest' name" implies to a certain extent that we 'sort' the results before deciding which one to return.

Member

A Cilium identity name is just a number. I would suggest something along these lines: "In the case of duplicate entries, return an identity entry from a sorted list."

@sayboras
Member

sayboras commented Jan 30, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Test Name

K8sAgentPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output

FAIL: cannot install connectivity-check

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Member

@aditighag aditighag left a comment

LGTM! I don't need to review the change again if it's just the function description that was updated.

Member

@christarazi christarazi left a comment

Upon further review, I think the gist of the PR is fine regarding changing the semantics of handling duplicated identities. Read below.

The only callers of Get() on the backend are:

pkg/allocator/allocator.go|704 col 19| GetNoCache
pkg/allocator/allocator.go|739 col 26| GetIncludeRemoteCaches
pkg/k8s/identitybackend/identity.go|262 col 11| GetIfLocked

What I see is that it's ultimately checking whether an identity already exists and acquiring a reference on it if it does. It doesn't really matter which identity value is assigned to the endpoint, as long as the label association is correct. Policy selectors use labels and not the identity value, so that's why the identity value doesn't matter.

We can ultimately rely on the identity GC to remove any duplicated identities that are no longer in use (based on refcount). Sorting the list of identities gives Get() a stable output, which ensures we always get the same one. So in theory, the "unselected" duplicated identity will have a refcount of 0 and be a candidate for GC anyway.


With that said, I would +1 for changing the sort logic to handle either the timestamp or whichever identity value is less. Ideally, we don't need to do string to int conversions as we already have the creation timestamp as a time.Time.

Member

@christarazi christarazi left a comment

Holding for

> With that said, I would +1 for changing the sort logic to handle either the timestamp or whichever identity value is less. Ideally, we don't need to do string to int conversions as we already have the creation timestamp as a time.Time.

@nickolaev
Contributor Author

> Holding for
>
> > With that said, I would +1 for changing the sort logic to handle either the timestamp or whichever identity value is less. Ideally, we don't need to do string to int conversions as we already have the creation timestamp as a time.Time.

OK, but using the timestamps makes the test nondeterministic again, as there are no guarantees about the order in which the test entries are created.
I am not sure what "identity value" is.

I am updating the PR with the latest comment suggested and will leave it as it is. If someone has a better idea of how this sorting should work, please take over.

@christarazi
Member

> > Holding for
> >
> > > With that said, I would +1 for changing the sort logic to handle either the timestamp or whichever identity value is less. Ideally, we don't need to do string to int conversions as we already have the creation timestamp as a time.Time.
>
> OK, but using the timestamps makes the test nondeterministic again, as there are no guarantees about the order in which the test entries are created. I am not sure what "identity value" is.

Why would they become nondeterministic? If they are initialized like

identities: []v2.CiliumIdentity{
	createCiliumIdentity(10, duplicateMap1),
	createCiliumIdentity(11, duplicateMap1),
},

then we are instantiating a Go slice which is ordered. We just need to ensure that createCiliumIdentity() sets the creationTimestamp to time.Now() and it should work.

@aditighag
Member

@nickolaev Please check the Go linter warning; the Travis failure is related.

> Why would they become nondeterministic? If they are initialized like
>
> identities: []v2.CiliumIdentity{
> 	createCiliumIdentity(10, duplicateMap1),
> 	createCiliumIdentity(11, duplicateMap1),
> },
>
> then we are instantiating a Go slice which is ordered. We just need to ensure that createCiliumIdentity() sets the creationTimestamp to time.Now() and it should work.

@christarazi I'm not sure I follow your proposal. In general, we don't seem to set the creationTimestamp in the code, so how could the sorting logic use the field for comparison? Based on your suggestion above, it seems like you are suggesting that we only set it for tests? But that's not ideal.

@christarazi
Member

> @christarazi I'm not sure I follow your proposal. In general, we don't seem to set the creationTimestamp in the code, so how could the sorting logic use the field for comparison? Based on your suggestion above, it seems like you are suggesting that we only set it for tests? But that's not ideal.

Well, we only need to set the timestamp in the tests because we can't rely on K8s doing that for us. In non-test code, K8s will definitely set the creationTimestamp so we can rely on it.

@aditighag
Member

> > @christarazi I'm not sure I follow your proposal. In general, we don't seem to set the creationTimestamp in the code, so how could the sorting logic use the field for comparison? Based on your suggestion above, it seems like you are suggesting that we only set it for tests? But that's not ideal.
>
> Well, we only need to set the timestamp in the tests because we can't rely on K8s doing that for us. In non-test code, K8s will definitely set the creationTimestamp so we can rely on it.

Thanks for the context. If it's guaranteed to be always set by the system, then yes, it makes sense to use the timestamp to sort the identities.

@christarazi
Member

christarazi commented Feb 4, 2023

/test

Edit: oops, accidentally closed.

@christarazi christarazi closed this Feb 4, 2023
@christarazi christarazi reopened this Feb 4, 2023
@christarazi
Member

christarazi commented Feb 4, 2023

/test

Job 'Cilium-PR-K8s-1.24-kernel-5.4' failed:

Test Name

K8sDatapathConfig Host firewall With VXLAN

Failure Output

FAIL: Failed to reach 10.0.0.50:80 from testclient-host-5xdvm

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.24-kernel-5.4 so I can create one.

@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Feb 4, 2023
Fixes cilium#23314

Signed-off-by: Nikolay Nikolaev <nikolay.nikolaev@isovalent.com>
@sayboras
Member

sayboras commented Feb 5, 2023

/test

@aanm aanm merged commit 84e9641 into cilium:master Feb 6, 2023
@nickolaev nickolaev deleted the fix_23314 branch February 7, 2023 15:42
Labels
kind/community-contribution This was a contribution made by a community member. release-note/misc This PR makes changes that have no direct user impact.
Development

Successfully merging this pull request may close these issues.

CI: TestGetIdentity/Duplicated_identity failure
9 participants