ImmSet optimisation for multi inserts or multi deletions #33138
Merged: dylandreimerink merged 3 commits into cilium:main from DamianSawicki:immset_optimisation on Jun 15, 2024
/assign joamaki
Commit 79ff848 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin
79ff848 to ac4b3ec
joamaki approved these changes on Jun 14, 2024
Thanks! Just a few non-blocking nits.
This commit adds alternative implementations of methods of ImmSet:
* InsertNew(xs ...T)
* DeleteNew(xs ...T)
* UnionNew(s2 ImmSet[T])
* DifferenceNew(s2 ImmSet[T])
and benchmarks these implementations against the existing ones.

Benchmarking results:
* for Insert, the proposed method becomes faster already with the container of size 1000, and then it performed 10x faster for size 10,000 and 100x faster for size 100,000;
* for Delete, the proposed method becomes faster already with the container of size 1000, and then it performed ~5x faster for size 10,000;
* for Difference, the proposed method was already 4x faster for size 100, and then it performed 7x faster for size 1000, 35x faster for size 10,000, and 193x faster for size 100,000;
* for Union, the proposed method performs slightly faster, but gains do not visibly grow with increasing size.

Theoretically, the proposed solutions have improved computational complexity:
* the complexity of Insert is O(len(s.xs)*len(xs)), and the complexity of InsertNew is O(len(s.xs)+len(xs));
* the complexity of Delete is O(len(s.xs)*len(xs)), and the complexity of DeleteNew is O(len(s.xs)+len(xs));
* the complexity of Difference is O(len(s.xs)*len(s2.xs)) because it uses Delete internally, and the complexity of DifferenceNew is O(len(s.xs)+len(s2.xs));
* the complexity of Union is harder to estimate: it involves sorting a slice of size n=len(s.xs)+len(s2.xs), but this slice is a concatenation of two sorted slices, so most likely this does not lead to the usual O(n*log(n)) complexity; of course, it is at least O(n); the complexity of UnionNew is O(n).

Signed-off-by: Damian Sawicki <dsawicki@google.com>
The rationale is given in the previous commit message. Signed-off-by: Damian Sawicki <dsawicki@google.com>
This adds a check to the ImmSet methods Insert and Delete that determines whether a single element or multiple elements are being inserted or deleted. Depending on the result, one of two different algorithms is used. For a single element, both algorithms are linear in the size of the ImmSet, but we choose the one that benchmarking shows to be faster. Signed-off-by: Damian Sawicki <dsawicki@google.com>
ac4b3ec to bfb195f
/test
Labels
kind/community-contribution
This was a contribution made by a community member.
release-note/misc
This PR makes changes that have no direct user impact.
This PR proposes new implementations with lower computational complexity of the following methods of `ImmSet`:
- `func (s ImmSet[T]) Insert(xs ...T) ImmSet[T]`,
- `func (s ImmSet[T]) Delete(xs ...T) ImmSet[T]`,
- `func (s ImmSet[T]) Difference(s2 ImmSet[T]) ImmSet[T]`,
- `func (s ImmSet[T]) Union(s2 ImmSet[T]) ImmSet[T]`.

The first commit of the PR contains benchmarks comparing the existing and the new implementations. For me, the results were as follows:
- for `Insert`, the proposed method becomes faster already with the container of size 1000, and then it performed 10x faster for size 10,000 and 100x faster for size 100,000;
- for `Delete`, the proposed method becomes faster already with the container of size 1000, and then it performed ~5x faster for size 10,000;
- for `Difference`, the proposed method was already 4x faster for size 100, and then it performed 7x faster for size 1000, 35x faster for size 10,000, and 193x faster for size 100,000;
- for `Union`, the proposed method performs slightly faster, but gains do not visibly grow with increasing size.

The proposed implementation has improved computational complexity, lowering the complexity of `Insert`, `Delete`, and `Difference` from quadratic to linear:
- the complexity of `Insert` is `O(len(s.xs)*len(xs))`, and the complexity of `InsertNew` is `O(len(s.xs)+len(xs))`;
- the complexity of `Delete` is `O(len(s.xs)*len(xs))`, and the complexity of `DeleteNew` is `O(len(s.xs)+len(xs))`;
- the complexity of `Difference` is `O(len(s.xs)*len(s2.xs))` because it uses `Delete` internally, and the complexity of `DifferenceNew` is `O(len(s.xs)+len(s2.xs))`;
- the complexity of `Union` is harder to estimate: it involves sorting a slice of size `n:=len(s.xs)+len(s2.xs)`, but this slice is a concatenation of two sorted slices, so most likely this does not lead to the usual `O(n*log(n))` complexity; of course, it is at least `O(n)`; the complexity of `UnionNew` is `O(n)`.

EDIT: I see no obvious impact of changes applied after code review on the benchmarking results.