ImmSet optimisation for multi inserts or multi deletions #33138

Merged
merged 3 commits into from
Jun 15, 2024

DamianSawicki
Contributor

@DamianSawicki DamianSawicki commented Jun 13, 2024

This PR proposes new, lower-complexity implementations of the following ImmSet methods:

  • func (s ImmSet[T]) Insert(xs ...T) ImmSet[T],
  • func (s ImmSet[T]) Delete(xs ...T) ImmSet[T],
  • func (s ImmSet[T]) Difference(s2 ImmSet[T]) ImmSet[T],
  • func (s ImmSet[T]) Union(s2 ImmSet[T]) ImmSet[T].

The first commit of the PR contains benchmarks comparing the existing and the new implementations. In my runs, the results were as follows:

  • for Insert, the proposed method was already faster at container size 1000, then 10x faster at size 10,000 and 100x faster at size 100,000;
  • for Delete, the proposed method was already faster at container size 1000, then ~5x faster at size 10,000;
  • for Difference, the proposed method was already 4x faster at size 100, then 7x faster at size 1000, 35x faster at size 10,000, and 193x faster at size 100,000;
  • for Union, the proposed method was slightly faster, but the gains did not visibly grow with increasing size.

The proposed implementation has improved computational complexity, lowering the complexity of Insert, Delete, and Difference from quadratic to linear:

  • the complexity of Insert is O(len(s.xs)*len(xs)), and the complexity of InsertNew is O(len(s.xs)+len(xs));
  • the complexity of Delete is O(len(s.xs)*len(xs)), and the complexity of DeleteNew is O(len(s.xs)+len(xs));
  • the complexity of Difference is O(len(s.xs)*len(s2.xs)) because it uses Delete internally, and the complexity of DifferenceNew is O(len(s.xs)+len(s2.xs));
  • the complexity of Union is harder to estimate: it sorts a slice of size n:=len(s.xs)+len(s2.xs), but this slice is a concatenation of two sorted slices, so the sort most likely does not incur the usual O(n*log(n)) cost; in any case, it is at least linear in n; the complexity of UnionNew is O(n).
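
The linear bounds above come from merging two sorted sequences in a single pass. Below is a minimal sketch of that idea, not the code from this PR: the function names `mergeUnion` and `mergeDifference` are hypothetical, elements are assumed to satisfy `cmp.Ordered`, and both inputs are assumed to be already sorted and free of duplicates. In this sketch, an unsorted argument list would have to be sorted first.

```go
package main

import (
	"cmp"
	"fmt"
)

// mergeUnion returns the sorted, duplicate-free union of a and b.
// Both inputs must be sorted and duplicate-free (an assumption of this sketch).
// Each element is visited once, so the cost is O(len(a)+len(b)).
func mergeUnion[T cmp.Ordered](a, b []T) []T {
	out := make([]T, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch c := cmp.Compare(a[i], b[j]); {
		case c < 0:
			out = append(out, a[i])
			i++
		case c > 0:
			out = append(out, b[j])
			j++
		default: // equal elements: keep a single copy
			out = append(out, a[i])
			i++
			j++
		}
	}
	// At most one of the two tails is non-empty.
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

// mergeDifference returns the elements of a that do not occur in b.
// The index j only moves forward, so the cost is O(len(a)+len(b)).
func mergeDifference[T cmp.Ordered](a, b []T) []T {
	out := make([]T, 0, len(a))
	j := 0
	for _, x := range a {
		// Skip the elements of b that are smaller than x.
		for j < len(b) && b[j] < x {
			j++
		}
		if j < len(b) && b[j] == x {
			continue // x occurs in b, so drop it
		}
		out = append(out, x)
	}
	return out
}

func main() {
	a := []int{1, 3, 5, 7}
	b := []int{3, 4, 7, 9}
	fmt.Println(mergeUnion(a, b))      // [1 3 4 5 7 9]
	fmt.Println(mergeDifference(a, b)) // [1 5]
}
```

Both functions allocate a fresh result slice and leave their inputs untouched, which matches the immutable semantics expected of ImmSet.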

EDIT: The changes applied after code review have no obvious impact on the benchmarking results.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 13, 2024
@DamianSawicki DamianSawicki changed the title Immset optimisation for multi inserts or multi deletions ImmSet optimisation for multi inserts or multi deletions Jun 13, 2024
@DamianSawicki
Contributor Author

/assign joamaki

@maintainer-s-little-helper

Commit 79ff848 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 13, 2024
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jun 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 13, 2024
@DamianSawicki DamianSawicki marked this pull request as ready for review June 13, 2024 14:03
@DamianSawicki DamianSawicki requested a review from a team as a code owner June 13, 2024 14:03
@dylandreimerink dylandreimerink requested a review from joamaki June 13, 2024 15:38
@joamaki joamaki added the release-note/misc This PR makes changes that have no direct user impact. label Jun 14, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 14, 2024
Contributor

@joamaki joamaki left a comment

Thanks! Just a few non-blocking nits.

This commit adds alternative implementations of methods of ImmSet:
 * InsertNew(xs ...T)
 * DeleteNew(xs ...T)
 * UnionNew(s2 ImmSet[T])
 * DifferenceNew(s2 ImmSet[T])
and benchmarks these implementations against the existing ones.

Benchmarking results:
 * for Insert, the proposed method becomes faster already with the
   container of size 1000, and then it performed 10x faster for size
   10,000 and 100x faster for size 100,000;
 * for Delete, the proposed method becomes faster already with the
   container of size 1000, and then it performed ~5x faster for size
   10,000;
 * for Difference, the proposed method was already 4x faster for size
   100, and then it performed 7x faster for size 1000, 35x faster
   for size 10,000, and 193x faster for size 100,000;
 * for Union, the proposed method performs slightly faster, but gains
   do not visibly grow with increasing size.

Theoretically, the proposed solutions have improved computational
complexity:
 * the complexity of Insert is O(len(s.xs)*len(xs)), and the complexity
   of InsertNew is O(len(s.xs)+len(xs));
 * the complexity of Delete is O(len(s.xs)*len(xs)), and the complexity
   of DeleteNew is O(len(s.xs)+len(xs));
 * the complexity of Difference is O(len(s.xs)*len(s2.xs)) because it
   uses Delete internally, and the complexity of DifferenceNew is
   O(len(s.xs)+len(s2.xs));
 * the complexity of Union is harder to estimate: it involves sorting a
   slice of size n=len(s.xs)+len(s2.xs), but this slice is a
   concatenation of two sorted slices, so most likely this does not lead
   to the usual O(n*log(n)) complexity; of course, it is at least O(n);
   the complexity of UnionNew is O(n).

Signed-off-by: Damian Sawicki <dsawicki@google.com>
The rationale is given in the previous commit message.

Signed-off-by: Damian Sawicki <dsawicki@google.com>
This adds a check to the ImmSet methods Insert and Delete for
whether a single element or multiple elements are being inserted or
deleted. Depending on that, two different algorithms are used. For a
single element, both algorithms are linear in the size of the ImmSet,
but we choose the one that benchmarking shows to be faster.

Signed-off-by: Damian Sawicki <dsawicki@google.com>
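
The single-versus-multiple dispatch described in the last commit message could look roughly like the sketch below. This is an illustration under stated assumptions, not the merged code: the names `insert`, `insertOne`, and `insertMany` are hypothetical, elements are assumed to satisfy `cmp.Ordered`, and the single-element path uses binary search plus one copy, which is one possible linear-time choice (the commit message does not say which algorithm was picked).

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// insertOne adds x to the sorted, duplicate-free slice xs.
// The search is logarithmic, but copying keeps the whole call linear.
func insertOne[T cmp.Ordered](xs []T, x T) []T {
	idx, found := slices.BinarySearch(xs, x)
	if found {
		return xs // already present; the set is unchanged
	}
	return slices.Insert(slices.Clone(xs), idx, x)
}

// insertMany sorts and deduplicates a copy of ys, then merges it with xs
// in a single pass over both slices.
func insertMany[T cmp.Ordered](xs, ys []T) []T {
	ys = slices.Clone(ys) // do not reorder the caller's arguments
	slices.Sort(ys)
	ys = slices.Compact(ys)

	out := make([]T, 0, len(xs)+len(ys))
	i, j := 0, 0
	for i < len(xs) && j < len(ys) {
		switch {
		case xs[i] < ys[j]:
			out = append(out, xs[i])
			i++
		case xs[i] > ys[j]:
			out = append(out, ys[j])
			j++
		default: // equal elements: keep a single copy
			out = append(out, xs[i])
			i++
			j++
		}
	}
	out = append(out, xs[i:]...)
	return append(out, ys[j:]...)
}

// insert picks the algorithm based on how many elements are being added.
func insert[T cmp.Ordered](xs []T, ys ...T) []T {
	switch len(ys) {
	case 0:
		return xs
	case 1:
		return insertOne(xs, ys[0])
	default:
		return insertMany(xs, ys)
	}
}

func main() {
	s := []int{1, 3, 5}
	fmt.Println(insert(s, 4))          // [1 3 4 5]
	fmt.Println(insert(s, 6, 2, 3, 2)) // [1 2 3 5 6]
	fmt.Println(s)                     // [1 3 5]
}
```

Because the original slice is never modified, earlier versions of the set remain valid after each call.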
@dylandreimerink
Member

/test

@dylandreimerink dylandreimerink added this pull request to the merge queue Jun 15, 2024
Merged via the queue into cilium:main with commit 3a0a3fc Jun 15, 2024
63 of 64 checks passed
Labels
kind/community-contribution This was a contribution made by a community member. release-note/misc This PR makes changes that have no direct user impact.
3 participants