ImmSet optimisation for multi inserts or multi deletions #33138

DamianSawicki · 2024-06-13T13:56:13Z


This PR proposes new implementations with lower computational complexity for the following methods of ImmSet:

func (s ImmSet[T]) Insert(xs ...T) ImmSet[T],
func (s ImmSet[T]) Delete(xs ...T) ImmSet[T],
func (s ImmSet[T]) Difference(s2 ImmSet[T]) ImmSet[T],
func (s ImmSet[T]) Union(s2 ImmSet[T]) ImmSet[T].

The first commit of the PR contains benchmarks comparing the existing and the new implementations. In my runs, the results were as follows:

for Insert, the proposed method is already faster at container size 1000, and it is 10x faster at size 10,000 and 100x faster at size 100,000;
for Delete, the proposed method is already faster at container size 1000, and it is ~5x faster at size 10,000;
for Difference, the proposed method is already 4x faster at size 100, and it is 7x faster at size 1000, 35x faster at size 10,000, and 193x faster at size 100,000;
for Union, the proposed method is slightly faster, but the gains do not visibly grow with increasing size.

The proposed implementation has improved computational complexity, lowering the complexity of Insert, Delete, and Difference from quadratic to linear:

the complexity of Insert is O(len(s.xs)*len(xs)), and the complexity of InsertNew is O(len(s.xs)+len(xs));
the complexity of Delete is O(len(s.xs)*len(xs)), and the complexity of DeleteNew is O(len(s.xs)+len(xs));
the complexity of Difference is O(len(s.xs)*len(s2.xs)) because it uses Delete internally, and the complexity of DifferenceNew is O(len(s.xs)+len(s2.xs));
the complexity of Union is harder to estimate: it sorts a slice of size n:=len(s.xs)+len(s2.xs), but because this slice is a concatenation of two sorted slices, the sort most likely does not reach the usual O(n*log(n)) cost; in any case, it is at least O(n); the complexity of UnionNew is O(n).

EDIT: I see no obvious impact of changes applied after code review on the benchmarking results.

DamianSawicki · 2024-06-13T13:58:00Z

/assign joamaki

maintainer-s-little-helper · 2024-06-13T14:00:07Z

Commit 79ff848 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

joamaki

Thanks! Just a few non-blocking nits.

pkg/container/immset.go

This commit adds alternative implementations of methods of ImmSet:
* InsertNew(xs ...T)
* DeleteNew(xs ...T)
* UnionNew(s2 ImmSet[T])
* DifferenceNew(s2 ImmSet[T])
and benchmarks these implementations against the existing ones.

Benchmarking results:
* for Insert, the proposed method is already faster at container size 1000, and it is 10x faster at size 10,000 and 100x faster at size 100,000;
* for Delete, the proposed method is already faster at container size 1000, and it is ~5x faster at size 10,000;
* for Difference, the proposed method is already 4x faster at size 100, and it is 7x faster at size 1000, 35x faster at size 10,000, and 193x faster at size 100,000;
* for Union, the proposed method is slightly faster, but the gains do not visibly grow with increasing size.

Theoretically, the proposed solutions have improved computational complexity:
* the complexity of Insert is O(len(s.xs)*len(xs)), and the complexity of InsertNew is O(len(s.xs)+len(xs));
* the complexity of Delete is O(len(s.xs)*len(xs)), and the complexity of DeleteNew is O(len(s.xs)+len(xs));
* the complexity of Difference is O(len(s.xs)*len(s2.xs)) because it uses Delete internally, and the complexity of DifferenceNew is O(len(s.xs)+len(s2.xs));
* the complexity of Union is harder to estimate: it sorts a slice of size n=len(s.xs)+len(s2.xs), but because this slice is a concatenation of two sorted slices, the sort most likely does not reach the usual O(n*log(n)) cost; in any case, it is at least O(n); the complexity of UnionNew is O(n).

Signed-off-by: Damian Sawicki <dsawicki@google.com>

The rationale is given in the previous commit message.

Signed-off-by: Damian Sawicki <dsawicki@google.com>

This adds a check to the ImmSet methods Insert and Delete for whether a single element or multiple elements are being inserted or deleted. Depending on the result, one of two algorithms is used. For a single element, both algorithms are linear in the size of the ImmSet, so we choose the one that benchmarking shows to be faster.

Signed-off-by: Damian Sawicki <dsawicki@google.com>

dylandreimerink · 2024-06-14T12:45:23Z

/test

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 13, 2024

DamianSawicki changed the title ~~Immset optimisation for multi inserts or multi deletions~~ ImmSet optimisation for multi inserts or multi deletions Jun 13, 2024

maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 13, 2024

github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jun 13, 2024

DamianSawicki force-pushed the immset_optimisation branch from 79ff848 to ac4b3ec Compare June 13, 2024 14:02

maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 13, 2024

DamianSawicki marked this pull request as ready for review June 13, 2024 14:03

DamianSawicki requested a review from a team as a code owner June 13, 2024 14:03

DamianSawicki requested a review from dylandreimerink June 13, 2024 14:03

dylandreimerink requested a review from joamaki June 13, 2024 15:38

joamaki added the release-note/misc This PR makes changes that have no direct user impact. label Jun 14, 2024

maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 14, 2024

joamaki approved these changes Jun 14, 2024


DamianSawicki added 3 commits June 14, 2024 10:39

Replace existing ImmSet methods with proposed ones

51f1ab7

The rationale is given in the previous commit message.

Signed-off-by: Damian Sawicki <dsawicki@google.com>

DamianSawicki force-pushed the immset_optimisation branch from ac4b3ec to bfb195f Compare June 14, 2024 11:27

dylandreimerink enabled auto-merge June 14, 2024 12:45

dylandreimerink added this pull request to the merge queue Jun 15, 2024

Merged via the queue into cilium:main with commit 3a0a3fc Jun 15, 2024
63 of 64 checks passed
