Optimised extend #127

jdonszelmann · 2023-09-15T22:50:28Z

Closes #124

github-actions · 2023-09-15T22:51:41Z

🚀 Deployed on https://65059694ca32910d0ff42051--ringbuffer.netlify.app

tertsdiepraam · 2023-09-15T22:54:06Z

src/ringbuffer_trait.rs

+    ///
+    /// Depending on the RingBuffer implementation, may be faster than inserting items in a loop
+    ///
+    /// Note that this function is same as [`extend`] except that it is


Is this actually specialized now or will it be?

in constgeneric yes

tertsdiepraam · 2023-09-15T22:54:33Z

src/ringbuffer_trait.rs

-    where
-        T: Default,
+        where
+            T: Default,


tertsdiepraam · 2023-09-15T22:55:07Z

src/with_const_generics.rs

    {
        #[allow(clippy::let_unit_value)]
-        let _ = Self::ERROR_CAPACITY_IS_NOT_ALLOWED_TO_BE_ZERO;
+            let _ = Self::ERROR_CAPACITY_IS_NOT_ALLOWED_TO_BE_ZERO;


This seems weird?

tertsdiepraam · 2023-09-15T23:05:46Z

src/with_const_generics.rs

@@ -418,6 +634,94 @@ mod tests {
        }
    }

+    struct WeirdIterator<T: IntoIterator>(<T as IntoIterator>::IntoIter, SizeHint);


Weirderator

jdonszelmann · 2023-09-16T11:39:23Z

All improvements are over a baseline of extend-by-push

Batch Size = 1

extend too many         time:   [7.2581 µs 7.3719 µs 7.4832 µs]
                        change: [-50.503% -49.899% -49.252%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

extend many too many    time:   [7.2662 µs 7.3626 µs 7.4526 µs]
                        change: [-65.328% -64.824% -64.132%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

extend exact cap        time:   [7.2107 µs 7.3238 µs 7.4474 µs]
                        change: [-45.954% -45.304% -44.613%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

extend too few          time:   [4.4391 µs 4.4725 µs 4.5156 µs]
                        change: [-43.002% -41.961% -40.831%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

extend after one        time:   [10.913 µs 10.960 µs 11.013 µs]
                        change: [+47.418% +50.702% +54.065%] (p = 0.00 < 0.05)
                        Performance has regressed.

Batch size = 32

extend too many         time:   [8.7412 µs 8.8189 µs 8.8939 µs]
                        change: [-40.068% -39.467% -38.953%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

extend many too many    time:   [8.7698 µs 8.8540 µs 8.9391 µs]
                        change: [-57.313% -56.996% -56.660%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

extend exact cap        time:   [8.7375 µs 8.8073 µs 8.8730 µs]
                        change: [-33.538% -32.973% -32.406%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

extend too few          time:   [5.1076 µs 5.1550 µs 5.2014 µs]
                        change: [-39.999% -37.896% -35.822%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 37 outliers among 100 measurements (37.00%)
  23 (23.00%) low severe
  12 (12.00%) high mild
  2 (2.00%) high severe

extend after one        time:   [5.9613 µs 5.9809 µs 5.9962 µs]
                        change: [-26.778% -24.424% -22.282%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 25 outliers among 100 measurements (25.00%)
  24 (24.00%) low severe
  1 (1.00%) high mild

Batch size = 256

extend too many         time:   [7.0522 µs 7.1662 µs 7.2915 µs]
                        change: [-51.959% -51.049% -50.207%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

extend many too many    time:   [6.8534 µs 6.9608 µs 7.0709 µs]
                        change: [-66.835% -66.522% -66.186%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

extend exact cap        time:   [6.8443 µs 6.9401 µs 7.0342 µs]
                        change: [-48.258% -47.694% -47.152%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

extend too few          time:   [4.3663 µs 4.3792 µs 4.3902 µs]
                        change: [-46.243% -44.938% -43.633%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  23 (23.00%) low severe

extend after one        time:   [7.0197 µs 7.0315 µs 7.0445 µs]
                        change: [-7.1696% -5.3259% -3.3635%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Batch size = 1024

extend too many         time:   [6.8468 µs 7.0021 µs 7.1560 µs]
                        change: [-54.049% -53.341% -52.581%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

extend many too many    time:   [6.8626 µs 6.9784 µs 7.0967 µs]
                        change: [-66.826% -66.451% -66.018%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

extend exact cap        time:   [6.9517 µs 7.0748 µs 7.1877 µs]
                        change: [-47.808% -47.035% -46.368%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

extend too few          time:   [4.3263 µs 4.3450 µs 4.3627 µs]
                        change: [-47.223% -45.877% -44.443%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 25 outliers among 100 measurements (25.00%)
  23 (23.00%) low severe
  2 (2.00%) high mild

extend after one        time:   [8.1760 µs 8.2432 µs 8.3070 µs]
                        change: [+2.1848% +4.7733% +7.4532%] (p = 0.00 < 0.05)
                        Performance has regressed.

Binary search to a good batch Size; Batch size = 64

extend too many         time:   [7.4389 µs 7.6824 µs 7.9425 µs]
                        change: [-50.277% -49.191% -47.914%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

extend many too many    time:   [7.3194 µs 7.4105 µs 7.5069 µs]
                        change: [-64.339% -64.007% -63.710%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

extend exact cap        time:   [7.1240 µs 7.2078 µs 7.2905 µs]
                        change: [-45.487% -44.958% -44.426%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

extend too few          time:   [4.3125 µs 4.3217 µs 4.3302 µs]
                        change: [-45.560% -44.525% -43.404%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 100 measurements (24.00%)
  23 (23.00%) low severe
  1 (1.00%) high mild

extend after one        time:   [6.9133 µs 6.9211 µs 6.9297 µs]
                        change: [-7.6948% -5.7499% -3.7005%] (p = 0.00 < 0.05)
                        Performance has improved.

Batch size = 16

extend too many         time:   [8.3775 µs 8.4665 µs 8.5494 µs]
                        change: [-42.998% -42.399% -41.812%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

extend many too many    time:   [8.4321 µs 8.5265 µs 8.6122 µs]
                        change: [-59.466% -59.067% -58.669%] (p = 0.00 < 0.05)
                        Performance has improved.

extend exact cap        time:   [8.2966 µs 8.4105 µs 8.5225 µs]
                        change: [-36.950% -36.340% -35.688%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

extend too few          time:   [4.8563 µs 4.8749 µs 4.8912 µs]
                        change: [-40.031% -38.532% -37.025%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 100 measurements (24.00%)
  23 (23.00%) low severe
  1 (1.00%) high severe

extend after one        time:   [6.6999 µs 6.7177 µs 6.7321 µs]
                        change: [-15.276% -13.279% -11.167%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 25 outliers among 100 measurements (25.00%)
  25 (25.00%) low mild

Batch size = 30

extend too many         time:   [7.8955 µs 7.9927 µs 8.0828 µs]
                        change: [-46.726% -46.068% -45.483%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

extend many too many    time:   [7.8121 µs 7.9254 µs 8.0271 µs]
                        change: [-62.714% -62.287% -61.863%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

extend exact cap        time:   [7.7416 µs 7.8717 µs 7.9940 µs]
                        change: [-41.736% -41.071% -40.390%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

extend too few          time:   [4.3390 µs 4.3541 µs 4.3671 µs]
                        change: [-47.165% -45.753% -44.411%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 22 outliers among 100 measurements (22.00%)
  22 (22.00%) low severe

extend after one        time:   [5.4803 µs 5.4959 µs 5.5123 µs]
                        change: [-27.750% -26.327% -24.778%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

jdonszelmann · 2023-09-16T11:41:12Z

Although surprising to me, a batch size of 30 seems to work very well for large extends when the buffer does not start out empty. This feature is now behind a feature flag, since technically it's not 100% tested but it seems to perform very well in all tests I could throw at it.

tertsdiepraam · 2023-09-16T20:03:03Z

src/with_const_generics.rs

+                // Safety: clear above
+                unsafe { self.finish_iter::<BATCH_SIZE>(iter) };
+            } else if self.is_empty() {
+                self.clear();


I think you can move this into finish_iter and make it safe. Maybe rename it to new_from_iter or something.

tertsdiepraam · 2023-09-16T20:12:59Z

src/with_const_generics.rs

+                self.clear();
+
+                // Safety: clear above
+                unsafe { self.finish_iter::<BATCH_SIZE>(iter) };


Why is this not extend_batched_internal?

tertsdiepraam · 2023-09-16T20:26:16Z

src/with_const_generics.rs

+            // repeat until we find an empty batch
+            loop {
+                // fill up a batch
+                let how_full = Self::fill_batch(&mut batch, &mut other);


I think the code might be simpler if you fill the batch with min(BATCH_SIZE, CAP - len) instead. Then you never have to split the batch up and stuff.

If I understand correctly, finish_iter is necessary because you need the writeptr not to be equal to readptr or something? Maybe if you do the min thing, that might not be necessary? In any case, might be nice to document why and in what cases you need finish_iter.

No wait, I mean min(BATCH_SIZE, CAP - mod(writeptr, CAP)) I think. Whatever the number is of elements that fit until the end of the allocated space. So you don't have to split.

jdonszelmann added 3 commits September 15, 2023 15:24

add append

7ab9e3c

add extend from slice

d5a78de

batched extend

4e3b3ca

github-actions bot temporarily deployed to commit September 15, 2023 22:51 Inactive

github-actions bot temporarily deployed to pull request September 15, 2023 22:51 Inactive

tertsdiepraam reviewed Sep 15, 2023

View reviewed changes

definitely some UB here oof

0a68135

jdonszelmann marked this pull request as draft September 15, 2023 23:48

github-actions bot temporarily deployed to commit September 15, 2023 23:49 Inactive

github-actions bot temporarily deployed to pull request September 15, 2023 23:49 Inactive

more tests, miri doesn't mind

14824ab

github-actions bot temporarily deployed to commit September 16, 2023 09:11 Inactive

github-actions bot temporarily deployed to pull request September 16, 2023 09:11 Inactive

bench after one

90675b6

github-actions bot temporarily deployed to pull request September 16, 2023 11:40 Inactive

github-actions bot temporarily deployed to commit September 16, 2023 11:40 Inactive

batched extend implemented

cf0437d

jdonszelmann force-pushed the optimised_extend branch from 7568202 to cf0437d Compare September 16, 2023 11:41

github-actions bot temporarily deployed to commit September 16, 2023 11:43 Inactive

github-actions bot temporarily deployed to pull request September 16, 2023 11:43 Inactive

fix docs

fbb1f31

github-actions bot temporarily deployed to pull request September 16, 2023 11:50 Inactive

github-actions bot temporarily deployed to commit September 16, 2023 11:50 Inactive

tertsdiepraam reviewed Sep 16, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimised extend #127

Optimised extend #127

jdonszelmann commented Sep 15, 2023 •

edited

github-actions bot commented Sep 15, 2023 •

edited

tertsdiepraam Sep 15, 2023

jdonszelmann Sep 15, 2023

tertsdiepraam Sep 15, 2023

tertsdiepraam Sep 15, 2023

tertsdiepraam Sep 15, 2023

jdonszelmann commented Sep 16, 2023 •

edited

jdonszelmann commented Sep 16, 2023

tertsdiepraam Sep 16, 2023

tertsdiepraam Sep 16, 2023

tertsdiepraam Sep 16, 2023

tertsdiepraam Sep 17, 2023

Optimised extend #127

Are you sure you want to change the base?

Optimised extend #127

Conversation

jdonszelmann commented Sep 15, 2023 • edited

github-actions bot commented Sep 15, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdonszelmann commented Sep 16, 2023 • edited

Batch Size = 1

Batch size = 32

Batch size = 256

Batch size = 1024

Binary search to a good batch Size; Batch size = 64

Batch size = 16

Batch size = 30

jdonszelmann commented Sep 16, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdonszelmann commented Sep 15, 2023 •

edited

github-actions bot commented Sep 15, 2023 •

edited

jdonszelmann commented Sep 16, 2023 •

edited