Remove hot calls to `memmove` by replacing `Vec<u8>` with `VecDeque<u8>` #17

paolobarbolini · 2022-04-03T10:20:33Z

This removes the need to constantly drain, which is one of the hottest functions on the flamegraph.

From a quick benchmark on a large file decoding goes from about 10.5s to 3.5s. As a comparison zstd 1.4.8 takes 1.2s. We're getting closer 👀

Before

After

KillingSpark · 2022-04-03T11:09:02Z

Thats great! I'll take a look later today and merge this

I wonder how I missed that. I seem to remember that the code taking the most time was the bitreader stuff

paolobarbolini · 2022-04-03T11:44:11Z

I wonder how I missed that. I seem to remember that the code taking the most time was the bitreader stuff

I guess that might have been caused by me trying ruzstd on a 600 MB file compressed with zstd --ultra -22 😅

KillingSpark · 2022-04-03T13:30:15Z

The TODO around the extend_from_within needs to be resolved. This is a massive performance boost for files that have long regions that can be copied from the existing buffer. It should be possible to do, it will likely require some unsafe though. I had an unsafe implementation for this before Vec::extend_from_within existed, and I am willing to have some unsafe in there for this big boost in performance.

If you don't want to do that I can merge this and do it myself when I find the time. The code is still correct, it is just being pretty inefficient in these cases.

Otherwise just a few nitpicks,

drain_to should probably call f before the writes to the hash. Because the bytes only get drained when the calls succeed. This might result in the data being put into the hash multiple times. There might be legitimate reasons to call this again even if f fails for some reason. I apparently failed to see this when I wrote this code, but I think it would be better this way. Same applies to all places where data is drained. It should first be given to the fallible functions and only if they succeed the data should be given to the hasher.
drain_to should probably not call f if n2 is 0. I know it is just a slice with no elements but I think it would "nice" to not do that. And the check is easy enough to do

So (imo) it should be

   f(slice1)?;
   hash(slice1);
   if !slice2.is_empty() {
       f(slice2)?;
       hash(slice2);
   }

drain_to_writer should probably be implemented using drain_to too, like drain_to_window_size_writer right? Maybe I am just overlooking something here?

paolobarbolini · 2022-04-03T14:56:43Z

Done. Unfortunately VecDeque doesn't provide the unsafe APIs required to reproduce the equivalent of extend_from_within so we have to go though an intermediate buffer, which brings a small performance boost anyway for my big file so I guess it's fine?

The size for buf is a random number off the top of my head, so we can change it to a better one.

KillingSpark · 2022-04-03T15:50:24Z

Ah right, I remembered why VecDequeue cant implement this nicely. I might play around with a custom buffer structure for this. We don't need all the features a VecDequeue provides. But that is for another day, I'd call this good enough. I too am not sure what a good size for that buffer is, but 4kb seems fine.

One last thing just came to my mind, regarding the functions in drain_to. Right now it is like this:

   f(slice1)?;
   hash(slice1);
   if !slice2.is_empty() {
       f(slice2)?;
       hash(slice2);
   }
   drain(amount);

When we commit slice1 to the hasher, we should also drain that. So when f(slice2) returns an error, we want to still call drain(0..n1). So something like this maybe?

    f(slice1)?;
    hash(slice1);

    if n2 != 0 {
        let res = f(slice2);
        let drain_amount;
        if res.is_ok() {
            hash(slice2);
            drain_amount = n1+n2;
        } else {
            drain_amount = n1;
        }
        drain(drain_amount)
        res
    } else {
        drain(n1);
        Ok(())
    }

Sorry for doing another roundtrip but I think it's good to fix this in this PR too. The drain_to is a very good idea, and having it do the right thing when it's merged in would be nice :)

paolobarbolini · 2022-04-04T07:43:00Z

I'll take a look at your proposal later in the day.

This PR just went by in my notifications and it could be very cool to leverage this implementation in the future if it gets optimized more. rust-lang/rust#95632

paolobarbolini · 2022-04-04T14:05:37Z

Done. Using DrainGuard also makes this panic resistant

KillingSpark · 2022-04-05T19:04:06Z

Very nice. Thank you!

…=the8472 Add VecDeque::extend from vec::IntoIter and slice::Iter specializations Inspired from the [`Vec` `SpecExtend` implementation](https://github.com/rust-lang/rust/blob/027a232755fa9728e9699337267f6675dfd0a8ba/library/alloc/src/vec/spec_extend.rs), but without the specialization for `TrustedLen` which I'll look into in the future. Should help rust-lang#95632 and KillingSpark/zstd-rs#17 ## Benchmarks Before ``` test vec_deque::bench_extend_bytes ... bench: 862 ns/iter (+/- 10) test vec_deque::bench_extend_vec ... bench: 883 ns/iter (+/- 19) ``` After ``` test vec_deque::bench_extend_bytes ... bench: 8 ns/iter (+/- 0) test vec_deque::bench_extend_vec ... bench: 24 ns/iter (+/- 1) ```

Add VecDeque::extend from vec::IntoIter and slice::Iter specializations Inspired from the [`Vec` `SpecExtend` implementation](https://github.com/rust-lang/rust/blob/027a232755fa9728e9699337267f6675dfd0a8ba/library/alloc/src/vec/spec_extend.rs), but without the specialization for `TrustedLen` which I'll look into in the future. Should help #95632 and KillingSpark/zstd-rs#17 ## Benchmarks Before ``` test vec_deque::bench_extend_bytes ... bench: 862 ns/iter (+/- 10) test vec_deque::bench_extend_vec ... bench: 883 ns/iter (+/- 19) ``` After ``` test vec_deque::bench_extend_bytes ... bench: 8 ns/iter (+/- 0) test vec_deque::bench_extend_vec ... bench: 24 ns/iter (+/- 1) ```

paolobarbolini added 3 commits April 3, 2022 10:50

Optimize calls to drain

8157cd6

Simplify draining

a348eb4

Replace Vec<u8> with VecDeque<u8>

92d89d1

paolobarbolini added 2 commits April 3, 2022 16:18

Address code review comments around draining

97d4359

Bring back the equivalent of extend_from_within

124fc36

Don't drain more than we were successfully able to write

e4b2ab8

Use .min instead of manual implementation

aa03782

KillingSpark merged commit 7779f8b into KillingSpark:master Apr 5, 2022

paolobarbolini mentioned this pull request Apr 10, 2022

Add VecDeque::extend from vec::IntoIter and slice::Iter specializations rust-lang/rust#95904

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove hot calls to `memmove` by replacing `Vec<u8>` with `VecDeque<u8>` #17

Remove hot calls to `memmove` by replacing `Vec<u8>` with `VecDeque<u8>` #17

paolobarbolini commented Apr 3, 2022 •

edited

Loading

KillingSpark commented Apr 3, 2022

paolobarbolini commented Apr 3, 2022 •

edited

Loading

KillingSpark commented Apr 3, 2022

paolobarbolini commented Apr 3, 2022 •

edited

Loading

KillingSpark commented Apr 3, 2022

paolobarbolini commented Apr 4, 2022 •

edited

Loading

paolobarbolini commented Apr 4, 2022 •

edited

Loading

KillingSpark commented Apr 5, 2022

Remove hot calls to memmove by replacing Vec<u8> with VecDeque<u8> #17

Remove hot calls to memmove by replacing Vec<u8> with VecDeque<u8> #17

Conversation

paolobarbolini commented Apr 3, 2022 • edited Loading

Before

After

KillingSpark commented Apr 3, 2022

paolobarbolini commented Apr 3, 2022 • edited Loading

KillingSpark commented Apr 3, 2022

paolobarbolini commented Apr 3, 2022 • edited Loading

KillingSpark commented Apr 3, 2022

paolobarbolini commented Apr 4, 2022 • edited Loading

paolobarbolini commented Apr 4, 2022 • edited Loading

KillingSpark commented Apr 5, 2022

Remove hot calls to `memmove` by replacing `Vec<u8>` with `VecDeque<u8>` #17

Remove hot calls to `memmove` by replacing `Vec<u8>` with `VecDeque<u8>` #17

paolobarbolini commented Apr 3, 2022 •

edited

Loading

paolobarbolini commented Apr 3, 2022 •

edited

Loading

paolobarbolini commented Apr 3, 2022 •

edited

Loading

paolobarbolini commented Apr 4, 2022 •

edited

Loading

paolobarbolini commented Apr 4, 2022 •

edited

Loading