
Create array in-place without an intermediate Vec #1

Merged
merged 4 commits into Kromey:main on May 10, 2021

Conversation

@hniksic (Contributor) commented May 7, 2021

Here is a commit that should speed up deserialization by avoiding the intermediate allocation.

Although the unsafe version looks complex compared to the original code, creating a vector for each array can be a significant performance hit, especially in code bases with many small arrays. (Generics are often used to create small arrays, with sizes like 2 or 3, to represent two- or three-dimensional vectors.)

@Kromey (Owner) commented May 7, 2021

Thank you!

As I mentioned on Reddit, I'm going to sit on this one for now. It looks like a great contribution, but unfortunately I simply lack the knowledge and experience right now to fully understand the safety of this code, and won't feel comfortable about merging it until I do.

@Kromey added the "enhancement" (New feature or request) label May 7, 2021
@hniksic (Contributor, Author) commented May 7, 2021

It might help to know that the safety analysis is based on (i.e., copied verbatim from) the MaybeUninit documentation, which shows how to do the exact same thing, only for arrays of fixed size and without the serde part. So one could say that the approach, if not the actual code, was vetted by experts.

The biggest difference is that the code in the PR drops the already-deserialized array elements in case of error.
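A sketch of the approach (illustrative only, not the PR's exact code — the `next` closure stands in for serde's `seq.next_element()`):

```rust
use std::mem::MaybeUninit;

// Fill a fixed-size array in place. On error, drop only the elements
// that were already initialized, then propagate the error.
fn try_fill<T, E, const N: usize>(
    mut next: impl FnMut(usize) -> Result<T, E>,
) -> Result<[T; N], E> {
    // SAFETY: an array of MaybeUninit<T> is allowed to be uninitialized.
    let mut arr: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
    for i in 0..N {
        match next(i) {
            Ok(val) => arr[i] = MaybeUninit::new(val),
            Err(e) => {
                // Drop the elements initialized so far before bailing out.
                for elem in &mut arr[..i] {
                    unsafe { elem.assume_init_drop() };
                }
                return Err(e);
            }
        }
    }
    // SAFETY: every element was initialized by the loop above.
    Ok(arr.map(|elem| unsafe { elem.assume_init() }))
}
```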

@Kromey (Owner) commented May 9, 2021

I was curious about the actual performance difference, and had a few minutes this morning so whipped up a quick benchmark:

Vec buffer              time:   [20.287 ns 20.352 ns 20.434 ns]   
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

MaybeUninit array            time:   [1.4959 ns 1.4983 ns 1.5003 ns]          
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

I'm not sure what it means that there's significantly more "outliers" with the MaybeUninit array (this is consistent across multiple runs), but the time speaks volumes about the benefits of this approach!

@hniksic (Contributor, Author) commented May 9, 2021

That's really impressive. Although it's not surprising that an allocation would take about 20 ns while writing into the array directly is (almost) free, it's another thing to see it in an actual measurement.

What array sizes did you use for the benchmark, and what was the serialization format?

@hniksic (Contributor, Author) commented May 9, 2021

I've now pushed a commit to the PR that avoids the whole dropping loop if T doesn't need dropping in the first place. It shouldn't affect the no-error path, but it's an easy optimization of the error path, in case the compiler is not smart enough to realize the whole loop is in fact a no-op.
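The optimization can be sketched as a hypothetical helper (not the PR's exact code):

```rust
use std::mem::{self, MaybeUninit};

// When T has no drop glue, skip the cleanup loop entirely, so the error
// path costs nothing even if the optimizer misses that the loop is a no-op.
// The caller must guarantee every element in `initialized` has been written.
fn drop_initialized<T>(initialized: &mut [MaybeUninit<T>]) {
    if mem::needs_drop::<T>() {
        for elem in initialized {
            // SAFETY: upheld by the caller's guarantee above.
            unsafe { elem.assume_init_drop() };
        }
    }
}
```

(`MaybeUninit::assume_init_drop` itself calls `ptr::drop_in_place` on the value without moving it.)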

On an unrelated note, I now see that the documentation example drops the initialized elements using ptr::drop_in_place(elem.as_mut_ptr()), which might be slightly more efficient because it never moves any of the array contents. On the other hand, my approach seems (to me) more readable and easier to reason about safety-wise, so further micro-optimizing the error path is probably not worth it.

@hniksic (Contributor, Author) commented May 9, 2021

I've now pushed another commit to the PR, which extracts the "array-filling" functionality into a separate function and provides both a safe version (containing your original code, which is really elegant) and my unsafe version, switching between them based on a configuration switch. This switch could be exported through a crate "feature", so that users who prefer the no-unsafe version can request it.
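The shape of such a switch might look like the following sketch; the feature name `safe_impl`, the `fill_array` name, and its signature are all made up for illustration and are not the PR's actual code:

```rust
use std::mem::MaybeUninit;

// Hypothetical compile-time switch between a safe and an unsafe
// implementation of in-place array filling.
#[cfg(feature = "safe_impl")]
fn fill_array<T, const N: usize>(mut next: impl FnMut() -> T) -> [T; N] {
    // Safe version: let the standard library build the array element by element.
    std::array::from_fn(|_| next())
}

#[cfg(not(feature = "safe_impl"))]
fn fill_array<T, const N: usize>(mut next: impl FnMut() -> T) -> [T; N] {
    // Unsafe version: write each element into uninitialized storage.
    let mut arr: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
    for slot in &mut arr {
        *slot = MaybeUninit::new(next());
    }
    // SAFETY: every slot was written by the loop above.
    arr.map(|elem| unsafe { elem.assume_init() })
}
```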

Feel free to revert this commit if it is not what you want and you decide to merge the PR.

If you decide to use it, it might be a good idea to re-run the benchmark, just in case exposing seq.next_element() through iterator abstraction made things slower. (into_array could then accept impl SeqAccess<'de>, but that requires additional fiddling with serde-specific lifetimes and generics that would be nicer to avoid.)

@Kromey (Owner) commented May 10, 2021

What array sizes did you use for the benchmark, and what was the serialization format?

Going on your theory (which I happen to agree with) that lots of small arrays are likely to be the most common use case for this crate, and because I wanted to target the benchmark at the allocation itself rather than at the "fill 'er up" step that populates the data, I benchmarked Vec::with_capacity(3) against [MaybeUninit<usize>; 3]. Both were populated with a simple for i in 0..3 loop that inserted i * 3. To keep the compiler from treating the function as a no-op and optimizing it away, the result was returned to the benchmarker (I was using Criterion). I didn't actually serialize anything.
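In other words, the two benchmarked bodies looked roughly like this (a sketch reconstructed from the description above; the Criterion measurement harness is omitted):

```rust
use std::mem::MaybeUninit;

fn via_vec() -> Vec<usize> {
    let mut v = Vec::with_capacity(3); // heap allocation happens here
    for i in 0..3 {
        v.push(i * 3);
    }
    v
}

fn via_maybe_uninit() -> [usize; 3] {
    // SAFETY: an array of MaybeUninit may start out uninitialized.
    let mut arr: [MaybeUninit<usize>; 3] = unsafe { MaybeUninit::uninit().assume_init() };
    for i in 0..3 {
        arr[i] = MaybeUninit::new(i * 3);
    }
    // SAFETY: all three slots were written by the loop above.
    arr.map(|e| unsafe { e.assume_init() })
}
```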

It's conceivable that "initializing" the MaybeUninit carries some overhead, in which case there might be a cutoff somewhere at which allocating a Vec is actually cheaper. I'm not sure it would be worth chasing that possibility down, however.

I'm going to give your new commits a look-through, but barring anything shocking in them I think I'm ready to merge your PR.

@Kromey (Owner) left a comment

I like these changes, especially after my own benchmarking showed such a marked performance difference between my original approach and this one. After reviewing this and the MaybeUninit documentation, I'm reasonably comfortable that the use of unsafe here is safe.

@Kromey (Owner) left a comment

Please roll back this latest commit. I think the option to choose between safe/unsafe versions at compile time, while interesting, adds too much additional complexity while offering little if any gain in return.

@hniksic (Contributor, Author) commented May 10, 2021

Please roll back this latest commit, I think the option to provide safe/unsafe versions at compile time, while interesting, adds too much additional complexity while offering little if any gain in return.

Fair enough - done.

@Kromey Kromey merged commit 5a69ec5 into Kromey:main May 10, 2021
@Kromey (Owner) commented May 10, 2021

Thank you!
