
Create array in-place without an intermediate Vec #1

Merged
merged 4 commits into Kromey:main on May 10, 2021

Conversation

@hniksic (Contributor) commented May 7, 2021

Here is a commit that should speed up deserialization by avoiding the intermediate allocation.

Although the unsafe version looks complex compared to the original code, creating a vector for each array can be a significant performance hit, especially in code bases with many small arrays. (Generics are often used to create small arrays, with sizes like 2 or 3, to represent two- or three-dimensional vectors.)

@Kromey (Owner) commented May 7, 2021

Thank you!

As I mentioned on Reddit, I'm going to sit on this one for now. It looks like a great contribution, but unfortunately I simply lack the knowledge and experience right now to fully understand the safety of this code, and won't feel comfortable about merging it until I do.

@Kromey added the "enhancement" (New feature or request) label May 7, 2021
@hniksic (Contributor, Author) commented May 7, 2021

It might help to know that the safety analysis is based on (i.e., copied verbatim from) the MaybeUninit documentation, which shows how to do the exact same thing, only for arrays of fixed size and without the serde part. So one could say that the approach, if not the actual code, was vetted by experts.

The biggest difference is that the code in the PR drops the already-deserialized array elements in case of error.
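A sketch of the approach (illustrative only, not the PR's exact code — the `next` closure stands in for serde's `seq.next_element()`):

```rust
use std::mem::MaybeUninit;

// Fill a fixed-size array in place. On error, drop only the elements
// that were already initialized, then propagate the error.
fn try_fill<T, E, const N: usize>(
    mut next: impl FnMut(usize) -> Result<T, E>,
) -> Result<[T; N], E> {
    // SAFETY: an array of MaybeUninit<T> is allowed to be uninitialized.
    let mut arr: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
    for i in 0..N {
        match next(i) {
            Ok(val) => arr[i] = MaybeUninit::new(val),
            Err(e) => {
                // Drop the elements initialized so far before bailing out.
                for elem in &mut arr[..i] {
                    unsafe { elem.assume_init_drop() };
                }
                return Err(e);
            }
        }
    }
    // SAFETY: every element was initialized by the loop above.
    Ok(arr.map(|elem| unsafe { elem.assume_init() }))
}
```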

@Kromey (Owner) commented May 9, 2021

I was curious about the actual performance difference, and had a few minutes this morning so whipped up a quick benchmark:

Vec buffer              time:   [20.287 ns 20.352 ns 20.434 ns]   
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

MaybeUninit array            time:   [1.4959 ns 1.4983 ns 1.5003 ns]          
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

I'm not sure what it means that there's significantly more "outliers" with the MaybeUninit array (this is consistent across multiple runs), but the time speaks volumes about the benefits of this approach!

@hniksic (Contributor, Author) commented May 9, 2021

That's really impressive. Although it's not surprising that an allocation would take about 20 ns while writing into the array directly is (almost) free, it's another thing to see it in an actual measurement.

What array sizes did you use for the benchmark, and what was the serialization format?

@hniksic (Contributor, Author) commented May 9, 2021

I've now pushed a commit to the PR that avoids the whole dropping loop if T doesn't need dropping in the first place. It shouldn't affect the no-error path, but it's an easy optimization of the error path, in case the compiler is not smart enough to realize the whole loop is in fact a no-op.
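The optimization can be sketched as a hypothetical helper (not the PR's exact code):

```rust
use std::mem::{self, MaybeUninit};

// When T has no drop glue, skip the cleanup loop entirely, so the error
// path costs nothing even if the optimizer misses that the loop is a no-op.
// The caller must guarantee every element in `initialized` has been written.
fn drop_initialized<T>(initialized: &mut [MaybeUninit<T>]) {
    if mem::needs_drop::<T>() {
        for elem in initialized {
            // SAFETY: upheld by the caller's guarantee above.
            unsafe { elem.assume_init_drop() };
        }
    }
}
```

(`MaybeUninit::assume_init_drop` itself calls `ptr::drop_in_place` on the value without moving it.)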

On an unrelated note, I now see that the documentation example drops the initialized elements using ptr::drop_in_place(elem.as_mut_ptr()), which might be slightly more efficient because it never moves any of the array contents. On the other hand, my approach seems (to me) more readable and easier to reason about safety-wise, so further micro-optimizing the error path is probably not worth it.

@hniksic (Contributor, Author) commented May 9, 2021

I've now pushed another commit to the PR, which extracts the "array-filling" functionality into a separate function and provides both a safe version (containing your original code, which is really elegant) and my unsafe version, switching between them based on a configuration switch. This switch could be exported through a crate "feature", so that users who prefer the no-unsafe version can request it.
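The shape of such a switch might look like the following sketch; the feature name `safe_impl`, the `fill_array` name, and its signature are all made up for illustration and are not the PR's actual code:

```rust
use std::mem::MaybeUninit;

// Hypothetical compile-time switch between a safe and an unsafe
// implementation of in-place array filling.
#[cfg(feature = "safe_impl")]
fn fill_array<T, const N: usize>(mut next: impl FnMut() -> T) -> [T; N] {
    // Safe version: let the standard library build the array element by element.
    std::array::from_fn(|_| next())
}

#[cfg(not(feature = "safe_impl"))]
fn fill_array<T, const N: usize>(mut next: impl FnMut() -> T) -> [T; N] {
    // Unsafe version: write each element into uninitialized storage.
    let mut arr: [MaybeUninit<T>; N] = unsafe { MaybeUninit::uninit().assume_init() };
    for slot in &mut arr {
        *slot = MaybeUninit::new(next());
    }
    // SAFETY: every slot was written by the loop above.
    arr.map(|elem| unsafe { elem.assume_init() })
}
```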

Feel free to revert this commit if it is not what you want and you decide to merge the PR.

If you decide to use it, it might be a good idea to re-run the benchmark, just in case exposing seq.next_element() through iterator abstraction made things slower. (into_array could then accept impl SeqAccess<'de>, but that requires additional fiddling with serde-specific lifetimes and generics that would be nicer to avoid.)

@Kromey (Owner) commented May 10, 2021

What array sizes did you use for the benchmark, and what was the serialization format?

Going on your theory (which I happen to agree with) that lots of small arrays are likely to be the most common use case for this crate, and because I wanted to target the benchmark at the allocation itself rather than at the "fill 'er up" step that populates the data, I benchmarked Vec::with_capacity(3) against [MaybeUninit<usize>; 3]. Both were populated with a simple for i in 0..3 loop that inserted i * 3. To keep the compiler from treating the function as a no-op and optimizing it away, the result was returned to the benchmarker (I was using Criterion). I didn't actually serialize anything.
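In other words, the two benchmarked bodies looked roughly like this (a sketch reconstructed from the description above; the Criterion measurement harness is omitted):

```rust
use std::mem::MaybeUninit;

fn via_vec() -> Vec<usize> {
    let mut v = Vec::with_capacity(3); // heap allocation happens here
    for i in 0..3 {
        v.push(i * 3);
    }
    v
}

fn via_maybe_uninit() -> [usize; 3] {
    // SAFETY: an array of MaybeUninit may start out uninitialized.
    let mut arr: [MaybeUninit<usize>; 3] = unsafe { MaybeUninit::uninit().assume_init() };
    for i in 0..3 {
        arr[i] = MaybeUninit::new(i * 3);
    }
    // SAFETY: all three slots were written by the loop above.
    arr.map(|e| unsafe { e.assume_init() })
}
```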

It's conceivable that "initializing" the MaybeUninit carries some overhead, in which case there might be a cutoff somewhere at which allocating a Vec is actually cheaper. I'm not sure it would be worth chasing that possibility down, however.

I'm going to give your new commits a look-through, but barring anything shocking in them I think I'm ready to merge your PR.

@Kromey (Owner) left a comment

I like these changes, especially after my own benchmarking showed such a marked performance difference between my original approach and this one. After reviewing this and the MaybeUninit documentation, I'm reasonably comfortable that the use of unsafe here is safe.

@Kromey (Owner) left a comment

Please roll back this latest commit. I think the option to choose between safe/unsafe versions at compile time, while interesting, adds too much additional complexity while offering little if any gain in return.

@hniksic (Contributor, Author) commented May 10, 2021

Please roll back this latest commit, I think the option to provide safe/unsafe versions at compile time, while interesting, adds too much additional complexity while offering little if any gain in return.

Fair enough - done.

@Kromey Kromey merged commit 5a69ec5 into Kromey:main May 10, 2021
@Kromey (Owner) commented May 10, 2021

Thank you!
