
Convert endianness while copying in read/write into methods #189

Closed
wants to merge 4 commits into from

Conversation

mina86
Contributor

@mina86 mina86 commented Dec 5, 2022

Rather than first copying data from the source to the destination buffer and
then performing an endianness adjustment, do the conversion while copying.
This means each byte is accessed only once, which (according to
benchmarks) speeds up the read_xxx_into and write_xxx_into methods:

| Benchmark                      | Before [ns/iter] | After [ns/iter] |
|--------------------------------|------------------|-----------------|
| slice_i64::read_big_endian     |  34,863  (±  30) | 23,656  (± 935) |
| slice_i64::read_little_endian  |  15,518  (±  19) | 13,362  (± 405) |
| slice_i64::write_big_endian    |  30,910  (± 109) | 23,123  (±  91) |
| slice_i64::write_little_endian |  14,924  (±  21) | 13,209  (± 180) |
| slice_u16::read_big_endian     |   7,492  (± 343) |  3,788  (±  16) |
| slice_u16::read_little_endian  |   3,366  (±   8) |  3,198  (±   3) |
| slice_u16::write_big_endian    |   4,066  (±   7) |  4,497  (±   8) |
| slice_u16::write_little_endian |   4,040  (± 946) |  3,193  (±   7) |
| slice_u64::read_big_endian     |  35,816  (± 251) | 23,259  (±  21) |
| slice_u64::read_little_endian  |  15,506  (±  86) | 13,365  (±  81) |
| slice_u64::write_big_endian    |  30,948  (±  63) | 23,102  (±  36) |
| slice_u64::write_little_endian |  14,938  (±  17) | 13,158  (±  18) |

The benchmarks were run on an AMD Ryzen 9 5900X 12-Core Processor.

I’m somewhat confused why the little-endian benchmarks show improvements,
but the results are reproducible. My best guess is that the
compiler fails to optimise out the `for v in $dst.iter_mut() { nop(); }`
loops currently present.
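For context, the single-pass idea can be sketched like this (a minimal standalone sketch, not byteorder’s actual macro; `read_u64_into_be` is a made-up name):

```rust
// Decode big-endian u64 values directly from the source bytes into the
// destination slice. Each byte is read exactly once; there is no
// separate memcpy followed by a byte-swap pass.
// `read_u64_into_be` is a hypothetical helper, not byteorder's API.
fn read_u64_into_be(src: &[u8], dst: &mut [u64]) {
    assert_eq!(src.len(), dst.len() * 8);
    for (chunk, out) in src.chunks_exact(8).zip(dst.iter_mut()) {
        *out = u64::from_be_bytes(chunk.try_into().unwrap());
    }
}

fn main() {
    let src = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2];
    let mut dst = [0u64; 2];
    read_u64_into_be(&src, &mut dst);
    assert_eq!(dst, [1, 2]);
    println!("ok");
}
```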

@BurntSushi
Owner

Hi, we spoke on reddit. I briefly skimmed this PR and it is touching a fair bit of code, including unsafe code. That makes it more difficult for me to review and my review latency is quite high. Is this the smallest change that can be made that satisfies what you're trying to do? Or can it be made smaller?

Add read/write into benchmark for u16 to test how the code behaves if
the type is smaller than the native register size and for i64 to test
that there’s nothing weird going on with signedness conversion.
@mina86
Contributor Author

mina86 commented Aug 21, 2023

Done. Note that I haven’t rerun the benchmarks, but Godbolt confirms that the old code uses a memcpy followed by a loop doing the conversion, while the new code skips the memcpy and converts while copying.

The unsafe code the previous version of this PR had was a simplified version of `[T]::as_chunks`, which is currently experimental. I like it because it removes the `.try_into().unwrap()` calls. The generated code is the same either way.
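For illustration, the trade-off looks roughly like this: `chunks_exact` yields `&[u8]` slices, so a fallible conversion to a fixed-size array is needed before `from_xx_bytes`, whereas the unstable `[T]::as_chunks` would yield `[u8; N]` arrays directly (sketch; `decode_u32s_be` is a hypothetical helper):

```rust
// chunks_exact yields &[u8] slices, so try_into().unwrap() converts
// each one into the [u8; 4] that from_be_bytes expects. The unstable
// [T]::as_chunks would yield [u8; 4] arrays directly, avoiding this.
// `decode_u32s_be` is a hypothetical helper, not byteorder's API.
fn decode_u32s_be(src: &[u8]) -> Vec<u32> {
    src.chunks_exact(4)
        .map(|c| u32::from_be_bytes(c.try_into().unwrap()))
        .collect()
}

fn main() {
    let bytes = [0, 0, 1, 0, 0, 0, 0, 5];
    assert_eq!(decode_u32s_be(&bytes), vec![256, 5]);
    println!("ok");
}
```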

Lastly, this PR has two separate commits. It’s best to merge them separately. I can submit the first commit as a separate PR if you prefer.

PS. While working on this, I’ve also noticed two unnecessary uses of unsafe and sent separate PRs for those.

Replace the unsafe_read_slice macro, which first copies data to the destination
buffer and then swaps endianness if necessary, with a read_slice macro
which does the swapping in a single pass while the data is read.

This is done with `[T]::chunks_exact`, which splits the source into chunks
that can be converted into an integer of the desired type with the
from_xx_bytes method.
The compiler is smart enough to notice when the endianness conversion
is unnecessary, and it simplifies copying-with-conversion into a
plain memcpy.  This means there’s no need to check target_endian and
use the unsafe_write_slice_native macro.
Firstly, get rid of the $size argument.  The size can be determined from
$ty, so there’s no need to pass it as a separate argument.  Secondly,
change it to use the to_xx_bytes methods instead of Self::write_uxx so
that it resembles read_slice.
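The write direction described above can be sketched like so (a minimal standalone sketch; `write_u64_into_be` is a hypothetical name mirroring the write_slice macro, not byteorder’s API):

```rust
// Encode each value with to_be_bytes and copy it into the destination
// in a single pass, the mirror image of the read direction.
// `write_u64_into_be` is a hypothetical helper, not byteorder's API.
fn write_u64_into_be(src: &[u64], dst: &mut [u8]) {
    assert_eq!(dst.len(), src.len() * 8);
    for (out, v) in dst.chunks_exact_mut(8).zip(src) {
        out.copy_from_slice(&v.to_be_bytes());
    }
}

fn main() {
    let src = [1u64, 2];
    let mut dst = [0u8; 16];
    write_u64_into_be(&src, &mut dst);
    assert_eq!(&dst[..8], &[0, 0, 0, 0, 0, 0, 0, 1]);
    assert_eq!(&dst[8..], &[0, 0, 0, 0, 0, 0, 0, 2]);
    println!("ok");
}
```

On a big-endian target this loop is a byte-for-byte copy, which (per the commit message above) the compiler can collapse into a plain memcpy.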
@mina86
Contributor Author

mina86 commented Aug 21, 2023

OK, went ahead and split it further. When reviewing, I recommend looking at the individual commits, or I can send each commit as an individual PR if you prefer.

BurntSushi pushed a commit that referenced this pull request Oct 5, 2023
Replace the unsafe_read_slice macro, which first copies data to the destination
buffer and then swaps endianness if necessary, with a read_slice macro
which does the swapping in a single pass while the data is read.

This is done with `[T]::chunks_exact`, which splits the source into chunks
that can be converted into an integer of the desired type with the
from_xx_bytes method.

Closes #189
BurntSushi pushed a commit that referenced this pull request Oct 5, 2023
Replace the unsafe_read_slice macro, which first copies data to the destination
buffer and then swaps endianness if necessary, with a read_slice macro
which does the swapping in a single pass while the data is read.

This is done with `[T]::chunks_exact`, which splits the source into chunks
that can be converted into an integer of the desired type with the
from_xx_bytes method.

Closes #189, Closes #196
BurntSushi pushed a commit that referenced this pull request Oct 6, 2023
Replace the unsafe_read_slice macro, which first copies data to the destination
buffer and then swaps endianness if necessary, with a read_slice macro
which does the swapping in a single pass while the data is read.

This is done with `[T]::chunks_exact`, which splits the source into chunks
that can be converted into an integer of the desired type with the
from_xx_bytes method.

Closes #189, Closes #196
@BurntSushi
Owner

This PR is on crates.io in byteorder 1.5.0.
