Convert endianess while copying in read/write into methods #189
Hi, we spoke on reddit. I briefly skimmed this PR and it is touching a fair bit of code, including |
Add read/write into benchmark for u16 to test how the code behaves if the type is smaller than the native register size and for i64 to test that there’s nothing weird going on with signedness conversion.
Done. Note that I haven’t rerun the benchmarks but Godbolt confirms that the old code uses memcpy followed by a loop doing the conversion while the new code skips memcpy and does conversion while copying. The unsafe code that previous PR had was a simplified version of Lastly, this PR has two separate commits. Best to merge them separately. I can submit the first commit as a separate PR if you prefer. PS. While working on this, I’ve also noticed two unnecessary uses of unsafe and sent separate PRs for those. |
Replace the unsafe_read_slice macro, which first copies data to the destination buffer and then swaps endianness if necessary, with a read_slice macro which does the swapping in a single pass while the data is read. This is done with `[T]::chunks_exact`, which splits the source into chunks that can each be converted into an integer of the desired type with the from_xx_bytes method.
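To make the single-pass idea concrete, here is a minimal sketch of what such a read could look like for one type and one byte order. The function name `read_u32_into_be` is hypothetical and chosen only for illustration; the macro in this PR is generic over the type and the from_xx_bytes method, and its exact shape is not reproduced here.

```rust
use std::convert::TryInto;

/// Hypothetical stand-in for one expansion of a read_slice-style macro:
/// decodes big-endian u32 values from `src` into `dst` in a single pass.
fn read_u32_into_be(src: &[u8], dst: &mut [u32]) {
    const SIZE: usize = std::mem::size_of::<u32>();
    // The byte length of the source must match the destination exactly.
    assert_eq!(src.len(), SIZE * dst.len());
    // chunks_exact yields SIZE-byte slices; each one is converted to a
    // [u8; SIZE] array and decoded while it is being read.
    for (out, chunk) in dst.iter_mut().zip(src.chunks_exact(SIZE)) {
        *out = u32::from_be_bytes(chunk.try_into().unwrap());
    }
}
```

Each source byte is read once and each destination element is written once; there is no separate fix-up loop over `dst`.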
The compiler is smart enough to notice when endianness conversion is not necessary, and it simplifies copying with a conversion into a plain memcpy. This means there’s no need to check target_endian and use the unsafe_write_slice_native macro.
Firstly, get rid of $size argument. The size can be determined from $ty so there’s no need to pass it as a separate argument. Secondly, change it to use to_xx_bytes methods instead of Self::write_uxx so that it resembles read_slice.
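As a rough illustration of both changes, the sketch below derives the element size from `$ty` and uses a to_xx_bytes method for the conversion. The macro body and the wrapper name `write_u16_into_le` are assumptions made for this example, not the code from the commit.

```rust
/// Hypothetical write_slice-style macro: the element size comes from
/// `$ty`, so no separate `$size` argument is needed.
macro_rules! write_slice {
    ($src:expr, $dst:expr, $ty:ty, $to_bytes:ident) => {{
        const SIZE: usize = core::mem::size_of::<$ty>();
        let src: &[$ty] = $src;
        let dst: &mut [u8] = $dst;
        assert_eq!(src.len() * SIZE, dst.len());
        // Encode each value straight into its SIZE-byte destination chunk.
        for (value, chunk) in src.iter().zip(dst.chunks_exact_mut(SIZE)) {
            chunk.copy_from_slice(&value.$to_bytes());
        }
    }};
}

/// Illustrative wrapper: writes u16 values as little-endian bytes.
fn write_u16_into_le(src: &[u16], dst: &mut [u8]) {
    write_slice!(src, dst, u16, to_le_bytes);
}
```

Written this way, the write path mirrors the read path: chunks on the byte side, one conversion call per element.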
OK, went ahead and split it further. When reviewing, I recommend looking at individual commits, or I can send each commit as an individual PR if you prefer.
Replace the unsafe_read_slice macro, which first copies data to the destination buffer and then swaps endianness if necessary, with a read_slice macro which does the swapping in a single pass while the data is read. This is done with `[T]::chunks_exact`, which splits the source into chunks that can each be converted into an integer of the desired type with the from_xx_bytes method. Closes #189
Replace the unsafe_read_slice macro, which first copies data to the destination buffer and then swaps endianness if necessary, with a read_slice macro which does the swapping in a single pass while the data is read. This is done with `[T]::chunks_exact`, which splits the source into chunks that can each be converted into an integer of the desired type with the from_xx_bytes method. Closes #189, Closes #196
This PR is on crates.io in |
Rather than first copying data from source to destination buffer and
then performing endianness adjustment, do the conversion while copying.
This means that each byte is accessed only once which (according to
benchmarks) speeds up read_xxx_into and write_xxx_into methods:
The benchmarks were done on AMD Ryzen 9 5900X 12-Core Processor.
I’m somewhat confused why little endian benchmarks show improvements
but the results are reproducible. My best guess is that it’s
the compiler failing to optimise out the
`for v in $dst.iter_mut() { nop(); }`
loops currently present.
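For comparison, the two-pass shape being replaced can be sketched as follows. This is a safe approximation written for illustration only: the existing macro uses an unsafe bulk copy for the first pass, and the function name `read_u32_into_le_two_pass` is hypothetical.

```rust
use std::convert::TryInto;

/// Illustrative two-pass read of little-endian u32 values.
fn read_u32_into_le_two_pass(src: &[u8], dst: &mut [u32]) {
    const SIZE: usize = std::mem::size_of::<u32>();
    assert_eq!(src.len(), SIZE * dst.len());
    // Pass 1: copy the bytes as-is (native order), standing in for memcpy.
    for (out, chunk) in dst.iter_mut().zip(src.chunks_exact(SIZE)) {
        *out = u32::from_ne_bytes(chunk.try_into().unwrap());
    }
    // Pass 2: fix up endianness in place. On a little-endian target
    // u32::from_le is the identity, so this loop does no useful work
    // and relies on the optimiser to remove it.
    for v in dst.iter_mut() {
        *v = u32::from_le(*v);
    }
}
```

On a little-endian target the second loop corresponds to the no-op loop mentioned above; whether the compiler actually removes it is the open question behind the guess.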