Use sys_byteorder.h functions in big_endian.h

The compiler does not consistently manage to compile the existing code
down to use the architecture's byteswap instruction. By using the
functions from sys_byteorder.h instead we get consistently small and
performant code.

Also add a microbenchmark to validate the performance. On Android this
results in a 2-5 times speedup.

Before this change (arm32, measured on Pixel5):

BM_WriteBigEndianAligned<int16_t>       4.59 ns
BM_WriteBigEndianAligned<uint16_t>      4.17 ns
BM_WriteBigEndianAligned<int32_t>       5.49 ns
BM_WriteBigEndianAligned<uint32_t>      5.49 ns
BM_WriteBigEndianAligned<int64_t>       15.5 ns
BM_WriteBigEndianAligned<uint64_t>      15.5 ns
BM_WriteBigEndianMisaligned<int16_t>    4.59 ns
BM_WriteBigEndianMisaligned<uint16_t>   4.17 ns
BM_WriteBigEndianMisaligned<int32_t>    5.49 ns
BM_WriteBigEndianMisaligned<uint32_t>   5.49 ns
BM_WriteBigEndianMisaligned<int64_t>    16.0 ns
BM_WriteBigEndianMisaligned<uint64_t>   16.0 ns
BM_ReadBigEndianAligned<int16_t>        4.59 ns
BM_ReadBigEndianAligned<uint16_t>       4.59 ns
BM_ReadBigEndianAligned<int32_t>        5.84 ns
BM_ReadBigEndianAligned<uint32_t>       5.84 ns
BM_ReadBigEndianAligned<int64_t>        13.4 ns
BM_ReadBigEndianAligned<uint64_t>       13.4 ns
BM_ReadBigEndianMisaligned<int16_t>     4.59 ns
BM_ReadBigEndianMisaligned<uint16_t>    4.59 ns
BM_ReadBigEndianMisaligned<int32_t>     5.84 ns
BM_ReadBigEndianMisaligned<uint32_t>    5.84 ns
BM_ReadBigEndianMisaligned<int64_t>     13.4 ns
BM_ReadBigEndianMisaligned<uint64_t>    13.4 ns

After this change (arm32, measured on Pixel5):

BM_WriteBigEndianAligned<int16_t>       2.31 ns
BM_WriteBigEndianAligned<uint16_t>      1.98 ns
BM_WriteBigEndianAligned<int32_t>       1.98 ns
BM_WriteBigEndianAligned<uint32_t>      1.98 ns
BM_WriteBigEndianAligned<int64_t>       2.78 ns
BM_WriteBigEndianAligned<uint64_t>      2.80 ns
BM_WriteBigEndianMisaligned<int16_t>    2.30 ns
BM_WriteBigEndianMisaligned<uint16_t>   1.98 ns
BM_WriteBigEndianMisaligned<int32_t>    1.98 ns
BM_WriteBigEndianMisaligned<uint32_t>   1.98 ns
BM_WriteBigEndianMisaligned<int64_t>    2.95 ns
BM_WriteBigEndianMisaligned<uint64_t>   2.95 ns
BM_ReadBigEndianAligned<int16_t>        1.85 ns
BM_ReadBigEndianAligned<uint16_t>       1.85 ns
BM_ReadBigEndianAligned<int32_t>        1.60 ns
BM_ReadBigEndianAligned<uint32_t>       1.59 ns
BM_ReadBigEndianAligned<int64_t>        2.33 ns
BM_ReadBigEndianAligned<uint64_t>       2.34 ns
BM_ReadBigEndianMisaligned<int16_t>     1.88 ns
BM_ReadBigEndianMisaligned<uint16_t>    1.88 ns
BM_ReadBigEndianMisaligned<int32_t>     1.62 ns
BM_ReadBigEndianMisaligned<uint32_t>    1.62 ns
BM_ReadBigEndianMisaligned<int64_t>     2.36 ns
BM_ReadBigEndianMisaligned<uint64_t>    2.35 ns

On x86-64 the compiler seems to have less trouble optimizing the
existing code, and only the 64-bit integer read results change
significantly:

Before this change (x86-64, Linux):

BM_WriteBigEndianAligned<int16_t>       0.924 ns
BM_WriteBigEndianAligned<uint16_t>      0.903 ns
BM_WriteBigEndianAligned<int32_t>       0.933 ns
BM_WriteBigEndianAligned<uint32_t>      0.932 ns
BM_WriteBigEndianAligned<int64_t>       1.08 ns
BM_WriteBigEndianAligned<uint64_t>      1.09 ns
BM_WriteBigEndianMisaligned<int16_t>    0.952 ns
BM_WriteBigEndianMisaligned<uint16_t>   0.925 ns
BM_WriteBigEndianMisaligned<int32_t>    0.947 ns
BM_WriteBigEndianMisaligned<uint32_t>   0.931 ns
BM_WriteBigEndianMisaligned<int64_t>    1.08 ns
BM_WriteBigEndianMisaligned<uint64_t>   1.08 ns
BM_ReadBigEndianAligned<int16_t>        1.03 ns
BM_ReadBigEndianAligned<uint16_t>       0.988 ns
BM_ReadBigEndianAligned<int32_t>        0.956 ns
BM_ReadBigEndianAligned<uint32_t>       0.965 ns
BM_ReadBigEndianAligned<int64_t>        2.33 ns
BM_ReadBigEndianAligned<uint64_t>       2.30 ns
BM_ReadBigEndianMisaligned<int16_t>     0.994 ns
BM_ReadBigEndianMisaligned<uint16_t>    0.996 ns
BM_ReadBigEndianMisaligned<int32_t>     0.959 ns
BM_ReadBigEndianMisaligned<uint32_t>    0.964 ns
BM_ReadBigEndianMisaligned<int64_t>     2.31 ns
BM_ReadBigEndianMisaligned<uint64_t>    2.30 ns

After this change (x86-64, Linux):

BM_WriteBigEndianAligned<int16_t>       0.917 ns
BM_WriteBigEndianAligned<uint16_t>      0.927 ns
BM_WriteBigEndianAligned<int32_t>       0.956 ns
BM_WriteBigEndianAligned<uint32_t>      0.942 ns
BM_WriteBigEndianAligned<int64_t>       1.09 ns
BM_WriteBigEndianAligned<uint64_t>      1.09 ns
BM_WriteBigEndianMisaligned<int16_t>    0.925 ns
BM_WriteBigEndianMisaligned<uint16_t>   0.906 ns
BM_WriteBigEndianMisaligned<int32_t>    0.939 ns
BM_WriteBigEndianMisaligned<uint32_t>   0.936 ns
BM_WriteBigEndianMisaligned<int64_t>    1.11 ns
BM_WriteBigEndianMisaligned<uint64_t>   1.12 ns
BM_ReadBigEndianAligned<int16_t>        0.997 ns
BM_ReadBigEndianAligned<uint16_t>       0.996 ns
BM_ReadBigEndianAligned<int32_t>        0.972 ns
BM_ReadBigEndianAligned<uint32_t>       0.956 ns
BM_ReadBigEndianAligned<int64_t>        1.17 ns
BM_ReadBigEndianAligned<uint64_t>       1.17 ns
BM_ReadBigEndianMisaligned<int16_t>     0.999 ns
BM_ReadBigEndianMisaligned<uint16_t>    0.997 ns
BM_ReadBigEndianMisaligned<int32_t>     0.969 ns
BM_ReadBigEndianMisaligned<uint32_t>    0.965 ns
BM_ReadBigEndianMisaligned<int64_t>     1.21 ns
BM_ReadBigEndianMisaligned<uint64_t>    1.19 ns

Change-Id: I21119e03ef799458c4530b031ca3142a146580ed
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4756034
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Commit-Queue: Adam Rice <ricea@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1185111}
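
For illustration only, the contrast described in the commit message can be sketched roughly as follows. This is not the Chromium code: the helper names (WriteBigEndianShift, HostToBigEndian32, WriteBigEndianSwap, ReadBigEndianSwap) are hypothetical, and the sketch assumes GCC or Clang, using the compiler builtin __builtin_bswap32 as a stand-in for the sys_byteorder.h functions. The first function writes one byte at a time and depends on the optimizer recognizing the pattern; the others request the byteswap explicitly and use memcpy so that misaligned buffers stay safe.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical byte-at-a-time writer. Whether this compiles down to a single
// byteswap instruction depends on the compiler spotting the pattern.
inline void WriteBigEndianShift(char* buf, uint32_t v) {
  for (int i = 0; i < 4; ++i) {
    buf[i] = static_cast<char>(v >> (8 * (3 - i)));
  }
}

// Converts between host order and big-endian order. Assumes GCC or Clang,
// which provide __builtin_bswap32 and the __BYTE_ORDER__ macros.
inline uint32_t HostToBigEndian32(uint32_t v) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return __builtin_bswap32(v);
#else
  return v;  // Host is already big-endian; no swap needed.
#endif
}

// Explicit-swap writer: swap once, then copy the four bytes out.
inline void WriteBigEndianSwap(char* buf, uint32_t v) {
  const uint32_t be = HostToBigEndian32(v);
  std::memcpy(buf, &be, sizeof(be));  // memcpy tolerates a misaligned buf.
}

// Explicit-swap reader: copy the four bytes in, then swap once.
inline uint32_t ReadBigEndianSwap(const char* buf) {
  uint32_t be;
  std::memcpy(&be, buf, sizeof(be));
  return HostToBigEndian32(be);  // A byteswap is its own inverse.
}
```

Both writers produce the same bytes; the difference is only in how reliably the generated code uses the byteswap instruction, which is what the benchmark figures above measure.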