Improve speed of std.digest.digest!(Hash, Range) on non-array ranges #7509

Merged: 1 commit into dlang:master on May 31, 2020

Conversation

@n8sh (Member) commented May 31, 2020

Speed up std.digest.digest!(Hash, Range) on non-array ranges by chunking the data. I would have liked to use something like std.stdio.File.byChunk(ubyte[]) but I didn't find anything equivalent. std.algorithm.iteration.chunkBy doesn't work for this purpose.
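
For context, the chunking approach has roughly the shape sketched below. This is an illustrative approximation of the idea rather than the code merged in this PR; the `digestChunked` name, the template constraints, and the 1024-byte buffer size are my own placeholder choices.

```d
import std.digest : isDigest;
import std.range.primitives : ElementType, isInputRange;
import std.traits : isArray;

// Illustrative helper (not the actual Phobos implementation): accumulate
// elements of a non-array ubyte range into a fixed-size stack buffer and
// feed the digest whole blocks instead of one element at a time.
auto digestChunked(Hash, Range)(Range data)
if (isDigest!Hash && isInputRange!Range && !isArray!Range
    && is(ElementType!Range : ubyte))
{
    Hash hash;
    hash.start();
    ubyte[1024] buffer = void; // block size picked arbitrarily for this sketch
    size_t used = 0;
    foreach (ubyte b; data)
    {
        buffer[used++] = b;
        if (used == buffer.length) // flush a full block to the digest
        {
            hash.put(buffer[]);
            used = 0;
        }
    }
    if (used > 0)
        hash.put(buffer[0 .. used]); // flush the remainder
    return hash.finish();
}
```

The point is that `Hash.put` receives whole blocks instead of being called once per element, which is where the speedups in the tables below come from.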

In the tables below, "Old/New len. 16 time" refers to the time taken to hash a range of length 16 a total of 2^^24 times, and "Old/New len. 2^^20 time" refers to the time taken to hash a range of length 2^^20 a total of 256 times. While benchmarking I was surprised to discover that, with the implementation prior to this PR, dmd -O -inline was noticeably faster than ldc2 -O3 for CRC32 and every MurmurHash3 variant.

ldc2 -O3

| Hash | Old len. 16 time | New len. 16 time | Old len. 2^^20 time | New len. 2^^20 time |
| --- | --- | --- | --- | --- |
| CRC32 | 119.678 ms | 56.893 ms | 125.997 ms | 43.728 ms |
| CRC64-ECMA | 104.908 ms | 57.266 ms | 139.615 ms | 43.212 ms |
| MurmurHash3_32 | 447.410 ms | 80.189 ms | 436.264 ms | 72.763 ms |
| MurmurHash3_128_32 | 387.852 ms | 60.567 ms | 373.398 ms | 43.661 ms |
| MurmurHash3_128_64 | 390.114 ms | 49.332 ms | 380.530 ms | 39.437 ms |
| MD5 | 319.029 ms | 239.154 ms | 148.082 ms | 65.568 ms |
| RIPEMD-160 | 562.388 ms | 448.181 ms | 197.306 ms | 118.093 ms |
| SHA-1 | 251.524 ms | 178.457 ms | 139.191 ms | 48.883 ms |
| SHA-256 | 601.730 ms | 488.641 ms | 215.740 ms | 132.275 ms |
| SHA-512 | 807.068 ms | 676.071 ms | 178.022 ms | 90.610 ms |

dmd -m64 -O -inline

| Hash | Old len. 16 time | New len. 16 time | Old len. 2^^20 time | New len. 2^^20 time |
| --- | --- | --- | --- | --- |
| CRC32 | 91.092 ms | 69.856 ms | 67.266 ms | 55.229 ms |
| CRC64-ECMA | 148.647 ms | 71.273 ms | 127.772 ms | 55.503 ms |
| MurmurHash3_32 | 262.746 ms | 100.529 ms | 218.865 ms | 59.750 ms |
| MurmurHash3_128_32 | 301.565 ms | 135.436 ms | 214.086 ms | 45.737 ms |
| MurmurHash3_128_64 | 233.733 ms | 65.100 ms | 218.865 ms | 59.750 ms |
| MD5 | 1178.596 ms | 1062.393 ms | 344.910 ms | 258.120 ms |
| RIPEMD-160 | 3380.222 ms | 3301.520 ms | 929.543 ms | 835.330 ms |
| SHA-1 | 401.889 ms | 300.644 ms | 147.163 ms | 56.278 ms |
| SHA-256 | 2831.336 ms | 2710.006 ms | 762.206 ms | 649.417 ms |
| SHA-512 | 4168.025 ms | 4076.388 ms | 671.243 ms | 568.103 ms |

dmd -m64 (no optimizations)

| Hash | Old len. 16 time | New len. 16 time | Old len. 2^^20 time | New len. 2^^20 time |
| --- | --- | --- | --- | --- |
| CRC32 | 682.807 ms | 253.921 ms | 633.794 ms | 216.614 ms |
| CRC64-ECMA | 684.976 ms | 254.633 ms | 635.416 ms | 213.671 ms |
| MurmurHash3_32 | 976.169 ms | 374.700 ms | 912.716 ms | 324.124 ms |
| MurmurHash3_128_32 | 917.519 ms | 386.122 ms | 847.622 ms | 308.732 ms |
| MurmurHash3_128_64 | 863.298 ms | 328.495 ms | 863.298 ms | 328.495 ms |
| MD5 | 1733.697 ms | 1271.102 ms | 879.290 ms | 421.817 ms |
| RIPEMD-160 | 4001.052 ms | 3456.247 ms | 1460.950 ms | 1006.049 ms |
| SHA-1 | 956.143 ms | 480.133 ms | 681.675 ms | 220.734 ms |
| SHA-256 | 3463.750 ms | 2984.336 ms | 1309.795 ms | 823.479 ms |
| SHA-512 | 4713.230 ms | 4246.403 ms | 1182.474 ms | 658.689 ms |
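
For reference, timings of this shape can be gathered with a harness along the following lines. This is only a sketch under my own assumptions (CRC32 as the example digest, an `iota`/`map` range as the non-array input, and `std.datetime.stopwatch` for timing); the actual benchmark code isn't included in this PR description, and absolute numbers will of course vary by machine and compiler flags.

```d
import std.algorithm.iteration : map;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.digest : digest;
import std.digest.crc : CRC32;
import std.range : iota;
import std.stdio : writefln;

void main()
{
    // A non-array input range of ubyte; only the range length matters
    // for the timing, the element values are irrelevant.
    static auto makeRange(size_t n)
    {
        return iota(n).map!(i => cast(ubyte) (i & 0xFF));
    }

    // "len. 16" column: hash a length-16 range 2^^24 times.
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. 1 << 24)
        cast(void) digest!CRC32(makeRange(16));
    writefln("len. 16:    %s", sw.peek);

    // "len. 2^^20" column: hash a length-2^^20 range 256 times.
    sw.reset();
    foreach (_; 0 .. 256)
        cast(void) digest!CRC32(makeRange(1 << 20));
    writefln("len. 2^^20: %s", sw.peek);
}
```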

@dlang-bot (Contributor) commented

Thanks for your pull request, @n8sh!

Bugzilla references

| Auto-close | Bugzilla | Severity | Description |
| --- | --- | --- | --- |
|  | 20887 | enhancement | Improve speed of std.digest.digest!(Hash, Range) on non-array ranges |

Testing this PR locally

If you don't have a local development environment set up, you can use Digger to test this PR:

dub run digger -- build "master + phobos#7509"

@dlang-bot merged commit 781a4a0 into dlang:master on May 31, 2020
@kinke (Contributor) commented May 31, 2020

> While benchmarking I was surprised to discover that with the implementation prior to this PR dmd -O -inline was noticeably faster than ldc2 -O3 for CRC32 and every MurmurHash3 variant.

That should never happen. ;) Please let us know via an LDC issue if you come across such a case. There's one catch though: because cross-module inlining is disabled by default (and otherwise experimental), the code in question should first be tested with LTO (`-flto=<thin or full> -defaultlib=phobos2-ldc-lto,druntime-ldc-lto`), as that can indeed be a common bottleneck.
