SSE2/AVX2 optimized get_checksum2()/MD5 for x86-64, and MD5P8 whole-f…#23
Closed
Chainfire wants to merge 1 commit into
…ile checksum

- MD5 optimization in block matching phase:

  MD5 hashes computed during rsync's block matching phase are independent of each other and can therefore be processed in parallel. This code processes 4 blocks in parallel if SSE2 is available, or 8 if AVX2 is available. A performance increase (or CPU usage decrease) of up to 6x has been measured. A prefetching algorithm predicts and loads upcoming blocks, which avoids the need for extensive modifications to other parts of the rsync sources.

  This remains compatible with existing rsync builds using MD5 checksums.

- MD5P8 whole-file checksum:

  Splits the input into 8 independent streams (64-byte interleave) and produces a final checksum based on the end state of those 8 streams. If parallelization of MD5 hashing is available, the performance gain (or CPU usage decrease) is 2x to 6x compared to traditional MD5.

  The rsync versions on both ends of the connection need MD5P8 support built in for it to be used.

  xxHash is still preferred (and faster), but this provides a reasonably fast fallback for the case where xxHash libraries are not available at build time.
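To illustrate the MD5P8 stream split described above, here is a minimal sketch in C. It is not taken from the patch. It assumes that "64-byte interleave" means consecutive 64-byte chunks are dealt round-robin to streams 0 through 7, and it assumes a combination step that hashes the 8 end states in stream order. The per-stream hash is a toy 64-bit FNV-1a stand-in rather than MD5, and every name here (`toy_ctx`, `p8_stream_for_offset`, `p8_sketch_digest`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define P8_STREAMS 8   /* number of independent streams (from the PR text) */
#define P8_CHUNK   64  /* interleave granularity in bytes (from the PR text) */

/* Hypothetical stand-in for an MD5 context: a 64-bit FNV-1a state.
 * This is NOT MD5 and is not what the patch uses. */
typedef struct { uint64_t h; } toy_ctx;

static void toy_init(toy_ctx *c) { c->h = 0xcbf29ce484222325ULL; }

static void toy_update(toy_ctx *c, const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c->h ^= p[i];
        c->h *= 0x100000001b3ULL;
    }
}

/* Assumed mapping: which stream receives the byte at input offset `off`. */
static unsigned p8_stream_for_offset(size_t off)
{
    return (unsigned)((off / P8_CHUNK) % P8_STREAMS);
}

/* Feed each 64-byte chunk to its stream, then fold the 8 end states
 * into one value. The fold is an assumption made for illustration. */
static uint64_t p8_sketch_digest(const unsigned char *buf, size_t len)
{
    toy_ctx streams[P8_STREAMS];
    for (unsigned s = 0; s < P8_STREAMS; s++)
        toy_init(&streams[s]);

    for (size_t off = 0; off < len; off += P8_CHUNK) {
        size_t n = (len - off < P8_CHUNK) ? len - off : P8_CHUNK;
        unsigned s = p8_stream_for_offset(off);
        assert(s < P8_STREAMS);
        toy_update(&streams[s], buf + off, n);
    }

    toy_ctx final;
    toy_init(&final);
    for (unsigned s = 0; s < P8_STREAMS; s++) {
        unsigned char state[8];
        for (int i = 0; i < 8; i++)
            state[i] = (unsigned char)(streams[s].h >> (8 * i));
        toy_update(&final, state, sizeof state);
    }
    return final.h;
}
```

Because the 8 streams never read each other's state, their update steps can run side by side, which is the property a SIMD implementation would exploit by placing one stream in each vector lane.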
Thanks! I've turned this into the latest md5p8.diff in the rsync-patches repo (which is now also on GitHub).