HDDS-4808. Add Genesis benchmark for various CRC implementations #1910
Conversation
Running the new benchmarks gives the following results. I have posted the conclusion at the start, as this comment is quite long.
TL;DR:
Conclusion:
Recommendation:
Benchmarks

There are several implementations of CRC available:
- the pure Java CRC32/CRC32C currently used by Ozone;
- Hadoop's pure Java implementation;
- the JDK's java.util.zip.CRC32 and CRC32C;
- Hadoop's native (JNI) implementation.
The performance of the algorithm can also depend on the number of data bytes used for each checksum - Bytes Per Checksum (BPC). HDFS has a default BPC of 512, generating 1MB of checksum data per 128MB block (each CRC32 is 4 bytes, and 128MB / 512B = 262,144 chunks x 4 bytes = 1MB). Ozone has a default BPC of 1MB, generating 512 bytes of checksum data per 128MB block. There is a benchmark class in Hadoop, called Crc32PerformanceTest.java, which produces results like the following for varying BPC:
Here:
The numbers in the table show throughput in MB/s, so a higher number is better. With only this data, it is easy to conclude that NativeC is the clear winner for all BPC. However, that may not be the case. In the Hadoop benchmark, the logic creates a 64MB byte buffer, then calculates the expected checksums, and then benchmarks a "validate checksums" routine, which generates the checksums for the new data and compares them with the expected values. For the native calls, the code is like this:
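The original snippet is not reproduced in this thread, so the following is a paraphrased sketch of the batched native call the Hadoop benchmark makes, not the verbatim Crc32PerformanceTest code. NativeCrc32 is package-private in org.apache.hadoop.util (hence the wrapper added in this PR), and the exact argument list may differ slightly between Hadoop versions.

```java
package org.apache.hadoop.util;  // NativeCrc32 is package-private, so this
                                 // sketch only compiles from inside this
                                 // package (or via the PR's wrapper class).

import java.nio.ByteBuffer;

/**
 * Sketch of the batched native verification: the whole 64MB buffer and all
 * of its pre-computed checksums are handed to a single JNI call.
 */
public class NativeBatchVerifySketch {
  public static void verifyAll(ByteBuffer data, ByteBuffer sums, int bpc)
      throws java.io.IOException {
    // One native call walks every bpc-sized chunk of the buffer and validates
    // its checksum; the file name / base position are only used for error
    // reporting if a mismatch is found.
    NativeCrc32.verifyChunkedSums(bpc, DataChecksum.Type.CRC32.id,
        sums, data, "benchmark", 0);
  }
}
```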
I.e., it calls NativeCrc32.verifyChunkedSums, which takes the entire data set (64MB) and runs the complete validation in a single native call. The pure Java and java.util.zip implementations cannot do this: they must loop over the data and make multiple calls to the Checksum implementation, checksumming at each BPC boundary. It's also worth noting that the java.util.zip CRC classes make native calls too. The above does not test real-world use. We don't buffer 64MB of data and then calculate / verify all the CRCs in a batch; rather, we stream the data and calculate the CRCs on demand. It is important to test the streaming case to get more realistic results. Using a simple loop in a JMH benchmark, we can get a more realistic test. First populate a 64MB ByteBuffer with random bytes. Then, using the following loop, calculate the checksums for the 64MB at BPC intervals:
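The loop itself is not shown in this thread; the following is a reconstruction of the kind of streaming loop described, using java.util.zip.CRC32 as the Checksum under test (any of the benchmarked implementations could be swapped in). Class and variable names are illustrative, not the PR's actual benchmark code.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Streaming-style checksum loop: walk the 64MB buffer in bpc-sized chunks,
// reset and reuse a single Checksum object per chunk, and record the 4-byte
// CRC of each chunk - roughly what a client does while streaming a block.
public final class StreamingCrcSketch {
  public static ByteBuffer checksumAll(ByteBuffer data, int bpc) {
    Checksum crc = new CRC32();       // swap in any implementation under test
    ByteBuffer sums = ByteBuffer.allocate(data.remaining() / bpc * 4);
    byte[] chunk = new byte[bpc];
    ByteBuffer view = data.duplicate();
    while (view.remaining() >= bpc) {
      view.get(chunk);                // next BPC-sized slice of the data
      crc.reset();
      crc.update(chunk, 0, bpc);
      sums.putInt((int) crc.getValue());  // a CRC32 fits in 4 bytes
    }
    sums.flip();
    return sums;
  }
}
```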
The performance at 512 BPC:
The numbers above are JMH throughput - i.e. how many times per second we can calculate the checksums on 64MB of data.
I ran the benchmark twice on Java 11 and twice on Java 8. PureCRC32(C), as used in Ozone, is the slowest. The pure Java Hadoop implementation is significantly faster, but still not great. java.util.zip is best, beating the native Hadoop implementation by quite a margin. Also notable, and reproducible in all test runs: java.util.zip.CRC32 is significantly improved in Java 11 over Java 8. If we also test the Hadoop native implementation calculating all checksums in a single call (as the Hadoop benchmark did), we can see it is fastest, as the earlier Hadoop test showed:
I don't have an explanation as to why CRC32CB is so much faster than CRC32B, but this is consistently so. Moving on to a higher BPC:
The pure Java implementations have not benefited at all. The zip implementations are significantly faster and still best. The Hadoop native implementations have improved too, although there does appear to be something wrong with native CRC32, as it lags CRC32C by a large margin.
The numbers have more variance at the higher BPC, but the trend remains. Conclusion:
Recommendation:
* is package private there. The intention of making this class available
* in Ozone is to allow the native libraries to be benchmarked alongside other
* implementations. At the current time, the hadoop native CRC is not used
* anywhere in Ozone except for benchmarks.
Important to call this out in the JIRA description as well as the PR. With the changes in this patch, could Ozone start making use of the native CRC implementation?
Not unless you get the compiled shared library from a Hadoop build and then add it to java.library.path. However, to be able to benchmark the native libs, we need this code here. The classes inside Hadoop common are marked private, which is why I needed to wrap them.
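As an aside, a quick way to check whether the Hadoop native library has actually been picked up from java.library.path is the public NativeCodeLoader helper from hadoop-common. This is a minimal sketch (class name and the library path shown are illustrative, not part of this PR):

```java
import org.apache.hadoop.util.NativeCodeLoader;

// Prints whether libhadoop was found on java.library.path. Run the JVM with
// e.g. -Djava.library.path=/path/to/hadoop/lib/native (path is illustrative).
public class NativeCheck {
  public static void main(String[] args) {
    System.out.println("java.library.path = "
        + System.getProperty("java.library.path"));
    System.out.println("Hadoop native code loaded: "
        + NativeCodeLoader.isNativeCodeLoaded());
  }
}
```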
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCBatch.java
@sodonnel, thanks a lot for working on this. Some comments inline.
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/common/ChecksumByteBufferImpl.java
.../common/src/test/java/org/apache/hadoop/ozone/common/TestChecksumImplsComputeSameValues.java
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/genesis/BenchMarkCRCStreaming.java
+1 the change looks good.
What changes were proposed in this pull request?
Add a Genesis benchmark to compare the performance of various CRC32 implementations.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-4808
How was this patch tested?
Benchmarks were executed manually. One new test was added to validate that all CRC implementations give the same result.
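The new test itself is not shown in this thread; a minimal sketch of the idea (cross-checking two of the implementations named above over the same random data) could look like the following. Class and method names here are illustrative, not the PR's actual test.

```java
import static org.junit.Assert.assertEquals;

import java.util.Random;
import java.util.zip.CRC32;
import org.apache.hadoop.util.PureJavaCrc32;
import org.junit.Test;

// Every CRC32 implementation must produce the same value for the same input.
public class CrcImplsAgreeSketch {
  @Test
  public void implsComputeSameValue() {
    byte[] data = new byte[1 << 20];      // 1MB of random input
    new Random(0).nextBytes(data);        // fixed seed for reproducibility

    CRC32 zipCrc = new CRC32();
    zipCrc.update(data, 0, data.length);

    PureJavaCrc32 hadoopCrc = new PureJavaCrc32();
    hadoopCrc.update(data, 0, data.length);

    assertEquals(zipCrc.getValue(), hadoopCrc.getValue());
  }
}
```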