HDFS-13056. Add support for a new COMPOSITE_CRC FileChecksum which is comparable between different block layouts and between striped/replicated files #344
Previously it failed to convert ms to seconds and thus reported aggregate throughput at 1/1000x the actual numbers. Per reviewer recommendation, such a calculation is not in scope for TestDFSIO anyway, so remove it. Also, make all the bytes-to-MB and milliseconds-to-seconds conversions in the reporting messages consistent to help avoid this type of error in the future.
This reverts commit bb73366.
Adds new file-level ChecksumCombineMode options settable through config, and lower-level BlockChecksumOptions to indicate the block-checksum types supported by both blockChecksum and blockGroupChecksum in DataTransferProtocol. CRCs are composed so that they are agnostic to block/chunk/cell layout and can thus be compared between replicated and striped files with different underlying blocksize, bytes-per-crc, and cellSize settings. Does not alter default behavior, and doesn't touch the data-read or data-write paths at all.
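The layout-agnostic property rests on CRC arithmetic over GF(2): CRC(A || B) can be computed from CRC(A), CRC(B), and the length of B alone, so it doesn't matter where block or cell boundaries fall. A minimal sketch of that identity, assuming a simplified "pure" CRC-32 (zero initial value, no final XOR; production variants like CRC32C add an init/xorout step that must be peeled off before composing), with names that are illustrative rather than Hadoop's:

```python
POLY = 0x104C11DB7  # CRC-32 generator polynomial, x^32 term included


def crc32_pure(data: bytes, crc: int = 0) -> int:
    """Bitwise MSB-first CRC: returns M(x) * x^32 mod POLY."""
    for b in data:
        crc ^= b << 24
        for _ in range(8):
            crc <<= 1
            if crc & (1 << 32):
                crc ^= POLY  # reduce back into 32 bits
    return crc


def gf_mul_mod(a: int, b: int) -> int:
    """Carry-less product a*b reduced mod POLY (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << 32):
            a ^= POLY
    return result


def x_pow_mod(n: int) -> int:
    """x^n mod POLY, by repeated shift-and-reduce."""
    m = 1
    for _ in range(n):
        m <<= 1
        if m & (1 << 32):
            m ^= POLY
    return m


def compose(crc_a: int, crc_b: int, len_b: int) -> int:
    """CRC(A || B) from CRC(A), CRC(B), and len(B): shift CRC(A) past
    B's bits in the quotient ring, then mix CRC(B) in with XOR."""
    return gf_mul_mod(crc_a, x_pow_mod(8 * len_b)) ^ crc_b
```

Composing across any split of the data yields the same value as a direct CRC over the whole range, which is why the composite checksum can ignore block/chunk/cell boundaries entirely.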
Minor optimization: start the multiplier at x^8, and fix the behavior of composing a zero-length crcB.
…OSITE_CRC. Update BlockChecksumHelper's CRC composition to use the same data buffer used in the MD5 case, and factor out shared logic from StripedBlockChecksumReconstructor into an abstract base class so that reconstruction logic can be shared between MD5CRC and COMPOSITE_CRC.
Encapsulate all the CRC internals, such as tracking the CRC polynomial and precomputing the monomial, into this class so that BlockChecksumHelper and FileChecksumHelper only need to interact with the clean interfaces of CrcComposer.
Wire it into BlockChecksumHelper and use CrcComposer to regenerate striped composite CRCs for missing EC data blocks.
Extract hooks in TestFileChecksum to allow a subclass to share core tests while modifying expectations of a subset of tests; add TestFileChecksumCompositeCrc which extends TestFileChecksum to apply the same test suite to COMPOSITE_CRC, and add a test case for comparing two replicated files with different block sizes. Test confirms that MD5CRC will yield different checksums between replicated vs striped, and two replicated files with different block sizes, while COMPOSITE_CRC yields the same checksum for all cases.
Fix a bug in handling byte-array updates with nonzero offset.
Refactor to just use stripeLength with COMPOSITE_CRC, where non-striped COMPOSITE_CRC is just an edge case where stripeLength is longer than the data range.
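That refactor works because cell-wise composition handles a short final cell naturally, and a cell/stripe length longer than the data range degenerates to a single composition step over the whole range. An illustrative sketch of the idea (not Hadoop's code; pure CRC-32 with zero init and no final XOR assumed):

```python
POLY = 0x104C11DB7  # CRC-32 generator polynomial with the x^32 term


def crc_of(data, crc=0):
    # bitwise MSB-first pure CRC-32
    for b in data:
        crc ^= b << 24
        for _ in range(8):
            crc <<= 1
            if crc & (1 << 32):
                crc ^= POLY
    return crc


def gf_mul(a, b):
    # carry-less multiply mod POLY over GF(2)
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << 32):
            a ^= POLY
    return r


def x_pow(n):
    # x^n mod POLY
    m = 1
    for _ in range(n):
        m <<= 1
        if m & (1 << 32):
            m ^= POLY
    return m


def compose_cells(data, cell_bytes):
    """File CRC built only from per-cell CRCs and cell lengths.
    The last cell may be short; a cell longer than the data is the
    non-striped degenerate case and falls out of the same loop."""
    crc = 0
    for i in range(0, len(data), cell_bytes):
        cell = data[i:i + cell_bytes]
        crc = gf_mul(crc, x_pow(8 * len(cell))) ^ crc_of(cell)
    return crc
```

Because the loop only depends on each cell's CRC and length, striped and replicated layouts reduce to the same computation, matching the commit's "edge case" framing.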
…hecksum. Additionally, fix up remaining TODOs; add wrappers for late-evaluating hex format of CRCs to pass into debug statements and clean up logging logic.
- Gate creation of some debug objects on LOG.isDebugEnabled()
- Expand InterfaceAudience of CrcUtil/CrcComposer to include Common, Yarn
- Split out the CRC reassembly logic in BlockGroupNonStripedChecksumComputer#compute to a helper function
- Remove "throws IOException" from CrcComposer.digest
Fixes hadoop.tools.TestHdfsConfigFields
Due to HDFS-13191 the byte buffer size will not match getLength(), and equals() compares the entire buffer regardless of getLength(), so other filesystems would be unable to match the COMPOSITE_CRC FileChecksum.
- Remove explicit InterfaceStability declarations on new classes
- Add checks for digester != null in StripedBlockChecksumReconstructor
- Add bounds checking in CrcUtil.readInt/writeInt
- Throw instead of returning null in FileChecksumHelper.makeFinalResult when combineMode is not recognized
- Add BlockChecksumType to debug message in FileChecksumHelper.tryDatanode
- Follow usual style of delegating to multi-arg constructor in BlockChecksumOptions
- Instead of building late-evaluated "Object" representations for debug purposes, make CrcUtil just provide toString methods and wrap the calls getting the strings inside isDebugEnabled() checks
- Mark getFileChecksumWithCombineMode as LimitedPrivate
- Add TestCopyMapperCompositeCrc extending TestCopyMapper, differentiating behaviors between the checksum options in terms of what kinds of file layouts are supported
- Remove String.format from some LOG.debug statements
- Make ReplicatedFileChecksumComputer raise PathIOExceptions
- Switch TestCrcUtil and TestCrcComposer to use LambdaTestUtils.intercept instead of junit ExpectedException
dennishuo force-pushed the add-composite-crc32 branch from e0983d5 to e53c453 on March 25, 2018 at 01:51
- Add InterfaceStability.Unstable annotations to LimitedPrivate classes
- Remove unnecessary isDebugEnabled() checks in FileChecksumHelper
- Add global test timeouts for TestCrcUtil and TestCrcComposer
Remove unnecessary LimitedPrivate annotation in DFSClient. Fix up some javadoc formatting.
💔 -1 overall
This message was automatically generated.
Close the PR since it was committed.