-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-2950] Addressing performance traps in Bulk Insert/Layout Optimization #4234
Conversation
45b0927
to
1fa571c
Compare
@alexeykudinkin Is this ready to be reviewed? |
@vinothchandar yes, just need to rebase on master since this one was stacked |
1fa571c
to
b4eb85f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiarixiaoyao do you want to review this once?
hudi-common/src/main/java/org/apache/hudi/common/util/ObjectSizeCalculator.java
Show resolved
Hide resolved
*/ | ||
public class PathCachingFileName extends Path { | ||
|
||
// NOTE: volatile keyword is redundant here and put mostly for reader notice, since all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this comment? Don't want to overload the code with these. I'd expect people to understand volatile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But its goal is the opposite: volatile isn't required here and i added a comment for those who might decide that it isn't needed, removing it (while the purpose of volatile here is to hint to the reader that the read/write of the ref doesn't need to be synchronized here)
/** | ||
* NOTE: This class is thread-safe | ||
*/ | ||
public class PathCachingFileName extends Path { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename: FileNameCachingPath or NameCachedPath, ending with Path
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/sort/SpaceCurveSortingHelper.java
Show resolved
Hide resolved
...in/java/org/apache/hudi/execution/bulkinsert/RDDSpatialCurveOptimizationSortPartitioner.java
Show resolved
Hide resolved
...in/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
Outdated
Show resolved
Hide resolved
- Streamlined flow - Removed unnecessary operations (double-mapping, boxing, etc) Updated `CollectionUtils::combine` to avoid AL resizing
…g `RowFactory`, passing `Object[]`
…oid unnecessary Avro ser/de loop
…ile-name is fetched; Inject `PathCachingFileName` into `HoodieWrapperFileSystem.convertPathWithScheme`
b4eb85f
to
24a2303
Compare
@hudi-bot run azure |
…zation (apache#4234) * Cleaned up Z-curve/Hilbert ordering seqs: - Streamlined flow - Removed unnecessary operations (double-mapping, boxing, etc) Updated `CollectionUtils::combine` to avoid AL resizing * Tidying up * Reducing small objects churn due to Scala/Java conversions by re-using `RowFactory`, passing `Object[]` * Fixing name resolution (disambiguation overloads) * `lint` * Replaced `OverwriteAvroPayloadRecord` w/ `RewriteRecordPayload` to avoid unnecessary Avro ser/de loop * Added `PathCachingFileName` to avoid fetching substrings every time file-name is fetched; Inject `PathCachingFileName` into `HoodieWrapperFileSystem.convertPathWithScheme` * Drastically reducing size of the `ArrayDeque` allocated by `ObjectSizeCalculator` * XXX * Missing license * Fixed refs (after rebase) * Fixing compilation failure in Scala 2.11 * `PathCachingFileName` > `FileNameCachingPath` * Tidying up
…zation (apache#4234) * Cleaned up Z-curve/Hilbert ordering seqs: - Streamlined flow - Removed unnecessary operations (double-mapping, boxing, etc) Updated `CollectionUtils::combine` to avoid AL resizing * Tidying up * Reducing small objects churn due to Scala/Java conversions by re-using `RowFactory`, passing `Object[]` * Fixing name resolution (disambiguation overloads) * `lint` * Replaced `OverwriteAvroPayloadRecord` w/ `RewriteRecordPayload` to avoid unnecessary Avro ser/de loop * Added `PathCachingFileName` to avoid fetching substrings every time file-name is fetched; Inject `PathCachingFileName` into `HoodieWrapperFileSystem.convertPathWithScheme` * Drastically reducing size of the `ArrayDeque` allocated by `ObjectSizeCalculator` * XXX * Missing license * Fixed refs (after rebase) * Fixing compilation failure in Scala 2.11 * `PathCachingFileName` > `FileNameCachingPath` * Tidying up
…zation (apache#4234) * Cleaned up Z-curve/Hilbert ordering seqs: - Streamlined flow - Removed unnecessary operations (double-mapping, boxing, etc) Updated `CollectionUtils::combine` to avoid AL resizing * Tidying up * Reducing small objects churn due to Scala/Java conversions by re-using `RowFactory`, passing `Object[]` * Fixing name resolution (disambiguation overloads) * `lint` * Replaced `OverwriteAvroPayloadRecord` w/ `RewriteRecordPayload` to avoid unnecessary Avro ser/de loop * Added `PathCachingFileName` to avoid fetching substrings every time file-name is fetched; Inject `PathCachingFileName` into `HoodieWrapperFileSystem.convertPathWithScheme` * Drastically reducing size of the `ArrayDeque` allocated by `ObjectSizeCalculator` * XXX * Missing license * Fixed refs (after rebase) * Fixing compilation failure in Scala 2.11 * `PathCachingFileName` > `FileNameCachingPath` * Tidying up
Tips
What is the purpose of the pull request
NOTE: This stacked on top of #4106
This PR targets addressing performance traps found during Perf Benchmarking of the Layout Optimization implementation
Brief change log
RowFactory
, passingObject[]
OverwriteAvroPayloadRecord
w/RewriteRecordPayload
to avoid unnecessary Avro ser/de loopObjectSizeCalculator
Verify this pull request
This pull request is a trivial rework / code cleanup without any test coverage.
This pull request is already covered by existing tests, such as (please describe tests).
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.