[HUDI-2028] Implement RockDbBasedMap as an alternate to DiskBasedMap in ExternalSpillableMap #3117
Codecov Report
@@ Coverage Diff @@
## master #3117 +/- ##
============================================
- Coverage 46.01% 44.29% -1.73%
+ Complexity 5306 4615 -691
============================================
Files 911 826 -85
Lines 39476 36669 -2807
Branches 4254 3949 -305
============================================
- Hits 18166 16243 -1923
+ Misses 19456 18674 -782
+ Partials 1854 1752 -102
hudi-common/src/main/java/org/apache/hudi/common/util/collection/RocksDBDAO.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java
…ethod have wrong filter condition (apache#3109)
…culative execution is enabled (apache#3093) * unit tests added
…thsOfInstant (apache#3125) Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
…Table is enabled. (apache#3079)
…te instants from the dataset timeline. (apache#3082)
…eClient$commitstats (apache#3050)
…tionPlanOperator (apache#3105) Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
…apache#3111) [HUDI-2069] KafkaAvroSchemaDeserializer should get sourceSchema passed instead using Reflection
…tadata Safely (apache#3138) Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
…can not assign initially (apache#3148)
…putFormat#MergeIterator to avoid StackOverflow (apache#3159)
…he#3168) Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
…et io for flink batch compaction (apache#3169)
I had tests for TestExternalSpillableMap in the stacked diff; anyway, I have added them here.
We do test whether the right number of keys stayed in memory and the right number spilled over. I'm wondering how different values of maxInMemorySizeInBytes would help?
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
* [HUDI-1944] Support Hudi to read from committed offset * [HUDI-1944] Adding group option to KafkaResetOffsetStrategies * [HUDI-1944] Update Exception msg
Co-authored-by: 喻兆靖 <yuzhaojing@bilibili.com>
…to rm_rocks_db
Moved to #3194
What is the purpose of the pull request
This pull request adds a RocksDB-based alternative to the disk-based map used within ExternalSpillableMap. Our benchmark results show that RocksDB may improve performance significantly when the data set is large and available memory is scarce. RocksDB supports compression, uses memory efficiently, and runs as a native library, which may make it more efficient in certain situations. By default, the disk-based map is used; a config change is required to enable RocksDB.
In this PR, RocksDB support is enabled only for HoodieMergeHandle; a subsequent PR will extend it to all consumers of ExternalSpillableMap (tracked in HUDI-2044).
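For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of a spillable map with a pluggable spill backend. It is not the code in this PR: `SpillStore`, `ToyStore`, `SpillBackend`, and `SpillableMapSketch` are hypothetical names, the backend is a plain `HashMap` standing in for real on-disk or RocksDB storage, and the size estimate is a crude character count. Only the `maxInMemorySizeInBytes` name is borrowed from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical spill backend; stands in for either on-disk implementation. */
interface SpillStore {
    void put(String key, String value);
    String get(String key);
    int size();
    String name();
}

/** Toy backend: a HashMap standing in for real on-disk storage. */
final class ToyStore implements SpillStore {
    private final Map<String, String> data = new HashMap<>();
    private final String name;

    ToyStore(String name) { this.name = name; }

    @Override public void put(String key, String value) { data.put(key, value); }
    @Override public String get(String key) { return data.get(key); }
    @Override public int size() { return data.size(); }
    @Override public String name() { return name; }
}

/** Hypothetical config values; not Hudi's actual option names. */
enum SpillBackend { DISK_MAP, ROCKS_DB }

final class SpillableMapSketch {
    private final long maxInMemorySizeInBytes;
    private final Map<String, String> inMemory = new HashMap<>();
    private final SpillStore spilled;
    private long inMemoryBytes = 0;

    SpillableMapSketch(long maxInMemorySizeInBytes, SpillBackend backend) {
        this.maxInMemorySizeInBytes = maxInMemorySizeInBytes;
        // A real implementation would open a file or a RocksDB instance here.
        this.spilled = new ToyStore(backend == SpillBackend.ROCKS_DB ? "rocksdb-toy" : "disk-toy");
    }

    /** Crude size estimate (assumption): character counts only. */
    private static long estimateBytes(String key, String value) {
        return key.length() + value.length();
    }

    /** Stores a non-null value in memory while the budget allows, otherwise spills it. */
    void put(String key, String value) {
        if (spilled.get(key) != null) {
            spilled.put(key, value); // key already spilled: update it in place
            return;
        }
        String old = inMemory.remove(key);
        if (old != null) {
            inMemoryBytes -= estimateBytes(key, old); // release the old entry's budget
        }
        long entryBytes = estimateBytes(key, value);
        if (inMemoryBytes + entryBytes <= maxInMemorySizeInBytes) {
            inMemory.put(key, value);
            inMemoryBytes += entryBytes;
        } else {
            spilled.put(key, value);
        }
    }

    /** Looks in memory first, then in the spill backend. */
    String get(String key) {
        String value = inMemory.get(key);
        return value != null ? value : spilled.get(key);
    }

    int inMemoryCount() { return inMemory.size(); }
    int spilledCount() { return spilled.size(); }
    String backendName() { return spilled.name(); }

    public static void main(String[] args) {
        // Budget of 10 "bytes": two 5-character entries fit, the third spills.
        SpillableMapSketch map = new SpillableMapSketch(10, SpillBackend.ROCKS_DB);
        map.put("k1", "aaa");
        map.put("k2", "bbb");
        map.put("k3", "ccc");
        System.out.println("backend=" + map.backendName()
            + " inMemory=" + map.inMemoryCount()
            + " spilled=" + map.spilledCount());
    }
}
```

The point of the sketch is that callers only see `put` and `get`; swapping the spill backend is a constructor-time choice, which is why a config flag can switch implementations without touching consumers such as a merge handle.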
Brief change log
Verify this pull request
This change added tests and can be verified as follows:
Added the unit test in TestSpillableRocksDBBasedMap
Updated the test for TestExternalSpillableMap