Use disk based key value store for deduplication #10572
raghavgautam wants to merge 14 commits into apache:master from
Conversation
Codecov Report
@@            Coverage Diff             @@
##             master   #10572       +/-   ##
=============================================
- Coverage     70.35%   13.83%   -56.52%
+ Complexity     6464      439     -6025
=============================================
  Files          2103     2052       -51
  Lines        112769   110896     -1873
  Branches      16981    16795      -186
=============================================
- Hits          79341    15348    -63993
- Misses        27877    94295    +66418
+ Partials       5551     1253     -4298
=============================================
... and 1668 files with indirect coverage changes
It'll be good to move this RocksDB dependency to a plugin subfolder so Pinot core does not depend on this lib.
Currently the deduplication is handled in the same way as upsert as a short-term solution (not production ready). We have done a lot of bugfixes to the upsert implementation, but have not actively maintained the dedup implementation. My suggestion would be to redesign dedup from scratch since it is not the same as upsert (no need to maintain valid docs, no need to track segments, etc.), and TTL (dedup window) should be a must-have for dedup. With proper TTL, the key store size should be much smaller. After that, if we still need a disk-based KV store, we can introduce it as a plugin. We don't want to introduce a RocksDB dependency in the default distribution.
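To illustrate the TTL point above, here is a minimal sketch (not Pinot's actual dedup code; the class name and window semantics are hypothetical) of a dedup store bounded by a dedup window: keys older than the TTL are purged, so the store only retains keys seen within the window instead of growing without bound.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a TTL-bounded dedup store. With a dedup window,
// only keys that arrived within the last ttlMs are retained, which keeps
// the key store far smaller than an ever-growing set.
public class TtlDedupStore {
    private final ConcurrentHashMap<String, Long> keyToArrivalMs = new ConcurrentHashMap<>();
    private final long ttlMs;

    public TtlDedupStore(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    /** Returns true if the key has not been seen within the dedup window. */
    public boolean addIfAbsent(String key, long nowMs) {
        return keyToArrivalMs.putIfAbsent(key, nowMs) == null;
    }

    /** Drops keys that have fallen out of the dedup window. */
    public void purgeExpired(long nowMs) {
        keyToArrivalMs.values().removeIf(arrivalMs -> nowMs - arrivalMs > ttlMs);
    }

    public int size() {
        return keyToArrivalMs.size();
    }
}
```

A real implementation would run the purge on a background thread keyed off ingestion time rather than wall-clock time, but the size argument is the same: memory is proportional to keys per window, not total keys ever seen.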
I think the redesign of dedup makes sense, but it'll be a much larger and more involved effort, and unlikely one for @raghavgautam to take on. I think it makes sense to start a separate effort for dedup V2, but in the meantime it's fine for the community to add features and make improvements over V1, assuming it'll take some time for V2 to be ready. For the dependency on the KV store, I left a similar comment that we'd better move this to a plugin for simpler dependencies in the core.
My concern with the current dedup implementation is that it is not maintained properly (not production ready), and the bugfixes for upsert are not applied to dedup. If we decide to keep V1, we should first apply all the upsert changes to dedup as well.
This sounds like a bigger problem: dedup is not ready for the community, but we released it anyway. I don't see any warning in the Pinot docs... I think we should either maintain V1 or stop advocating its use (e.g. mark it as experimental, or remove it). @kishoreg @icefury71 wdyt?
This patch addresses the memory problem discussed in #10571
It uses RocksDB as an on-disk key-value store instead of ConcurrentHashMap to reduce the memory requirement.
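The swap works because the dedup path only needs an atomic "record key if absent" check, which a disk-backed store can provide just as well as an in-memory map. A minimal sketch of that contract, using an in-memory map as a stand-in for RocksDB (the class and method names here are illustrative, not the PR's actual code; a RocksDB-backed version would implement the same check with db.get()/db.put()):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of the first-writer-wins check a dedup key store must provide.
// The in-memory map below stands in for the disk-backed store; swapping
// the backing storage does not change the ingestion-side contract.
public class DedupKeyStoreSketch {
    private final ConcurrentMap<String, Boolean> seen = new ConcurrentHashMap<>();

    /** Returns true if the key was absent and has now been recorded. */
    public boolean addIfAbsent(String primaryKey) {
        return seen.putIfAbsent(primaryKey, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        DedupKeyStoreSketch store = new DedupKeyStoreSketch();
        System.out.println(store.addIfAbsent("pk-42")); // first sighting: ingest the row
        System.out.println(store.addIfAbsent("pk-42")); // duplicate: drop the row
    }
}
```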
The memory requirement is number_of_keys * 10 bits + 64 MB per column family, so for 1 billion keys it will use < 1.5 GB of RAM.
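The quoted footprint can be checked with quick arithmetic (a sketch using the figures from the description above; the helper name is ours, not from the patch): 10 bits per key is 1.25 bytes, so 1 billion keys cost about 1.25 GB, plus the fixed 64 MB per column family, which lands comfortably under 1.5 GB.

```java
public class RocksDbMemoryEstimate {
    // Estimated resident memory: bitsPerKey per key plus a fixed
    // per-column-family overhead; figures come from the PR description.
    static double estimateGiB(long numKeys, int bitsPerKey, long fixedBytes) {
        double bytes = numKeys * (bitsPerKey / 8.0) + fixedBytes;
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long keys = 1_000_000_000L;             // 1 billion keys
        long fixed = 64L * 1024 * 1024;         // 64 MB per column family
        System.out.printf("%.2f GiB%n", estimateGiB(keys, 10, fixed));
        // 1e9 keys * 1.25 bytes + 64 MB ≈ 1.23 GiB, i.e. under the quoted 1.5 GB
    }
}
```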
Source:
For testing, I was able to ingest at 8-10K records/sec with 250 million records already loaded, on my M1 Mac with 10 cores and 64 GB RAM.