add preloading support for dedup tables#14187

Merged: klsince merged 3 commits into apache:master from klsince:preload_dedup_table on Oct 22, 2024

@klsince klsince commented Oct 8, 2024

Add preloading support for dedup tables to bootstrap dedup metadata faster during server restarts or table rebalances.

The overall preloading logic is similar to the one for upsert tables, but a bit simpler, because a dedup table doesn't use validDocIds bitmaps to track PKs and it assumes that ingested immutable segments contain no duplicate PKs. When preloading segments, we can therefore update dedup metadata more efficiently, for example by simply calling map.put() instead of map.compute(), which does a conditional get-check-set.
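
To illustrate the put() versus compute() difference, here is a minimal sketch. It is not the code in this PR: the class `DedupPreloadSketch`, its method names, and the PK-to-segment-name map layout are hypothetical stand-ins chosen only for illustration. It assumes, as stated above, that preloaded immutable segments contain no duplicate PKs, so the preload path can write unconditionally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the actual Pinot dedup metadata manager.
final class DedupPreloadSketch {
  // Assumed layout: primary key -> name of the segment that owns it.
  private final Map<String, String> pkToSegment = new ConcurrentHashMap<>();

  // Regular path: conditional get-check-set done atomically via compute().
  // Returns true if the PK was new, false if it was already tracked.
  boolean addChecked(String pk, String segment) {
    boolean[] added = {false};
    pkToSegment.compute(pk, (key, current) -> {
      if (current == null) {
        added[0] = true;
        return segment;   // first time this PK is seen
      }
      return current;     // duplicate PK: keep the existing owner
    });
    return added[0];
  }

  // Preload path: PKs are assumed unique across preloaded segments,
  // so an unconditional put() is enough and skips the check.
  void preload(String pk, String segment) {
    pkToSegment.put(pk, segment);
  }

  String ownerOf(String pk) {
    return pkToSegment.get(pk);
  }

  int size() {
    return pkToSegment.size();
  }
}
```

In this sketch, preload() is only safe under the no-duplicates assumption; once preloading finishes, newly ingested records would go through the checked path.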

@klsince klsince force-pushed the preload_dedup_table branch from 3492d7a to 897fe74 Compare October 8, 2024 18:42
codecov-commenter commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 41.08527% with 76 lines in your changes missing coverage. Please review.

Project coverage is 63.73%. Comparing base (59551e4) to head (9f5cf09).
Report is 1214 commits behind head on master.

Files with missing lines Patch % Lines
...local/dedup/BasePartitionDedupMetadataManager.java 42.25% 33 Missing and 8 partials ⚠️
...ata/manager/realtime/RealtimeTableDataManager.java 0.00% 28 Missing ⚠️
...up/ConcurrentMapPartitionDedupMetadataManager.java 0.00% 7 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14187      +/-   ##
============================================
+ Coverage     61.75%   63.73%   +1.98%     
- Complexity      207     1535    +1328     
============================================
  Files          2436     2626     +190     
  Lines        133233   144765   +11532     
  Branches      20636    22152    +1516     
============================================
+ Hits          82274    92272    +9998     
- Misses        44911    45686     +775     
- Partials       6048     6807     +759     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.70% <41.08%> (+1.99%) ⬆️
java-21 63.62% <41.08%> (+2.00%) ⬆️
skip-bytebuffers-false 63.73% <41.08%> (+1.98%) ⬆️
skip-bytebuffers-true 63.59% <41.08%> (+35.86%) ⬆️
temurin 63.73% <41.08%> (+1.98%) ⬆️
unittests 63.73% <41.08%> (+1.98%) ⬆️
unittests1 55.43% <4.65%> (+8.54%) ⬆️
unittests2 34.31% <40.31%> (+6.58%) ⬆️


@Jackie-Jiang added the feature and dedup (Changes related to realtime ingestion dedup handling) labels Oct 9, 2024
@klsince klsince force-pushed the preload_dedup_table branch from 897fe74 to fd5a1b6 Compare October 18, 2024 04:48
@klsince klsince force-pushed the preload_dedup_table branch from fd5a1b6 to bd3ad54 Compare October 22, 2024 04:49
@klsince klsince merged commit 767f32f into apache:master Oct 22, 2024
@klsince klsince deleted the preload_dedup_table branch October 22, 2024 15:25