
Use upsert snapshot view in hasNoQueryableDocs to fix consistency-mode pruning #18478

Merged
Jackie-Jiang merged 23 commits into apache:master from deepthi912:upsert-snapshot-aware-prune on May 15, 2026

@deepthi912 (Collaborator) commented May 12, 2026

Problem

SegmentPrunerService.removeEmptySegments calls UpsertUtils.hasNoQueryableDocs to drop empty segments before query execution. Today that method reads the segment's live ThreadSafeMutableRoaringBitmap via IndexSegment.getQueryableDocIds() / getValidDocIds().

For upsert tables running with consistency mode SYNC or SNAPSHOT, the live bitmap can diverge from the per-query snapshot maintained by UpsertViewManager between refreshes. A "no queryable docs here" reading on the live bitmap can therefore make the pruner drop a segment that is still non-empty in the snapshot view the query is about to scan, producing wrong row counts.
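To make the race concrete, here is a minimal, hypothetical sketch that uses `java.util.BitSet` as a stand-in for the live and snapshot Roaring bitmaps. The two-segment layout, the primary key "k1", and the helper `countAfterPruning` are invented for illustration and are not Pinot code. A key moves from segment 0 to segment 1 after the last refresh, so segment 0's live bitmap is empty while the snapshot the query scans still holds the key there.

```java
import java.util.BitSet;

/// Hypothetical illustration only: BitSet stands in for Pinot's
/// ThreadSafeMutableRoaringBitmap (live) and MutableRoaringBitmap (snapshot).
public class SnapshotPruneRaceDemo {

  /// Sums the docs a query scans in `scanned`, skipping every segment whose
  /// `pruneBy` bitmap is empty (i.e. pruned as "no queryable docs").
  static int countAfterPruning(BitSet[] pruneBy, BitSet[] scanned) {
    int count = 0;
    for (int i = 0; i < scanned.length; i++) {
      if (pruneBy[i].isEmpty()) {
        continue; // segment dropped before execution
      }
      count += scanned[i].cardinality();
    }
    return count;
  }

  /// Returns {count with live-bitmap pruning, count with snapshot-aware pruning}.
  static int[] run() {
    // Segment 0 holds the only valid doc for primary key "k1"; segment 1 is empty.
    BitSet[] live = {new BitSet(), new BitSet()};
    live[0].set(0);

    // Refresh: copy the live bitmaps into the snapshot that queries scan.
    BitSet[] snapshot = {(BitSet) live[0].clone(), (BitSet) live[1].clone()};

    // After the refresh, a newer record for "k1" lands in segment 1.
    // The live bitmaps move the key; the snapshot still points at segment 0.
    live[0].clear(0);
    live[1].set(0);

    int liveBased = countAfterPruning(live, snapshot);         // drops segment 0
    int snapshotBased = countAfterPruning(snapshot, snapshot); // keeps segment 0
    return new int[] {liveBased, snapshotBased};
  }

  public static void main(String[] args) {
    int[] counts = run();
    System.out.println("live-bitmap pruning:    " + counts[0] + " row(s)");
    System.out.println("snapshot-aware pruning: " + counts[1] + " row(s)");
  }
}
```

Running `main` prints 0 rows for live-bitmap pruning and 1 row for snapshot-aware pruning: pruning with one view while scanning with another loses the only valid row for the key.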

Behavior on non-consistency-mode tables and on non-upsert tables is identical to today.

Tests

New UpsertUtilsTest with 7 cases — passing locally:

[INFO] Running org.apache.pinot.segment.local.upsert.UpsertUtilsTest
[INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0
| Case | Expected |
|---|---|
| non-upsert, live queryable bitmap empty | true |
| non-upsert, live queryable bitmap non-empty | false |
| non-upsert, queryable null, valid bitmap empty | true |
| non-upsert, both bitmaps null | false |
| consistency mode, snapshot empty, live bitmap non-empty | true (snapshot wins) |
| consistency mode, snapshot non-empty, live bitmap empty | false (snapshot wins) |
| consistency mode, snapshot absent, live bitmap empty | false (conservative deferral) |
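The expected results in the table can be read as a three-step decision. The sketch below is one hypothetical way to encode it and check it against all seven cases. `SegmentState` is an invented holder, `BitSet` stands in for the Roaring bitmaps, and the method only mirrors the name of `UpsertUtils.hasNoQueryableDocs`, not its actual signature.

```java
import java.util.BitSet;

/// Hypothetical model of the decision table; not the Pinot implementation.
public class HasNoQueryableDocsSketch {

  /// Invented holder for the state the real check reads from a segment.
  record SegmentState(boolean consistencyMode, BitSet snapshot,
                      BitSet liveQueryable, BitSet liveValid) {}

  static boolean hasNoQueryableDocs(SegmentState s) {
    // 1. A snapshot, when present, is what the query will scan, so it decides.
    if (s.snapshot() != null) {
      return s.snapshot().isEmpty();
    }
    // 2. Consistency mode without a snapshot yet: defer, never prune.
    if (s.consistencyMode()) {
      return false;
    }
    // 3. Otherwise use the live bitmaps: queryable first, then valid.
    BitSet live = s.liveQueryable() != null ? s.liveQueryable() : s.liveValid();
    return live != null && live.isEmpty();
  }

  static BitSet bits(int... docIds) {
    BitSet b = new BitSet();
    for (int docId : docIds) {
      b.set(docId);
    }
    return b;
  }

  /// The seven table rows, in order.
  static SegmentState[] sevenCases() {
    return new SegmentState[] {
        new SegmentState(false, null, bits(), null),   // true
        new SegmentState(false, null, bits(1), null),  // false
        new SegmentState(false, null, null, bits()),   // true
        new SegmentState(false, null, null, null),     // false
        new SegmentState(true, bits(), bits(1), null), // true (snapshot wins)
        new SegmentState(true, bits(1), bits(), null), // false (snapshot wins)
        new SegmentState(true, null, bits(), null),    // false (deferral)
    };
  }

  public static void main(String[] args) {
    for (SegmentState c : sevenCases()) {
      System.out.println(hasNoQueryableDocs(c));
    }
  }
}
```

Checking the snapshot before the consistency-mode flag means non-consistency tables, whose snapshot is always absent in this model, fall straight through to the unchanged live-bitmap logic.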

…e prune races

SegmentPrunerService.removeEmptySegments uses UpsertUtils.hasNoQueryableDocs to drop
empty segments before query execution. Today that method reads the segment's live
ThreadSafeMutableRoaringBitmap from getQueryableDocIds() / getValidDocIds(). For
upsert tables running with consistency mode SYNC or SNAPSHOT, the live bitmap can
diverge from the per-query snapshot maintained by UpsertViewManager between
refreshes — so pruning a segment as "empty here" can drop one that is non-empty in
the snapshot view the query will actually scan, producing wrong row counts.

Surface the snapshot to the pruner without breaking SPI layering:

  * IndexSegment (pinot-segment-spi) gains two default methods:
      - getQueryableDocIdsSnapshot() -> @Nullable MutableRoaringBitmap
      - isUpsertConsistencyModeEnabled() -> boolean
    Both default to null / false so non-upsert segments are unaffected and no
    module dependency changes.

  * ImmutableSegmentImpl / MutableSegmentImpl (pinot-segment-local) override
    both, delegating to PartitionUpsertMetadataManager.getUpsertViewManager().
    Existence of the view manager is exactly equivalent to consistency mode
    being enabled (see BasePartitionUpsertMetadataManager#159-167).

  * PartitionUpsertMetadataManager gains @Nullable UpsertViewManager
    getUpsertViewManager(); BasePartitionUpsertMetadataManager already
    implements it, so no impl changes are needed.

  * UpsertViewManager gains a narrow lookup
    getQueryableDocIdsSnapshot(IndexSegment) that returns the snapshot bitmap
    for the segment from the most recent refresh, or null. Lock-free: reads the
    volatile map reference; writers replace the map atomically under
    _upsertViewLock, never mutating the old one in place.

  * UpsertUtils.hasNoQueryableDocs now consults the snapshot first, returns
    false when consistency mode is on but the snapshot is not yet populated
    (first refresh hasn't run, or segment was just tracked), and otherwise
    falls back to live bitmaps as before.
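A hypothetical, heavily trimmed sketch of the layering described in the list: an SPI interface whose defaults return null/false, a segment implementation that delegates to a view manager only when one exists, and a view manager that serves lock-free reads from a volatile map that writers replace wholesale under a lock. `Segment`, `ViewManager` and `UpsertSegment` are invented stand-ins for `IndexSegment`, `UpsertViewManager` and the segment impls; keying the map by segment identity is an assumption, and `BitSet` replaces `MutableRoaringBitmap`.

```java
import java.util.BitSet;
import java.util.IdentityHashMap;
import java.util.Map;

/// Hypothetical stand-ins for the SPI / view-manager / segment-impl layers.
public class SnapshotLayeringSketch {

  /// SPI side: defaults keep non-upsert segments and other modules unaffected.
  interface Segment {
    default BitSet getQueryableDocIdsSnapshot() {
      return null;
    }

    default boolean isUpsertConsistencyModeEnabled() {
      return false;
    }
  }

  /// View-manager side: lock-free reads, copy-on-write publication.
  static final class ViewManager {
    private final Object _upsertViewLock = new Object();
    // Never mutated after it is published; writers install a fresh map.
    private volatile Map<Segment, BitSet> _snapshots = new IdentityHashMap<>();

    BitSet getQueryableDocIdsSnapshot(Segment segment) {
      return _snapshots.get(segment); // one volatile read, no lock
    }

    void refresh(Map<Segment, BitSet> liveBitmaps) {
      synchronized (_upsertViewLock) {
        Map<Segment, BitSet> next = new IdentityHashMap<>();
        liveBitmaps.forEach((segment, bits) -> next.put(segment, (BitSet) bits.clone()));
        _snapshots = next; // atomic reference swap
      }
    }
  }

  /// Segment-impl side: delegate to the view manager when one exists.
  static final class UpsertSegment implements Segment {
    private final ViewManager _viewManager; // null when consistency mode is off

    UpsertSegment(ViewManager viewManager) {
      _viewManager = viewManager;
    }

    @Override
    public BitSet getQueryableDocIdsSnapshot() {
      return _viewManager != null ? _viewManager.getQueryableDocIdsSnapshot(this) : null;
    }

    @Override
    public boolean isUpsertConsistencyModeEnabled() {
      return _viewManager != null; // view manager exists <=> consistency mode on
    }
  }

  public static void main(String[] args) {
    ViewManager viewManager = new ViewManager();
    UpsertSegment segment = new UpsertSegment(viewManager);
    System.out.println(segment.getQueryableDocIdsSnapshot()); // no refresh yet

    BitSet live = new BitSet();
    live.set(3);
    Map<Segment, BitSet> liveBitmaps = Map.of(segment, live);
    viewManager.refresh(liveBitmaps);
    live.clear(3); // a later live change does not leak into the snapshot
    System.out.println(segment.getQueryableDocIdsSnapshot());
  }
}
```

Because a published map is never mutated, a reader holding the old reference keeps seeing a consistent view; the trade-off is a full copy on every refresh. Callers also have to treat the returned bitmap as read-only, which this sketch does not enforce.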

Behavior on non-consistency-mode tables is unchanged.
Seven cases covered:
  * Non-upsert / non-consistency-mode tables (existing behavior preserved):
      - live queryable bitmap empty -> true
      - live queryable bitmap non-empty -> false
      - queryable null, valid bitmap empty -> true
      - both bitmaps null -> false
  * Consistency-mode upsert tables (new snapshot-aware behavior):
      - snapshot empty, live bitmap non-empty -> true (snapshot wins)
      - snapshot non-empty, live bitmap empty -> false (snapshot wins)
      - snapshot absent (first refresh hasn't run or segment just tracked),
        live bitmap empty -> false (conservative deferral)
…ke sure we don't see inconsistencies in queries
…ed to make sure we don't see inconsistencies in queries"

This reverts commit d61e953.
@deepthi912 added the labels upsert (Related to upsert functionality) and query (Related to query processing) May 12, 2026
@deepthi912 changed the title "Use upsert snapshot view in hasNoQueryableDocs to fix consistency-mode prune races" to "Use upsert snapshot view in hasNoQueryableDocs to fix consistency-mode pruning" May 12, 2026
@deepthi912 requested a review from Jackie-Jiang May 12, 2026 21:13
@codecov-commenter commented May 12, 2026

Codecov Report

❌ Patch coverage is 46.66667% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.68%. Comparing base (1a313c3) to head (9e75be0).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...local/indexsegment/mutable/MutableSegmentImpl.java | 0.00% | 13 Missing ⚠️ |
| .../pinot/segment/local/upsert/UpsertViewManager.java | 0.00% | 2 Missing ⚠️ |
| ...ava/org/apache/pinot/segment/spi/IndexSegment.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18478      +/-   ##
============================================
- Coverage     63.68%   63.68%   -0.01%     
  Complexity     1684     1684              
============================================
  Files          3266     3266              
  Lines        199836   199922      +86     
  Branches      31023    31055      +32     
============================================
+ Hits         127272   127321      +49     
- Misses        62424    62442      +18     
- Partials      10140    10159      +19     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 63.68% <46.66%> (-0.01%) ⬇️ |
| temurin | 63.68% <46.66%> (-0.01%) ⬇️ |
| unittests | 63.68% <46.66%> (-0.01%) ⬇️ |
| unittests1 | 55.79% <30.00%> (-0.05%) ⬇️ |
| unittests2 | 34.96% <40.00%> (+0.02%) ⬆️ |

Flags with carried forward coverage won't be shown.


Comment thread (outdated): pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/IndexSegment.java
@xiangfu0 (Contributor) left a comment

Found one high-signal issue; see inline comment.

deepthi912 and others added 6 commits May 13, 2026 10:03
Fixes checkstyle UnusedImports violation introduced by an earlier
refactor; the symbol is no longer referenced after the simplification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread (outdated): pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/IndexSegment.java
deepthi912 and others added 8 commits May 13, 2026 17:39
Addresses PR review comment from @Jackie-Jiang — switch the new javadoc
blocks (getQueryableDocIdsSnapshot, isUpsertConsistencyModeEnabled,
hasNoQueryableDocs) from /** */ with <ol><li> to /// markdown style,
matching the style already used elsewhere in this file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
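For readers unfamiliar with the style change, a small hypothetical example contrasting the two forms: an HTML `<ol><li>` list inside `/** */`, and a Markdown list in `///` comments (JEP 467, which `javadoc` renders as Markdown starting with JDK 23). To `javac` both are ordinary comments, so the file compiles on any recent JDK; the class and method names are invented.

```java
/// Hypothetical example contrasting the two doc-comment styles.
public class MarkdownJavadocExample {

  /**
   * Old style: HTML list markup inside a block comment.
   * <ol>
   *   <li>Check the snapshot.</li>
   *   <li>Fall back to the live bitmaps.</li>
   * </ol>
   */
  static int oldStyle() {
    return 1;
  }

  /// New style: Markdown inside `///` line comments.
  ///
  /// 1. Check the snapshot.
  /// 2. Fall back to the live bitmaps.
  static int newStyle() {
    return 2;
  }

  public static void main(String[] args) {
    System.out.println(oldStyle() + newStyle());
  }
}
```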
Left over after the SPI was simplified back to a no-op default
hasNoQueryableDocs. Fixes the checkstyle UnusedImports violation that
broke the pinot-segment-spi build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The earlier mock(IndexSegment.class) approach didn't actually exercise the
hasNoQueryableDocs default impl on the SPI (Mockito returns false for
abstract methods) and couldn't reach the consistency-mode branch.

Switch to constructing a minimal ImmutableSegmentImpl with mocked
SegmentDirectory + SegmentMetadataImpl, wire the upsert path via
enableUpsert(...), and stub PartitionUpsertMetadataManager +
UpsertViewManager only where each test needs them. Same pattern that
BasePartitionUpsertMetadataManagerTest already uses.

Covers all five hasNoQueryableDocs branches:
  - Non-upsert (manager null) — returns false
  - Non-consistency upsert: live queryable empty / non-empty / missing
    (falls back to validDocIds) / both bitmaps missing
  - Consistency-mode upsert: snapshot empty / non-empty / absent
8 tests, all green locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…segments

The upsert helper (mockUpsertIndexSegment) previously returned
mock(IndexSegment.class). SegmentPrunerService.removeEmptySegments calls
segment.hasNoQueryableDocs(), and Mockito returns the default (false)
for that method on a plain interface mock — so the "empty" tests
(emptyValidPruned, emptyQueryablePruned) never saw the segment as empty
and failed.
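The failure mode described above can be reproduced with only the JDK. This is a hypothetical analogue, not Mockito: a `java.lang.reflect.Proxy` whose handler answers every call with the return type's zero value, which is roughly what a default-answer interface mock does. Dynamic proxies intercept default interface methods as well, so the default `hasNoQueryableDocs` body never runs and the "empty" segment reports `false`, while a real implementation of the same interface runs the body and reports `true`. The `Segment` interface here is invented.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

/// Hypothetical stdlib analogue of a plain interface mock; not Mockito itself.
public class InterfaceMockDefaultDemo {

  public interface Segment {
    int getTotalDocs();

    /// The default body a test might expect to run.
    default boolean hasNoQueryableDocs() {
      return getTotalDocs() == 0;
    }
  }

  /// Answers every call with the return type's zero value. Proxies intercept
  /// default methods too, so the default body above never runs.
  static Segment zeroAnswerMock() {
    InvocationHandler handler = (proxy, method, args) -> {
      Class<?> type = method.getReturnType();
      if (type == boolean.class) {
        return false;
      }
      if (type == int.class) {
        return 0;
      }
      return null;
    };
    return (Segment) Proxy.newProxyInstance(
        Segment.class.getClassLoader(), new Class<?>[] {Segment.class}, handler);
  }

  /// A real implementation: the default method body does run.
  static Segment realSegment(int totalDocs) {
    return () -> totalDocs;
  }

  public static void main(String[] args) {
    System.out.println(zeroAnswerMock().hasNoQueryableDocs()); // total docs is 0, yet false
    System.out.println(realSegment(0).hasNoQueryableDocs());
  }
}
```

This is why the fix swaps the interface mock for a real segment instance: only a real implementation executes the default method body the tests are meant to cover.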

Construct a real ImmutableSegmentImpl(SegmentDirectory, SegmentMetadataImpl,
emptyMap, null), stub SegmentMetadataImpl#getTotalDocs, and wire the
upsert state via enableUpsert(mockManager, validDocIds, queryableDocIds)
where the manager's getUpsertViewManager() returns null (non-consistency
mode). The real hasNoQueryableDocs body then evaluates the live bitmaps
the test set up.

Same pattern BasePartitionUpsertMetadataManagerTest already uses.
All 10 tests pass locally; checkstyle + spotless clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Jackie-Jiang Jackie-Jiang merged commit d767fda into apache:master May 15, 2026
11 checks passed

4 participants