Remove preloading-type nature of enableSnapshot config #13579
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #13579 +/- ##
============================================
+ Coverage 61.75% 61.97% +0.22%
+ Complexity 207 198 -9
============================================
Files 2436 2555 +119
Lines 133233 140609 +7376
Branches 20636 21817 +1181
============================================
+ Hits 82274 87147 +4873
- Misses 44911 46851 +1940
- Partials 6048 6611 +563
    recordInfoIterator =
        UpsertUtils.getRecordInfoIterator(recordInfoReader, segment.getSegmentMetadata().getTotalDocs());
    }
    Iterator<RecordInfo> recordInfoIterator =
Should we try to remove the snapshot here, as in the else branch above?
Do we really need to remove it 🤔? Can there be a scenario where the valid doc IDs in the snapshot differ from those of the loading segment? I am fine with removing it as well, since for partial-upsert we will load it upfront again.
Maybe let's wrap this removeSnapshot call in an 'if (!enableSnapshot) {}' (as reminded by Qiaochu's question on metadataTTL, although that shouldn't be affected, as you analyzed).
Done! I can think of a scenario where this would make sense. Say a within-TTL segment gets its snapshot removed (without this condition), and a restart happens after the segment moves out of TTL but before the next segment commit. This condition would help in that case.
    MutableRoaringBitmap validDocIds;
    if (_enableSnapshot) {
      validDocIds = segment.loadValidDocIdsFromSnapshot();
Can you please check whether metadata TTL is enabled? I don't think we can remove it if metadata TTL is enabled.
By removing these lines, when metadataTTL is enabled but preloading is not, we will have missing valid doc IDs for historical segments.
@deemoliu this method is not called at all when the segment is out of the metadataTTL window. Ref:
Lines 389 to 406 in dacc6d0
    if (_metadataTTL > 0 && _largestSeenComparisonValue.get() > 0) {
      Preconditions.checkState(_enableSnapshot, "Upsert TTL must have snapshot enabled");
      Preconditions.checkState(_comparisonColumns.size() == 1,
          "Upsert TTL does not work with multiple comparison columns");
      Number maxComparisonValue =
          (Number) segment.getSegmentMetadata().getColumnMetadataMap().get(_comparisonColumns.get(0)).getMaxValue();
      if (maxComparisonValue.doubleValue() < _largestSeenComparisonValue.get() - _metadataTTL) {
        _logger.info("Skip adding segment: {} because it's out of TTL", segmentName);
        MutableRoaringBitmap validDocIdsSnapshot = immutableSegment.loadValidDocIdsFromSnapshot();
        if (validDocIdsSnapshot != null) {
          MutableRoaringBitmap queryableDocIds = getQueryableDocIds(segment, validDocIdsSnapshot);
          immutableSegment.enableUpsert(this, new ThreadSafeMutableRoaringBitmap(validDocIdsSnapshot),
              queryableDocIds != null ? new ThreadSafeMutableRoaringBitmap(queryableDocIds) : null);
        } else {
          _logger.warn("Failed to find snapshot from segment: {} which is out of TTL, treating all documents as valid",
              segmentName);
        }
        return;
We return in the caller method above for segments that are out of the metadataTTL window.
This flow affects only segments that are within the metadataTTL window; in that case it is okay to read the whole segment rather than just the snapshot.
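To make the window check in the quoted snippet concrete, here is a minimal standalone sketch of the same comparison. The class and method names (TtlWindowSketch, isOutOfTtl) are hypothetical and not part of Pinot; only the inequality itself is taken from the snippet above.

```java
public class TtlWindowSketch {
  // Hypothetical helper mirroring the quoted check: a segment counts as out of the
  // TTL window when its max comparison value is below (largestSeen - metadataTtl).
  static boolean isOutOfTtl(double metadataTtl, double largestSeen, double segmentMax) {
    if (metadataTtl <= 0 || largestSeen <= 0) {
      return false; // TTL disabled, or no comparison value seen yet
    }
    return segmentMax < largestSeen - metadataTtl;
  }

  public static void main(String[] args) {
    System.out.println(isOutOfTtl(100, 1000, 850)); // true: 850 < 900
    System.out.println(isOutOfTtl(100, 1000, 950)); // false: 950 >= 900
  }
}
```

Under these assumptions, only segments for which the check returns false would continue into the flow discussed here.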
I recall from the prior discussions with @klsince and @Jackie-Jiang that upsert TTL (metadataTTL) snapshotting is supposed to be atomic at the partition level rather than the segment level. We need to avoid keeping some snapshots while deleting other snapshots for the same partition.
Yeah, I think calling removeSnapshot inside 'if (!enableSnapshot) {}' would help avoid this. Thanks for the early questions and the reminder.
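The guard agreed on in this thread can be sketched roughly as follows. This is an illustration only: SnapshotGuardSketch, its in-memory set of snapshots, and the method names are hypothetical stand-ins, not the actual Pinot classes or the code merged in this PR.

```java
import java.util.HashSet;
import java.util.Set;

public class SnapshotGuardSketch {
  // Hypothetical stand-in for on-disk snapshot files, keyed by segment name.
  private final Set<String> snapshots = new HashSet<>();
  private final boolean enableSnapshot;

  SnapshotGuardSketch(boolean enableSnapshot) {
    this.enableSnapshot = enableSnapshot;
  }

  void persistSnapshot(String segmentName) {
    snapshots.add(segmentName);
  }

  boolean hasSnapshot(String segmentName) {
    return snapshots.contains(segmentName);
  }

  // Called while adding a segment: the snapshot is removed only when snapshotting
  // is disabled, so snapshots survive whenever enableSnapshot is on.
  void onAddSegment(String segmentName) {
    if (!enableSnapshot) {
      snapshots.remove(segmentName);
    }
  }
}
```

With the guard in place, a segment that later falls out of the TTL window would still find its snapshot after a restart, which is the scenario raised earlier in the thread.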
* Remove preloading-type nature of enableSnapshot config
* Remove snapshot when adding segment if snapshot is not enabled
Addresses #13466.
When enableSnapshot is enabled, we only iterate through the validDocIds during segment addition. This is counter-intuitive, as we have enablePreload for the same purpose. Currently, making any upsert config change requires two server restarts (first to disable enableSnapshot, then to enable it again). We can limit this restart behaviour to enablePreload rather than tying it to the enableSnapshot config as well.
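The behaviour change described above can be illustrated with a small sketch. It uses java.util.BitSet as a stand-in for the Roaring bitmap, and the class and method names are hypothetical. The "before" branch is an assumption based on the description, not a copy of the removed code.

```java
import java.util.BitSet;

public class DocIterationSketch {
  // Assumed old behaviour: with enableSnapshot on and a snapshot present,
  // segment addition visits only the doc ids recorded in the snapshot.
  static BitSet docsVisitedBefore(boolean enableSnapshot, BitSet snapshot, int totalDocs) {
    if (enableSnapshot && snapshot != null) {
      return (BitSet) snapshot.clone();
    }
    return allDocs(totalDocs);
  }

  // Behaviour proposed here: segment addition visits every doc in the segment,
  // regardless of the enableSnapshot flag.
  static BitSet docsVisitedAfter(boolean enableSnapshot, BitSet snapshot, int totalDocs) {
    return allDocs(totalDocs);
  }

  static BitSet allDocs(int totalDocs) {
    BitSet docs = new BitSet(totalDocs);
    docs.set(0, totalDocs); // doc ids 0 .. totalDocs - 1
    return docs;
  }
}
```

In this sketch, toggling enableSnapshot no longer changes which documents are replayed on segment addition, which is why the extra restart would no longer be needed for that flag.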