Skip to content

Conversation

@tibrewalpratik17
Copy link
Contributor

Addresses #13466.

When enableSnapshot is enabled, we only iterate through the validDocIds during segment-addition. This is counter-intuitive as we have enablePreload for the same.
In order to make any upsert config change we have to do 2 server restarts currently (first by disabling enableSnapshot and then to enable it again). We can make this restart behaviour only limited to enablePreload method rather than tying it with enableSnapshot config as well.

@codecov-commenter
Copy link

codecov-commenter commented Jul 10, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 61.97%. Comparing base (59551e4) to head (1764fdf).
Report is 750 commits behind head on master.

Files Patch % Lines
...cal/upsert/BasePartitionUpsertMetadataManager.java 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13579      +/-   ##
============================================
+ Coverage     61.75%   61.97%   +0.22%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2555     +119     
  Lines        133233   140609    +7376     
  Branches      20636    21817    +1181     
============================================
+ Hits          82274    87147    +4873     
- Misses        44911    46851    +1940     
- Partials       6048     6611     +563     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.91% <66.66%> (+0.21%) ⬆️
java-21 61.84% <66.66%> (+0.21%) ⬆️
skip-bytebuffers-false 61.94% <66.66%> (+0.19%) ⬆️
skip-bytebuffers-true 61.79% <66.66%> (+34.07%) ⬆️
temurin 61.97% <66.66%> (+0.22%) ⬆️
unittests 61.97% <66.66%> (+0.22%) ⬆️
unittests1 46.46% <0.00%> (-0.43%) ⬇️
unittests2 27.73% <66.66%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tibrewalpratik17 tibrewalpratik17 marked this pull request as ready for review July 11, 2024 03:39
recordInfoIterator =
UpsertUtils.getRecordInfoIterator(recordInfoReader, segment.getSegmentMetadata().getTotalDocs());
}
Iterator<RecordInfo> recordInfoIterator =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to remove snapshot as in the else branch above

Copy link
Contributor Author

@tibrewalpratik17 tibrewalpratik17 Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to remove it 🤔 ? Can there be a scenario where valid doc ids of snapshot differs from the loading segment? I am fine with removing it as well as for partial-upsert we will load it upfront again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe let’s wrap this removeSnapshot method in a ‘if(!enableSnapshot) {}’ (as reminded by Qiaochu’s question on metadataTTL, although that shouldn’t be affected as you analyzed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I can think of a scenario where this would make sense. Say, a within-TTL segment gets snapshot removed (without this condition) but it restarts after moving out-of-TTL but before the next segment commit. This would help in that case.


MutableRoaringBitmap validDocIds;
if (_enableSnapshot) {
validDocIds = segment.loadValidDocIdsFromSnapshot();
Copy link
Contributor

@deemoliu deemoliu Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please check if metadata TTL is enabled. i don't think we can remove it if metadata TTL is enabled.

by removing these lines, when metadataTTL is enabled but preloading is not enabled, we will have missing valid docIDs for historical segments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deemoliu this method is not called at all in case the segment is out of metadataTTL window. Ref:

if (_metadataTTL > 0 && _largestSeenComparisonValue.get() > 0) {
Preconditions.checkState(_enableSnapshot, "Upsert TTL must have snapshot enabled");
Preconditions.checkState(_comparisonColumns.size() == 1,
"Upsert TTL does not work with multiple comparison columns");
Number maxComparisonValue =
(Number) segment.getSegmentMetadata().getColumnMetadataMap().get(_comparisonColumns.get(0)).getMaxValue();
if (maxComparisonValue.doubleValue() < _largestSeenComparisonValue.get() - _metadataTTL) {
_logger.info("Skip adding segment: {} because it's out of TTL", segmentName);
MutableRoaringBitmap validDocIdsSnapshot = immutableSegment.loadValidDocIdsFromSnapshot();
if (validDocIdsSnapshot != null) {
MutableRoaringBitmap queryableDocIds = getQueryableDocIds(segment, validDocIdsSnapshot);
immutableSegment.enableUpsert(this, new ThreadSafeMutableRoaringBitmap(validDocIdsSnapshot),
queryableDocIds != null ? new ThreadSafeMutableRoaringBitmap(queryableDocIds) : null);
} else {
_logger.warn("Failed to find snapshot from segment: {} which is out of TTL, treating all documents as valid",
segmentName);
}
return;

We are returning in the caller method above for out of metadataTTL segments.

This flow affects the segments which are in metadataTTL window, in that case it is okay to read the whole segment and not just from snapshot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall the prior discussions with @klsince and @Jackie-Jiang, upsert TTL (metadataTTL) snapshotting supposed to be atomic at partition level instead of segment level. We need to avoid keeping some snapshot meanwhile delete the other snapshots for the same partitions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think calling removeSnapshot inside ‘if(!enableSnapshot) {}’ would help avoid this, and thanks for your early questions for the reminder.

@klsince klsince merged commit abbe10c into apache:master Jul 15, 2024
rajagopr pushed a commit to rajagopr/pinot that referenced this pull request Jul 17, 2024
* Remove preloading-type nature of enableSnapshot config

* Remove snapshot when adding segment if snapshot is not enabled
@tibrewalpratik17 tibrewalpratik17 deleted the remove_preloading_on_snapshot branch September 24, 2024 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants