-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set processingException when all queried segments cannot be acquired #3942
Set processingException when all queried segments cannot be acquired #3942
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3942 +/- ##
=============================================
- Coverage 67.23% 46.26% -20.97%
=============================================
Files 1032 1032
Lines 51147 51172 +25
Branches 7155 7161 +6
=============================================
- Hits 34387 23677 -10710
- Misses 14382 25470 +11088
+ Partials 2378 2025 -353
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise.
/** | ||
* Track a deleted segment. | ||
*/ | ||
void deleteSegment(@Nonnull String segmentName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this function? deleteSegment
sounds like we are deleting segments instead of book-keeping. It's confusing since we have another method called removeSegment
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Renamed to trackDeletedSegment
@@ -42,9 +44,13 @@ | |||
@ThreadSafe | |||
public abstract class BaseTableDataManager implements TableDataManager { | |||
private static final Logger LOGGER = LoggerFactory.getLogger(BaseTableDataManager.class); | |||
// cache atmost a given max of deleted segments | |||
private static final int MAX_DELETED_SEGMENT_CACHE_SIZE = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline. We can probably add the time based eviction and increase MAX_DELETED_SEGMENT_CACHE_SIZE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
/** | ||
* Track a deleted segment. | ||
*/ | ||
void deleteSegment(@Nonnull String segmentName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
if (segmentDataManager != null) { | ||
segmentDataManagers.add(segmentDataManager); | ||
} else { | ||
if (!tableDataManager.isDeleted(segmentName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isRecentlyDeleted()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed.
@@ -195,6 +211,13 @@ public DataTable processQuery(ServerQueryRequest queryRequest, ExecutorService e | |||
long queryProcessingTime = queryProcessingTimer.getDurationMs(); | |||
dataTable.getMetadata().put(DataTable.NUM_SEGMENTS_QUERIED, Integer.toString(numSegmentsQueried)); | |||
dataTable.getMetadata().put(DataTable.TIME_USED_MS_METADATA_KEY, Long.toString(queryProcessingTime)); | |||
|
|||
if (missingSegments > 0) { | |||
dataTable.addException(QueryException.getException(QueryException.SEGMENTS_MISSING_ERROR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth adding a log message here with the name of the segments missing, Can help us debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point; will add it around line 138
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* | ||
* This method performs book keeping of deleted segments. | ||
*/ | ||
void deleteSegment(@Nonnull String tableNameWithType, @Nonnull String segmentName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same with this method name. It is not really deleting a segment, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change the method name.
@@ -151,6 +151,15 @@ public void removeSegment(@Nonnull String tableNameWithType, @Nonnull String seg | |||
} | |||
} | |||
|
|||
@Override | |||
public void deleteSegment(@Nonnull String tableNameWithType, @Nonnull String segmentName) { | |||
LOGGER.info("Tracking deleted segment: {} from table: {}", segmentName, tableNameWithType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have other logs for dropping a segment, do we need this one as well? If we really do, then it should go inside the 'if' statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed log.
@@ -196,6 +196,8 @@ public void onBecomeDroppedFromOffline(Message message, NotificationContext cont | |||
String tableNameWithType = message.getResourceName(); | |||
String segmentName = message.getPartitionName(); | |||
|
|||
_instanceDataManager.deleteSegment(tableNameWithType, segmentName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should be inserted after line 184, not here. The segment is not available for any new queries sometime after line 184 starts execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can we not get the table data manager directly from here? Do we need to go through the instanceDataManager? Dependency issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After line 184, the segment is unavailable correct; and if it is due to a move/rebalance, we should count it as an error. We only want to suppress it when the segment has been 'deleted'.
Agree with comments. Looking at the existing code, I felt it could be handled within BaseTableDataManager - have the cache (you have it), just update it in removeSegment and use that in handleMissingSegment (don't increment the metric). Am I missing something? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise.
Another edge case that can be covered in the future.
- explicitly delete a segment
- upload the new segment with the same name
- that segment becomes missing
In the above case, we won't report the segment as missing because the cache will still contain the recently deleted segments (for 6hrs). For completeness, I think that we should also cover that case.
@@ -79,6 +79,16 @@ void addSegment(@Nonnull String segmentName, @Nonnull TableConfig tableConfig, | |||
*/ | |||
void removeSegment(@Nonnull String segmentName); | |||
|
|||
/** | |||
* Track a deleted segment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add @param
here to follow javadoc convention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a table config that disables message-based refresh for refresh use case. In this case, the old segment is dropped and a new segment is created with the same name.
See PinotHelixManager.refreshSegment().
In this case, we should be removing a newly added segment from the deleted tracker cache? The deletion tracker is getting out of hand, but I don't have an alternative solution.
Also, we need to make a note somewhere that the deletion cache is unreliable (i.e. we lose the cache upon restart). I suggest in ServerQueryExecutorV1Impl (unless we decide to persist the deletion cache....)
* | ||
* @param segmentName name of the segment that needs to removed from the cache (if needed) | ||
*/ | ||
private void untrackIfDeleted(@Nonnull String segmentName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, untrackDeletedSegment()
seems to be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ambivalent about this. If you feel strongly, I can change it. Will see if Subbu has any input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or simply, notifySegmentDeleted() and notifySegmentAdded()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think thats a better a choice. Thanks for the suggestion!
* | ||
* @param segmentName name of the segment that needs to removed from the cache (if needed) | ||
*/ | ||
private void untrackIfDeleted(@Nonnull String segmentName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or simply, notifySegmentDeleted() and notifySegmentAdded()?
@@ -110,6 +118,7 @@ public void shutDown() { | |||
@Override | |||
public void addSegment(@Nonnull ImmutableSegment immutableSegment) { | |||
String segmentName = immutableSegment.getSegmentName(); | |||
untrackIfDeleted(segmentName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line should move down to be after line 128 (when we populate the _segmentDataManagerMap). The segment is available for query only after that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
…ect to tracking segments added and deleted
e522b2c
to
ee244c5
Compare
…cquired (apache#3942)" (apache#4632) This reverts commit d4f2ece.
Motivation: We have seen cases where due to bugs or delays wrt EV processing on the broker, broker sends a list of segments to the servers that are not hosted by it. This is technically a partial-response, but we currently don't indicate it as such. We would like to track this to the most extent possible.
Details: There are cases where segments can be legitimately missing on the server; the known reason for this is when segments are deleted (either through retention or manually) and the broker's EV isn't updated yet. To not count partial responses in this scenario, we maintain an in-memory cache of deleted segment names. If a recently deleted segment is queried, we check our cache and not count it as an error if the segment-name exists in the cache.
NOTE: This can still miss cases where the server is restarted in the window between deletion and broker EV update. However, the chances of this are deemed low (and cost of catching this error deemed high).
Testing done: Added unit tests to cover the new paths.
Impact: Once this change is rolled out, num-segments-missing server metric should report lower numbers; we should see partial-responses broker metric go up if we hit missing-segments issue on the server.
Please see #4007 for details of the broader effort for this PR.