Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set processingException when all queried segments cannot be acquired #3942

Merged
merged 9 commits into from
Mar 22, 2019

Conversation

sunithabeeram
Copy link
Contributor

@sunithabeeram sunithabeeram commented Mar 8, 2019

Motivation: We have seen cases where due to bugs or delays wrt EV processing on the broker, broker sends a list of segments to the servers that are not hosted by it. This is technically a partial-response, but we currently don't indicate it as such. We would like to track this to the most extent possible.

Details: There are cases where segments can be legitimately missing on the server; the known reason for this is when segments are deleted (either through retention or manually) and the broker's EV isn't updated yet. To not count partial responses in this scenario, we maintain an in-memory cache of deleted segment names. If a recently deleted segment is queried, we check our cache and not count it as an error if the segment-name exists in the cache.
NOTE: This can still miss cases where the server is restarted in the window between deletion and broker EV update. However, the chances of this are deemed low (and cost of catching this error deemed high).

Testing done: Added unit tests to cover the new paths.

Impact: Once this change is rolled out, num-segments-missing server metric should report lower numbers; we should see partial-responses broker metric go up if we hit missing-segments issue on the server.

Please see #4007 for details of the broader effort for this PR.

@codecov-io
Copy link

codecov-io commented Mar 8, 2019

Codecov Report

Merging #3942 into master will decrease coverage by 20.96%.
The diff coverage is 87.09%.

Impacted file tree graph

@@              Coverage Diff              @@
##             master    #3942       +/-   ##
=============================================
- Coverage     67.23%   46.26%   -20.97%     
=============================================
  Files          1032     1032               
  Lines         51147    51172       +25     
  Branches       7155     7161        +6     
=============================================
- Hits          34387    23677    -10710     
- Misses        14382    25470    +11088     
+ Partials       2378     2025      -353
Impacted Files Coverage Δ Complexity Δ
.../apache/pinot/common/exception/QueryException.java 88.57% <100%> (+0.16%) 0 <0> (ø) ⬇️
...r/helix/SegmentOnlineOfflineStateModelFactory.java 67.96% <100%> (+0.63%) 0 <0> (ø) ⬇️
...server/starter/helix/HelixInstanceDataManager.java 85.6% <75%> (-0.69%) 0 <0> (ø)
.../pinot/core/data/manager/BaseTableDataManager.java 76.82% <83.33%> (-17%) 0 <0> (ø)
...core/query/executor/ServerQueryExecutorV1Impl.java 79.5% <92.85%> (+0.6%) 0 <0> (ø) ⬇️
.../org/apache/pinot/common/http/MultiGetRequest.java 0% <0%> (-100%) 0% <0%> (ø)
...che/pinot/common/restlet/resources/TablesList.java 0% <0%> (-100%) 0% <0%> (ø)
...che/pinot/pql/parsers/pql2/ast/OptionsAstNode.java 0% <0%> (-100%) 0% <0%> (ø)
...ers/pql2/ast/PredicateParenthesisGroupAstNode.java 0% <0%> (-100%) 0% <0%> (ø)
...ot/core/query/scheduler/TableBasedGroupMapper.java 0% <0%> (-100%) 0% <0%> (ø)
... and 523 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dfaaf11...ee244c5. Read the comment docs.

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise.

/**
* Track a deleted segment.
*/
void deleteSegment(@Nonnull String segmentName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename this function? deleteSegment sounds like we are deleting segments instead of book-keeping. It's confusing since we have another method called removeSegment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Renamed to trackDeletedSegment

@@ -42,9 +44,13 @@
@ThreadSafe
public abstract class BaseTableDataManager implements TableDataManager {
private static final Logger LOGGER = LoggerFactory.getLogger(BaseTableDataManager.class);
// cache atmost a given max of deleted segments
private static final int MAX_DELETED_SEGMENT_CACHE_SIZE = 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. We can probably add the time based eviction and increase MAX_DELETED_SEGMENT_CACHE_SIZE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

/**
* Track a deleted segment.
*/
void deleteSegment(@Nonnull String segmentName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

if (segmentDataManager != null) {
segmentDataManagers.add(segmentDataManager);
} else {
if (!tableDataManager.isDeleted(segmentName)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isRecentlyDeleted()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

@@ -195,6 +211,13 @@ public DataTable processQuery(ServerQueryRequest queryRequest, ExecutorService e
long queryProcessingTime = queryProcessingTimer.getDurationMs();
dataTable.getMetadata().put(DataTable.NUM_SEGMENTS_QUERIED, Integer.toString(numSegmentsQueried));
dataTable.getMetadata().put(DataTable.TIME_USED_MS_METADATA_KEY, Long.toString(queryProcessingTime));

if (missingSegments > 0) {
dataTable.addException(QueryException.getException(QueryException.SEGMENTS_MISSING_ERROR,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth adding a log message here with the name of the segments missing, Can help us debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; will add it around line 138

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

*
* This method performs book keeping of deleted segments.
*/
void deleteSegment(@Nonnull String tableNameWithType, @Nonnull String segmentName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with this method name. It is not really deleting a segment, is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change the method name.

@@ -151,6 +151,15 @@ public void removeSegment(@Nonnull String tableNameWithType, @Nonnull String seg
}
}

@Override
public void deleteSegment(@Nonnull String tableNameWithType, @Nonnull String segmentName) {
LOGGER.info("Tracking deleted segment: {} from table: {}", segmentName, tableNameWithType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have other logs for dropping a segment, do we need this one as well? If we really do, then it should go inside the 'if' statement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed log.

@@ -196,6 +196,8 @@ public void onBecomeDroppedFromOffline(Message message, NotificationContext cont
String tableNameWithType = message.getResourceName();
String segmentName = message.getPartitionName();

_instanceDataManager.deleteSegment(tableNameWithType, segmentName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should be inserted after line 184, not here. The segment is not available for any new queries sometime after line 184 starts execution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can we not get the table data manager directly from here? Do we need to go through the instanceDataManager? Dependency issues?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After line 184, the segment is unavailable correct; and if it is due to a move/rebalance, we should count it as an error. We only want to suppress it when the segment has been 'deleted'.

@kishoreg
Copy link
Member

Agree with comments. Looking at the existing code, I felt it could be handled within BaseTableDataManager - have the cache (you have it), just update it in removeSegment and use that in handleMissingSegment (don't increment the metric). Am I missing something?

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise.

Another edge case that can be covered in the future.

  1. explicitly delete a segment
  2. upload the new segment with the same name
  3. that segment becomes missing

In the above case, we won't report the segment as missing because the cache will still contain the recently deleted segments (for 6hrs). For completeness, I think that we should also cover that case.

@@ -79,6 +79,16 @@ void addSegment(@Nonnull String segmentName, @Nonnull TableConfig tableConfig,
*/
void removeSegment(@Nonnull String segmentName);

/**
* Track a deleted segment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add @param here to follow javadoc convention?

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a table config that disables message-based refresh for refresh use case. In this case, the old segment is dropped and a new segment is created with the same name.
See PinotHelixManager.refreshSegment().
In this case, we should be removing a newly added segment from the deleted tracker cache? The deletion tracker is getting out of hand, but I don't have an alternative solution.

Also, we need to make a note somewhere that the deletion cache is unreliable (i.e. we lose the cache upon restart). I suggest in ServerQueryExecutorV1Impl (unless we decide to persist the deletion cache....)

*
* @param segmentName name of the segment that needs to removed from the cache (if needed)
*/
private void untrackIfDeleted(@Nonnull String segmentName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, untrackDeletedSegment() seems to be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ambivalent about this. If you feel strongly, I can change it. Will see if Subbu has any input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or simply, notifySegmentDeleted() and notifySegmentAdded()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think thats a better a choice. Thanks for the suggestion!

*
* @param segmentName name of the segment that needs to removed from the cache (if needed)
*/
private void untrackIfDeleted(@Nonnull String segmentName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or simply, notifySegmentDeleted() and notifySegmentAdded()?

@@ -110,6 +118,7 @@ public void shutDown() {
@Override
public void addSegment(@Nonnull ImmutableSegment immutableSegment) {
String segmentName = immutableSegment.getSegmentName();
untrackIfDeleted(segmentName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line should move down to be after line 128 (when we populate the _segmentDataManagerMap). The segment is available for query only after that point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@sunithabeeram sunithabeeram merged commit d4f2ece into apache:master Mar 22, 2019
@sunithabeeram sunithabeeram deleted the trackDeletedSegments branch March 22, 2019 16:33
jenniferdai added a commit that referenced this pull request Sep 23, 2019
jenniferdai added a commit that referenced this pull request Sep 24, 2019
chenboat pushed a commit to chenboat/incubator-pinot that referenced this pull request Nov 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants