Introduce segment limit for indexing tasks #7514
Conversation
```java
  }
} else {
  boolean rowInNewSegment = true;
  for (SegmentIdWithShardSpec segment : appenderator.getUnpublishedSegments()) {
```
This can be done in O(log n) instead of O(n) if the list of segments is sorted by time. If a reviewer thinks this approach for enforcing max total segments is cool, I can optimize this part (it'll take a bit more work).
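The O(log n) lookup the comment describes could work roughly as follows. This is a hedged sketch, not Druid code: the `Segment` record here is a hypothetical stand-in for `SegmentIdWithShardSpec`, reduced to its interval, and it assumes the unpublished segments are kept sorted by start time and non-overlapping.

```java
import java.util.List;

public class SegmentLookup {
    // Hypothetical stand-in for SegmentIdWithShardSpec: just the interval bounds.
    record Segment(long startMillis, long endMillis) {}

    // Binary search over segments sorted by startMillis (assumed non-overlapping).
    // Returns true if some existing segment's interval covers the timestamp.
    static boolean coversTimestamp(List<Segment> segments, long ts) {
        int lo = 0;
        int hi = segments.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Segment s = segments.get(mid);
            if (ts < s.startMillis()) {
                hi = mid - 1;               // timestamp is before this segment
            } else if (ts >= s.endMillis()) {
                lo = mid + 1;               // timestamp is after this segment
            } else {
                return true;                // startMillis <= ts < endMillis
            }
        }
        return false;                       // row would need a new segment
    }

    public static void main(String[] args) {
        List<Segment> segs = List.of(
            new Segment(0, 100), new Segment(100, 200), new Segment(500, 600));
        System.out.println(coversTimestamp(segs, 150)); // true
        System.out.println(coversTimestamp(segs, 300)); // false
    }
}
```

A row whose timestamp is not covered by any sorted interval is the one that would allocate a new segment, which is the case the linear scan in the diff above is detecting.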
```diff
@@ -313,6 +316,23 @@ public TaskStatus run(final TaskToolbox toolbox)
       log.debug("Discarded null row, considering thrownAway.");
       rowIngestionMeters.incrementThrownAway();
     } else {
+      // Check if the upcoming row will result in the creation of a new segment. If so and the new number of segments
```
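The pre-check the added comment begins to describe could be sketched like this. It is a simplified, hypothetical version (the `Interval` record and both method names are illustrative, not the patch's actual code): before a row is handed to the appenderator, test whether its timestamp lands in an already-open segment; if not, adding it would allocate a new segment, and the task can react when `maxTotalSegments` would be exceeded.

```java
import java.util.List;

public class SegmentLimitCheck {
    // Hypothetical interval of an open (unpublished) segment.
    record Interval(long startMillis, long endMillis) {
        boolean contains(long ts) {
            return ts >= startMillis && ts < endMillis;
        }
    }

    // Linear scan over open segments, as in the diff above.
    static boolean wouldCreateNewSegment(List<Interval> openSegments, long rowTimestamp) {
        for (Interval i : openSegments) {
            if (i.contains(rowTimestamp)) {
                return false; // row lands in an existing open segment
            }
        }
        return true;
    }

    // True when adding this row would push the segment count past the limit.
    static boolean exceedsLimit(List<Interval> openSegments, long rowTimestamp, int maxTotalSegments) {
        return wouldCreateNewSegment(openSegments, rowTimestamp)
            && openSegments.size() + 1 > maxTotalSegments;
    }

    public static void main(String[] args) {
        List<Interval> open = List.of(new Interval(0, 60_000));
        System.out.println(exceedsLimit(open, 30_000, 1)); // false: row fits in the open segment
        System.out.println(exceedsLimit(open, 90_000, 1)); // true: row would need a second segment
    }
}
```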
I don't love the added complexity and the code duplication (this similar block is repeated 4 times) introduced by trying to pre-determine whether the row will cause a new segment to be generated before it's sent to the appenderator. Also, `AppenderatorDriverAddResult.isPushRequired()` feels like the right place for this decision to be made, not in every indexing task type that uses an appenderator.

I understand that after `Stream/BatchAppenderatorDriver.add()` is called, if you determine that you're now over `maxTotalSegments`, you wind up pushing a segment that has only one row. But that is only true if you try to push all of the open segments. What if instead you made appropriate modifications to methods like `StreamAppenderatorDriver.publish()` and `BatchAppenderatorDriver.pushAllAndClear()` (or introduced new methods) that allow you to provide a list of segment ids to exclude, which you would use to exclude the newly created segment that pushed you over the limit?
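The alternative the reviewer suggests could be sketched as follows. This is a hedged illustration only: `pushAllExcept` is a hypothetical method name (not part of `BatchAppenderatorDriver`), and segments are reduced to string ids. The point is just that push selects the open segments minus an exclusion set, so the one-row segment that tipped the task over the limit stays open instead of being pushed immediately.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PushWithExclusions {
    // Hypothetical variant of a push-all method that takes segment ids to
    // exclude. Returns the ids that would actually be pushed; excluded
    // segments remain open and can be filled further before their own push.
    static List<String> pushAllExcept(List<String> openSegmentIds, Set<String> exclude) {
        return openSegmentIds.stream()
                             .filter(id -> !exclude.contains(id))
                             .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The third segment is the one whose creation exceeded the limit.
        List<String> open = List.of("seg_2019-01-01", "seg_2019-01-02", "seg_2019-01-03_new");
        List<String> pushed = pushAllExcept(open, Set.of("seg_2019-01-03_new"));
        System.out.println(pushed); // [seg_2019-01-01, seg_2019-01-02]
    }
}
```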
I think that's doable
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
[WIP]
Closes #7238. See #7238 for proposal.