Support for bootstrap segments #16609
Conversation
- Adds a new API in the coordinator.
- All processes that have storage locations configured (including tasks) talk to the coordinator if they can, and fetch bootstrap segments from it.
- Then load the segments onto the segment cache as part of startup.
- This addresses the segment bootstrapping logic required by processes before they can start serving queries or ingesting.

This patch also lays the foundation to speed up upgrades.
The rules aren't evaluated if there are no clusters.
Thanks for the feature, @abhishekrb19! It would be very useful for task/historical startup.
Thanks for the review, @kfaraz! I've responded to the comments and added code comments/tests where applicable for further clarification. Could you take another look?
Final nitpicks, rest looks good.
bootstrapSegments = ImmutableList.copyOf(iterator);
try {
  final BootstrapSegmentsInfo response =
      FutureUtils.getUnchecked(coordinatorClient.fetchBootstrapSegments(), true);
Since this is going to block startup, should there be a timeout?
Yeah, this uses the CoordinatorClient injected here, which has a standard retry policy with a max of 6 retries (~3 seconds total wait time) on transient errors. So processes will come up after ~3 seconds if the failures are persistent. There's a test in this PR, testLoadBootstrapSegmentsWhenExceptionThrown(), that verifies this fail-open strategy: the server simply comes up when an error occurs talking to the coordinator during startup.
I plan to add an optional fail-close strategy for tasks in a follow-up, which will likely need a different retry policy than the standard one in this patch. I will try adding some more scenarios to verify the different behaviors, e.g., that startup doesn't block indefinitely.
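The fail-open behavior discussed above can be sketched as a minimal, self-contained example. All names here (BootstrapFetch, fetchWithFailOpen) are illustrative, not Druid's actual client code, and the retry loop stands in for the CoordinatorClient's bounded retry policy:

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Sketch of a fail-open bootstrap fetch: retry a bounded number of times on
// transient errors, then come up with no bootstrap segments rather than
// blocking startup indefinitely.
class BootstrapFetch
{
  static final int MAX_RETRIES = 6;

  static List<String> fetchWithFailOpen(Supplier<List<String>> fetcher)
  {
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
      try {
        return fetcher.get();
      }
      catch (RuntimeException e) {
        // Transient error: retry until the budget is exhausted.
      }
    }
    // Fail open: start serving without bootstrap segments.
    return Collections.emptyList();
  }
}
```

With a persistently failing fetcher this returns an empty list after six attempts, matching the "process just comes up" behavior the PR's test verifies.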
Sounds good. What about success scenarios, where a historical or task is busy loading a lot of segments and thus may not respond to liveness probes?
There is currently no overall startup timeout. If this is an issue, particularly for tasks, we may want to adjust the bootstrapper such that it can at least respond to liveness probes.
// Before:
emitter.emit(new ServiceMetricEvent.Builder().setMetric("bootstrapSegments/fetch/time", fetchRunMillis));
emitter.emit(new ServiceMetricEvent.Builder().setMetric("bootstrapSegments/fetch/count", bootstrapSegments.size()));
// After:
emitter.emit(new ServiceMetricEvent.Builder().setMetric("segment/bootstrap/time", fetchRunMillis));
emitter.emit(new ServiceMetricEvent.Builder().setMetric("segment/bootstrap/count", bootstrapSegments.size()));
[Non-blocker] It would be nice to add the dataSource dimension to this metric.
Good idea, I will save this for a future patch.
Problem
Currently, broadcast segments are assigned to data servers by the coordinator. On its own, this route can cause data correctness issues: if a process (either a task or a server) is using broadcast segments for ingestion or to serve queries, the ingestion or query results can be invalid due to the asynchronous nature of segment assignment and loading.
Description
This PR builds on top of #16475.
To address this data correctness problem, a process, when it starts up, attempts to fetch and load any broadcast (bootstrap) segments from the coordinator before it proceeds. If any errors occur while talking to the coordinator during bootstrapping, the process simply comes up in a "fail open" state and doesn't block startup.
Changes:
- A new API /v1/metadata/bootstrapSegments in the coordinator that returns bootstrap segments. Currently, the set of bootstrap segments only contains broadcast segments. In the future, we can expand on this mechanism to speed up historical upgrades as well.
- SegmentLoadDropHandler: the handler loads any previously cached segments along with the bootstrap segments onto its storage location.

Misc changes:
- Removes the @JacksonInject PruneSpecsHolder argument in the LoadableDataSegment constructor. It always defaults to PruneSpecsHolder.DEFAULT and doesn't depend on the injected value.
- LocalDataSegmentPuller
- DruidException
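The handler-side flow described in the changes (combine previously cached segments with the fetched bootstrap segments, then load them before serving) can be sketched as follows. This is a minimal, self-contained illustration; the class and method names are hypothetical, not Druid's actual SegmentLoadDropHandler code, and segments are represented as plain strings:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: on startup, merge segments already present in the
// local cache with bootstrap (broadcast) segments fetched from the
// coordinator, de-duplicating while preserving order, and load the result
// before the process starts serving queries or ingesting.
class StartupLoader
{
  static List<String> segmentsToLoad(List<String> cachedSegments, List<String> bootstrapSegments)
  {
    final Set<String> toLoad = new LinkedHashSet<>(cachedSegments);
    toLoad.addAll(bootstrapSegments); // broadcast segments join the cached set
    return new ArrayList<>(toLoad);
  }
}
```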
Future changes
Release note
Added a coordinator API POST /v1/metadata/bootstrapSegments that returns the set of bootstrap segments. Currently, the set only contains broadcast segments. When a process (either a server or task) comes up, it attempts to query the coordinator to fetch all bootstrap segments and loads them before proceeding. This primarily addresses a data correctness issue where a process starts processing data before broadcast segments are assigned to it. In the event of any errors, the process simply comes up in a fail-open state and doesn't block startup.

To this effect, new metrics have been added in the bootstrap segments flow:
- segment/bootstrap/time: Total time taken to fetch bootstrap segments from the coordinator
- segment/bootstrap/count: Total count of bootstrap segments fetched from the coordinator