Support for bootstrap segments #16609
Conversation
- Adds a new API in the coordinator.
- All processes that have storage locations configured (including tasks) talk to the coordinator if they can, and fetch bootstrap segments from it.
- Then load the segments onto the segment cache as part of startup.
- This addresses the segment bootstrapping logic required by processes before they can start serving queries or ingesting.

This patch also lays the foundation to speed up upgrades.
The rules aren't evaluated if there are no clusters.
Thanks for the feature, @abhishekrb19! It would be very useful for task/historical startup.
Thanks for the review, @kfaraz! I've responded to the comments and added code comments/tests where applicable for further clarification. Could you take another look?
Final nitpicks, rest looks good.
bootstrapSegments = ImmutableList.copyOf(iterator);
try {
  final BootstrapSegmentsInfo response =
      FutureUtils.getUnchecked(coordinatorClient.fetchBootstrapSegments(), true);
Since this is going to block startup, should there be a timeout?
Yeah, this uses the CoordinatorClient injected here, which has a standard retry policy with a max of 6 retries (~3 seconds total wait time) on transient errors. So processes will come up after ~3 seconds if the failures are persistent. There's a test in this PR, testLoadBootstrapSegmentsWhenExceptionThrown(), that verifies this fail-open strategy: the server simply comes up when an error occurs talking to the coordinator during startup.
I plan to add an optional fail-close strategy for tasks in a follow-up, which will likely need a different retry policy than the standard one in this patch. I will try adding some more scenarios to verify the different behaviors, e.g., that startup doesn't block indefinitely.
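The fail-open behavior discussed above can be sketched as a minimal, self-contained example. All names here (BootstrapFetch, fetchWithFailOpen) are illustrative, not Druid's actual client code, and the retry loop stands in for the CoordinatorClient's bounded retry policy:

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Sketch of a fail-open bootstrap fetch: retry a bounded number of times on
// transient errors, then come up with no bootstrap segments rather than
// blocking startup indefinitely.
class BootstrapFetch
{
  static final int MAX_RETRIES = 6;

  static List<String> fetchWithFailOpen(Supplier<List<String>> fetcher)
  {
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
      try {
        return fetcher.get();
      }
      catch (RuntimeException e) {
        // Transient error: retry until the budget is exhausted.
      }
    }
    // Fail open: start serving without bootstrap segments.
    return Collections.emptyList();
  }
}
```

With a persistently failing fetcher this returns an empty list after six attempts, matching the "process just comes up" behavior the PR's test verifies.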
Sounds good. What about success scenarios, where a historical or task is busy loading a lot of segments and thus may not respond to liveness probes?
There is currently no overall startup timeout. If this is an issue, particularly for tasks, we may want to adjust the bootstrapper such that it can at least respond to liveness probes.
// Before:
emitter.emit(new ServiceMetricEvent.Builder().setMetric("bootstrapSegments/fetch/time", fetchRunMillis));
emitter.emit(new ServiceMetricEvent.Builder().setMetric("bootstrapSegments/fetch/count", bootstrapSegments.size()));
// After:
emitter.emit(new ServiceMetricEvent.Builder().setMetric("segment/bootstrap/time", fetchRunMillis));
emitter.emit(new ServiceMetricEvent.Builder().setMetric("segment/bootstrap/count", bootstrapSegments.size()));
[Non-blocker] It would be nice to add the dataSource dimension to this metric.
Good idea, I will save this for a future patch.
Problem
Currently, broadcast segments are assigned to data servers by the coordinator. On its own, this route can cause data correctness issues: if a process (either a task or a server) is using broadcast segments for ingestion or to serve queries, the ingestion or query results can be invalid due to the asynchronous nature of segment assignment and loading.
Description
This PR builds on top of #16475.
To address this data correctness problem, a process, when it starts up, attempts to fetch and load any broadcast (bootstrap) segments from the coordinator before it proceeds. If any errors occur while talking to the coordinator during bootstrapping, the process simply comes up in a "fail open" state and doesn't block startup.
Changes:
- A new API /v1/metadata/bootstrapSegments in the coordinator that returns bootstrap segments. Currently, the set of bootstrap segments only contains broadcast segments. In the future, we can expand on this mechanism to speed up historical upgrades as well.
- SegmentLoadDropHandler: the handler loads any previously cached segments along with the bootstrap segments onto its storage location.

Misc changes:
- Removes the @JacksonInject PruneSpecsHolder argument in the LoadableDataSegment constructor. It always defaults to PruneSpecsHolder.DEFAULT and doesn't depend on the injected value.
- LocalDataSegmentPuller
- DruidException
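The handler-side flow described in the changes (combine previously cached segments with the fetched bootstrap segments, then load them before serving) can be sketched as follows. This is a minimal, self-contained illustration; the class and method names are hypothetical, not Druid's actual SegmentLoadDropHandler code, and segments are represented as plain strings:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: on startup, merge segments already present in the
// local cache with bootstrap (broadcast) segments fetched from the
// coordinator, de-duplicating while preserving order, and load the result
// before the process starts serving queries or ingesting.
class StartupLoader
{
  static List<String> segmentsToLoad(List<String> cachedSegments, List<String> bootstrapSegments)
  {
    final Set<String> toLoad = new LinkedHashSet<>(cachedSegments);
    toLoad.addAll(bootstrapSegments); // broadcast segments join the cached set
    return new ArrayList<>(toLoad);
  }
}
```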
Future changes
Release note
Added a coordinator API POST /v1/metadata/bootstrapSegments that returns the set of bootstrap segments. Currently, the set only contains broadcast segments. When a process (either a server or task) comes up, it attempts to query the coordinator to fetch all bootstrap segments and loads them before proceeding. This primarily addresses a data correctness issue where a process starts processing data before broadcast segments are assigned to it. In the event of any errors, the process simply comes up in a fail-open state and doesn't block startup.

To this effect, new metrics have been added in the bootstrap segments flow:
- segment/bootstrap/time: Total time taken to fetch bootstrap segments from the coordinator
- segment/bootstrap/count: Total count of bootstrap segments fetched from the coordinator