Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for bootstrap segments #16609

Merged
merged 28 commits into from
Jun 24, 2024

Conversation

abhishekrb19
Copy link
Contributor

@abhishekrb19 abhishekrb19 commented Jun 14, 2024

Problem

Currently, broadcast segments are assigned to data servers by the coordinator. This usual route alone can cause data correctness issues because if a process (either a task or server) is using broadcast segments for ingestion or to serve queries, the ingestion or query results can be invalid due to the asynchronous nature of segment assignment and loading.

Description

This PR builds on top of #16475.

To address this data correctness problem, a process when it starts up attempts to fetch and load any broadcast (bootstrap) segments from the coordinator first before it can proceed. If there are any errors in this bootstrapping process talking to the coordinator, the process just comes up in a "fail open" state and doesn't block start up.

Changes:

  • Add a new API /v1/metadata/bootstrapSegments in the coordinator that returns bootstrap segments. Currently, the set of bootstrap segments only contains broadcast segments. In the future, we can expand on this mechanism to speed up historical upgrades as well.
  • When a process starts up, it reaches out to the coordinator (if it can find one) to fetch the bootstrap segments. This is done in SegmentLoadDropHandler. The handler then loads any previously cached segments along with the bootstrap segments onto its storage location.

Misc changes:

  • Remove @JacksonInject PruneSpecsHolder argument in the LoadableDataSegment constructor. It always defaults to PruneSpecsHolder.DEFAULT and doesn't depend on the injected value
  • Remove an unused test in LocalDataSegmentPuller
  • Fix a typo and remove white spaces in DruidException

Future changes

  1. Add an optional fail close strategy for tasks via a task runtime config. By default, tasks will still fail open as implemented in this patch. Servers will only have the option to fail open.
  2. Optimize the loading of bootstrap segments for tasks. For example, compaction and kill tasks can skip loading any bootstrap segments, while MSQ ingest and query tasks can selectively load the ones specified in the SQL. Real-time tasks, similar to servers, should always load all bootstrap segments.
  3. Add more metrics in the existing bootstrapping flow for visibility. For example, count of cached segments loaded, how many have failed to load, total bootstrap load time, etc. Document all these metrics under a separate category in the user-facing docs.

Release note

Added a coordinator API POST /v1/metadata/bootstrapSegments that returns the set of bootstrap segments. Currently, the set only contains broadcast segments. When a process (either a server or task) comes up, it attempts to query the coordinator to fetch all bootstrap segments and loads them before the process can proceed doing its thing. This primarily addresses a data correctness issue where a process doesn't have broadcast segments assigned yet and starts processing data. In the event of any errors, the process just comes up in a fail open state and doesn't block start up.

To this effect, new metrics have been added in the bootstrap segments flow:

  1. segment/bootstrap/time: Total time taken to fetch bootstrap segments from the coordinator
  2. segment/bootstrap/count: Total count of bootstrap segments fetched from the coordinator

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

  - Adds a new API in the coordinator.
  - All processes that have storage locations configured (including tasks)
    talk to the coordinator if they can, and fetch bootstrap segments from it.
  - Then load the segments onto the segment cache as part of startup.
  - This addresses the segment bootstrapping logic required by processes before
    they can start serving queries or ingesting.

    This patch also lays the foundation to speed up upgrades.
@kfaraz
Copy link
Contributor

kfaraz commented Jun 21, 2024

Thanks for the feature, @abhishekrb19 ! It would be very useful for task/historical startup.
The changes look good to me for the most part, but there are some concerns that need to be addressed.

abhishekrb19 and others added 3 commits June 21, 2024 08:22
…entLoadDropHandler.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
@abhishekrb19
Copy link
Contributor Author

Thanks for the review, @kfaraz! I've responded to the comments and added code comments/tests where applicable for further clarification. Could you take another look?

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final nitpicks, rest looks good.

docs/api-reference/legacy-metadata-api.md Outdated Show resolved Hide resolved
bootstrapSegments = ImmutableList.copyOf(iterator);
try {
final BootstrapSegmentsInfo response =
FutureUtils.getUnchecked(coordinatorClient.fetchBootstrapSegments(), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is going to block startup, should there be a timeout?

Copy link
Contributor Author

@abhishekrb19 abhishekrb19 Jun 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this uses the CoordinatorClient injected here, which has a standard retry policy with a max of 6 retries (~3 seconds total wait time) on transient errors. So processes will come up after ~3 seconds if the failures are persistent. There's one test testLoadBootstrapSegmentsWhenExceptionThrown() in this PR that verifies this fail open strategy that the server just comes up when an error occurs talking to the coordinator during startup.

I plan to add an optional fail close strategy for tasks in a follow up, which will likely need a different retry policy than the standard retry policy in this patch. I will try adding some more scenarios to verify the different behaviors like startup doesn't block indefinitely, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. What about success scenarios, where a historical or task is busy loading a lot of segments and thus may not respond to liveness probes?

Copy link
Contributor Author

@abhishekrb19 abhishekrb19 Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is currently no overall startup timeout. If this is an issue, particularly for tasks, we may want to adjust the bootstrapper such that it can at least respond to liveness probes.

emitter.emit(new ServiceMetricEvent.Builder().setMetric("bootstrapSegments/fetch/time", fetchRunMillis));
emitter.emit(new ServiceMetricEvent.Builder().setMetric("bootstrapSegments/fetch/count", bootstrapSegments.size()));
emitter.emit(new ServiceMetricEvent.Builder().setMetric("segment/bootstrap/time", fetchRunMillis));
emitter.emit(new ServiceMetricEvent.Builder().setMetric("segment/bootstrap/count", bootstrapSegments.size()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Non blocker] It would be nice to add the dataSource dimension to this metric.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I will save this for a future patch

@abhishekrb19 abhishekrb19 merged commit 7463589 into apache:master Jun 24, 2024
87 checks passed
@abhishekrb19 abhishekrb19 deleted the bootstrap_segments_load_api branch June 24, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants