Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make batched segment sampling the default, minor cleanup of coordinator config #13391

Merged
merged 6 commits into from
Nov 21, 2022

Conversation

kfaraz
Copy link
Contributor

@kfaraz kfaraz commented Nov 20, 2022

Description

The batch segment sampling added in #11257 performs significantly better than the older method of sampling if there are a large number of used segments. It also avoid duplicate results in sampling.

Changes

  • Make batch segment sampling the default
  • Deprecate the property useBatchedSegmentSampler
  • Remove unused coordinator config druid.coordinator.loadqueuepeon.repeatDelay
  • Cleanup KillUnusedSegments
  • Simplify KillUnusedSegmentsTest, add better tests, remove redundant tests

Release note

Batch sampling has been made the default method for sampling segments during balancing as it performs significantly better than the alternative when there is a large number of used segments in the cluster.

The following have been deprecated and will be removed in future releases:

  • coordinator dynamic config useBatchedSegmentSampler
  • coordinator dynamic config percentOfSegmentsToConsiderPerMove
  • non-batch method of sampling segments used during coordinator duty BalanceSegments

The unused coordinator property druid.coordinator.loadqueuepeon.repeatDelay has been removed.
Use only druid.coordinator.loadqueuepeon.http.repeatDelay to configure repeat delay for the HTTP-based segment loading queue.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -115,7 +116,7 @@ public CoordinatorDynamicConfig(
@JsonProperty("mergeSegmentsLimit") int mergeSegmentsLimit,
@JsonProperty("maxSegmentsToMove") int maxSegmentsToMove,
@Deprecated @JsonProperty("percentOfSegmentsToConsiderPerMove") @Nullable Double percentOfSegmentsToConsiderPerMove,
@JsonProperty("useBatchedSegmentSampler") boolean useBatchedSegmentSampler,
@Deprecated @JsonProperty("useBatchedSegmentSampler") boolean useBatchedSegmentSampler,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is setting the default value of useBatchedSegmentSampler to true missing?

Copy link
Contributor Author

@kfaraz kfaraz Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, @rohangarg ! It was in a different patch.
Updated to use the default of true both when

  • deserializing using the constructor. This would mean that configs already stored in the DB with no explicit value of useBatchedSegmentSampler would now start using true.
  • creating/deserializing using the Builder. So newly created/updated configs would now use true.

Also updated tests to verify this.

{
@Test
public void testFindIntervalForKill()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test essentially just verifies the JodaUtils.umbrellaInterval() method and these cases are already being verified in JodaUtilsTest.
The required test cases from here have been merged into the other tests in this class.

@@ -115,7 +116,7 @@ public CoordinatorDynamicConfig(
@JsonProperty("mergeSegmentsLimit") int mergeSegmentsLimit,
@JsonProperty("maxSegmentsToMove") int maxSegmentsToMove,
@Deprecated @JsonProperty("percentOfSegmentsToConsiderPerMove") @Nullable Double percentOfSegmentsToConsiderPerMove,
@JsonProperty("useBatchedSegmentSampler") boolean useBatchedSegmentSampler,
@Deprecated @JsonProperty("useBatchedSegmentSampler") boolean useBatchedSegmentSampler,
Copy link
Contributor Author

@kfaraz kfaraz Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, @rohangarg ! It was in a different patch.
Updated to use the default of true both when

  • deserializing using the constructor. This would mean that configs already stored in the DB with no explicit value of useBatchedSegmentSampler would now start using true.
  • creating/deserializing using the Builder. So newly created/updated configs would now use true.

Also updated tests to verify this.

@kfaraz kfaraz added this to the 25.0 milestone Nov 21, 2022
@AmatyaAvadhanula
Copy link
Contributor

I think the default value for useBatchedSegmentSampler must also be changed in coordinator-dynamic-config.tsx

Copy link
Contributor

@AmatyaAvadhanula AmatyaAvadhanula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @kfaraz.

Copy link
Member

@rohangarg rohangarg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the changes

@kfaraz
Copy link
Contributor Author

kfaraz commented Nov 21, 2022

Thanks for the reviews, @rohangarg , @AmatyaAvadhanula !

@kfaraz kfaraz changed the title Make reservoir segment sampling the default, minor cleanup of coordinator config Make batched segment sampling the default, minor cleanup of coordinator config Nov 21, 2022
@kfaraz kfaraz merged commit 133054b into apache:master Nov 21, 2022
@kfaraz kfaraz deleted the cleanup_coord_config branch May 3, 2023 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants