
Allow coordinator to be configured to kill segments in future #10877

Merged
merged 36 commits into from May 11, 2022

Conversation

Contributor

@capistrant capistrant commented Feb 10, 2021

Fixes #10876 .

Release Notes Rough Draft

Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example, PT-24H allows segments to be killed if their interval_end date is 24 hours or less into the future at the time the kill task is generated by the system.

A cluster operator can also disregard druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores the interval_end date when looking for segments to kill, making any segment marked unused a valid candidate. This new configuration is off by default, and a cluster operator should fully understand and accept the risks before enabling it.
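The arithmetic behind a negative durationToRetain can be illustrated with java.time (a standalone sketch, not Druid code; the method name killUpperBound is illustrative):

```java
import java.time.Duration;
import java.time.Instant;

public class DurationToRetainDemo {
    // The kill upper bound is now - durationToRetain; a negative duration
    // therefore pushes the bound into the future.
    static Instant killUpperBound(Instant now, Duration durationToRetain) {
        return now.minus(durationToRetain);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-05-11T00:00:00Z");

        // Positive period (like the documented default P90D): only segments
        // ending before an instant 90 days in the past are kill candidates.
        System.out.println(killUpperBound(now, Duration.parse("P90D")));  // 2022-02-10T00:00:00Z

        // Negative period from the release notes: segments ending up to
        // 24 hours in the future become kill candidates.
        System.out.println(killUpperBound(now, Duration.parse("PT-24H"))); // 2022-05-12T00:00:00Z
    }
}
```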

Description

I have marked this as requiring "Design Review" because the default config change has meaningful impact on end users who choose to use the kill feature of the coordinator. Current users will already have created their own overrides that will continue to function as expected. However, operators turning on this feature could be surprised by the default, so I want to make sure multiple committers approve of the approach.

I won't remove the "design review" label, but I did make a change to bring the default behavior back in line with what is available in our releases today. The default value for druid.coordinator.kill.durationToRetain is once again invalid; only the value itself has changed. Since PT-1S is valid with my change, I made the default null instead, which causes a precondition failure similar to the one that exists today. Operators who already have an override of this config will see zero change in behavior.

This PR allows cluster operators to configure their Coordinator to kill segments whose "end" date is in the future as compared to the time when the coordinator is executing kill logic.

The first change allows an operator to specify a negative duration. A negative duration subtracted from now yields a future date, which is what lets an operator choose some window into the future within which segments may be killed. A side effect is that the old invalid default (a negative period) is no longer a usable sentinel: using a negative duration to force the operator to consciously choose a valid duration is no longer an option, so I switched the default to null. This mimics the behavior of the previous default, and future users of the kill feature will have to read the documentation and set this value explicitly if they don't like the default.

I also added a new configuration to the coordinator, druid.coordinator.kill.ignoreDurationToRetain. This configuration tells Druid that it does not need to consider the end time of candidates for killing: any segment that is marked unused is a valid candidate that can be killed.
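Put together, a coordinator runtime.properties enabling this behavior might look like the following (an illustrative sketch; only the property names come from this PR and the existing coordinator docs):

```properties
druid.coordinator.kill.on=true
druid.coordinator.kill.period=P1D

# Option A: allow killing unused segments whose interval end is up to 24h in the future
druid.coordinator.kill.durationToRetain=PT-24H

# Option B: ignore interval end entirely (durationToRetain is then ignored);
# any segment marked unused becomes a kill candidate
# druid.coordinator.kill.ignoreDurationToRetain=true
```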


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • DruidCoordinatorConfig
  • KillUnusedSegments

/**
* Calculate the {@link DateTime} that wil form the upper bound when looking for segments that are
* eligible to be killed. If ignoreDurationToRetain is true, we have no upper bound and return a DateTime object
* for 9999-12-31T23:59. This static date has to be used becuse the metasore is not comparing date objects, but rather


Suggested change
* for 9999-12-31T23:59. This static date has to be used becuse the metasore is not comparing date objects, but rather
* for 9999-12-31T23:59. This static date has to be used because the metastore is not comparing date objects, but rather
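The Javadoc's point about the metastore comparing strings rather than date objects can be seen directly: fixed-width, zero-padded ISO 8601 timestamps sort lexicographically in chronological order, so a literal far-future bound like 9999-12-31 works as a plain string comparison (a standalone illustration, not Druid code):

```java
public class IsoStringOrdering {
    // Lexicographic comparison matches chronological order for
    // fixed-width, zero-padded ISO 8601 timestamps.
    static boolean sortsBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        String upperBound = "9999-12-31T23:59:00.000Z";

        // Both a past and a future interval end fall below the static bound.
        System.out.println(sortsBefore("2021-02-10T00:00:00.000Z", upperBound)); // true
        System.out.println(sortsBefore("2023-01-01T00:00:00.000Z", upperBound)); // true
    }
}
```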

@suneet-s
Contributor

suneet-s commented Apr 8, 2021

I have not read the change yet, but thought about what you've mentioned in the PR description.

What do you think of following a similar pattern to what we do for json config properties. Essentially, introduce a new config property with the new default of PT0S. And add code that validates the config that only one of the old and new property are set so that we can resolve this behavior. This way, operators can opt in to the functionality you're providing here without being forced to make a config change on an upgrade - just food for thought...

I will try to look into this over the next 2 weeks

@capistrant
Contributor Author

capistrant commented Apr 8, 2021

I have not read the change yet, but thought about what you've mentioned in the PR description.

What do you think of following a similar pattern to what we do for json config properties. Essentially, introduce a new config property with the new default of PT0S. And add code that validates the config that only one of the old and new property are set so that we can resolve this behavior. This way, operators can opt in to the functionality you're providing here without being forced to make a config change on an upgrade - just food for thought...

I will try to look into this over the next 2 weeks

I appreciate the feedback! Based on it, I made some changes to better reflect the way things work today. I did not create a new config, but I changed the default of this config to be null. null is invalid and will cause an exception, like the negative Duration does in master today. This way, when an operator turns on kill for the first time, they will have the same experience as they do today if they do not explicitly add the durationToRetain config. Current users of the kill features will also see no difference since they have already added their explicit override of the config which will behave the same with my code as it currently does in master.

@capistrant
Contributor Author

I have not read the change yet, but thought about what you've mentioned in the PR description.

What do you think of following a similar pattern to what we do for json config properties. Essentially, introduce a new config property with the new default of PT0S. And add code that validates the config that only one of the old and new property are set so that we can resolve this behavior. This way, operators can opt in to the functionality you're providing here without being forced to make a config change on an upgrade - just food for thought...

I will try to look into this over the next 2 weeks

Hey suneet, reaching out to see if you've had a chance to think more about this proposed change since we last talked

@suneet-s
Contributor

I have not read the change yet, but thought about what you've mentioned in the PR description.
What do you think of following a similar pattern to what we do for json config properties. Essentially, introduce a new config property with the new default of PT0S. And add code that validates the config that only one of the old and new property are set so that we can resolve this behavior. This way, operators can opt in to the functionality you're providing here without being forced to make a config change on an upgrade - just food for thought...
I will try to look into this over the next 2 weeks

Hey suneet, reaching out to see if you've had a chance to think more about this proposed change since we last talked

Thanks for the reminder - this totally fell off my radar. I'll do a deeper dive into this on Friday

@capistrant
Contributor Author

@suneet-s Circling back to see if you are still interested in reviewing this one. I have merged master into the code within the last week so everything is up to date and functional. Thanks!

@stale

stale bot commented Apr 19, 2022

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Apr 19, 2022
@capistrant
Contributor Author

don't close

@stale

stale bot commented Apr 19, 2022

This issue is no longer marked as stale.

@stale stale bot removed the stale label Apr 19, 2022
Contributor

@a2l007 a2l007 left a comment

Tagging this under Release Notes since operators need to be aware of this feature while upgrading

* Completely removes information about unused segments whose end time is older than {@link #retainDuration} from now
* from the metadata store. This action is called "to kill a segment".
* Completely removes information about unused segments who have an interval end that comes before
* now - {@link #retainDuration}. retainDuration can be a positive or negative duration, negative meaning the interval
Contributor

nit: metadata store part missing after rewrite.

DateTime getEndTimeUpperLimit()
{
  return ignoreRetainDuration
         ? DateTimes.of(9999, 12, 31, 23, 59)
         : DateTimes.nowUtc().minus(retainDuration);
}
Contributor

Can we use DateTimes.CAN_COMPARE_AS_YEAR_MAX instead? Might be worth renaming that property as well for readability purposes.

@@ -812,7 +812,8 @@ These Coordinator static configurations can be defined in the `coordinator/runti
|`druid.coordinator.kill.pendingSegments.on`|Boolean flag for whether or not the Coordinator clean up old entries in the `pendingSegments` table of metadata store. If set to true, Coordinator will check the created time of most recently complete task. If it doesn't exist, it finds the created time of the earliest running/pending/waiting tasks. Once the created time is found, then for all dataSources not in the `killPendingSegmentsSkipList` (see [Dynamic configuration](#dynamic-configuration)), Coordinator will ask the Overlord to clean up the entries 1 day or more older than the found created time in the `pendingSegments` table. This will be done periodically based on `druid.coordinator.period.indexingPeriod` specified.|true|
|`druid.coordinator.kill.on`|Boolean flag for whether or not the Coordinator should submit kill task for unused segments, that is, hard delete them from metadata store and deep storage. If set to true, then for all whitelisted dataSources (or optionally all), Coordinator will submit tasks periodically based on `period` specified. These kill tasks will delete all unused segments except for the last `durationToRetain` period. A whitelist can be set via dynamic configuration `killDataSourceWhitelist` described later.|true|
|`druid.coordinator.kill.period`|How often to send kill tasks to the indexing service. Value must be greater than `druid.coordinator.period.indexingPeriod`. Only applies if kill is turned on.|P1D (1 Day)|
|`druid.coordinator.kill.durationToRetain`| Do not kill unused segments in last `durationToRetain`, must be greater or equal to 0. Only applies and MUST be specified if kill is turned on.|`P90D`|
|`druid.coordinator.kill.durationToRetain`|Only applies if you set `druid.coordinator.kill.on` to `true`. This value is ignored if `druid.coordinator.kill.ignoreDurationToRetain` is `true`. Valid configurations must be an ISO 8601 period. Druid will not kill unused segments whose interval end date is beyond `now - durationToRetain`. `durationToRetain` can be a negative ISO 8601 period, which results in `now - durationToRetain` being in the future.|`P90D`|
Contributor

I wonder if we should introduce a concept of last used time which is a timestamp when the segment is used last. The last used time would be a state and can be updated whenever the segment used flag is set from true to false. The coordinator can use the last used time instead of the end time of the segment to determine whether the segment is eligible for cleanup.
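The proposed last-used-time check could be sketched roughly as follows (hypothetical code, not part of this PR or of Druid; the name lastUsedTime and the method are illustrative of the idea, with eligibility based on how long a segment has been unused rather than on its interval end):

```java
import java.time.Duration;
import java.time.Instant;

public class LastUsedEligibility {
    // A segment becomes a kill candidate once it has been unused for longer
    // than durationToRetain, regardless of its interval end date.
    static boolean eligibleToKill(Instant lastUsedTime, Instant now, Duration durationToRetain) {
        return lastUsedTime.isBefore(now.minus(durationToRetain));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-05-11T00:00:00Z");

        // Marked unused 100 days ago with a 90-day retention: eligible.
        System.out.println(eligibleToKill(Instant.parse("2022-01-31T00:00:00Z"), now, Duration.parse("P90D")));

        // Marked unused 10 days ago: still retained.
        System.out.println(eligibleToKill(Instant.parse("2022-05-01T00:00:00Z"), now, Duration.parse("P90D")));
    }
}
```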

Contributor Author

@jihoonson I like that idea. The current strategy always seemed odd to me. I see in the code where we mark segments in SqlSegmentsMetadataQuery and that would be easy to change. However, I'm unsure of how we'd safely handle the migration to having a new column in druid_segments table. If you have insight into that piece, I could work on an implementation plan

Contributor

Good question. I would like to avoid adding a new column in the druid_segments table if possible since it will complicate the migration process. We can perhaps add a new field in DataSegment instead, which is stored in the payload column of the druid_segments table. This will increase the cost of SqlSegmentsMetadataManager.getUnusedSegmentIntervals() since it will no longer be able to use the index on the end date, but will have to read each row, deserialize the payload, and check the last update time in it. It would be worth running some benchmarks to see how expensive this is. It is probably OK if this ends up a bit expensive since the coordinator doesn't do much work. But if you think it is going to be too expensive, then I think we have to either add a new column in the druid_segments table or just stick with your change in this PR. If we are going to add a new column, we will have to document a proper upgrade/downgrade path. Providing a script for auto migration would be nice, but not mandatory.

Another thing we should think about is what the last update time will be for existing unused segments as the new field will be missing for them. Perhaps we can keep the current behavior for the segments that the last update time is missing.

Contributor Author

Agreed that a schema update is a bit intrusive to the user and should be avoided in all but the most necessary instances, which this doesn't seem to be, since we have multiple known workarounds, including my current approach and the one you propose. I'll do a little poking around to see what a payload change would look like and report back. I appreciate the thoughts.

Contributor

I think I remember @imply-cheddar mentioning something the other day about adding a field to one of the tables, and building a system to enable the migrations. I don't remember anything more than that, but maybe he can illuminate it for us.

At any rate the "update time" thing could be a different patch.

Contributor Author

Yes, I'd be 100% okay with this larger change in approach moving to another patch. I'm definitely interested to hear more of @imply-cheddar's thoughts on what the new column would entail. @danprince1 is playing around with ideas for a new column in druid_segments as well. His idea is a last_modified_date column. However, he is working on a separate effort around coordinator polling of used segments, making it more efficient by only polling what has changed, so the functionality of that column would probably be unrelated to the "last used" date this segment kill logic would need. But perhaps there is an opportunity to collaborate on a standard process for handling migrations. Just a thought.

Contributor

There is an ongoing PR by @AmatyaAvadhanula that adds a new field to the tasks table. There is going to be code, that runs in a separate thread during startup and does the migration behind the scenes.
https://github.com/apache/druid/pull/12404/files

Contributor

@gianm gianm left a comment

This change looks good to me. I like the adjustments to improve compatibility with the older configs. Thanks @capistrant!


@benkrug
Contributor

benkrug commented May 19, 2022

druid.coordinator.kill.durationToRetain can only be set to one value at a time. Wouldn't it make sense, in the spirit of this PR, to have a "+ or - my-ISO-period" option?

I'm late to the game, but this PR allows us to use durationToRetain to kill past OR future segments. (Or, as mentioned in #12526, we can ignoreDurationToRetain, yolo.) It would be nice to have an AND option, i.e., kill past and future segments: maintain a rolling window, keep a week back and a week ahead, and clean up other outlying data. If someone is ingesting data that for whatever reason includes both past and future timestamps, as it stands they have to choose between a positive and a negative durationToRetain and handle the other side manually.

@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022

Successfully merging this pull request may close these issues.

Druid Coordinator automated segment kill utility should allow killing segments with future intervals
8 participants