
Allow coordinator to be configured to kill segments in future #10877

Merged
merged 36 commits into from May 11, 2022

Conversation

Contributor

@capistrant capistrant commented Feb 10, 2021

Fixes #10876 .

Release Notes Rough Draft

Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example, PT-24H allows segments to be killed if their interval_end date is 24 hours or less into the future at the time the kill task is generated by the system.

A cluster operator can also disregard druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores the interval_end date when looking for segments to kill, making any segment marked unused a valid candidate. This new configuration is off by default, and a cluster operator should fully understand and accept the risks before enabling it.
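The arithmetic behind a negative durationToRetain can be illustrated with java.time (a standalone sketch, not Druid code; the method name killUpperBound is illustrative):

```java
import java.time.Duration;
import java.time.Instant;

public class DurationToRetainDemo {
    // The kill upper bound is now - durationToRetain; a negative duration
    // therefore pushes the bound into the future.
    static Instant killUpperBound(Instant now, Duration durationToRetain) {
        return now.minus(durationToRetain);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-05-11T00:00:00Z");

        // Positive period (like the documented default P90D): only segments
        // ending before an instant 90 days in the past are kill candidates.
        System.out.println(killUpperBound(now, Duration.parse("P90D")));  // 2022-02-10T00:00:00Z

        // Negative period from the release notes: segments ending up to
        // 24 hours in the future become kill candidates.
        System.out.println(killUpperBound(now, Duration.parse("PT-24H"))); // 2022-05-12T00:00:00Z
    }
}
```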

Description

I have marked this as requiring "Design Review" because the default config change has meaningful impact on end users who choose to use the kill feature of the coordinator. Current users will already have created their own overrides that will continue to function as expected. However, operators turning on this feature could be surprised by the default, so I want to make sure multiple committers approve of the approach.

I won't remove the "design review" label, but I did make a change to bring the default behavior back in line with what is available in our releases today. The default value for druid.coordinator.kill.durationToRetain is once again invalid; only the value itself has changed. Since PT-1S is valid with my change, I made the default null instead, which causes a precondition failure similar to the one that exists today. Operators who already have an override of this config will see zero change in behavior.

This PR allows cluster operators to configure their Coordinator to kill segments whose "end" date is in the future as compared to the time when the coordinator is executing kill logic.

The first change allows an operator to specify a negative duration. A negative duration subtracted from now yields a future date, which is what lets an operator choose some window into the future within which segments may be killed. A side effect is that the old invalid default (a negative period) is no longer a usable sentinel: using a negative duration to force the operator to consciously choose a valid duration is no longer an option, so I switched the default to null. This mimics the behavior of the previous default, and future users of the kill feature will have to read the documentation and set this value explicitly if they don't like the default.

I also added a new configuration to the coordinator, druid.coordinator.kill.ignoreDurationToRetain. This configuration tells Druid that it does not need to consider the end time of candidates for killing: any segment that is marked unused is a valid candidate that can be killed.
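Put together, a coordinator runtime.properties enabling this behavior might look like the following (an illustrative sketch; only the property names come from this PR and the existing coordinator docs):

```properties
druid.coordinator.kill.on=true
druid.coordinator.kill.period=P1D

# Option A: allow killing unused segments whose interval end is up to 24h in the future
druid.coordinator.kill.durationToRetain=PT-24H

# Option B: ignore interval end entirely (durationToRetain is then ignored);
# any segment marked unused becomes a kill candidate
# druid.coordinator.kill.ignoreDurationToRetain=true
```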


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • DruidCoordinatorConfig
  • KillUnusedSegments

/**
* Calculate the {@link DateTime} that wil form the upper bound when looking for segments that are
* eligible to be killed. If ignoreDurationToRetain is true, we have no upper bound and return a DateTime object
* for 9999-12-31T23:59. This static date has to be used becuse the metasore is not comparing date objects, but rather


Suggested change
* for 9999-12-31T23:59. This static date has to be used becuse the metasore is not comparing date objects, but rather
* for 9999-12-31T23:59. This static date has to be used because the metastore is not comparing date objects, but rather
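The Javadoc's point about the metastore comparing strings rather than date objects can be seen directly: fixed-width, zero-padded ISO 8601 timestamps sort lexicographically in chronological order, so a literal far-future bound like 9999-12-31 works as a plain string comparison (a standalone illustration, not Druid code):

```java
public class IsoStringOrdering {
    // Lexicographic comparison matches chronological order for
    // fixed-width, zero-padded ISO 8601 timestamps.
    static boolean sortsBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        String upperBound = "9999-12-31T23:59:00.000Z";

        // Both a past and a future interval end fall below the static bound.
        System.out.println(sortsBefore("2021-02-10T00:00:00.000Z", upperBound)); // true
        System.out.println(sortsBefore("2023-01-01T00:00:00.000Z", upperBound)); // true
    }
}
```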

@suneet-s
Contributor

suneet-s commented Apr 8, 2021

I have not read the change yet, but thought about what you've mentioned in the PR description.

What do you think of following a similar pattern to what we do for json config properties. Essentially, introduce a new config property with the new default of PT0S. And add code that validates the config that only one of the old and new property are set so that we can resolve this behavior. This way, operators can opt in to the functionality you're providing here without being forced to make a config change on an upgrade - just food for thought...

I will try to look into this over the next 2 weeks

@capistrant
Contributor Author

capistrant commented Apr 8, 2021

I have not read the change yet, but thought about what you've mentioned in the PR description.

What do you think of following a similar pattern to what we do for json config properties. Essentially, introduce a new config property with the new default of PT0S. And add code that validates the config that only one of the old and new property are set so that we can resolve this behavior. This way, operators can opt in to the functionality you're providing here without being forced to make a config change on an upgrade - just food for thought...

I will try to look into this over the next 2 weeks

I appreciate the feedback! Based on it, I made some changes to better reflect the way things work today. I did not create a new config, but I changed the default of this config to be null. null is invalid and will cause an exception, like the negative Duration does in master today. This way, when an operator turns on kill for the first time, they will have the same experience as they do today if they do not explicitly add the durationToRetain config. Current users of the kill features will also see no difference since they have already added their explicit override of the config which will behave the same with my code as it currently does in master.

@capistrant
Contributor Author

I have not read the change yet, but thought about what you've mentioned in the PR description.

What do you think of following a similar pattern to what we do for json config properties. Essentially, introduce a new config property with the new default of PT0S. And add code that validates the config that only one of the old and new property are set so that we can resolve this behavior. This way, operators can opt in to the functionality you're providing here without being forced to make a config change on an upgrade - just food for thought...

I will try to look into this over the next 2 weeks

Hey suneet, reaching out to see if you've had a chance to think more about this proposed change since we last talked

@suneet-s
Contributor

I have not read the change yet, but thought about what you've mentioned in the PR description.
What do you think of following a similar pattern to what we do for json config properties. Essentially, introduce a new config property with the new default of PT0S. And add code that validates the config that only one of the old and new property are set so that we can resolve this behavior. This way, operators can opt in to the functionality you're providing here without being forced to make a config change on an upgrade - just food for thought...
I will try to look into this over the next 2 weeks

Hey suneet, reaching out to see if you've had a chance to think more about this proposed change since we last talked

Thanks for the reminder - this totally fell off my radar. I'll do a deeper dive into this on Friday

@capistrant
Contributor Author

@suneet-s Circling back to see if you are still interested in reviewing this one. I have merged master into the code within the last week so everything is up to date and functional. Thanks!

@stale

stale bot commented Apr 19, 2022

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Apr 19, 2022
@capistrant
Contributor Author

don't close

@stale

stale bot commented Apr 19, 2022

This issue is no longer marked as stale.

@stale stale bot removed the stale label Apr 19, 2022
Contributor

@a2l007 a2l007 left a comment

Tagging this under Release Notes since operators need to be aware of this feature while upgrading

* Completely removes information about unused segments whose end time is older than {@link #retainDuration} from now
* from the metadata store. This action is called "to kill a segment".
* Completely removes information about unused segments who have an interval end that comes before
* now - {@link #retainDuration}. retainDuration can be a positive or negative duration, negative meaning the interval
Contributor

nit: metadata store part missing after rewrite.

DateTime getEndTimeUpperLimit()
{
  return ignoreRetainDuration
         ? DateTimes.of(9999, 12, 31, 23, 59)
         : DateTimes.nowUtc().minus(retainDuration);
}
Contributor

Can we use DateTimes.CAN_COMPARE_AS_YEAR_MAX instead? Might be worth renaming that property as well for readability purposes.

@@ -812,7 +812,8 @@ These Coordinator static configurations can be defined in the `coordinator/runti
|`druid.coordinator.kill.pendingSegments.on`|Boolean flag for whether or not the Coordinator clean up old entries in the `pendingSegments` table of metadata store. If set to true, Coordinator will check the created time of most recently complete task. If it doesn't exist, it finds the created time of the earliest running/pending/waiting tasks. Once the created time is found, then for all dataSources not in the `killPendingSegmentsSkipList` (see [Dynamic configuration](#dynamic-configuration)), Coordinator will ask the Overlord to clean up the entries 1 day or more older than the found created time in the `pendingSegments` table. This will be done periodically based on `druid.coordinator.period.indexingPeriod` specified.|true|
|`druid.coordinator.kill.on`|Boolean flag for whether or not the Coordinator should submit kill task for unused segments, that is, hard delete them from metadata store and deep storage. If set to true, then for all whitelisted dataSources (or optionally all), Coordinator will submit tasks periodically based on `period` specified. These kill tasks will delete all unused segments except for the last `durationToRetain` period. A whitelist can be set via dynamic configuration `killDataSourceWhitelist` described later.|true|
|`druid.coordinator.kill.period`|How often to send kill tasks to the indexing service. Value must be greater than `druid.coordinator.period.indexingPeriod`. Only applies if kill is turned on.|P1D (1 Day)|
|`druid.coordinator.kill.durationToRetain`| Do not kill unused segments in last `durationToRetain`, must be greater or equal to 0. Only applies and MUST be specified if kill is turned on.|`P90D`|
|`druid.coordinator.kill.durationToRetain`|Only applies if you set `druid.coordinator.kill.on` to `true`. This value is ignored if `druid.coordinator.kill.ignoreDurationToRetain` is `true`. Valid configurations must be an ISO 8601 period. Druid will not kill unused segments whose interval end date is beyond `now - durationToRetain`. `durationToRetain` can be a negative ISO 8601 period, which results in `now - durationToRetain` being in the future.|`P90D`|
Contributor

I wonder if we should introduce a concept of last used time which is a timestamp when the segment is used last. The last used time would be a state and can be updated whenever the segment used flag is set from true to false. The coordinator can use the last used time instead of the end time of the segment to determine whether the segment is eligible for cleanup.
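The proposed last-used-time check could be sketched roughly as follows (hypothetical code, not part of this PR or of Druid; the name lastUsedTime and the method are illustrative of the idea, with eligibility based on how long a segment has been unused rather than on its interval end):

```java
import java.time.Duration;
import java.time.Instant;

public class LastUsedEligibility {
    // A segment becomes a kill candidate once it has been unused for longer
    // than durationToRetain, regardless of its interval end date.
    static boolean eligibleToKill(Instant lastUsedTime, Instant now, Duration durationToRetain) {
        return lastUsedTime.isBefore(now.minus(durationToRetain));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-05-11T00:00:00Z");

        // Marked unused 100 days ago with a 90-day retention: eligible.
        System.out.println(eligibleToKill(Instant.parse("2022-01-31T00:00:00Z"), now, Duration.parse("P90D")));

        // Marked unused 10 days ago: still retained.
        System.out.println(eligibleToKill(Instant.parse("2022-05-01T00:00:00Z"), now, Duration.parse("P90D")));
    }
}
```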

Contributor Author

@jihoonson I like that idea. The current strategy always seemed odd to me. I see in the code where we mark segments in SqlSegmentsMetadataQuery and that would be easy to change. However, I'm unsure of how we'd safely handle the migration to having a new column in druid_segments table. If you have insight into that piece, I could work on an implementation plan

Contributor

Good question. I would like to avoid adding a new column in the druid_segments table if possible since it will complicate the migration process. We can perhaps add a new field in DataSegment instead, which is stored in the payload column of the druid_segments table. This will increase the cost of SqlSegmentsMetadataManager.getUnusedSegmentIntervals() since it will no longer be able to use the index on the end date, but will have to read each row, deserialize the payload, and check the last update time in it. It would be worth running some benchmarks to see how expensive this is. It is probably OK if this ends up a bit expensive since the coordinator doesn't do much work. But if you think it is going to be too expensive, then I think we have to either add a new column in the druid_segments table or just stick with your change in this PR. If we are going to add a new column, we will have to document a proper upgrade/downgrade path. Providing a script for auto migration would be nice, but not mandatory.

Another thing we should think about is what the last update time will be for existing unused segments as the new field will be missing for them. Perhaps we can keep the current behavior for the segments that the last update time is missing.

Contributor Author

Agreed that a schema update is a bit intrusive to the user and should be avoided in all but the most necessary instances, which this doesn't seem to be, since we have multiple known workarounds, including my current approach and the one you propose. I'll do a little poking around to see what a payload change would look like and report back. I appreciate the thoughts.

Contributor

I think I remember @imply-cheddar mentioning something the other day about adding a field to one of the tables, and building a system to enable the migrations. I don't remember anything more than that, but maybe he can illuminate it for us.

At any rate the "update time" thing could be a different patch.

Contributor Author

Yes, I'd be 100% okay with this larger change in approach moving to another patch. I'm definitely interested to hear more of @imply-cheddar's thoughts on what the new column would entail. @danprince1 is playing around with ideas for a new column in druid_segments as well. His idea is a last_modified_date column. However, he is working on a separate effort around coordinator polling of used segments, making it more efficient by only polling what has changed, so the functionality of that column would probably be unrelated to the "last used" date this segment kill logic would need. But perhaps there is an opportunity to collaborate on a standard process for handling migrations. Just a thought.

Contributor

There is an ongoing PR by @AmatyaAvadhanula that adds a new field to the tasks table. There is going to be code, that runs in a separate thread during startup and does the migration behind the scenes.
https://github.com/apache/druid/pull/12404/files

Contributor

@gianm gianm left a comment

This change looks good to me. I like the adjustments to improve compatibility with the older configs. Thanks @capistrant!


@benkrug
Contributor

benkrug commented May 19, 2022

druid.coordinator.kill.durationToRetain can only be set to one value at a time. Wouldn't it make sense, in the spirit of this PR, to have a "+ or - my-ISO-period" option?

I'm late to the game, but this PR allows us to use durationToRetain to kill past OR future segments. (Or, as mentioned in #12526, we can ignoreDurationToRetain, yolo.) It would be nice to have an AND option, i.e., kill past and future segments: maintain a rolling window, keep a week back and a week ahead, and clean up other outlying data. If someone is ingesting data that for whatever reason includes both past and future timestamps, as it stands they have to choose between a positive and a negative durationToRetain and handle the other side manually.

@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022

Successfully merging this pull request may close these issues.

Druid Coordinator automated segment kill utility should allow killing segments with future intervals
8 participants