Allow coordinator to be configured to kill segments in future #10877
Conversation
/**
 * Calculate the {@link DateTime} that will form the upper bound when looking for segments that are
 * eligible to be killed. If ignoreDurationToRetain is true, we have no upper bound and return a DateTime object
 * for 9999-12-31T23:59. This static date has to be used becuse the metasore is not comparing date objects, but rather
 * for 9999-12-31T23:59. This static date has to be used becuse the metasore is not comparing date objects, but rather
 * for 9999-12-31T23:59. This static date has to be used because the metastore is not comparing date objects, but rather
I have not read the change yet, but thought about what you've mentioned in the PR description. What do you think of following a similar pattern to what we do for JSON config properties? Essentially, introduce a new config property with the new default. I will try to look into this over the next 2 weeks.
I appreciate the feedback! Based on it, I made some changes to better reflect the way things work today. I did not create a new config, but I changed the default of this config instead.
Hey suneet, reaching out to see if you've had a chance to think more about this proposed change since we last talked |
Thanks for the reminder - this totally fell off my radar. I'll do a deeper dive into this on Friday |
@suneet-s Circling back to see if you are still interested in reviewing this one. I have merged master into the code within the last week so everything is up to date and functional. Thanks!
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
don't close
This issue is no longer marked as stale. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tagging this under Release Notes since operators need to be aware of this feature while upgrading
 * Completely removes information about unused segments whose end time is older than {@link #retainDuration} from now
 * from the metadata store. This action is called "to kill a segment".
 * Completely removes information about unused segments who have an interval end that comes before
 * now - {@link #retainDuration}. retainDuration can be a positive or negative duration, negative meaning the interval
nit: metadata store part missing after rewrite.
DateTime getEndTimeUpperLimit()
{
  return ignoreRetainDuration
         ? DateTimes.of(9999, 12, 31, 23, 59)
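The quoted method is cut off before the other branch of the ternary. A minimal self-contained sketch of the full logic, using `java.time` instead of Druid's Joda-based `DateTimes` so it compiles on its own (the field names and constructor here are illustrative assumptions, not the PR's exact code), could look like:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Sketch of the end-time upper bound discussed above. Names are assumed.
class EndTimeUpperLimitSketch
{
  private final boolean ignoreRetainDuration;
  private final Duration durationToRetain;

  EndTimeUpperLimitSketch(boolean ignoreRetainDuration, Duration durationToRetain)
  {
    this.ignoreRetainDuration = ignoreRetainDuration;
    this.durationToRetain = durationToRetain;
  }

  Instant getEndTimeUpperLimit(Instant now)
  {
    return ignoreRetainDuration
           // Fixed far-future cap, mirroring the 9999-12-31T23:59 constant
           // (the metastore compares date strings, not date objects).
           ? LocalDateTime.of(9999, 12, 31, 23, 59).toInstant(ZoneOffset.UTC)
           // Otherwise the bound is now - durationToRetain; a negative
           // duration pushes the bound into the future.
           : now.minus(durationToRetain);
  }
}
```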
Can we use `DateTimes.CAN_COMPARE_AS_YEAR_MAX` instead? Might be worth renaming that property as well for readability purposes.
@@ -812,7 +812,8 @@ These Coordinator static configurations can be defined in the `coordinator/runti`
|`druid.coordinator.kill.pendingSegments.on`|Boolean flag for whether or not the Coordinator clean up old entries in the `pendingSegments` table of metadata store. If set to true, Coordinator will check the created time of most recently complete task. If it doesn't exist, it finds the created time of the earliest running/pending/waiting tasks. Once the created time is found, then for all dataSources not in the `killPendingSegmentsSkipList` (see [Dynamic configuration](#dynamic-configuration)), Coordinator will ask the Overlord to clean up the entries 1 day or more older than the found created time in the `pendingSegments` table. This will be done periodically based on `druid.coordinator.period.indexingPeriod` specified.|true|
|`druid.coordinator.kill.on`|Boolean flag for whether or not the Coordinator should submit kill task for unused segments, that is, hard delete them from metadata store and deep storage. If set to true, then for all whitelisted dataSources (or optionally all), Coordinator will submit tasks periodically based on `period` specified. These kill tasks will delete all unused segments except for the last `durationToRetain` period. A whitelist can be set via dynamic configuration `killDataSourceWhitelist` described later.|true|
|`druid.coordinator.kill.period`|How often to send kill tasks to the indexing service. Value must be greater than `druid.coordinator.period.indexingPeriod`. Only applies if kill is turned on.|P1D (1 Day)|
|`druid.coordinator.kill.durationToRetain`| Do not kill unused segments in last `durationToRetain`, must be greater or equal to 0. Only applies and MUST be specified if kill is turned on.|`P90D`|
|`druid.coordinator.kill.durationToRetain`|Only applies if you set `druid.coordinator.kill.on` to `true`. This value is ignored if `druid.coordinator.kill.ignoreDurationToRetain` is `true`. Valid configurations must be a ISO8601 period. Druid will not kill unused segments whose interval end date is beyond `now - durationToRetain`. `durationToRetain` can be a negative ISO8601 period, which would result in `now - durationToRetain` to be in the future.|`P90D`|
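Taken together, the new rows would map onto a `coordinator/runtime.properties` fragment like the following (values are illustrative, not recommendations):

```properties
druid.coordinator.kill.on=true
druid.coordinator.kill.period=P1D
# Negative ISO 8601 period: segments whose interval end is up to 24 hours in
# the future become eligible for kill tasks.
druid.coordinator.kill.durationToRetain=PT-24H
# Or skip the end-time check entirely (use with care):
# druid.coordinator.kill.ignoreDurationToRetain=true
```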
I wonder if we should introduce a concept of last used time, which is a timestamp of when the segment was last used. The last used time would be a state and can be updated whenever the segment used flag is set from `true` to `false`. The coordinator can use the last used time instead of the end time of the segment to determine whether the segment is eligible for cleanup.
@jihoonson I like that idea. The current strategy always seemed odd to me. I see in the code where we mark segments in `SqlSegmentsMetadataQuery`, and that would be easy to change. However, I'm unsure how we'd safely handle the migration to having a new column in the `druid_segments` table. If you have insight into that piece, I could work on an implementation plan.
Good question. I would like to avoid adding a new column in the `druid_segments` table if possible, since it will complicate the migration process. We can perhaps add a new field in `DataSegment` instead, which is stored in the `payload` column of the `druid_segments` table. This will increase the cost of `SqlSegmentsMetadataManager.getUnusedSegmentIntervals()`, since it will no longer be able to use the index on the end date, but will have to read each row, deserialize the `payload`, and check the last update time in it. It would be worth running some benchmarks to see how expensive this would be. I guess it will be OK if this ends up a bit expensive, as the coordinator doesn't do much work. But if you think this is going to be too expensive, then I think we have to either add a new column in the `druid_segments` table or just stick to your change in this PR. If we are going to add a new column, we will have to document a proper upgrade/downgrade path. Providing a script for auto migration would be nice, but not mandatory.

Another thing we should think about is what the last update time will be for existing unused segments, as the new field will be missing for them. Perhaps we can keep the current behavior for the segments where the last update time is missing.
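That fallback could be sketched as a simple predicate (a hypothetical illustration of the idea being discussed, not Druid code):

```java
import java.time.Instant;

class KillEligibilitySketch
{
  // Prefer the (proposed) last-used time when present; for pre-existing rows
  // where the new field is missing, fall back to the current behavior of
  // comparing the segment's interval end against the cutoff.
  static boolean eligibleForKill(Instant lastUsedTime, Instant intervalEnd, Instant cutoff)
  {
    Instant reference = (lastUsedTime != null) ? lastUsedTime : intervalEnd;
    return reference.isBefore(cutoff);
  }
}
```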
Agreed that a schema update is a bit intrusive to the user and should be avoided in all but the most necessary instances, which this doesn't seem to be, since we have multiple known workarounds, including my current one and the one you propose. I'll do a little poking around to see what a payload change would look like and report back. I appreciate the thoughts.
I think I remember @imply-cheddar mentioning something the other day about adding a field to one of the tables, and building a system to enable the migrations. I don't remember anything more than that, but maybe he can illuminate it for us.
At any rate the "update time" thing could be a different patch.
Yes, I'd be 100% okay with this larger change in approach moving to another patch. I'm definitely interested to hear more of @imply-cheddar's thoughts on what the new column would entail. @danprince1 is playing around with ideas regarding a new column in `druid_segments` as well. His idea is a `last_modified_date` column. However, he is working on a whole separate thread regarding the coordinator polling of used segments, making that more efficient by only polling what has changed. So the functionality of his column would probably be unrelated to a "last_used" date column that this segment kill work could need, but perhaps there is a collaboration opportunity on how to handle migrations so there is a standard process. Just a thought.
There is an ongoing PR by @AmatyaAvadhanula that adds a new field to the tasks table. There is going to be code that runs in a separate thread during startup and does the migration behind the scenes.
https://github.com/apache/druid/pull/12404/files
This change looks good to me. I like the adjustments to improve compatibility with the older configs. Thanks @capistrant!
`druid.coordinator.kill.durationToRetain` can only be set to one value at a time. Wouldn't it make sense, in the spirit of this PR, to have a "+ or - my-ISO-period" option? I'm late to the game, but this PR allows us to use `durationToRetain` to kill past OR future segments. (Or, as mentioned in #12526, we can `ignoreDurationToRetain`, yolo.) It would be nice to have an AND option, i.e. kill past and future segments: have a rolling window, keep a week back and a week ahead, and for other outlying data, go ahead and clean up. If someone is ingesting data that for whatever reason includes past and future timestamps, as it is now, they have to choose between a positive and a negative `durationToRetain`, and I guess handle the other side manually.
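The rolling-window idea in the comment above could be sketched like this (purely hypothetical; not part of this PR or of Druid):

```java
import java.time.Duration;
import java.time.Instant;

class RetainWindowSketch
{
  // Hypothetical AND option: retain segments whose interval end falls inside
  // [now - pastRetain, now + futureRetain]; anything outside the window is
  // eligible for kill.
  static boolean outsideRetainWindow(Instant intervalEnd, Instant now, Duration pastRetain, Duration futureRetain)
  {
    return intervalEnd.isBefore(now.minus(pastRetain))
           || intervalEnd.isAfter(now.plus(futureRetain));
  }
}
```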
Fixes #10876.
Release Notes Rough Draft
Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting `druid.coordinator.kill.durationToRetain` to a negative period. For example, `PT-24H` would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system.

A cluster operator can also disregard `druid.coordinator.kill.durationToRetain` entirely by setting a new configuration, `druid.coordinator.kill.ignoreDurationToRetain=true`. This ignores the interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.

Description
I have marked this as requiring "Design Review" because the default config change has a meaningful impact on end users who choose to use the kill feature of the coordinator. Current users will already have created their own overrides that will continue to function as expected. However, operators turning on this feature could be surprised by the default, so I want to make sure multiple committers approve of the approach.

I won't remove the "design review" label, but I did make a change to bring the default behavior back in line with what is available in our releases today. The default value for `druid.coordinator.kill.durationToRetain` is now invalid again, it has just changed. Since `PT-1S` is valid with my change, I made it so the default is `null`. This causes a precondition failure similar to the one that exists today. Operators who already have an override of this config will see zero change in behavior.

This PR allows cluster operators to configure their Coordinator to kill segments whose "end" date is in the future as compared to the time when the coordinator is executing kill logic.
The first change allows an operator to specify a negative duration. A negative duration subtracted from now will be a future date. This is what allows an operator to choose some duration into the future where they will let segments be killed.
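As a quick illustration of that arithmetic (a standalone sketch using `java.time` rather than Druid's Joda-based utilities):

```java
import java.time.Duration;
import java.time.Instant;

class NegativeRetainDemo
{
  public static void main(String[] args)
  {
    Instant now = Instant.parse("2021-02-01T00:00:00Z");
    // ISO 8601 allows negative components; PT-24H is minus 24 hours.
    Duration durationToRetain = Duration.parse("PT-24H");
    // Subtracting a negative duration moves the cutoff into the future, so
    // segments ending up to 24 hours from now become kill-eligible.
    Instant cutoff = now.minus(durationToRetain);
    System.out.println(cutoff); // 2021-02-02T00:00:00Z
  }
}
```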
A side effect of this is that the old invalid default is no longer a good default. The idea of using a negative duration to force the operator to consciously choose a valid possible duration is no longer an option, so I switched the default to `null`. This mimics the same behavior as the previous default. I changed the default to `PT0s`, and future users of the kill docs will have to read the documentation to know this value must be changed accordingly if they don't like the default.

I also added a new configuration to the coordinator, `druid.coordinator.kill.ignoreDurationToRetain`. This configuration allows an operator to tell Druid that it does not need to care about the end time of candidates for killing. Any segment that is marked unused is a valid candidate that can be killed.

This PR has:
Key changed/added classes in this PR
DruidCoordinatorConfig
KillUnusedSegments