Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable auto kill segments by default #12187

Merged
merged 11 commits into from
Feb 7, 2022
Merged

Conversation

suneet-s
Copy link
Contributor

@suneet-s suneet-s commented Jan 21, 2022

Description

To help prevent the metadata store from growing unbounded, enable kill unused segments by default on all datasources.

This is helpful when users have auto-compaction running or they age out data. The old segments are marked as unused but live on in the metadata store and in deep storage.

The new defaults should kill all unused segments older than 90 days. If users do not want this behavior on an upgrade, they should explicitly disable the behavior, or change the specific datasources they would want to run on.

killAllDataSources is removed and is no longer respected. This means users who are upgrading with kill.on set to true, but killAllDataSources set to false AND killDataSourceWhitelist set to empty, will start to see unused segments older than 90 days being killed after an upgrade that were not being deleted before the upgrade.

This seems like a reasonable risk - since it is a strange combination to have kill enabled via kill.on, but then disabled killing all datasources as described above. Based on this I think this is a reasonable thing to call out in the release notes to give operators a heads up of this edge case, since killing segments deletes them from deep storage as well.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Comment on lines 321 to 323
return killUnusedSegmentsInAllDataSources != null
? killUnusedSegmentsInAllDataSources
: specificDataSourcesToKillUnusedSegmentsIn.isEmpty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simplified as?

Suggested change
return killUnusedSegmentsInAllDataSources != null
? killUnusedSegmentsInAllDataSources
: specificDataSourcesToKillUnusedSegmentsIn.isEmpty();
return specificDataSourcesToKillUnusedSegmentsIn.isEmpty();

The method returns true when

  • either killUnusedSegmentsInAllDataSources = true which implies that specifiedDataSourcesToKillUnusedSegmentsIn is empty, otherwise we would have thrown an exception
  • or killUnusedSegmentsInAllDataSources = null and specifiedDataSourcesToKillUnusedSegmentsIn is empty

With these new defaults, maybe we don't even need killAll anymore?
If killList is empty, we kill all
Otherwise, we kill only the ones specified.
Unless I am missing a case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion. This should simplify the user experience. I will do this

@@ -65,7 +65,8 @@
/**
* If true {@link KillUnusedSegments} sends kill tasks for unused segments in all data sources.
*/
private final boolean killUnusedSegmentsInAllDataSources;
@Nullable
private final Boolean killUnusedSegmentsInAllDataSources;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we compute its default per the given killDataSourceWhitelist instead of making it nullable? It could simplify the code a bit.

@maytasm
Copy link
Contributor

maytasm commented Jan 27, 2022

Can you set druid_coordinator_kill_period=PT360000S in integration-tests/docker/environment-configs/coordinator too? Thanks

docs/configuration/index.md Outdated Show resolved Hide resolved
@suneet-s
Copy link
Contributor Author

suneet-s commented Feb 5, 2022

@kfaraz I should have removed all references to killUnusedSegmentsInAllDataSources in the docs and the code. It is now implicit. Can you take another look when you get a chance please?

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@suneet-s suneet-s merged commit ced1389 into apache:master Feb 7, 2022
@suneet-s suneet-s deleted the auto-kill branch February 7, 2022 14:57
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
@gianm
Copy link
Contributor

gianm commented Jun 22, 2022

IMO without the logic in #12526, enabling autokill by default is too risky. Marking segments unused can happen by mistake: someone might activate the "mark unused" functionality (via API or web console) for the wrong datasource, or the wrong interval. Then autokilling can turn that small mistake into a big mistake.

I raised a PR to set it back: #12693

abhishekrb19 added a commit to abhishekrb19/incubator-druid that referenced this pull request May 3, 2024
This parameter has been removed for awhile now as of Druid 0.23.0
apache#12187.

The code was only used in tests to verify that serialization works.
Now remove all references to avoid any confusion.
kfaraz pushed a commit that referenced this pull request May 5, 2024
…ces. (#16387)

This parameter has been removed for awhile now as of Druid 0.23.0
#12187.

The code was only used in tests to verify that serialization works.
Now remove all references to avoid any confusion.
IgorBerman pushed a commit to IgorBerman/druid that referenced this pull request May 5, 2024
…ces. (apache#16387)

This parameter has been removed for awhile now as of Druid 0.23.0
apache#12187.

The code was only used in tests to verify that serialization works.
Now remove all references to avoid any confusion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants