Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds cluster level idleConfig setting for supervisor #13311

Merged

Conversation

tejaswini-imply
Copy link
Member

@tejaswini-imply tejaswini-imply commented Nov 4, 2022

This PR enables cluster admin to define default idle config for all supervisors. The new cluster-level settings and their defaults are:

druid.supervisor.idleConfig.enabled = false
druid.supervisor.idleConfig.inactiveAfterMillis = 600_000

This configuration can be overridden on individual basis through ingestionSpec.ioConfig.idleConfig.


Key changed/added classes in this PR
  • SupervisorStateManagerConfig
  • SeekableStreamSupervisor

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments.

docs/configuration/index.md Outdated Show resolved Hide resolved
@@ -46,6 +46,12 @@
@JsonProperty
private int maxStoredExceptionEvents = Math.max(unhealthinessThreshold, healthinessThreshold);

@JsonProperty("idleConfig.enabled")
private boolean idleConfigEnabled = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of flattening the spec, keep an idleConfig itself here and handle it correctly inside the SeekableStreamSupervisor.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're passing 2 spec mappings druid.supervisor.idleConfig.enabled, druid.supervisor.idleConfig.inactiveAfterMillis rather than an idleConfig json object, I am not sure if it can be deserialised to idleConfig mapping.

@tejaswini-imply
Copy link
Member Author

Thanks for the review @kfaraz, I have addressed your comments in the latest commit. Please take a look.

docs/configuration/index.md Show resolved Hide resolved
@@ -804,6 +805,15 @@ public SeekableStreamSupervisor(
: Math.min(10, this.ioConfig.getTaskCount() * this.ioConfig.getReplicas()));
}

if (spec.getIoConfig().getIdleConfig() != null) {
idleConfig = spec.getIoConfig().getIdleConfig();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that users have only specified enabled/disabled at the supervisor level, but not the timeout value. Shouldn't we pick up the timeout value defined at the cluster level then?

website/.spelling Outdated Show resolved Hide resolved
@tejaswini-imply
Copy link
Member Author

I have addressed your comments in the latest commit @kfaraz. PTAL.

@@ -804,6 +805,23 @@ public SeekableStreamSupervisor(
: Math.min(10, this.ioConfig.getTaskCount() * this.ioConfig.getReplicas()));
}

IdleConfig specIdleConfig = spec.getIoConfig().getIdleConfig();
if (specIdleConfig != null) {
if (specIdleConfig.getInactiveAfterMillis() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we might still incorrectly handle the case where user has specified inactiveAfterMillis at the supervisor level but not enabled. In that case, we should just override the inactiveAfterMillis and take the cluster level value for enabled.

But I guess that's a corner case and we can do it in a later PR.

@kfaraz
Copy link
Contributor

kfaraz commented Nov 8, 2022

@tejaswini-imply , thanks for the fix. A couple of ITs seem to be failing consistently. Could you please check if the failures are genuine?

@tejaswini-imply
Copy link
Member Author

@kfaraz The transactional kafka index integration test seems to be flaky. It passed in the other run, and the changes have nothing to do with these tests.

@abhishekagarwal87 abhishekagarwal87 merged commit 594545d into apache:master Nov 8, 2022
@abhishekagarwal87
Copy link
Contributor

Merged since IT failure is unrelated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants