Skip to content

Wire in the logic for TaskExecutor reset timeout parameter#2176

Closed
qqu0127 wants to merge 2 commits intoapache:masterfrom
qqu0127:reset-timeout
Closed

Wire in the logic for TaskExecutor reset timeout parameter#2176
qqu0127 wants to merge 2 commits intoapache:masterfrom
qqu0127:reset-timeout

Conversation

@qqu0127
Copy link
Contributor

@qqu0127 qqu0127 commented Jul 13, 2022

Issues

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    The timeout logic introduced in Make theadpool shutdown timeout configurable for the HelixTaskExecutor. #1920, failed to provide a client side entry point. As a followup, this PR address the issue by passing down the parameter from HelixManagerFactory down to HelixTaskExecutor so that helix users can specify.
    The parameter is encapsulated in HelixManagerProperty to enable client to specify value and pass down from HelixManagerFactory

Update:

We add zkAddr to HelixManagerProperty and parameter validation to resolve param conflicts. Rollback the new constructor in HelixManagerFactory.
Validations in two places:

  1. HelixManagerProperty construction, at most one of {zkAddr and ZkConnectionConfig} shall be set. They can both be empty though, a warning is logged but won't block.
  2. (Existing logic) ZKHelixManager:validateZkConnectionSettings, if ZkConnectionConfig is set, zkAddress from ZKHelixManager constructor must be null, otherwise early stop.

With above two checks, we can ensure: if ZkConnectionConfig is set => the zkAddress must be null, either from HelixManagerProperty or from ZKHelixManager.
We set higher precedence for zkAddr from HelixManagerProperty, if this is empty, zkAddress passed from constructor is used.

Tests

  • The following tests are written for this issue:
    Modified existing test case in TestParticipantManager.java

  • The following is the result of the "mvn test" command on the appropriate module:

(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

Pass down the parameter from HelixManagerFactorye down to
HelixTaskExecutor so that helix users can specify
@qqu0127 qqu0127 marked this pull request as ready for review July 13, 2022 21:29
Copy link
Contributor

@NealSun96 NealSun96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said, I don't think HelixManagerProperty is the best place to store the reset timeout - the problem is not with your design, but unfortunately our existing logic will cause confusions to customers; without changing the existing logic, that's not the best place.

The customer did talk about storing the reset timeout in ClusterConfig, and I think that's a better choice. Can we load the config automatically from ClusterConfig, and not modify the HelixManager initialization?

@mgao0
Copy link
Contributor

mgao0 commented Jul 14, 2022

You'll need to modify more logic to enable the constructor with both zk address and helix manager property. Please take a look at this PR, its purpose was to add a constructor like that.

@qqu0127
Copy link
Contributor Author

qqu0127 commented Jul 14, 2022

You'll need to modify more logic to enable the constructor with both zk address and helix manager property. Please take a look at this PR, its purpose was to add a constructor like that.

Thanks @mgao0 for the pointer. Is this the same thing Neal mentioned? I feel if we make sure _zkConnectionConfig and _zkClientConfig are null in HelixManagerProperty, while a valid zkAddr is provided, we should be good to go. Could you please confirm?

@mgao0
Copy link
Contributor

mgao0 commented Jul 15, 2022

You'll need to modify more logic to enable the constructor with both zk address and helix manager property. Please take a look at this PR, its purpose was to add a constructor like that.

Thanks @mgao0 for the pointer. Is this the same thing Neal mentioned? I feel if we make sure _zkConnectionConfig and _zkClientConfig are null in HelixManagerProperty, while a valid zkAddr is provided, we should be good to go. Could you please confirm?

I saw your latest commit that added zkAddr to HelixManagerProperty, it makes sense.

@qqu0127 qqu0127 closed this Jul 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable setting msgHandlerResetTimeout through HelixManagerFactory

4 participants