Task Framework IdealState Removal #1326

Merged: 8 commits, Sep 14, 2020

Conversation

@NealSun96 commented Aug 27, 2020

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

Fixes #1323, #1324

Description

  • Here are some details about my PR, including screenshots of any UI changes:

This PR removes IdealState usage from the task framework pipeline. Workflow resources are now created directly from WorkflowConfig and JobConfig. As a result, the legacy pipeline logic in TaskSchedulingStage is removed: it handled the case of "IdealState exists but WorkflowConfig does not", which can no longer occur.
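To illustrate the idea of building resources from configs alone, here is a minimal sketch. It is not the Helix implementation: `TaskConfigStub` and `ConfigDrivenResources` are hypothetical stand-ins, and the real pipeline works with WorkflowConfig and JobConfig objects. The point it shows is that, with no IdealState lookup, a resource appears in the pipeline if and only if its config exists.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a workflow or job config; not the Helix class. */
final class TaskConfigStub {
  final String resourceName;
  final boolean isJob;

  TaskConfigStub(String resourceName, boolean isJob) {
    this.resourceName = resourceName;
    this.isJob = isJob;
  }
}

/** Hypothetical sketch of config-driven resource computation. */
final class ConfigDrivenResources {
  /**
   * Builds the resource map purely from configs. No IdealState is consulted,
   * so the "IdealState exists but config does not" case cannot arise here.
   */
  static Map<String, String> computeResources(List<TaskConfigStub> configs) {
    Map<String, String> resources = new LinkedHashMap<>();
    for (TaskConfigStub config : configs) {
      resources.put(config.resourceName, config.isJob ? "JOB" : "WORKFLOW");
    }
    return resources;
  }

  public static void main(String[] args) {
    List<TaskConfigStub> configs = Arrays.asList(
        new TaskConfigStub("wf1", false),
        new TaskConfigStub("wf1_job1", true));
    System.out.println(computeResources(configs));
  }
}
```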

After the legacy pipeline logic was removed, numerous unintended uses of the legacy pipeline surfaced as broken tests; these tests have been fixed.

Two pipeline bugs were also uncovered and fixed: in the first, tasks were incorrectly rejected; in the second, a negative scheduling delay was not properly rejected.
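For the second bug, the description does not say how the rejection is implemented. One plausible shape for such a check is sketched below, assuming the delay is a millisecond value validated before scheduling; `SchedulingDelayCheck` and `requireNonNegativeDelay` are hypothetical names, not Helix APIs.

```java
/** Hypothetical sketch of rejecting a negative scheduling delay. */
final class SchedulingDelayCheck {
  /**
   * Returns the delay unchanged when it is zero or positive.
   * Throws IllegalArgumentException for a negative delay instead of
   * silently accepting it.
   */
  static long requireNonNegativeDelay(long delayMs) {
    if (delayMs < 0) {
      throw new IllegalArgumentException(
          "Scheduling delay must be non-negative, got " + delayMs);
    }
    return delayMs;
  }

  public static void main(String[] args) {
    System.out.println(requireNonNegativeDelay(5000L));
    try {
      requireNonNegativeDelay(-1L);
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Failing fast at validation time keeps an invalid delay from reaching the scheduler at all, which is the behavior the fix describes.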

Tests

  • The following is the result of the "mvn test" command on the appropriate module:
[ERROR] Tests run: 1173, Failures: 3, Errors: 0, Skipped: 1, Time elapsed: 4,390.976 s <<< FAILURE! - in TestSuite
[ERROR] testEnableCompressionResource(org.apache.helix.integration.TestEnableCompression)  Time elapsed: 149.639 s  <<< FAILURE!
java.lang.AssertionError: expected:<true> but was:<false>
        at org.apache.helix.integration.TestEnableCompression.testEnableCompressionResource(TestEnableCompression.java:117)

[ERROR] testStateTransitionTimeOut(org.apache.helix.integration.paticipant.TestStateTransitionTimeoutWithResource)  Time elapsed: 7.422 s  <<< FAILURE!
java.lang.NullPointerException
        at org.apache.helix.integration.paticipant.TestStateTransitionTimeoutWithResource.verify(TestStateTransitionTimeoutWithResource.java:209)
        at org.apache.helix.integration.paticipant.TestStateTransitionTimeoutWithResource.lambda$testStateTransitionTimeOut$0(TestStateTransitionTimeoutWithResource.java:173)
        at org.apache.helix.integration.paticipant.TestStateTransitionTimeoutWithResource.testStateTransitionTimeOut(TestStateTransitionTimeoutWithResource.java:173)

[ERROR] testPeriodicRefresh(org.apache.helix.integration.spectator.TestRoutingTableProviderPeriodicRefresh)  Time elapsed: 2.011 s  <<< FAILURE!
java.lang.AssertionError: expected:<4> but was:<3>
        at org.apache.helix.integration.spectator.TestRoutingTableProviderPeriodicRefresh.testPeriodicRefresh(TestRoutingTableProviderPeriodicRefresh.java:211)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   TestEnableCompression.testEnableCompressionResource:117 expected:<true> but was:<false>
[ERROR]   TestStateTransitionTimeoutWithResource.testStateTransitionTimeOut:173->lambda$testStateTransitionTimeOut$0:173->verify:209 NullPointer
[ERROR]   TestRoutingTableProviderPeriodicRefresh.testPeriodicRefresh:211 expected:<4> but was:<3>
[INFO] 
[ERROR] Tests run: 1173, Failures: 3, Errors: 0, Skipped: 1
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:13 h
[INFO] Finished at: 2020-09-14T13:33:37-07:00
[INFO] ------------------------------------------------------------------------

Rerun

[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 49.247 s - in TestSuite
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  54.713 s
[INFO] Finished at: 2020-09-14T14:18:32-07:00
[INFO] ------------------------------------------------------------------------

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

https://github.com/apache/helix/wiki/Task-Framework-IdealState-Dependency-Removal-Progression

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

@alirezazamani

@NealSun96 Thanks for your PR. I think the PR is in good shape now. Could you please add an integration test for dropping a resource? That would conclude this PR. In the meantime, I will review the code again.

@alirezazamani left a comment

Overall LGTM. Please run mvn test multiple times just to make sure everything is in order.

@jiajunwang left a comment

I'm still reviewing the code, but here is one point of concern. Please double-check. Thanks.

@jiajunwang left a comment

Looks good in general. Please address the remaining comments.
BTW, please remove helix-core/test-results.txt : )

@jiajunwang left a comment

Code looks good to me.

But do you want to keep some test logic to verify that the IdealState (IS) does NOT exist anymore even when the jobs are running normally? Or do we already have a test case for it?

@NealSun96

> Code looks good to me.
>
> But do you want to keep some test logic to verify that the IdealState (IS) does NOT exist anymore even when the jobs are running normally? Or do we already have a test case for it?

After an offline discussion, we came to the conclusion that writing a test for feature removal is awkward and tricky to do right, and therefore it's not worth the effort at this moment.

@NealSun96 commented Sep 14, 2020

This PR is ready to be merged; it was approved by @jiajunwang.
Final commit message:

Remove IdealState Dependency from Task Framework

This PR removes IdealState usage from Task Framework. The participant-side no longer creates IdealState when workflows/jobs are created. The controller-side no longer reads IdealState to create resources for Task Framework.

@alirezazamani merged commit 027481f into apache:master on Sep 14, 2020
