Recover Workflow GC Logic #1181

Merged
junkaixue merged 13 commits into apache:task-improvement from NealSun96:nealsun/task-improvement-wf-gc on Jul 29, 2020

NealSun96 commented Jul 28, 2020

Issues

  • My PR addresses the following Helix issues and references them in the PR description:

Fixes #1075

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Workflow garbage collection was previously removed because of concerns about race conditions (PR: #803). Since we see value in workflow garbage collection, this PR brings the logic back and eliminates the race conditions.

Specifically, two changes are made:

  1. Remove the garbage collection stage's direct dependency on the cache. The relevant cache content is instead delivered into the stage during the synchronous phase and used during the asynchronous phase, so the stage no longer races with cache updates.
  2. Remove an unnecessary JobConfig write that serves no purpose, which avoids a possible race condition.
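
The first item can be illustrated with a minimal sketch. This is not the code in this PR: the class and method names (GcSnapshotSketch, snapshotForGc, findExpiredJobs) and the map-based stand-in for the cache are hypothetical, chosen only to show the idea of copying cache content during the sync phase and reading only that copy during the async phase.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch; none of these names come from the Helix code base.
class GcSnapshotSketch {

  // Sync phase: copy the content the GC stage needs into an immutable snapshot.
  // The map (job name -> finish timestamp in ms) is an assumed stand-in for the cache.
  static Map<String, Long> snapshotForGc(Map<String, Long> liveCache) {
    return Collections.unmodifiableMap(new HashMap<>(liveCache));
  }

  // Async phase: reads only the snapshot, never the live cache.
  static List<String> findExpiredJobs(Map<String, Long> snapshot, long now, long expiryMs) {
    List<String> expired = new ArrayList<>();
    for (Map.Entry<String, Long> entry : snapshot.entrySet()) {
      if (now - entry.getValue() >= expiryMs) {
        expired.add(entry.getKey());
      }
    }
    Collections.sort(expired); // deterministic order for the example
    return expired;
  }

  public static void main(String[] args) {
    Map<String, Long> liveCache = new HashMap<>();
    liveCache.put("job_a", 1_000L);
    liveCache.put("job_b", 9_500L);

    // Taken synchronously, before the async work is scheduled.
    Map<String, Long> snapshot = snapshotForGc(liveCache);

    CompletableFuture<List<String>> gc =
        CompletableFuture.supplyAsync(() -> findExpiredJobs(snapshot, 10_000L, 5_000L));

    // A concurrent cache refresh no longer affects the async GC work.
    liveCache.clear();

    System.out.println(gc.join()); // prints [job_a]
  }
}
```

Because the async task holds only the immutable copy, the later cache refresh cannot change what the garbage collection step sees.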

Tests

  • The following tests are written for this issue:

testGetExpiredJobsFromCache, testWorkflowContextGarbageCollection (recovered from old PR), testWorkflowGarbageCollection

  • The following is the result of the "mvn test" command on the appropriate module:
[ERROR] Tests run: 1155, Failures: 3, Errors: 0, Skipped: 1, Time elapsed: 4,495.443 s <<< FAILURE! - in TestSuite
[ERROR] testEnableCompressionResource(org.apache.helix.integration.TestEnableCompression)  Time elapsed: 160.699 s  <<< FAILURE!
java.lang.AssertionError: expected:<true> but was:<false>
        at org.apache.helix.integration.TestEnableCompression.testEnableCompressionResource(TestEnableCompression.java:117)

[ERROR] testDisablingTopStateReplicaByDisablingInstance(org.apache.helix.integration.TestNoThrottleDisabledPartitions)  Time elapsed: 2.3 s  <<< FAILURE!
java.lang.AssertionError: expected:<false> but was:<true>
        at org.apache.helix.integration.TestNoThrottleDisabledPartitions.testDisablingTopStateReplicaByDisablingInstance(TestNoThrottleDisabledPartitions.java:97)

[ERROR] testDisableFullAuto(org.apache.helix.integration.TestDisablePartition)  Time elapsed: 5.599 s  <<< FAILURE!
java.lang.AssertionError: expected:<OFFLINE> but was:<LEADER>
        at org.apache.helix.integration.TestDisablePartition.testDisableFullAuto(TestDisablePartition.java:202)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR] org.apache.helix.integration.TestDisablePartition.testDisableFullAuto(org.apache.helix.integration.TestDisablePartition)
[ERROR]   Run 1: TestDisablePartition.testDisableFullAuto:202 expected:<OFFLINE> but was:<LEADER>
[INFO]   Run 2: PASS
[INFO] 
[ERROR]   TestEnableCompression.testEnableCompressionResource:117 expected:<true> but was:<false>
[ERROR]   TestNoThrottleDisabledPartitions.testDisablingTopStateReplicaByDisablingInstance:97 expected:<false> but was:<true>
[INFO] 
[ERROR] Tests run: 1154, Failures: 3, Errors: 0, Skipped: 1
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:15 h
[INFO] Finished at: 2020-07-29T01:25:06-07:00
[INFO] ------------------------------------------------------------------------

Rerun

[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.825 s - in TestSuite
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  43.271 s
[INFO] Finished at: 2020-07-29T09:37:30-07:00
[INFO] ------------------------------------------------------------------------

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

@NealSun96

This PR is ready to be merged; it was approved by @alirezazamani.
Final commit message:

Recover Workflow Garbage Collection Logic

Recover Workflow Garbage Collection Logic

@junkaixue junkaixue merged commit ddde2f0 into apache:task-improvement Jul 29, 2020
alirezazamani pushed a commit to alirezazamani/helix that referenced this pull request Jul 29, 2020
Recover Workflow Garbage Collection Logic
alirezazamani pushed a commit to alirezazamani/helix that referenced this pull request Aug 4, 2020
Recover Workflow Garbage Collection Logic
junkaixue pushed a commit that referenced this pull request Aug 4, 2020
Recover Workflow Garbage Collection Logic
junkaixue pushed a commit to junkaixue/helix that referenced this pull request Aug 11, 2020
Recover Workflow Garbage Collection Logic
huizhilu pushed a commit to huizhilu/helix that referenced this pull request Aug 16, 2020
Recover Workflow Garbage Collection Logic