Skip to content

fix: SubProcess task may fall into an endless-loop by fault-tolerant#6063

Closed
reele wants to merge 3 commits intoapache:devfrom
reele:dev
Closed

fix: SubProcess task may fall into an endless-loop by fault-tolerant#6063
reele wants to merge 3 commits intoapache:devfrom
reele:dev

Conversation

@reele
Copy link
Contributor

@reele reele commented Aug 30, 2021

Purpose of the pull request

  • When sub-process task submitted with a finished state, the SubProcessTaskExecThread.waitTaskQuit() will return directly, and set task state with sub-process's state (even if the sub-process is running, such as READY_PAUSE/READY_STOP or others), so the sub-process-task will ended with an unfinished state (READY_PAUSE or READY_STOP),
    so the parent thread MasterExecThread will fall into an endless-loop.

Brief change log

SubProcessTaskExecThread.waitTaskQuit() should check subProcessInstance's state first:

  • If subProcessInstance is not null and its state is unfinished, it still enters the waiting loop.

Issue

#6062

@CalvinKirs CalvinKirs requested a review from lenboo August 30, 2021 08:07
@CalvinKirs CalvinKirs added the first time contributor First-time contributor label Aug 30, 2021
@CalvinKirs
Copy link
Member

hi, can you resolve the CI check Fail?

@reele
Copy link
Contributor Author

reele commented Sep 1, 2021

OK.

@codecov-commenter
Copy link

codecov-commenter commented Sep 1, 2021

Codecov Report

Merging #6063 (16b2f1a) into dev (dc85e1a) will decrease coverage by 0.00%.
The diff coverage is 50.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##                dev    #6063      +/-   ##
============================================
- Coverage     46.36%   46.36%   -0.01%     
- Complexity     3733     3738       +5     
============================================
  Files           608      608              
  Lines         25062    25065       +3     
  Branches       2866     2867       +1     
============================================
+ Hits          11621    11622       +1     
- Misses        12295    12299       +4     
+ Partials       1146     1144       -2     
Impacted Files Coverage Δ
...server/master/runner/SubProcessTaskExecThread.java 49.36% <50.00%> (+11.20%) ⬆️
...r/plugin/registry/zookeeper/ZookeeperRegistry.java 39.00% <0.00%> (-5.68%) ⬇️
...dolphinscheduler/remote/future/ResponseFuture.java 81.35% <0.00%> (-1.70%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dc85e1a...16b2f1a. Read the comment docs.

@sonarqubecloud
Copy link

sonarqubecloud bot commented Sep 2, 2021

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

* (when recover a stopping instance, subProcessInstance may still in running by fault-tolerant)
*/
private boolean isTaskInstanceFinished() {
boolean isSubProcessFinished = (subProcessInstance == null || subProcessInstance.getState().typeIsFinished());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have already checked the sub process is not null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used to ensure that subProcessInstance.getState() is valid, it may still be null at this position.

Copy link
Contributor Author

@reele reele Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After submitting the task by super.submit(), the subprocess will be submitted to the command list. and the subprocess has not been established.

*/
private void waitSubProcessEstablishment() throws InterruptedException {
if (subProcessInstance == null) {
while (Stopper.isRunning()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's better to merge this 'while' into the other.

Copy link
Contributor Author

@reele reele Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the beginning, I just inserted a condition, by '&&', it caused SonarCloud smells of cognitive complexity.
this segment is the most complicated in waitTaskQuit() (+3), so i cut it into a new method.
And the old segment caused another code smell, merge means a big change.

@lenboo
Copy link
Contributor

lenboo commented Sep 21, 2021

@reele Thank you very much for your contribution, the implementation of sub-process is reconstructed.
Would you please checkout the new code and check this problem exists?

@reele
Copy link
Contributor Author

reele commented Oct 16, 2021

well, because of the json-split is not completed, the depenent node still used 'id' to reference the process_id, i can't import the 6000 tasks' dag of dw to test the stability, i may try it at some other times.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

first time contributor First-time contributor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants