[Fix-10854] Fix database restart may lost task instance status #10866
Merged
ruanwenjun merged 2 commits into apache:dev on Jul 11, 2022
Codecov Report
@@             Coverage Diff              @@
##                dev   #10866      +/-   ##
============================================
- Coverage     40.60%   40.37%   -0.23%
- Complexity     4829     4843      +14
============================================
  Files           915      933      +18
  Lines         36436    36759     +323
  Branches       4000     4025      +25
============================================
+ Hits          14794    14842      +48
- Misses        20160    20433     +273
- Partials       1482     1484       +2
Member
Author
@caishunfeng This PR is ready for review, please take a look.
SonarCloud Quality Gate failed.
caishunfeng approved these changes Jul 11, 2022
Contributor
caishunfeng left a comment
LGTM overall, some nits.
if (!taskInstanceOptional.isPresent()) {
    sendAckToWorker(taskEvent);
    throw new TaskEventHandleError(
        "Handle task result event error, cannot find the taskInstance from cache, will discord this event");
Contributor
Suggested change
- "Handle task result event error, cannot find the taskInstance from cache, will discord this event");
+ "Handle task result event error, cannot find the taskInstance from cache, will discare this event");
package org.apache.dolphinscheduler.server.master.event;

public class WorkflowEventHandleException extends Exception {
Contributor
Please add some comments.
Member
Author
I will fix this in another PR.
ruanwenjun added a commit to ruanwenjun/dolphinscheduler that referenced this pull request Jul 12, 2022
…correct issue (apache#17)
* [Fix-10842] Fix master/worker failover will cause status incorrect (apache#10839)
* Fix master failover will not update task instance status
* Add some failover log
* Fix worker failover will rerun task more than once
* Fix workflowInstance failover may rerun already success taskInstance
(cherry picked from commit 3f69ec8)
* [Fix-10854] Fix database restart may lost task instance status (apache#10866)
* Fix database update error doesn't rollback the task instance status
* Fix database error may cause workflow dead with running status
(cherry picked from commit f639a2e)
ruanwenjun added a commit that referenced this pull request Jul 19, 2022
* Fix database update error doesn't rollback the task instance status
* Fix database error may cause workflow dead with running status
(cherry picked from commit f639a2e)
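The first item in the commit message above, rolling back the task instance status when the database update fails, can be illustrated with a minimal sketch. This is not the code from this PR: `StatusRollbackSketch`, `TaskInstance`, `TaskInstanceDao` and `updateStatus` are hypothetical names chosen for illustration, and the real DolphinScheduler classes are likely shaped differently.

```java
// Hypothetical sketch, not DolphinScheduler code: keeps the in-memory status
// consistent with the database when the update fails.
final class StatusRollbackSketch {

    enum Status { RUNNING, SUCCESS, FAILURE }

    /** Simplified stand-in for a task instance held in the master's memory. */
    static final class TaskInstance {
        Status status = Status.RUNNING;
    }

    /** Assumed persistence hook; it throws when the database is unavailable. */
    interface TaskInstanceDao {
        void update(TaskInstance taskInstance) throws Exception;
    }

    /**
     * Applies the new status in memory, then persists it.
     * If persisting fails, the previous status is restored so that memory
     * and database do not diverge. Returns true only when the update was stored.
     */
    static boolean updateStatus(TaskInstance task, Status newStatus, TaskInstanceDao dao) {
        Status oldStatus = task.status;
        task.status = newStatus;
        try {
            dao.update(task);
            return true;
        } catch (Exception e) {
            // Roll back the in-memory change; the caller may retry the event later.
            task.status = oldStatus;
            return false;
        }
    }
}
```

With a DAO that throws, `updateStatus` returns `false` and the task keeps its previous status, so a later retry starts from a consistent state.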
I have tested this PR and need to point out that some problems remain: when the database restarts, we may still lose some status. The cause is that our state handling is not yet idempotent. We need to split the state so that each step is tracked: when a task finishes we may need to perform several steps (clear the map, update the db, xx). If one step fails, we retry next time, and on retry we need to know which step failed so that we can recover from that step only.
Purpose of the pull request
close #10854
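The step-by-step recovery idea from the testing note above can be sketched as follows. This is only an illustration of the proposal, not something this PR implements: `ResumableFinishHandler`, `Step` and `StepAction` are hypothetical names, the step order is an assumption, `OTHER` stands in for the unspecified "xx" step, and the completed-step record is kept in memory here although a real implementation would have to decide where to keep it.

```java
import java.util.EnumSet;

// Hypothetical sketch, not DolphinScheduler code: a "finish" handler that
// remembers which steps already succeeded and resumes at the failed one.
final class ResumableFinishHandler {

    /** Steps performed when a task finishes; the order is an assumption. */
    enum Step { UPDATE_DB, CLEAR_MAP, OTHER }

    /** Executes one step; may throw, for example while the database restarts. */
    interface StepAction {
        void run(Step step) throws Exception;
    }

    // Steps that have completed so far (in memory only in this sketch).
    private final EnumSet<Step> completed = EnumSet.noneOf(Step.class);
    private final StepAction action;

    ResumableFinishHandler(StepAction action) {
        this.action = action;
    }

    /**
     * Runs the steps that have not completed yet, in order.
     * Stops at the first failing step and returns false; the next call
     * skips the completed steps and resumes at the one that failed.
     */
    boolean handle() {
        for (Step step : Step.values()) {
            if (completed.contains(step)) {
                continue; // already done in an earlier attempt
            }
            try {
                action.run(step);
                completed.add(step);
            } catch (Exception e) {
                return false; // retry later, starting from this step
            }
        }
        return true;
    }

    /** Returns a copy of the steps completed so far. */
    EnumSet<Step> completedSteps() {
        return EnumSet.copyOf(completed);
    }
}
```

Calling `handle()` repeatedly is safe in this sketch: a completed step is never executed twice, which is the idempotency property the note asks for. Each individual step would still need to tolerate being interrupted partway through.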
Brief change log
Verify this pull request