Skip to content

[AMORO-2623] Avoid deadlock among TaskRuntime objects#2790

Merged
majin1102 merged 3 commits intoapache:masterfrom
majin1102:AMORO-2623
May 9, 2024
Merged

[AMORO-2623] Avoid deadlock among TaskRuntime objects#2790
majin1102 merged 3 commits intoapache:masterfrom
majin1102:AMORO-2623

Conversation

@majin1102
Copy link
Copy Markdown
Contributor

Why are the changes needed?

Close #2623.

Brief change log

  • avoid deadlock among TaskRuntimes

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@majin1102 majin1102 requested a review from zhoujinsong April 27, 2024 15:04
@majin1102
Copy link
Copy Markdown
Contributor Author

@link3280

Copy link
Copy Markdown
Contributor

@link3280 link3280 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @majin1102 , IIUC the main change is to check the process status and release tasks early right after every completed task (in TaskRuntime), but I don't get how it avoids the deadlock. Please take a look at the comments inline.

token = null;
threadId = -1;
});
owner.releaseResourcesIfNecessary();
Copy link
Copy Markdown
Contributor

@link3280 link3280 Apr 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please elaborate more on how does it avoid the deadlock. I suppose the deadlock would happen when 2 thread block on owner.acceptResult(this) before this line.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think two threads are blocked inside cancelTasks within owner.acceptResult(this), so wrapping cancelTask within releaseResourcesIfNecessary and placing it outside owner.acceptResult(this) should indeed solve the deadlock issue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rfyu It sounds like what #2684 did.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think two threads are blocked inside cancelTasks within owner.acceptResult(this), so wrapping cancelTask within releaseResourcesIfNecessary and placing it outside owner.acceptResult(this) should indeed solve the deadlock issue

Sorry for late replying.

Just like @rfyu said, place canceling All Tasks out of task and process lock could prevent all lock conflict cases involving canceling all tasks.

When two threads steping into owner.acceptResult(), one will wait for lock anyway, I don't quite get your question

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the idea now. The cancellation is now lock-free. No task lock is needed, thus no deadlock.

@link3280
Copy link
Copy Markdown
Contributor

link3280 commented May 7, 2024

ping @majin1102

Copy link
Copy Markdown
Contributor

@xxubai xxubai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between cancelTasks when 2 thread are failed at same time and releaseResourcesIfNecessary

@majin1102
Copy link
Copy Markdown
Contributor Author

majin1102 commented May 8, 2024

What's the difference between cancelTasks when 2 thread are failed at same time and releaseResourcesIfNecessary

If you mean the operation of canceling all tasks itself could lead to dead lock.

I think the answer is no, concurrent threads would step into task locks in the same order, which would not lead to any dead locks

Copy link
Copy Markdown
Contributor

@zhoujinsong zhoujinsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@majin1102 majin1102 merged commit 1c5024c into apache:master May 9, 2024
@zhoujinsong zhoujinsong mentioned this pull request Jun 20, 2024
2 tasks
xxubai added a commit to xxubai/amoro that referenced this pull request Jul 17, 2024
* fix AMORO-2623

* spotless apply

---------

Co-authored-by: Xavier Bai <xuba@apache.org>
(cherry picked from commit 1c5024c)
(cherry picked from commit a0ee3f24d73fdaecd5b648785a54bb0db4c8942d)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Optimizing process stuck after a few failures

5 participants