[bug](Fe) fix potential deadlock in show proc statement by xy720 · Pull Request #34988 · apache/doris

xy720 · 2024-05-16T19:36:44Z

Proposed changes

Issue Number: close #xxx

version: 1.2|2.0|2.1

We encountered an FE hang, and the number of connections reached the limit.
Unfortunately, I restarted the FE without taking a jstack of the FE process.

However, I found many log lines like:

```
2024-05-11 14:09:32,533 WARN (thrift-server-pool-424042|4684871) [ReportHandler.putToQueue():190] the report queue size exceeds the limit: 100. current: 101
2024-05-11 14:09:32,545 WARN (thrift-server-pool-392529|4367180) [ReportHandler.putToQueue():190] the report queue size exceeds the limit: 100. current: 101
2024-05-11 14:09:32,663 WARN (thrift-server-pool-421895|4663462) [ReportHandler.putToQueue():190] the report queue size exceeds the limit: 100. current: 101
2024-05-11 14:09:32,816 WARN (thrift-server-pool-379243|4167917) [ReportHandler.putToQueue():190] the report queue size exceeds the limit: 100. current: 101
2024-05-11 14:09:33,531 WARN (thrift-server-pool-420797|4652941) [ReportHandler.putToQueue():190] the report queue size exceeds the limit: 100. current: 101
```

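The tablet report task hung: the report from backend 10009 was received at 14:07:57, but the tablet diff did not finish until 14:24:15, about 16 minutes later, even though the logged cost is only 66 ms: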

```
2024-05-11 14:07:57,671 INFO (Thread-57|105) [ReportHandler.tabletReport():250] backend[10009] reports 28982 tablet(s). report version: 17094493386232
2024-05-11 14:24:15,323 INFO (Thread-57|105) [TabletInvertedIndex.tabletReport():308] finished to do tablet diff with backend[10009]. sync: 0. metaDel: 7. foundInMeta: 28953. migration: 0. found invalid transactions 0. found republish transactions 0. tabletInMemorySync: 0. need recovery: 0. cost: 66 ms
```

The dynamic partition scheduler also seemed to hang.

I found a similar issue, #11319, which says we should avoid running parallelStream tasks on the common global ForkJoinPool, to prevent FE deadlocks.
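A minimal Java sketch of the idea, not the code from this PR: the `countHealthy` helper, the `STAT_POOL` name, the pool size of 4, and the integer tablet states (0 meaning healthy) are all hypothetical. The parallel stream is submitted to a dedicated `ForkJoinPool`, so its subtasks run on that pool's workers instead of `ForkJoinPool.commonPool()`. This relies on the JDK behavior that a parallel stream started from inside a `ForkJoinTask` forks its subtasks into that task's pool, which is an implementation detail rather than a documented guarantee.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class DedicatedPoolExample {
    // Hypothetical dedicated pool, created once and reused across calls,
    // so parallel work here never competes for commonPool() workers.
    private static final ForkJoinPool STAT_POOL = new ForkJoinPool(4);

    // Counts "healthy" states (hypothetically encoded as 0) in parallel.
    // Because the stream is started inside a task submitted to STAT_POOL,
    // the JDK runs its subtasks on STAT_POOL (implementation detail).
    static long countHealthy(List<Integer> tabletStates)
            throws InterruptedException, ExecutionException {
        return STAT_POOL.submit(
                () -> tabletStates.parallelStream().filter(s -> s == 0).count()
        ).get(); // the caller blocks until the parallel count finishes
    }

    public static void main(String[] args) throws Exception {
        // 1000 hypothetical states cycling through 0, 1, 2
        List<Integer> states = IntStream.range(0, 1000)
                .map(i -> i % 3)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(countHealthy(states)); // prints 334
    }
}
```

The pool is created once and reused rather than per call. Creating a new `ForkJoinPool` on every call is what the later follow-up #44891 warns against, since frequent calls were reported to leak ForkJoinPool threads.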


doris-robot · 2024-05-16T19:36:49Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

xy720 · 2024-05-16T19:53:10Z

run buildall

lide-reed · 2024-05-17T08:03:48Z

It's said that java.util.concurrent.ForkJoinPool may degrade performance. Please confirm whether the changes here involve high-frequency operations.
https://stackoverflow.com/questions/20288379/analysis-performance-of-forkjoinpool

xy720 · 2024-05-17T10:18:02Z

> It's said that java.util.concurrent.ForkJoinPool may degrade performance. Please confirm whether the changes here involve high-frequency operations.
> https://stackoverflow.com/questions/20288379/analysis-performance-of-forkjoinpool

As we discussed, this PR will not affect performance. Instead of the single global thread pool, this code now uses its own separate thread pool.

lide-reed

LGTM

github-actions · 2024-05-27T09:24:41Z

PR approved by at least one committer and no changes requested.

github-actions · 2024-05-27T09:24:43Z

PR approved by anyone and no changes requested.

cambyzju

LGTM

### What problem does this PR solve?

Comes from: #34988

Problem: when running on JDK 11 and calling `show proc "/cluster_health/tablet_health"` frequently, a large number of ForkJoinPool threads leak.

save

c14bb39

lide-reed approved these changes May 27, 2024


github-actions bot added the `approved` label ("Indicates a PR has been approved by one committer.") May 27, 2024

github-actions bot added the reviewed label May 27, 2024

cambyzju approved these changes May 27, 2024


xy720 merged commit 3664ca4 into apache:master May 27, 2024

yiguolei pushed a commit that referenced this pull request May 28, 2024

[bug](Fe) fix potential deadlock in show proc statement (#34988)

c38c939

dataroaring pushed a commit that referenced this pull request May 28, 2024

[bug](Fe) fix potential deadlock in show proc statement (#34988)

e72da10

dataroaring added the dev/2.0.x label Jun 14, 2024

dataroaring added the dev/3.0.0-merged label Aug 5, 2024

cambyzju mentioned this pull request Dec 3, 2024

[fix](ForkJoinPool) we should not new a thread pool every call #44891

Merged


xy720 merged 1 commit into apache:master from xy720:prevent-dead-lock-in-show-proc