[Improvement] Only initialize the heartbeat thread pool when the role is driver #177

Merged
merged 1 commit into from
Jun 16, 2022

Conversation

zuston
Contributor

@zuston zuston commented Jun 16, 2022

What changes were proposed in this pull request?

Only initialize the heartbeat thread pool when the role is driver

Why are the changes needed?

Executors do not need the heartbeat thread pool, so this avoids creating an extra one there.
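
For readers following along, the idea can be sketched roughly as below. This is only an illustration under assumptions, not the code in this PR: the class name `HeartbeatOwner`, the `isDriver` constructor flag, the thread name, and the `heartbeatPool()` accessor are all hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical stand-in for the shuffle manager; all names are illustrative.
final class HeartbeatOwner {
  // Stays null on executors, so no heartbeat pool is created there.
  private final ScheduledExecutorService heartbeatExecutorService;

  HeartbeatOwner(boolean isDriver) {
    if (isDriver) {
      // Only the driver creates the pool.
      heartbeatExecutorService = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "heartbeat-sketch");
        t.setDaemon(true);
        return t;
      });
    } else {
      heartbeatExecutorService = null;
    }
  }

  // May return null: callers on the executor side must check before use.
  ScheduledExecutorService heartbeatPool() {
    return heartbeatExecutorService;
  }
}
```

The trade-off is that the field becomes nullable, so every later use of it has to account for the executor case.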

Does this PR introduce any user-facing change?

No

How was this patch tested?

No

@jerqi
Collaborator

jerqi commented Jun 16, 2022

The spark2 client has a similar problem. Could you update spark2 in this PR as well?

@zuston
Contributor Author

zuston commented Jun 16, 2022

Updated, @jerqi.

One thing I'm unsure about: when a heartbeat fails, nothing is done about it. Is that right?

@colinmjj
Collaborator

> Updated, @jerqi.
>
> One thing I'm unsure about: when a heartbeat fails, nothing is done about it. Is that right?

The heartbeat notifies the shuffle server that the app is still alive, and every other RPC has the same effect. In the current implementation, a failed heartbeat RPC is simply ignored.
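
A minimal sketch of that "ignore the failure" behavior, assuming the heartbeat runs as a periodic task. `HeartbeatTask`, the `sendHeartbeat` callback, and the failure counter are hypothetical names for illustration, not the project's actual classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical periodic heartbeat task; sendHeartbeat stands in for the real RPC.
final class HeartbeatTask implements Runnable {
  private final Runnable sendHeartbeat;
  private final AtomicInteger failures = new AtomicInteger();

  HeartbeatTask(Runnable sendHeartbeat) {
    this.sendHeartbeat = sendHeartbeat;
  }

  @Override
  public void run() {
    try {
      sendHeartbeat.run();
    } catch (RuntimeException e) {
      // Ignored on purpose: the next heartbeat or any other RPC refreshes liveness.
      // Catching here also matters for scheduling, because an exception escaping
      // a scheduleAtFixedRate task suppresses all of its later executions.
      failures.incrementAndGet();
    }
  }

  // Exposed only so the sketch can be observed; not part of any real API.
  int failureCount() {
    return failures.get();
  }
}
```

Such a task could be handed to `ScheduledExecutorService.scheduleAtFixedRate` with whatever interval the client is configured to use.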

@jerqi
Collaborator

jerqi commented Jun 16, 2022

If the shuffle server does not receive any heartbeat from an application, it deletes that application's shuffle data.
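
The server-side consequence could look roughly like the following. This is a sketch under assumptions: the `AppLivenessTracker` class, the `expiryMs` timeout, and the explicit `nowMs` arguments are made up for illustration, and the real shuffle server may track liveness differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical liveness bookkeeping on the shuffle server side.
final class AppLivenessTracker {
  private final Map<String, Long> lastSeenMs = new ConcurrentHashMap<>();
  private final long expiryMs; // assumed timeout, not a real config value

  AppLivenessTracker(long expiryMs) {
    this.expiryMs = expiryMs;
  }

  // Called for a heartbeat or for any other RPC from the application.
  void refresh(String appId, long nowMs) {
    lastSeenMs.put(appId, nowMs);
  }

  // Removes and returns apps not seen for longer than expiryMs;
  // the caller would then delete their shuffle data.
  List<String> removeExpired(long nowMs) {
    List<String> expired = new ArrayList<>();
    for (Map.Entry<String, Long> e : lastSeenMs.entrySet()) {
      boolean stale = nowMs - e.getValue() > expiryMs;
      // Conditional remove avoids dropping an app refreshed concurrently.
      if (stale && lastSeenMs.remove(e.getKey(), e.getValue())) {
        expired.add(e.getKey());
      }
    }
    return expired;
  }
}
```

This is why an occasional failed heartbeat is harmless, while a long silence from the application is not.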

@jerqi jerqi closed this Jun 16, 2022
@jerqi jerqi reopened this Jun 16, 2022
@zuston
Contributor Author

zuston commented Jun 16, 2022

Thanks for the explanation, @jerqi @colinmjj.

@jerqi jerqi changed the title from "Only initialize the heartbeat thread pool when the role is driver" to "[Improvement] Only initialize the heartbeat thread pool when the role is driver" Jun 16, 2022
Collaborator

@jerqi jerqi left a comment


LGTM, thanks for your contribution.

@jerqi jerqi merged commit 7a5f943 into Tencent:master Jun 16, 2022
jerqi added a commit that referenced this pull request Jun 22, 2022
backport 0.5.0

### What changes were proposed in this pull request?
Check whether `heartbeatExecutorService` is null before stopping it.

### Why are the changes needed?
PR #177 introduced this problem: when we run Spark applications on our cluster, the executor throws an NPE when the `stop` method is called.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test
jerqi added a commit that referenced this pull request Jun 22, 2022
### What changes were proposed in this pull request?
Check whether `heartbeatExecutorService` is null before stopping it.

### Why are the changes needed?
PR #177 introduced this problem: when we run Spark applications on our cluster, the executor throws an NPE when the `stop` method is called.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test
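
The follow-up fix described in these commits amounts to a null guard in `stop`. A minimal sketch, assuming the pool is only created on the driver; `StopSketch` and its boolean return value are hypothetical and exist only to make the behavior observable.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical holder of the heartbeat pool; not the project's real class.
final class StopSketch {
  // Null on executors, because only the driver creates the pool.
  private final ScheduledExecutorService heartbeatExecutorService;

  StopSketch(boolean isDriver) {
    this.heartbeatExecutorService =
        isDriver ? Executors.newSingleThreadScheduledExecutor() : null;
  }

  // Returns true if a pool existed and was shut down.
  boolean stop() {
    if (heartbeatExecutorService == null) {
      // Executor side: nothing to stop. Without this check, calling
      // shutdownNow() on the null field would throw a NullPointerException.
      return false;
    }
    heartbeatExecutorService.shutdownNow();
    return true;
  }
}
```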

3 participants