Driver process never exits in yarn-client mode #128

Closed
sfwang218 opened this issue May 7, 2022 · 9 comments

Comments

@sfwang218
[Image]

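As shown in the image above, the driver keeps reporting heartbeats after the job has finished.

In my testing, changing scheduledExecutorService to use daemon threads resolves the problem:
scheduledExecutorService = Executors.newSingleThreadScheduledExecutor(new ThreadFactoryBuilder().setDaemon(true).setNameFormat("rss-heartbeat-%d").build());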

@jerqi
Collaborator

jerqi commented May 7, 2022

We shut down the executorService in the method RssShuffleManager.stop. In your situation, stop is not being called. In my opinion, this method must be called. Could you check whether RssShuffleManager.stop is called or not?

@sfwang218
Author

RssShuffleManager.stop is not called. That method should be called from within SparkSession.stop, but my program never explicitly calls SparkContext.stop, so RssShuffleManager.stop is never called either.

@jerqi
Collaborator

jerqi commented May 7, 2022

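I think the better approach is to call the stop method in your program rather than making this thread a daemon thread. Calling stop also lets Spark run its cleanup logic properly.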
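
A minimal sketch of the explicit-stop approach, using only the JDK. `HeartbeatOwner` and `runJob` are hypothetical stand-ins for a component that owns a heartbeat executor; they are not Spark or RSS APIs. The point is only that calling stop in a `finally` block ends the non-daemon heartbeat thread even when the job throws.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ExplicitStopDemo {

    // Hypothetical stand-in for a component that owns a heartbeat executor.
    public static class HeartbeatOwner {
        // Default thread factory: the worker thread is NOT a daemon thread.
        private final ScheduledExecutorService heartbeat =
            Executors.newSingleThreadScheduledExecutor();

        public void start() {
            heartbeat.scheduleAtFixedRate(
                () -> { /* send a heartbeat here */ }, 0, 1, TimeUnit.SECONDS);
        }

        // Cancels the periodic task and lets the worker thread terminate.
        public void stop() {
            heartbeat.shutdownNow();
        }

        public boolean isStopped() {
            return heartbeat.isShutdown();
        }
    }

    // Runs the job and always stops the heartbeat afterwards.
    public static HeartbeatOwner runJob(Runnable job) {
        HeartbeatOwner owner = new HeartbeatOwner();
        owner.start();
        try {
            job.run();
        } finally {
            owner.stop();
        }
        return owner;
    }

    public static void main(String[] args) {
        HeartbeatOwner owner = runJob(() -> System.out.println("job done"));
        System.out.println("stopped: " + owner.isStopped());
    }
}
```

In a real Spark application the analogous step would be calling the session's or context's stop method in a `finally` block at the end of the driver program.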

@sfwang218
Author

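Here is the situation: SparkContext registers a shutdown hook when it is created, so normally, even without an explicit call to stop, ShutdownHookManager calls stop once the job has finished.
But because this thread is not a daemon thread, the JVM considers the program still running and never invokes the shutdown hook, so the driver process never exits.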
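
The behaviour described above can be reproduced with a small JDK-only sketch. `HeartbeatDaemonDemo` and its helper methods are hypothetical names, and the plain `ThreadFactory` lambda is a stand-in for the Guava `ThreadFactoryBuilder` used in the proposed fix. Running it with the argument `false` should leave the JVM alive after `main` returns, because the non-daemon heartbeat thread keeps it from starting shutdown; with `true` (the default here) the JVM exits and the shutdown hook runs.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class HeartbeatDaemonDemo {

    // Hypothetical helper: a named thread factory built with the JDK only.
    public static ThreadFactory heartbeatThreadFactory(boolean daemon) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, "rss-heartbeat-" + counter.getAndIncrement());
            t.setDaemon(daemon);
            return t;
        };
    }

    public static ScheduledExecutorService newHeartbeatExecutor(boolean daemon) {
        return Executors.newSingleThreadScheduledExecutor(heartbeatThreadFactory(daemon));
    }

    public static void main(String[] args) {
        // Stands in for the shutdown hook that SparkContext registers.
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> System.out.println("shutdown hook ran")));

        boolean daemon = args.length == 0 || Boolean.parseBoolean(args[0]);
        ScheduledExecutorService heartbeat = newHeartbeatExecutor(daemon);
        heartbeat.scheduleAtFixedRate(
            () -> System.out.println("heartbeat"), 0, 1, TimeUnit.SECONDS);

        System.out.println("main finished");
        // daemon == true:  no non-daemon threads remain, so the JVM shuts down
        //                  and runs the hook.
        // daemon == false: the heartbeat thread keeps the JVM alive, so the hook
        //                  does not run until the process is stopped externally.
    }
}
```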

@jerqi
Collaborator

jerqi commented May 9, 2022

In that case, making it a daemon thread is reasonable. If it is convenient for you, you could submit a PR to fix this; if not, I can open a PR for it myself.

@sfwang218
Author

I would appreciate it if you could take care of it; I am not very familiar with the conventions for submitting PRs.

@jerqi
Collaborator

jerqi commented May 9, 2022

OK, I will fix it. There is actually a PR template here; you just need to fill it in.

@jerqi
Collaborator

jerqi commented May 9, 2022

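Something like this: describe what the PR changes, why the change is needed, whether users need to change any configuration to use it, and how it was tested. We usually test through a combination of unit tests, manual testing, and other methods.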

What changes were proposed in this pull request?

Make the heartbeat ExecutorService a daemon executor service.

Why are the changes needed?

If the user doesn't call stop, the driver's shutdown hook won't be called, and the driver will hang.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manual test

@sfwang218
Author

Got it, thanks a lot.

jerqi added a commit that referenced this issue May 9, 2022
### What changes were proposed in this pull request?
Make the heartbeat ExecutorService a daemon executor service. Resolves issue #128.

### Why are the changes needed?
If the user doesn't call stop, the driver's shutdown hook won't be called, and the driver will hang.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test

Co-authored-by: roryqi <roryqi@tencent.com>
jerqi closed this as completed May 9, 2022