
[SPARK-10911] Executors should System.exit on clean shutdown. #9946

Closed
wants to merge 1 commit

Conversation


@zhuoliu zhuoliu commented Nov 24, 2015

Call system.exit explicitly to make sure non-daemon user threads terminate. Without this, user applications might live forever if the cluster manager does not appropriately kill them. E.g., YARN had this bug: HADOOP-12441.
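The JVM behavior at issue can be illustrated with a minimal, self-contained Java sketch (hypothetical code, not Spark's; the sleeping thread stands in for user-spawned work):

```java
public class NonDaemonDemo {
    public static void main(String[] args) {
        // A non-daemon thread spawned by user code; by default the JVM
        // stays alive until every non-daemon thread has finished.
        Thread userThread = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // simulates a thread that never finishes
            } catch (InterruptedException ignored) {
            }
        });
        userThread.start(); // non-daemon unless setDaemon(true) was called

        System.out.println("end of main()");
        // Without this explicit exit, the process would live forever waiting
        // on userThread -- which is what happened when YARN failed to kill
        // the container (HADOOP-12441).
        System.exit(0);
    }
}
```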

@JoshRosen
Contributor

Jenkins, this is ok to test.

@SparkQA

SparkQA commented Nov 24, 2015

Test build #46627 has finished for PR 9946 at commit 2869d21.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Nov 25, 2015

See the JIRA. I still don't believe this is safe. At the least, it causes more problems than it seems to solve.

@tgravescs
Contributor

@srowen do you know of any actual use cases this will break? We've finished running the user code and are exiting anyway, so things should just be shutting down. At this point it shouldn't be committing anything, and under normal circumstances if it takes too long it's going to get shot anyway.

If we know it will affect things, I'm fine with leaving it out, because under normal circumstances the cluster manager handles removing these. But this particular condition happened when there was an issue with the cluster manager as well. Leaving tasks around is bad, and anything we can do to protect the cluster from user code is, in my opinion, good.

@srowen
Member

srowen commented Dec 1, 2015

But isn't the scenario here that the user app isn't done, because it spawned non-daemon threads that are doing something? I agree it's not good practice, but if apps avoided doing this entirely we'd have no problem to begin with. The question is what to do if such a user thread does exist and runs long.

Killing it immediately could cause problems, and it's not terribly theoretical: imagine persisting data to disk or writing to a socket and failing to write all the bytes. Not-killing it of course opens the possibility that the thread never stops at all. Which is worse? The long-running user thread is the app's "fault" and is pretty easy to debug by looking at a stack dump. On the other hand killing threads straight away could cause problems for a sort of reasonably behaving app. (Also imagine this non-daemon thread could be in library code.)

Maybe some kind of timeout mostly mitigates the issue, but then I think it's just trying to save a fairly clearly misbehaving app from itself at a non-trivial cost.

You're always going to have the possibility of a stuck process (deadlock, infinite loop in a driver, etc) and need to be able to kill that if needed as an admin.

@tgravescs
Contributor

Well, the users shouldn't have been using non-daemon threads in the first place: they spawned them while a task was running and then never cleaned them up. We asked them to change that. I can't think of any scenario where the user should be relying on anything running past the point when the user's task code exits, as it would just be a natural race condition: the task finishes and the driver could kill the container, or the app could finish and YARN could kill the container, etc. Whatever they are doing may or may not finish anyway.

@vanzin
Contributor

vanzin commented Dec 3, 2015

@zhuoliu given the discussion in SPARK-10911, could you write a proper PR description and reference HADOOP-12441, which is why this is needed on the YARN side?

@tgravescs
Contributor

@vanzin So the bug is one thing, but it can actually happen in other ways too. For instance, if someone messes up the node while upgrading the NM during a rolling upgrade. Or it could happen if there is a bug in another resource manager (standalone, Mesos, etc.). I'm fine with updating the description, but I don't think it's limited to HADOOP-12441. It should call System.exit to make sure the user's process really exits.

@vanzin
Contributor

vanzin commented Dec 3, 2015

Sure, it's more for completeness: explaining why you need to explicitly exit, since I'd expect the cluster manager to kill those containers when the app finally finishes, yet there are cases when that doesn't seem to happen.

@andrewor14
Contributor

I agree with @srowen. Just ending the executor can introduce regressions in behavior that are difficult to debug. If the concern is that executor processes never die, then setting a reasonable timeout before calling System.exit(0) should do it. E.g., we probably don't expect anyone to want to leave their executors around for more than 2 hours after they have already stopped contacting the driver.
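The timeout idea could be sketched roughly like this; the code is hypothetical (not what Spark does), and the name ExitWatchdog and the grace period are illustrative:

```java
import java.util.concurrent.TimeUnit;

public class ExitWatchdog {
    /** Arms a daemon thread that forces JVM exit after a grace period. */
    public static Thread arm(long graceMillis) {
        Thread watchdog = new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(graceMillis);
            } catch (InterruptedException ignored) {
                return; // watchdog cancelled; let the JVM exit on its own
            }
            // Grace period elapsed with the JVM still up: force exit,
            // killing any lingering non-daemon user threads.
            System.exit(0);
        });
        // Daemon, so the watchdog itself never keeps the JVM alive.
        watchdog.setDaemon(true);
        watchdog.start();
        return watchdog;
    }
}
```

The trade-off discussed below applies: the grace period delays resource release for every well-behaved executor, while a misbehaving thread that outlives the grace period still gets killed abruptly.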

@zhuoliu
Author

zhuoliu commented Jan 20, 2016

@vanzin @srowen @andrewor14, I think your concerns make sense. Also, adding a timeout has another issue: resources cannot be released immediately even after tasks have completed. So for now, we may just close this pull request and the issue, and reopen it if the same problem recurs for other users.

@srowen
Member

srowen commented Jan 20, 2016

@zhuoliu OK, though you will have to close it.

@zhuoliu zhuoliu closed this Jan 20, 2016
@zhuoliu
Author

zhuoliu commented Jan 20, 2016

Thanks, closed.

@tgravescs
Contributor

Just to point out: I may re-open this. I would still rather fix this, a known bug that I can reproduce, than worry about a theoretical case where it might break something. Unless, of course, someone can give me a concrete use case that this breaks.

At the point we call System.exit here all user code is done and we are terminating. If there is something you know of that the user should be allowed to do after this then we should create something like a shutdown hook so the user can properly clean it up.

A timeout would probably work, but I'm not fond of it: it is going to delay shutdown for everyone and you don't give the resources back as quickly. You could also end up in the same situation where, if you choose a timeout of, say, 3 seconds, and this theoretical user code didn't finish within those 3 seconds, you would kill it anyway.

Actually, thinking about this more, maybe this is the perfect time, as we can put it in 2.0, where a change in behavior is more reasonable. Thoughts?

@tgravescs
Contributor

Also note that System.exit still goes through the normal JVM shutdown hooks, etc.
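That point can be verified with a small standalone sketch (illustrative, not Spark code): the hook registered below runs whether the JVM exits normally or via System.exit, while a still-running non-daemon thread is simply terminated without cleanup.

```java
public class ShutdownHookDemo {
    public static void main(String[] args) {
        // Shutdown hooks run on any orderly JVM exit, including System.exit
        // (but not on kill -9).
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                System.out.println("shutdown hook ran")));

        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // pretends to be lingering user work
            } catch (InterruptedException ignored) {
            }
        });
        worker.start(); // non-daemon: would block a normal JVM exit

        System.out.println("calling System.exit(0)");
        System.exit(0); // hook runs; worker is killed mid-sleep with no cleanup
    }
}
```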

@vanzin
Contributor

vanzin commented Jan 20, 2016

I'm not against this change; in fact, this is already the current behavior in healthy YARN installs (since YARN will kill your container). I just wanted the change description to be more descriptive of the underlying reason why the exit is being explicitly added.

I also don't buy the existence of any valid use case for a user thread to block container exit, exactly for the reason above.

@tgravescs
Contributor

@andrewor14 @srowen thoughts or concerns?

@srowen
Member

srowen commented Jan 20, 2016

I disagree with "At the point we call System.exit here all user code is done and we are terminating". It should ideally be so, but what happens when it isn't? A user shutdown hook could be the thing still executing (I think?), so I don't know that that's a solution.

It may be a common sin among fairly righteous apps, as I can imagine all kinds of third party libraries taking some short time to shutdown their pools and flush their whatever to disk. Uncommonly, something pathologically runs forever -- bad app. There's no way to distinguish them. Killing the JVM risks some bad end for the app's cleanup; not killing it risks a stuck JVM. A timeout doesn't solve it; the implicit timeout of 0 here is the most extreme choice in one direction. My gut is that it's better to avoid possibly harming several fairly innocent apps with this behavior change, understanding it means more manual work to chase down and kill the occasional errant bad app.

I'm OK being out-voted too, but given the discussion so far that remains my take.

@vanzin
Contributor

vanzin commented Jan 20, 2016

It should ideally be so, but, what happens when it isn't?

What happens is that the user code will die anyway because YARN will kill the container. The explicit exit here is to work around a bug in certain versions of YARN (and some broken cluster configurations, e.g., unhealthy NMs).

@srowen
Member

srowen commented Jan 20, 2016

Actually, what's the difference between letting main complete normally (in which case Java already exits with status 0, right?) and exiting explicitly at the end of main with status 0? You still get shutdown hooks, but System.exit kills the non-daemon threads?

I see you're saying it isn't really a behavior change. In the scenario here, the JVM would still be running because some non-daemon thread is alive, so how does YARN decide to forcibly kill it? Is it because the app has already told YARN it's done but hasn't completed?

There still seems to be a material difference between always immediately killing threads at the end, and maybe getting killed some short time later, since the former will always occur in normal operation. Could we emulate the delay as well as better-than-nothing?

@vanzin
Contributor

vanzin commented Jan 20, 2016

You still get shutdown hooks but System.exit kills the non-daemon threads?

You get shutdown hooks when the JVM exits (no matter how, except kill -9, of course). The explicit System.exit is to kill non-daemon threads.

it's because the app has already told YARN it's done but hasn't completed?

Precisely.

Could we emulate the delay as well as better-than-nothing?

Why? What would that really help with? Any user application relying on "container stays alive after YARN application is deemed finished" is doing something wrong, and cannot assume anything about how long containers will live after the app is finished.

I still haven't seen a single, valid use case for not killing these non-daemon threads.

@srowen
Member

srowen commented Jan 21, 2016

I'm not suggesting that behavior is correct, but equally you could say an app that never terminates is doing something wrong, or a YARN that doesn't stop it is doing something wrong. The lesser-of-two-evils 'use case' is as I say above: some app's cleanup fails, not infrequently leaving an inconsistent state, because it was immediately halted; that has its cost, as does letting an app get stuck. Simply: which is worse?

@vanzin
Contributor

vanzin commented Jan 21, 2016

a YARN that doesn't stop it is doing something wrong

But that's the case. It's a bug in YARN. See https://issues.apache.org/jira/browse/HADOOP-12441.

some app's cleanup fails not infrequently in an inconsistent state since it was immediately halted

If the app fails in that way, it is broken. The cleanup should be done before the driver's main() returns; otherwise, the app has relinquished control back to Spark / the cluster manager, and anything can happen.

@tgravescs
Contributor

So, I haven't seen any use cases this would break.

I would argue that if they are relying on this behavior, it's a bug in the user code. They should be using a shutdown hook or similar, and that still works with System.exit. If they are doing this now, they are getting nondeterministic results, because YARN and other cluster managers could shoot them at any time.

Personally I think inconsistent behavior is worse as it is much harder to debug.

@srowen
Member

srowen commented Jan 21, 2016

I don't think this is quite addressing my point, but at best I'm asserting there's a choice between which "bad" behavior you want to deal with, not that either behavior is "OK". Exiting immediately might harm apps that are unintentionally bad (e.g. third party libs flushing); then again tackling stuck apps is protecting the cluster. I'd just reassert the above, but this is a matter of judgment and being outvoted seems OK to me.

@vanzin
Contributor

vanzin commented Jan 21, 2016

Sorry Sean, I don't see how we're not addressing your comment. Without the change, the behavior you're concerned about already exists, because YARN kills containers when that bug is not present. It's, for all practical effects, exactly the same as calling System.exit explicitly.

@srowen
Member

srowen commented Jan 21, 2016

Sure, but it becomes much more likely to bite if you always kill the threads immediately, while they're still running, rather than have them killed a little bit later by YARN. Are you saying YARN immediately kills the container here? Then in practice I agree. Otherwise this seems like a coherent concern; you may not think it's worth worrying about, but it's not hard to grok.

@vanzin
Contributor

vanzin commented Jan 21, 2016

Are you saying YARN immediately kills the container here?

I'm saying that it's the user's fault if his application depends on that non-predictable behavior. We're talking milliseconds here, that might be affected by everything from YARN's internal code to network delays to kernel scheduling. There's absolutely no argument for someone depending on that for the correct behavior of their application.

@tgravescs
Contributor

@zhuoliu please re-open this.

@zhuoliu
Author

zhuoliu commented Jan 21, 2016

Now reopen.

@zhuoliu zhuoliu reopened this Jan 21, 2016
@srowen
Member

srowen commented Jan 21, 2016

I'm on board with this if it's true that YARN virtually immediately kills a JVM like this if it's not done by the time the NM thinks it's done. Then indeed regardless of what it does to an app that's tardy in cleaning up, it's not a materially different behavior. If it helps handle another bad app behavior, there's no downside to doing that.

@tgravescs
Contributor

+1. I'll give this a bit for others to look at and make sure we are done discussing.

@vanzin
Contributor

vanzin commented Jan 21, 2016

I still would like to see a better commit message (the link to JIRA is implied by the title so is redundant in a commit message).

@zhuoliu
Author

zhuoliu commented Jan 21, 2016

Hi @vanzin , do we want to amend the commit message to something like this?
"Call system.exit explicitly to make sure non-daemon user threads terminate.
Without this, user applications might live forever if the cluster manager does not appropriately kill them. E.g., YARN had this bug: HADOOP-12441."

@vanzin
Contributor

vanzin commented Jan 21, 2016

@zhuoliu that sounds great. Just edit your very first comment (that's the commit message).

@zhuoliu
Author

zhuoliu commented Jan 21, 2016

Thanks @vanzin , commit message updated.

@vanzin
Contributor

vanzin commented Jan 21, 2016

Huh, no. You have to edit the very first comment on this page, not fix the commit message on your github branch.

@zhuoliu
Author

zhuoliu commented Jan 21, 2016

Sorry for that. Just updated the first comment and changed the commit message back to original.

@vanzin
Contributor

vanzin commented Jan 21, 2016

Great, thanks! LGTM.

@SparkQA

SparkQA commented Jan 22, 2016

Test build #49891 has finished for PR 9946 at commit a3c5064.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in ae0309a Jan 26, 2016
@yhuai
Contributor

yhuai commented Jan 26, 2016

Can you make a comment to give a summary of the discussion? I am not sure that the commit message has enough info.

@zhuoliu
Author

zhuoliu commented Jan 26, 2016

Sure.
A brief summary (a few conclusive points picked from above):

"At the point we call System.exit here all user code is done and we are terminating. If there is something you know of that the user should be allowed to do after this then we should create something like a shutdown hook so the user can properly clean it up.
A timeout would probably work, but I'm not fond of it: it is going to delay shutdown for everyone and you don't give the resources back as quickly. You could also end up in the same situation where, if you choose a timeout of, say, 3 seconds, and this theoretical user code didn't finish within those 3 seconds, you would kill it anyway."

"I also don't buy the existence of any valid use case for a user thread to block container exit."

"I'm saying that it's the user's fault if his application depends on that non-predictable behavior. We're talking milliseconds here, that might be affected by everything from YARN's internal code to network delays to kernel scheduling. There's absolutely no argument for someone depending on that for the correct behavior of their application."

"I'm on board with this if it's true that YARN virtually immediately kills a JVM like this if it's not done by the time the NM thinks it's done. Then indeed regardless of what it does to an app that's tardy in cleaning up, it's not a materially different behavior. If it helps handle another bad app behavior, there's no downside to doing that."
