[ZEPPELIN-2075] Can't stop infinite while statement in pyspark Interpreter. #1985
Thanks for the great contribution. I left a few comments; please take a look.
And would it be too difficult to add a unit test?
```java
@Override
public void cancel(InterpreterContext context) {
  SparkInterpreter sparkInterpreter = getSparkInterpreter();
  sparkInterpreter.cancel(context);
  try {
    interrupt();
```
I think:
- If a Spark job is running, we need to wait for `sparkInterpreter.cancel(context)` and shouldn't call `interrupt()`.
- If no Spark job is running, then `interrupt()` needs to be called.

What do you think?
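The two cases above could be expressed as a small decision function. This is a hypothetical sketch, not code from the PR: it assumes the interpreter has some way to tell whether a Spark job is active (the `sparkJobRunning` flag), and the two `Runnable`s merely stand in for `sparkInterpreter.cancel(context)` and `interrupt()`.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of the cancel policy proposed in the review:
 * cancel Spark jobs when one is running, otherwise interrupt the Python process.
 */
final class CancelPolicy {

    /** Which path the policy took; returned so callers and tests can observe it. */
    enum Path { SPARK_CANCEL, PYTHON_INTERRUPT }

    private CancelPolicy() {}

    static Path cancel(boolean sparkJobRunning,
                       Runnable cancelSparkJobs,    // stands in for sparkInterpreter.cancel(context)
                       Runnable interruptPython) {  // stands in for interrupt() (SIGINT to Python)
        Objects.requireNonNull(cancelSparkJobs, "cancelSparkJobs");
        Objects.requireNonNull(interruptPython, "interruptPython");
        if (sparkJobRunning) {
            // A Spark job is active: let Spark cancel it and do not signal Python.
            cancelSparkJobs.run();
            return Path.SPARK_CANCEL;
        }
        // No Spark job: the paragraph is busy in plain Python (e.g. a while loop),
        // so only an interrupt to the Python process can stop it.
        interruptPython.run();
        return Path.PYTHON_INTERRUPT;
    }
}
```

Returning the chosen path only makes the policy observable in a test; how `sparkJobRunning` would actually be determined is left open here.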
```java
try {
  interrupt();
} catch (IOException e) {
  e.printStackTrace();
```
Shall we use the logger to print this instead?
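A minimal sketch of the suggestion. It uses `java.util.logging` as a stand-in for the interpreter's own `logger` (seen in the later snippets), and `QuietInterrupt` and `IoAction` are hypothetical names; the point is only that the exception is handed to the logger instead of being printed to stderr.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: run an I/O action and log failures instead of printing them. */
final class QuietInterrupt {

    /** An action that may throw IOException, e.g. the interpreter's interrupt(). */
    @FunctionalInterface
    interface IoAction {
        void run() throws IOException;
    }

    private QuietInterrupt() {}

    /** Returns true if the action succeeded; on IOException logs a warning with the cause. */
    static boolean run(IoAction action, Logger log) {
        try {
            action.run();
            return true;
        } catch (IOException e) {
            // Passing the exception keeps the stack trace in the log, whereas
            // e.printStackTrace() writes to stderr outside the logging configuration.
            log.log(Level.WARNING, "Failed to interrupt the Python process", e);
            return false;
        }
    }
}
```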
```python
signal.signal(signal.SIGINT, handler_stop_signals)
```
I think it works without this line and the `handler_stop_signals()` function.
You're right. It seems that Spark already has a signal handler.
Thanks.
@astroshim Can you take a look at the CI error?
@astroshim This would also be a problem for a Scala snippet (in other words, a loop with a print that runs only within the driver). Is there a possible fix from which all interpreters (Scala, Python) can benefit?
```java
  Runtime.getRuntime().exec("kill -SIGINT " + pythonPid);
} else {
  logger.warn("Non UNIX/Linux system, close the interpreter");
  close();
```
Doesn't this leave the interpreter in a bad state?
Yes, you're right, but would an `ERROR` status be a problem?
```java
  logger.info("Sending SIGINT signal to PID : " + pythonPid);
  Runtime.getRuntime().exec("kill -SIGINT " + pythonPid);
} else {
  logger.warn("Non UNIX/Linux system, close the interpreter");
```
I'm not sure: why would it be non-Unix if `pythonPid == -1`?
That's because creating a process on Windows creates a pseudo-process (a thread), and its PID is a negative number.
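The guard discussed in this thread could be isolated into a pure function. This is a sketch under assumptions, not the PR's code: `InterruptCommand.forPid` is a hypothetical helper, the OS is identified from a name such as the value of `System.getProperty("os.name")`, and a `null` result means the caller takes the fallback path (closing the interpreter, as in the snippet above). Building an argument list for `ProcessBuilder` instead of passing one string to `Runtime.exec(String)` is a design choice of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: decide whether and how to send SIGINT to the Python process. */
final class InterruptCommand {

    private InterruptCommand() {}

    /**
     * Returns the command that would interrupt the Python process, or null when no
     * signal should be sent (PID unknown or not positive, or a Windows OS name).
     * osName must be non-null.
     */
    static List<String> forPid(long pythonPid, String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        // -1 is the "not yet known" value. PIDs <= 0 must never reach kill: for kill(2),
        // 0 targets the caller's process group and -1 every process the caller may signal.
        if (pythonPid <= 0 || windows) {
            return null;
        }
        // Separate arguments, suitable for new ProcessBuilder(cmd).
        return Arrays.asList("kill", "-SIGINT", Long.toString(pythonPid));
    }
}
```

On a Unix-like system a caller might then run `new ProcessBuilder(cmd).start().waitFor()` whenever `cmd` is non-null.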
Hello. Sorry for the late response.
```java
} catch (InterruptedException e) {
  e.printStackTrace();
}
pySparkInterpreter.cancel(context);
```
Don't we need something that checks whether the thread has actually finished? I think `thread.join()` is required here; otherwise the test will not fail even if the job is not cancelled.
Thank you for pointing that out. Let me fix it.
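The reviewer's point is that a cancel test must wait for the paragraph thread and then assert that it ended; otherwise `cancel()` could silently do nothing and the test would still pass. A self-contained sketch of that pattern, in which a plain Java loop and `Thread.interrupt()` are assumed stand-ins for the Python `while True` loop and `pySparkInterpreter.cancel(context)` (this is not the PR's actual test, and `InfiniteJob` and `CancelTestSketch` are hypothetical names):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a paragraph running "while True: time.sleep(1)". */
final class InfiniteJob implements Runnable {
    final CountDownLatch started = new CountDownLatch(1);
    volatile boolean interrupted;

    @Override
    public void run() {
        started.countDown();
        try {
            while (true) {
                Thread.sleep(1000);   // like time.sleep(1) in the PR description
            }
        } catch (InterruptedException e) {
            // Plays the role of SIGINT raising KeyboardInterrupt in the Python loop.
            interrupted = true;
        }
    }
}

final class CancelTestSketch {

    private CancelTestSketch() {}

    /** Starts the job, optionally cancels it, and reports whether it really stopped in time. */
    static boolean jobStops(boolean sendCancel, long timeoutMillis) {
        InfiniteJob job = new InfiniteJob();
        Thread paragraph = new Thread(job, "paragraph");
        paragraph.setDaemon(true);  // a job that is never cancelled must not keep the JVM alive
        paragraph.start();
        try {
            if (!job.started.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
            if (sendCancel) {
                paragraph.interrupt();  // stands in for pySparkInterpreter.cancel(context)
            }
            // The join is the check the reviewer asked for: without it, isAlive() below
            // would race with the job and the test could pass without a real cancel.
            paragraph.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !paragraph.isAlive() && job.interrupted;
    }
}
```

Calling `jobStops(false, …)` shows why the join matters: the never-cancelled job is still alive after the timeout, so the check correctly reports failure.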
The CI test failure is unrelated (fixed by #2033). LGTM; I'll merge to master if there is no further discussion.
@astroshim It looks like the Python process will be killed. Does that mean the interpreter is closed and the state of all previous paragraphs is lost? I tried the cancel function in Jupyter: it seems Jupyter just cancels the current cell but keeps the Python process alive, so the previous cells' state is still in the Python process. Maybe there's a better approach for this.
Sorry, my misunderstanding. It just interrupts the process rather than killing it, which is consistent with Jupyter's behavior. It's awesome.
@zjffdu Yes, this PR does not kill the Python process.
Sorry for the late comment.
@FireArrow A note has permissions in Zeppelin, so if a user doesn't have …
@astroshim That's not what I'm talking about. Assume the following: … With this setup, interpreters run as the user starting the process, and the zeppelin user will not have permission to send any signal to them.
@FireArrow Yes, right, I understand. Let me fix it.
@FireArrow It doesn't seem possible to know on the interpreter side whether impersonation mode is enabled (I couldn't find a way), and checking …
I have basically only set up Zeppelin in isolated mode with impersonation, so I'm not sure what the "expected" behavior is outside of that. Someone else will have to pitch in here.
@Leemoonsoo @zjffdu @felixcheung @karuppayya Could you help review the impersonation mode behavior?
Even in impersonation mode, it is the process owner that interrupts the process: here the interpreter process owner, not the Zeppelin server owner, interrupts the Python process. But I cannot verify this in impersonation mode, which seems to be broken. I got the following error; can anyone else verify impersonation mode?
@zjffdu I tested impersonation mode following http://zeppelin.apache.org/docs/0.7.0/manual/userimpersonation.html and it worked well.
I think @FireArrow's concern is whether the zeppelin user can signal the process. Actually, it is not the zeppelin user that sends the signal but the interpreter process owner, so I don't think there's any permission issue here. @FireArrow Please help confirm.
@FireArrow Ping.
Terribly sorry for the late answer! Not sure why I didn't get any notifications about your answers 😕
@FireArrow Thank you for confirming your idea, and thanks @zjffdu for helping review!
Will merge it.
…rpreter.

### What is this PR for?
If the following code runs in the PySpark Interpreter, there is no way to cancel it except by restarting the Zeppelin server.

```
%spark.pyspark
import time

while True:
    time.sleep(1)
    print("running..")
```

### What type of PR is it?
Bug Fix | Improvement

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-2075

### How should this be tested?
Run the above code in the PySpark Interpreter and try to cancel it.

### Screenshots (if appropriate)
- before
![pyspark before](https://cloud.githubusercontent.com/assets/3348133/22696141/615c1206-ed90-11e6-9bbb-339ecdec73fc.gif)
- after
![pyspark after](https://cloud.githubusercontent.com/assets/3348133/22696168/70899172-ed90-11e6-99e1-342eb4094b2c.gif)

### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No

Author: astroshim <hsshim@zepl.com>

Closes #1985 from astroshim/ZEPPELIN-2075 and squashes the following commits:

84bf09a [astroshim] fix testcase
bc12eaa [astroshim] pass pid to java
b60d89a [astroshim] Merge branch 'master' into ZEPPELIN-2075
f26eacf [astroshim] add test-case for canceling.
c0cac4e [astroshim] fix logging
678c183 [astroshim] remove signal handler
65d8cc6 [astroshim] init python pid variable
6731e56 [astroshim] add signal to cancel job

(cherry picked from commit 9f22db9)
Signed-off-by: Jongyoul Lee <jongyoul@apache.org>