Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZEPPELIN-1984] Zeppelin Server doesn't catch all exceptions when launching interpreter.sh #1921

Closed
wants to merge 1 commit into from

Conversation

Tagar
Copy link
Contributor

@Tagar Tagar commented Jan 20, 2017

https://issues.apache.org/jira/browse/ZEPPELIN-1984

DEBUG [2017-01-20 13:56:37,688] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:189) - /opt/zeppelin/zeppelin-active/bin/interpreter.sh: line 207: return: can only `return' from a function or sourced script
INFO [2017-01-20 13:56:37,690] ({Exec Default Executor} RemoteInterpreterManagedProcess.java[onProcessComplete]:164) - Interpreter process exited 0

So return 1 outside of function is not correct and gets ignored by shell interpreters, also it causes
Zeppelin to not catch situations when interpreter hasn't started, as shown in ZEPPELIN-1984.
As you can see Zeppelin got exit status of interpreter.sh as 0 when it had to be 1 (error). Zeppelin then starts a loop to try to connect interpreter process, and fails half a minute later with connection refused

What is this PR for?

Fix for https://issues.apache.org/jira/browse/ZEPPELIN-1984

What type of PR is it?

Bug Fix

Todos

  • - Task

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-1984

How should this be tested?

On example of Spark interpreter, remove keytab file and keep --keytab reference to it in SPARK_SUBMIT_OPTIONS. Try to use Spark interpreter. spark-submit will fail with Exception in thread "main" org.apache.spark.SparkException: Keytab file: /home/someuser/.kt does not exist but Zeppelin (around RemoteInterpreterManagedProcess.java ) wouldn't capture this problem (or many other problems starting interpreter process as indicated here )

This fix for ZEPPELIN-1984 would allow for error to be captured by RemoteInterpreterManagedProcess.java.

A future improvement could be made, to carry forward exception from interpreter to RemoteInterpreterManagedProcess.java so end user could clearly see what's the problem, not just that the interpreter could not be started.

Questions:

  • Does the licenses files need update?
    No
  • Is there breaking changes for older versions?
    No
  • Does this needs documentation?
    No

@Tagar
Copy link
Contributor Author

Tagar commented Jan 20, 2017

I've tested Zeppelin with this one-line 42b02bd
commit change and now RemoteInterpreterManagedProcess.java correctly gets return code as 1, notice Process exited with an error: 1

INFO [2017-01-20 15:31:06,710] ({Exec Default Executor} RemoteInterpreterManagedProcess.java[onProcessFailed]:171) - Interpreter process failed {}
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
at java.lang.Thread.run(Thread.java:745)
DEBUG [2017-01-20 15:31:07,070] ({pool-2-thread-2} RemoteInterpreterUtils.java[checkIfRemoteEndpointAccessible]:53) - Remote endpoint 'localhost:3044' is not accessible (might be initializing): Connection refuse
d
DEBUG [2017-01-20 15:31:07,570] ({pool-2-thread-2} RemoteInterpreterUtils.java[checkIfRemoteEndpointAccessible]:53) - Remote endpoint 'localhost:3044' is not accessible (might be initializing): Connection refuse
d
... (connect attempts loop continues until timeout reached)

but still you see that RemoteInterpreterManagedProcess.java is still falling into connect attempts loop. So RemoteInterpreterManagedProcess.java ignores return code 1 (signaled from interpreter.sh as exit code that interpreter hasn't started). Will need somebody's help to add this logic too.

Another possible improvement is that RemoteInterpreterManagedProcess could also catch root cause from logger, as you can see it does show up in the log:

DEBUG [2017-01-20 15:31:06,685] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:189) - Exception in thread "main" org.apache.spark.SparkException: Keytab file: /home/someuser/.kt does not
exist

So then end-users wouldn't have to enable debugging to understand why interpreter isn't starting.

@Leemoonsoo
Copy link
Member

Change LGTM. Recently there were some fixes for CI.
@Tagar Could you rebase this PR and see if CI build goes green?

And i implemented propagating actual cause to front-end at #1931. @Tagar Could you try if it shows proper error on the notebook in your case?

https://issues.apache.org/jira/browse/ZEPPELIN-1984

{quote}
DEBUG [2017-01-20 13:56:37,688] ({Exec Stream Pumper} RemoteInterpreterManagedProcess.java[processLine]:189) - /opt/zeppelin/zeppelin-active/bin/interpreter.sh: line 207: return: can only `return' from a function or sourced script
 INFO [2017-01-20 13:56:37,690] ({Exec Default Executor} RemoteInterpreterManagedProcess.java[onProcessComplete]:164) - Interpreter process exited 0
{quote}

So `return 0` outside of function is not correct and gets ignored by shell interpreters, also it causes 
Zeppelin to not catch situations when interpreter hasn't started, as shown in ZEPPELIN-1984.
@Tagar
Copy link
Contributor Author

Tagar commented Jan 23, 2017

@Leemoonsoo I rebased this PR and it went green. Thanks!

@Leemoonsoo
Copy link
Member

@Tagar thanks, merge to master if no further discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants