[ZEPPELIN-2502] RemoteInterpreterServer hang forever during shutdown #2322

andreaTP · 2017-05-04T15:57:07Z

What is this PR for?

A RemoteInterpreterServer can hang forever during shutdown.

What type of PR is it?

[Bug Fix]

What is the Jira issue?

[ZEPPELIN-2502]

How should this be tested?

Unit test provided for the fix.

Questions:

Are there breaking changes for older versions?
Does this need documentation?

felixcheung

looks like pollEvent() is also under synchronized (eventQueue) - doesn't that mean while it waits, nothing can remove items from the eventQueue?

andreaTP · 2017-05-05T06:54:30Z

@felixcheung good catch. After a second look I can see that while it is waiting in pollEvent it will also delay sendEvent, since that is synchronized too.

I guess all these problems are already solved by the Java standard library. If I get positive feedback I will replace eventQueue with a LinkedBlockingQueue and remove all the synchronized statements, since they become unnecessary.
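A minimal sketch of the LinkedBlockingQueue idea, not the code from this PR. The class name EventQueueSketch, the String event type, and the method signatures are assumptions made for illustration; only the names eventQueue, sendEvent and pollEvent come from the discussion above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the event client; not the Zeppelin class.
class EventQueueSketch {
  // The queue does its own locking, so no synchronized blocks are needed.
  private final BlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();

  // Producer side: offer() does not block on an unbounded LinkedBlockingQueue.
  void sendEvent(String event) {
    eventQueue.offer(event);
  }

  // Consumer side: waits at most timeoutMillis; returns null on timeout or interrupt.
  String pollEvent(long timeoutMillis) {
    try {
      return eventQueue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to callers
      return null;
    }
  }

  public static void main(String[] args) {
    EventQueueSketch client = new EventQueueSketch();
    client.sendEvent("paragraph-output");
    System.out.println(client.pollEvent(100)); // the queued event
    System.out.println(client.pollEvent(100)); // nothing queued, so this times out
  }
}
```

Because a consumer blocked in poll() does not hold a lock that producers need, sendEvent is not delayed while pollEvent waits.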

felixcheung · 2017-05-07T01:26:19Z

I think you are getting a 👍

FireArrow · 2017-05-08T07:21:02Z

...reter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterEventClient.java

@@ -477,15 +477,18 @@ public void onParaInfosReceived(Map<String, String> infos) {
  /**
   * Wait for eventQueue becomes empty
   */
-  public void waitForEventQueueBecomesEmpty() {
+  public void waitForEventQueueBecomesEmpty(long atMost) {


I think it would be nice if atMost indicated what time unit it wants. Rename it to atMostMilliseconds?

OK, I will harmonize this while addressing your comments below (deadline, etc.).
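For illustration, a bounded wait with the unit spelled out in the parameter name could look roughly like the sketch below. The method body, the boolean return value, and the helper class BoundedWaitSketch are assumptions; the diff above shows only the changed signature.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch; not the body of the Zeppelin method.
class BoundedWaitSketch {
  private final Queue<String> eventQueue = new ArrayDeque<>();

  void add(String event) {
    synchronized (eventQueue) {
      eventQueue.add(event);
      eventQueue.notifyAll();
    }
  }

  String remove() {
    synchronized (eventQueue) {
      String event = eventQueue.poll();
      eventQueue.notifyAll(); // wake up anyone waiting for the queue to drain
      return event;
    }
  }

  // Returns true if the queue drained in time, false on timeout or interrupt.
  boolean waitForEventQueueBecomesEmpty(long atMostMilliseconds) {
    long deadline = System.currentTimeMillis() + atMostMilliseconds;
    synchronized (eventQueue) {
      while (!eventQueue.isEmpty()) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false; // give up instead of waiting forever
        }
        try {
          eventQueue.wait(remaining); // wait() releases the monitor while blocked
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
      return true;
    }
  }
}
```

The loop recomputes the remaining time after every wake-up, so spurious wake-ups cannot extend the total wait beyond atMostMilliseconds.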

FireArrow · 2017-05-08T07:28:43Z

...nterpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterServer.java

@@ -111,7 +113,8 @@ public void shutdown() throws TException {
    // this case, need to force kill the process

    long startTime = System.currentTimeMillis();
-    while (System.currentTimeMillis() - startTime < 2000 && server.isServing()) {
+    while (System.currentTimeMillis() - startTime < DEFAULT_SHUTDOWN_TIMEOUT &&


While not really within the scope of this PR, this is a chance to rework the logic to remove some unnecessary arithmetic and improve readability a bit.
Change long startTime = System.currentTimeMillis(); to long deadline = System.currentTimeMillis() + DEFAULT_SHUTDOWN_TIMEOUT;
Change while (System.currentTimeMillis() - startTime < DEFAULT_SHUTDOWN_TIMEOUT && to while (System.currentTimeMillis() < deadline &&
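The suggested deadline form can be sketched in isolation as follows. The class DeadlineLoopSketch, the BooleanSupplier standing in for server.isServing(), and the 10 ms polling interval are assumptions for illustration only.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the deadline-based loop; not the Zeppelin shutdown code.
class DeadlineLoopSketch {

  // Polls until isServing reports false or the deadline passes.
  // Returns true if serving had stopped by the time the loop ended.
  static boolean awaitStopped(BooleanSupplier isServing, long timeoutMillis) {
    // Compute the deadline once; the loop condition then needs no subtraction.
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline && isServing.getAsBoolean()) {
      try {
        Thread.sleep(10); // assumed polling interval
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return !isServing.getAsBoolean();
  }

  public static void main(String[] args) {
    // A "server" that never stops: the loop gives up once the deadline passes.
    System.out.println(awaitStopped(() -> true, 200));
  }
}
```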

andreaTP · 2017-05-08T15:44:59Z

Looking at the code, the pattern used here, here and here also looks quite harmful.

If anything goes wrong, the thread can stay in a waiting state potentially forever, and the operations performed there are anything but transactional.

Could somebody at least add some kind of deadline for those as well?
Better still would be a small refactoring using something like ConcurrentHashMap for those operations.

I can even do the refactoring if I get positive feedback. In any case, I believe this PR is OK as is, and I will open a new issue and a new PR if needed.
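As a rough illustration of what such a refactoring might look like, the sketch below keeps pending responses in a ConcurrentHashMap and waits on each one with a deadline. The code behind the "here" links is not shown in this thread, so everything here is an assumption: the class PendingResponsesSketch, the String request IDs and responses, and the use of CompletableFuture (which requires Java 8 or later).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; does not mirror any existing Zeppelin class.
class PendingResponsesSketch {
  private final ConcurrentMap<String, CompletableFuture<String>> pending =
      new ConcurrentHashMap<>();

  // Called by the thread that delivers a response.
  void complete(String requestId, String response) {
    pending.computeIfAbsent(requestId, id -> new CompletableFuture<>())
        .complete(response);
  }

  // Called by the waiting thread; returns null on timeout, failure, or interrupt.
  String awaitResponse(String requestId, long timeoutMillis) {
    CompletableFuture<String> future =
        pending.computeIfAbsent(requestId, id -> new CompletableFuture<>());
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException | ExecutionException e) {
      return null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    } finally {
      pending.remove(requestId, future); // remove only this exact entry
    }
  }
}
```

Each map operation is atomic on its own, and the waiting side always returns after at most timeoutMillis instead of blocking indefinitely.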

FireArrow · 2017-05-09T11:45:29Z

I'm all for refactoring the use of synchronized into more appropriate data types instead. It's too easy to get stuck in a [live|dead]lock when doing the synchronization yourself.

andreaTP · 2017-05-09T14:41:22Z

I had to fix some escaping problems (I could reproduce the problem locally by hand) to make Travis happy, and it also required some retries.
Anyhow, Travis is now happy: https://travis-ci.org/nokia/zeppelin/builds/230331180

I hope this PR is now fully addressed and we can move on.
I am still waiting for hints from @felixcheung and @Leemoonsoo on how to proceed with the rest.

thanks in advance

Leemoonsoo · 2017-05-09T14:46:27Z

LGTM

Looking at the code it looks quite harmful also the pattern used here, here and here

Further refactoring of the code you pointed out sounds good as well.

andreaTP · 2017-05-11T12:17:46Z

am I supposed to do anything else?

Leemoonsoo · 2017-05-14T21:23:42Z

@andreaTP Are the changes to ZeppelinIT.java required? Also, it would be great if this branch were rebased on (or merged with) master so we can see CI turn green.

andreaTP · 2017-05-14T21:29:19Z

@Leemoonsoo I can reproduce the error locally when running on Chrome (i.e. the current encoding of the test doesn't work for me out of the box with Chrome). Should I move these modifications into a separate PR?

Leemoonsoo · 2017-05-14T21:32:29Z

@andreaTP Yes, a separate PR would be more helpful.

andreaTP · 2017-05-14T21:35:20Z

@Leemoonsoo sounds good :-) I'll do that tomorrow!
and I will follow up with this PR. Thanks!

Leemoonsoo · 2017-05-14T21:39:39Z

Thanks @andreaTP !

andreaTP · 2017-05-16T10:44:15Z

I reverted the modifications to concurrency management, since they appear to cause UI response instability. I'm not sure why this happens, but I believe we can conclude with a clean PR like this.
If I manage to refactor remote to use proper java.util.concurrent types, I will send another PR.
WDYT? @Leemoonsoo @felixcheung

andreaTP · 2017-05-17T13:42:20Z

ping

Leemoonsoo · 2017-05-19T19:47:31Z

Will merge to master and branch-0.7 if there are no further comments.

### What is this PR for?
There is the chance to have a RemoteServerInterpreter hang forever during shutdown

### What type of PR is it?
[Bug Fix]

### What is the Jira issue?
[ZEPPELIN-2502]

### How should this be tested?
Unit test provided for the fix.

### Questions:
* Is there breaking changes for older versions?
* Does this needs documentation?

Author: andrea <andrea.peruffo1982@gmail.com>

Closes #2322 from andreaTP/processHang and squashes the following commits:

e58483e [andrea] [ZEPPELIN-2502] RemoteInterpreterServer hang forever during shutdown

(cherry picked from commit c7c9aa1)
Signed-off-by: Lee moon soo <moon@apache.org>

felixcheung reviewed May 5, 2017

FireArrow reviewed May 8, 2017

andreaTP force-pushed the processHang branch from 19b99f1 to 5c12e1d Compare May 9, 2017 06:53

[ZEPPELIN-2502] RemoteInterpreterServer hang forever during shutdown

e58483e

andreaTP force-pushed the processHang branch 2 times, most recently from c9e2a30 to e58483e Compare May 16, 2017 09:07

asfgit closed this in c7c9aa1 May 20, 2017
