[ZEPPELIN-2502] RemoteInterpreterServer hang forever during shutdown #2322
Looks like pollEvent() is also under synchronized (eventQueue). Doesn't that mean that while it waits, nothing can remove items from the eventQueue?
@felixcheung good catch, after a second look I can see that while it is waiting on I guess all these problems are already solved by Java standard lib, if I have positive feedback I will go substituting
I think you are getting a 👍
@@ -477,15 +477,18 @@ public void onParaInfosReceived(Map<String, String> infos) {
   /**
    * Wait for eventQueue becomes empty
    */
-  public void waitForEventQueueBecomesEmpty() {
+  public void waitForEventQueueBecomesEmpty(long atMost) {
I think it would be nice if atMost indicated what time unit it wants. Rename it to atMostMilliseconds?
OK, I will harmonize this while addressing your comments below (deadline etc.).
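For readers following the thread, a minimal sketch of a bounded wait whose parameter name carries the time unit, as suggested above. It is not the code from this PR: the class BoundedQueueWaiter, its add and poll helpers, and the boolean return value are assumptions made purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for the event client; not Zeppelin's actual class.
final class BoundedQueueWaiter {
  private final Queue<String> eventQueue = new ArrayDeque<>();

  void add(String event) {
    synchronized (eventQueue) {
      eventQueue.add(event);
      eventQueue.notifyAll();
    }
  }

  String poll() {
    synchronized (eventQueue) {
      String event = eventQueue.poll();
      eventQueue.notifyAll(); // wake up anyone waiting for the queue to drain
      return event;
    }
  }

  /**
   * Waits until the queue is empty or the timeout elapses.
   * Returns true if the queue was empty on return, false on timeout or interrupt.
   * (Returning a boolean is an assumption; the method in the diff returns void.)
   */
  boolean waitForEventQueueBecomesEmpty(long atMostMilliseconds) {
    long deadline = System.currentTimeMillis() + atMostMilliseconds;
    synchronized (eventQueue) {
      while (!eventQueue.isEmpty()) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false; // deadline passed, give up instead of hanging
        }
        try {
          eventQueue.wait(remaining); // releases the monitor while waiting
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
      return true;
    }
  }
}
```

Because Object.wait releases the monitor while the thread is parked, other threads can still enter the synchronized blocks in add and poll during the wait.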
@@ -111,7 +113,8 @@ public void shutdown() throws TException {
       // this case, need to force kill the process

       long startTime = System.currentTimeMillis();
-      while (System.currentTimeMillis() - startTime < 2000 && server.isServing()) {
+      while (System.currentTimeMillis() - startTime < DEFAULT_SHUTDOWN_TIMEOUT &&
While not really part of the scope of this PR, this is a chance to rework the logic to remove some unnecessary arithmetic and improve readability a bit.
Change long startTime = System.currentTimeMillis(); to long deadline = System.currentTimeMillis() + DEFAULT_SHUTDOWN_TIMEOUT;
Change while (System.currentTimeMillis() - startTime < DEFAULT_SHUTDOWN_TIMEOUT && to while (System.currentTimeMillis() < deadline &&
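A small sketch of the deadline form suggested above, for illustration only. The class DeadlineLoop, the method awaitStopped, the BooleanSupplier standing in for server.isServing(), the 10 ms sleep, and the value 2000 for DEFAULT_SHUTDOWN_TIMEOUT are all assumptions; the diff shows only the constant's name replacing the literal 2000.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; the PR keeps this loop inline in shutdown().
final class DeadlineLoop {
  // Assumed value in milliseconds, mirroring the literal the diff replaces.
  static final long DEFAULT_SHUTDOWN_TIMEOUT = 2000;

  /**
   * Polls until isServing reports false or the timeout elapses.
   * Returns true if serving stopped, false if it is still serving.
   */
  static boolean awaitStopped(BooleanSupplier isServing, long timeoutMilliseconds) {
    // Compute the deadline once, so the loop condition is a plain comparison.
    long deadline = System.currentTimeMillis() + timeoutMilliseconds;
    while (System.currentTimeMillis() < deadline && isServing.getAsBoolean()) {
      try {
        Thread.sleep(10); // short pause between checks (assumed interval)
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return !isServing.getAsBoolean();
  }
}
```

A caller might then write something like `if (!DeadlineLoop.awaitStopped(server::isServing, DeadlineLoop.DEFAULT_SHUTDOWN_TIMEOUT)) { /* force kill */ }`, assuming server exposes isServing() as in the hunk above.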
ok
Looking at the code, the pattern used here, here and here also looks quite harmful: if anything goes wrong, the thread can stay in [...]. Could I ask if somebody can at least provide some kind of deadline for those as well? I can even do the refactoring if I get positive feedback. In any case, I believe this PR is OK as is, and I will open a new issue and a new PR if needed.
I'm all for refactoring the use of synchronized into using more proper data types instead. It's too easy to get stuck in a [live|dead]lock when doing the synchronization yourself.
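As a rough illustration of what "more proper data types" could look like, here is a sketch that swaps the hand-rolled synchronized/wait pattern for java.util.concurrent.LinkedBlockingQueue with a timed poll. The class EventQueueSketch and its method signatures are hypothetical and do not reflect Zeppelin's actual event client; this refactoring is not part of the merged PR.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical event queue built on a standard-library blocking queue.
final class EventQueueSketch {
  private final LinkedBlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();

  void add(String event) {
    eventQueue.add(event); // thread-safe, no explicit lock needed
  }

  /** Returns the next event, or null if none arrives within the timeout. */
  String pollEvent(long timeoutMilliseconds) {
    try {
      return eventQueue.poll(timeoutMilliseconds, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      return null;
    }
  }

  boolean isEmpty() {
    return eventQueue.isEmpty();
  }
}
```

With a timed poll the consumer cannot block forever, and producers never contend for a caller-managed monitor.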
I had to fix some escaping problems (I could reproduce the problem locally by hand) to make Travis happy, and it also required some retries. I hope this PR can be addressed so we can move on. Thanks in advance.
LGTM
Further refactoring of the code pointed out sounds good as well.
Am I supposed to do anything else?
@andreaTP Are the changes to ZeppelinIT.java required? Also, it would be great if this branch were rebased on (or merged with) master so we can see CI become green.
@Leemoonsoo I can reproduce the error locally when running on Chrome (i.e. the current encoding of the test doesn't work for me out of the box with Chrome). Should I move these modifications into a separate PR?
@andreaTP Yes, a separate PR would be more helpful.
@Leemoonsoo sounds good :-) I'll do that tomorrow!
Thanks @andreaTP !
c9e2a30 to e58483e
I reverted the modifications to concurrency management, since it looks like they result in UI response instability. I'm not sure why this happens, but I believe we can conclude with a clean PR like this.
ping
Merge to master and branch-0.7 if no further comment
### What is this PR for?
There is the chance to have a RemoteServerInterpreter hang forever during shutdown

### What type of PR is it?
[Bug Fix]

### What is the Jira issue?
[ZEPPELIN-2502]

### How should this be tested?
Unit test provided for the fix.

### Questions:
* Is there breaking changes for older versions?
* Does this needs documentation?

Author: andrea <andrea.peruffo1982@gmail.com>

Closes #2322 from andreaTP/processHang and squashes the following commits:

e58483e [andrea] [ZEPPELIN-2502] RemoteInterpreterServer hang forever during shutdown

(cherry picked from commit c7c9aa1)
Signed-off-by: Lee moon soo <moon@apache.org>
What is this PR for?
There is the chance to have a RemoteServerInterpreter hang forever during shutdown
What type of PR is it?
[Bug Fix]
What is the Jira issue?
[ZEPPELIN-2502]
How should this be tested?
Unit test provided for the fix.
Questions: