-
Notifications
You must be signed in to change notification settings - Fork 722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable JitDump support on JITServer #12439
Conversation
3606ca5
to
4025f13
Compare
@mpirvu ready for review.
While on the server the jitdump part looks like this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some inline questions/comments
Ideally, no compilation will completely fail at the client just because the server crashes. Therefore, the client should not empty its compilation queue forcefully failing compilations. Rather, after seeing that the JITServer is in a crashing mode, it should perform the compilations locally for some time, and only after a while retry the connection. |
I agree. I checked, and setting |
4025f13
to
6b3ff04
Compare
6b3ff04
to
129b09e
Compare
@mpirvu ready for another review. |
Actually, just found a bug, will mark it WIP while I fix it. |
129b09e
to
0491cf1
Compare
@mpirvu fixed a bug should be ready for another review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we prevent the client from continuing to send requests on sockets that are already open? Current changes modify CompilationInfo::requeueOutOfProcessEntry()
to return abruptly once the flag _disableFurtherCompilation
is true
. However, the socket is still open and the client could send another compilation request which will likely timeout, because there is nobody to serve it. The identity of the open socket is also lost.
Maybe it's better to leave CompilationInfo::requeueOutOfProcessEntry()
as is and at the point where requeueOutOfProcessEntry()
is called, if _disableFurtherCompilation
is true
, then do not requeue, but instead close the socket connection, something of this sort:
// Delete server stream
stream->~ServerStream();
TR::Compiler->persistentGlobalAllocator().deallocate(stream);
entry._stream = NULL;
0491cf1
to
8e3507a
Compare
I addressed the current comments, including the last one. I moved the checks for |
Another change that we should do is to move the setting of the disableFurtherCompilation flag into the critical section of compilation monitor, i.e. line https://github.com/eclipse/openj9/blob/master/runtime/compiler/control/JitDump.cpp#L319 moved at line https://github.com/eclipse/openj9/blob/master/runtime/compiler/control/JitDump.cpp#L336 For the same reason we should probably change the code of
|
bd64b1f
to
6144fae
Compare
@mpirvu ready for another review. Could you check if you can get a dump with AcmeAir now? It seems to work for me. |
6144fae
to
e6dc038
Compare
Linter says:
|
e6dc038
to
3d7dab0
Compare
Forgot an |
I ran liberty start-up and induced a crashed at the server. The JIT dump contains:
So there was some exception that prevented the dump from being generated. I cannot tell whether the client received the message to print the IL, but that did not happen. The client finished operating without a problem. Lines from the client:
It's possible that the default timeout at the client is too small (2 sec) and because during the coredump generation the server does not send anything to the client, the client just closes the connection. |
With a larger timeout at the client, the jitdump recompilation log is produced normally. |
Looking at the client, in my tests the diagnostic recompilation at the server succeeded, which is not unrealistic.
What I think may not be ok is |
Another example, where the diagnostic recompilation crashes due to a different cause. The jitDump is incomplete.
|
3d7dab0
to
5c500b3
Compare
@mpirvu pushed the latest changes. One thing I changed besides what we discussed is that the server will now ignore any CH table updates for a diagnostic recompilation, because it might have missed previous updates, which will lead to an assert triggered in |
Pushed the last updates as a new commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Before triggering jenkins testing could you please rebase to make sure we test the latest code? Thanks
Also, I think we could squash the 3 commits into one. |
This commit allows server to produce a JitDump log when a crash occurs during a compilation. Once the crash happens, the JitDump handler runs normally, producing an IL log of the crashed thread until it reaches the point when the problematic method needs to be recompiled. There, it sends the client a `compilationThreadCrashed` message, asking it to schedule a JitDump recompilation and blocks until the recompilation is done. When the client receives the error message, it temporarily stops all further compilations, purges the compilation queue and sends a request to the server to recompile the method as a JitDump compilation. The server will compile the method on the diagnostic thread, notify the waiting crashed thread, allowing the JitDump handler to finish and terminate the server. Once the diagnostic compilation is complete, the client will resume compilations locally until another server comes up. Closes: eclipse-openj9#12394 Signed-off-by: Dmitry Ten <Dmitry.Ten@ibm.com>
70980ef
to
0c24714
Compare
Squashed and rebased. |
jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit jdk11 |
This commit allows server to produce a JitDump log when a crash occurs
during a compilation.
Once the crash happens, the JitDump handler runs normally, producing
an IL log of the crashed thread until it reaches the point when
the problematic method needs to be recompiled. There, it sends
the client a
compilationThreadCrashed
message, asking it to schedulea JitDump recompilation and blocks until the recompilation is done.
When the client receives the error message, it temporarily stops all
further compilations, purges the compilation queue and sends a request
to the server to recompile the method as a JitDump compilation.
The server will compile the method on the diagnostic thread, notify
the waiting crashed thread, allowing the JitDump handler to finish and
terminate the server.
Once the diagnostic compilation is complete, the client will resume
compilations locally until another server comes up.
Closes: #12394