Improve client JVM disconnect messages #16394
The messages that a client JVM prints when the `-XX:+JITServerLogConnections` option is specified can sometimes be misleading. For instance, if a client starts but no server is available, the client prints a message saying that it lost connection to the server when, in fact, it was never connected to one. In other cases, one of the compilation threads may experience a transient error and the client issues a "Lost connection to the server" message, even though the other compilation threads continue to compile remotely just fine. This commit fixes the issues mentioned above. Issue: eclipse-openj9#16381 Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com>
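To make the two misleading cases concrete, here is a minimal Python sketch of the kind of state tracking that avoids them. It is not the OpenJ9 implementation: the class, method names, and all message strings other than "Lost connection to the server" are hypothetical. The assumption is that the client keeps one client-wide connection state and only reports a lost connection when a connect attempt fails after it had previously been connected.

```python
from enum import Enum


class ServerState(Enum):
    NEVER_CONNECTED = 1
    CONNECTED = 2
    DISCONNECTED = 3


class ConnectionLog:
    """Hypothetical client-wide tracker; not the actual OpenJ9 code."""

    def __init__(self):
        self.state = ServerState.NEVER_CONNECTED
        self.messages = []

    def on_connect_success(self):
        # Report only on a state change, not on every new connection.
        if self.state is not ServerState.CONNECTED:
            self.messages.append("Connected to a server")  # hypothetical wording
        self.state = ServerState.CONNECTED

    def on_connect_failure(self):
        if self.state is ServerState.CONNECTED:
            # Only here was a connection actually lost.
            self.messages.append("Lost connection to the server")
            self.state = ServerState.DISCONNECTED
        elif self.state is ServerState.NEVER_CONNECTED:
            self.messages.append("Could not connect to a server")  # hypothetical wording
        # DISCONNECTED: the loss was already reported, stay quiet.

    def on_stream_error(self, thread_id):
        # A transient error on one compilation thread does not change
        # the client-wide state; other threads may still compile remotely.
        self.messages.append(f"compThread {thread_id}: stream error")  # hypothetical wording
```

With this shape, a client that starts without a server never prints "Lost connection to the server", and a single thread's stream error does not print it either.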
Example of messages when the server dies while the client compiles remotely:
Note that after the read error the client closes its socket and retries the same compilation remotely. This is done because the networking error sometimes clears very quickly. However, in this example the server is dead, so the client fails to connect to the server and displays the "Lost connection" message. The compilation is then tried again, this time locally. |
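The retry-then-fall-back flow described above can be sketched roughly as follows. This is an illustrative Python outline, not the client's actual code: `compile_with_retry`, the two exception types, the callback parameters, and the limit of two remote attempts are all assumptions made for the example, and the sketch assumes the client had been connected before the error.

```python
class StreamError(Exception):
    """Read/write failure on an established connection (hypothetical type)."""


class ConnectError(Exception):
    """Failure to open a connection to the server (hypothetical type)."""


def compile_with_retry(method, connect, compile_remote, compile_local, log,
                       max_remote_attempts=2):
    """Try a remote compilation, retry after a stream error, else go local."""
    for _ in range(max_remote_attempts):
        try:
            conn = connect()
        except ConnectError:
            # The server cannot be reached at all: report it and stop retrying.
            log("Lost connection to the server")
            break
        try:
            return compile_remote(conn, method)
        except StreamError:
            # Possibly transient: close the socket and retry the same
            # compilation remotely on a fresh connection.
            conn.close()
    # Assumption: once remote attempts are exhausted, compile locally.
    return compile_local(method)
```

If the error was transient, the second remote attempt succeeds and no "Lost connection" message is logged; if the server is dead, the reconnect fails, the message is logged once, and the method is compiled locally.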
@dsouzai Could you please review/merge this PR? Thanks |
jenkins test sanity xlinuxjit jdk17 |
I had an experiment where the server threw an exception on purpose when a message with a certain seqNo is received. The client aborted the compilation, tried it again remotely, and the server finished successfully.
This behavior is better than the previous one, where the client would disconnect and try to reconnect after 2 seconds. |
5 failures: Again, a bunch of FSD mode failures and I am sure that this PR couldn't have caused them. |
10 grinders of each failed target show no problems whatsoever: https://openj9-jenkins.osuosl.org/job/Grinder/1570/ |
Quite possibly, though I don't understand why they show up so often nowadays in "jenkins test sanity". I used to do such testing over the summer/fall without hitting them. |
I haven't been recording the newest instances of #14704, but such failures do still occur in the nightly tests. I think the most recent test failure in one of the |
jenkins test sanity plinuxjit zlinuxjit jdk17 |
jenkins test sanity plinuxjit,zlinuxjit jdk17 |
Started grinder of 100 runs of |
It only ran once in grinder and then it hit an infra issue:
|
100 grinder runs for |
Tests on Z/P failed due to infra
|
jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit jdk17 |
Test on P and Z passed. Tests on x86 timed-out in
|
Since the errors showing on xlinux are not caused by this PR, I think that this PR can be merged. |