Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve client JVM disconnect messages #16394

Merged
merged 1 commit into from
Dec 5, 2022

Conversation

mpirvu
Copy link
Contributor

@mpirvu mpirvu commented Nov 30, 2022

The messages that are printed by a client JVM when the -XX:+JITServerLogConnections option is specified can sometimes be misleading. For instance, if a client starts, but there is no server available, the client will print a message saying that it lost connection to the server, where in fact, it was never connected to a server. In other cases there could be a transient error that one of the compilation thread experiences and the client will issue a "Lost connection to the server" message, but the other compilation threads continue to compile remotely just fine. This commit fixes the issues mentioned above.

Issue: #16381

Signed-off-by: Marius Pirvu mpirvu@ca.ibm.com

The messages that are printed by a client JVM when the `-XX:+JITServerLogConnections`
option is specified can sometimes be misleading. For instance, if a client starts,
but there is no server available, the client will print a message saying that it
lost connection to the server, where in fact, it was never connected to a server.
In other cases there could be a transient error that one of the compilation thread
experiences and the client will issue a "Lost connection to the server" message,
but the other compilation threads continue to compile remotely just fine.
This commit fixes the issues mentioned above.

Issue: eclipse-openj9#16381

Signed-off-by: Marius Pirvu <mpirvu@ca.ibm.com>
@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 30, 2022

Example of messages when the server dies while the client compiles remotely:

#JITServer: Client sending compReq seqNo=256 to server for method java/lang/invoke/MethodType.hashCode()I @ cold.
 (cold) Compiling java/lang/invoke/MethodTypeHelper.getBytecodeStringName(Ljava/lang/Class;)Ljava/lang/String;  OrdinaryMethod j9m=000000000008E420 remote t=1098 compThreadID=1 memLimit=262144 KB freePhysicalMemory=6165 MB
#JITServer: Client sending compReq seqNo=257 to server for method java/lang/invoke/MethodTypeHelper.getBytecodeStringName(Ljava/lang/Class;)Ljava/lang/String; @ cold.
#FAILURE:  JITServer::StreamFailure: JITServer I/O error: read error: connection closed by peer for java/lang/invoke/MethodTypeHelper.getBytecodeStringName(Ljava/lang/Class;)Ljava/lang/String; @ cold
#FAILURE:  JITServer::StreamFailure: JITServer I/O error: read error: connection closed by peer for java/lang/invoke/MethodType.hashCode()I @ cold
#JITServer: compThreadID=1 JITServer StreamFailure: Generic stream failure
#JITServer: compThreadID=0 JITServer StreamFailure: Generic stream failure
! (cold) java/lang/invoke/MethodTypeHelper.getBytecodeStringName(Ljava/lang/Class;)Ljava/lang/String; Q_SZ=67 Q_SZI=67 QW=462 j9m=000000000008E420 time=5923us compilationStreamFailure memLimit=262144 KB freePhysicalMemory=6208 MB mem=[region=128 system=2048]KB compThreadID=1
! (cold) java/lang/invoke/MethodType.hashCode()I Q_SZ=67 Q_SZI=67 QW=462 j9m=000000000007E628 time=6102us compilationStreamFailure memLimit=262144 KB freePhysicalMemory=6208 MB mem=[region=64 system=2048]KB compThreadID=0
 (cold) Compiling java/lang/invoke/MethodType.hashCode()I  OrdinaryMethod j9m=000000000007E628 remote t=1108 compThreadID=0 memLimit=262144 KB freePhysicalMemory=6208 MB
 (cold) Compiling java/lang/invoke/MethodTypeHelper.getBytecodeStringName(Ljava/lang/Class;)Ljava/lang/String;  OrdinaryMethod j9m=000000000008E420 remote t=1108 compThreadID=1 memLimit=262144 KB freePhysicalMemory=6208 MB
#FAILURE:  JITServer::StreamFailure: Connect failed: Connection refused for java/lang/invoke/MethodType.hashCode()I @ cold
#JITServer: t=  1108 Lost connection to the server (serverUID=13805847858514431150). Next attempt in 2000 ms.
#JITServer: t=  1108 Resetting activation policy to AGGRESSIVE because client has lost connection to server
#FAILURE:  JITServer::StreamFailure: Connect failed: Connection refused for java/lang/invoke/MethodTypeHelper.getBytecodeStringName(Ljava/lang/Class;)Ljava/lang/String; @ cold
#JITServer: t=  1108 Could not connect to a server. Next attempt in 2000 ms.
#JITServer: t=  1108 Resetting activation policy to AGGRESSIVE because client has lost connection to server
#JITServer: compThreadID=0 JITServer StreamFailure: Generic stream failure
! (cold) java/lang/invoke/MethodType.hashCode()I Q_SZ=67 Q_SZI=67 QW=462 j9m=000000000007E628 time=403us compilationStreamFailure memLimit=262144 KB freePhysicalMemory=6208 MB mem=[region=64 system=2048]KB compThreadID=0
#JITServer: compThreadID=1 JITServer StreamFailure: Generic stream failure
! (cold) java/lang/invoke/MethodTypeHelper.getBytecodeStringName(Ljava/lang/Class;)Ljava/lang/String; Q_SZ=67 Q_SZI=67 QW=462 j9m=000000000008E420 time=441us compilationStreamFailure memLimit=262144 KB freePhysicalMemory=6208 MB mem=[region=64 system=2048]KB compThreadID=1
 (cold) Compiling java/lang/invoke/MethodType.hashCode()I  OrdinaryMethod j9m=000000000007E628  t=1108 compThreadID=0 memLimit=262144 KB freePhysicalMemory=6208 MB
 (cold) Compiling java/lang/invoke/MethodTypeHelper.getBytecodeStringName(Ljava/lang/Class;)Ljava/lang/String;  OrdinaryMethod j9m=000000000008E420  t=1108 compThreadID=1 memLimit=262144 KB freePhysicalMemory=6208 MB
+ (cold) java/lang/invoke/MethodType.hashCode()I @ 00007FE194F29260-00007FE194F2963B OrdinaryMethod - Q_SZ=67 Q_SZI=67 QW=462 j9m=000000000007E628 bcsz=68 GCR time=3573us mem=[region=1472 system=2048]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=157% queueTime=1067706us
 (cold) Compiling java/lang/invoke/MethodType.intern()Ljava/lang/invoke/MethodType;  OrdinaryMethod j9m=000000000007E748  t=1108 compThreadID=0 memLimit=262144 KB freePhysicalMemory=6208 MB
+ (cold) java/lang/invoke/MethodTypeHelper.getBytecodeStringName(Ljava/lang/Class;)Ljava/lang/String; @ 00007FE194F29680-00007FE194F29B24 OrdinaryMethod - Q_SZ=66 Q_SZI=66 QW=450 j9m=000000000008E420 bcsz=181 GCR time=4588us mem=[region=1408 system=2048]KB compThreadID=1 CpuLoad=0%(0%avg) JvmCpu=157% queueTime=1049227us

Note that after the read error the client closes its socket and retries the same compilation remotely. THis is done because sometimes the networking error is cleared very fast. However, in this example the server is dead so the client fails to connect to the server and displays the "Lost connection" message. The compilation is tried again, this time locally.

@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Nov 30, 2022
@mpirvu mpirvu added this to In progress in JIT as a Service via automation Nov 30, 2022
@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 30, 2022

@dsouzai Could you please review/merge this PR? Thanks

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 30, 2022

jenkins test sanity xlinuxjit jdk17

@mpirvu
Copy link
Contributor Author

mpirvu commented Nov 30, 2022

I had an experiment where the server threw an exception on purpose when a message with a certain seqNo is received. The client aborted the compilation, tried it again remotely, and the server finished successfully.

 (cold) Compiling java/lang/invoke/HandleCache.getMethodFromPerClassCache(Ljava/util/Map;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle;  OrdinaryMethod j9m=000000000013CDB8 remote t=2114 compThreadID=1 memLimit=262144 KB freePhysicalMemory=5769 MB
#JITServer: Client sending compReq seqNo=501 to server for method java/lang/invoke/HandleCache.getMethodFromPerClassCache(Ljava/util/Map;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle; @ cold.
#FAILURE:  JITServer::StreamFailure: JITServer I/O error: read error: connection closed by peer for java/lang/invoke/HandleCache.getMethodFromPerClassCache(Ljava/util/Map;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle; @ cold
#JITServer: compThreadID=1 JITServer StreamFailure: Generic stream failure
! (cold) java/lang/invoke/HandleCache.getMethodFromPerClassCache(Ljava/util/Map;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle; Q_SZ=435 Q_SZI=435 QW=1628 j9m=000000000013CDB8 time=877us compilationStreamFailure memLimit=262144 KB freePhysicalMemory=5769 MB mem=[region=64 system=2048]KB compThreadID=1
 (cold) Compiling java/lang/invoke/HandleCache.getMethodFromPerClassCache(Ljava/util/Map;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle;  OrdinaryMethod j9m=000000000013CDB8 remote t=2114 compThreadID=1 memLimit=262144 KB freePhysicalMemory=5769 MB
#JITServer: Client sending compReq seqNo=502 to server for method java/lang/invoke/HandleCache.getMethodFromPerClassCache(Ljava/util/Map;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle; @ cold.
#JITServer: Client successfully loaded method java/lang/invoke/HandleCache.getMethodFromPerClassCache(Ljava/util/Map;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle; @ cold following compilation request. [metaData=00007F34D8CE1378, startPC=00007F34DA092888]
+ (cold) java/lang/invoke/HandleCache.getMethodFromPerClassCache(Ljava/util/Map;Ljava/lang/String;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle; @ 00007F34DA092888-00007F34DA092980 OrdinaryMethod - Q_SZ=435 Q_SZI=435 QW=1624 j9m=000000000013CDB8 bcsz=8 remote time=6620us mem=[region=64 system=2048]KB compThreadID=1 CpuLoad=0%(0%avg) JvmCpu=144% queueTime=1367380us

This behavior is better than the previous one where the client would disconnect and try connect after 2 seconds.

@dsouzai dsouzai self-assigned this Nov 30, 2022
@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 1, 2022

5 failures:
cmdLineTester_jvmtitests_debug_5
cmdLineTester_jvmtitests_debug_10
cmdLineTester_jvmtitests_debug_11
cmdLineTester_jvmtitests_debug_openj9_none_SCC_5
cmdLineTester_jvmtitests_debug_openj9_none_SCC_10

Again, a bunch of FSD mode failures and I am sure that this PR couldn't have caused them.

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 1, 2022

10 grinders of each failed target shows no problems whatsoever: https://openj9-jenkins.osuosl.org/job/Grinder/1570/

@dsouzai
Copy link
Contributor

dsouzai commented Dec 1, 2022

@mpirvu are the failures related to #14704 maybe?

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 1, 2022

@mpirvu are the failures related to #14704 maybe?

Quite possibly, though I don't understand why they show up so often nowadays in "jenkins test sanity". I used to do such testing over the summer/fall without hitting them.
@cjjdespres Do any jvmtitests_debug failures appear in nightly testing?

@cjjdespres
Copy link
Contributor

I haven't been recording the newest instances of #14704, but such failures do still occur in the nightly tests. I think the most recent test failure in one of the jvmtitests_debug tests specifically was https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal/449/ a couple of weeks ago.

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 1, 2022

jenkins test sanity plinuxjit zlinuxjit jdk17

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 1, 2022

jenkins test sanity plinuxjit,zlinuxjit jdk17

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 1, 2022

Started grinder of 100 runs of cmdLineTester_jvmtitests_debug_10 here https://openj9-jenkins.osuosl.org/job/Grinder/1578/

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 1, 2022

It only ran once in grinder and then it hit an infra issue:

12:34:52  TESTING:
Creating placeholder flownodes because failed loading originals.
java.io.IOException: Tried to load head FlowNodes for execution Owner[Grinder_testList_0/185:Grinder_testList_0 #185] but FlowNode was not found in storage for head id:FlowNodeId 1:223
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.initializeStorage(CpsFlowExecution.java:679)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onLoad(CpsFlowExecution.java:716)
	at org.jenkinsci.plugins.workflow.job.WorkflowRun.getExecution(WorkflowRun.java:701)
	at org.jenkinsci.plugins.workflow.job.WorkflowRun.onLoad(WorkflowRun.java:560)

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 2, 2022

100 grinder runs for cmdLineTester_jvmtitests_debug_10 passed: https://openj9-jenkins.osuosl.org/job/Grinder/1589/

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 2, 2022

Tests on Z/P failed due to infra

Creating placeholder flownodes because failed loading originals.
java.io.IOException: Tried to load head FlowNodes for execution Owner[Pipeline_Build_Test_JDK17_ppc64le_linux_jit/81:Pipeline_Build_Test_JDK17_ppc64le_linux_jit #81] but FlowNode was not found in storage for head id:FlowNodeId 1:72

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 2, 2022

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit jdk17

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 3, 2022

Test on P and Z passed. Tests on x86 timed-out in cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3 which is something we see from time to time:

 ===============================================
16:05:44  Running test cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3 ...
16:05:44  ===============================================
16:05:44  cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3 Start Time: Fri Dec  2 17:05:43 2022 Epoch Time (ms): 1670015143572
16:05:44  variation: Mode112
16:05:44  JVM_OPTIONS: -XX:+UseJITServer -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs 
16:05:44  { \
16:05:44  echo "";	echo "TEST SETUP:"; \
16:05:44  "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done"; \
16:05:44  mkdir -p "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../TKG/output_16700097512952/cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3"; \
16:05:44  cd "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../TKG/output_16700097512952/cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3"; \
16:05:44  echo "";	echo "TESTING:"; \
16:05:44  export LD_LIBRARY_PATH="/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/openjdkbinary/openjdk-test-image/openj9:/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/openjdkbinary/j2sdk-image/bin/../lib/default:/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/openjdkbinary/j2sdk-image/bin/../lib/j9vm:"; \
16:05:44  "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/openjdkbinary/j2sdk-image/bin/java" -XX:+UseJITServer -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs  -Xshareclasses:none \
16:05:44  -DTEST_ROOT="/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests" \
16:05:44  -DJAR="/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitest.jar:/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../TKG/lib/asm-all.jar" \
16:05:44  -DEXE='"/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/openjdkbinary/j2sdk-image/bin/java" -XX:+UseJITServer -Xgcpolicy:gencon -Xjit:count=0 -Xnocompressedrefs  -Xdump ' \
16:05:44  -DEXTRA_Add_OPEN_OPTION='--add-opens=java.base/java.lang.reflect=ALL-UNNAMED' \
16:05:44  -jar "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdline_options_tester/cmdlinetester.jar" \
16:05:44  -config "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitests_hcr.xml" \
16:05:44  -explainExcludes -xids all,linux_x86-64 -xlist "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/jvmtitests/jvmtitests_excludes_17.xml" -nonZeroExitWhenError; \
16:05:44  if [ $? -eq 0 ]; then echo "-----------------------------------"; echo "cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3""_PASSED"; echo "-----------------------------------"; cd /home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/..; rm -f -r "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../TKG/output_16700097512952/cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3"; else echo "-----------------------------------"; echo "cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3""_FAILED"; echo "-----------------------------------"; fi; \
16:05:44  echo "";	echo "TEST TEARDOWN:"; \
16:05:44  "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done"; \
16:05:44   } 2>&1 | tee -a "/home/jenkins/workspace/Test_openjdk17_j9_sanity.functional_x86-64_linux_jit_Personal_testList_1/aqa-tests/TKG/../TKG/output_16700097512952/TestTargetResult";
16:05:44  
16:05:44  TEST SETUP:
16:05:44  JVMSHRC005I No shared class caches available
16:05:44  JVMSHRC005I No shared class caches available
16:05:44  cache cleanup done
16:05:44  
16:05:44  TESTING:
00:22:14  Cancelling nested steps due to timeout
00:22:14  Sending interrupt signal to process
00:22:16  Terminated
00:22:16  -----------------------------------
00:22:16  cmdLineTester_jvmtitests_hcr_openi9_none_SCC_3_FAILED
00:22:16  -----------------------------------

@mpirvu
Copy link
Contributor Author

mpirvu commented Dec 5, 2022

Since the error showing on xlinux are not generated by this PR, I think that this PR can be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project
Projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants