Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmdLineTester_criu_jitserverAcrossCheckpoint Success condition was not found: [Output match: Connected to a server] #17324

Open
pshipton opened this issue May 2, 2023 · 8 comments · Fixed by #17313
Assignees
Labels
comp:jit criu Used to track CRIU snapshot related work test failure

Comments

@pshipton
Copy link
Member

pshipton commented May 2, 2023

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_0/318/
cmdLineTester_criu_jitserverAcrossCheckpoint_0

Testing: Portable CRIU Mode; Enable JITServer specified Pre-Checkpoint: Check connection in Post-Restore Verbose Log
Test start time: 2023/05/02 19:10:59 Coordinated Universal Time
Running command: cat postRestoreVlog
Time spent starting: 1 milliseconds
Time spent executing: 3 milliseconds
Test result: FAILED
Output from test:
 [OUT] #CHECKPOINT RESTORE: Start and elapsed time: startTime=3722441670, elapsedTime=   806
 [OUT] #CHECKPOINT RESTORE: Reset start and elapsed time: startTime=3722444732, elapsedTime=     0
 [OUT] #CHECKPOINT RESTORE: Ready for restore
 [OUT] #JITServer: Client sending compReq seqNo=149 to server for method openj9/internal/criu/InternalCRIUSupport.isCheckpointAllowedImpl()Z @ cold.
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000BF5900, thread ID 2
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000BF3300, thread ID 1
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000BF7F00, thread ID 3
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000BFA500, thread ID 4
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000BFCA00, thread ID 5
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000BFF000, thread ID 6
 [OUT] #FAILURE:  Interrupting remote compilation (interruptReason 2) in handleServerMessage(getUnloadedClassRangesAndCHTable) for openj9/internal/criu/InternalCRIUSupport.isCheckpointAllowedImpl()Z @ cold
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000BF0D00, thread ID 0
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000C14500, thread ID 15
 [OUT] #JITServer: Sent clientSessionTerminate message
>> Required condition was found: [Output match: CHECKPOINT RESTORE: Ready for restore]
>> Success condition was not found: [Output match: Connected to a server]
@pshipton pshipton added test failure criu Used to track CRIU snapshot related work labels May 2, 2023
@pshipton
Copy link
Member Author

pshipton commented May 2, 2023

@dsouzai is this the same issue as #17311 ?

@dsouzai
Copy link
Contributor

dsouzai commented May 3, 2023

Yeah I think this is the same issue; looks like the shutdown sequence interrupted the compilation thread before it had a chance to connect to the jitserver instance.

@pshipton
Copy link
Member Author

pshipton commented May 7, 2023

https://openj9-jenkins.osuosl.org/job/Test_openjdk20_j9_sanity.functional_x86-64_linux_OMR_testList_1/12
cmdLineTester_criu_jitserverAcrossCheckpoint_0

Testing: Enable JITServer specified Pre-Checkpoint: Check connection in Post-Restore Verbose Log
Test start time: 2023/05/06 21:15:22 Eastern Standard Time
Running command: cat postRestoreVlog
Time spent starting: 3 milliseconds
Time spent executing: 5 milliseconds
Test result: FAILED
Output from test:
 [OUT] #CHECKPOINT RESTORE: Ready for restore
 [OUT]  (cold) Compiling openj9/internal/criu/InternalCRIUSupport.isCheckpointAllowedImpl()Z  OrdinaryMethod j9m=0000000000241C08  t=868 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) openj9/internal/criu/InternalCRIUSupport.isCheckpointAllowedImpl()Z @ 00007FCB1128F01C-00007FCB1128F174 OrdinaryMethod - Q_SZ=10 Q_SZI=10 QW=11 j9m=0000000000241C08 bcsz=2 JNI time=3480us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=3515us
 [OUT]  (cold) Compiling sun/nio/fs/UnixNativeDispatcher.open0(JII)I  OrdinaryMethod j9m=0000000000154320  t=868 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) sun/nio/fs/UnixNativeDispatcher.open0(JII)I @ 00007FCB1128F19C-00007FCB1128F311 OrdinaryMethod - Q_SZ=9 Q_SZI=9 QW=10 j9m=0000000000154320 bcsz=5 JNI time=870us mem=[region=832 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=3207us
 [OUT]  (cold) Compiling sun/nio/fs/UnixNativeDispatcher.dup(I)I  OrdinaryMethod j9m=00000000001542E0  t=868 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) sun/nio/fs/UnixNativeDispatcher.dup(I)I @ 00007FCB1128F33C-00007FCB1128F499 OrdinaryMethod - Q_SZ=8 Q_SZI=8 QW=9 j9m=00000000001542E0 bcsz=3 JNI time=455us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=3743us
 [OUT]  (cold) Compiling sun/nio/fs/UnixNativeDispatcher.fdopendir(I)J  OrdinaryMethod j9m=0000000000154A80  t=868 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] #CHECKPOINT RESTORE: Start and elapsed time: startTime=4089895363, elapsedTime=   868
 [OUT] #CHECKPOINT RESTORE: Reset start and elapsed time: startTime=4089898442, elapsedTime=     0
 [OUT] #PERF:  t=10	Sampling thread changed state to IDLE and frequency to 1000 ms
 [OUT] + (cold) sun/nio/fs/UnixNativeDispatcher.fdopendir(I)J @ 00007FCB1128F4BC-00007FCB1128F619 OrdinaryMethod - Q_SZ=7 Q_SZI=7 QW=8 j9m=0000000000154A80 bcsz=3 JNI time=450us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=4301us
 [OUT]  (cold) Compiling java/io/UnixFileSystem.delete0(Ljava/io/File;)Z  OrdinaryMethod j9m=0000000000117D88  t=10 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) java/io/UnixFileSystem.delete0(Ljava/io/File;)Z @ 00007FCB1128F63C-00007FCB1128F7BB OrdinaryMethod - Q_SZ=6 Q_SZI=6 QW=7 j9m=0000000000117D88 bcsz=3 JNI time=555us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=4699us
 [OUT]  (cold) Compiling openj9/internal/tools/attach/target/FileLock.lockFileImpl(Ljava/lang/String;IZ)J  OrdinaryMethod j9m=00000000001DC778  t=10 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) openj9/internal/tools/attach/target/FileLock.lockFileImpl(Ljava/lang/String;IZ)J @ 00007FCB1128F7DC-00007FCB1128F966 OrdinaryMethod - Q_SZ=5 Q_SZI=5 QW=6 j9m=00000000001DC778 bcsz=5 JNI time=483us mem=[region=832 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=5184us
 [OUT]  (cold) Compiling sun/nio/fs/UnixNativeDispatcher.readdir0(J)[B  OrdinaryMethod j9m=0000000000154AE0  t=10 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) sun/nio/fs/UnixNativeDispatcher.readdir0(J)[B @ 00007FCB1128F99C-00007FCB1128FB03 OrdinaryMethod - Q_SZ=4 Q_SZI=4 QW=5 j9m=0000000000154AE0 bcsz=3 JNI time=554us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=5809us
 [OUT]  (cold) Compiling openj9/internal/tools/attach/target/IPC.openSemaphore(Ljava/lang/String;Ljava/lang/String;Z)I  OrdinaryMethod j9m=0000000000185118  t=10 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) openj9/internal/tools/attach/target/IPC.openSemaphore(Ljava/lang/String;Ljava/lang/String;Z)I @ 00007FCB1128FB3C-00007FCB1128FCD5 OrdinaryMethod - Q_SZ=3 Q_SZI=3 QW=4 j9m=0000000000185118 bcsz=5 JNI time=553us mem=[region=832 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=6257us
 [OUT]  (cold) Compiling openj9/internal/tools/attach/target/FileLock.unlockFileImpl(J)I  OrdinaryMethod j9m=00000000001DC798  t=10 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) openj9/internal/tools/attach/target/FileLock.unlockFileImpl(J)I @ 00007FCB1128FCFC-00007FCB1128FE5B OrdinaryMethod - Q_SZ=2 Q_SZI=2 QW=3 j9m=00000000001DC798 bcsz=3 JNI time=449us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=6581us
 [OUT]  (cold) Compiling java/io/UnixFileSystem.list0(Ljava/io/File;)[Ljava/lang/String;  OrdinaryMethod j9m=0000000000117DC8  t=10 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) java/io/UnixFileSystem.list0(Ljava/io/File;)[Ljava/lang/String; @ 00007FCB1128FE7C-00007FCB1128FFFD OrdinaryMethod - Q_SZ=1 Q_SZI=1 QW=2 j9m=0000000000117DC8 bcsz=3 JNI time=477us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=6779us
 [OUT]  (cold) Compiling openj9/internal/tools/attach/target/IPC.waitSemaphore()I  OrdinaryMethod j9m=0000000000185138  t=10 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) openj9/internal/tools/attach/target/IPC.waitSemaphore()I @ 00007FCB1129001C-00007FCB1129016E OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=0000000000185138 bcsz=2 JNI time=437us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=6588us
 [OUT] #PERF:  t=13	Sampling thread interrupted and changed state to DEFAULT and frequency to 2 ms due to comp req
 [OUT]  (cold) Compiling sun/nio/fs/UnixNativeDispatcher.closedir(J)V  OrdinaryMethod j9m=0000000000154AA0  t=13 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) sun/nio/fs/UnixNativeDispatcher.closedir(J)V @ 00007FCB1129019C-00007FCB112902F5 OrdinaryMethod - Q_SZ=1 Q_SZI=1 QW=2 j9m=0000000000154AA0 bcsz=3 JNI time=473us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=619us
 [OUT]  (cold) Compiling sun/nio/fs/UnixNativeDispatcher.close0(I)V  OrdinaryMethod j9m=00000000001543A0  t=13 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) sun/nio/fs/UnixNativeDispatcher.close0(I)V @ 00007FCB1129031C-00007FCB11290473 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=00000000001543A0 bcsz=3 JNI time=472us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=168% queueTime=1140us
 [OUT] #INFO:  t=23 selectiveNoOptServer feature turned off
 [OUT] #JITSTATE:  t=  1037 VM changed state to NOT_STARTUP
 [OUT] #JITSTATE:  t=  1037 Changing maxIProfilingCount to 3000
 [OUT] #JITSTATE:  t=  1037 JIT changed state from STARTUP   to IDLE      cSmpl=  0 iSmpl=  5 comp= 40 recomp=  0, Q_SZ=  0 CLP=OFF jvmCPU=0%
 [OUT]  (cold) Compiling jdk/internal/misc/Unsafe.freeMemory0(J)V  OrdinaryMethod j9m=0000000000070420  t=2006 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) jdk/internal/misc/Unsafe.freeMemory0(J)V @ 00007FCB1129049C-00007FCB112905FA OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=0000000000070420 bcsz=3 JNI time=1526us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=83% queueTime=2543us
 [OUT]  (cold) Compiling openj9/internal/tools/attach/target/IPC.notifyVm(Ljava/lang/String;Ljava/lang/String;IZ)I  OrdinaryMethod j9m=0000000000185158  t=2026 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) openj9/internal/tools/attach/target/IPC.notifyVm(Ljava/lang/String;Ljava/lang/String;IZ)I @ 00007FCB1129061C-00007FCB112907C0 OrdinaryMethod - Q_SZ=1 Q_SZI=1 QW=2 j9m=0000000000185158 bcsz=6 JNI time=815us mem=[region=832 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=83% queueTime=972us
 [OUT]  (cold) Compiling openj9/internal/tools/attach/target/IPC.closeSemaphore()V  OrdinaryMethod j9m=0000000000185198  t=2026 compThreadID=0 memLimit=262144 KB freePhysicalMemory=7169 MB
 [OUT] + (cold) openj9/internal/tools/attach/target/IPC.closeSemaphore()V @ 00007FCB112907DC-00007FCB11290928 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=1 j9m=0000000000185198 bcsz=2 JNI time=523us mem=[region=704 system=16384]KB compThreadID=0 CpuLoad=0%(0%avg) JvmCpu=83% queueTime=711us
 [OUT] #PERF:  Time spent in compilation thread =0 ms
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000026300, thread ID 3
 [OUT] #PERF:  Time spent in compilation thread =0 ms
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000029F00, thread ID 4
 [OUT] #PERF:  Time spent in compilation thread =0 ms
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000031900, thread ID 6
 [OUT] #PERF:  Time spent in compilation thread =0 ms
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000022600, thread ID 2
 [OUT] #PERF:  Time spent in compilation thread =14 ms
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 000000000001AC00, thread ID 0
 [OUT] #PERF:  Time spent in compilation thread =0 ms
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 000000000001E900, thread ID 1
 [OUT] #PERF:  Time spent in compilation thread =0 ms
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 000000000002DC00, thread ID 5
 [OUT] #PERF:  Time spent in compilation thread =0 ms
 [OUT] #INFO:  Stopping compilation thread, vmThread pointer 0000000000053C00, thread ID 15
>> Required condition was found: [Output match: CHECKPOINT RESTORE: Ready for restore]
>> Success condition was not found: [Output match: Connected to a server]

@pshipton pshipton reopened this May 7, 2023
@mpirvu
Copy link
Contributor

mpirvu commented May 8, 2023

@dsouzai After #17313 shouldn't the output show that the server is still alive?

@dsouzai
Copy link
Contributor

dsouzai commented May 9, 2023

Not in this case; this test is sort of a multi-test test, namely:

[2023-05-07T01:15:16.640Z] Testing: Enable JITServer specified Pre-Checkpoint but not explicitly enabled Post-Restore
[2023-05-07T01:15:16.640Z] Test start time: 2023/05/06 21:15:15 Eastern Standard Time
[2023-05-07T01:15:16.640Z] Running command: bash /home/jenkins/workspace/Test_openjdk20_j9_sanity.functional_x86-64_linux_OMR_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu/criuJitServerScript.sh /home/jenkins/workspace/Test_openjdk20_j9_sanity.functional_x86-64_linux_OMR_testList_1/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/criu /home/jenkins/workspace/Test_openjdk20_j9_sanity.functional_x86-64_linux_OMR_testList_1/openjdkbinary/j2sdk-image/bin "  -XX:+UseJITServer -Xjit:verbose={compilePerformance},verbose={JITServer},verbose={JITServerConns},vlog=preCheckpointVlog" org.openj9.criu.OptionsFileTest "JitOptionsTest -Xjit:verbose={compilePerformance},verbose={CheckpointRestore},verbose={JITServer},verbose={JITServerConns},vlog=postRestoreVlog" 1 false
[2023-05-07T01:15:16.640Z] Time spent starting: 6 milliseconds
[2023-05-07T01:15:24.141Z] Time spent executing: 7282 milliseconds
[2023-05-07T01:15:24.141Z] Test result: PASSED
[2023-05-07T01:15:24.141Z] 
[2023-05-07T01:15:24.141Z] Testing: Enable JITServer specified Pre-Checkpoint: Check no connection in Pre-Checkpoint Verbose Log
[2023-05-07T01:15:24.141Z] Test start time: 2023/05/06 21:15:22 Eastern Standard Time
[2023-05-07T01:15:24.141Z] Running command: cat preCheckpointVlog
[2023-05-07T01:15:24.141Z] Time spent starting: 5 milliseconds
[2023-05-07T01:15:24.141Z] Time spent executing: 35 milliseconds
[2023-05-07T01:15:24.141Z] Test result: PASSED
[2023-05-07T01:15:24.141Z] 
[2023-05-07T01:15:24.141Z] Testing: Enable JITServer specified Pre-Checkpoint: Check connection in Post-Restore Verbose Log
[2023-05-07T01:15:24.141Z] Test start time: 2023/05/06 21:15:22 Eastern Standard Time
[2023-05-07T01:15:24.141Z] Running command: cat postRestoreVlog
[2023-05-07T01:15:24.141Z] Time spent starting: 3 milliseconds
[2023-05-07T01:15:24.141Z] Time spent executing: 5 milliseconds
[2023-05-07T01:15:24.141Z] Test result: FAILED

This set of tests works in the following way:

  1. Enable JITServer specified Pre-Checkpoint but not explicitly enabled Post-Restore launches the criuJitServerScript.sh script which starts the jitserver instance, runs the org.openj9.criu.OptionsFileTest, and then stops the jitserver instance.
  2. Enable JITServer specified Pre-Checkpoint: Check no connection in Pre-Checkpoint checks the vlog generated pre-checkpoint
  3. Enable JITServer specified Pre-Checkpoint: Check connection in Post-Restore Verbose Log checks the vlog generated post-restore

Because these tests involve both jitserver and criu, it's kind of tricky to test the state of things pre-checkpoint and post-restore.

All that said, I'm not really sure what's going on here. If the jitserver instance didn't exist, I would've expected to see a message in the post-restore vlog that showed a failed connection. However, I don't see any such message, which implies that the JVM didn't even bother trying to connect. I'm not sure something like this wouldn't have shown up in the PR tests unless there's something specific to x86. EDIT: not sure that's a valid argument because literally every other build (including x86) passed.

@pshipton
Copy link
Member Author

pshipton commented Jun 5, 2023

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_0/336
cmdLineTester_criu_jitserverPostRestore_1

Testing: Check Verbose Log
Test start time: 2023/06/03 20:35:40 Eastern Standard Time
Running command: cat vlog
Time spent starting: 1 milliseconds
Time spent executing: 4 milliseconds
Test result: FAILED
Output from test:
 [ERR] cat: vlog: No such file or directory
>> Required condition was not found: [Output match: CHECKPOINT RESTORE: Ready for restore]
>> Success condition was not found: [Output match: Connected to a server]

@dsouzai
Copy link
Contributor

dsouzai commented Jun 5, 2023

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_s390x_linux_OMR_testList_0/336 cmdLineTester_criu_jitserverPostRestore_1

Testing: Check Verbose Log
Test start time: 2023/06/03 20:35:40 Eastern Standard Time
Running command: cat vlog
Time spent starting: 1 milliseconds
Time spent executing: 4 milliseconds
Test result: FAILED
Output from test:
 [ERR] cat: vlog: No such file or directory
>> Required condition was not found: [Output match: CHECKPOINT RESTORE: Ready for restore]
>> Success condition was not found: [Output match: Connected to a server]

I believe this is another instance of #17367 (since the test before the cat vlog "test" passed).

@dchopra001
Copy link
Contributor

[2023-06-07T19:07:49.299Z] Testing: Check Connection in Post-Restore Verbose Log
[2023-06-07T19:07:49.299Z] Test start time: 2023/06/07 16:07:47 Atlantic Standard Time
[2023-06-07T19:07:49.299Z] Running command: cat postRestoreVlog
[2023-06-07T19:07:49.299Z] Time spent starting: 2 milliseconds
[2023-06-07T19:07:49.299Z] Time spent executing: 4 milliseconds
[2023-06-07T19:07:49.299Z] Test result: FAILED
[2023-06-07T19:07:49.299Z] Output from test:
[2023-06-07T19:07:49.299Z]  [ERR] cat: postRestoreVlog: No such file or directory
[2023-06-07T19:07:49.299Z] >> Required condition was not found: [Output match: CHECKPOINT RESTORE: Ready for restore]
[2023-06-07T19:07:49.299Z] >> Success condition was not found: [Output match: Connected to a server]
[2023-06-07T19:07:49.299Z] 

Seeing a failure in my personal build that look related to this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jit criu Used to track CRIU snapshot related work test failure
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants