Enable JITServer post-restore only if explicitly specified #17205

dsouzai · 2023-04-17T16:00:28Z

This PR updates how a JVM client connects to a JITServer in the context of CRIU, as outlined in the following table; ✅ means the JVM will connect to a JITServer instance and ❌ means it won't.

| | Non-Portable CRIU Pre-Checkpoint | Non-Portable CRIU Post-Restore | Portable CRIU Pre-Checkpoint | Portable CRIU Post-Restore |
|---|---|---|---|---|
| No Options Pre-Checkpoint; No Options Post-Restore | ❌ | ❌ | ❌ | ❌ |
| No Options Pre-Checkpoint; `-XX:+UseJITServer` Post-Restore | ❌ | ✅ | ❌ | ✅ |
| `-XX:+UseJITServer` Pre-Checkpoint; No Options Post-Restore | ❌ | ✅ | ✅ | ✅ |
| `-XX:+UseJITServer` Pre-Checkpoint; `-XX:-UseJITServer` Post-Restore | ❌ | ❌ | ✅ | ❌ |
| `-XX:-UseJITServer` Pre-Checkpoint; `-XX:+UseJITServer` Post-Restore | ❌ | ❌ | ❌ | ❌ |
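
The table above can be read as a small decision function. The following is only an illustrative model in POSIX shell, not the JVM's implementation; the function name `connects_to_jitserver` and the `plus`/`minus`/`none` encoding are hypothetical. Combinations the table does not list (for example `+` at both points) are filled in by assumption and marked as such in the comments.

```sh
# Illustrative model of the table; not the actual JVM logic.
#   $1 mode:  portable | nonportable
#   $2 phase: pre | post
#   $3 option given pre-checkpoint:  plus | minus | none
#   $4 option given post-restore:    plus | minus | none
# Prints "yes" if the JVM would connect to a JITServer, "no" otherwise.
connects_to_jitserver () {
    mode=$1 phase=$2 pre=$3 post=$4

    if [ "$phase" = pre ]; then
        # Pre-checkpoint: only portable mode with an explicit + connects.
        if [ "$mode" = portable ] && [ "$pre" = plus ]; then
            echo yes
        else
            echo no
        fi
        return 0
    fi

    # Post-restore: the table shows the same outcome in both modes.
    case "$pre:$post" in
        minus:*)   echo no  ;;  # disabled at bootstrap stays disabled
        *:minus)   echo no  ;;  # explicitly disabled post-restore
        *:plus)    echo yes ;;  # explicitly enabled post-restore
                                # (plus:plus is not in the table; assumed)
        plus:none) echo yes ;;  # enabled at bootstrap, nothing overrides it
        *)         echo no  ;;  # no option at either point
    esac
}

# Row 3, second column of the table:
connects_to_jitserver nonportable post plus none   # prints "yes"
```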

This PR also adds cmdLineTester tests for jitserver, both by itself and in the context of CRIU.

dsouzai · 2023-04-17T16:00:59Z

@mpirvu could you please review?

@llxia could you please review the tests?

dsouzai · 2023-04-17T16:06:43Z

test/functional/cmdLineTests/jitserver/jitserverScript.sh

+kill -9 $JITSERVER_PID
+# Running pkill seems to cause a hang...
+#pkill -9 -xf "$TEST_JDK_BIN/jitserver $JITSERVER_OPTIONS"


Something about how the cmdLineTester tokenizes the args passed to this script causes it to hang at this point if I use pkill -xf (when I run this script manually there's no issue with pkill -xf). As such, I just stuck with kill.

dsouzai · 2023-04-17T16:09:14Z

test/functional/cmdLineTests/jitserver/jitserverconfig.sh

+
+random_port () {
+    RANDOM_PORT=$(($(($RANDOM%$DIFF))+$START_PORT))
+    out=$(lsof -i -P -n | grep LISTEN | grep $RANDOM_PORT)


When I looked this up online, all the examples used `sudo`, I suppose in order to list every port in use. However, I'm pretty sure that's not needed here, since the ports we're searching should only be used by processes that don't have elevated privileges.
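
For readers who want the idea in one place, here is a minimal sketch of a pick-and-retry port search. It is not the script from this PR: the values of `START_PORT` and `DIFF`, the `port_in_use` helper, and the retry limit are all assumptions for illustration. It reads two bytes from `/dev/urandom` in place of `$RANDOM` so that it runs under plain `sh`, and it asks `lsof` for a listener on the exact port rather than grepping the full listing.

```sh
START_PORT=38400   # assumed lower bound; the PR's actual value is not shown here
DIFF=1000          # assumed size of the port range

# Hypothetical helper: succeeds (exit 0) if a process visible to the current
# user is listening on TCP port $1. Runs without sudo. If lsof is missing,
# the call fails and every port appears free.
port_in_use () {
    lsof -nP -iTCP:"$1" -sTCP:LISTEN >/dev/null 2>&1
}

# Prints a port in [START_PORT, START_PORT + DIFF) with no visible listener.
# Gives up after 20 attempts (an arbitrary limit) and returns non-zero.
random_port () {
    attempts=0
    while [ "$attempts" -lt 20 ]; do
        # Two random bytes as an unsigned integer in 0..65535.
        r=$(od -An -N2 -tu2 /dev/urandom | tr -dc '0-9')
        candidate=$(( r % DIFF + START_PORT ))
        if ! port_in_use "$candidate"; then
            echo "$candidate"
            return 0
        fi
        attempts=$(( attempts + 1 ))
    done
    return 1
}
```

A caller would capture the result with something like `PORT=$(random_port)` and check the exit status. Note that a check of this kind is inherently racy: another process can bind the port between the `lsof` probe and the server start, and listeners owned by other users may not be visible without elevated privileges.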

mpirvu · 2023-04-18T16:23:28Z

I would like to better understand the goal of these changes. Here's my take:

Non-portable restore mode (the default)
There is only one checkpoint/restore operation, and options are parsed once at JVM bootstrap and a second time immediately after restore. Before checkpoint: unless explicitly disabled, the JVM will be set up as a client, but no remote compilations will take place. After a restore: remote compilations will take place only if the user specifies -XX:+UseJITServer.
It's debatable whether we want remote compilations post restore, if the user already specified -XX:+UseJITServer at JVM bootstrap.

Portable restore mode
There could be several checkpoint/restore operations and options are only processed at JVM bootstrap.
If the user has specified -XX:+UseJITServer, the JVM will work in client mode all the time performing remote compilations. There is no clear notion of "before checkpoint" or "after restore" since there could be several of such events.
If the user did not specify -XX:+UseJITServer (or used -XX:-UseJITServer) at JVM bootstrap, the JVM should not work in client mode at any time. This is consistent with the fact that the user has to opt in to use JITServer tech.

dsouzai · 2023-04-18T17:47:23Z

> I would like to better understand the goal of these changes. Here's my take:
>
> Non-portable restore mode (the default)
> There is only one checkpoint/restore operation, and options are parsed once at JVM bootstrap and a second time immediately after restore. Before checkpoint: unless explicitly disabled, the JVM will be set up as a client, but no remote compilations will take place. After a restore: remote compilations will take place only if the user specifies -XX:+UseJITServer.

Yeah this is all accurate.

> It's debatable whether we want remote compilations post restore, if the user already specified -XX:+UseJITServer at JVM bootstrap.

Ah yeah this is something that's not currently handled. I think if a user specified -XX:+UseJITServer at bootstrap, then we should enable it post-restore, because the user has anticipated at build time itself that a jitserver instance will be available at deployment. I will need to add this functionality (and a test for it as well).

> Portable restore mode
> There could be several checkpoint/restore operations and options are only processed at JVM bootstrap.

You're right in that there could be several checkpoint/restore operations, but there's still going to be the post-restore hook that's called at each restore, and so options will still be processed post-restore.

> If the user has specified -XX:+UseJITServer, the JVM will work in client mode all the time performing remote compilations.

Yes, this is correct; this would be the EXPLICIT_CLIENT mode.

> There is no clear notion of "before checkpoint" or "after restore" since there could be several of such events.

At the moment I don't think we have a good story for multiple checkpoint/restore points in that we're still going to call the post-restore options processing each time we restore. I think dealing with that is something we're gonna have to think about for options in general.

> If the user did not specify -XX:+UseJITServer (or used -XX:-UseJITServer) at JVM bootstrap, the JVM should not work in client mode at any time. This is consistent with the fact that the user has to opt in to use JITServer tech.

Yes this is correct.

mpirvu · 2023-04-18T18:12:14Z

If options are processed after each restore in portable mode, then some users might provide -XX:+UseJITServer as post restore options and expect that to take effect. However, I don't know what will happen if we generate a new client UID after each restore. Maybe it's better to just disable JITServer in this mode (and document the behavior).
In general we should document all the possible combinations and decisions taken. It's hard to keep track of it all.

dsouzai · 2023-04-18T19:42:19Z

> If options are processed after each restore in portable mode, then some users might provide -XX:+UseJITServer as post restore options and expect that to take effect. However, I don't know what will happen if we generate a new client UID after each restore.

There's a test for this scenario:
https://github.com/eclipse-openj9/openj9/pull/17205/files#diff-8cca2294b80f1e8bfdae61f3efffc5a9e2676e05d8d6987d3a5ee3ce1457ccc8R116-R139
This portable criu test doesn't explicitly add -XX:+UseJITServer post restore, but because it was specified pre-checkpoint it basically does the same thing. I never ran into any issues even though pre-checkpoint and post-restore there were two different client UIDs generated.

> In general we should document all the possible combinations and decisions taken. It's hard to keep track of it all.

Yeah I'll add documentation to this PR, and open another issue to keep track of all the things we need to document on the openj9 docs.

llxia · 2023-04-19T17:30:07Z

Should this test run in the JITAAS test build? In a JITAAS test build, the test framework starts jitserver and sets -XX:+UseJITServer for all tests.
https://github.com/adoptium/TKG/blob/79db2ffe07e64a03150a4ec1960e50770be58dab/testEnv.mk#L23-L31

So the above test will have -XX:+UseJITServer set via JVM_OPTIONS when TEST_FLAG=JITAAS.

See criu test in Test_openjdk11_j9_sanity.functional_x86-64_linux_jit as example:

[2023-04-19T08:12:37.743Z] ===============================================
[2023-04-19T08:12:37.743Z] Running test cmdLineTester_criu_nonPortableRestoreJDK11Up_0 ...
[2023-04-19T08:12:37.743Z] ===============================================
[2023-04-19T08:12:37.743Z] cmdLineTester_criu_nonPortableRestoreJDK11Up_0 Start Time: Wed Apr 19 01:12:37 2023 Epoch Time (ms): 1681891957710
[2023-04-19T08:12:38.178Z] variation: -Xjit -XX:+CRIURestoreNonPortableMode
[2023-04-19T08:12:38.178Z] JVM_OPTIONS: -XX:+UseJITServer -Xjit -XX:+CRIURestoreNonPortableMode 
...

dsouzai · 2023-04-19T18:22:13Z

> Should this test run in the JITAAS test build?

I don't think so; that's the reason I added the code for the random port. I was essentially following the same principle as

openj9/test/functional/JIT_Test/playlist.xml

Lines 721 to 761 in ea2117e

    
	<test>
		<testCaseName>testJITServer</testCaseName>
		<!-- Variations are passed to the client via the CLIENT_PROGRAM property from $JVM_OPTIONS; neither the test harness nor the server care about these. -->
		<variations>
			<variation>Mode610</variation>
			<variation>Mode610 -Xshareclasses:none -Xjit:optLevel=hot</variation>
			<variation>Mode610 -Xshareclasses:name=test_jitscc -XX:+JITServerUseAOTCache</variation>
		</variations>
		<!-- Check if the JITServer launcher exists and if so start the test and
			 - specify the executables for the client and server via the CLIENT_EXE and SERVER_EXE properties respectively,
			 - specify what the client will run via the CLIENT_PROGRAM property.
			 If the launcher doesn't exist we assume that the build doesn't support JITServer and trivially pass the test. -->
		<command>if [ -x $(Q)$(TEST_JDK_BIN)$(D)jitserver$(Q) ]; \
	then \
		$(JAVA_COMMAND) \
		-cp $(Q)$(RESOURCES_DIR)$(P)$(TESTNG)$(P)$(TEST_RESROOT)$(D)jitt.jar$(Q) \
		-DSERVER_EXE=$(Q)$(TEST_JDK_BIN)$(D)jitserver$(Q) \
		-DCLIENT_EXE=$(JAVA_COMMAND) \
		-DCLIENT_PROGRAM=$(SQ)$(JVM_OPTIONS) -cp $(RESOURCES_DIR)$(P)$(TESTNG)$(P)$(TEST_RESROOT)$(D)jitt.jar -DjarTesterArgs=$(Q)-loopforever $(TEST_RESROOT)$(D)jitt.jar$(Q) org.testng.TestNG -d $(REPORTDIR)$(D)client $(TEST_RESROOT)$(D)testng.xml -testnames JarTesterTest -groups $(TEST_GROUP) -excludegroups $(DEFAULT_EXCLUDE)$(SQ) \
		org.testng.TestNG \
		-d $(REPORTDIR) \
		$(Q)$(TEST_RESROOT)$(D)testng.xml$(Q) \
		-testnames JITServerTest \
		-groups $(TEST_GROUP) \
		-excludegroups $(DEFAULT_EXCLUDE); \
	else \
		echo; \
		echo $(Q)$(TEST_JDK_BIN)$(D)jitserver doesn't exist; assuming this JDK does not support JITServer and trivially passing the test.$(Q); \
	fi; \
	$(TEST_STATUS)</command>
		<platformRequirements>os.linux,^arch.arm,^arch.aarch64,bits.64</platformRequirements>
		<levels>
			<level>sanity</level>
		</levels>
		<groups>
			<group>functional</group>
		</groups>
		<impls>
			<impl>openj9</impl>
		</impls>
	</test>

which runs even in a non JITAAS test build. It deals with the JITAAS build by using a random port so that it doesn't clash with the port used by the jitserver instance started by the infrastructure.

llxia · 2023-04-20T16:10:53Z

With the current setup, it will run in JITAAS test build. If we want to disable it in JITAAS build, we need to use JITAAS:nonapplicable.

		<features>
			<feature>CRIU:required</feature>
			<feature>JITAAS:nonapplicable</feature>
		</features>

FYI @renfeiw

dsouzai · 2023-04-20T18:54:00Z

I'm ok with it running in a JITAAS build since even the testJITServer test above also runs in a JITAAS build. Unless you think both the new tests added in this PR and testJITServer should be disabled in a JITAAS build?

llxia · 2023-04-21T19:37:45Z

If it is ok to run in JITAAS build, then #17205 (comment) is not needed.

test/functional/cmdLineTests/jitserver/build.xml

llxia

Other than the typo in copyright, the test change lgtm.

dsouzai · 2023-04-24T18:51:06Z

@llxia updated the copyright.

@mpirvu good for review again. I removed the ClientMode enum and opted instead for two bools in the compInfo that are used to determine whether -XX:+UseJITServer was specified at bootstrap, and whether the JVM was allowed to connect to a server pre-checkpoint.

runtime/compiler/control/J9Options.cpp

runtime/compiler/control/OptionsPostRestore.cpp

mpirvu

LGTM

mpirvu · 2023-04-26T22:55:31Z

jenkins test sanity plinux,xlinux,zlinux jdk17

mpirvu · 2023-04-27T02:53:55Z

zlinux failed cmdLineTester_criu_jitserverPostRestore_2

Testing: Check Verbose Log
Test start time: 2023/04/27 00:38:32 Coordinated Universal Time
Running command: cat vlog
Time spent starting: 12 milliseconds
Time spent executing: 13 milliseconds
Test result: FAILED
Output from test:
 [OUT] #CHECKPOINT RESTORE: Ready for restore
>> Required condition was found: [Output match: CHECKPOINT RESTORE: Ready for restore]
>> Success condition was not found: [Output match: Connected to a server]

dsouzai · 2023-04-27T14:21:53Z

Given that cmdLineTester_criu_jitserverPostRestore_0 and cmdLineTester_criu_jitserverPostRestore_1 passed, I think this may have failed because the jitserver instance failed to launch, perhaps because the port it got was already in use (maybe by a process that the infra couldn't see via lsof). However, I'll see if I can reproduce it manually.

The x86 test failed because I guess the error given by curl is different from what the test expects; I'll have to think of a better success condition since the output of curl can change...

dsouzai · 2023-04-27T16:04:09Z

I manually ran the test on two separate zlinux machines and it passed. Because the restore did succeed but the test didn't see the client connect, it's more than likely that something prevented the jitserver instance from starting up.

The force push should also fix the x86 test failure caused by the change in the curl output.

dsouzai · 2023-04-27T16:04:14Z

jenkins test sanity plinux,xlinux,zlinux jdk17

dsouzai · 2023-04-27T22:51:28Z

@mpirvu looks like all jobs passed.

Sreekala-Gopakumar · 2023-06-30T13:25:31Z

@dsouzai - What do the columns in the table indicate exactly (Non-Portable CRIU Pre-Checkpoint, Non-Portable CRIU Post-Restore, etc.)?

dsouzai · 2023-06-30T13:57:00Z

The table indicates how JITServer behaves pre-checkpoint and post-restore in Portable CRIU Mode and Non-Portable CRIU mode.

So Non-Portable CRIU Pre-Checkpoint and Non-Portable CRIU Post-Restore go together conceptually, and Portable CRIU Pre-Checkpoint and Portable CRIU Post-Restore go together.

dsouzai added the comp:test, comp:jitserver (Artifacts related to JIT-as-a-Service project), and criu (Used to track CRIU snapshot related work) labels Apr 17, 2023

dsouzai force-pushed the jitserverCRIU branch from 356f6f1 to 0a59b1c Compare April 17, 2023 16:04

dsouzai mentioned this pull request Apr 17, 2023

Compiler Support for CRIU #16853

Open

30 tasks

mpirvu self-requested a review April 17, 2023 18:13

llxia reviewed Apr 21, 2023

test/functional/cmdLineTests/jitserver/build.xml (outdated)

llxia approved these changes Apr 21, 2023

dsouzai force-pushed the jitserverCRIU branch from 0a59b1c to 8751c58 Compare April 24, 2023 18:47

mpirvu reviewed Apr 25, 2023

runtime/compiler/control/J9Options.cpp (outdated)

runtime/compiler/control/OptionsPostRestore.cpp (outdated)

dsouzai added 2 commits April 26, 2023 14:52

Enable JITServer post-restore only if explicitly specified

e37345b

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>

Add jitserver postrestore tests

9e821d1

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>

dsouzai force-pushed the jitserverCRIU branch from 8751c58 to 7672c3c Compare April 26, 2023 18:52

mpirvu approved these changes Apr 26, 2023

mpirvu self-assigned this Apr 26, 2023

Add jitserver cmdline tests

ab92da8

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>

dsouzai force-pushed the jitserverCRIU branch from 7672c3c to ab92da8 Compare April 27, 2023 16:02

mpirvu merged commit 87bad8e into eclipse-openj9:master Apr 28, 2023

dsouzai mentioned this pull request Apr 28, 2023

CRIU Support Documentation Updates for 0.40 eclipse-openj9/openj9-docs#1092

Closed

dsouzai deleted the jitserverCRIU branch April 3, 2024 13:56
