New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix JIT System.nanoTime() in checkpoint mode #15609
Fix JIT System.nanoTime() in checkpoint mode #15609
Conversation
@r30shah can you look at the Z changes in particular? |
@0xdaryl can you or someone else familiar with the x86 codegen look at that part? |
@@ -9240,6 +9240,17 @@ inlineNanoTime( | |||
// result = reg + result | |||
generateRegMemInstruction(TR::InstOpCode::LEA8RegMem, node, result, generateX86MemoryReference(reg, result, 0, cg), cg); | |||
|
|||
#if defined(J9VM_OPT_CRIU_SUPPORT) | |||
if (fej9->inSnapshotMode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a correct check to determine if nano time adjustment is required.
openj9/runtime/compiler/env/VMJ9.cpp
Lines 9345 to 9348 in a57e6d9
TR_J9VMBase::inSnapshotMode() | |
{ | |
#if defined(J9VM_OPT_CRIU_SUPPORT) | |
return getJ9JITConfig()->javaVM->internalVMFunctions->isCheckpointAllowed(vmThread()); |
isCheckpointAllowed()
is TRUE
before a checkpoint taken in which nanoTimeMonotonicClockDelta
is zero and no adjustment required though it won't hurt to do so.However
isCheckpointAllowed()
is FALSE
after a checkpoint taken at CRIURestoreNonPortableMode
which is a main mode in use, though a checkpoint is not allowed nanoTimeMonotonicClockDelta
is not zero after a checkpoint, and adjustment is needed.I think you could always apply
result = result - nanoTimeMonotonicClockDelta
, or retrieve nanoTimeMonotonicClockDelta
value first, only do the subtraction if it is non-zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, you're right. I assumed it was true on both sides of the C/R.
I'm going to rework the JIT queries to add one that uses isCRIUSupportEnabled
(and make it clearer as to what question each query is answering) instead of looking at the the value of nanoTimeMonotonicClockDelta
because we'll likely have other cases where we need to generate C/R-aware code after restore.
I modified the tests to have separate test cases for pre and post C/R compilation and the post one catches this bug.
@@ -85,6 +85,7 @@ | |||
<variation>-Xjit:count=0 -XX:+CRIURestoreNonPortableMode</variation> | |||
</variations> | |||
<command> | |||
TR_Options=$(Q)exclude={org/openj9/criu/TimeChangeTest.nanoTimeInt()J},dontInline={org/openj9/criu/TimeChangeTest.nanoTimeInt()J|org/openj9/criu/TimeChangeTest.nanoTimeJit()J}$(Q) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest move the JIT option into the variation section above, there is a mode -Xint
apparently conflict with this TR_Options
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TR_Options
should not conflict with -Xint
because the JIT will not be initialized and the env var will be ignored by the rest of the components (unless someone decides to use it for other purposes).
Initially I tried setting these in the variations section, but the braces need to be quoted or escaped when being passed through the shell and every combination I tried failed.
I think it's probably better to use the current approach anyhow because someone adding a variation in the future will need to be aware of whether or not the JIT and/or AOT compilers are enabled by their options and duplicate the string, otherwise the tests will be ineffective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@llxia any insights to add the command line option #15609 (comment) into variation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it is easier to put in command in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears there was a problem to do so - the braces need to be quoted or escaped when being passed through the shell and every combination I tried failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had some offline discussion with Lan, the problem is between cmdlinetester
and criuScript.sh
, an additional option along w/ a separate playlist test target might be required to accommodate such JIT options hence out of scope of this PR.
There is no conflict settings introduced by TR_Options
in this testcase, current approach is fine.
This query tells us whether snapshot mode is enabled, which is true for life of the process, regardless of whether any snapshots/checkpoints or restores have happened. Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
The x86 codegen inlines a direct call to clock_gettime() in place of calls to System.nanoTime(). In checkpoint mode the JVM adjusts the value returned by System.nanoTime() by subtracting any elapsed time between checkpoint and restore. The JIT implementation therefore needs to do the same. Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
The getSupportsCurrentTimeMaxPrecision() query is currently used by Z to indicate to the Simplifier that it supports transforming System.nanoTime() calls into calls to currentTimeMaxPrecision(). The getSupportsMaxPrecisionMilliTime() query is used similarly for transforming System.currentTimeMillis(), but only if getSupportsCurrentTimeMaxPrecision() returns true as well. This means that we can't selectively disable the nanoTime() transformation without disabling the currentTimeMillis() one, but we can do the reverse. This patch changes Simplifier to check getSupportsCurrentTimeMaxPrecision() for nanoTime() and getSupportsMaxPrecisionMilliTime() for currentTimeMillis() to allow each transformation to be enabled independently of the other. Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
In checkpoint mode the generated code needs to subtract the value of nanoTimeMonotonicClockDelta from the result. For now, disable this optimization. Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
System.nanoTime() needs special handling in checkpoint mode. Since the JIT has an optimized version on some platforms it needs testing to verify that it is behaving as expected. This test case checks the JIT against the non-JIT implementation in the simple scenario where the reference clock moves forward between checkpoint and restore. It does not handle the reverse case, since there's no easy way to set that up. Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
Compiled methods containing System.nanoTime() calls need to be checkpoint/restore aware regardless of whether or not future checkpoints are still possible at compile time (unlike other cases that only need to be compiled aware if future checkpoints are possible and can be compiled conventionally otherwise). Split the test case into two separate tests, one that provokes compilation before the checkpoint and one after. Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
Apply count=1 to nanoTimeJit() to try to force its compilation as early as possible so that the compiled version will be invoked during the test, otherwise the test will be ineffective. This solution does not guarantee compilation, but it's the biggest hammer we have aside from `disableAsyncCompilation`, which is global and would affect all compiled code, test run time, and all test variations (since this option is being applied to all variations). Note that count=1 is better than count=0 here since 0 will most likely result in calls to System.nanoTime() being unresolved at compile time, which prevents the optimization that we are trying to verify. Signed-off-by: Younes Manton <ymanton@ca.ibm.com>
0a37276
to
d3cdc1b
Compare
I've
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Z Changes LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The VM related change looks good.
Jenkins test sanity.functional xlinuxcriu jdk11 |
One failure, but it doesn't look related to timing or checkpoint/restore functionality. #15617 was recently merged and looks related. @mpirvu does this crash look familiar?
|
I'll have a look as the failure looks related to my changes. |
I talked to Irwin and he doesn't see how the pointer we get from |
It doesn't look like this test does any checkpointing at all. |
Tried to reproduce this problem here: https://openj9-jenkins.osuosl.org/job/Grinder/1240/ but tests passed |
50 runs of the problematic test passed (https://openj9-jenkins.osuosl.org/job/Grinder/1243/). |
All new criu tests are working as expected. Will merge now |
The attached patches
System.nanoTime()
implementation to do the right thing in checkpoint modeSystem.nanoTime()
behaves as expected in checkpoint modeFixes #15513