Delay lock-related operations in single thread mode #14583
981f821 to 68dcc4e
@DanHeidinga Please review these changes
91599b5 to 9adeb6b
runtime/nls/j9cl/j9jcl.nls
Outdated
@@ -515,3 +515,11 @@ J9NLS_JCL_CRIU_FAILED_TO_RUN_INTERNAL_RESTORE_HOOKS.explanation=An error occured
J9NLS_JCL_CRIU_FAILED_TO_RUN_INTERNAL_RESTORE_HOOKS.system_action=The JVM will throw a RestoreException.
J9NLS_JCL_CRIU_FAILED_TO_RUN_INTERNAL_RESTORE_HOOKS.user_response=View CRIU documentation to determine how to resolve the error.
# END NON-TRANSLATABLE

J9NLS_JCL_CRIU_FAILED_DELAY_INDENTIY_OPS=Could not run delayed identity operations succesfully, errno=%li
-J9NLS_JCL_CRIU_FAILED_DELAY_INDENTIY_OPS=Could not run delayed identity operations succesfully, errno=%li
+J9NLS_JCL_CRIU_FAILED_DELAY_INDENTIY_OPS=Could not run delayed lock-related operations succesfully, errno=%li
Identity makes sense in the context of Valhalla but less so here. Should we just call it lock-related? Or are there other operations as well?
I thought there would be more, but now I think it will just be wait/notify* that we handle in this manner
runtime/nls/j9cl/j9jcl.nls
Outdated
J9NLS_JCL_CRIU_FAILED_DELAY_INDENTIY_OPS=Could not run delayed identity operations succesfully, errno=%li
# START NON-TRANSLATABLE
J9NLS_JCL_CRIU_FAILED_DELAY_INDENTIY_OPS.sample_input_1=1
J9NLS_JCL_CRIU_FAILED_DELAY_INDENTIY_OPS.explanation=An error occured when the JVM attempted to run delayed identity operations.
Same concern with 'identity' here
runtime/vm/BytecodeInterpreter.hpp
Outdated
} else {
#endif /* defined(J9VM_OPT_CRIU_SUPPORT) */
if (VM_ObjectMonitor::getMonitorForNotify(_currentThread, receiver, &monitorPtr, true)) {
-} else {
-#endif /* defined(J9VM_OPT_CRIU_SUPPORT) */
-if (VM_ObjectMonitor::getMonitorForNotify(_currentThread, receiver, &monitorPtr, true)) {
+} else
+#endif /* defined(J9VM_OPT_CRIU_SUPPORT) */
+{
+if (VM_ObjectMonitor::getMonitorForNotify(_currentThread, receiver, &monitorPtr, true)) {
If you put the opening { outside the ifdef, you can avoid the extra ifdef to close it later. Basically, treat it like an extra (otherwise unnecessary) scope that is only an else block for the ifdef case.
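A minimal, self-contained sketch of this brace placement may help. The macro `DEMO_OPT_CRIU_SUPPORT` and the function `handleNotify` are hypothetical stand-ins for illustration, not OpenJ9 code:

```cpp
#include <string>

/* Hypothetical stand-in for the real build flag; remove this line to build without it. */
#define DEMO_OPT_CRIU_SUPPORT

/* Hypothetical helper: reports which path handled the notify request. */
std::string handleNotify(bool singleThreadMode)
{
	std::string result;
	(void)singleThreadMode; /* unused when the flag is not defined */
#if defined(DEMO_OPT_CRIU_SUPPORT)
	if (singleThreadMode) {
		result = "delayed";
	} else
#endif /* defined(DEMO_OPT_CRIU_SUPPORT) */
	{
		/* With the flag this block is the else branch; without it, a plain scope. */
		result = "immediate";
	}
	return result;
}
```

Because the opening brace sits after the `#endif`, the closing brace needs no second `#if`/`#endif` pair, and the code compiles the same way whether or not the flag is defined.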
runtime/vm/CRIUHelpers.cpp
Outdated
#define JAVA_UTIL_RANDOM "java/util/Random"
J9Class *juRandomClass = hashClassTableAt(vm->systemClassLoader, (U_8 *)JAVA_UTIL_RANDOM, LITERAL_STRLEN(JAVA_UTIL_RANDOM));
juRandomClass = hashClassTableAt(vm->systemClassLoader, (U_8 *)JAVA_UTIL_RANDOM, LITERAL_STRLEN(JAVA_UTIL_RANDOM));
This needs to be protected by the classTableMutex, or use the peekClassHashTable function instead.
}
if (!VM_ObjectMonitor::getMonitorForNotify(currentThread, instance, &monitorPtr, true)) {
if (NULL != monitorPtr) {
/* another thread owns the lock, shouldn't be possible */
Tracepoint
runtime/vm/CRIUHelpers.cpp
Outdated
J9Pool *delayedSingleThreadModeOpRecords = vm->checkpointState.delayedSingleThreadModeOpRecord;
J9InternalVMFunctions* vmFuncs = vm->internalVMFunctions;
J9DelayedSingleThreadModeOpRecord *delayedSingleThreadModeOp = (J9DelayedSingleThreadModeOpRecord*) pool_startDo(delayedSingleThreadModeOpRecords, &walkState);
omrthread_monitor_t monitorPtr = NULL;
Can you declare this in the loop to shorten its lifetime?
runtime/vm/CRIUHelpers.cpp
Outdated
}

monitorPtr = NULL;
vmFuncs->j9jni_deleteGlobalRef((JNIEnv*) currentThread, delayedSingleThreadModeOp->globalObjectRef, JNI_FALSE);
Each item should be removed from the pool after it has been notified. Use pool_removeElement, whose documentation says it is safe to use during an iteration.
@@ -4070,11 +4070,20 @@ typedef struct J9InternalHookRecord {
struct J9Pool *instanceObjects;
} J9InternalHookRecord;

typedef struct J9DelayedSingleThreadModeOpRecord {
jobject globalObjectRef;
UDATA operation;
Does pool iteration guarantee FIFO order? That is, the first notify we delay should be the first we perform after exiting single-threaded mode, the second should be the second, and so on.
If the pool iteration order isn't guaranteed (and documented!), I think these should have a J9DelayedSingleThreadModeOpRecord *next field as well, and they should be stored in a linked list. The pool is fine for backing the allocation, but we should iterate through the list to do the notifies.
We would likely need two pointers, one for the head of the list and one for the tail, so we can insert quickly at the tail and iterate from the head.
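To illustrate the head/tail idea, here is a minimal sketch of a singly linked FIFO queue. The names `DelayedOpRecord`, `DelayedOpQueue`, `enqueue`, and `replayInOrder` are hypothetical; the real record has different fields, and in the VM the storage would come from the pool rather than from the caller:

```cpp
#include <vector>

/* Hypothetical record: only the fields needed to show the ordering. */
struct DelayedOpRecord {
	int operation;
	DelayedOpRecord *next;
};

/* Head for iteration, tail for constant-time insertion. */
struct DelayedOpQueue {
	DelayedOpRecord *head = nullptr;
	DelayedOpRecord *tail = nullptr;
};

/* Append a record at the tail so earlier records stay ahead of it. */
void enqueue(DelayedOpQueue &queue, DelayedOpRecord *record)
{
	record->next = nullptr;
	if (nullptr == queue.tail) {
		queue.head = record;
	} else {
		queue.tail->next = record;
	}
	queue.tail = record;
}

/* Walk from the head and collect operations in the order they were queued. */
std::vector<int> replayInOrder(const DelayedOpQueue &queue)
{
	std::vector<int> order;
	for (const DelayedOpRecord *record = queue.head; nullptr != record; record = record->next) {
		order.push_back(record->operation);
	}
	return order;
}
```

With this layout the iteration order depends only on the `next` links, so it does not matter in which order the backing allocator hands out or walks its elements.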
8a6729e to b61b9cd
@DanHeidinga Changes are ready for another look
2e72dac to d375fd8
66b4528 to 8eaadf6
I fixed a small issue with not releasing locks when there are no waiters and added more tracepoints. Changes are ready for another look.
vmFuncs->j9jni_deleteGlobalRef((JNIEnv*) currentThread, delayedLockingOperation->globalObjectRef, JNI_FALSE);
pool_removeElement(vm->checkpointState.delayedLockingOperationsRecords, delayedLockingOperation);
delayedLockingOperation = J9_LINKED_LIST_NEXT_DO(vm->checkpointState.delayedLockingOperationsRoot, delayedLockingOperation);
This is a use-after-free issue: you've removed the element from the pool (which frees it) and then dereferenced it to find the next element.
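The usual way to avoid this is to read the link to the next element before releasing the current one. A minimal sketch with plain `new`/`delete` standing in for the pool (the `Node`, `push`, and `drain` names are hypothetical):

```cpp
/* Hypothetical list node; delete stands in for returning an element to the pool. */
struct Node {
	int value;
	Node *next;
};

/* Prepend a value and return the new head. */
Node *push(Node *head, int value)
{
	return new Node{value, head};
}

/* Process and free every node, returning the sum of the values seen. */
int drain(Node *head)
{
	int sum = 0;
	Node *current = head;
	while (nullptr != current) {
		sum += current->value;
		/* Read the link while current is still valid ... */
		Node *next = current->next;
		/* ... then release it; current must not be touched after this line. */
		delete current;
		current = next;
	}
	return sum;
}
```

The same ordering applies to any allocator: fetch the successor first, free second.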
Some minor formatting issues: shrink variable lifetimes and use C++-style casts. Otherwise fine.
runtime/vm/CRIUHelpers.cpp
Outdated
}

vmFuncs->j9jni_deleteGlobalRef((JNIEnv*) currentThread, delayedLockingOperation->globalObjectRef, JNI_FALSE);
lastOperation = delayedLockingOperation;
Given this is a cpp file, can we declare lastOperation here rather than at the top of the scope? It makes it easier to reason about the lifetime of the variable.
-lastOperation = delayedLockingOperation;
+J9DelayedLockingOpertionsRecord *lastOperation = delayedLockingOperation;
runtime/vm/CRIUHelpers.hpp
Outdated
goto throwOOM;
}

newRecord = (J9DelayedLockingOpertionsRecord*) pool_newElement(vm->checkpointState.delayedLockingOperationsRecords);
-newRecord = (J9DelayedLockingOpertionsRecord*) pool_newElement(vm->checkpointState.delayedLockingOperationsRecords);
+J9DelayedLockingOpertionsRecord *newRecord = static_cast<J9DelayedLockingOpertionsRecord*>(pool_newElement(vm->checkpointState.delayedLockingOperationsRecords));
I can't move this declaration either, because of the gotos.
runtime/vm/CRIUHelpers.cpp
Outdated
{
J9JavaVM *vm = currentThread->javaVM;
J9InternalVMFunctions* vmFuncs = vm->internalVMFunctions;
J9DelayedLockingOpertionsRecord *delayedLockingOperation = (J9DelayedLockingOpertionsRecord*) J9_LINKED_LIST_START_DO(vm->checkpointState.delayedLockingOperationsRoot);
-J9DelayedLockingOpertionsRecord *delayedLockingOperation = (J9DelayedLockingOpertionsRecord*) J9_LINKED_LIST_START_DO(vm->checkpointState.delayedLockingOperationsRoot);
+J9DelayedLockingOpertionsRecord *delayedLockingOperation = static_cast<J9DelayedLockingOpertionsRecord*>(J9_LINKED_LIST_START_DO(vm->checkpointState.delayedLockingOperationsRoot));
runtime/vm/CRIUHelpers.cpp
Outdated
#define JAVA_UTIL_RANDOM "java/util/Random"
J9Class *juRandomClass = hashClassTableAt(vm->systemClassLoader, (U_8 *)JAVA_UTIL_RANDOM, LITERAL_STRLEN(JAVA_UTIL_RANDOM));
juRandomClass = peekClassHashTable(currentThread, vm->systemClassLoader, (U_8 *)JAVA_UTIL_RANDOM, LITERAL_STRLEN(JAVA_UTIL_RANDOM));
-juRandomClass = peekClassHashTable(currentThread, vm->systemClassLoader, (U_8 *)JAVA_UTIL_RANDOM, LITERAL_STRLEN(JAVA_UTIL_RANDOM));
+J9Class *juRandomClass = peekClassHashTable(currentThread, vm->systemClassLoader, (U_8 *)JAVA_UTIL_RANDOM, LITERAL_STRLEN(JAVA_UTIL_RANDOM));
I can't move this one because of the jump to done:
It might require an additional scope so the change becomes:
{
J9Class *juRandomClass = peekClassHashTable(currentThread, vm->systemClassLoader, (U_8 *)JAVA_UTIL_RANDOM, LITERAL_STRLEN(JAVA_UTIL_RANDOM));
....
if (NULL != juRandomClass) {
addInternalJVMCheckpointHook(currentThread, TRUE, juRandomClass, FALSE, juRandomReseed);
}
}
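For background, C++ rejects a `goto` that jumps past an initialized declaration into that declaration's scope. Wrapping the declaration in its own block ends its scope before the label, so the jump becomes legal. A minimal sketch, with the hypothetical names `registerHook` and `className` (not the OpenJ9 function):

```cpp
/* Hypothetical function: returns -1 on the early exit, 1 when the hook is registered. */
int registerHook(bool earlyFailure)
{
	int rc = 0;
	if (earlyFailure) {
		rc = -1;
		goto done;
	}
	{
		/* The extra scope ends before the label, so the goto above
		 * does not bypass this initialization. */
		const char *className = "java/util/Random";
		if (nullptr != className) {
			rc = 1;
		}
	}
done:
	return rc;
}
```

Without the inner braces, `className` would still be in scope at `done:` and the compiler would report that the jump bypasses its initialization.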
But I'm OK if you want to leave it as is. The method's clear enough that this isn't critical
3f85328 to c66c635
@DanHeidinga The PR is ready for another look
@tajila do we have CRIU-enabled PR builds? If so, do they need a special trigger phrase?
I've made the changes
No, unfortunately we don't have machine support yet
Jenkins test sanity win jdk11
The windows PR build is failing due to code reaching for
Jenkins test sanity win jdk11
Jenkins test sanity win jdk11
Jenkins test sanity win jdk11
Jenkins test sanity aix jdk8
Jenkins test sanity xlinux jdk17
Delay lock-related operations in single thread mode
This PR delays notify/notifyAll operations by intercepting the
Object.notify* INLs and saving the instances and operations to a queue.
These instances must be saved as global JNI refs as we cannot save them
in localref storage and disallowing a GC is not feasible. At the end of
the single thread phase all the operations will be executed by the
checkpointing thread.
See #14584
Handles case 4) Notify
Signed-off-by: Tobi Ajila <atobia@ca.ibm.com>