-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize continuation walkframe processing for JIT codecache reclaim #17527
Conversation
3393df0
to
6d6b751
Compare
87c35eb
to
27bfedb
Compare
27bfedb
to
3ebdc0a
Compare
} | ||
} | ||
_extensions->didIterateContinuationListForJIT = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this any different than CopyForward iteration?
would be nice to share.... perhaps put it into IGGC (which can be accesses with ext->globalCollector)
or as a static method in MM_ContinuationObjectBufferVLHGC ?
the latter approach, if works, could be applied for sharing Scavenge and MarkingScheme variants (using BufferStandard), that are also similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, a cleaner approach is to put it into HeapRegionManager (it's higher level abstract than Buffer), since iteration assumes some knowledge about how regions are organized, and indeed we do have separate HRMs for Standard and VLHGC configurations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both MM_HeapRegionManager and MM_HeapRegionManagerStandard are in omr, if we put it into HeapRegionManager, we need to update omr code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just realized it. nah, scratch it. we should not be introducing Java spacific code to OMR
{ | ||
J9JavaVM *vm = (J9JavaVM*)env->getOmrVM()->_language_vm; | ||
J9JITConfig *jitConfig = vm->jitConfig; | ||
if (NULL == jitConfig->methodsToDelete) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we generalize this with a check like 'anything registered to J9HOOK_MM_WALKCONTINUATION' ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the check for jitConfig->methodsToDelete
to RootScanner
076931e
to
0e45f59
Compare
void | ||
MM_ScavengerRootClearer::iterateContinuationObjects(MM_EnvironmentBase *env) | ||
{ | ||
env->getGCEnvironment()->_continuationObjectBuffer->iterateContinuationObjects(env); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still thinking if Buffer is the right class to add iteration to, but if we do it, it should be a static method, otherwise we may end up calling it with _continuationObjectBuffer NULL. Probably it would still work, since the iteration would never access 'this' pointer, but still it would be cleaner as a static.
Being a static, it could not be a virtual, but that would still be ok, (Scavenger/Marker would explicitly call Standard variant of iteration, and CopyForwad/GlobalMarkingScheme would call VLHGC variant.)
RootScanner's iterate method state in base class as empty/pure virtual, so we can share the jit's condition.
8f66b5b
to
593b506
Compare
runtime/gc_trace/TgcRootScanner.cpp
Outdated
@@ -70,7 +70,7 @@ const static char *attributeNames[] = { | |||
"phantomrefscomplete", /* RootScannerEntity_PhantomReferenceObjectsComplete */ | |||
"unfinalizedobjectscomplete", /* RootScannerEntity_UnfinalizedObjectsComplete */ | |||
"ownablesynchronizerobjectscomplete", /* RootScannerEntity_OwnableSynchronizerObjectsComplete */ | |||
"continuationobjectscomplete", /* RootScannerEntity_ContinuationObjectsComplete */ | |||
"iteratingcontinuationobjects", /* RootScannerEntity_ContinuationObjectsComplete */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd keep these name consistent (revert it back to 'complete')
@@ -48,7 +48,8 @@ | |||
#include "FinalizableReferenceBuffer.hpp" | |||
#include "FinalizeListManager.hpp" | |||
#endif /* J9VM_GC_FINALIZATION*/ | |||
#include "ContinuationObjectBuffer.hpp" | |||
//#include "ContinuationObjectBuffer.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this, if really not needed
runtime/gc_base/RootScanner.cpp
Outdated
@@ -1044,6 +1050,12 @@ MM_RootScanner::scanClearable(MM_EnvironmentBase *env) | |||
scanOwnableSynchronizerObjects(env); | |||
scanContinuationObjects(env); | |||
|
|||
J9JavaVM *vm = (J9JavaVM*)env->getOmrVM()->_language_vm; | |||
J9JITConfig *jitConfig = vm->jitConfig; | |||
if (NULL != jitConfig->methodsToDelete) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could jitConfig be null (if running -Xint)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update it to
if ((NULL != jitConfig) && (NULL != jitConfig->methodsToDelete)) {
runtime/gc_base/RootScanner.hpp
Outdated
@@ -435,6 +435,7 @@ class MM_RootScanner : public MM_BaseVirtual | |||
*/ | |||
virtual void scanOwnableSynchronizerObjects(MM_EnvironmentBase *env); | |||
virtual void scanContinuationObjects(MM_EnvironmentBase *env); | |||
virtual void iterateAllContinuationObjects(MM_EnvironmentBase *env); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put some comments in this base RootScanner class for these 2 methods: that one is about pruning C.O. lists within a scope of current GC (for example just for Nursery in Scavenge) and the other one is about iterating all live COs in the heap and doing any processing that other parties may register through a hook.
593b506
to
711cea1
Compare
runtime/gc_base/RootScanner.hpp
Outdated
@@ -434,7 +434,16 @@ class MM_RootScanner : public MM_BaseVirtual | |||
* which modifies elements within the list. | |||
*/ | |||
virtual void scanOwnableSynchronizerObjects(MM_EnvironmentBase *env); | |||
/** | |||
* scanContinuationObjects is for pruning continuation Objects, which are dead or last unmounted, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
terminology: I think we prune a list or remove object from list, but not prune objects
there are some other minor mistakes: missing space, missing letter, inconsistent casing...
runtime/gc_base/RootScanner.hpp
Outdated
virtual void scanContinuationObjects(MM_EnvironmentBase *env); | ||
/** | ||
* iterateAllContinuationObjects is for iterating all live Contunuation Objects in the heap and doing any processing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incorrect spelling of Continuation
runtime/gc_base/RootScanner.hpp
Outdated
virtual void scanContinuationObjects(MM_EnvironmentBase *env); | ||
/** | ||
* iterateAllContinuationObjects is for iterating all live Contunuation Objects in the heap and doing any processing | ||
* that other parties may register through a hook (iterateAllContinuationObjects}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean to put the name of the hook within the brackets?
the brackets types are inconsistent
runtime/gc_api/HeapIteratorAPI.cpp
Outdated
if ((OMR_GC_CYCLE_TYPE_SCAVENGE == cycle_type) || (OMR_GC_CYCLE_TYPE_VLHGC_PARTIAL_GARBAGE_COLLECT == cycle_type) || | ||
(OMR_GC_CYCLE_TYPE_GLOBAL == cycle_type) || (OMR_GC_CYCLE_TYPE_VLHGC_GLOBAL_GARBAGE_COLLECT == cycle_type)) { | ||
if (extensions->didIterateContinuationListForJIT) { | ||
extensions->didIterateContinuationListForJIT = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we can get by without introducing this flag by registering JIT reclamation pass to CYCLE_END for all type of collectors (not just realtime). By doing that, we guarantee that stack walking pass for JIT (that we do now in clearable phase) is always done before reclamation pass. Clearable pass is done in the last increment of a cycle (if a cycle has more than one increment like concurrent scavenge, GMP, or realtime) when cycle end hook is also triggered that could do the reclamation.
8f2c1b0
to
c5a1d62
Compare
d5a7d08
to
4435f0a
Compare
4435f0a
to
e425ee2
Compare
jenkins test sanity aix,win jdk21 |
3e23a77
to
76e6f6b
Compare
jenkins test sanity aix,win jdk21 |
JIT code cache reclaim is generally triggered via end of GC event, it requires walk the java stack of all live continuation Objects to keep the related resource in code cache for the virtual threads, currently JIT uses single thread to iterate the global continuation list for the processing, it might generate up to 5-30% regression for throughput performance, use all of GC threads parallel iterating the list during GC clearable phase could improve CPUs usage and reduce the processing time(STW time). - parallel iterating continuation list at the end of GC clearable phase (except realtime) - register new J9HOOK_MM_WALKCONTINUATION event to callback jit java stack walk for code cache reclaim. - update that JIT code cache reclaim is only triggered by every end of GC Cycle event, it was triggered by both end of Local GC and end of of Global GC events, but some of end of GC events are unnecessary for code cache reclaim, such as concurrent scavenge has two STW GC phases (initializing at the begin and cleaning up at the end), at the end of concurrent scavenge initializing phase, there is not much GC work done yet, it would waster time for code cache reclamation at the point, also currently there are no different processing specific for local GC and global GC(if we need to handle the difference in future, we still can use "env->_cycleState->_type" to figure out local and global GC). Signed-off-by: Lin Hu <linhu@ca.ibm.com>
76e6f6b
to
b2d8b91
Compare
jenkins test sanity aix,win jdk21 |
The PR builds have passed. Merging in lieu of @amicic who has approved the PR and is currently away. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should either be fixed or reverted.
@@ -4800,6 +4800,7 @@ C|UTILIZATION_WINDOW_SIZE | |||
S|MmhookConstants|MmhookConstantsPointer| | |||
C|J9HOOK_MM_CLASS_UNLOADING_END | |||
C|J9HOOK_MM_INTERRUPT_COMPILATION | |||
C|J9HOOK_MM_WALKCONTINUATION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dmitri is correct: new constants cannot be added to this file.
@@ -4800,6 +4800,7 @@ C|UTILIZATION_WINDOW_SIZE | |||
S|MmhookConstants|MmhookConstantsPointer| | |||
C|J9HOOK_MM_CLASS_UNLOADING_END | |||
C|J9HOOK_MM_INTERRUPT_COMPILATION | |||
C|J9HOOK_MM_WALKCONTINUATION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't think it was necessary to add it.
#17986 addresses my concerns. |
jitCodeCache reclamation would be triggered by GC cycle end event, For scavenge backout case, at the end of cycle, heap would be rolled back to before scavenge cycle start, and another global GC would start right after the end of backout scavenge, so there is no need to do jitCodeCache reclamation at the end of backout scavenge. also from Java21 the partial reclamation(related with virtualThread) processing has been split to move to clearable phase of GC in order to improve the processing performance (PR:eclipse-openj9#17527), but for scavenge backout case, the backout only could happen before the first part of reclamation processing(the processing would never get a chance to run for the case), without the first part reclamation processing, the second part of reclamation processing, which triggered by GC cycle end, would be possible to reclaim the "live" stack jit frames, then cause the crash. so we need to skip jitCodeCache reclamation for scavenge backout case. - pass unsuccessful state via the cycleType of Cycle End event to the jitcodecache relamation hook for backout case. - check if unsuccessful state is set in the hook, if it is true, skip jitcodecache relamation processing(second part). Signed-off-by: hulin <linhu@ca.ibm.com>
jitCodeCache reclamation would be triggered by GC cycle end event, For scavenge backout case, at the end of cycle, heap would be rolled back to before scavenge cycle start, and another global GC would start right after the end of backout scavenge, so there is no need to do jitCodeCache reclamation at the end of backout scavenge. also from Java21 the partial reclamation(related with virtualThread) processing has been split to move to clearable phase of GC in order to improve the processing performance (PR:eclipse-openj9#17527), but for scavenge backout case, the backout only could happen before the first part of reclamation processing(the processing would never get a chance to run for the case), without the first part reclamation processing, the second part of reclamation processing, which triggered by GC cycle end, would be possible to reclaim the "live" stack jit frames, then cause the crash. so we need to skip jitCodeCache reclamation for scavenge backout case. - pass unsuccessful state via the cycleType of Cycle End event to the jitcodecache relamation hook for backout case. - check if unsuccessful state is set in the hook, if it is true, skip jitcodecache relamation processing(second part). Signed-off-by: hulin <linhu@ca.ibm.com>
jitCodeCache reclamation would be triggered by GC cycle end event, For scavenge backout case, at the end of cycle, heap would be rolled back to before scavenge cycle start, and another global GC would start right after the end of backout scavenge, so there is no need to do jitCodeCache reclamation at the end of backout scavenge. also from Java21 the partial reclamation(related with virtualThread) processing has been split to move to clearable phase of GC in order to improve the processing performance (PR:eclipse-openj9#17527), but for scavenge backout case, the backout only could happen before the first part of reclamation processing(the processing would never get a chance to run for the case), without the first part reclamation processing, the second part of reclamation processing, which triggered by GC cycle end, would be possible to reclaim the "live" stack jit frames, then cause the crash. so we need to skip jitCodeCache reclamation for scavenge backout case. - pass unsuccessful state via the cycleType of Cycle End event to the jitcodecache relamation hook for backout case. - check if unsuccessful state is set in the hook, if it is true, skip jitcodecache relamation processing(second part). Signed-off-by: hulin <linhu@ca.ibm.com>
JOptimize continuation walkframe processing for JIT codecache reclaim
JIT code cache reclaim is generally triggered via end of GC event,
it requires walk the java stack of all live continuation Objects to
keep the related resource in code cache for the virtual threads,
currently JIT uses single thread to iterate the global continuation list
for the processing, it might generate up to 5-30% regression for
throughput performance, use all of GC threads parallel iterating the
list during GC clearable phase could improve CPUs usage and reduce the
processing time(STW time).
(except realtime)
stack walk for code cache reclaim.
GC Cycle event, it was triggered by both end of Local GC and end of of
Global GC events, but some of end of GC events are unnecessary for
code cache reclaim, such as concurrent scavenge has two STW GC phases
(initializing at the begin and cleaning up at the end), at the end of
concurrent scavenge initializing phase, there is not much GC work done
yet, it would waster time for code cache reclamation at the point,
also currently there are no different processing specific for local GC
and global GC(if we need to handle the difference in future, we still
can use "env->_cycleState->_type" to figure out local and global GC).
Signed-off-by: Lin Hu linhu@ca.ibm.com