Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC Thread Pool Tuning for CRIU #16385

Merged
merged 1 commit into from
Dec 9, 2022
Merged

Conversation

RSalman
Copy link
Contributor

@RSalman RSalman commented Nov 30, 2022

Integrate CRIU support with new OMR GC APIs for thread pool adjustments, contract the thread pool for checkpoint and expand it for restore.

For background, see eclipse/omr#6831.

  • Introduced j9gc_prepare_for_checkpoint and j9gc_reinitialize_for_restore APIs to consolidate GC specific routines used for checkpoint/restore
    Report J9NLS_GC_FAILED_TO_INSTANTIATE_TASK_DISPATCHER on j9gc_reinitialize_for_restore failure
  • Introduced -XX:CheckpointGCThreads= option to allow the user to tune checkpoint thread count

Depends on: eclipse/omr#6831

Signed-off-by: Salman Rana salman.rana@ibm.com

@RSalman RSalman force-pushed the criu-final branch 2 times, most recently from 5337bae to d7e2c90 Compare November 30, 2022 04:27
@dmitripivkine dmitripivkine added comp:gc criu Used to track CRIU snapshot related work labels Nov 30, 2022
@RSalman
Copy link
Contributor Author

RSalman commented Nov 30, 2022

@tajila @amicic @dmitripivkine please have a look

runtime/vm/vmthread.cpp Outdated Show resolved Hide resolved
@RSalman
Copy link
Contributor Author

RSalman commented Dec 1, 2022

Following changes made in latest commit to address review comments:

  • MM API:
    • Renamedj9gc_criu_checkPoint -> j9gc_prepare_for_checkpoint
    • Renamed j9gc_criu_restore -> j9gc_reinitialize_for_restore
    • Generalized to call a higher level class for checkpoint/restore, call goes through configuration now rather than directly to dispatcher
  • Removed restoreGCthreadCount option
  • Extended threadCleanup (thread shutdown) condition with !VM_CRIUHelpers::isJVMInSingleThreadMode(vm) to ensure non-java threads don't run cleanup code in single thread mode

@RSalman
Copy link
Contributor Author

RSalman commented Dec 1, 2022

@tajila I'm trying to understand the failure path for restore.

We're thinking to fail if we can't startup threads on restore, something like this:

if (FALSE == vm->memoryManagerFunctions->j9gc_reinitialize_for_restore(currentThread)) {
			//FAIL PATH, return error message?
}

This will be similar to whats done with JVM startup when initing GC components and starting threads. If the dispatcher fails to startup the threads, then an error is returned and error message is printed:

gcStartupHeapManagement(J9JavaVM *javaVM)
{
...
	if (!extensions->dispatcher->startUpThreads()) {
		extensions->dispatcher->shutDownThreads();
		result = JNI_ENOMEM;
	}

	if (JNI_OK != result) {
		PORT_ACCESS_FROM_JAVAVM(javaVM);
		extensions->getGlobalCollector()->collectorShutdown(extensions);
		j9nls_printf(PORTLIB, J9NLS_ERROR, J9NLS_GC_FAILED_TO_STARTUP_GARBAGE_COLLECTOR);
		return result;
	}

@RSalman
Copy link
Contributor Author

RSalman commented Dec 1, 2022

I suppose we can forgo failing and simply continue on with the threads in checkpoint + how many ever we were able to startup

@tajila
Copy link
Contributor

tajila commented Dec 1, 2022

//FAIL PATH, return error message?

you can add an error message similar to the others describing what went wrong and thrown a JVMRestoreException

@tajila
Copy link
Contributor

tajila commented Dec 1, 2022

I suppose we can forgo failing and simply continue on with the threads in checkpoint + how many ever we were able to startup

It may be better to fail than to restore incorrectly. If we cant restore GC threads, theres a good chance will hit the OOM somewhere else.

@RSalman RSalman force-pushed the criu-final branch 7 times, most recently from 5ca2e94 to ca718bb Compare December 6, 2022 00:22
@RSalman RSalman changed the title [WIP] GC Thread Pool Tuning for CRIU GC Thread Pool Tuning for CRIU Dec 6, 2022
@amicic
Copy link
Contributor

amicic commented Dec 8, 2022

I'll let @tajila handle the final review/testing/merge

@tajila
Copy link
Contributor

tajila commented Dec 8, 2022

Jenkins test sanity xlinuxcriu jdk17

@tajila
Copy link
Contributor

tajila commented Dec 8, 2022

@RSalman

18:41:26  /home/jenkins/workspace/Build_JDK17_x86-64_linux_criu_Personal/openj9/runtime/criusupport/criusupport.cpp:654:32: error: 'J9JavaVM' {aka 'struct J9JavaVM'} has no member named 'criuJVMRestoreExceptionClass'
18:41:26    654 |    currentExceptionClass = vm->criuJVMRestoreExceptionClass;

You need to rebase your changes. It is vm->checkpointState.criuJVMRestoreExceptionClass now

Integrate CRIU support with new OMR GC APIs for thread pool adjustments,
contract the thread pool for checkpoint and expand it for restore.

For background, see eclipse/omr#6831.

- Introduced j9gc_prepare_for_checkpoint and
j9gc_reinitialize_for_restore APIs to consolidate  GC specific routines
used for checkpoint/restore
  Report J9NLS_GC_FAILED_TO_INSTANTIATE_TASK_DISPATCHER on
j9gc_reinitialize_for_restore failure
- Introduced -XX:CheckpointGCThreads= option to allow the user to tune
checkpoint thread count

Depends on: eclipse/omr#6831

Signed-off-by: Salman Rana <salman.rana@ibm.com>
@tajila
Copy link
Contributor

tajila commented Dec 9, 2022

Jenkins test sanity xlinuxcriu jdk17

@tajila tajila merged commit 38c0a73 into eclipse-openj9:master Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:gc criu Used to track CRIU snapshot related work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants