CRIU enabled container image #18229

Open
fipro78 opened this issue Oct 4, 2023 · 42 comments
Labels
comp:vm criu Used to track CRIU snapshot related work

Comments

@fipro78

fipro78 commented Oct 4, 2023

As @tajila offered, I am raising a question about OpenJ9 CRIU here. My tests are based on container images to compare container size and startup behavior. I noticed that

ibm-semeru-runtimes:open-17-jre-focal
ibm-semeru-runtimes:open-17-jdk-focal
contain the openj9.criu module, but don't seem to include CRIU support.

ibm-semeru-runtimes:open-17-jre-jammy
ibm-semeru-runtimes:open-17-jdk-jammy
don't even seem to contain the openj9.criu module.

ibm-semeru-runtimes:open-17-jre-centos7
ibm-semeru-runtimes:open-17-jdk-centos7
contain the openj9.criu module, but don't seem to include CRIU support.

For the -centos7 and the -focal containers I get the following exception on startup if I try to execute code that uses CRIUSupport:

org.eclipse.openj9.criu.SystemCheckpointException: The JVM attempted to load libcriu.so but was unable to: 1
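A quick way to confirm whether the native library is present in an image is to search the usual library directories for it. This is only a minimal sketch: the helper name `find_libcriu` is hypothetical, and the directories in the usage line are assumptions about where a distribution might install the library.

```shell
# Hypothetical helper: prints the first file named libcriu.so* found under the
# given directories and succeeds; fails if none of them contains such a file.
find_libcriu() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        match="$(find "$dir" -name 'libcriu.so*' 2>/dev/null | head -n 1)"
        if [ -n "$match" ]; then
            printf '%s\n' "$match"
            return 0
        fi
    done
    return 1
}

# Usage inside the container (directories are assumptions, not executed here):
#   find_libcriu /usr/lib /usr/lib64 /usr/local/lib || echo "libcriu.so not found"
```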

Reading the CRIU support page I assumed that CRIU support is available in the Semeru container images.

Support for the Checkpoint/Restore In Userspace (CRIU) tool is currently provided as a technical preview in container environments. CRIU support is available for the customized CRIU version that is packaged with the Semeru container image. This preview is supported for use in production environments, however, all APIs and command-line options are subject to change.

Why do the -focal and the -centos7 images contain the openj9.criu module, but the -jammy images don't?

Why is CRIU support not included at the OS level in the Semeru container images?

Is there any container image I could simply use for my evaluation, or do I need to adapt the following file somehow and see what I can get running?
https://github.com/ibmruntimes/InstantOnStartupGuide/blob/main/Containerfiles/Containerfile.ubuntu22.unprivileged

@fipro78 fipro78 added the criu Used to track CRIU snapshot related work label Oct 4, 2023
@tajila
Contributor

tajila commented Oct 5, 2023

For the -centos7 and the -focal containers I get the following exception on startup if I try to execute code that uses CRIUSupport:

Unfortunately, there is an issue with our docs. We only package criu with the ubi8 and 9 images. We have a custom version that we release on UBI that includes unprivileged and random UID support and other bug fixes. We are working on releasing a similar version for docker images.

Why do the -focal and the -centos7 images contain the openj9.criu module, but the -jammy images don't?

I tried the following:

tobi@summoned1:~/criuStuff$ sudo docker image pull docker.io/library/ibm-semeru-runtimes:open-17-jdk-jammy
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Trying to pull docker.io/library/ibm-semeru-runtimes:open-17-jdk-jammy...
Getting image source signatures
Copying blob 707e32e9fc56 skipped: already exists
Copying blob 741f802a3dd9 skipped: already exists
Copying blob 1ed42af34167 skipped: already exists
Copying blob d1aa882d2666 skipped: already exists
Copying config 77b2fca4c4 done
Writing manifest to image destination
Storing signatures
77b2fca4c4c4ec124c2e430a4bd99cb799de2be65abdb23059045f384e6ab3df
tobi@summoned1:~/criuStuff$ sudo docker run -it ibm-semeru-runtimes:open-17-jdk-jammy /bin/bash
root@114460984ea8:/# ls /opt/java/openjdk/jmods/openj9.criu.jmod
/opt/java/openjdk/jmods/openj9.criu.jmod
root@114460984ea8:/#

It seems to be there for me.
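Besides looking for the jmod file, the module list reported by the runtime itself can be checked. The following is a minimal sketch, assuming `java --list-modules` prints one `name@version` entry per line; the helper name `has_module` is hypothetical.

```shell
# Hypothetical helper: succeeds if the module list on stdin contains the given module.
# Entries are expected in the `name@version` form printed by `java --list-modules`.
has_module() {
    module="$1"
    # Strip the @version suffix, then look for an exact, whole-line match.
    sed 's/@.*$//' | grep -Fqx "$module"
}

# Usage inside the container (not executed here):
#   java --list-modules | has_module openj9.criu && echo "openj9.criu present"
```

Checking the runtime's own module list also covers images that ship no jmods directory.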

@tajila
Contributor

tajila commented Oct 5, 2023

Is there any container image I could simply use for my evaluation, or do I need to adapt the following file somehow and see what I can get running?

For evaluation I recommend using the UBI8 or 9 images.
docker pull icr.io/appcafe/ibm-semeru-runtimes:open-17-jdk-ubi9-amd64
docker pull icr.io/appcafe/ibm-semeru-runtimes:open-17-jdk-ubi8-amd64

If you need Ubuntu, you can copy the Dockerfile you linked, but I recommend updating the CRIU repo to https://github.com/ibmruntimes/criu/tree/0.40.1-release and downloading the latest binaries here:
https://github.com/ibmruntimes/semeru11-binaries
https://github.com/ibmruntimes/semeru17-binaries

@tajila tajila added the comp:vm label Oct 6, 2023
@tajila
Contributor

tajila commented Oct 12, 2023

@fipro78 Were you able to make any further progress?

@fipro78
Author

fipro78 commented Oct 12, 2023

@tajila Unfortunately I am running from one issue to the next.

After your suggestion to use the UBI8 or UBI9 images, I managed to build the images, but starting the application still fails at checkpoint creation.

A short note on my setup:

  • Windows 10
  • Docker

As the automated image creation via Maven build doesn't show the error, I started trying to build manually.

The first thing I came across is that I need to use privileged containers if I want to create a checkpoint.
When I created the image and started it in privileged mode, I got an error saying that the necessary Linux capabilities are not set in the OpenJ9 images. After I set the capabilities in my application container and started the application for checkpoint creation, I got this error:

SystemCheckpointException: Could not dump the JVM processes, err=-52

After some googling I found these two:
#17295
#17457

I tried to build from the Linux-based devcontainer, and even from within WSL2, but I always get the same error.

Most of the time it sounds like the issue is the usage of Docker and the solution is to use Podman. So I tried to install Podman. But as I am behind a corporate firewall, I am somehow unable to get Podman to work. I always get errors related to proxy or DNS resolution.

Actually I am a bit stuck right now. The usage of Docker on Windows seems to be broken at the moment. To test Podman I first need to find a computer that is not behind a corporate firewall. Of course that should be the easier task once I find the time to test on my private computer. But it is not ideal if I want to promote this inside the company.

And for my build setup with Maven and the Docker Maven Plugin, I expect it also to fail, as it seems that I need Podman for the image creation with automatic checkpoint creation.

Anyhow, I won't give up; I probably just need some more time to get something working. But it is of course unfortunate that the Docker usage still doesn't work. Surely not your fault, just a thought. :)

@fipro78
Author

fipro78 commented Nov 14, 2023

@tajila Sorry for the delay. I had some trouble with my environment and did some CRaC testing in parallel.

I have created a small setup with the OSGi application and some scripts to build the container image via the three step process.

Trying to create a checkpoint inside the container from Windows CMD via Docker or Podman and from within a WSL fails with this error:

org.eclipse.openj9.criu.SystemCheckpointException: Could not dump the JVM processes, err=-52
        at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
        at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(Unknown Source)
        at org.fipro.osgi.benchmark.criu.BenchmarkCRIUSupport.activate(BenchmarkCRUISupport.java:34)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:245)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.access$500(BaseMethod.java:41)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod$Resolved.invoke(BaseMethod.java:687)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.invoke(BaseMethod.java:531)
        at org.apache.felix.scr.impl.inject.methods.ActivateMethod.invoke(ActivateMethod.java:317)
        at org.apache.felix.scr.impl.inject.methods.ActivateMethod.invoke(ActivateMethod.java:307)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.createImplementationObject(SingleComponentManager.java:354)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.createComponent(SingleComponentManager.java:115)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:1002)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.getServiceInternal(SingleComponentManager.java:975)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.activateInternal(AbstractComponentManager.java:785)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.enableInternal(AbstractComponentManager.java:674)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.enable(AbstractComponentManager.java:437)
        at org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.enableComponents(ConfigurableComponentHolder.java:671)
        at org.apache.felix.scr.impl.BundleComponentActivator.initialEnable(BundleComponentActivator.java:310)
        at org.apache.felix.scr.impl.Activator.loadComponents(Activator.java:593)
        at org.apache.felix.scr.impl.Activator.access$200(Activator.java:74)
        at org.apache.felix.scr.impl.Activator$ScrExtension.start(Activator.java:460)
        at org.apache.felix.scr.impl.AbstractExtender.createExtension(AbstractExtender.java:196)
        at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:169)
        at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:49)
        at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:488)
        at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:1)
        at org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:232)
        at org.osgi.util.tracker.BundleTracker$Tracked.bundleChanged(BundleTracker.java:450)
        at org.eclipse.osgi.internal.framework.BundleContextImpl.dispatchEvent(BundleContextImpl.java:949)
        at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234)
        at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:151)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEventPrivileged(EquinoxEventPublisher.java:229)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:138)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:130)
        at org.eclipse.osgi.internal.framework.EquinoxContainerAdaptor.publishModuleEvent(EquinoxContainerAdaptor.java:217)
        at org.eclipse.osgi.container.Module.publishEvent(Module.java:499)
        at org.eclipse.osgi.container.Module.start(Module.java:486)
        at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:445)
        at aQute.launcher.Launcher.start(Launcher.java:699)
        at aQute.launcher.Launcher.startBundles(Launcher.java:679)
        at aQute.launcher.Launcher.activate(Launcher.java:585)
        at aQute.launcher.Launcher.launch(Launcher.java:404)
        at aQute.launcher.Launcher.run(Launcher.java:186)
        at aQute.launcher.Launcher.main(Launcher.java:162)
        at aQute.launcher.pre.EmbeddedLauncher.executeWithRunPath(EmbeddedLauncher.java:170)
        at aQute.launcher.pre.EmbeddedLauncher.findAndExecute(EmbeddedLauncher.java:119)
        at aQute.launcher.pre.EmbeddedLauncher.main(EmbeddedLauncher.java:52)

The log contains this info at the end:

(00.017752) Error (compel/src/lib/ptrace.c:27): suspending seccomp failed: Operation not permitted
(00.017792) net: Unlock network
(00.017800) Unfreezing tasks into 1
(00.017803) 	Unseizing 2 into 1
(00.017807) Error (compel/src/lib/infect.c:418): Unable to detach from 2: No such process
(00.017824) Error (criu/cr-dump.c:2093): Dumping FAILED.

Running the script via Podman inside the WSL fails with this error:

Can't exec criu swrk: Operation not permitted
Can't send request: Broken pipe
org.eclipse.openj9.criu.SystemCheckpointException: Could not dump the JVM processes, err=-70
        at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
        at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(Unknown Source)
        at org.fipro.osgi.benchmark.criu.BenchmarkCRIUSupport.activate(BenchmarkCRIUSupport.java:41)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:245)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.access$500(BaseMethod.java:41)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod$Resolved.invoke(BaseMethod.java:687)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.invoke(BaseMethod.java:531)
        at org.apache.felix.scr.impl.inject.methods.ActivateMethod.invoke(ActivateMethod.java:317)
        at org.apache.felix.scr.impl.inject.methods.ActivateMethod.invoke(ActivateMethod.java:307)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.createImplementationObject(SingleComponentManager.java:354)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.createComponent(SingleComponentManager.java:115)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:1002)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.getServiceInternal(SingleComponentManager.java:975)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.activateInternal(AbstractComponentManager.java:785)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.enableInternal(AbstractComponentManager.java:674)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.enable(AbstractComponentManager.java:437)
        at org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.enableComponents(ConfigurableComponentHolder.java:671)
        at org.apache.felix.scr.impl.BundleComponentActivator.initialEnable(BundleComponentActivator.java:310)
        at org.apache.felix.scr.impl.Activator.loadComponents(Activator.java:593)
        at org.apache.felix.scr.impl.Activator.access$200(Activator.java:74)
        at org.apache.felix.scr.impl.Activator$ScrExtension.start(Activator.java:460)
        at org.apache.felix.scr.impl.AbstractExtender.createExtension(AbstractExtender.java:196)
        at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:169)
        at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:49)
        at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:488)
        at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:1)
        at org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:232)
        at org.osgi.util.tracker.BundleTracker$Tracked.bundleChanged(BundleTracker.java:450)
        at org.eclipse.osgi.internal.framework.BundleContextImpl.dispatchEvent(BundleContextImpl.java:949)
        at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234)
        at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:151)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEventPrivileged(EquinoxEventPublisher.java:229)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:138)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:130)
        at org.eclipse.osgi.internal.framework.EquinoxContainerAdaptor.publishModuleEvent(EquinoxContainerAdaptor.java:217)
        at org.eclipse.osgi.container.Module.publishEvent(Module.java:499)
        at org.eclipse.osgi.container.Module.start(Module.java:486)
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel$2.run(ModuleContainer.java:1852)
        at org.eclipse.osgi.internal.framework.EquinoxContainerAdaptor$1$1.execute(EquinoxContainerAdaptor.java:136)
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1845)
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1788)
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.doContainerStartLevel(ModuleContainer.java:1750)
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.dispatchEvent(ModuleContainer.java:1672)
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.dispatchEvent(ModuleContainer.java:1)
        at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234)
        at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:345)

I have no idea what is going wrong. Is it an issue on Windows? Is it not possible to create a checkpoint for my app? Is something else missing?

The attached zip archive contains four folders. The criu_ folders contain the Dockerfiles and the scripts to build the containers. I tried to run the scripts from Windows directly and from an Ubuntu 20.04 WSL.

Maybe someone can test this on Linux or Mac? Or maybe someone sees what I have done wrong?

CRaC_Test.zip

P.S. the crac_ folders contain the same example using Azul CRaC. There I can create the checkpoints in some scenarios, but the restore fails.

@tajila
Contributor

tajila commented Nov 15, 2023

@tajila
Contributor

tajila commented Nov 15, 2023

If there are any active TCP connections at the time of the checkpoint, setTCPEstablished(true) would have to be added as well.

@fipro78
Author

fipro78 commented Nov 17, 2023

@tajila Yes, that is the service that triggers the checkpoint creation.

There shouldn't be an open TCP connection. At least none I am aware of. But maybe it is the console that opens one. Not sure. I have added setTCPEstablished(true) but the error persists. It is now just a little bit different:

Using Docker:

org.eclipse.openj9.criu.SystemCheckpointException: Could not dump the JVM processes, err=-52
        at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
        at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(Unknown Source)
        at org.fipro.osgi.benchmark.criu.BenchmarkCRIUSupport.activate(BenchmarkCRIUSupport.java:42)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:245)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.access$500(BaseMethod.java:41)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod$Resolved.invoke(BaseMethod.java:687)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.invoke(BaseMethod.java:531)
        at org.apache.felix.scr.impl.inject.methods.ActivateMethod.invoke(ActivateMethod.java:317)
        at org.apache.felix.scr.impl.inject.methods.ActivateMethod.invoke(ActivateMethod.java:307)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.createImplementationObject(SingleComponentManager.java:354)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.createComponent(SingleComponentManager.java:115)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:1002)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.getServiceInternal(SingleComponentManager.java:975)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.activateInternal(AbstractComponentManager.java:785)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.enableInternal(AbstractComponentManager.java:674)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.enable(AbstractComponentManager.java:437)
        at org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.enableComponents(ConfigurableComponentHolder.java:671)
        at org.apache.felix.scr.impl.BundleComponentActivator.initialEnable(BundleComponentActivator.java:310)
        at org.apache.felix.scr.impl.Activator.loadComponents(Activator.java:593)
        at org.apache.felix.scr.impl.Activator.access$200(Activator.java:74)
        at org.apache.felix.scr.impl.Activator$ScrExtension.start(Activator.java:460)
        at org.apache.felix.scr.impl.AbstractExtender.createExtension(AbstractExtender.java:196)
        at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:169)
        at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:49)
        at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:488)
        at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:1)
        at org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:232)
        at org.osgi.util.tracker.BundleTracker$Tracked.bundleChanged(BundleTracker.java:450)
        at org.eclipse.osgi.internal.framework.BundleContextImpl.dispatchEvent(BundleContextImpl.java:949)
        at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234)
        at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:151)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEventPrivileged(EquinoxEventPublisher.java:229)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:138)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:130)
        at org.eclipse.osgi.internal.framework.EquinoxContainerAdaptor.publishModuleEvent(EquinoxContainerAdaptor.java:217)
        at org.eclipse.osgi.container.Module.publishEvent(Module.java:499)
        at org.eclipse.osgi.container.Module.start(Module.java:486)
        at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:445)
        at aQute.launcher.Launcher.start(Launcher.java:699)
        at aQute.launcher.Launcher.startBundles(Launcher.java:679)
        at aQute.launcher.Launcher.activate(Launcher.java:585)
        at aQute.launcher.Launcher.launch(Launcher.java:404)
        at aQute.launcher.Launcher.run(Launcher.java:186)
        at aQute.launcher.Launcher.main(Launcher.java:162)
        at aQute.launcher.pre.EmbeddedLauncher.executeWithRunPath(EmbeddedLauncher.java:170)
        at aQute.launcher.pre.EmbeddedLauncher.findAndExecute(EmbeddedLauncher.java:119)
        at aQute.launcher.pre.EmbeddedLauncher.main(EmbeddedLauncher.java:52)

In the log:

(00.009253) Error (compel/src/lib/ptrace.c:27): suspending seccomp failed: Operation not permitted
(00.009284) net: Unlock network
(00.009309) Unfreezing tasks into 1
(00.009314) 	Unseizing 7 into 1
(00.009318) Error (compel/src/lib/infect.c:418): Unable to detach from 7: No such process
(00.009344) Error (criu/cr-dump.c:2093): Dumping FAILED.

Using Podman (rootless):

Can't exec criu swrk: Operation not permitted
Can't read request: Connection reset by peer
Can't receive response: Connection reset by peer
org.eclipse.openj9.criu.SystemCheckpointException: Could not dump the JVM processes, err=-70
        at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVMImpl(Native Method)
        at openj9.criu/org.eclipse.openj9.criu.CRIUSupport.checkpointJVM(Unknown Source)
        at org.fipro.osgi.benchmark.criu.BenchmarkCRIUSupport.activate(BenchmarkCRIUSupport.java:42)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:245)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.access$500(BaseMethod.java:41)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod$Resolved.invoke(BaseMethod.java:687)
        at org.apache.felix.scr.impl.inject.methods.BaseMethod.invoke(BaseMethod.java:531)
        at org.apache.felix.scr.impl.inject.methods.ActivateMethod.invoke(ActivateMethod.java:317)
        at org.apache.felix.scr.impl.inject.methods.ActivateMethod.invoke(ActivateMethod.java:307)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.createImplementationObject(SingleComponentManager.java:354)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.createComponent(SingleComponentManager.java:115)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:1002)
        at org.apache.felix.scr.impl.manager.SingleComponentManager.getServiceInternal(SingleComponentManager.java:975)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.activateInternal(AbstractComponentManager.java:785)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.enableInternal(AbstractComponentManager.java:674)
        at org.apache.felix.scr.impl.manager.AbstractComponentManager.enable(AbstractComponentManager.java:437)
        at org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.enableComponents(ConfigurableComponentHolder.java:671)
        at org.apache.felix.scr.impl.BundleComponentActivator.initialEnable(BundleComponentActivator.java:310)
        at org.apache.felix.scr.impl.Activator.loadComponents(Activator.java:593)
        at org.apache.felix.scr.impl.Activator.access$200(Activator.java:74)
        at org.apache.felix.scr.impl.Activator$ScrExtension.start(Activator.java:460)
        at org.apache.felix.scr.impl.AbstractExtender.createExtension(AbstractExtender.java:196)
        at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:169)
        at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:49)
        at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:488)
        at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:1)
        at org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:232)
        at org.osgi.util.tracker.BundleTracker$Tracked.bundleChanged(BundleTracker.java:450)
        at org.eclipse.osgi.internal.framework.BundleContextImpl.dispatchEvent(BundleContextImpl.java:949)
        at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234)
        at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:151)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEventPrivileged(EquinoxEventPublisher.java:229)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:138)
        at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:130)
        at org.eclipse.osgi.internal.framework.EquinoxContainerAdaptor.publishModuleEvent(EquinoxContainerAdaptor.java:217)
        at org.eclipse.osgi.container.Module.publishEvent(Module.java:499)
        at org.eclipse.osgi.container.Module.start(Module.java:486)
        at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:445)
        at aQute.launcher.Launcher.start(Launcher.java:699)
        at aQute.launcher.Launcher.startBundles(Launcher.java:679)
        at aQute.launcher.Launcher.activate(Launcher.java:585)
        at aQute.launcher.Launcher.launch(Launcher.java:404)
        at aQute.launcher.Launcher.run(Launcher.java:186)
        at aQute.launcher.Launcher.main(Launcher.java:162)
        at aQute.launcher.pre.EmbeddedLauncher.executeWithRunPath(EmbeddedLauncher.java:170)
        at aQute.launcher.pre.EmbeddedLauncher.findAndExecute(EmbeddedLauncher.java:119)
        at aQute.launcher.pre.EmbeddedLauncher.main(EmbeddedLauncher.java:52)

I actually tried it following the approach described in this article: https://blog.openj9.org/2022/09/29/unprivileged-openj9-criu-support/

Reading the InstantOn limitations and known issues I noticed the section about Running without the necessary Linux capabilities. The error looks exactly the same as mine when trying to build with Podman.

So for testing I tried to create the checkpoint with Docker/Podman and --privileged instead of --cap-add, but that produces exactly the same errors.

I now tried it in a WSL2 with:

  • Ubuntu 20.04 - Podman 3.4.2
  • Ubuntu 22.04 - Podman 3.4.4
  • Debian 12 - Podman 4.3.1 - I can't get Podman to work at all; the machine init process fails

Not sure what to try out next.

@ymanton
Member

ymanton commented Nov 20, 2023

@fipro78 thanks for the test. I took a look at the contents of criu_folder_based/ and made the following adjustments to get it to work:

diff --git a/criu_folder_based/Dockerfile_checkpoint b/criu_folder_based/Dockerfile_checkpoint
index 44aa817..e45b95e 100644
--- a/criu_folder_based/Dockerfile_checkpoint
+++ b/criu_folder_based/Dockerfile_checkpoint
@@ -7,7 +7,7 @@ ENV JAVA_OPTS="${JAVA_OPTS:--Dcontainer.init=true}"

 USER root

-RUN setcap cap_checkpoint_restore,cap_sys_ptrace,cap_net_admin=eip /usr/local/sbin/criu
+RUN setcap cap_checkpoint_restore,cap_sys_ptrace,cap_setpcap=eip /usr/local/sbin/criu

 EXPOSE 11311
  1. We no longer need cap_net_admin.
  2. We need cap_setpcap so that we can drop any unnecessary caps that the container runtime may give us.
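To double-check which capabilities ended up on the binary, the text printed by `getcap` can be inspected. The sketch below is only an illustration: the helper name `has_cap` is hypothetical, and it simply looks for a capability name inside whatever string it is given, without interpreting the `=eip` flags.

```shell
# Hypothetical helper: succeeds if capability $2 is named in the capability string $1,
# for example the text printed by `getcap /usr/local/sbin/criu`.
has_cap() {
    caps="$1"
    wanted="$2"
    # Split on every character that cannot be part of a capability name,
    # one token per line, then look for an exact match.
    printf '%s\n' "$caps" | tr -c 'a-z_\n' '\n' | grep -Fqx "$wanted"
}

# Usage inside the image (not executed here):
#   has_cap "$(getcap /usr/local/sbin/criu)" cap_setpcap || echo "cap_setpcap missing"
```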
diff --git a/criu_folder_based/app/start.sh b/criu_folder_based/app/start.sh
index d38ca9d..572cce2 100644
--- a/criu_folder_based/app/start.sh
+++ b/criu_folder_based/app/start.sh
@@ -1,5 +1,9 @@
 #!/bin/sh
 echo $JAVA_OPTS_EXTRA
+for ((i=0;i<1000;i++))
+do
+    /usr/bin/true
+done
 # java $JAVA_OPTS $JAVA_OPTS_EXTRA -Dgosh.args="--nointeractive -c telnetd -i 0.0.0.0 -p 11311 start" -Dgosh.home=/app -jar org.eclipse.osgi-3.18.500.jar "$@"
 # replace the line above with the following line to open the app with a console instead of a socket
-java $JAVA_OPTS $JAVA_OPTS_EXTRA -Dgosh.home=/app -jar org.eclipse.osgi-3.18.500.jar -console "$@"
+java $JAVA_OPTS $JAVA_OPTS_EXTRA -Dgosh.home=/app -jar org.eclipse.osgi-3.18.500.jar -console "$@" 0</dev/null 1>/app/checkpointData/stdout 2>/app/checkpointData/stderr

When CRIU restores a checkpointed process, the original PID and TIDs of the process must be available. When we run in containers we are in a new (empty) PID namespace, so processes tend to get very low PIDs/TIDs. This is a problem for CRIU because on restore, unless you pay close attention to every process that starts and to their ordering, it's likely that at least some of the required low PIDs/TIDs are already taken.

We work around this by invoking a dummy command 1000 times so the Java process can start with PID/TIDs >1000, which on restore are very likely to be free.
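As a rough sketch of that workaround in plain POSIX sh (the function names burn_pids and pid_is_above are made up for illustration; the start.sh in this thread simply inlines the loop):

```shell
#!/bin/sh
# Spawn COUNT short-lived child processes so that processes started
# afterwards receive higher PIDs. Hypothetical helper, not part of any image.
burn_pids() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        # 'sh -c :' forks and execs a real process; a bare shell builtin
        # such as 'true' may not consume a PID at all.
        sh -c ':'
        i=$((i + 1))
    done
}

# Succeeds when PID ($1) is greater than THRESHOLD ($2).
pid_is_above() {
    [ "$1" -gt "$2" ]
}
```

In a start script one would call burn_pids 1000 right before the java line. pid_is_above is only there to sanity-check the PID the JVM actually received.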

In unprivileged mode we can't easily re-attach stdout, stderr and stdin to terminals. Instead we need to tie stdin to /dev/null and redirect stdout and stderr to files, and those files need to be present at the same path on restore. The easiest solution is to just dump them to the checkpoint data dir. Elsewhere I've made changes to make sure that directory is available at the same path on restore.
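The redirection itself can be wrapped in a small helper. This is only a sketch: the function name run_with_file_stdio is an assumption, and the file names stdout and stderr mirror the ones used in the start.sh diff above.

```shell
#!/bin/sh
# Run a command with stdin tied to /dev/null and stdout/stderr written to
# files inside DATA_DIR ($1). The remaining arguments are the command to run.
# Hypothetical helper for illustration only.
run_with_file_stdio() {
    data_dir="$1"
    shift
    mkdir -p "$data_dir" || return 1
    "$@" 0</dev/null 1>"$data_dir/stdout" 2>"$data_dir/stderr"
}
```

For example, run_with_file_stdio /app/checkpointData java $JAVA_OPTS $JAVA_OPTS_EXTRA -jar app.jar would keep all three descriptors away from the terminal, provided /app/checkpointData exists at the same path when restoring.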

diff --git a/criu_folder_based/Dockerfile b/criu_folder_based/Dockerfile
index 3fbb93c..9c42eb7 100644
--- a/criu_folder_based/Dockerfile
+++ b/criu_folder_based/Dockerfile
@@ -1,6 +1,6 @@
 FROM osgi_deployment_criu_checkpoint

-COPY checkpointData /checkpointData
+COPY checkpointData /app/checkpointData

-CMD ["criu", "restore", "-D", "./checkpointData", "--shell-job", "-v4", "--log-file=restore.log"]
+CMD ["criu", "restore", "--unprivileged", "-D", "/app/checkpointData", "--shell-job", "-v4", "--log-file=restore.log"]
 # CMD ["bash"]

Here we need to pass --unprivileged to CRIU, and we also need to make sure the checkpoint data directory is at the same path so CRIU can restore the Java process's stdout and err links to the corresponding files.

diff --git a/criu_folder_based/build_criu_image_docker.sh b/criu_folder_based/build_criu_image_docker.sh
index 218c3e1..a53dae5 100644
--- a/criu_folder_based/build_criu_image_docker.sh
+++ b/criu_folder_based/build_criu_image_docker.sh
@@ -12,7 +12,8 @@ docker build -t osgi_deployment_criu_checkpoint -f ./Dockerfile_checkpoint .
 # run the container with necessary capabilities
 docker run \
 -it \
---cap-add=CHECKPOINT_RESTORE --cap-add=SYS_PTRACE --cap-add=NET_ADMIN \
+--cap-drop=ALL --cap-add=CHECKPOINT_RESTORE --cap-add=SYS_PTRACE --cap-add=SETPCAP \
+--security-opt seccomp=unconfined \
 --name osgi_deployment_criu_checkpoint \
 osgi_deployment_criu_checkpoint

@@ -22,4 +23,6 @@ docker cp osgi_deployment_criu_checkpoint:/app/checkpointData/. ./checkpointData
 docker rm osgi_deployment_criu_checkpoint

 # create a new image from the previous one that adds the checkpoint files and exchanges the JAVA_OPTS_EXTRA
-docker build -t osgi_deployment_criu .
+docker build -t osgi_deployment_criu .
+
+echo "docker run --rm --cap-drop=ALL --cap-add=CHECKPOINT_RESTORE --cap-add=SYS_PTRACE --cap-add=SETPCAP --security-opt seccomp=unconfined osgi_deployment_criu:latest"

For Docker we first need to make sure we are using Docker 23.x or later. The 20.x versions do not support CAP_CHECKPOINT_RESTORE. Second we need to add --security-opt seccomp=unconfined. In order to run the result you can use the docker run command I added to the end.
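A build script could guard against an old Docker before it tries --cap-add=CHECKPOINT_RESTORE. The check below is a minimal sketch; the function name docker_major_ok is hypothetical and it only looks at the major version number.

```shell
#!/bin/sh
# Succeeds when the given Docker version string has a major version >= 23.
# Returns 2 when the string does not start with a plain number.
docker_major_ok() {
    version="$1"
    major="${version%%.*}"          # text before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;    # empty or not numeric
    esac
    [ "$major" -ge 23 ]
}
```

One way to feed it is docker_major_ok "$(docker version --format '{{.Server.Version}}')" at the top of build_criu_image_docker.sh, aborting with a message when it fails.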

diff --git a/criu_folder_based/build_criu_image_podman.sh b/criu_folder_based/build_criu_image_podman.sh
index 09dc725..ec72a12 100644
--- a/criu_folder_based/build_criu_image_podman.sh
+++ b/criu_folder_based/build_criu_image_podman.sh
@@ -7,19 +7,22 @@ rm -rf checkpointData
 mkdir checkpointData

 # first build the image to create the checkpoint
-podman build -t osgi_deployment_criu_checkpoint -f ./Dockerfile_checkpoint .
+sudo podman build -t osgi_deployment_criu_checkpoint -f ./Dockerfile_checkpoint .

 # run the container with necessary capabilities
-podman run \
+sudo podman run \
 -it \
---cap-add=CHECKPOINT_RESTORE --cap-add=SYS_PTRACE --cap-add=NET_ADMIN \
+--cap-add=CHECKPOINT_RESTORE --cap-add=SYS_PTRACE --cap-add=SETPCAP \
+--security-opt seccomp=unconfined \
 --name osgi_deployment_criu_checkpoint \
 osgi_deployment_criu_checkpoint

 # copy the checkpointData from the
-podman cp osgi_deployment_criu_checkpoint:/app/checkpointData/. ./checkpointData
+sudo podman cp osgi_deployment_criu_checkpoint:/app/checkpointData/. ./checkpointData

-podman rm osgi_deployment_criu_checkpoint
+sudo podman rm osgi_deployment_criu_checkpoint

 # create a new image from the previous one that adds the checkpoint files and exchanges the JAVA_OPTS_EXTRA
-podman build -t osgi_deployment_criu .
+sudo podman build -t osgi_deployment_criu .
+
+echo "sudo podman run --rm --cap-add=CHECKPOINT_RESTORE --cap-add=SYS_PTRACE --cap-add=SETPCAP osgi_deployment_criu:latest"

For Podman we first and foremost have to run as root. In order for options like --cap-add to be effective Podman needs to be root to be able to grant the capabilities to the containers it creates. Under Docker the Docker daemon creates containers and it runs as root, but under Podman we have to do that ourselves. Everything else is the same as Docker, except that we don't need --cap-drop or the second --security-opt seccomp=unconfined to restore the checkpoint (but we still do need it to dump, same as Docker).


As you can see there are a few extra steps in the process, and currently they're not well documented. In the near future we're hoping to have some helper scripts and commands to make life easier. Most of this works pretty well in Open Liberty because we do have the necessary scripts and helpers there. The same experience isn't yet available in Semeru images, but it will be.

Also note that rather than copying the checkpoint data from the first container back to the host, then building a new image and copying the data in, you can use a slightly different method: you can commit the original container and use the --change option to set CMD ["criu", "restore", ...]. This ensures that you'll be checkpointing and restoring in more or less the same environment and works with both Docker and Podman.
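A sketch of what that commit-based variant might look like follows. It only prints the command instead of running it. The function name print_commit_command is made up, and the CMD arguments are an assumption copied from the Dockerfile diff above, so adjust them to your setup.

```shell
#!/bin/sh
# Print (do not execute) a commit command that turns a stopped checkpoint
# container into a restore image.
#   $1 engine ("docker" or "podman"), $2 container name,
#   $3 target image name, $4 checkpoint data directory inside the container
print_commit_command() {
    engine="$1"; container="$2"; image="$3"; data_dir="$4"
    printf '%s commit --change %s %s %s\n' \
        "$engine" \
        "'CMD [\"criu\", \"restore\", \"--unprivileged\", \"-D\", \"$data_dir\", \"--shell-job\"]'" \
        "$container" "$image"
}
```

The commit has to happen after the checkpoint was written and before the container is removed, so in the build scripts above it would take the place of the cp, rm and second build steps.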

@fipro78
Author

fipro78 commented Dec 6, 2023

@ymanton
Thank you very much for your explanations. After fixing some environmental issues on my side and fixing my example, I am now able to create an image with checkpoint data that really works. I can even use the socket of the Gogo shell and connect to the running container via telnet now.

Attached the updated example: criu_test.zip

I will try to include this into my GitHub project, so all the Java deployment options for OSGi applications are collected in one place.

I still have one question (at least, probably a lot more when it comes to details ;) ). Is it possible to create a checkpoint using the org.crac API instead of the OpenJ9 API? What I have seen so far is that the OpenJ9 CRIU API has a lot more options. Is there maybe some way to use the org.crac API and pass some configuration that would translate the checkpoint creation via org.crac to the respective OpenJ9 calls? That would make it possible to write code for creating the checkpoint without the need to have an OpenJ9 JDK for Linux installed.

Right now I needed to create a devcontainer to develop the code for the checkpoint creation, because my host system is Windows.

BTW, with your suggestions I also got the image creation working on Windows with Podman. I had to configure the machine as rootful, as on Windows there is no sudo concept.

@ymanton
Member

ymanton commented Dec 6, 2023

Is it possible to create a checkpoint using the org.crac API instead of the OpenJ9 API? What I have seen so far is that the OpenJ9 CRIU API has a lot more options. Is there maybe some way to use the org.crac API and pass some configuration that would translate the checkpoint creation via org.crac to the respective OpenJ9 calls? That would make it possible to write code for creating the checkpoint without the need to have an OpenJ9 JDK for Linux installed.

Currently no, but I think we are looking at this. @tajila can probably elaborate.

@tajila
Contributor

tajila commented Dec 7, 2023

What I have seen so far is that the OpenJ9 CRIU API has a lot more options. Is there maybe some way to use the org.crac API and pass some configuration that would translate the checkpoint creation via org.crac to the respective OpenJ9 calls?

Currently no, but I think we are looking at this. @tajila can probably elaborate.

Yes, we are currently working on adding some support for this. We should have something in Q1 2024.

@fipro78
Author

fipro78 commented Dec 11, 2023

@tajila
Thanks for the information. Please let me know once you have something ready for testing. I can then try it out on my example. Maybe I have a second (probably more usable) one by then, and can give you some feedback if you like.

@tajila
Contributor

tajila commented May 21, 2024

@fipro78 Sorry for the late notice. If you try a recent version of OpenJ9 (https://github.com/ibmruntimes/semeru17-binaries/releases/tag/jdk-17.0.11%2B7_openj9-0.44.0-m2) you'll be able to use -XX:CRaCCheckpointTo= along with the org.crac APIs. -XX:CRaCRestoreFrom= is still in progress, but it will be equivalent to doing criu restore ... so you have a workaround for that.

This year we are planning broader support for checkpoint/restore on Semeru, so I'll keep you posted on new developments.

@fipro78
Author

fipro78 commented Jun 13, 2024

@tajila
Sorry for the late reply. I started to work on an example with a Jetty server to show a more useful use case. I tried the example with the OpenJDK version, creating a checkpoint and registering resources using the org.crac API. This works fine, at least with the JDK variant. The OpenJDK JRE variant is missing the jdk.crac resources. I contacted the OpenJDK team about this already.

Then I tried it with the OpenJ9 JDK and JRE. I used one of the available containers, but in both cases the jdk.crac classes cannot be found.

Dockerfile

# FROM icr.io/appcafe/ibm-semeru-runtimes:open-17-jre-ubi9
# FROM icr.io/appcafe/ibm-semeru-runtimes:open-21-jre-ubi9
FROM icr.io/appcafe/ibm-semeru-runtimes:open-21-jdk-ubi9

ENV JAVA_OPTS_EXTRA="\
-XX:CRaCCheckpointTo=/app/crac-files \
-Djdk.crac.resource-policies=/app/crac_fd_policies.yaml \
-Dorg.crac.Core.Compat=jdk.crac"

USER root

EXPOSE 8080

# copy the application jar to the image
COPY app.jar /app/
# copy the file descriptor policies to the image
COPY crac_fd_policies.yaml /app/
# copy the shell scripts to the image
COPY start.sh /app/

# create the folder for the crac files inside the image
RUN \
  mkdir -p /app/crac-files && \
  chmod 755 /app/start.sh

# start the application for checkpoint creation
WORKDIR /app
CMD ["./start.sh"]

start.sh

#!/bin/sh

# invoke a dummy command 1000 times so the Java process can start with PID/TIDs >1000, which on restore are very likely to be free
for i in $(seq 1000)
do
    /bin/true
done

java $JAVA_OPTS $JAVA_OPTS_EXTRA -jar app.jar

Service class that creates the checkpoint using the org.crac API

package org.fipro.service.modifier.crac;

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.crac.CheckpointException;
import org.crac.Core;
import org.crac.RestoreException;
import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;

@Component
public class CheckpointCreationComponent {
	
    @Activate
    void activate() {
        Executors.newSingleThreadScheduledExecutor().schedule(() -> {
            try {
                Core.checkpointRestore();

                // Note:
                // any code written after checkpointRestore() will be executed on restore

            } catch (CheckpointException|RestoreException|UnsupportedOperationException e) {
                e.printStackTrace();
            }
        }, 
        5, TimeUnit.SECONDS);
    }
}

When I try to run the example with OpenJ9 I get the following exception (I only see it because I locally patched the org.crac jar to print out the exception; otherwise it is caught silently):

java.lang.ClassNotFoundException: jdk.crac.Resource

I checked the Java version in the container, and got the following output:

openjdk 21.0.3 2024-04-16 LTS
IBM Semeru Runtime Open Edition 21.0.3.0 (build 21.0.3+9-LTS)
Eclipse OpenJ9 VM 21.0.3.0 (build openj9-0.44.0, JRE 21 Linux amd64-64-Bit Compressed References 20240416_177 (JIT enabled, AOT enabled)
OpenJ9   - b0699311c7
OMR      - 254af5a04
JCL      - ac8f341bc20 based on jdk-21.0.3+9)

According to your message, I would assume it should support the org.crac API. But it looks like it doesn't.

I downloaded the archive and also do not see the jdk.crac module in there. Have you dropped it from the 0.44.0 release? Is there anything I have to do additionally to make the org.crac API support to work?

@tajila
Contributor

tajila commented Jun 13, 2024

@fipro78 Thanks for your response.

I don't have access to your example, so I made a little one of my own:

Dockerfile

FROM icr.io/appcafe/ibm-semeru-runtimes:open-17-jdk-ubi9

ENV JAVA_OPTS_EXTRA="\
-XX:CRaCCheckpointTo=/app/checkpointData \
-Dorg.crac.Core.Compat=jdk.crac"

USER root

COPY CracExample.java /app/

COPY start.sh /app/

RUN mkdir -p /app/checkpointData

RUN chmod 755 /app/start.sh

WORKDIR /app

RUN curl -L -o org.crac-1.4.0.tar.gz https://github.com/CRaC/org.crac/archive/refs/tags/1.4.0.tar.gz

RUN tar -xzf org.crac-1.4.0.tar.gz

RUN javac -sourcepath ./org.crac-1.4.0/src/main/java/ CracExample.java

CMD ["./start.sh"]

start.sh

#!/bin/sh

# invoke a dummy command 1000 times so the Java process can start with PID/TIDs >1000, which on restore are very likely to be free
for i in $(seq 1000)
do
    /bin/true
done

java -cp ./org.crac-1.4.0/src/main/java/.:. -XX:CRaCCheckpointTo=checkpointData CracExample

CracExample.java

import org.crac.CheckpointException;
import org.crac.Core;
import org.crac.RestoreException;

public class CracExample {
    public static void main(String[] args) {
        try {
            System.out.println("pre-checkpoint");
            Core.checkpointRestore();
            System.out.println("post-restore");
        } catch (CheckpointException|RestoreException|UnsupportedOperationException e) {
            e.printStackTrace();
        }
    }
}

Then podman build -t j9:crac . followed by podman run --privileged -ti j9:crac

pre-checkpoint
/app/./start.sh: line 9:  1003 Killed                  java -cp ./org.crac-1.4.0/src/main/java/.:. -XX:CRaCCheckpointTo=checkpointData CracExample

So it seems to work for me. Perhaps you can send me the .jar file in your example.

@tajila
Contributor

tajila commented Jun 13, 2024

I downloaded the archive and also do not see the jdk.crac module in there. Have you dropped it from the 0.44.0 release? Is there anything I have to do additionally to make the org.crac API support to work?

The CRaC support is in java.base, in the [JDK]/jmods/java.base/classes/jdk/crac/ dir

@fipro78
Author

fipro78 commented Jun 13, 2024

@tajila
Thanks for the fast response. If the classes are available (I had not found them, thanks for the pointer), then there is either a classloading issue or a conceptual issue.

The example is actually some work in progress, because there are multiple levels to fix before it can be published.

  1. org.crac API needs the OSGi metadata
    I have created a PR, so hopefully it gets merged soon: Add generation of OSGi metadata and JPMS module-info CRaC/org.crac#11

  2. I have added an org.crac.Resource to the Jakarta-RS Whiteboard implementation. There is no PR yet, because org.crac does not yet contain OSGi metadata. But I have it in a fork, so I can already test it
    osgi/jakartarest-osgi@main...fipro78:jakartarest-osgi:main

  3. And then I have created a Jakarta-RS Whiteboard application with the immediate component from my comment above, that creates the checkpoint.

As I said, it is work in progress, so I have not published it yet. But the goal is to publish it together with a blog post soon, so everybody is able to follow and reproduce things. I have attached the example so you can try. Hope that already helps.

CRaC_Jetty_JakartaRS.zip

What I noticed a few hours ago is that the concept of registering a Resource in CRaC differs from the hooks for closing and reopening resources in OpenJ9 CRIU. With CRaC I can register a Resource wherever I have access to it, for example in the Jakarta-RS Whiteboard at the location where I have access to the Jetty server instance. It is decoupled from the place where the checkpoint creation is done (in my example the immediate component). With the OpenJ9 API it seems the hooks need to be registered where the checkpoint creation is performed. At least that is my assumption now: I tried for a while to get it to work similar to the Resource, but the hooks need to be registered on a CRIUSupport instance, and then I have two instances that seem to collide. Or I have not yet understood how that should work with the OpenJ9 CRIU API.

Anyhow, either there is a classloading issue somehow, or the way how Resources are handled does not correctly work.

If you have any idea, let me know. I am currently trying to polish the example as much as possible. Then I can push it to my repo and then extend it step by step.

@tajila
Contributor

tajila commented Jun 18, 2024

@fipro78 I was able to reproduce the issue, still investigating why it can't find the jdk.crac class.

What I noticed a few hours ago is, that the concept about registering a Resource vs. the Hooks for closing and reopening resources, is different in CRaC and OpenJ9 CRIU. With CRaC I can register a Resource where I have access to it.

No, there is no restriction on where or when the hooks can be added, so long as they are added to the CRIUSupport instance before checkpoint creation is done. So in that regard, it is identical to CRaC. One thing that is unique to J9 is that you have the ability to create many CRIUSupport instances, but that doesn't mean you necessarily need many. You could create a singleton instance (similar to CRaC), make it accessible via an API or a public static, and then add hooks wherever you need.

Anyhow, either there is a classloading issue somehow, or the way how Resources are handled does not correctly work.

Can you please elaborate on this?

I hope to resolve the jdk.crac issue by EOW, but in either case I'll post an update here.

@fipro78
Author

fipro78 commented Jun 18, 2024

@tajila
Thanks for the update. What I noticed is that with CRaC I can have a Resource where the server is created and managed, but I can create the checkpoint at some other location. This is what I do in my example. The checkpoint creation does not have to be triggered in the place where the server lives; I can trigger it anywhere. With OpenJ9 I need to extend the server part.

I also noticed a possible threading issue. An OSGi application and its services are multi-threaded. Maybe this is causing issues?

I also tried to use the OpenJ9 API in the example, but there I again get the error

suspending seccomp failed: Operation not permitted

I will try to polish my example so that it is easy to look into it. I really don't get what is happening, as I did all of the steps you mentioned on the way.

@fipro78
Author

fipro78 commented Jun 19, 2024

I made another local modification to the org.crac API in addition to printing out the exception. The org.crac API uses reflection to find the real CRaC implementation classes. It uses Class.forName(), which is considered bad practice in OSGi because it might use the wrong ClassLoader. As I got the OSGi example working with OpenJDK CRaC, I suppose it works here with my modifications that were merged recently. But to be sure, I changed it to getClassLoader().loadClass().

The ClassNotFoundException persists. Only the stack trace is different.

With Class.forName()

java.lang.ClassNotFoundException: jdk.crac.Resource
        at java.base/java.lang.Class.forNameImpl(Native Method)
        at java.base/java.lang.Class.forName(Class.java:372)
        at java.base/java.lang.Class.forName(Class.java:350)
        at org.crac.Core$Compat.<init>(Core.java:128)
        at org.crac.Core.loadCompat(Core.java:192)
        at org.crac.Core.<clinit>(Core.java:211)
        at org.eclipse.osgitech.rest.jetty.crac.JettyBackedWhiteboardComponent.activate(JettyBackedWhiteboardComponent.java:121)

With getClassLoader().loadClass()

java.lang.ClassNotFoundException: jdk.crac.Resource cannot be found by org.crac_1.4.1.202406190425
        at org.eclipse.osgi.internal.loader.BundleLoader.generateException(BundleLoader.java:541)
        at org.eclipse.osgi.internal.loader.BundleLoader.findClass0(BundleLoader.java:536)
        at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:416)
        at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:168)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:1102)
        at org.crac.Core$Compat.<init>(Core.java:128)
        at org.crac.Core.loadCompat(Core.java:192)
        at org.crac.Core.<clinit>(Core.java:204)
        at org.eclipse.osgitech.rest.jetty.crac.JettyBackedWhiteboardComponent.activate(JettyBackedWhiteboardComponent.java:121)

BTW, if I look at the module-info.class of the java.base module of the OpenJ9 JRE, I do not find the jdk.crac package listed. Maybe that is the issue?

@fipro78
Author

fipro78 commented Jun 20, 2024

@tajila
I got the ClassNotFoundException solved locally. After thinking about it for a while, I remembered that I can add additional system packages in the bndrun file. After I added

-runsystempackages: jdk.crac

the exceptions disappear. This is not necessary for the OpenJDK CRaC implementation. To be honest, I am not sure why the additional packages are resolved as system packages in OpenJDK and not in OpenJ9. But maybe it gives you a hint. Or you might say it is not an OpenJ9 issue and needs to be documented somewhere else.

But now I get the following exception:

openj9.internal.criu.SystemCheckpointException: Could not dump the JVM processes, err=-52
        at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVMImpl(Native Method)
        at java.base/openj9.internal.criu.InternalCRIUSupport.checkpointJVM(InternalCRIUSupport.java:999)
        at java.base/jdk.crac.CRIUSupportContext.checkpointJVM(Core.java:122)
        at java.base/jdk.crac.Core.checkpointRestore(Core.java:62)

The created criu.log contains the following entries:

Error (criu/tun.c:85): tun: Unable to create tun: No such file or directory
Warn  (criu/sk-unix.c:224): unix: Unable to open a socket file: Operation not permitted
Error (criu/net.c:3770): net: Unable create a network namespace: Operation not permitted
Warn  (criu/net.c:3826): net: NSID isn't reported for network links
Warn  (criu/net.c:3486): net: Unable to get socket network namespace
Error (criu/kerndat.c:1554): Unable create a network namespace: Operation not permitted
Error (criu/util.c:1495): Can't wait or bad status: errno=0, status=256
Error (criu/kerndat.c:1718): kerndat_has_nftables_concat failed when initializing kerndat.

I am using Docker inside a WSL on Windows. I will now also try to run it with Podman, but I need to set up everything again. Just wanted to update you with my observations.

I added the updated example with the systempackage modification, in case you want to test it.

CRaC_OpenJ9_Jetty_JakartaRS.zip

@tajila
Contributor

tajila commented Jun 20, 2024

@ymanton Can you please help Dirk resolve the CRIU errors?

This is not necessary for the OpenJDK CRaC implementation. To be honest, I am not sure why the additional packages are resolved as system packages in the OpenJDK and not in OpenJ9. But maybe it gives you a hint. Or you tend to say it is not an issue of OpenJ9 and needs to be documented in some other place.

I'll take a look at this. We put jdk.crac in java.base, which is loaded by the system loader. Perhaps this is the key difference from the OpenJDK version, in which case I would be fine with changing it to match OpenJDK. Another difference is that the jdk.crac package is not exported automatically due to compliance reasons, so this is why it doesn't show up in the module-info.class. We dynamically export it at runtime.

@JasonFengJ9
Member

OpenJDK exports jdk.crac; in java.base/share/classes/module-info.java.

Another difference is that the jdk.crac package is not exported automatically due to compliance reasons, so this is why it doesn't show up in the module-info.class. We dynamically export it at runtime.

This seems to be the cause. -runsystempackages: jdk.crac effectively exports the package as per https://bnd.bndtools.org/instructions/runsystempackages.html.

@ymanton
Member

ymanton commented Jun 24, 2024

I looked at the example. The checkpoint fails because CRIU is being invoked in privileged mode and the user in the container doesn't have sufficient privileges for that.

I can't see the application code, but either the application or the OpenJ9 CRaC layer has to call

public CRIUSupport setUnprivileged(boolean unprivileged) {
so that the checkpoint will be performed in unprivileged mode. @tajila not sure if the CRaC API has notions of privileged/unprivileged, but if not I assume we'd have to set unprivileged ourselves.

@tajila
Contributor

tajila commented Jun 25, 2024

so that the checkpoint will be performed in unprivileged mode. @tajila not sure if the CRaC API has notions of privileged/unprivileged, but if not I assume we'd have to set unprivileged ourselves.

CRaC does not, but we can add -Dopenj9.internal.criu.unprivilegedMode=true to achieve the same effect.
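Put together with the options already used in the Dockerfiles in this thread, the JVM arguments for an unprivileged CRaC-style checkpoint could be composed roughly like this (a sketch; the function name crac_checkpoint_opts is made up):

```shell
#!/bin/sh
# Print the JVM options for a CRaC-style checkpoint in unprivileged mode.
# $1 is the directory the checkpoint should be written to.
# All three options are taken from this thread; only the helper is new.
crac_checkpoint_opts() {
    checkpoint_dir="$1"
    printf '%s %s %s\n' \
        "-XX:CRaCCheckpointTo=$checkpoint_dir" \
        "-Dorg.crac.Core.Compat=jdk.crac" \
        "-Dopenj9.internal.criu.unprivilegedMode=true"
}
```

In start.sh this could be used as java $(crac_checkpoint_opts /app/crac-files) -jar app.jar, or the printed string could simply be pasted into JAVA_OPTS_EXTRA.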

@fipro78
Author

fipro78 commented Jun 25, 2024

I just tried this. Now the error is

(00.024197) Error (compel/src/lib/ptrace.c:27): suspending seccomp failed: Operation not permitted
(00.024283) net: Unlock network
(00.024296) Unfreezing tasks into 1
(00.024299)     Unseizing 1008 into 1
(00.024304) Error (compel/src/lib/infect.c:418): Unable to detach from 1008: No such process
(00.024320) Error (criu/cr-dump.c:2098): Dumping FAILED.

I tried to run with Docker in an Ubuntu Linux WSL. BTW, I get the same error when I try to use the CRIUSupport API directly.
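Since the failing step is "suspending seccomp", one thing that might be worth checking from inside the container is whether a seccomp filter is actually active for the process. This is a diagnostic sketch only and not a confirmed cause. The helper names are hypothetical, but the Seccomp field in /proc/<pid>/status is standard on Linux (0 = disabled, 1 = strict, 2 = filter).

```shell
#!/bin/sh
# Map the numeric Seccomp field of /proc/<pid>/status to a readable name.
seccomp_mode_name() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "strict" ;;
        2) echo "filter" ;;
        *) echo "unknown" ;;
    esac
}

# Print the Seccomp value from a status file.
# Defaults to the status of the calling process tree.
read_seccomp_mode() {
    status_file="${1:-/proc/self/status}"
    awk '/^Seccomp:/ { print $2 }' "$status_file"
}
```

Calling seccomp_mode_name "$(read_seccomp_mode)" at the top of start.sh would show whether --security-opt seccomp=unconfined really took effect for that container run.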

My example is available here: https://github.com/fipro78/osgi-jakartars

I am still working on the blog post and on the example for the CRIUSupport API. If you want to try out the example, you first need to run mvn clean verify in the jakartars folder. This will build the application jar and copy it to the checkpoint_container folder. There you can then execute the build scripts to create the containers.

Hope it helps to identify the issue.

@ymanton
Member

ymanton commented Jun 25, 2024

@fipro78 your updated example works for me. What version of Ubuntu are you using in WSL? I assume 22.04? What is the kernel version (uname -r output)?

@fipro78
Author

fipro78 commented Jun 26, 2024

PS C:\Users\xxx> wsl -v
WSL-Version: 2.1.5.0
Kernelversion: 5.15.146.1-2
WSLg-Version: 1.0.60
MSRDC-Version: 1.2.5105
Direct3D-Version: 1.611.1-81528511
DXCore-Version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows-Version: 10.0.22631.3737

$ uname -r
5.15.146.1-microsoft-standard-WSL2

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

@fipro78
Author

fipro78 commented Jun 28, 2024

I have updated my example and added the openj9 branch to see if using the CRIUSupport API works.

https://github.com/fipro78/osgi-jakartars/blob/9c06a9b2bc2be41bb83861d5275ef4e46db0e0b4/jakartars/criu/src/main/java/org/fipro/service/modifier/criu/jetty/JettyBackedWhiteboardComponent.java#L102-L128

I tried both variants (OpenJ9 with CRaC and OpenJ9 CRIUSupport) on Windows host using Podman, Windows Host using Docker and in Ubuntu WSL using Docker. I currently don't have a WSL with Podman. Need to check that.

Using the scripts in the checkpoint_container folder it should be easy to build the containers for checkpoint creation

Windows + Podman

OpenJ9 CRaC Support

build_criu.bat podman ubi_crac_jre

Result in criu.log:

Warn  (criu/kerndat.c:1153): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
Warn  (criu/kerndat.c:1153): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
Error (compel/src/lib/ptrace.c:27): suspending seccomp failed: Operation not permitted
Error (compel/src/lib/infect.c:418): Unable to detach from 1003: No such process
Error (criu/cr-dump.c:2098): Dumping FAILED

OpenJ9 CRIUSupport

build_criu.bat podman ubi_openj9_jre

Result in criu.log:

(00.025928) Error (compel/src/lib/ptrace.c:27): suspending seccomp failed: Operation not permitted
(00.025997) net: Unlock network
(00.026023) Unfreezing tasks into 1
(00.026027)     Unseizing 1003 into 1
(00.026031) Error (compel/src/lib/infect.c:418): Unable to detach from 1003: No such process
(00.026047) Error (criu/cr-dump.c:2098): Dumping FAILED.

Windows + Docker

OpenJ9 CRaC Support

build_criu.bat docker ubi_crac_jre

Result in criu.log:

Warn  (criu/kerndat.c:1153): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
Warn  (criu/kerndat.c:1153): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
Error (compel/src/lib/ptrace.c:27): suspending seccomp failed: Operation not permitted
Error (compel/src/lib/infect.c:418): Unable to detach from 1003: No such process
Error (criu/cr-dump.c:2098): Dumping FAILED

OpenJ9 CRIUSupport

build_criu.bat docker ubi_openj9_jre

Result in criu.log:

(00.025928) Error (compel/src/lib/ptrace.c:27): suspending seccomp failed: Operation not permitted
(00.025997) net: Unlock network
(00.026023) Unfreezing tasks into 1
(00.026027)     Unseizing 1003 into 1
(00.026031) Error (compel/src/lib/infect.c:418): Unable to detach from 1003: No such process
(00.026047) Error (criu/cr-dump.c:2098): Dumping FAILED.

Ubuntu WSL + Docker

OpenJ9 CRaC Support

./build_criu.sh docker ubi_crac_jre

Result in criu.log:

Warn  (criu/kerndat.c:1153): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
Warn  (criu/kerndat.c:1153): $XDG_RUNTIME_DIR not set. Cannot find location for kerndat file
Error (compel/src/lib/ptrace.c:27): suspending seccomp failed: Operation not permitted
Error (compel/src/lib/infect.c:418): Unable to detach from 1008: No such process
Error (criu/cr-dump.c:2098): Dumping FAILED.

OpenJ9 CRIUSupport

./build_criu.sh docker ubi_openj9_jre

Result in criu.log:

(00.025547) Error (compel/src/lib/ptrace.c:27): suspending seccomp failed: Operation not permitted
(00.025636) net: Unlock network
(00.025648) Unfreezing tasks into 1
(00.025657)     Unseizing 1008 into 1
(00.025665) Error (compel/src/lib/infect.c:418): Unable to detach from 1008: No such process
(00.025681) Error (criu/cr-dump.c:2098): Dumping FAILED.

The errors are always the same, but I don't know what is wrong. Have I missed something? Is something not configured correctly?
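
To compare such logs across setups, it can help to pull out only the Error lines. The following is a minimal Python sketch and not part of the scripts in the repository; the helper name `criu_errors` is made up for illustration, and the line format it assumes is only the one visible in the excerpts above (an optional `(ss.uuuuuu)` timestamp followed by `Error (file:line): message`).

```python
import re

# Matches lines such as:
#   (00.025928) Error (compel/src/lib/ptrace.c:27): suspending seccomp failed: ...
#   Error (criu/cr-dump.c:2098): Dumping FAILED
# The optional leading "(ss.uuuuuu)" timestamp is skipped.
_ERROR_LINE = re.compile(
    r"^(?:\(\d+\.\d+\)\s+)?Error \((?P<source>[^)]+)\): (?P<message>.*)$"
)


def criu_errors(log_text):
    """Return (source, message) tuples for every Error line in a criu.log."""
    errors = []
    for line in log_text.splitlines():
        match = _ERROR_LINE.match(line.strip())
        if match:
            errors.append((match.group("source"), match.group("message")))
    return errors


if __name__ == "__main__":
    sample = (
        "(00.025928) Error (compel/src/lib/ptrace.c:27): "
        "suspending seccomp failed: Operation not permitted\n"
        "(00.025997) net: Unlock network\n"
        "(00.026047) Error (criu/cr-dump.c:2098): Dumping FAILED.\n"
    )
    for source, message in criu_errors(sample):
        print(f"{source}: {message}")
```

In all six runs above the first reported error is the same `suspending seccomp failed` message from `compel/src/lib/ptrace.c`.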

@fipro78
Author

fipro78 commented Jul 4, 2024

@ymanton / @tajila
Sorry for pinging, but do you have any updates on this? I have almost finished my example and blog post, but I am not yet able to verify the OpenJ9 part. Otherwise I need to set up a Linux environment somehow, but it would be nice if it worked inside WSL.

BTW, one question about the IBM Semeru containers: when I try to create a new directory I get

mkdir: cannot create directory ‘/app/checkpoint’: Permission denied

It only works if I call

USER root

in the Dockerfile beforehand. Is this intentional? I haven't found any hint about this anywhere. And for the images on Docker Hub it is not necessary, at least not according to the description on the page:
https://hub.docker.com/_/ibm-semeru-runtimes

@ymanton
Member

ymanton commented Jul 4, 2024

@fipro78 I'll try via WSL today and get back to you.

Just so I understand what you've tried, "Windows + Podman" and "Windows + Docker" are the native Windows versions of Podman and Docker, both using WSL2 under the covers? And "Ubuntu WSL + Docker" is the Linux version of Docker, installed from the Ubuntu repository inside your Ubuntu WSL2 environment?

> BTW, one question to the IBM Semeru containers. When I try to create a new directory I get a

The Semeru images based on UBI 8 and 9 specify a non-root user (USER 1001) to run as inside the container; the ones based on Ubuntu do not. https://hub.docker.com/_/ibm-semeru-runtimes only has Ubuntu images, so you'll be running as root inside the container. The examples on that page are therefore correct for those images, but not for the UBI-based images.

For the UBI-based images you can do USER root if you want, and then change it back at the end, or leave it, or copy your files to a directory that non-root users can write to. For InstantOn we don't typically run as root inside the container.

@fipro78
Author

fipro78 commented Jul 4, 2024

Thanks for the clarification.

And yes your assumptions are correct.

@ymanton
Member

ymanton commented Jul 4, 2024

Inside a fresh Windows VM I installed WSL:

wsl.exe --install --distribution Ubuntu-22.04

Inside the Ubuntu WSL env I created a user when prompted, installed Docker, added myself to the docker group, downloaded the latest zip file, and modified the Dockerfile to add the unprivileged option:

ENV JAVA_OPTS_EXTRA="\
-XX:CRaCCheckpointTo=/app/checkpoint \
-Djdk.crac.resource-policies=/app/crac_fd_policies.yaml \
-Dorg.crac.Core.Compat=jdk.crac \
-Dopenj9.internal.criu.unprivilegedMode=true"

I then ran the build script; the output was:

ymanton@c21836v1:~/CRaC_OpenJ9_Jetty_JakartaRS$ bash -x ./build_criu_image_docker.sh
+ CHECKPOINT_NAME=criu_checkpoint
+ RESTORE_NAME=criu_restore
+ docker build -t criu_checkpoint -f criu_ubi9_jre.Dockerfile .
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon     16MB
Step 1/10 : FROM icr.io/appcafe/ibm-semeru-runtimes:open-21-jdk-ubi9
 ---> 57c44a594984
Step 2/10 : ENV JAVA_OPTS_EXTRA="-XX:CRaCCheckpointTo=/app/checkpoint -Djdk.crac.resource-policies=/app/crac_fd_policies.yaml -Dorg.crac.Core.Compat=jdk.crac -Dopenj9.internal.criu.unprivilegedMode=true"
 ---> Running in fdf804fcb594
Removing intermediate container fdf804fcb594
 ---> 0ec5ea104247
Step 3/10 : USER root
 ---> Running in 5b7fbca0a176
Removing intermediate container 5b7fbca0a176
 ---> c833c874641f
Step 4/10 : EXPOSE 8080
 ---> Running in 6659cfa2977a
Removing intermediate container 6659cfa2977a
 ---> cadef412215c
Step 5/10 : COPY app-crac.jar /app/app.jar
 ---> 95b1b4d55fd0
Step 6/10 : COPY crac_fd_policies.yaml /app/
 ---> fbb00a2934c7
Step 7/10 : COPY start.sh /app/
 ---> 2845480ea5d7
Step 8/10 : RUN   mkdir -p /app/checkpoint &&   chmod 755 /app/start.sh
 ---> Running in 4c6ac422b0fd
Removing intermediate container 4c6ac422b0fd
 ---> 09633e9e481e
Step 9/10 : WORKDIR /app
 ---> Running in 76496c0faaac
Removing intermediate container 76496c0faaac
 ---> 029a0d630fd8
Step 10/10 : CMD ["./start.sh"]
 ---> Running in 84e278ed6ace
Removing intermediate container 84e278ed6ace
 ---> 05ca684e9a97
Successfully built 05ca684e9a97
Successfully tagged criu_checkpoint:latest
+ docker run -it --cap-drop=ALL --cap-add=CHECKPOINT_RESTORE --cap-add=SYS_PTRACE --cap-add=SETPCAP --security-opt seccomp=unconfined --name criu_checkpoint criu_checkpoint
Jul 04, 2024 6:14:49 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider com.sun.tools.rngdatatype.helpers.ProxyDatatypeLibraryFactory of service com.sun.tools.rngdatatype.DatatypeLibraryFactory in bundle com.sun.xml.bind.jaxb-osgi
Jul 04, 2024 6:14:50 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider com.sun.tools.xjc.addon.code_injector.PluginImpl of service com.sun.tools.xjc.Plugin in bundle com.sun.xml.bind.jaxb-osgi
Jul 04, 2024 6:14:50 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider com.sun.tools.xjc.addon.locator.SourceLocationAddOn of service com.sun.tools.xjc.Plugin in bundle com.sun.xml.bind.jaxb-osgi
Jul 04, 2024 6:14:50 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider com.sun.tools.xjc.addon.sync.SynchronizedMethodAddOn of service com.sun.tools.xjc.Plugin in bundle com.sun.xml.bind.jaxb-osgi
Jul 04, 2024 6:14:50 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider com.sun.tools.xjc.addon.at_generated.PluginImpl of service com.sun.tools.xjc.Plugin in bundle com.sun.xml.bind.jaxb-osgi
Jul 04, 2024 6:14:50 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider com.sun.tools.xjc.addon.episode.PluginImpl of service com.sun.tools.xjc.Plugin in bundle com.sun.xml.bind.jaxb-osgi
Jul 04, 2024 6:14:50 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider com.sun.tools.xjc.addon.accessors.PluginImpl of service com.sun.tools.xjc.Plugin in bundle com.sun.xml.bind.jaxb-osgi
Jul 04, 2024 6:14:50 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider org.glassfish.jaxb.runtime.v2.JAXBContextFactory of service jakarta.xml.bind.JAXBContextFactory in bundle com.sun.xml.bind.jaxb-osgi
Jul 04, 2024 6:14:50 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider org.eclipse.jetty.http.Http1FieldPreEncoder of service org.eclipse.jetty.http.HttpFieldPreEncoder in bundle org.eclipse.jetty.http
Jul 04, 2024 6:14:50 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider org.eclipse.osgitech.rest.provider.jakartars.RuntimeDelegateService of service jakarta.ws.rs.ext.RuntimeDelegate in bundle org.eclipse.osgitech.rest
Jul 04, 2024 6:14:50 PM org.eclipse.osgitech.rest.runtime.common.JerseyBundleTracker updateCondition
INFO: Registered Jersey runtime condition
Jul 04, 2024 6:14:51 PM org.apache.aries.spifly.BaseActivator log
INFO: Registered provider org.eclipse.osgitech.rest.sse.SseSourceBuilderService of service jakarta.ws.rs.sse.SseEventSource$Builder in bundle org.glassfish.jersey.media.jersey-media-sse
Jul 04, 2024 6:14:51 PM org.fipro.service.modifier.crac.jetty.JettyServerRunnable run
INFO: Started Jersey server at port 8080 successfully try http://localhost:8080
Jul 04, 2024 6:14:51 PM org.fipro.service.modifier.crac.jetty.JettyServerRunnable$1 lifeCycleStarting
INFO: lifeCycleStarting
[pool-4-thread-1] INFO org.eclipse.jetty.server.Server - jetty-11.0.19; built: 2023-12-15T20:54:39.802Z; git: f781e475c8fa9e9c8ce18b1eaa03110d510f905f; jvm 21.0.3+9-LTS
[pool-4-thread-1] INFO org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@c2fd6bae{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
[pool-4-thread-1] INFO org.eclipse.jetty.server.Server - Started Server@e4ca5177{STARTING}[11.0.19,sto=0] @4145ms
Jul 04, 2024 6:14:51 PM org.fipro.service.modifier.crac.jetty.JettyServerRunnable$1 lifeCycleStarted
INFO: lifeCycleStarted
Jul 04, 2024 6:14:51 PM org.fipro.service.modifier.crac.jetty.JettyBackedWhiteboardComponent startServer
INFO: Started Jakartars whiteboard server for port: 8080 and context: /
[pool-3-thread-1] WARN org.eclipse.jetty.server.handler.ContextHandler - Empty contextPath
Jul 04, 2024 6:14:53 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.glassfish.jersey.servlet.init.FilterUrlMappingsProviderImpl registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.glassfish.jersey.servlet.init.FilterUrlMappingsProviderImpl will be ignored.
Jul 04, 2024 6:14:53 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.glassfish.jersey.servlet.async.AsyncContextDelegateProviderImpl registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.glassfish.jersey.servlet.async.AsyncContextDelegateProviderImpl will be ignored.
[pool-3-thread-1] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@-46fc0548{/,null,AVAILABLE}
Jul 04, 2024 6:14:54 PM org.eclipse.osgitech.rest.runtime.application.JerseyApplicationContentProvider canHandleApplication
INFO: [sid_55] There is no application select filter defined, using default application
Jul 04, 2024 6:14:54 PM org.eclipse.osgitech.rest.runtime.JerseyServiceRuntime assignContent
INFO: Added content sid_55 to application .default class org.fipro.service.modifier.rest.CustomObjectMapperProvider
Jul 04, 2024 6:14:54 PM org.eclipse.osgitech.rest.runtime.application.JerseyApplicationContentProvider canHandleApplication
INFO: [sid_56] There is no application select filter defined, using default application
Jul 04, 2024 6:14:54 PM org.eclipse.osgitech.rest.runtime.JerseyServiceRuntime assignContent
INFO: Added content sid_56 to application .default class org.fipro.service.modifier.rest.HtmlWriterInterceptor
Jul 04, 2024 6:14:54 PM org.eclipse.osgitech.rest.runtime.application.JerseyApplicationContentProvider canHandleApplication
INFO: [sid_57] There is no application select filter defined, using default application
Jul 04, 2024 6:14:54 PM org.eclipse.osgitech.rest.runtime.JerseyServiceRuntime assignContent
INFO: Added content sid_57 to application .default class org.fipro.service.modifier.rest.JacksonJsonFeature
Jul 04, 2024 6:14:54 PM org.eclipse.osgitech.rest.runtime.JerseyServiceRuntime assignContent
INFO: Added content OptionalResponse Filter to application modifyApplication class org.eclipse.osgitech.rest.util.OptionalResponseFilter
Jul 04, 2024 6:14:54 PM org.eclipse.osgitech.rest.runtime.JerseyServiceRuntime assignContent
INFO: Added content OptionalResponse Filter to application .default class org.eclipse.osgitech.rest.util.OptionalResponseFilter
Jul 04, 2024 6:14:54 PM org.eclipse.osgitech.rest.runtime.JerseyServiceRuntime assignContent
INFO: Added content modifier to application modifyApplication class org.fipro.service.modifier.rest.ModifierRestService
Jul 04, 2024 6:14:54 PM org.eclipse.osgitech.rest.runtime.JerseyServiceRuntime assignContent
INFO: Added content modifier to application .default class org.fipro.service.modifier.rest.ModifierRestService
Jul 04, 2024 6:14:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.glassfish.jersey.servlet.init.FilterUrlMappingsProviderImpl registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.glassfish.jersey.servlet.init.FilterUrlMappingsProviderImpl will be ignored.
Jul 04, 2024 6:14:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.glassfish.jersey.servlet.async.AsyncContextDelegateProviderImpl registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.glassfish.jersey.servlet.async.AsyncContextDelegateProviderImpl will be ignored.
[pool-3-thread-1] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@-76e80c1f{/mod,null,AVAILABLE}
Jul 04, 2024 6:14:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.glassfish.jersey.servlet.init.FilterUrlMappingsProviderImpl registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.glassfish.jersey.servlet.init.FilterUrlMappingsProviderImpl will be ignored.
Jul 04, 2024 6:14:54 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.glassfish.jersey.servlet.async.AsyncContextDelegateProviderImpl registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.glassfish.jersey.servlet.async.AsyncContextDelegateProviderImpl will be ignored.
[pool-2-thread-1] INFO org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@c2fd6bae{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}
./start.sh: line 13:  1008 Killed                  java $JAVA_OPTS $JAVA_OPTS_EXTRA -jar app.jar
++ docker inspect '--format={{.Id}}' criu_checkpoint
+ CONTAINER_ID=219dc098fd492c7047b6f88b4433c49bea761d54b73e3e21eb543fee945fd009
+ docker container commit '--change=CMD ["criu", "restore", "--unprivileged", "-D", "/app/checkpoint", "--shell-job", "-v4", "--log-file=restore.log"]' 219dc098fd492c7047b6f88b4433c49bea761d54b73e3e21eb543fee945fd009 criu_restore
sha256:75ff6c4623ceb3f08d2339ce64224d897cc492a36a0c0225d2fa2a0ed2a0f4d5
+ docker container rm criu_checkpoint
criu_checkpoint
+ docker image rm criu_checkpoint
Untagged: criu_checkpoint:latest
++ docker images -q --filter dangling=true
+ docker rmi
"docker rmi" requires at least 1 argument.
See 'docker rmi --help'.

Usage:  docker rmi [OPTIONS] IMAGE [IMAGE...]

Remove one or more images

Ignoring the docker rmi error, I can see that the checkpoint succeeded. Restoring this image failed as expected because of std in/out/err, so I modified start.sh to run the JVM with those redirected and was able to checkpoint and restore. There's no output of course, but I could see the process running after restore:

ymanton@c21836v1:~$ ps -ef | grep java
root        5755    5746  3 14:22 ?        00:00:00 java -XX:CRaCCheckpointTo=/app/checkpoint -Djdk.crac.resource-policies=/app/crac_fd_policies.yaml -Dorg.crac.Core.Compat=jdk.crac -Dopenj9.internal.criu.unprivilegedMode=true -jar app.jar

Your comments had ./build_criu.sh docker ubi_openj9_jre instead, and that script is not part of the zip file, so perhaps there's an error there? Specifically --security-opt seccomp=unconfined may have been forgotten?

I'll try Podman for Windows next and let you know how that goes.

@fipro78
Author

fipro78 commented Jul 4, 2024

As mentioned in the comment above, I have pushed my example to GitHub to make it easier to see the actual code etc. It is interesting, though, that it works on your machine while I am not able to create a checkpoint on several different Windows 11 machines.

@ymanton
Member

ymanton commented Jul 4, 2024

OK thanks, I see the repo.

Can you check the seccomp state inside your containers? The --security-opt seccomp=unconfined option should disable seccomp inside containers, which then allows CRIU to trivially suspend it, but it seems like seccomp is not disabled in your containers.

You can check as follows:

docker run --rm <image> cat /proc/self/status

By default you will see

...
Seccomp:        2
Seccomp_filters:        1
...

near the bottom of the output, which means seccomp is enabled. If you do docker run --rm --security-opt seccomp=unconfined <image> cat /proc/self/status you should see

...
Seccomp:        0
Seccomp_filters:        0
...
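
The same check can also be scripted from inside the container. Below is a minimal Python sketch that relies only on the `Seccomp:` field of `/proc/<pid>/status` shown above; the helper names `seccomp_mode` and `read_seccomp_mode` are hypothetical. The kernel reports 0 when seccomp is disabled, 1 for strict mode and 2 for filter mode.

```python
SECCOMP_LABELS = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the integer after 'Seccomp:' in /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        # Compare the whole key so that 'Seccomp_filters' is not matched.
        if key.strip() == "Seccomp":
            return int(value.strip())
    return None


def read_seccomp_mode(path="/proc/self/status"):
    """Read the seccomp mode of the current process (Linux only)."""
    with open(path) as status_file:
        return seccomp_mode(status_file.read())


if __name__ == "__main__":
    sample = "Name:\tcat\nSeccomp:\t2\nSeccomp_filters:\t1\n"
    mode = seccomp_mode(sample)
    print(f"Seccomp mode: {mode} ({SECCOMP_LABELS.get(mode, 'unknown')})")
```

Calling `read_seccomp_mode()` at the top of a start script makes a silently ignored `--security-opt seccomp=unconfined` visible before CRIU is even invoked.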

@fipro78
Author

fipro78 commented Jul 5, 2024

Thanks for the pointer. I actually already opened a ticket for the seccomp issue, but totally forgot about it. :(

microsoft/WSL#10981

I am running my WSL2 with networkingMode=mirrored. This seems to cause the seccomp setting to be ignored when starting a container. After switching back to networkingMode=NAT, checkpoint generation works.
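
For anyone who wants to check this setting before building the checkpoint container, here is a minimal Python sketch. It assumes the setting lives in the `[wsl2]` section of the `.wslconfig` file in the Windows user profile and that NAT is the default when the key is absent; the helper name `wsl_networking_mode` is hypothetical.

```python
import configparser


def wsl_networking_mode(wslconfig_text, default="NAT"):
    """Return the networkingMode from the [wsl2] section of .wslconfig text."""
    # Interpolation is disabled so that '%' in other values cannot cause errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(wslconfig_text)
    # Option names are matched case-insensitively by configparser.
    return parser.get("wsl2", "networkingMode", fallback=default)


if __name__ == "__main__":
    sample = "[wsl2]\nnetworkingMode=mirrored\n"
    mode = wsl_networking_mode(sample)
    if mode.lower() == "mirrored":
        print("mirrored networking is active; seccomp=unconfined may be ignored")
    else:
        print(f"networkingMode is {mode}")
```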

> Restoring this image failed as expected because of std in/out/err

Does it really fail or do you just see the tty error, but the application is starting? Because I do not see the error.

Ok, found out myself. If I start the container with -ti it works with std in/out/err. But without it, the container stops immediately.

I also solved the issue with the OpenJ9 CRIUSupport API: I had forgotten to call setShellJob(true). That example now works as well.

And one (hopefully) last question:
Does the restore container also need --cap-add=CHECKPOINT_RESTORE --cap-add=SYS_PTRACE --security-opt seccomp=unconfined to be able to start the process from the checkpoint? From testing I see that it only works if I pass those arguments, but I would have assumed that they are not necessary, as I start in unprivileged mode. They are at least not necessary for CRaC, which is why I'm asking.

@ymanton
Member

ymanton commented Jul 5, 2024

> The restore container also needs --cap-add=CHECKPOINT_RESTORE --cap-add=SYS_PTRACE --security-opt seccomp=unconfined to be able to start the process from checkpoint? Is that correct? From testing I see that it only works if I pass those arguments. But I would have assumed that it is not necessary, as I start in unprivileged mode. It is at least not necessary for CRaC, which is why I'm asking.

CRIU's "unprivileged" mode still requires some privileges unfortunately.

CRIU's default privileged mode requires root and supports checkpointing and restoring processes that use many different kinds of kernel features. CRIU's unprivileged mode does not require root, but it still requires a handful of capabilities. In unprivileged mode CRIU avoids trying to checkpoint/restore processes that use certain features that we know can't be supported; for others it will try anyway and, as expected, fail. In that respect it only supports checkpointing a subset of the programs that privileged CRIU can support.

In the JVM we've changed a few things to allow us to reduce the number of capabilities we need to grant CRIU; CAP_CHECKPOINT_RESTORE is always required, CAP_SYS_PTRACE is required for checkpointing, CAP_SETPCAP is required when using certain container runtimes but not others, etc.

Seccomp is a different story. Seccomp allows container runtimes to block certain syscalls, even if those syscalls are allowed to be used by non-root processes outside of containers. CRIU uses certain syscalls that non-root processes are allowed to call but that most programs never use. Some container runtimes block such syscalls in their default profile to be conservatively safe. If you want to keep seccomp enabled, you can create a seccomp profile that allows the syscalls CRIU needs and keeps the rest disabled. You can take a look at https://blog.openj9.org/2022/09/29/unprivileged-openj9-criu-support/ for a discussion on that topic, and here's an example seccomp profile that we've used in the past: https://github.com/eclipse-openj9/openj9/files/8774222/criuseccompprofile.json.txt
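
As an illustration of what "a profile that allows the syscalls CRIU needs" could look like mechanically, here is a minimal Python sketch that appends an allow rule to a seccomp profile in the JSON layout used by Docker and Podman (`defaultAction` plus a `syscalls` list of rules). The helper name `allow_syscalls` is hypothetical, and the syscall name in the usage section is a placeholder only; the actual list should be taken from the linked blog post and example profile.

```python
import copy
import json


def allow_syscalls(profile, names):
    """Return a copy of a seccomp profile with one extra allow rule appended."""
    updated = copy.deepcopy(profile)  # leave the caller's profile untouched
    updated.setdefault("syscalls", []).append(
        {"names": sorted(set(names)), "action": "SCMP_ACT_ALLOW"}
    )
    return updated


if __name__ == "__main__":
    # Deny-by-default stub; a real profile would start from the runtime's
    # default profile rather than from an empty rule list.
    base = {"defaultAction": "SCMP_ACT_ERRNO", "syscalls": []}
    # "ptrace" is a placeholder, not the verified set of syscalls CRIU needs.
    print(json.dumps(allow_syscalls(base, ["ptrace"]), indent=2))
```

The resulting file would then be passed with `--security-opt seccomp=<path to profile>` instead of `seccomp=unconfined`.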

@fipro78
Author

fipro78 commented Jul 8, 2024

For your information:
I have just published my blog post about CRaC/CRIU
https://vogella.com/blog/cracin-your-osgi-application/

I hope I have collected and written about all the information you provided correctly. If you think something is wrong or should be adjusted, please let me know.

Thanks a lot for your support over the last months!

@tajila
Contributor

tajila commented Jul 15, 2024

@fipro78 Thanks for writing the blog, looks good.

Just a few things:


> There is no support for manually creating a checkpoint using the jcmd tool. The reason is probably that the OpenJ9 jcmd tool is a different implementation that is specific to OpenJ9, and is not related to the HotSpot tool of the same name (see [OpenJ9 - Java diagnostic command (jcmd) tool](https://eclipse.dev/openj9/docs/tool_jcmd/))

While it's not stated in the docs, OpenJ9 does support JDK.checkpoint with jcmd. We don't "officially" support CRaC yet, which is why the docs haven't been updated.


> OpenJ9 CRIU Support additionally needs...

OpenJ9 encourages a minimal set of capabilities when restoring an image. At checkpoint, cap_checkpoint_restore, cap_sys_ptrace and cap_setpcap are required; at restore, only cap_checkpoint_restore and cap_setpcap are required. This is the case on Podman; it's possible that we could get away with fewer capabilities on Docker (that is an exercise for us to determine).

The big benefit with our approach is that we are not running with cap_sys_ptrace on restore, which may be a safety issue in some production environments.
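
The per-phase capability sets can be kept in one place so that a restore container never inherits the checkpoint-only capability. Here is a minimal Python sketch of that idea, mirroring the flags of the `docker run` command shown earlier in this thread; the helper name `run_args` and the table `PHASE_CAPABILITIES` are made up for illustration, and the capability lists are the ones stated above for Podman.

```python
# Capabilities per phase, as listed in the comment above (reported for Podman).
PHASE_CAPABILITIES = {
    "checkpoint": ("CHECKPOINT_RESTORE", "SYS_PTRACE", "SETPCAP"),
    "restore": ("CHECKPOINT_RESTORE", "SETPCAP"),
}


def run_args(engine, phase, image, seccomp="unconfined"):
    """Build the container-run argument list for the given phase."""
    if phase not in PHASE_CAPABILITIES:
        raise ValueError(f"unknown phase: {phase!r}")
    # Drop everything first, then add back only what the phase needs.
    args = [engine, "run", "-it", "--cap-drop=ALL"]
    args += [f"--cap-add={cap}" for cap in PHASE_CAPABILITIES[phase]]
    args += ["--security-opt", f"seccomp={seccomp}", image]
    return args


if __name__ == "__main__":
    print(" ".join(run_args("podman", "checkpoint", "criu_checkpoint")))
    print(" ".join(run_args("podman", "restore", "criu_restore")))
```

The returned list can be handed to `subprocess.run` unchanged; the `seccomp` parameter accepts a profile path in place of `unconfined`.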

Overall, I think the blog does a great job of highlighting how to get started with both technologies and what to expect when using them. One aspect that is not discussed is how the technologies will perform in production environments. OpenJ9 has added diagnostic and tuning capabilities to address some of the pain points that will be encountered in production environments. Here are some aspects:

  • Portability between architectural flavours

    • When deploying a checkpointed image in cloud environments, one may not know which architectural flavour the image will be deployed to (e.g. Intel vs. AMD, Broadwell vs. Ice Lake). In normal circumstances the JIT will attempt to use the best instructions available on the host machine. Taking a checkpoint on a new machine and restoring it on an old machine will lead to issues unless there is some kind of intervention. J9 automatically limits the instructions used pre-checkpoint to a common subset, so that a checkpointed image can be restored in as many environments as possible; on restore it uses what is available on the restore host.
  • Tuning an image for the deployment node

    • Similarly, when deploying an image there are cases where one may need to tune the JVM for the memory and CPU available on the deployment node. J9 automatically adjusts the parallelism on restore for things like ForkJoinPool (which virtual threads depend on). OpenJ9 also adjusts the number of GC threads and the max heap size. There are also command-line options for users to adjust those manually.
  • Debugging

    • Existing diagnostic tools like Xtrace and Xdump also work on restore. This means that users don't need to rebuild the image in order to add diagnostic options; they can use a production image and just add the options on restore.

    • We are currently working on extending this capability such that a user can attach a Java debugger on restore using a production image.

  • Security APIs are safe by default

    • The checkpoint/restore model has some security concerns in that the entire state of the JVM is serialized, which includes the security stack. By default, J9 only allows CRIU-compatible security APIs (security APIs designed for checkpoint/restore) that do not leave any state in the image between checkpoint and restore.
  • User friendly compensations

    • J9 also adds hooks to ensure random numbers are re-seeded between checkpoint and restore for uniqueness, accounts for the downtime between checkpoint and restore for timers, resets network state, ...

So while fast start is crucial for checkpoint/restore, it is important that there is support to ensure that the checkpointed image can be restored on a new system as though it was started there originally. We are building out this support and looking for more use cases to identify areas for improvement.

@fipro78
Author

fipro78 commented Jul 15, 2024

@tajila
Thanks a lot for the additional information! It is good that I linked this issue in my blog post, so people can find this information without me having to add it to the post.

> While it's not stated in the docs, OpenJ9 does support JDK.checkpoint with jcmd.

That is interesting. I tried it but I got jcmd errors in the J9 environment. Maybe it was caused by other issues I have solved in the meantime, but it did not work in the beginning. I will update that note in my blog post.
Just to be sure, using the jcmd would only work with the CRaC support, but not with the OpenJ9 CRIUSupport. Is that correct?

@tajila
Contributor

tajila commented Jul 15, 2024

> Just to be sure, using the jcmd would only work with the CRaC support, but not with the OpenJ9 CRIUSupport. Is that correct?

Yes, that is correct
