Can't fork for 12: File exists #2383

Open
liuaifu opened this issue Apr 9, 2024 · 8 comments

@liuaifu

liuaifu commented Apr 9, 2024

Description

I dumped app1 in container 1 and left it running. I then copied the dump to container 2 and tried to restore it there, which failed with the following error:

Warn  (criu/kerndat.c:1285): Can't keep kdat cache on non-tempfs
Error (criu/cr-restore.c:1490): Can't fork for 12: File exists
Error (criu/cr-restore.c:2572): Restoring FAILED.

Container 1 and container 2 run on the same host.
Should I use criu-ns?

Steps to reproduce the issue:
1. Dump app1 in container 1 using criu.
2. Copy the checkpoint to container 2.
3. Restore app1 in container 2 using criu.

Describe the results you received:

Warn  (criu/kerndat.c:1285): Can't keep kdat cache on non-tempfs
Error (criu/cr-restore.c:1490): Can't fork for 12: File exists
Error (criu/cr-restore.c:2572): Restoring FAILED.

Describe the results you expected:
app1 is restored successfully.

Additional information you deem important (e.g. issue happens only occasionally):

CRIU logs and information:

CRIU full dump/restore logs:

(paste your output here)

Output of `criu --version`:

(paste your output here)

Output of `criu check --all`:

(paste your output here)

Additional environment details:
Host OS: Ubuntu 22.04
Container image: Ubuntu 22.04
CRIU version: 3.18

@adrianreber
Member

The problem you see is caused by duplicate PIDs. CRIU always recreates processes with the same PIDs they had at checkpoint time. In your destination environment PID 12 is already in use, so the fork fails with `File exists` (`EEXIST`) and the restore is aborted.

Common workarounds are to use a container for each process you are trying to restore or the criu-ns script.

What are you trying to achieve? If we knew your goal, we could perhaps suggest a better workaround.

Using criu-ns in a container will probably lead to nested namespaces and that is something CRIU usually does not handle well.
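
The PID collision described in this comment can be checked by hand before attempting a restore. The following is a minimal sketch, assuming a Linux environment with `/proc` mounted; `pid_in_use` is a hypothetical helper written for illustration and is not part of CRIU. Run it inside the destination container, where the PID namespace is the one the restore would use.

```shell
#!/bin/sh
# Sketch: check whether a PID is free in the current PID namespace before restoring.
# Assumes Linux with /proc mounted; pid_in_use is a hypothetical helper, not a CRIU tool.

pid_in_use() {
    # /proc/<pid> exists for every live process visible in this PID namespace.
    [ -e "/proc/$1" ]
}

# The failing PID in the log above was 12; pass a different PID as the first argument.
pid="${1:-12}"
if pid_in_use "$pid"; then
    echo "PID $pid is taken here; restoring a process with that PID is expected to fail"
else
    echo "PID $pid is free in this PID namespace"
fi
```

A checkpoint usually contains more than one PID, so a single free PID does not guarantee a successful restore; the check only helps confirm the diagnosis for the PID named in the error.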

@liuaifu
Author

liuaifu commented Apr 9, 2024

@adrianreber Thank you for your response. I want to run multiple instances of the same app, with the second and subsequent instances restored from a checkpoint of the first instance. Each instance runs in its own Docker container, so a container runs only one instance.

@adrianreber
Member

So why do you checkpoint from inside the container? Docker and Podman offer checkpoint commands that can do this easily:

$ podman container checkpoint -R --export=/tmp/cp.tar <container>
$ podman container restore --import=/tmp/cp.tar --name=copy1
$ podman container restore --import=/tmp/cp.tar --name=copy2

Something like that.
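
The workflow suggested above (one checkpoint, several restored copies) can be sketched as a small script. This is only an illustration under stated assumptions: the source container name `app1`, the copy names, and the helper `print_clone_commands` are hypothetical, and the script prints the Podman commands rather than running them so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print (not run) the Podman commands for one checkpoint and N restored copies.
# The container name, archive path and copy names used here are hypothetical.

print_clone_commands() {
    src=$1      # name of the running source container
    archive=$2  # where the checkpoint archive is written
    copies=$3   # how many restored copies to create

    # -R (--leave-running) keeps the source container running after the checkpoint.
    echo "podman container checkpoint -R --export=$archive $src"

    i=1
    while [ "$i" -le "$copies" ]; do
        # Each restore creates a new, separately named container from the same archive.
        echo "podman container restore --import=$archive --name=${src}-copy$i"
        i=$((i + 1))
    done
}

print_clone_commands app1 /tmp/cp.tar 2
```

Once the printed commands look right, they can be piped to `sh` on the host. Podman's checkpoint and restore typically need root and a CRIU installation on the host, so the commands run outside the containers rather than inside them.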

@liuaifu
Author

liuaifu commented Apr 9, 2024

> Common workarounds are to use a container for each process you are trying to restore or the criu-ns script.

It is already being restored in a different container, but it still reports the error described above.

@adrianreber
Member

> > Common workarounds are to use a container for each process you are trying to restore or the criu-ns script.
>
> It is already being restored in a different container, but it still reports the error described above.

Because something is already running in the destination container and occupies that PID. If you use Podman's or Docker's built-in checkpoint/restore support, you avoid the problem of PID collisions.

@liuaifu
Author

liuaifu commented Apr 9, 2024

> So why do you checkpoint from inside the container? Docker and Podman offer checkpoint commands that can do this easily:
>
> $ podman container checkpoint -R --export=/tmp/cp.tar <container>
> $ podman container restore --import=/tmp/cp.tar --name=copy1
> $ podman container restore --import=/tmp/cp.tar --name=copy2
>
> Something like that.

Thanks, I'll try that tomorrow.

@liuaifu
Author

liuaifu commented Apr 9, 2024

> > > Common workarounds are to use a container for each process you are trying to restore or the criu-ns script.
> >
> > It is already being restored in a different container, but it still reports the error described above.
>
> Because something is already running in the destination container and occupies that PID. If you use Podman's or Docker's built-in checkpoint/restore support, you avoid the problem of PID collisions.

I understand it a bit better now: other programs in the destination container are already occupying the PID.


A friendly reminder that this issue had no activity for 30 days.
