C/R Support for POSIX message queues #2285
Comments
Hi all, any news/comments/ideas on this?
No, I think mqueue checkpoint/restore unfortunately only works with empty mqueues at this point. Maybe this is something you could implement?
I somehow suspected that, but thanks for clarifying that.
Well, I am quite new to this topic and first have to understand how C/R for the other filesystems works. I also need to find out whether this feature was skipped 7 years ago due to lack of importance or because of technical issues that are hard to solve. A quick search of the repo's git history didn't reveal any hints on that.
I see one technical issue: we have no way to read messages from such queues without actually removing them from the queue. To be non-intrusive, we would need a kernel interface like the one sockets have (receive with MSG_PEEK). Still, we can:
If criu segfaults, is killed, or simply has bad error handling between 5 and 6, the original container will be left with broken mqueue communication. We can restore in a similar way to the dump, with the operations inverted. Note: we will need to save/restore mq_getattr/mq_setattr for each mqueue. And that seems to be it.
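For readers unfamiliar with the socket behaviour referred to above, here is a minimal sketch of what MSG_PEEK provides. It uses a local datagram socket pair from Python's standard library; the helper name `peek_then_read` is made up for this illustration and has nothing to do with CRIU's code. POSIX message queues currently offer no equivalent of the peeking call shown here, which is the gap described in the comment.

```python
import socket


def peek_then_read(payload):
    """Send one datagram, peek at it, then read it for real.

    Hypothetical helper for illustration only (not part of CRIU).
    """
    # A connected pair of local datagram sockets, so no network is needed.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sender.send(payload)
        # MSG_PEEK returns the data but leaves it queued in the socket.
        peeked = receiver.recv(4096, socket.MSG_PEEK)
        # A normal recv() afterwards still sees the same datagram.
        consumed = receiver.recv(4096)
        return peeked, consumed
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    peeked, consumed = peek_then_read(b"hello")
    print(peeked == consumed)
```

Because the peek does not dequeue anything, a dump tool that crashes after peeking leaves the socket's contents untouched, which is exactly the property a drain-based mqueue dump lacks.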
It looks like the problem and solution are both pretty well-defined, do you guys think this could be a potential project for GSoC 2024?
I was starting to poke around this. Not sure if it is some kernel bug, but inside the container: It seems a similar issue has been encountered by others: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1507463
I do see some comments in the code (criu/files.c) about wrong info for overlayfs on certain kernel versions, but my kernel is 5.19 (whereas the comment above suggests that such a bug last appeared in 4.2). This might also be relevant: https://patchwork.kernel.org/project/linux-fsdevel/patch/20191121070613.4286-1-hu1.chen@intel.com/
A friendly reminder that this issue had no activity for 30 days.
Description
It seems that checkpoint/restore for POSIX message queues is (still) not supported. Checkpointing applications that use such queues fails:
Steps to reproduce the issue:
Create test application
mq0.py
[Note that Python was chosen for writing a compact test program. One could use C instead.]
Prepare a podman (or docker) container file with Python3 support and the `posix_ipc` module:
Build test container
sudo podman build -t test container
Run container
sudo podman run -dt test python3 /mqtest.py
Checkpoint container
Describe the results you received:
Checkpointing the container fails:
Describe the results you expected:
Checkpoint (and restore) should also include POSIX message queues, as other IPC mechanisms such as POSIX shared memory work without problems.
Additional tests:
When playing around with the test application by closing and/or unlinking the message queue after creation, criu's error messages vary a bit:
`mq.close_fd()` after creation:
`mq.unlink()` after creation:
`mq.close_fd()` and `mq.unlink()` after creation:
No error, checkpoint succeeds as expected (but w/o mq for the application).
CRIU logs and information:
CRIU full dump/restore logs:
Output of `criu --version`:
Output of `criu check --all`:
Additional environment details: