Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release 3.17 #1727

Merged
merged 302 commits into from
Apr 29, 2022
Merged

Release 3.17 #1727

merged 302 commits into from
Apr 29, 2022

Conversation

adrianreber
Copy link
Member

I prepared a possible 3.17 release. I took everything from criu-dev except the flog/ changes.

I would like to add following PRs also for 3.17 once they are merged in criu-dev:

Especially #1706 feels important as Fedora 36 and RHEL 9 needs it for to have a working version of CRIU. @mihalicyn do you need any help with the rseq PR?

@checkpoint-restore/maintainers anything else you would like to see in 3.17?

Snorch and others added 30 commits January 21, 2022 11:23
If unlinkat fails it means that fs is in "corrupted" state - spoiled
with non-unlinked auxiliary directories.

While on it add fixme note as this function can be racy and BUG_ON if
path contains double slashes.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/b7b4e69fd

Changes: simplify while loop condition, remove confusing FIXME, remove
excess !count check in favour of while loop condition check

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
…ocesses

Previousely remounted_rw was not shared between all processes on
restore, thus cleanup didn't got this info from rfi_remap and these
mounts were wrongly left writable after restore.

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/3a1a592e7

Fixes: fd0a3cd ("mount: remount ro mounts writable before
ghost-file restore")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Previousely I din't mention this case because we had bad error handling
in ghost cleanup path.

Without these patch but with proper error handling for unlink we have an
error in mntns_ghost01 test:

Error (criu/files-reg.c:2269): Failed to unlink the remap file:
Read-only file system

Cherry-picked from Virtuozzo criu:
https://src.openvz.org/projects/OVZ/repos/criu/commits/151c859e1

Changes: check lookup_mnt_id return for NULL

Fixes: fd0a3cd ("mount: remount ro mounts writable before
ghost-file restore")
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
This is a test for "ghost/mount: allocate remounted_rw in shmem to get
info from other processes" patch, without the patch test fails with:

  ############# Test zdtm/static/mntns_ghost01 FAIL at result check ##############
  Test output: ================================
  16:15:19.607:     5: ERR: mntns_ghost01.c:95: open for write on rofs -> 7 (errno = 11 (Resource temporarily unavailable))
  16:15:19.607:     4: FAIL: mntns_ghost01.c:121: Test died (errno = 11 (Resource temporarily unavailable))

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
When we declare struct and at the same time declare variable pointer of
this struct type, it looks like clang-format threats "*" as a
multiplication operator instead of indirection (pointer declaration)
operator and puts spaces on both sides, which looks wrong.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
The current debian stable release is Bullseye, not Buster. However, we
can use the 'stable' release instead. This would allow the CI to
automatically pick up updates in the future.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
Debian testing has newer compiler version and running
cross compilation tests would allow us to catch any compilation
errors early.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
Starting with gcc-11, Debian's armhf compiler no longer builds with
a default -mfpu= option. Instead it enables the FPU via an extension
to the -march flag (--with-arch=armv7-a+fp). criu's Makefile explicitly
passes its own -march=armv7-a setting, which overrides the +fp default,
so we end up with no FPU:

    cc1: error: '-mfloat-abi=hard': selected architecture lacks an FPU

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
Broken tests are being tracked at

 * checkpoint-restore#1669
 * checkpoint-restore#1635

This also enables previously disabled BPF related tests:

 * checkpoint-restore#1354

Signed-off-by: Adrian Reber <areber@redhat.com>
Looking at CI logs there are often messages like:

 "[WARNING] Option --keep-going is more useful when running multiple tests"

This commit removes '--keep-going' from single zdtm test runs.

Signed-off-by: Adrian Reber <areber@redhat.com>
An issue with dumping deleted reg files in overlayfs:

After deleting a file originated from lower layer in merged dir,
fstat() on the /proc/$pid/map_files symlink returns st_nlink=1, while
linkat() fails with errno ENOENT.

Signed-off-by: langyenan <ianlang@tencent.com>
The function run_tcp_server() was the last place CRIU was still using
the IPv4 only function inet_ntoa(). It was only used during a print, so
that it did not really break anything, but with this commit the output
is now no longer:

 Accepted connection from 0.0.0.0:58396

but correctly displaying the IPv6 address

 Accepted connection from ::1:58398

if connecting via IPv6.

Signed-off-by: Adrian Reber <areber@redhat.com>
Since commit 83301b5367a98 ("af_unix: Set TCP_ESTABLISHED for datagram sockets
too") in Linux kernel, SOCK_DGRAM unix sockets can have TCP_ESTABLISHED state
when connected. So we need to fix checks that assume SOCK_DRAM sockets cannot
have TCP_ESTABLISHED state.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
As the unix socket broken tests have been fixed in the pull request

checkpoint-restore#1680

We re-enable these tests.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
The version of ps in Alpine image by default is very limited.
It is based on the one from busybox and doesn't support options
such as '-p'.

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
We try to disable time namespace based testing for kernels older than
5.11. But we fail to come up with the correct if condition.

This changes (major <= 5) to (major < 5). There are no kernels with
major > 5 so currently the time namespace based are never run. This
should finally change it to run time namespace based tests on kernel
versions newer than 5.10.

Signed-off-by: Adrian Reber <areber@redhat.com>
Since
e2e8be3 ("x86/compel/fault-inject: Add a fault-injection for corrupting extended regset")
we doing fault-injection test for C/R of threads register set by filling tasks
xsave structures with the garbage. But there are some features for which that's not
safe. It leads to failures like described in checkpoint-restore#1635

In this particular case we meet the problem with PKRU feature, the problem
that after corrupting pkru registers we may restrict access to some vma areas,
so, after that process with the parasite injected get's segfault and crashes.

Let's manually specify which features is save to fill with the garbage by
keeping proper XFEATURE_MASK_FAULTINJ mask value.

Fixes: e2e8be3 ("x86/compel/fault-inject: Add a fault-injection for corrupting extended regset")

checkpoint-restore#1635

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Fixes: e2e8be3 ("x86/compel/fault-inject: Add a fault-injection for corrupting extended regset")

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
The --timeout option was introduced in [1] to prevent criu dump from
being able to hang indefinitely and allow users to adjust the time limit
in seconds for collecting tasks during the dump operation.

[1] checkpoint-restore@d0ff730

Signed-off-by: Radostin Stoyanov <radostin@redhat.com>
UNS_FDOUT means only that a userns call will return a file descriptor.

Signed-off-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
We added cross-compile tests with testing debian release to be able to
replicate the error reported in checkpoint-restore#1653, however, installing build
dependencies in this release currently fails with the following error:

libc6-dev:armhf : Breaks: libc6-dev-armhf-cross (< 2.33~) but 2.32-1cross4 is to be installed

This is not something we can fix, therefore using the debian unstable
release (instead of testing) could be more reliable option for our CI.
This would still replicate the problem reported in checkpoint-restore#1653.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
This patch sets the glibc.pthread.rseq tunable [1] to disable rseq
support in glibc as a temporary solution for the problem described in
[2]. This would allow us to run CI tests until CRIU has rseq support.

This commit also disables the rpc tests as they fail even
when GLIBC_TUNABLES is set.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=e3e589829d16af9f7e73c7b70f74f3c5d5003e45
[2] checkpoint-restore#1696

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
In contrast to the CLI it is not possible to do a single pre-dump via
RPC and thus libcriu. In cr-service.c pre-dump always goes into a
pre-dump loop followed by a final dump. runc already works around this
to only do a single pre-dump by killing the CRIU process waiting for the
message for the final dump.

Trying to implement pre-dump in crun via libcriu it is not as easy to
work around CRIU's pre-dump loop expectations as with runc that directly
talks to CRIU via RPC.

We know that LXC/LXD also does single pre-dumps using the CLI and runc
also only does single pre-dumps by misusing the pre-dump loop interface.

With this commit it is possible to trigger a single pre-dump via RPC and
libcriu without misusing the interface provided via cr-service.c. So
this commit basically updates CRIU to the existing use cases.

The existing pre-dump loop still sounds like a very good idea, but so
far most tools have decided to implement the pre-dump loop themselves.

With this change we can implement pre-dump in crun to match what is
currently implemented in runc.

Signed-off-by: Adrian Reber <areber@redhat.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
In most cases we run tests as:
./test/zdtm.py run -a

But it's also possible to run tests from root makefile:
make test

In this case, if criu tree have no ./test/umount2 binary
built we get the error like:
make[3]: *** No rule to make target 'umount2'. Stop.

It's worth to mention this "3". That's because we have
build process tree like this:
make -> make -> make -> zdtm.py -> make umount2
and also we have MAKEFLAGS variable set to:
build=-r -R -f ...

And that's bad because "-r" option means no builtin
rules and -R means no builtin variables. That makes
`make umount2` not working. Let's just cleanup this
variable to make things work properly.

Fixes: checkpoint-restore#1699
checkpoint-restore#1699

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
zdtm_ct.c:44:12: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
   44 | static int create_timens()

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Looks like in commit [1] we've non-intentionally added this tmp file to
git, let's remove it.

Fixes: 01ee297 ("s390:zdtm: Enable zdtm for s390") [1]
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Several lines above if (optind >= argc) we go to usage label and fail,
thus we don't need to check (optind < argc) here as it is always true.

Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
mihalicyn and others added 5 commits April 25, 2022 15:09
mountinfo contains more info than just "mount" output

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
bpf_create_map_xattr() has been replaced with bpf_map_create()
libbpf/libbpf@6cfb97c

DECLARE_LIBBPF_OPTS has been renamed to LIBBPF_OPTS
libbpf/libbpf@ea6c242

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
The map_extra field has been introduced in Linux Kernel release 5.16
and does not exist in older kernel versions. The current parsing
implementation fails when map_extra is missing.

In particular, it tries to parse the `memlock` field as `map_extra` and
fails but it does not exit with an error because map_extra is marked as
"optional". It then tries to parse the `map_id` field as `memlock` and
fails with an error because map_id is not optional:

Error (criu/proc_parse.c:2161): parse_fdinfo_pid_s: error parsing [map_type:\t2] for 19: Success'

To correctly handle this, we should try to parse again the next field
when parsing of `map_extra` fails, without reading the next line from
the bpfmap.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
bind_on_delete() return code is only used for setting errno for pr_perror()
This is mostly useless since a lot of syscalls already set it. All of
non-syscall errors already have prints in case of failure.
Fix bind_on_deleted() always returning 0 and simplify error juggling to
returning -1 in case of errors.

Fixes: checkpoint-restore#1771
Fixes: d0308e5 ("sk-unix: make criu respect existing files while restoring ghost unix socket fd")
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>
This commit adds a link to the workflow runs for each badge.

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
@adrianreber
Copy link
Member Author

Could you cherry-pick cfacd86 to this pr?

Added. Thanks for checking. Not sure how I missed that. Maybe a wrong range during cherry-pick

@lgtm-com
Copy link

lgtm-com bot commented Apr 25, 2022

This pull request fixes 1 alert when merging f548767 into be23910 - view on LGTM.com

fixed alerts:

  • 1 for Unused local variable

@avagin
Copy link
Member

avagin commented Apr 25, 2022

We need to create https://criu.org/Download/criu/3.17. I know @mihalicyn was ready to help with it.

@mihalicyn
Copy link
Member

Sure, I've created a release page. Friends, please take a look at it. Possibly, I missed some improvements or fixes which we have to list on the release page.

@avagin avagin requested review from a team, xemul, 0x7f454c46, kolyshkin, mihalicyn, rppt, rst0git and Snorch and removed request for a team April 25, 2022 23:38
Copy link
Member

@rst0git rst0git left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@Snorch Snorch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. And thanks a lot for merging mount engine patches!

@adrianreber
Copy link
Member Author

Sure, I've created a release page. Friends, please take a look at it. Possibly, I missed some improvements or fixes which we have to list on the release page.

Thanks for the wiki page. Looks good!

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Member

@mihalicyn mihalicyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@0x7f454c46 0x7f454c46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No concerns from my side, LGTM.

@avagin
Copy link
Member

avagin commented Apr 27, 2022

LGTM

@avagin avagin merged commit 2f0f128 into checkpoint-restore:master Apr 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-auto-close Don't auto-close as a stale issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.