Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

criu: fix a fatal failure if nft doesn't work #2403

Closed
wants to merge 1 commit into from

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented May 10, 2024

On some systems, nft binary might not be installed, or some kernel options might be unconfigured, resulting in something like this:

sudo unshare -n nft create table inet CRIU
Error: Could not process rule: Operation not supported
create table inet CRIU
^^^^^^^^^^^^^^^^^^^^^^^

This is similar to what kerndat_has_nftables_concat() does, and if the outcome is the same, it returns an error to kerndat_init(), and an error from kerndat_init() is considered fatal.

Let's relax the check, returning mere "feature not working" instead of a fatal error.


This was discovered while running criu CI on ARM via actuated ci env generously provided by @alexellis. Currently it runs kernel 6.1.90 with the following config:

grep CONFIG_NF_TABLES /boot/config-6.1.90
CONFIG_NF_TABLES=m
# CONFIG_NF_TABLES_INET is not set
# CONFIG_NF_TABLES_NETDEV is not set
# CONFIG_NF_TABLES_IPV4 is not set
# CONFIG_NF_TABLES_ARP is not set
# CONFIG_NF_TABLES_IPV6 is not set
CONFIG_NF_TABLES_BRIDGE=m

I guess that missing CONFIG_NF_TABLES_INET is the source of the issue.

Adding this patch on top of current criu-dev fixes all failures of runc c/r tests (see e.g. https://github.com/opencontainers/runc/actions/runs/9024939764/job/24799738301).

How it failed before the fix

Failed run (using criu_3.19-2_arm64.deb from https://download.opensuse.org/repositories/devel:/tools:/criu/xUbuntu_22.04) looks like this (from https://github.com/opencontainers/runc/actions/runs/9023994162/job/24796971230):

=== RUN   TestUsernsCheckpoint
    checkpoint_test.go:45: Test requires userns
--- SKIP: TestUsernsCheckpoint (0.18s)
=== RUN   TestCheckpoint
time="2024-05-09T21:51:52Z" level=warning msg="--- Quoting \"/tmp/TestCheckpoint2075305918/003/criu/dump.log\""
time="2024-05-09T21:51:52Z" level=warning msg="17:(00.005350) net: Restoring netdev veth idx 10"
time="2024-05-09T21:51:52Z" level=warning msg="18:(00.006183) net: Dumping netns links"
time="2024-05-09T21:51:52Z" level=warning msg="19:(00.006241) net: \tLD: Got link 1, type 772"
time="2024-05-09T21:51:52Z" level=warning msg="20:(00.006249) net: \tLD: Got link 2, type 776"
time="2024-05-09T21:51:52Z" level=warning msg="21:(00.006256) net: \tLD: Got link 10, type 1"
time="2024-05-09T21:51:52Z" level=warning msg="22:(00.028176) Error (criu/kerndat.c:1563): Can't create nftables table"
time="2024-05-09T21:51:52Z" level=warning msg="23:(00.028262) Error (criu/util.c:1495): Can't wait or bad status: errno=0, status=256"
time="2024-05-09T21:51:52Z" level=warning msg="24:(00.028304) Error (criu/kerndat.c:1718): kerndat_has_nftables_concat failed when initializing kerndat."
time="2024-05-09T21:51:52Z" level=warning msg=---
    checkpoint_test.go:117: criu failed: type DUMP errno 0
--- FAIL: TestCheckpoint (0.25s)

(failures in runc integration tests are similar).

On some systems, nft binary might not be installed, or some kernel
options might be unconfigured, resulting in something like this:

	sudo unshare -n nft create table inet CRIU
	Error: Could not process rule: Operation not supported
	create table inet CRIU
	^^^^^^^^^^^^^^^^^^^^^^^

This is similar to what kerndat_has_nftables_concat() does, and if the
outcome is the same, it returns an error to kerndat_init(), and an error
from kerndat_init() is considered fatal.

Let's relax the check, returning mere "feature not working" instead of
a fatal error.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

Cc @adrianreber @ZeyadYasser @avagin

kolyshkin added a commit to kolyshkin/runc that referenced this pull request May 10, 2024
Currently, actuated runners have kernel 6.1.90 with
CONFIG_NF_TABLES_INET not set. This uncovers criu bug [1]
thus making c/r impossible to work.

Let's not install criu binary for now, effectively disabling c/r tests.

[1] checkpoint-restore/criu#2403

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@avagin avagin requested a review from mihalicyn May 10, 2024 05:40
@avagin
Copy link
Member

avagin commented May 10, 2024

Applied. Thanks!

Copy link
Member

@ZeyadYasser ZeyadYasser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, Thank you!

@rst0git
Copy link
Member

rst0git commented May 10, 2024

LGTM. Closing as the patch has been applied to the criu-dev branch.

@rst0git rst0git closed this May 10, 2024
@Snorch
Copy link
Member

Snorch commented May 13, 2024

The patch looks good.

But it seems that kdat.has_nftables_concat is unused after set, do we really need this code? @avagin @ZeyadYasser maybe you know more?

(I mean the only consequence of it set is Nftables based locking requires libnftables and set concatenations support warning)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants