rawhide, rootless: spurious warning messages #4012

Closed
edsantiago opened this issue Sep 12, 2019 · 37 comments · Fixed by #4018 or #4124
Comments

@edsantiago
Collaborator

On rawhide, rootless:

$ podman images
WARN[0000] Error initializing configured OCI runtime runc: no valid executable found for OCI runtime runc: invalid argument
WARN[0000] Error initializing configured OCI runtime runc: no valid executable found for OCI runtime runc: invalid argument
WARN[0000] Failed to add podman to systemd sandbox cgroup: <nil>
REPOSITORY                     TAG      IMAGE ID       CREATED         SIZE
quay.io/libpod/alpine_labels   latest   4fab981df737   10 months ago   4.68 MB

Adding |& cat (so stderr is no longer a TTY) converts the messages to the key=value format, with time= / level=warning / msg=...

This is with podman-1.5.2-0.79.dev.gitaf8fedc.fc32.x86_64.rpm; 78.dev.gitafa3d11 does not elicit this, implying that the problem was introduced between afa3d11 and af8fedc.
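
(For reference, the two output forms are standard logrus behavior rather than anything podman-specific: the TextFormatter prints the short WARN[0000] form when stderr is a terminal and the time=/level=/msg= form when it is piped. A minimal standalone sketch, not podman code:)

package main

import "github.com/sirupsen/logrus"

func main() {
    // On a TTY this prints the short form, e.g.:
    //   WARN[0000] Error initializing configured OCI runtime runc: ...
    logrus.Warn("Error initializing configured OCI runtime runc: ...")

    // Forcing the non-TTY formatting shows the form seen with |& cat:
    //   time="..." level=warning msg="Error initializing ..."
    logrus.SetFormatter(&logrus.TextFormatter{DisableColors: true, FullTimestamp: true})
    logrus.Warn("Error initializing configured OCI runtime runc: ...")
}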

@rhatdan
Member

rhatdan commented Sep 13, 2019

I have already brought up this issue.

rm ~/.config/containers/libpod.conf

And you should be all set. The problem is on upgrades: anyone who has run rootless podman in the past will have a libpod.conf in their home directory pointing at runc.

@rhatdan
Member

rhatdan commented Sep 13, 2019

This is a duplicate of
#3976
Reopen if I am mistaken.

@rhatdan rhatdan closed this as completed Sep 13, 2019
@edsantiago
Collaborator Author

I'm not sure it's a dup: I remember checking podman config, and in fact still see this in my scrollback buffer:

$ cat ~/.config/containers/libpod.conf
volume_path = "/home/fedora/.local/share/containers/storage/volumes"
image_default_transport = "docker://"
runtime = "crun"      <<<<<<<<<<<<<<<<<<<<
runtime_supports_json = ["crun", "runc"]
runtime_supports_nocgroups = ["crun"]
...

(This was a fresh virt that had never run an old podman as nonroot.) Plus it surprised me that podman-78 did not have the error, but it appeared in -79. Both use crun.

But it's bedtime here, and I won't have time tomorrow morning to bring up a new VM to test, so I'm respectfully reopening so @giuseppe will have a chance to have a quick look. If it really is a dup, I apologize in advance for wasting time.

@edsantiago edsantiago reopened this Sep 13, 2019
@mheon
Member

mheon commented Sep 13, 2019

Did we remove runc as a hard dependency of Podman? That is Podman complaining that runc is mentioned in libpod.conf but is not installed; it's a warning that one of the configured runtimes is not available.

@edsantiago
Collaborator Author

I remember doing rpm -q runc and getting nothing, so perhaps that's a promising direction. However, why would it fail rootless but not root?

Oh, another data point suggesting (to me) that this is not a dup of #3976: gating tests are failing, and those are fresh VMs. (Click the '8m' bar to expand; I have no idea why it's not auto-expanding or why it's not shown in red.)

@rhatdan
Member

rhatdan commented Sep 13, 2019

There is no hard dependency on runc; it has switched to crun. Why is the podman code requiring runc? If libpod.conf is set to crun, runc should be ignored.

@mheon
Member

mheon commented Sep 13, 2019 via email

@edsantiago
Collaborator Author

Please don't neglect this message:

WARN[0000] Failed to add podman to systemd sandbox cgroup: <nil>

I've just run a quick test and see this message whether runc is installed or not. So in short:

  • if runc and crun are both installed, podman nonroot issues the above warning on all (AFAICT) commands.
  • if only crun is installed, podman nonroot issues the three warnings described in comment 0.

@rhatdan
Member

rhatdan commented Sep 13, 2019

Why are we showing these errors by default? I thought we defaulted to ERROR.

@mheon
Member

mheon commented Sep 13, 2019

@giuseppe I think that systemd CGroup error was added by you, with the recent PR to join systemd cgroups?

@mheon
Member

mheon commented Sep 13, 2019

@rhatdan We do, AFAIK

@giuseppe
Member

@mheon yes, my recent PR enabled systemd cgroups.

I made a mistake in the patch though and the error is overwritten; I'll prepare a PR to address it.
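
(As a rough illustration of the bug class being described here — not the actual libpod code, the names are made up — shadowing err with ":=" inside a block leaves the outer variable nil, which is how a warning ends up printing <nil>:)

package main

import (
    "errors"
    "fmt"
)

// Stand-in for utils.RunUnderSystemdScope; it always fails in this sketch.
func runUnderSystemdScope() error {
    return errors.New("dial unix /run/user/0/bus: connect: permission denied")
}

func main() {
    var err error
    if rootless := true; rootless {
        // BUG: ":=" declares a new block-scoped err instead of assigning
        // to the outer one, so the failure is silently dropped here.
        err := runUnderSystemdScope()
        _ = err // only the shadowing copy sees the failure
    }
    // The outer err was never set, hence the "<nil>" in the warning.
    fmt.Printf("Failed to add podman to systemd sandbox cgroup: %v\n", err)
}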

giuseppe added a commit to giuseppe/libpod that referenced this issue Sep 13, 2019
do not shadow the err variable so that the correct error message can
be reported when utils.RunUnderSystemdScope fails.

Closes: containers#4012

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe
Member

I've opened a PR here: #4018

@edsantiago
Collaborator Author

The Failed to add podman to systemd sandbox cgroup: <nil> message is gone, but the runc ones remain and now there's a new one:

$ podman version
time="2019-09-16T16:07:44Z" level=warning msg="Error initializing configured OCI runtime runc: no valid executable found for OCI runtime runc: invalid argument"
time="2019-09-16T16:07:44Z" level=warning msg="Error initializing configured OCI runtime runc: no valid executable found for OCI runtime runc: invalid argument"
time="2019-09-16T16:07:44Z" level=warning msg="Failed to add podman to systemd sandbox cgroup: dial unix /run/user/0/bus: connect: permission denied"

This is causing Rawhide gating tests to fail; which is a pretty good indicator that this will cause a bad user experience in rawhide. Reopening.

@edsantiago edsantiago reopened this Sep 16, 2019
@mheon
Member

mheon commented Sep 16, 2019

These are logrus.Warnf and our default log level should be error; they shouldn't be printing by default.

I do not want to remove the runc errors, as that's a legitimate configuration issue - a configured runtime is not available. The systemd one is likely Giuseppe's patch to set up a systemd cgroup, which is allowed to fail.
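
(For context, a minimal logrus sketch, not podman code: logrus' own default level is Info, so a Warnf prints unless the level has been set to Error before the call runs, which is consistent with these warnings showing up even though podman intends Error as its default.)

package main

import "github.com/sirupsen/logrus"

func main() {
    // With logrus still at its own default level (Info), this Warnf prints.
    logrus.Warnf("Error initializing configured OCI runtime %s: %s", "runc", "invalid argument")

    // Once the level is Error, the same Warnf is suppressed.
    logrus.SetLevel(logrus.ErrorLevel)
    logrus.Warnf("this warning is not printed")
}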

@edsantiago
Collaborator Author

Can we just fix it somehow? Fix the default configuration, fix whatever needs to be fixed, just fix things so that these error messages stop showing up?

As I mentioned, they are causing rawhide system tests to fail. The system tests use BATS, which (love it or not) combines stdout and stderr streams. I am not going to spend effort hacking the system tests to filter out all instances of msg="Error initializing configured OCI" and msg="Failed to add podman, first because that's quite a bit of work, second because it would be hideous, and third because it could mask real problems. Is there really no way to fix this on the podman end?

@mheon
Member

mheon commented Sep 16, 2019

Why are they being printed? They should not be displayed by default. Are you forcing a non-default log level?

@edsantiago
Collaborator Author

No. Not intentionally, not to my memory either, and the words 'level' and 'log' do not appear in any related context under the test/system directory.

@rhatdan
Member

rhatdan commented Sep 16, 2019

Do you see this happen anywhere but in the test?

@edsantiago
Collaborator Author

I saw it last week on a rawhide test setup. I'll bring up another rawhide virt now and confirm.

@rhatdan
Member

rhatdan commented Sep 16, 2019

func outputError(err error) {
	if MainGlobalOpts.LogLevel == "debug" {
		logrus.Errorf(err.Error())
	} else {
		ee, ok := err.(*exec.ExitError)
		if ok {
			if status, ok := ee.Sys().(syscall.WaitStatus); ok {
				exitCode = status.ExitStatus()
			}
		}
		fmt.Fprintln(os.Stderr, "Error:", err.Error())
	}
}

This code seems curious. It looks to me like outputError would output to stderr if the logLevel is not set to debug?

@rhatdan
Member

rhatdan commented Sep 16, 2019

Well I guess it does not come from this, since Error: is not the prefix.

@edsantiago
Collaborator Author

Sorry for the delay; there's been an OpenStack outage until just now. It is not the log level:

$ podman --log-level=error images
WARN[0000] Error initializing configured OCI runtime runc: no valid executable found for OCI runtime runc: invalid argument
WARN[0000] Error initializing configured OCI runtime runc: no valid executable found for OCI runtime runc: invalid argument
WARN[0000] Failed to add podman to systemd sandbox cgroup: dial unix /run/user/0/bus: connect: permission denied
REPOSITORY   TAG   IMAGE ID   CREATED   SIZE
$ ^error^fatal  -> same thing

(FWIW, the messages also appear with --log-level=debug, along with other expected debug messages.)

podman-1.5.2-0.90.dev.git1f5514e.fc32.x86_64

@edsantiago
Collaborator Author

Dum de dum. Out of desperation I edited libpod.conf, removed all references to runc:

$ grep runc  ~/.config/containers/libpod.conf
$ podman --log-level=fatal images
WARN[0000] Error initializing configured OCI runtime runc: no valid executable found for OCI runtime runc: invalid argument
WARN[0000] Error initializing configured OCI runtime runc: no valid executable found for OCI runtime runc: invalid argument
WARN[0000] Failed to add podman to systemd sandbox cgroup: dial unix /run/user/0/bus: connect: permission denied
REPOSITORY   TAG   IMAGE ID   CREATED   SIZE

@rhatdan
Member

rhatdan commented Sep 16, 2019

Did you remove it from /etc/containers/libpod.conf and /usr/share/containers/libpod.conf?

@edsantiago
Collaborator Author

No, but I just tried, no difference: still the same three warnings.

@rhatdan
Member

rhatdan commented Sep 16, 2019

Where the frig is it finding runc...

@rhatdan
Member

rhatdan commented Sep 16, 2019

@giuseppe Any ideas?
@edsantiago could you try
podman system migrate

@edsantiago
Collaborator Author

edsantiago commented Sep 16, 2019

Already did (system migrate) (sorry for not saying so). Also confirmed, via ps auxww, that there are no podman processes running.

Also tried strings -a /usr/bin/podman |grep -w --color=always runc |less -R and do not get anything useful: all instances I see are in the middle of a path string (e.g. source file paths).

Also tried rm -rf /run/user/1000/ /tmp/run-1000 ~/.config ~/.local. No difference.

@mheon
Member

mheon commented Sep 16, 2019

@edsantiago We can pull runtimes from both system and local libpod.conf - can you confirm you removed runc from /etc/containers/libpod.conf, /usr/share/libpod.conf, and ~/.config/containers/libpod.conf?

(Just now realizing how much of an ordeal this actually is...)

@edsantiago
Collaborator Author

There is no /etc/containers/libpod.conf.

$ grep -Rw runc /usr/share/containers ~/
/usr/share/containers/libpod.conf:# Paths to look for a valid OCI runtime (runc, runv, etc)
/home/fedora/.lesshst:"runc
/home/fedora/.lesshst:"runc

In case it's helpful (it isn't to me), strace -ff produces two log files that match runc, and both look like:

openat(AT_FDCWD, "/home/fedora/.config/containers/storage.conf", O_RDONLY|O_CLOEXEC) = 7
epoll_ctl(4, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3131981368, u64=140521577004600}}) = -1 EPERM (Operation not permitted)
epoll_ctl(4, EPOLL_CTL_DEL, 7, 0xc0004bdb74) = -1 EPERM (Operation not permitted)
fstat(7, {st_mode=S_IFREG|0664, st_size=771, ...}) = 0
read(7, "[storage]\n  driver = \"overlay\"\n "..., 1283) = 771
read(7, "", 512)                        = 0
close(7)                                = 0
munmap(0x7fcdb9aad000, 32768)           = 0
flock(6, LOCK_UN)                       = 0
close(6)                                = 0
geteuid()                               = 1000
newfstatat(AT_FDCWD, "/run/user/1000/libpod/tmp", {st_mode=S_IFDIR|0751, st_size=140, ...}, 0) = 0
newfstatat(AT_FDCWD, "/run/user/1000/libpod/tmp/events", {st_mode=S_IFDIR|0700, st_size=40, ...}, 0) = 0
newfstatat(AT_FDCWD, "/usr/bin/runc", 0xc00023f968, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/runc", 0xc00023fa38, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/runc", 0xc00023fb08, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/sbin/runc", 0xc00023fbd8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/sbin/runc", 0xc00023fca8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/bin/runc", 0xc00023fd78, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/lib/cri-o-runc/sbin/runc", 0xc00023fe48, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/run/current-system/sw/bin/runc", 0xc00023ff18, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/fedora/.local/bin/runc", 0xc000294038, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/home/fedora/bin/runc", 0xc000294108, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/bin/runc", 0xc0002941d8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/runc", 0xc0002942a8, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/local/sbin/runc", 0xc000294378, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/sbin/runc", 0xc000294448, 0) = -1 ENOENT (No such file or directory)
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33[33mWARN\33[0m[0000] Error initia"..., 134) = 134
newfstatat(AT_FDCWD, "/usr/bin/crun", {st_mode=S_IFREG|0755, st_size=282616, ...}, 0) = 0

PATH is /home/fedora/.local/bin:/home/fedora/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin

If you can find where that /run/current-system/sw/bin is coming from, I think it might shed light on who/what is calling runc

@edsantiago
Collaborator Author

...that being OCIRuntimes in libpod/runtime.go

@mheon
Member

mheon commented Sep 16, 2019

It's falling back to $PATH because it can't find runc at any configured location.
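
(A loose sketch of the lookup the strace above shows — the configured paths are checked first, then the search falls back to $PATH. The function name and paths here are illustrative, not libpod's actual code:)

package main

import (
    "fmt"
    "os"
    "os/exec"
)

// findRuntime checks each configured path and, if none exists, falls back
// to searching $PATH -- which is why the strace ends with the PATH entries.
func findRuntime(name string, configuredPaths []string) (string, error) {
    for _, p := range configuredPaths {
        if st, err := os.Stat(p); err == nil && st.Mode().IsRegular() {
            return p, nil
        }
    }
    return exec.LookPath(name)
}

func main() {
    path, err := findRuntime("runc", []string{"/usr/bin/runc", "/usr/sbin/runc"})
    if err != nil {
        fmt.Println("no valid executable found for OCI runtime runc")
        return
    }
    fmt.Println("found runc at", path)
}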

@mheon
Member

mheon commented Sep 16, 2019

I wonder if the default configuration is being applied regardless of whether runc is in the config file or not.

@giuseppe
Member

I think it is the default configuration being used. If there are no configuration files then we fall back to: https://github.com/containers/libpod/blob/master/libpod/runtime.go#L302-L313

I've tried configuring /etc/containers/libpod.conf with:

runtime = "crun"

[runtimes]
crun = [
    "/usr/bin/crun",
]

then as rootless rm -rf ~/.config/containers/

The configuration is created with only crun as the OCI runtime, and I don't see any of the runc errors.

@cmurf

cmurf commented Sep 18, 2019

I ran into this out of the box with Fedora 31 Workstation beta 1.1, which first had podman updated to podman-1.5.1-3.git0005792.fc31.x86_64 and had not previously been run until now.

[chris@fmac ~]$ podman run hello-world
WARN[0000] Error initializing configured OCI runtime runc: no valid executable found for OCI runtime runc: invalid argument 
Trying to pull docker.io/library/hello-world...
Getting image source signatures
Copying blob 1b930d010525 done
Copying config fce289e99e done
Writing manifest to image destination
Storing signatures

Hello from Docker!
...

There is no libpod.conf in /etc, but there is one in ~/.config:

$ ls -l .config/containers
total 8
-rw-rw-r--. 1 chris chris 1292 Sep 18 16:48 libpod.conf
-rw-rw-r--. 1 chris chris  770 Sep 18 16:48 storage.conf
[chris@fmac ~]$ 

If I run the command a second time, the problem does not occur.

$ podman run hello-world

Hello from Docker!
...

If I delete the user libpod.conf and retry, the error still does not return. And if I delete ~/.local/share/containers/* the problem also does not recur. So it's a bit confusing.

UPDATE: And if I reboot and run the command again, the error happens. But only once.

@edsantiago
Collaborator Author

Ping, this is still causing gating test failures on rawhide.

mheon added a commit to mheon/libpod that referenced this issue Sep 26, 2019
If we don't do this, we print WARN level messages that we should
not be printing by default.

Up one WARN message to ERROR so it still shows up by default.

Fixes: containers#4115
Fixes: containers#4012

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
@github-actions github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023