
Help debugging with cri-o #35

Closed
dmolik opened this issue Apr 13, 2019 · 31 comments · Fixed by #41

Comments

@dmolik
Contributor

dmolik commented Apr 13, 2019

I keep getting this when I try to create a container:
level=error msg="Container creation error: writing file 'cpu.shares': Bad file descriptor

ls /sys/fs/cgroup/cpu/kubepods/burstable/pod7f8667b0aa2fc59394329cc63d147fc3/
cgroup.clone_children  crio-conmon-018a9e0c4d02ed0ac3acadcb240df7d7e718a6264af811930048f75b55d16a58  crio-conmon-845ae88564fc18e50064223a1cefd85536d0a105fd6a50d14ca48a55be936114
cgroup.procs           crio-conmon-031f3ac376b34d2eecec24f263fcfd800091ad001013852ba42ecd4a5a2595e4  crio-conmon-a537c8308319eb1ab7710b9c4c4f1a590ae47c013dc38876908c8e3a7e070dbb
cpu.cfs_period_us      crio-conmon-3816120e55090b077cbdf75b62696b1e58b2655b8ee5165f28662cb9c165e3e3  crio-conmon-b4d592875062642b8627445dc26a9b80556442a8879f8deeb7be43a0d3f51c33
cpu.cfs_quota_us       crio-conmon-3aab6d526c5d97b401b287b6ecd28de911919940892b9a7a68e5adfdb969e57e  crio-conmon-c2295e785211b185f5726c647a24841cc3e444d4ca7bd0c7e29be87794f007c3
cpu.rt_period_us       crio-conmon-41c02b86cf760effc235e0b6498b45723102d23ce1daffa7cbd926ce0bd55da6  crio-conmon-d21d06f567283e6de85e51f0b87ad796fbca5f4dc397ab4748e2ae66bde5956e
cpu.rt_runtime_us      crio-conmon-468e517c34b9c0c9a4b466cbd00c89f859e00ee6b01fc89db54cd4bfa5c44499  crio-conmon-d8996193794ec44cde3dc14125f0481b5f6d4ec998dc1e6ac00d09ad4f002792
cpu.shares             crio-conmon-4cc0b934f3393dd33a40310ba09d6e3c9c0c2a498cfd1ceee8ac45d8d2201ba7  notify_on_release
cpu.stat               crio-conmon-7fcf8b268ab7050a1d4b2ee330aa4397169b60a431174ce463dff2a2d1096a21  tasks

Notice that the crio-<container-ID> directory is missing (only the crio-conmon-* directories were created)

cat /sys/fs/cgroup/cpu/kubepods/burstable/pod7f8667b0aa2fc59394329cc63d147fc3/cpu.shares 
256
@giuseppe
Member

thanks for the report! I am currently working on support for CRI-O. My WIP is here: cri-o/cri-o#2239.

I am still seeing some issues in crun that prevent the integration and e2e tests from passing, but I am working on it.

Is there a way to store the config.json file for the container that is failing? That would make it much easier to debug the issue; if not, I'll try to reproduce it locally.

/cc @mrunalp

@dmolik
Contributor Author

dmolik commented Apr 14, 2019

lesser01 /var/lib/containers/storage # cat $(echo $PWD)/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/config.json
{
	"ociVersion": "1.0.1-dev",
	"process": {
		"user": {
			"uid": 0,
			"gid": 0
		},
		"args": [
			"/pause"
		],
		"env": [
			"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
			"TERM=xterm"
		],
		"cwd": "/",
		"capabilities": {
			"bounding": [
				"CAP_CHOWN",
				"CAP_DAC_OVERRIDE",
				"CAP_FSETID",
				"CAP_FOWNER",
				"CAP_NET_RAW",
				"CAP_SETGID",
				"CAP_SETUID",
				"CAP_SETPCAP",
				"CAP_NET_BIND_SERVICE",
				"CAP_SYS_CHROOT",
				"CAP_KILL"
			],
			"effective": [
				"CAP_CHOWN",
				"CAP_DAC_OVERRIDE",
				"CAP_FSETID",
				"CAP_FOWNER",
				"CAP_NET_RAW",
				"CAP_SETGID",
				"CAP_SETUID",
				"CAP_SETPCAP",
				"CAP_NET_BIND_SERVICE",
				"CAP_SYS_CHROOT",
				"CAP_KILL"
			],
			"inheritable": [
				"CAP_CHOWN",
				"CAP_DAC_OVERRIDE",
				"CAP_FSETID",
				"CAP_FOWNER",
				"CAP_NET_RAW",
				"CAP_SETGID",
				"CAP_SETUID",
				"CAP_SETPCAP",
				"CAP_NET_BIND_SERVICE",
				"CAP_SYS_CHROOT",
				"CAP_KILL"
			],
			"permitted": [
				"CAP_CHOWN",
				"CAP_DAC_OVERRIDE",
				"CAP_FSETID",
				"CAP_FOWNER",
				"CAP_NET_RAW",
				"CAP_SETGID",
				"CAP_SETUID",
				"CAP_SETPCAP",
				"CAP_NET_BIND_SERVICE",
				"CAP_SYS_CHROOT",
				"CAP_KILL"
			]
		},
		"oomScoreAdj": -998
	},
	"root": {
		"path": "/var/lib/containers/storage/overlay/34fa000f19d5b4ec46cdb9caabdfa8663115effba172effa6a4876c8ae872e69/merged",
		"readonly": true
	},
	"hostname": "lesser01",
	"mounts": [
		{
			"destination": "/proc",
			"type": "proc",
			"source": "proc"
		},
		{
			"destination": "/dev",
			"type": "tmpfs",
			"source": "tmpfs",
			"options": [
				"nosuid",
				"strictatime",
				"mode=755",
				"size=65536k"
			]
		},
		{
			"destination": "/dev/pts",
			"type": "devpts",
			"source": "devpts",
			"options": [
				"nosuid",
				"noexec",
				"newinstance",
				"ptmxmode=0666",
				"mode=0620",
				"gid=5"
			]
		},
		{
			"destination": "/dev/mqueue",
			"type": "mqueue",
			"source": "mqueue",
			"options": [
				"nosuid",
				"noexec",
				"nodev"
			]
		},
		{
			"destination": "/sys",
			"type": "sysfs",
			"source": "sysfs",
			"options": [
				"nosuid",
				"noexec",
				"nodev",
				"ro"
			]
		},
		{
			"destination": "/etc/resolv.conf",
			"type": "bind",
			"source": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/resolv.conf",
			"options": [
				"ro",
				"bind",
				"nodev",
				"nosuid",
				"noexec"
			]
		},
		{
			"destination": "/dev/shm",
			"type": "bind",
			"source": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/shm",
			"options": [
				"rw",
				"bind"
			]
		},
		{
			"destination": "/etc/hostname",
			"type": "bind",
			"source": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/hostname",
			"options": [
				"ro",
				"bind",
				"nodev",
				"nosuid",
				"noexec"
			]
		}
	],
	"annotations": {
		"component": "kube-scheduler",
		"io.kubernetes.container.name": "POD",
		"io.kubernetes.cri-o.Annotations": "{\"kubernetes.io/config.hash\":\"f44110a0ca540009109bfc32a7eb0baa\",\"kubernetes.io/config.seen\":\"2019-04-13T10:38:32.699683165-04:00\",\"kubernetes.io/config.source\":\"file\"}",
		"io.kubernetes.cri-o.CgroupParent": "/kubepods/burstable/podf44110a0ca540009109bfc32a7eb0baa",
		"io.kubernetes.cri-o.ContainerID": "5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2",
		"io.kubernetes.cri-o.ContainerName": "k8s_POD_kube-scheduler-lesser01_kube-system_f44110a0ca540009109bfc32a7eb0baa_0",
		"io.kubernetes.cri-o.ContainerType": "sandbox",
		"io.kubernetes.cri-o.Created": "2019-04-13T13:02:13.735873498-04:00",
		"io.kubernetes.cri-o.HostName": "lesser01",
		"io.kubernetes.cri-o.HostNetwork": "true",
		"io.kubernetes.cri-o.HostnamePath": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/hostname",
		"io.kubernetes.cri-o.IP": "",
		"io.kubernetes.cri-o.KubeName": "kube-scheduler-lesser01",
		"io.kubernetes.cri-o.Labels": "{\"component\":\"kube-scheduler\",\"io.kubernetes.container.name\":\"POD\",\"io.kubernetes.pod.name\":\"kube-scheduler-lesser01\",\"io.kubernetes.pod.namespace\":\"kube-system\",\"io.kubernetes.pod.uid\":\"f44110a0ca540009109bfc32a7eb0baa\",\"tier\":\"control-plane\"}",
		"io.kubernetes.cri-o.LogPath": "/var/log/pods/kube-system_kube-scheduler-lesser01_f44110a0ca540009109bfc32a7eb0baa/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2.log",
		"io.kubernetes.cri-o.Metadata": "{\"name\":\"kube-scheduler-lesser01\",\"uid\":\"f44110a0ca540009109bfc32a7eb0baa\",\"namespace\":\"kube-system\"}",
		"io.kubernetes.cri-o.MountPoint": "/var/lib/containers/storage/overlay/34fa000f19d5b4ec46cdb9caabdfa8663115effba172effa6a4876c8ae872e69/merged",
		"io.kubernetes.cri-o.Name": "k8s_kube-scheduler-lesser01_kube-system_f44110a0ca540009109bfc32a7eb0baa_0",
		"io.kubernetes.cri-o.Namespace": "kube-system",
		"io.kubernetes.cri-o.NamespaceOptions": "{\"network\":2,\"pid\":1}",
		"io.kubernetes.cri-o.PortMappings": "[]",
		"io.kubernetes.cri-o.PrivilegedRuntime": "true",
		"io.kubernetes.cri-o.ResolvPath": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/resolv.conf",
		"io.kubernetes.cri-o.RuntimeHandler": "",
		"io.kubernetes.cri-o.SandboxID": "5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2",
		"io.kubernetes.cri-o.SeccompProfilePath": "",
		"io.kubernetes.cri-o.ShmPath": "/var/run/containers/storage/overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/shm",
		"io.kubernetes.pod.name": "kube-scheduler-lesser01",
		"io.kubernetes.pod.namespace": "kube-system",
		"io.kubernetes.pod.uid": "f44110a0ca540009109bfc32a7eb0baa",
		"kubernetes.io/config.hash": "f44110a0ca540009109bfc32a7eb0baa",
		"kubernetes.io/config.seen": "2019-04-13T10:38:32.699683165-04:00",
		"kubernetes.io/config.source": "file",
		"tier": "control-plane"
	},
	"linux": {
		"resources": {
			"devices": [
				{
					"allow": false,
					"access": "rwm"
				}
			],
			"cpu": {
				"shares": 2
			}
		},
		"cgroupsPath": "/kubepods/burstable/podf44110a0ca540009109bfc32a7eb0baa/crio-5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2",
		"namespaces": [
			{
				"type": "pid"
			},
			{
				"type": "ipc"
			},
			{
				"type": "uts"
			},
			{
				"type": "mount"
			}
		],
		"seccomp": {
			"defaultAction": "SCMP_ACT_ERRNO",
			"architectures": [
				"SCMP_ARCH_X86_64",
				"SCMP_ARCH_X86",
				"SCMP_ARCH_X32"
			],
			"syscalls": [
				{
					"names": [
						"accept",
						"accept4",
						"access",
						"alarm",
						"bind",
						"brk",
						"capget",
						"capset",
						"chdir",
						"chmod",
						"chown",
						"chown32",
						"clock_getres",
						"clock_gettime",
						"clock_nanosleep",
						"close",
						"connect",
						"copy_file_range",
						"creat",
						"dup",
						"dup2",
						"dup3",
						"epoll_create",
						"epoll_create1",
						"epoll_ctl",
						"epoll_ctl_old",
						"epoll_pwait",
						"epoll_wait",
						"epoll_wait_old",
						"eventfd",
						"eventfd2",
						"execve",
						"execveat",
						"exit",
						"exit_group",
						"faccessat",
						"fadvise64",
						"fadvise64_64",
						"fallocate",
						"fanotify_mark",
						"fchdir",
						"fchmod",
						"fchmodat",
						"fchown",
						"fchown32",
						"fchownat",
						"fcntl",
						"fcntl64",
						"fdatasync",
						"fgetxattr",
						"flistxattr",
						"flock",
						"fork",
						"fremovexattr",
						"fsetxattr",
						"fstat",
						"fstat64",
						"fstatat64",
						"fstatfs",
						"fstatfs64",
						"fsync",
						"ftruncate",
						"ftruncate64",
						"futex",
						"futimesat",
						"getcpu",
						"getcwd",
						"getdents",
						"getdents64",
						"getegid",
						"getegid32",
						"geteuid",
						"geteuid32",
						"getgid",
						"getgid32",
						"getgroups",
						"getgroups32",
						"getitimer",
						"getpeername",
						"getpgid",
						"getpgrp",
						"getpid",
						"getppid",
						"getpriority",
						"getrandom",
						"getresgid",
						"getresgid32",
						"getresuid",
						"getresuid32",
						"getrlimit",
						"get_robust_list",
						"getrusage",
						"getsid",
						"getsockname",
						"getsockopt",
						"get_thread_area",
						"gettid",
						"gettimeofday",
						"getuid",
						"getuid32",
						"getxattr",
						"inotify_add_watch",
						"inotify_init",
						"inotify_init1",
						"inotify_rm_watch",
						"io_cancel",
						"ioctl",
						"io_destroy",
						"io_getevents",
						"ioprio_get",
						"ioprio_set",
						"io_setup",
						"io_submit",
						"ipc",
						"kill",
						"lchown",
						"lchown32",
						"lgetxattr",
						"link",
						"linkat",
						"listen",
						"listxattr",
						"llistxattr",
						"_llseek",
						"lremovexattr",
						"lseek",
						"lsetxattr",
						"lstat",
						"lstat64",
						"madvise",
						"memfd_create",
						"mincore",
						"mkdir",
						"mkdirat",
						"mknod",
						"mknodat",
						"mlock",
						"mlock2",
						"mlockall",
						"mmap",
						"mmap2",
						"mprotect",
						"mq_getsetattr",
						"mq_notify",
						"mq_open",
						"mq_timedreceive",
						"mq_timedsend",
						"mq_unlink",
						"mremap",
						"msgctl",
						"msgget",
						"msgrcv",
						"msgsnd",
						"msync",
						"munlock",
						"munlockall",
						"munmap",
						"nanosleep",
						"newfstatat",
						"_newselect",
						"open",
						"openat",
						"pause",
						"pipe",
						"pipe2",
						"poll",
						"ppoll",
						"prctl",
						"pread64",
						"preadv",
						"prlimit64",
						"pselect6",
						"pwrite64",
						"pwritev",
						"read",
						"readahead",
						"readlink",
						"readlinkat",
						"readv",
						"recv",
						"recvfrom",
						"recvmmsg",
						"recvmsg",
						"remap_file_pages",
						"removexattr",
						"rename",
						"renameat",
						"renameat2",
						"restart_syscall",
						"rmdir",
						"rt_sigaction",
						"rt_sigpending",
						"rt_sigprocmask",
						"rt_sigqueueinfo",
						"rt_sigreturn",
						"rt_sigsuspend",
						"rt_sigtimedwait",
						"rt_tgsigqueueinfo",
						"sched_getaffinity",
						"sched_getattr",
						"sched_getparam",
						"sched_get_priority_max",
						"sched_get_priority_min",
						"sched_getscheduler",
						"sched_rr_get_interval",
						"sched_setaffinity",
						"sched_setattr",
						"sched_setparam",
						"sched_setscheduler",
						"sched_yield",
						"seccomp",
						"select",
						"semctl",
						"semget",
						"semop",
						"semtimedop",
						"send",
						"sendfile",
						"sendfile64",
						"sendmmsg",
						"sendmsg",
						"sendto",
						"setfsgid",
						"setfsgid32",
						"setfsuid",
						"setfsuid32",
						"setgid",
						"setgid32",
						"setgroups",
						"setgroups32",
						"setitimer",
						"setpgid",
						"setpriority",
						"setregid",
						"setregid32",
						"setresgid",
						"setresgid32",
						"setresuid",
						"setresuid32",
						"setreuid",
						"setreuid32",
						"setrlimit",
						"set_robust_list",
						"setsid",
						"setsockopt",
						"set_thread_area",
						"set_tid_address",
						"setuid",
						"setuid32",
						"setxattr",
						"shmat",
						"shmctl",
						"shmdt",
						"shmget",
						"shutdown",
						"sigaltstack",
						"signalfd",
						"signalfd4",
						"sigreturn",
						"socket",
						"socketcall",
						"socketpair",
						"splice",
						"stat",
						"stat64",
						"statfs",
						"statfs64",
						"symlink",
						"symlinkat",
						"sync",
						"sync_file_range",
						"syncfs",
						"sysinfo",
						"syslog",
						"tee",
						"tgkill",
						"time",
						"timer_create",
						"timer_delete",
						"timerfd_create",
						"timerfd_gettime",
						"timerfd_settime",
						"timer_getoverrun",
						"timer_gettime",
						"timer_settime",
						"times",
						"tkill",
						"truncate",
						"truncate64",
						"ugetrlimit",
						"umask",
						"uname",
						"unlink",
						"unlinkat",
						"utime",
						"utimensat",
						"utimes",
						"vfork",
						"vmsplice",
						"wait4",
						"waitid",
						"waitpid",
						"write",
						"writev"
					],
					"action": "SCMP_ACT_ALLOW"
				},
				{
					"names": [
						"personality"
					],
					"action": "SCMP_ACT_ALLOW",
					"args": [
						{
							"index": 0,
							"value": 0,
							"op": "SCMP_CMP_EQ"
						},
						{
							"index": 0,
							"value": 8,
							"op": "SCMP_CMP_EQ"
						},
						{
							"index": 0,
							"value": 4294967295,
							"op": "SCMP_CMP_EQ"
						}
					]
				},
				{
					"names": [
						"chroot"
					],
					"action": "SCMP_ACT_ALLOW"
				},
				{
					"names": [
						"clone"
					],
					"action": "SCMP_ACT_ALLOW",
					"args": [
						{
							"index": 0,
							"value": 2080505856,
							"op": "SCMP_CMP_MASKED_EQ"
						}
					]
				},
				{
					"names": [
						"arch_prctl"
					],
					"action": "SCMP_ACT_ALLOW"
				},
				{
					"names": [
						"modify_ldt"
					],
					"action": "SCMP_ACT_ALLOW"
				}
			]
		}
	}
}
lesser01 /var/lib/containers/storage # 
/sys/fs/cgroup/cpu$( cat ./overlay-containers/5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2/userdata/config.json | jq -Mr ".linux.cgroupsPath" )
-su: /sys/fs/cgroup/cpu/kubepods/burstable/podf44110a0ca540009109bfc32a7eb0baa/crio-5e6de76483f8828b831beca55acc75568523a95680734f0911084148fd6002d2: No such file or directory
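For reference, the same config.json-to-cgroup-path lookup can be done without jq; a self-contained sketch using a toy config.json and an illustrative sed pattern (file path and values below are not from this report):

```shell
# Build a minimal sample config.json, then extract .linux.cgroupsPath
# with sed instead of jq, and print the full cgroup v1 directory.
printf '{\n\t"linux": {\n\t\t"cgroupsPath": "/kubepods/burstable/podX/crio-Y"\n\t}\n}\n' > /tmp/config.json
cg=$(sed -n 's/.*"cgroupsPath": *"\([^"]*\)".*/\1/p' /tmp/config.json)
echo "/sys/fs/cgroup/cpu${cg}"   # prints: /sys/fs/cgroup/cpu/kubepods/burstable/podX/crio-Y
```

On a real node, `test -d "/sys/fs/cgroup/cpu${cg}"` then tells you whether the runtime actually created the directory.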

@dmolik
Contributor Author

dmolik commented Apr 14, 2019

hopefully that's useful...

@giuseppe
Member

thanks, that is helpful. I've added a patch that catches the error in opening the cgroup directory earlier, but that is not yet enough to address the issue. What distro and kernel are you using?

@dmolik
Contributor Author

dmolik commented Apr 14, 2019

Gentoo, kernel 5.0.7, openrc

@dmolik
Contributor Author

dmolik commented Apr 15, 2019

this probably isn't helpful, but when I set the runtime to runc, the .linux.cgroupsPath is created

@giuseppe
Member

I've just merged some patches that let the CRI-O integration tests pass (except for three tests that depend on a runc behaviour). No changes were needed in the cgroup part, though; I'll need to look at this separately.

@giuseppe
Member

any hint on the quickest way to get access to an environment like yours? I've tried a Vagrant machine for Gentoo but it seems to get stuck. If there is nothing easier, I'll go through the full installation

@dmolik
Contributor Author

dmolik commented Apr 17, 2019

MacOS as the base machine?

Another option is a VPS, I use Gentoo on https://linode.com

@dmolik
Contributor Author

dmolik commented Apr 17, 2019

I would say Alpine, but I don't think they have a cri-o package yet.

giuseppe referenced this issue Apr 18, 2019
the issue didn't occur on Fedora as both cpu and cpu,cpuacct are
linked to the same directory, while it happens if the two subsystems
are mounted separately, such as on Gentoo+openrc.

Closes: https://github.com/giuseppe/crun/issues/35

Signed-off-by: Giuseppe Scrivano <giuseppe@scrivano.org>
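The commit message above pins the bug on co-mounted versus split cgroup v1 hierarchies. A quick way to see which layout a host uses is to inspect /proc/mounts; the sample lines below are synthetic (a Fedora-style co-mount and a Gentoo+openrc-style split), not taken from the reporter's machine:

```shell
# Sample /proc/mounts lines: field 2 is the mount point, field 4 the options
# (which, for cgroup v1, name the attached controllers).
mounts='cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,cpu,cpuacct 0 0
cgroup /sys/fs/cgroup/cpu cgroup rw,cpu 0 0
cgroup /sys/fs/cgroup/cpuacct cgroup rw,cpuacct 0 0'
# A co-mounted hierarchy lists both controllers on a single line:
printf '%s\n' "$mounts" | awk '$4 ~ /cpu,cpuacct/ {print "co-mounted:", $2}'
# A split layout has a cpu-only line (no cpuacct in the options):
printf '%s\n' "$mounts" | awk '$4 ~ /(^|,)cpu(,|$)/ && $4 !~ /cpuacct/ {print "split:", $2}'
```

On a live system the same awk filters can be run directly against /proc/mounts.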
@dmolik
Contributor Author

dmolik commented Apr 18, 2019

I compiled patch #41, and I'm getting this in the cri-o logs:

time="2019-04-18 11:09:30.353132930-04:00" level=info msg="Attempting to run pod sandbox with infra container: kube-system/kube-controller-manager-lesser01/POD" 
time="2019-04-18 11:09:30.353179614-04:00" level=debug msg="parsed reference into "[overlay@/var/lib/containers/storage+/var/run/containers/storage]k8s.gcr.io/pause:3.1"" 
time="2019-04-18 11:09:30.353494208-04:00" level=debug msg="exporting opaque data as blob "sha256:da86e6ba6ca197bf6bc5e9d900febd906b133eaa4750e6bed647b0fbe50ed43e"" 
time="2019-04-18 11:09:30.375666983-04:00" level=debug msg="created pod sandbox "a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f"" 
time="2019-04-18 11:09:30.381432401-04:00" level=debug msg="pod sandbox "a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f" has work directory "/var/lib/containers/storage/overlay-containers/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f/userdata"" 
time="2019-04-18 11:09:30.381475661-04:00" level=debug msg="pod sandbox "a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f" has run directory "/var/run/containers/storage/overlay-containers/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f/userdata"" 
time="2019-04-18 11:09:30.390420579-04:00" level=debug msg="overlay: mount_data=lowerdir=/var/lib/containers/storage/overlay/l/27QPZN3VEWJOPGI5AP7BP23QCW,upperdir=/var/lib/containers/storage/overlay/be04d7df39ffd55703a4de15cdbc2ad75c8d9000df8c6bfba1cbe79169bb45e3/diff,workdir=/var/lib/containers/storage/overlay/be04d7df39ffd55703a4de15cdbc2ad75c8d9000df8c6bfba1cbe79169bb45e3/work" 
time="2019-04-18 11:09:30.390720062-04:00" level=debug msg="mounted container "a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f" at "/var/lib/containers/storage/overlay/be04d7df39ffd55703a4de15cdbc2ad75c8d9000df8c6bfba1cbe79169bb45e3/merged"" 
time="2019-04-18 11:09:30.391612056-04:00" level=debug msg="running conmon: /usr/libexec/crio/conmon" args=[--syslog -c a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f -u a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f -r /usr/bin/crun -b /var/run/containers/storage/overlay-containers/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f/userdata -p /var/run/containers/storage/overlay-containers/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f/userdata/pidfile -l /var/log/pods/kube-system_kube-controller-manager-lesser01_8dac7afa85d5212e4fa0be5103f31601/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio --log-level debug] 
time="2019-04-18 11:09:30.578157451-04:00" level=debug msg="Received container pid: 4629" 
error opening file '/run/crun/a5a78de051be45b20936aa0c0011957e257f73c69a0cfd324455ce7339a6884f/status': No such file or directory

@giuseppe
Member

do you have anything under /var/run/user/0/crun?

I've seen that issue in the past; it depends on XDG_RUNTIME_DIR, which is not always set. I'll need to find a better way to address that. I don't much like the way runc does it, as it detects whether runc is running in a user namespace, but there are probably no better alternatives.

@giuseppe
Member

if you have anything under /var/run/user/0/crun then the best workaround for now is to ensure XDG_RUNTIME_DIR is not set at all for root
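The "not set at all" part matters: an exported-but-empty XDG_RUNTIME_DIR still reaches child processes. A small sketch of the workaround using `env -u` to strip the variable entirely (starting the actual daemon this way, e.g. `env -u XDG_RUNTIME_DIR crio`, is only shown as a hypothetical):

```shell
# Export the variable empty (as an init script might), then launch a
# child with it removed from the environment; the child sees it as unset.
export XDG_RUNTIME_DIR=
env -u XDG_RUNTIME_DIR sh -c 'echo "${XDG_RUNTIME_DIR-unset}"'   # prints: unset
```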

@dmolik
Contributor Author

dmolik commented Apr 18, 2019

Okay I'll double check

@giuseppe
Member

might be caused by 4966bb6 that was recently merged

@dmolik
Contributor Author

dmolik commented Apr 18, 2019

so on this machine I don't have a /run/user dir

@giuseppe
Member

is XDG_RUNTIME_DIR set?

@dmolik
Contributor Author

dmolik commented Apr 18, 2019

doesn't look like it
xargs --null --max-args=1 echo < /proc/2882/environ
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_OCI_SYNCPIPE=3
_OCI_STARTPIPE=4
XDG_RUNTIME_DIR=
_LIBCONTAINER_CLONED_BINARY=1
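Worth noting about the environ dump above: /proc/<pid>/environ is NUL-separated, and a line like `XDG_RUNTIME_DIR=` means the variable is exported *empty*, not absent (an unset variable would not appear at all). A self-contained sketch with a synthetic sample file:

```shell
# Build a fake environ (NUL-separated, like /proc/<pid>/environ) and
# split it the same way the thread does with xargs.
printf 'PATH=/bin\0XDG_RUNTIME_DIR=\0' > /tmp/environ.sample
xargs --null --max-args=1 echo < /tmp/environ.sample
# prints:
#   PATH=/bin
#   XDG_RUNTIME_DIR=
```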

@giuseppe
Member

thanks for checking this out; what about the CRI-O process?

@giuseppe
Member

on my Linode Gentoo VM I don't see XDG_RUNTIME_DIR= set.

Could you revert 4966bb6 and see if that is the issue?

@giuseppe reopened this Apr 18, 2019
@dmolik
Contributor Author

dmolik commented Apr 18, 2019

okay the crio process has no XDG_RUNTIME_DIR var, and reverting 4966bb6 didn't seem to help

@dmolik
Contributor Author

dmolik commented Apr 19, 2019

just pulled down master, still unable to find /run/crun//status
I ran the test suite:

Making check in libocispec
make[1]: Entering directory '/root/crun/libocispec'
make[1]: Circular /root/crun/libocispec/tests/data <- /root/crun/libocispec/tests/data dependency dropped.
  GEN      public-submodule-commit
make  check-am
make[2]: Entering directory '/root/crun/libocispec'
make  check-TESTS
make[3]: Entering directory '/root/crun/libocispec'
make[4]: Entering directory '/root/crun/libocispec'
PASS: tests/test-1
PASS: tests/test-2
PASS: tests/test-3
PASS: tests/test-4
PASS: tests/test-5
PASS: tests/test-6
PASS: tests/test-7
PASS: tests/test-8
============================================================================
Testsuite summary for libocispec 0.1.1
============================================================================
# TOTAL: 8
# PASS:  8
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================
make[4]: Leaving directory '/root/crun/libocispec'
make[3]: Leaving directory '/root/crun/libocispec'
make[2]: Leaving directory '/root/crun/libocispec'
make[1]: Leaving directory '/root/crun/libocispec'
make[1]: Entering directory '/root/crun'
make  check-TESTS
make[2]: Entering directory '/root/crun'
make[3]: Entering directory '/root/crun'
PASS: tests/test_capabilities.py 1 - no-caps
PASS: tests/test_capabilities.py 2 - new-privs
PASS: tests/test_capabilities.py 3 - some-caps-bounding
PASS: tests/test_capabilities.py 4 - some-caps-inheritable
PASS: tests/test_capabilities.py 5 - some-caps-ambient
PASS: tests/test_capabilities.py 6 - some-caps-permitted
PASS: tests/test_capabilities.py 7 - some-caps-effective-non-root
PASS: tests/test_capabilities.py 8 - some-caps-bounding-non-root
PASS: tests/test_capabilities.py 9 - some-caps-inheritable-non-root
PASS: tests/test_capabilities.py 10 - some-caps-ambient-non-root
PASS: tests/test_capabilities.py 11 - some-caps-permitted-non-root
PASS: tests/test_cwd.py 1 - cwd
PASS: tests/test_devices.py 1 - deny-devices
PASS: tests/test_devices.py 2 - allow-device
PASS: tests/test_hostname.py 1 - hostname
PASS: tests/test_limits.py 1 - limit-pid-0
PASS: tests/test_limits.py 2 - limit-pid-n
SKIP: tests/test_mounts.py
PASS: tests/test_paths.py 1 - readonly-paths
PASS: tests/test_paths.py 2 - masked-paths
PASS: tests/test_pid.py 1 - pid
PASS: tests/test_pid.py 2 - pid-user
PASS: tests/test_pid_file.py 1 - test_pid_file
PASS: tests/test_preserve_fds.py 1 - preserve-fds-0
PASS: tests/test_preserve_fds.py 2 - preserve-fds-some
PASS: tests/test_uid_gid.py 1 - uid
PASS: tests/test_uid_gid.py 2 - gid
PASS: tests/test_rlimits.py 1 - rlimits
PASS: tests/test_tty.py 1 - test-stdin-tty
PASS: tests/test_tty.py 2 - test-stdout-tty
PASS: tests/test_tty.py 3 - test-stderr-tty
PASS: tests/test_tty.py 4 - test-detach-tty
PASS: tests/test_hooks.py 1 - test-fail-prestart
PASS: tests/test_hooks.py 2 - test-success-prestart
SKIP: tests/test_update.py 1 - test-update # SKIP
PASS: tests/test_detach.py 1 - test-detach
PASS: tests/test_resources.py 1 - resources-pid-limit
FAIL: tests/test_start.py 1 - start
PASS: tests/test_exec.py 1 - exec
PASS: tests/test_exec.py 2 - exec-not-exists
PASS: tests/test_exec.py 3 - exec-detach-not-exists
PASS: tests/tests_libcrun_utils 1 - test_crun_path_exists
PASS: tests/tests_libcrun_utils 2 - test_write_read_file
PASS: tests/tests_libcrun_utils 3 - test_run_process
PASS: tests/tests_libcrun_utils 4 - test_dir_p
PASS: tests/tests_libcrun_utils 5 - test_socket_pair
PASS: tests/tests_libcrun_utils 6 - test_send_receive_fd
PASS: tests/tests_libcrun_errors 1 - test_crun_make_error
PASS: tests/tests_libcrun_errors 2 - test_crun_write_warning_and_release
============================================================================
Testsuite summary for crun 0.4
============================================================================
# TOTAL: 49
# PASS:  46
# SKIP:  2
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See ./test-suite.log
Please report to giuseppe@scrivano.org
============================================================================
make[3]: *** [Makefile:1577: test-suite.log] Error 1
make[3]: Leaving directory '/root/crun'
make[2]: *** [Makefile:1685: check-TESTS] Error 2
make[2]: Leaving directory '/root/crun'
make[1]: *** [Makefile:1922: check-am] Error 2
make[1]: Leaving directory '/root/crun'
make: *** [Makefile:1462: check-recursive] Error 1
================================
   crun 0.4: ./test-suite.log
================================

# TOTAL: 49
# PASS:  46
# SKIP:  2
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

SKIP: tests/test_mounts
=======================

1..0
SKIP: tests/test_mounts.py

SKIP: tests/test_update
=======================

1..1
ok 1 - test-update #SKIP
SKIP: tests/test_update.py 1 - test-update # SKIP

FAIL: tests/test_start
======================

error opening file '/run/crun/test-tmp_w1kw99i/status': No such file or directory
a bytes-like object is required, not 'str'
1..1
not ok 1 - start
FAIL: tests/test_start.py 1 - start

@giuseppe
Member

I'll give it another attempt in the next few days (I am quite sure it is some weird interaction with XDG_RUNTIME_DIR).
In the meantime, I am working on something related and have addressed some issues in the last few days that bring CRI-O+crun closer to passing the Kubernetes e2e tests; progress is tracked here:

cri-o/cri-o#2239

(just Fedora for now, as the RHEL failures are expected for a missing package)

@giuseppe
Member

it seems it gets confused on Gentoo as there is no pids cgroup controller?

@dmolik
Contributor Author

dmolik commented Apr 19, 2019

mount |grep pids
pids on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)

@dmolik
Contributor Author

dmolik commented Apr 19, 2019

test suite passes now

@dmolik
Contributor Author

dmolik commented Apr 19, 2019

crio's conmon does have the XDG_RUNTIME_DIR variable, but it's set to an empty value

@giuseppe
Member

I finally managed to pass the Kubernetes e2e tests with CRI-O and crun

@dmolik
Contributor Author

dmolik commented Apr 23, 2019

Coolio!

But, getting back to business: I compiled master this morning (eastern US) and was still hitting the same status-file-not-found error. So I searched for "status" in the repo and took a closer look at this function:

https://github.com/giuseppe/crun/blob/77836e488fb847e8d76264d8ad192698c4d5f482/src/libcrun/status.c#L32:L49

So on a whim I did an ls on / and there it was: the /crun folder.
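A minimal sketch of why the directory lands at the filesystem root when XDG_RUNTIME_DIR is exported empty (illustrative shell, not crun's actual C code):

```shell
# With the variable set but empty, a naive "$XDG_RUNTIME_DIR/crun"
# concatenation collapses to "/crun" at the root.
XDG_RUNTIME_DIR=
echo "${XDG_RUNTIME_DIR}/crun"        # prints: /crun
# ${VAR:-default} (note the colon) also treats the empty case as unset:
echo "${XDG_RUNTIME_DIR:-/run}/crun"  # prints: /run/crun
```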

@giuseppe
Member

good hint!

So I guess the issue is XDG_RUNTIME_DIR being defined but empty. Does this patch make any difference?

https://github.com/giuseppe/crun/pull/44

@dmolik
Contributor Author

dmolik commented Apr 23, 2019

I need a couple of minutes to finish up another task, I'll check as soon as I can
