podman top "-eo" Flag not working #19001

Mortom123 · 2023-06-26T14:38:24Z

Issue Description

The podman top command crashes with the error: Error: unexpected end of JSON input when running the podman top command with the -eo flag

Steps to reproduce the issue

Run a pod: podman run -it registry.access.redhat.com/ubi8:8.8-854
Check the top command for the new container: podman top ctId -eo pid,user

podman top -h also specifies: In the presence of ps(1) specific flags (e.g, -eo), Podman will execute ps(1) inside the container.
The image registry.access.redhat.com/ubi8:8.8-854 does not come with ps(1), hence I also tried the same procedure with: registry.access.redhat.com/rhel7:7.9-1011

Describe the results you received

Error: unexpected end of JSON input

Describe the results you expected

There should not be an error and the behavior should be similar to what happens when running docker top ctId -eo pid,user

podman info output

`podman version`

Client:       Podman Engine
Version:      4.4.1
API Version:  4.4.1
Go Version:   go1.19.6
Built:        Wed Apr 26 19:13:11 2023
OS/Arch:      linux/amd64

Server:       Podman Engine
Version:      4.4.1
API Version:  4.4.1
Go Version:   go1.19.6
Built:        Wed Apr 26 19:13:11 2023
OS/Arch:      linux/amd64

`podman info`
host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.6-1.module+el8.8.0+18098+9b44df5f.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.6, commit: 8c4ab5a095127ecc96ef8a9c885e0e1b14aeb11b'
  cpuUtilization:
    idlePercent: 91.85
    systemPercent: 2.64
    userPercent: 5.51
  cpus: 2
  distribution:
    distribution: '"rhel"'
    version: "8.8"
  eventLogger: file
  hostname: ...
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-477.13.1.el8_8.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 3818549248
  memTotal: 16494587904
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.1.4-1.module+el8.8.0+18060+3f21f2cc.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.4
      spec: 1.0.2-dev
      go: go1.19.4
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_SYS_CHROOT,CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-2.module+el8.8.0+18060+3f21f2cc.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 4248731648
  swapTotal: 4294963200
  uptime: 292h 16m 0.00s (Approximately 12.17 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - ...
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 18
    paused: 0
    running: 10
    stopped: 8
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /net/.../fs0/podman
  graphRootAllocated: 107317563392
  graphRootUsed: 61999063040
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 727
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /app/podman/volumes
version:
  APIVersion: 4.4.1
  Built: 1682529191
  BuiltTime: Wed Apr 26 19:13:11 2023
  GitCommit: ""
  GoVersion: go1.19.6
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

Running the command without -eo is not a viable option, as this is part of a Jenkins Build Pipeline. Similar to: #8033 (comment)

Additional information

No response

The text was updated successfully, but these errors were encountered:

Luap99 · 2023-06-26T15:38:19Z

The error handling was fixed in: #18983.

Regarding the case that we should not depend on ps in the container I agree but I don't consider this a bug as it works as documented right now.

This ended up more complicated then expected. Lets start first with the problem to show why I am doing this: Currently we simply execute ps(1) in the container. This has some drawbacks. First, obviously you need to have ps(1) in the container image. That is no always the case especially in small images. Second, even if you do it will often be only busybox's ps which supports far less options. Now we also have psgo which is used by default but that only supports a small subset of ps(1) options. Implementing all options there is way to much work. Docker on the other hand executes ps(1) directly on the host and tries to filter pids with `-q` an option which is not supported by busybox's ps and conflicts with other ps(1) arguments. That means they fall back to full ps(1) on the host and then filter based on the pid in the output. This is kinda ugly and fails short because users can modify the ps output and it may not even include the pid in the output which causes and error. So every solution has a different drawback, but what if we can combine them somehow?! This commit tries exactly that. We use ps(1) from the host and execute that in the contianer. Now unfortunately because ps(1) is dynamically linked (at least on the mainstream distros) this is not trivial. The trick here is in theory simple, open the binary on the host then we have a fd for it and can refer to the path via /proc/self/fd/<NUM>. Now join the container mount and pid ns then simple execute the fd path. That fails quickly because the linker will try to load the shared libs and because we are in a different mount ns that fails. Now to solve this we use the same trick with the LD_PRELOAD variable basically to make the linker load the opened libs on the host via the fd paths. Except that still don't works because even the linker in the container can be different. Compare glibc vs musl based distros. So we first have to get the right linker path and open this one as well in order to execute it directly. Now because we execute the linker directly we can no longer use the LD_ vars and have to set the cli arguments directly, i.e. --preload. In order to get the actual linker path and shared libraries we first execute ldd(1) to get the output. We can then parse that and open all correct paths. If we have a static binary we can skip all that and just execute it directly on the host, we assume it is static if ldd fails. Technically this could be a breaking change if somebody does not have ps on the host and only in the contianer but I find that very unlikely so I have removed the in contianer fallback. Fixes containers#19001 Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2215572 Signed-off-by: Paul Holzinger <pholzing@redhat.com>

This ended up more complicated then expected. Lets start first with the problem to show why I am doing this: Currently we simply execute ps(1) in the container. This has some drawbacks. First, obviously you need to have ps(1) in the container image. That is no always the case especially in small images. Second, even if you do it will often be only busybox's ps which supports far less options. Now we also have psgo which is used by default but that only supports a small subset of ps(1) options. Implementing all options there is way to much work. Docker on the other hand executes ps(1) directly on the host and tries to filter pids with `-q` an option which is not supported by busybox's ps and conflicts with other ps(1) arguments. That means they fall back to full ps(1) on the host and then filter based on the pid in the output. This is kinda ugly and fails short because users can modify the ps output and it may not even include the pid in the output which causes an error. So every solution has a different drawback, but what if we can combine them somehow?! This commit tries exactly that. We use ps(1) from the host and execute that in the container. Now unfortunately because ps(1) is dynamically linked (at least on the mainstream distros) this is not trivial. The trick here is in theory simple, open the binary on the host then we have a fd for it and can refer to the path via /proc/self/fd/<NUM>. Now join the container mount and pid ns then simple execute the fd path. That fails quickly because the linker will try to load the shared libs and because we are in a different mount ns that fails. Now to solve this we use the same trick with the LD_PRELOAD variable basically to make the linker load the opened libs on the host via the fd paths. Except that still don't works because even the linker in the container can be different. Compare glibc vs musl based distros. So we first have to get the right linker path and open this one as well in order to execute it directly. Now because we execute the linker directly we can no longer use the LD_ vars and have to set the cli arguments directly, i.e. --preload. In order to get the actual linker path and shared libraries we first execute ldd(1) to get the output. We can then parse that and open all correct paths. If we have a static binary we can skip all that and just execute it directly on the host, we assume it is static if ldd fails. Technically this could be a breaking change if somebody does not have ps on the host and only in the container but I find that very unlikely so I have removed the in container fallback. Fixes containers#19001 Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2215572 Signed-off-by: Paul Holzinger <pholzing@redhat.com>

This ended up more complicated then expected. Lets start first with the problem to show why I am doing this: Currently we simply execute ps(1) in the container. This has some drawbacks. First, obviously you need to have ps(1) in the container image. That is no always the case especially in small images. Second, even if you do it will often be only busybox's ps which supports far less options. Now we also have psgo which is used by default but that only supports a small subset of ps(1) options. Implementing all options there is way to much work. Docker on the other hand executes ps(1) directly on the host and tries to filter pids with `-q` an option which is not supported by busybox's ps and conflicts with other ps(1) arguments. That means they fall back to full ps(1) on the host and then filter based on the pid in the output. This is kinda ugly and fails short because users can modify the ps output and it may not even include the pid in the output which causes an error. So every solution has a different drawback, but what if we can combine them somehow?! This commit tries exactly that. We use ps(1) from the host and execute that in the container. Now unfortunately because ps(1) is dynamically linked (at least on the mainstream distros) this is not trivial. The trick here is in theory simple, open the binary on the host then we have a fd for it and can refer to the path via /proc/self/fd/<NUM>. Now join the container mount and pid ns then simple execute the fd path. That fails quickly because the linker will try to load the shared libs and because we are in a different mount ns that fails. Now to solve this we use the same trick with the LD_PRELOAD variable basically to make the linker load the opened libs on the host via the fd paths. Except that still don't works because even the linker in the container can be different. Compare glibc vs musl based distros. So we first have to get the right linker path and open this one as well in order to execute it directly. Now because we execute the linker directly we can no longer use the LD_ vars and have to set the cli arguments directly, i.e. --preload. In order to get the actual linker path and shared libraries we first execute ldd(1) to get the output. We can then parse that and open all correct paths. If we have a static binary we can skip all that and just execute it directly on the host, we assume it is static if ldd fails. Technically this could be a breaking change if somebody does not have ps on the host and only in the container but I find that very unlikely so I have removed the in container fallback. This updates the docs accordingly, note that podman pod top never falls back to executing ps in the contianer as this makes no sense with multiple containers so I fuxed the docs there as well. Fixes containers#19001 Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2215572 Signed-off-by: Paul Holzinger <pholzing@redhat.com>

This ended up more complicated then expected. Lets start first with the problem to show why I am doing this: Currently we simply execute ps(1) in the container. This has some drawbacks. First, obviously you need to have ps(1) in the container image. That is no always the case especially in small images. Second, even if you do it will often be only busybox's ps which supports far less options. Now we also have psgo which is used by default but that only supports a small subset of ps(1) options. Implementing all options there is way to much work. Docker on the other hand executes ps(1) directly on the host and tries to filter pids with `-q` an option which is not supported by busybox's ps and conflicts with other ps(1) arguments. That means they fall back to full ps(1) on the host and then filter based on the pid in the output. This is kinda ugly and fails short because users can modify the ps output and it may not even include the pid in the output which causes an error. So every solution has a different drawback, but what if we can combine them somehow?! This commit tries exactly that. We use ps(1) from the host and execute that in the container. Now unfortunately because ps(1) is dynamically linked (at least on the mainstream distros) this is not trivial. The trick here is in theory simple, open the binary on the host then we have a fd for it and can refer to the path via /proc/self/fd/<NUM>. Now join the container mount and pid ns then simple execute the fd path. That fails quickly because the linker will try to load the shared libs and because we are in a different mount ns that fails. Now to solve this we use the same trick with the LD_PRELOAD variable basically to make the linker load the opened libs on the host via the fd paths. Except that still don't works because even the linker in the container can be different. Compare glibc vs musl based distros. So we first have to get the right linker path and open this one as well in order to execute it directly. Now because we execute the linker directly we can no longer use the LD_ vars and have to set the cli arguments directly, i.e. --preload. In order to get the actual linker path and shared libraries we first execute ldd(1) to get the output. We can then parse that and open all correct paths. If we have a static binary we can skip all that and just execute it directly on the host, we assume it is static if ldd fails. Technically this could be a breaking change if somebody does not have ps on the host and only in the container but I find that very unlikely so I have removed the in container fallback. This updates the docs accordingly, note that podman pod top never falls back to executing ps in the container as this makes no sense with multiple containers so I fixed the docs there as well. Fixes containers#19001 Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2215572 Signed-off-by: Paul Holzinger <pholzing@redhat.com>

This ended up more complicated then expected. Lets start first with the problem to show why I am doing this: Currently we simply execute ps(1) in the container. This has some drawbacks. First, obviously you need to have ps(1) in the container image. That is no always the case especially in small images. Second, even if you do it will often be only busybox's ps which supports far less options. Now we also have psgo which is used by default but that only supports a small subset of ps(1) options. Implementing all options there is way to much work. Docker on the other hand executes ps(1) directly on the host and tries to filter pids with `-q` an option which is not supported by busybox's ps and conflicts with other ps(1) arguments. That means they fall back to full ps(1) on the host and then filter based on the pid in the output. This is kinda ugly and fails short because users can modify the ps output and it may not even include the pid in the output which causes an error. So every solution has a different drawback, but what if we can combine them somehow?! This commit tries exactly that. We use ps(1) from the host and execute that in the container's pid namespace. Technically this could be a breaking change if somebody does not have ps on the host and only in the container but I find that very unlikely so I have removed the in container fallback. This updates the docs accordingly, note that podman pod top never falls back to executing ps in the container as this makes no sense with multiple containers so I fixed the docs there as well. Fixes containers#19001 Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2215572 Signed-off-by: Paul Holzinger <pholzing@redhat.com>

This ended up more complicated then expected. Lets start first with the problem to show why I am doing this: Currently we simply execute ps(1) in the container. This has some drawbacks. First, obviously you need to have ps(1) in the container image. That is no always the case especially in small images. Second, even if you do it will often be only busybox's ps which supports far less options. Now we also have psgo which is used by default but that only supports a small subset of ps(1) options. Implementing all options there is way to much work. Docker on the other hand executes ps(1) directly on the host and tries to filter pids with `-q` an option which is not supported by busybox's ps and conflicts with other ps(1) arguments. That means they fall back to full ps(1) on the host and then filter based on the pid in the output. This is kinda ugly and fails short because users can modify the ps output and it may not even include the pid in the output which causes an error. So every solution has a different drawback, but what if we can combine them somehow?! This commit tries exactly that. We use ps(1) from the host and execute that in the container's pid namespace. There are some security concerns that must be addressed: - mount the executable paths for ps and podman itself readonly to prevent the container from overwriting it via /proc/self/exe. - set NO_NEW_PRIVS, SET_DUMPABLE and PDEATHSIG - close all non std fds to prevent leaking files in that the caller had open - unset all environment variables to not leak any into the contianer Technically this could be a breaking change if somebody does not have ps on the host and only in the container but I find that very unlikely so I have removed the in container fallback. This updates the docs accordingly, note that podman pod top never falls back to executing ps in the container as this makes no sense with multiple containers so I fixed the docs there as well. Fixes containers#19001 Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2215572 Signed-off-by: Paul Holzinger <pholzing@redhat.com>

This ended up more complicated then expected. Lets start first with the problem to show why I am doing this: Currently we simply execute ps(1) in the container. This has some drawbacks. First, obviously you need to have ps(1) in the container image. That is no always the case especially in small images. Second, even if you do it will often be only busybox's ps which supports far less options. Now we also have psgo which is used by default but that only supports a small subset of ps(1) options. Implementing all options there is way to much work. Docker on the other hand executes ps(1) directly on the host and tries to filter pids with `-q` an option which is not supported by busybox's ps and conflicts with other ps(1) arguments. That means they fall back to full ps(1) on the host and then filter based on the pid in the output. This is kinda ugly and fails short because users can modify the ps output and it may not even include the pid in the output which causes an error. So every solution has a different drawback, but what if we can combine them somehow?! This commit tries exactly that. We use ps(1) from the host and execute that in the container's pid namespace. There are some security concerns that must be addressed: - mount the executable paths for ps and podman itself readonly to prevent the container from overwriting it via /proc/self/exe. - set NO_NEW_PRIVS, SET_DUMPABLE and PDEATHSIG - close all non std fds to prevent leaking files in that the caller had open - unset all environment variables to not leak any into the contianer Technically this could be a breaking change if somebody does not have ps on the host and only in the container but I find that very unlikely, we still have the exec in container fallback. Because this can be insecure when the contianer has CAP_SYS_PTRACE we still only use the podman exec version in that case. This updates the docs accordingly, note that podman pod top never falls back to executing ps in the container as this makes no sense with multiple containers so I fixed the docs there as well. Fixes containers#19001 Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2215572 Signed-off-by: Paul Holzinger <pholzing@redhat.com>

Mortom123 added the kind/bug Categorizes issue or PR as related to a bug. label Jun 26, 2023

github-actions bot added the remote Problem is in podman-remote label Jun 26, 2023

Luap99 self-assigned this Jun 30, 2023

Luap99 mentioned this issue Jun 30, 2023

top: do not depend on ps(1) in container #19066

Merged

EvaristeGalois11 mentioned this issue Jul 9, 2023

podman --remote top -eo pid,comm doesn't parse the columns parameter correctly #19176

Closed

openshift-merge-robot closed this as completed in #19066 Jul 14, 2023

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Oct 13, 2023

github-actions bot locked as resolved and limited conversation to collaborators Oct 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

podman top "-eo" Flag not working #19001

podman top "-eo" Flag not working #19001

Mortom123 commented Jun 26, 2023

Luap99 commented Jun 26, 2023

podman top "-eo" Flag not working #19001

podman top "-eo" Flag not working #19001

Comments

Mortom123 commented Jun 26, 2023

Issue Description

Steps to reproduce the issue

Describe the results you received

Describe the results you expected

podman info output

Podman in a container

Privileged Or Rootless

Upstream Latest Release

Additional environment details

Additional information

Luap99 commented Jun 26, 2023