top: do not depend on ps(1) in container
This ended up more complicated than expected. Let's start with the
problem to show why I am doing this:

Currently we simply execute ps(1) in the container. This has some
drawbacks. First, obviously you need to have ps(1) in the container
image. That is not always the case, especially in small images. Second,
even if you do, it will often be only busybox's ps, which supports far
fewer options.

Now we also have psgo, which is used by default, but that only supports
a small subset of ps(1) options. Implementing all options there is way
too much work.

Docker on the other hand executes ps(1) directly on the host and tries
to filter pids with `-q`, an option which is not supported by busybox's
ps and conflicts with other ps(1) arguments. That means they fall back
to full ps(1) on the host and then filter based on the pid in the
output. This is kinda ugly and falls short because users can modify the
ps output format, and it may not even include the pid, which causes
an error.
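
To illustrate why filtering host ps output by pid is fragile, here is a
minimal Go sketch of the idea. It is not Docker's actual code:
`filterByPID` and the sample rows are hypothetical, made up only for this
note. The filter has nothing to work with as soon as the user picks an
output format without a PID column.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// filterByPID keeps the header plus every row whose PID column matches one
// of the wanted pids. Hypothetical helper, for illustration only.
func filterByPID(lines []string, wanted map[int]bool) ([]string, error) {
	if len(lines) == 0 {
		return nil, errors.New("empty ps output")
	}
	// Locate the PID column in the header line.
	pidCol := -1
	for i, name := range strings.Fields(lines[0]) {
		if name == "PID" {
			pidCol = i
			break
		}
	}
	if pidCol < 0 {
		// The user selected a format without PID, so rows cannot be matched.
		return nil, errors.New("ps output has no PID column")
	}
	out := []string{lines[0]}
	for _, line := range lines[1:] {
		fields := strings.Fields(line)
		if pidCol >= len(fields) {
			continue
		}
		pid, err := strconv.Atoi(fields[pidCol])
		if err != nil {
			continue
		}
		if wanted[pid] {
			out = append(out, line)
		}
	}
	return out, nil
}

func main() {
	// Made-up host-wide ps output; the container process is assumed to be 4242.
	lines := []string{
		"    PID TTY          TIME CMD",
		"      1 ?        00:00:01 systemd",
		"   4242 ?        00:00:00 sleep",
	}
	wanted := map[int]bool{4242: true}

	out, err := filterByPID(lines, wanted)
	fmt.Println(out, err)

	// A format such as "-o args" drops the PID column and the filter fails.
	_, err = filterByPID([]string{"COMMAND", "sleep inf"}, wanted)
	fmt.Println(err)
}
```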

So every solution has a different drawback, but what if we can combine
them somehow?! This commit tries exactly that.

We use ps(1) from the host and execute that in the container. Now
unfortunately because ps(1) is dynamically linked (at least on the
mainstream distros) this is not trivial.

The trick here is in theory simple: open the binary on the host, then we
have a fd for it and can refer to the path via /proc/self/fd/<NUM>.
Now join the container mount and pid ns, then simply execute the fd path.
That fails quickly because the linker will try to load the shared libs
and because we are in a different mount ns that fails.
Now to solve this we use the same trick with the LD_PRELOAD variable,
basically to make the linker load the libs opened on the host via the fd
paths. Except that still does not work because even the linker in the
container can be different. Compare glibc vs musl based distros.
So we first have to get the right linker path and open this one as well
in order to execute it directly.
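
The fd path part of the trick can be sketched in Go as follows. This is
only an illustration under simplifying assumptions: the commit itself
opens files with `unix.Open` and `O_PATH`, while this sketch uses the
standard library's `os.Open` to stay dependency-free. `os.Open` sets
close-on-exec, so the descriptor here is only meant to be used within the
current process. `openAsFDPath` is a hypothetical helper name, and the
sketch is Linux-only because it relies on /proc.

```go
package main

import (
	"fmt"
	"os"
)

// openAsFDPath opens path and returns a /proc/self/fd/<NUM> path for it.
// The fd path keeps referring to the opened file even when the original
// path is no longer reachable, e.g. after joining another mount namespace.
// The returned function closes the underlying descriptor.
func openAsFDPath(path string) (string, func() error, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", nil, err
	}
	// Returning f.Close keeps f referenced, so the fd stays open until
	// the caller closes it.
	return fmt.Sprintf("/proc/self/fd/%d", f.Fd()), f.Close, nil
}

func main() {
	// /proc/self/exe is used only because it always exists on Linux.
	fdPath, closeFn, err := openAsFDPath("/proc/self/exe")
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer closeFn()

	// Resolve the fd path back to show that it points at the opened file.
	target, err := os.Readlink(fdPath)
	if err != nil {
		fmt.Println("readlink failed:", err)
		return
	}
	fmt.Printf("%s -> %s\n", fdPath, target)
}
```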

Now because we execute the linker directly we can no longer use the LD_
vars and have to set the cli arguments directly, i.e. --preload.
In order to get the actual linker path and shared libraries we first
execute ldd(1) to get the output. We can then parse that and open all
correct paths.
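
As a rough sketch of the parsing step, the Go snippet below splits
ldd(1) output into the linker path and the shared library paths using the
same field-count heuristic as the change, then assembles the linker
command line. The sample ldd output is made up for illustration, and
`parseLdd` and `linkerArgs` are hypothetical helper names. Unlike the
commit, the sketch passes plain paths instead of /proc/self/fd/<NUM>
paths so it runs without opening anything.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Illustrative ldd output; real output differs per distro and ps build.
const sampleLdd = `	linux-vdso.so.1 (0x00007ffc5a3f2000)
	libprocps.so.8 => /lib64/libprocps.so.8 (0x00007f1c2a400000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f1c2a200000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f1c2a600000)`

// parseLdd extracts the dynamic linker and the resolved library paths.
// Lines of the form "name => /path (addr)" have four fields and carry the
// library path in the third. A two-field line starting with an absolute
// path is taken to be the dynamic linker.
func parseLdd(output string) (linker string, libs []string) {
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		switch {
		case len(fields) > 3:
			libs = append(libs, fields[2])
		case len(fields) == 2 && path.IsAbs(fields[0]):
			linker = fields[0]
		}
	}
	return linker, libs
}

// linkerArgs builds the argv used to run ps through the linker directly:
// --argv0 sets the program name, --preload lists the libraries to load.
func linkerArgs(linker string, libs []string, psPath string, psArgs []string) []string {
	args := []string{linker, "--argv0", "ps", "--preload", strings.Join(libs, " "), psPath}
	return append(args, psArgs...)
}

func main() {
	linker, libs := parseLdd(sampleLdd)
	// "/proc/self/fd/3" stands in for the fd path of the opened ps binary.
	for _, arg := range linkerArgs(linker, libs, "/proc/self/fd/3", []string{"aux"}) {
		fmt.Printf("%q\n", arg)
	}
}
```

The vdso line has two fields but no absolute path, so it is skipped, which
is why the absolute-path check matters.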

If we have a static binary we can skip all that and just execute it
directly on the host, we assume it is static if ldd fails.

Technically this could be a breaking change if somebody does not
have ps on the host and only in the container but I find that very
unlikely so I have removed the in container fallback.

This updates the docs accordingly. Note that podman pod top never falls
back to executing ps in the container, as this makes no sense with
multiple containers, so I fixed the docs there as well.

Fixes containers#19001
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=2215572

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Luap99 committed Jun 30, 2023
1 parent a16d83d commit 9b4e218
Showing 7 changed files with 148 additions and 44 deletions.
4 changes: 3 additions & 1 deletion docs/source/markdown/podman-pod-top.1.md.in
@@ -7,7 +7,9 @@ podman\-pod\-top - Display the running processes of containers in a pod
**podman pod top** [*options*] *pod* [*format-descriptors*]

## DESCRIPTION
Display the running processes of containers in a pod. The *format-descriptors* are ps (1) compatible AIX format descriptors but extended to print additional information, such as the seccomp mode or the effective capabilities of a given process. The descriptors can either be passed as separate arguments or as a single comma-separated argument. Note that if additional options of ps(1) are specified, Podman falls back to executing ps with the specified arguments and options in the container.
Display the running processes of containers in a pod. The *format-descriptors* are ps (1) compatible AIX format
descriptors but extended to print additional information, such as the seccomp mode or the effective capabilities
of a given process. The descriptors can either be passed as separate arguments or as a single comma-separated argument.

## OPTIONS

10 changes: 8 additions & 2 deletions docs/source/markdown/podman-top.1.md.in
@@ -9,7 +9,13 @@ podman\-top - Display the running processes of a container
**podman container top** [*options*] *container* [*format-descriptors*]

## DESCRIPTION
Display the running processes of the container. The *format-descriptors* are ps (1) compatible AIX format descriptors but extended to print additional information, such as the seccomp mode or the effective capabilities of a given process. The descriptors can either be passed as separated arguments or as a single comma-separated argument. Note that options and or flags of ps(1) can also be specified; in this case, Podman falls back to executing ps with the specified arguments and flags in the container. Please use the "h*" descriptors to extract host-related information. For instance, `podman top $name hpid huser` to display the PID and user of the processes in the host context.
Display the running processes of the container. The *format-descriptors* are ps (1) compatible AIX format
descriptors but extended to print additional information, such as the seccomp mode or the effective capabilities
of a given process. The descriptors can either be passed as separated arguments or as a single comma-separated
argument. Note that options and or flags of ps(1) can also be specified; in this case, Podman falls back to
executing ps(1) from the host with the specified arguments and flags in the container namespace. Please use the
"h*" descriptors to extract host-related information. For instance, `podman top $name hpid huser` to display
the PID and user of the processes in the host context.

## OPTIONS

@@ -90,7 +96,7 @@ PID SECCOMP COMMAND %CPU
8 filter vi /etc/ 0.000
```

Podman falls back to executing ps(1) in the container if an unknown descriptor is specified.
Podman falls back to executing ps(1) from the host in the container namespace if an unknown descriptor is specified.

```
$ podman top -l -- aux
159 changes: 125 additions & 34 deletions libpod/container_top_linux.go
@@ -5,9 +5,13 @@ package libpod

import (
"bufio"
"bytes"
"errors"
"fmt"
"os"
"os/exec"
"path"
"runtime"
"strconv"
"strings"

@@ -16,6 +20,7 @@ import (
"github.com/containers/psgo"
"github.com/google/shlex"
"github.com/sirupsen/logrus"
"golang.org/x/sys/unix"
)

// Top gathers statistics about the running processes in a container. It returns a
@@ -113,60 +118,146 @@ func (c *Container) GetContainerPidInformation(descriptors []string) ([]string,
return res, nil
}

// execPS executes ps(1) with the specified args in the container.
func (c *Container) execPS(args []string) ([]string, error) {
// execute ps(1) from the host within the container mountns
// This is done by first lookup the ps host path and open a fd for it,
// then read all linked libs from it then open them as well, (including the linker).
// Then we can join the pid and mountns, lastly execute the linker directly via
// /proc/self/fd/X and use --preload to set all shared libs as well.
// That way we open everything on the host and do not depend on any container libs.
func (c *Container) execPS(psArgs []string) ([]string, error) {
rPipe, wPipe, err := os.Pipe()
if err != nil {
return nil, err
}
defer wPipe.Close()
defer rPipe.Close()

rErrPipe, wErrPipe, err := os.Pipe()
if err != nil {
return nil, err
}
defer wErrPipe.Close()
defer rErrPipe.Close()

streams := new(define.AttachStreams)
streams.OutputStream = wPipe
streams.ErrorStream = wErrPipe
streams.AttachOutput = true
streams.AttachError = true

stdout := []string{}
go func() {
scanner := bufio.NewScanner(rPipe)
for scanner.Scan() {
stdout = append(stdout, scanner.Text())
}
}()
stderr := []string{}
go func() {
scanner := bufio.NewScanner(rErrPipe)
for scanner.Scan() {
stderr = append(stderr, scanner.Text())
}
}()

cmd := append([]string{"ps"}, args...)
config := new(ExecConfig)
config.Command = cmd
ec, err := c.Exec(config, streams, nil)
psPath, err := exec.LookPath("ps")
if err != nil {
return nil, err
} else if ec != 0 {
return nil, fmt.Errorf("runtime failed with exit status: %d and output: %s", ec, strings.Join(stderr, " "))
}
psFD, err := unix.Open(psPath, unix.O_PATH, 0)
if err != nil {
return nil, err
}
defer unix.Close(psFD)
logrus.Debugf("Trying to execute %q from the host in the container", psPath)
psPath = fmt.Sprintf("/proc/self/fd/%d", psFD)

if logrus.GetLevel() >= logrus.DebugLevel {
// If we're running in debug mode or higher, we might want to have a
// look at stderr which includes debug logs from conmon.
for _, log := range stderr {
logrus.Debugf("%s", log)
args := append([]string{psPath}, psArgs...)

// Now get all shared libs from ps(1), if this fails it is likely a static
// binary so no further action required.
cmd := exec.Command("ldd", psPath)
output, err := cmd.Output()
if err == nil {
logrus.Debug("ps is dynamically linked, open linker and shared libraries for it")
var preload []string
var linkerPath string
for _, line := range strings.Split(string(output), "\n") {
fields := strings.Fields(line)
if len(fields) > 3 {
// open the shared lib on the host as it will most likely not be in the container
logrus.Debugf("Open shared library for ps: %s", fields[2])
fd, err := unix.Open(fields[2], unix.O_PATH, 0)
if err == nil {
defer unix.Close(fd)
preload = append(preload, fmt.Sprintf("/proc/self/fd/%d", fd))
}
} else if len(fields) == 2 {
if path.IsAbs(fields[0]) {
// this should be the dynamic linker
logrus.Debugf("Using linker for ps: %s", fields[0])
linkFD, err := unix.Open(fields[0], unix.O_PATH|unix.O_CLOEXEC, 0)
if err != nil {
return nil, err
}
defer unix.Close(linkFD)
linkerPath = fmt.Sprintf("/proc/self/fd/%d", linkFD)
}
}
}
// Ok, set linker args. First overwrite argv[0] because busybox for example needs it to know
// which program to execute as everything is in one binary and it needs the proper name.
// Second, preload all linked shared libs. This is to prevent the executable from loading
// any libs in the container and thus very likely failing.
args = append([]string{linkerPath, "--argv0", "ps", "--preload", strings.Join(preload, " ")}, args...)
}

return stdout, nil
pid := c.state.PID
errChan := make(chan error)
go func() {
defer close(errChan)

// DO NOT UNLOCK THIS THREAD!!!
// We are joining a different pid and mount ns, go must destroy the
// thread when we are done and not reuse it.
runtime.LockOSThread()

// join the mount namespace of pid
mntFD, err := os.Open(fmt.Sprintf("/proc/%d/ns/mnt", pid))
if err != nil {
errChan <- err
return
}
defer mntFD.Close()

// join the pid namespace of pid
pidFD, err := os.Open(fmt.Sprintf("/proc/%d/ns/pid", pid))
if err != nil {
errChan <- err
return
}
defer pidFD.Close()

// create a new mountns on the current thread
if err = unix.Unshare(unix.CLONE_NEWNS); err != nil {
errChan <- fmt.Errorf("unshare NEWNS: %w", err)
return
}
if err := unix.Setns(int(mntFD.Fd()), unix.CLONE_NEWNS); err != nil {
errChan <- fmt.Errorf("setns NEWNS: %w", err)
return
}

if err := unix.Setns(int(pidFD.Fd()), unix.CLONE_NEWPID); err != nil {
errChan <- fmt.Errorf("setns NEWPID: %w", err)
return
}

logrus.Debugf("Executing ps in the containers mnt+pid namespace, final command: %v", args)
var errBuf bytes.Buffer
path := args[0]
args[0] = "ps"
cmd := exec.Cmd{
Path: path,
Args: args,
Stdout: wPipe,
Stderr: &errBuf,
}

err = cmd.Run()
if err != nil {
exitError := &exec.ExitError{}
if errors.As(err, &exitError) && errBuf.Len() > 0 {
// when error printed on stderr include it in error
err = fmt.Errorf("ps failed with exit code %d: %s", exitError.ExitCode(), errBuf.String())
} else {
err = fmt.Errorf("could not execute ps in the container: %w", err)
}
}
errChan <- err
}()

// the channel blocks and waits for command completion
err = <-errChan
return stdout, err
}
3 changes: 1 addition & 2 deletions pkg/api/server/register_containers.go
@@ -444,7 +444,7 @@ func (s *APIServer) registerContainersHandlers(r *mux.Router) error {
// name: ps_args
// type: string
// default: -ef
// description: arguments to pass to ps such as aux. Requires ps(1) to be installed in the container if no ps(1) compatible AIX descriptors are used.
// description: arguments to pass to ps such as aux.
// produces:
// - application/json
// responses:
@@ -1177,7 +1177,6 @@ func (s *APIServer) registerContainersHandlers(r *mux.Router) error {
// default:
// description: |
// arguments to pass to ps such as aux.
// Requires ps(1) to be installed in the container if no ps(1) compatible AIX descriptors are used.
// produces:
// - application/json
// responses:
1 change: 0 additions & 1 deletion pkg/api/server/register_pods.go
@@ -312,7 +312,6 @@ func (s *APIServer) registerPodsHandlers(r *mux.Router) error {
// default:
// description: |
// arguments to pass to ps such as aux.
// Requires ps(1) to be installed in the container if no ps(1) compatible AIX descriptors are used.
// responses:
// 200:
// $ref: "#/responses/podTopResponse"
4 changes: 2 additions & 2 deletions test/apiv2/20-containers.at
@@ -135,9 +135,9 @@ fi
CTRNAME=test123
podman run --name $CTRNAME -d $IMAGE top
t GET libpod/containers/$CTRNAME/top?ps_args=--invalid 500 \
.cause~".*unrecognized option.*"
.cause~".*unknown gnu long option.*"
t GET containers/$CTRNAME/top?ps_args=--invalid 500 \
.cause~".*unrecognized option.*"
.cause~".*unknown gnu long option.*"

podman rm -f $CTRNAME

11 changes: 9 additions & 2 deletions test/e2e/top_test.go
@@ -88,10 +88,17 @@ var _ = Describe("Podman top", func() {
})

It("podman top with ps(1) options", func() {
session := podmanTest.Podman([]string{"run", "-d", ALPINE, "top", "-d", "2"})
session := podmanTest.Podman([]string{"run", "-d", fedoraMinimal, "sleep", "inf"})
session.WaitWithDefaultTimeout()
Expect(session).Should(Exit(0))

// Extra check: Make sure the container image does not contain ps(1)
// podman top must work without that
exec := podmanTest.Podman([]string{"exec", session.OutputToString(), "ps"})
exec.WaitWithDefaultTimeout()
Expect(exec).Should(Exit(127))
Expect(exec.ErrorToString()).Should(ContainSubstring("OCI runtime attempted to invoke a command that was not found"))

result := podmanTest.Podman([]string{"top", session.OutputToString(), "aux"})
result.WaitWithDefaultTimeout()
Expect(result).Should(Exit(0))
@@ -100,7 +107,7 @@ var _ = Describe("Podman top", func() {
result = podmanTest.Podman([]string{"top", session.OutputToString(), "ax -o args"})
result.WaitWithDefaultTimeout()
Expect(result).Should(Exit(0))
Expect(result.OutputToStringArray()).To(Equal([]string{"COMMAND", "top -d 2"}))
Expect(result.OutputToStringArray()).To(Equal([]string{"COMMAND", "sleep inf"}))
})

It("podman top with comma-separated options", func() {