Skip to content
This repository has been archived by the owner on Oct 13, 2023. It is now read-only.

libcontainerd: fix reaper goroutine position #53

Merged
merged 1 commit into from Jun 10, 2017

Conversation

seemethere
Copy link
Contributor

Cherry-picked from: moby/moby#33419

It has observed defunct containerd processes accumulating over
time while dockerd was permanently failing to restart containerd.
Due to a bug in the runContainerdDaemon() function, dockerd does not clean up
its child process if containerd already exits very soon after the (re)start.

The reproducer and analysis below comes from docker 1.12.x but bug
still applies on latest master.

  • from libcontainerd/remote_linux.go:

    329 func (r *remote) runContainerdDaemon() error {
    :
    : // start the containerd child process
    :
    403 if err := cmd.Start(); err != nil {
    404 return err
    405 }
    :
    : // If containerd exits very soon after (re)start, it is
    possible
    : // that containerd is already in defunct state at the time
    when
    : // dockerd gets here. The setOOMScore() function tries to
    write
    : // to /proc/PID_OF_CONTAINERD/oom_score_adj. However, this
    fails
    : // with errno EINVAL because containerd is defunct. Please see
    : // snippets of kernel source code and further explanation
    below.
    :
    407 if err := setOOMScore(cmd.Process.Pid, r.oomScore); err != nil
    {
    408 utils.KillProcess(cmd.Process.Pid)
    :
    : // Due to the error from write() we return here. As
    the
    : // goroutine that would clean up the child has not
    been
    : // started yet, containerd remains in the defunct
    state
    : // and never gets reaped.
    :
    409 return err
    410 }
    :
    417 go func() {
    418 cmd.Wait()
    419 close(r.daemonWaitCh)
    420 }() // Reap our child when needed
    :
    423 }

This is the kernel function that gets invoked when dockerd tries to
write
to /proc/PID_OF_CONTAINERD/oom_score_adj.

  • from fs/proc/base.c:

1197 static ssize_t oom_score_adj_write(struct file *file, ...
1198 size_t count, loff_t
*ppos)
1199 {
:
1223 task = get_proc_task(file_inode(file));
:
: // The defunct containerd process does not have a virtual
: // address space anymore, i.e. task->mm is NULL. Thus the
: // following code returns errno EINVAL to dockerd.
:
1230 if (!task->mm) {
1231 err = -EINVAL;
1232 goto err_task_lock;
1233 }
:
1253 err_task_lock:
:
1257 return err < 0 ? err : count;
1258 }

The purpose of the following program is to demonstrate the behavior of
the oom_score_adj_write() function in connection with a defunct process.

$ cat defunct_test.c

#include <unistd.h>

main()
{
pid_t pid = fork();

if (pid == 0)
    // child
    _exit(0);

// parent
pause();

}

$ make defunct_test
cc defunct_test.c -o defunct_test

$ ./defunct_test &
[1] 3142

$ ps -f | grep defunct_test | grep -v grep
root 3142 2956 0 13:04 pts/0 00:00:00 ./defunct_test
root 3143 3142 0 13:04 pts/0 00:00:00 [defunct_test]

$ echo "ps 3143" | crash -s
PID PPID CPU TASK ST %MEM VSZ RSS COMM
3143 3142 2 ffff880035def300 ZO 0.0 0 0
defunct_test

$ echo "px ((struct task_struct *)0xffff880035def300)->mm" | crash -s
$1 = (struct mm_struct *) 0x0
^^^ task->mm is NULL

$ cat /proc/3143/oom_score_adj
0

$ echo 0 > /proc/3143/oom_score_adj
-bash: echo: write error: Invalid argument"


This patch fixes the above issue by making sure we start the reaper
goroutine as soon as possible.

Signed-off-by: Antonio Murdaca runcom@redhat.com

(cherry picked from commit 27087ea)

Signed-off-by: Eli Uriegas eli.uriegas@docker.com

Signed-off-by: Eli Uriegas eli.uriegas@docker.com

It has observed defunct containerd processes accumulating over
time while dockerd was permanently failing to restart containerd.
Due to a bug in the runContainerdDaemon() function, dockerd does not clean up
its child process if containerd already exits very soon after the (re)start.

The reproducer and analysis below comes from docker 1.12.x but bug
still applies on latest master.

- from libcontainerd/remote_linux.go:

  329 func (r *remote) runContainerdDaemon() error {
   :
   :      // start the containerd child process
   :
  403     if err := cmd.Start(); err != nil {
  404             return err
  405     }
   :
   :      // If containerd exits very soon after (re)start, it is
possible
   :      // that containerd is already in defunct state at the time
when
   :      // dockerd gets here. The setOOMScore() function tries to
write
   :      // to /proc/PID_OF_CONTAINERD/oom_score_adj. However, this
fails
   :      // with errno EINVAL because containerd is defunct. Please see
   :      // snippets of kernel source code and further explanation
below.
   :
  407     if err := setOOMScore(cmd.Process.Pid, r.oomScore); err != nil
{
  408             utils.KillProcess(cmd.Process.Pid)
   :
   :              // Due to the error from write() we return here. As
the
   :              // goroutine that would clean up the child has not
been
   :              // started yet, containerd remains in the defunct
state
   :              // and never gets reaped.
   :
  409             return err
  410     }
   :
  417     go func() {
  418             cmd.Wait()
  419             close(r.daemonWaitCh)
  420     }() // Reap our child when needed
   :
  423 }

This is the kernel function that gets invoked when dockerd tries to
write
to /proc/PID_OF_CONTAINERD/oom_score_adj.

- from fs/proc/base.c:

 1197 static ssize_t oom_score_adj_write(struct file *file, ...
 1198                                         size_t count, loff_t
*ppos)
 1199 {
   :
 1223         task = get_proc_task(file_inode(file));
   :
   :          // The defunct containerd process does not have a virtual
   :          // address space anymore, i.e. task->mm is NULL. Thus the
   :          // following code returns errno EINVAL to dockerd.
   :
 1230         if (!task->mm) {
 1231                 err = -EINVAL;
 1232                 goto err_task_lock;
 1233         }
   :
 1253 err_task_lock:
   :
 1257         return err < 0 ? err : count;
 1258 }

The purpose of the following program is to demonstrate the behavior of
the oom_score_adj_write() function in connection with a defunct process.

$ cat defunct_test.c

\#include <unistd.h>

main()
{
    pid_t pid = fork();

    if (pid == 0)
        // child
        _exit(0);

    // parent
    pause();
}

$ make defunct_test
cc     defunct_test.c   -o defunct_test

$ ./defunct_test &
[1] 3142

$ ps -f | grep defunct_test | grep -v grep
root      3142  2956  0 13:04 pts/0    00:00:00 ./defunct_test
root      3143  3142  0 13:04 pts/0    00:00:00 [defunct_test] <defunct>

$ echo "ps 3143" | crash -s
  PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
  3143   3142   2  ffff880035def300  ZO   0.0       0      0
defunct_test

$ echo "px ((struct task_struct *)0xffff880035def300)->mm" | crash -s
$1 = (struct mm_struct *) 0x0
                          ^^^ task->mm is NULL

$ cat /proc/3143/oom_score_adj
0

$ echo 0 > /proc/3143/oom_score_adj
-bash: echo: write error: Invalid argument"

---

This patch fixes the above issue by making sure we start the reaper
goroutine as soon as possible.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>

(cherry picked from commit 27087ea)

Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>

Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
@andrewhsu andrewhsu mentioned this pull request Jun 9, 2017
40 tasks
@andrewhsu
Copy link
Contributor

LGTM

@andrewhsu andrewhsu merged commit 897b692 into docker:17.06 Jun 10, 2017
@andrewhsu andrewhsu modified the milestone: 17.06.0 Jul 12, 2017
@seemethere seemethere deleted the cherry_pick_33419 branch August 2, 2017 21:34
andrewhsu added a commit that referenced this pull request Nov 29, 2017
Bump Go to 1.8.4
Upstream-commit: 18b3997
Component: packaging
docker-jenkins pushed a commit that referenced this pull request Sep 17, 2018
…trol

[18.09] backport always hornor client side to choose which builder to use with DOCKER_…
Upstream-commit: 5fb0a7ced7bcdb9e86e048fe3b24d0d4297d6155
Component: engine
silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this pull request Jan 30, 2020
silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this pull request Feb 3, 2020
libcontainerd: fix reaper goroutine position
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants