journald restart crashes containerd #580

Closed
x1022as opened this issue Feb 28, 2017 · 2 comments

Comments


x1022as commented Feb 28, 2017

This is closely related to moby/moby#19728 ("journald restart crashes Docker").

When their logs are managed by journald, the Docker daemon's stderr and containerd's stderr point to the same socket:

# ls /proc/31602//fd/2 -al
lrwx------ 1 root root 64 Feb 28 15:47 /proc/31602//fd/2 -> socket:[6992842]
# ls /proc/31609/fd/2 -al
lrwx------ 1 root root 64 Feb 28 15:50 /proc/31609/fd/2 -> socket:[6992842]

After restarting the journald service, the daemon ignores SIGPIPE and some Docker operations keep working, but all Docker logs are lost because the daemon is writing them into a broken pipe. Containerd, however, still receives SIGPIPE and may be killed. The weird thing is that Docker does not start a new containerd.

# systemctl restart systemd-journald.service
# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
# ps -ef | grep docker
root      9486  4081  0 16:06 pts/2    00:00:00 grep --color=auto docker
root     31602     1  0 15:47 ?        00:00:00 /usr/bin/docker daemon -D --live-restore
root     31609 31602  0 15:47 ?        00:00:00 docker-containerd -l /var/run/docker/libcontainerd/docker-containerd.sock --runtime docker-runc --start-timeout 2m --state-dir /var/run/docker/libcontainerd/containerd --debug --metrics-interval=0
# docker run -tid busybox sleep 10000
afdccea8dfb5483ed082547c97293f33b652fc1cacdee23247642b56b2bcee24
# docker rm -f afdccea8dfb5483ed082547c97293f33b652fc1cacdee23247642b56b2bcee24

After the journald restart, docker and containerd are both still running and docker run works fine, but docker rm blocks forever. Running ps in another shell, I get this:

# ps -ef | grep docker
root     11306  4081  0 16:09 pts/2    00:00:00 docker rm -f afdccea8dfb5483ed082547c97293f33b652fc1cacdee23247642b56b2bcee24
root     11376 19869  0 16:09 pts/1    00:00:00 grep --color=auto docker
root     31602     1  0 15:47 ?        00:00:00 /usr/bin/docker daemon -D --live-restore

Containerd was killed and was never started again. Because all the logs are missing, this is hard to debug.

P.S. The Docker version is 1.11.2, with moby/moby#22460 ("Ignore SIGPIPE events") backported.


Maybe containerd needs to ignore SIGPIPE too, but losing all the Docker logs is a fairly severe problem in itself.

I'm not sure whether there is a better way for docker and containerd to deal with these broken-pipe errors.

ping @crosbymichael @LK4D4 @jwhonce @coolljt0725


ys-hwang commented Apr 6, 2017

We've also hit this issue.
As @x1022as said above, when journald restarted, containerd died, and docker did not start a new containerd.
We hope this issue will be resolved ASAP. Thanks.

Our docker version is 17.03.0-ce.

hqhq (Contributor) commented May 31, 2017

Fixed by: #930

hqhq closed this as completed May 31, 2017
liusdu pushed a commit to liusdu/moby that referenced this issue Oct 30, 2017
bump to v1.11.2.31

- bump containerd:
   - containerd: ignore SIGPIPE to fix containerd/containerd#580
- Feature: Support combining partial images into completed one (mr 504)
- Bugfix: Make accel name prefix "anon_[cli|img]_accel_" reserved (mr 499 fix DTS2017052504931)
- Bugfix: check accel input arguments (mr 493 fix DTS2017050408086)
- Bugfix: devmapper: remove broken device when start daemon (mr 494 fix DTS2017051611286)
- Bugfix: Adding support for docker max restart time (mr 507 fix DTS2017052704554)
- Bugfix: Fix race between sandbox.delete() and SetKey() (mr 497 fix DTS2017051700511)
- Bugfix: Typo:change contianer -> container (mr 510 fix DTS2017052704554)
- Backport: Moving the UDS file out of /var/lib/docker and into /run/ (mr 498)

Signed-off-by: Lei Jitang <leijitang@huawei.com>



See merge request docker/docker!515