Service origin-master-controllers crash when service systemd-journald reloads #40

jperville · 2017-01-03T16:54:19Z

Following the release of v1.10.26 of this cookbook, I want to report the last issue I have with running it in openshift_HA mode (with external etcd).

Some context: In my environment cookbook, at some point the systemd-journald configuration gets reloaded (I enable persisting the journal to disk and set up some maximum sizes).

The bug: Reloading systemd-journald has the repeatable effect of crashing the origin-master-controllers service, which won't come back until I run chef again or manually restart the service. I still don't know whether this is an OpenShift issue or an issue with this cookbook (the origin-master-* systemd units are created by this cookbook).

Here is how to reproduce, step by step:

checkout https://github.com/PerfectMemory/origin-provision-bug-demo.git
vagrant up master

This boots a Vagrant VM with a working openshift3 1.3.1 master configured:

to use external etcd (HA mode)
to directly log to journald (important!)

You may need to install Vagrant, the latest chef-dk and the vagrant-berkshelf plugin to make it work.

Once the VM is provisioned, ssh into it for the rest of the reproduction steps.

vagrant ssh master

in the VM, check that the different openshift master services are working

[vagrant@master ~]$ sudo netstat -ntlp | egrep ':(8443|8444)'
tcp        0      0 0.0.0.0:8443            0.0.0.0:*               LISTEN      24445/openshift
tcp        0      0 0.0.0.0:8444            0.0.0.0:*               LISTEN      24468/openshift

in another vagrant ssh terminal, tail system messages

[vagrant@master ~]$ sudo journalctl -f

restart systemd-journald service

[vagrant@master ~]$ sudo systemctl restart systemd-journald.service

In the journal being tailed in the other terminal, the message "Started Flush Journal to Persistent Storage" appears, and right after that point some of the OpenShift master services crash.

observe crashed origin-master services

[vagrant@master ~]$ sudo netstat -ntlp | egrep ':(8443|8444)'
tcp        0      0 0.0.0.0:8443            0.0.0.0:*               LISTEN      24445/openshift

[vagrant@master ~]$ sudo service origin-master-controllers status
Redirecting to /bin/systemctl status  origin-master-controllers.service
● origin-master-controllers.service - Atomic OpenShift Master Controllers
   Loaded: loaded (/usr/lib/systemd/system/origin-master-controllers.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2017-01-03 16:05:13 UTC; 50s ago
     Docs: https://github.com/openshift/origin
 Main PID: 24468 (code=killed, signal=PIPE)

Jan 03 16:04:23 master origin-master-controllers[24468]: I0103 16:04:23.521639   24468 nodecontroller.go:609] NodeController is entering network segmentation mode.
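The before/after netstat comparison above can be scripted. This is only a minimal sketch: `missing_ports` is a hypothetical helper name, not something shipped by the cookbook or by OpenShift. It reads a `netstat -ntlp` listing on stdin and prints every expected port that has no LISTEN line.

```shell
#!/bin/sh
# Hypothetical helper (not part of the cookbook): print each expected port
# that has no LISTEN line in the netstat listing supplied on stdin.
missing_ports() {
  listing=$(cat)
  for port in "$@"; do
    # Look for ":<port>" followed by whitespace on a LISTEN line.
    if ! printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]].*LISTEN"; then
      printf '%s\n' "$port"
    fi
  done
}

# Usage inside the VM (run manually):
#   sudo netstat -ntlp | missing_ports 8443 8444
```

With the listing captured after the journald restart, where only 8443 is still listening, the helper prints 8444.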

IshentRas · 2017-01-03T21:45:42Z

@jperville The issue was on our side...:-1:
The service file for master-controllers was outdated and lacked some conditions that help against the issue you reported.
Check this one and let me know if it works better: https://github.com/IshentRas/cookbook-openshift3/tree/release/1.10.27

3d9e605#diff-dda3923836a9a27946faf8134cc3445d

jperville · 2017-01-03T21:48:19Z

Cheers !

jperville · 2017-01-03T21:51:58Z

I will test this when I return to work tomorrow morning. Thanks for the quick investigation and fix @IshentRas

IshentRas · 2017-01-03T21:53:31Z

Let me know if it works and I'll merge and upload the code...

jperville · 2017-01-04T08:22:12Z

Hello @IshentRas, your fix works for me. After restarting systemd-journald, the origin-master daemons come back as expected.
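For anyone who wants to repeat this check, here is a rough sketch of how it could be scripted. `retry_until_ok` is a hypothetical helper, and the attempt count and delay in the usage lines are arbitrary assumptions, not values taken from the cookbook.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Arguments: <attempts> <delay-seconds> <command> [args...]
retry_until_ok() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Stop as soon as the command exits with status 0.
    "$@" && return 0
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage inside the VM (run manually; 12 attempts and 5 seconds are arbitrary):
#   sudo systemctl restart systemd-journald.service
#   retry_until_ok 12 5 systemctl is-active --quiet origin-master-controllers.service \
#     && echo "controllers are active" || echo "controllers did not come back"
```

Polling gives the service time to come back after the journald restart instead of relying on a single immediate status check.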

IshentRas · 2017-01-04T08:57:43Z

Perfect, will be merged soon 👍
Thanks for pointing it out.

IshentRas · 2017-01-04T09:35:20Z

https://github.com/IshentRas/cookbook-openshift3/releases/tag/v1.10.27

jperville mentioned this issue Jan 3, 2017

Openshift master crash when logging to journald and service systemd-journald is restarted openshift/origin#12377

Closed

IshentRas self-assigned this Jan 3, 2017

IshentRas added the bug label Jan 3, 2017

IshentRas closed this as completed Jan 4, 2017

IshentRas reopened this Jan 4, 2017

IshentRas closed this as completed Jan 4, 2017
