Service origin-master-controllers crashes when service systemd-journald reloads #40
Following the release of v 1.10.26 of this cookbook, I want to submit my last issue with running this cookbook in openshift_HA mode (with external etcd).
Some context: In my environment cookbook, at some point the systemd-journald configuration gets reloaded (I enable persisting the journal to disk and set up some max sizes).
The bug: Reloading systemd-journald reproducibly crashes the origin-master-controllers service, which won't come back until I run chef again or manually restart the service. I still don't know if this is an OpenShift issue or this cookbook's issue (the origin-master-* systemd units are created by this cookbook).
Here is how to reproduce, step by step:
This boots a Vagrant VM with a working openshift3 1.3.1 master configured:
You may need to install Vagrant, the latest chef-dk and the vagrant-berkshelf plugin to make it work.
Once the VM is provisioned, ssh into it for the rest of the reproduction steps.
vagrant ssh master
[vagrant@master ~]$ sudo netstat -ntlp | egrep ':(8443|8444)'
tcp 0 0 0.0.0.0:8443 0.0.0.0:* LISTEN 24445/openshift
tcp 0 0 0.0.0.0:8444 0.0.0.0:* LISTEN 24468/openshift
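The port check can be made easier to compare before and after the journald restart with a small helper. This is only a sketch: `missing_ports` is a hypothetical name, not part of the cookbook or of OpenShift, and it assumes the `netstat -ntlp` line format shown in this report.

```shell
# Hypothetical helper (not part of the cookbook): reads `netstat -ntlp`
# output on stdin and prints each expected port that is NOT in LISTEN state.
# The function body runs in a subshell so its variables stay local.
missing_ports() (
  input=$(cat)
  for port in "$@"; do
    # A listening socket shows up as "<addr>:<port> <foreign addr> LISTEN ..."
    if ! printf '%s\n' "$input" | grep -Eq ":${port}[[:space:]].*LISTEN"; then
      echo "$port"
    fi
  done
)
```

Usage inside the VM would look like `sudo netstat -ntlp | missing_ports 8443 8444`; empty output means both master ports are still listening.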
[vagrant@master ~]$ sudo systemctl restart systemd-journald.service
In the log we tailed in step 4, the message "Started Flush Journal to Persistent Storage" appears, and right after that point some of the OpenShift master services crash.
[vagrant@master ~]$ sudo netstat -ntlp | egrep ':(8443|8444)'
tcp 0 0 0.0.0.0:8443 0.0.0.0:* LISTEN 24445/openshift
[vagrant@master ~]$ sudo service origin-master-controllers status
Redirecting to /bin/systemctl status origin-master-controllers.service
● origin-master-controllers.service - Atomic OpenShift Master Controllers
   Loaded: loaded (/usr/lib/systemd/system/origin-master-controllers.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2017-01-03 16:05:13 UTC; 50s ago
     Docs: https://github.com/openshift/origin
 Main PID: 24468 (code=killed, signal=PIPE)
Jan 03 16:04:23 master origin-master-controllers: I0103 16:04:23.521639 24468 nodecontroller.go:609] NodeController is entering network segmentation mode.
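The key detail in the status output is `code=killed, signal=PIPE`. SIGPIPE is what a process receives when it writes to a pipe or socket whose other end has been closed, which would be consistent with the service losing its log stream when journald restarted; that reading is an assumption, not something confirmed in this report. A minimal sketch for pulling the signal name out of the status text, where `kill_signal` is a hypothetical helper name:

```shell
# Hypothetical helper: reads `systemctl status <unit>` output on stdin and
# prints the signal name from a "Main PID: ... (code=killed, signal=XXX)" line.
# Prints nothing if the unit was not killed by a signal.
kill_signal() {
  sed -n 's/.*code=killed, signal=\([A-Z0-9]*\)).*/\1/p'
}
```

Usage would look like `sudo systemctl status origin-master-controllers.service | kill_signal`, which for the output above prints `PIPE`.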
Jan 3, 2017
@jperville The issue was on our side...