-
Notifications
You must be signed in to change notification settings - Fork 38
Description
Steps to reproduce
- use bosh to deploy a postgres-release v23 VM
- reboot the VM on IaaS level (e.g.
nova reboot <VM_CID>
if on openstack)
Observed behaviour
- postgres consistently keeps failing to start because the pid file directory
/var/vcap/sys/run/postgres/
does not exist.
from postgres_ctl.err.log
:
[2018-01-05 12:53:31+0000] + main start
[2018-01-05 12:53:31+0000] + local action
[2018-01-05 12:53:31+0000] + action=start
[2018-01-05 12:53:31+0000] + source /var/vcap/jobs/postgres/bin/pgconfig.sh
[2018-01-05 12:53:31+0000] ++ current_version=9.6.4
[2018-01-05 12:53:31+0000] ++ pgversion_current=postgres-9.6.4
[2018-01-05 12:53:31+0000] ++ pgversion_old=postgres-9.6.3
[2018-01-05 12:53:31+0000] ++ pgversion_older=postgres-9.6.2
[2018-01-05 12:53:31+0000] ++ JOB_DIR=/var/vcap/jobs/postgres
[2018-01-05 12:53:31+0000] ++ PACKAGE_DIR=/var/vcap/packages/postgres-9.6.4
[2018-01-05 12:53:31+0000] ++ STORE_DIR=/var/vcap/store
[2018-01-05 12:53:31+0000] ++ PG_STORE_DIR=/var/vcap/store/postgres
[2018-01-05 12:53:31+0000] ++ DATA_DIR=/var/vcap/store/postgres/postgres-9.6.4
[2018-01-05 12:53:31+0000] ++ DATA_DIR_PREVIOUS=/var/vcap/store/postgres/postgres-previous
[2018-01-05 12:53:31+0000] ++ DATA_DIR_OLD=/var/vcap/store/postgres/postgres-unknown
[2018-01-05 12:53:31+0000] ++ PACKAGE_DIR_OLD=/var/vcap/packages/postgres-unknown
[2018-01-05 12:53:31+0000] ++ POSTGRES_UPGRADE_LOCK=/var/vcap/store/postgres/POSTGRES_UPGRADE_LOCK
[2018-01-05 12:53:31+0000] ++ pgversion_upgrade_from=postgres-unknown
[2018-01-05 12:53:31+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.3 -a -f /var/vcap/store/postgres/postgres-9.6.3/postgresql.conf ']'
[2018-01-05 12:53:31+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.2 -a -f /var/vcap/store/postgres/postgres-9.6.2/postgresql.conf ']'
[2018-01-05 12:53:31+0000] ++ RUN_DIR=/var/vcap/sys/run/postgres
[2018-01-05 12:53:31+0000] ++ LOG_DIR=/var/vcap/sys/log/postgres
[2018-01-05 12:53:31+0000] ++ PIDFILE=/var/vcap/sys/run/postgres/postgres.pid
[2018-01-05 12:53:31+0000] ++ CONTROL_JOB_PIDFILE=/var/vcap/sys/run/postgres/postgresctl.pid
[2018-01-05 12:53:31+0000] ++ HOST=0.0.0.0
[2018-01-05 12:53:31+0000] ++ PORT=5524
[2018-01-05 12:53:31+0000] ++ [[ -n '' ]]
[2018-01-05 12:53:31+0000] ++ LD_LIBRARY_PATH=/var/vcap/packages/postgres-9.6.4/lib
[2018-01-05 12:53:31+0000] + set +u
[2018-01-05 12:53:31+0000] + source /var/vcap/packages/postgres-common/utils.sh
[2018-01-05 12:53:31+0000] + set -u
[2018-01-05 12:53:31+0000] + case "${action}" in
[2018-01-05 12:53:31+0000] + pid_guard /var/vcap/sys/run/postgres/postgresctl.pid 'Postgres control job' false
[2018-01-05 12:53:31+0000] + local pidfile=/var/vcap/sys/run/postgres/postgresctl.pid
[2018-01-05 12:53:31+0000] + local 'name=Postgres control job'
[2018-01-05 12:53:31+0000] + local logall=false
[2018-01-05 12:53:31+0000] + '[' false == true ']'
[2018-01-05 12:53:31+0000] + '[' -f /var/vcap/sys/run/postgres/postgresctl.pid ']'
[2018-01-05 12:53:31+0000] + echo 1315
[2018-01-05 12:53:31+0000] /var/vcap/jobs/postgres/bin/postgres_ctl: line 18: /var/vcap/sys/run/postgres/postgresctl.pid: No such file or directory
[2018-01-05 12:55:11+0000] + main start
[2018-01-05 12:55:11+0000] + local action
[2018-01-05 12:55:11+0000] + action=start
[2018-01-05 12:55:11+0000] + source /var/vcap/jobs/postgres/bin/pgconfig.sh
[2018-01-05 12:55:11+0000] ++ current_version=9.6.4
[2018-01-05 12:55:11+0000] ++ pgversion_current=postgres-9.6.4
[2018-01-05 12:55:11+0000] ++ pgversion_old=postgres-9.6.3
[2018-01-05 12:55:11+0000] ++ pgversion_older=postgres-9.6.2
[2018-01-05 12:55:11+0000] ++ JOB_DIR=/var/vcap/jobs/postgres
[2018-01-05 12:55:11+0000] ++ PACKAGE_DIR=/var/vcap/packages/postgres-9.6.4
[2018-01-05 12:55:11+0000] ++ STORE_DIR=/var/vcap/store
[2018-01-05 12:55:11+0000] ++ PG_STORE_DIR=/var/vcap/store/postgres
[2018-01-05 12:55:11+0000] ++ DATA_DIR=/var/vcap/store/postgres/postgres-9.6.4
[2018-01-05 12:55:11+0000] ++ DATA_DIR_PREVIOUS=/var/vcap/store/postgres/postgres-previous
[2018-01-05 12:55:11+0000] ++ DATA_DIR_OLD=/var/vcap/store/postgres/postgres-unknown
[2018-01-05 12:55:11+0000] ++ PACKAGE_DIR_OLD=/var/vcap/packages/postgres-unknown
[2018-01-05 12:55:11+0000] ++ POSTGRES_UPGRADE_LOCK=/var/vcap/store/postgres/POSTGRES_UPGRADE_LOCK
[2018-01-05 12:55:11+0000] ++ pgversion_upgrade_from=postgres-unknown
[2018-01-05 12:55:11+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.3 -a -f /var/vcap/store/postgres/postgres-9.6.3/postgresql.conf ']'
[2018-01-05 12:55:11+0000] ++ '[' -d /var/vcap/store/postgres/postgres-9.6.2 -a -f /var/vcap/store/postgres/postgres-9.6.2/postgresql.conf ']'
[2018-01-05 12:55:11+0000] ++ RUN_DIR=/var/vcap/sys/run/postgres
[2018-01-05 12:55:11+0000] ++ LOG_DIR=/var/vcap/sys/log/postgres
[2018-01-05 12:55:11+0000] ++ PIDFILE=/var/vcap/sys/run/postgres/postgres.pid
[2018-01-05 12:55:11+0000] ++ CONTROL_JOB_PIDFILE=/var/vcap/sys/run/postgres/postgresctl.pid
[2018-01-05 12:55:11+0000] ++ HOST=0.0.0.0
[2018-01-05 12:55:11+0000] ++ PORT=5524
[2018-01-05 12:55:11+0000] ++ [[ -n '' ]]
[2018-01-05 12:55:11+0000] ++ LD_LIBRARY_PATH=/var/vcap/packages/postgres-9.6.4/lib
[2018-01-05 12:55:11+0000] + set +u
[2018-01-05 12:55:11+0000] + source /var/vcap/packages/postgres-common/utils.sh
[2018-01-05 12:55:11+0000] + set -u
[2018-01-05 12:55:11+0000] + case "${action}" in
[2018-01-05 12:55:11+0000] + pid_guard /var/vcap/sys/run/postgres/postgresctl.pid 'Postgres control job' false
[2018-01-05 12:55:11+0000] + local pidfile=/var/vcap/sys/run/postgres/postgresctl.pid
[2018-01-05 12:55:11+0000] + local 'name=Postgres control job'
[2018-01-05 12:55:11+0000] + local logall=false
[2018-01-05 12:55:11+0000] + '[' false == true ']'
[2018-01-05 12:55:11+0000] + '[' -f /var/vcap/sys/run/postgres/postgresctl.pid ']'
[2018-01-05 12:55:11+0000] + echo 1473
[2018-01-05 12:55:11+0000] /var/vcap/jobs/postgres/bin/postgres_ctl: line 18: /var/vcap/sys/run/postgres/postgresctl.pid: No such file or directory
the important line from log above is
[2018-01-05 12:53:31+0000] /var/vcap/jobs/postgres/bin/postgres_ctl: line 18: /var/vcap/sys/run/postgres/postgresctl.pid: No such file or directory
Expected behaviour
postgres is able to survive a VM reboot.
Workaround
bosh ssh into the postgres VM and manually execute
mkdir -p /var/vcap/sys/run/postgres/
chown -R vcap:vcap /var/vcap/sys/run/
monit start postgres
Possible fix
there are similar bugs already fixed in other bosh releases, see e.g. cloudfoundry/capi-release#25
I think the problem is that the control scripts rely on prestart to create $RUN_DIR
:
postgres-release/jobs/postgres/templates/pre-start.sh.erb
Lines 49 to 50 in 6e31baf
mkdir -p "${RUN_DIR}" | |
chown -R vcap:vcap "${RUN_DIR}" |
and in case of external VM reboot (i.e. outside of any bosh operation), prestart is not executed.
/var/vcap/data/sys/run/
is a temp file system
$ mount | grep sys/run
tmpfs on /var/vcap/data/sys/run type tmpfs (rw,size=1m)
and thus is empty after VM reboot (which is fine).
although there is an mkdir -p $RUN_DIR
in
mkdir -p "${RUN_DIR}" |
, this is too late in case of VM reboot because of this line executed earlier:
echo $$ > ${CONTROL_JOB_PIDFILE} |
which assumes the directory $RUN_DIR
already exists.
Initialization of $RUN_DIR
should probably happen as the very first thing when start) is invoked, see the fix for a similar problem in cloud_controller_ng
In our scenario (we run CCDB on postgres-release) this bug has caused an extended Cloudfoundry downtime triggered by forced VM reboots which were required to deploy urgent security fixes on the IaaS level.