systemd-tmpfiles-setup.service is not running after reboot #1581

Closed
jwaldrip opened this Issue Sep 22, 2016 · 19 comments

Comments

@jwaldrip

jwaldrip commented Sep 22, 2016

Issue Report

Bug

systemd-tmpfiles-setup.service does not run after reboot, which prevents fleet from starting. fleet fails with the following error:

Sep 22 16:47:46 bastion.c.commercialtribe-staging.internal systemd[1]: Started fleet daemon.
Sep 22 16:47:46 bastion.c.commercialtribe-staging.internal fleetd[1627]: INFO fleetd.go:64: Starting fleetd version 0.11.7
Sep 22 16:47:46 bastion.c.commercialtribe-staging.internal fleetd[1627]: INFO fleetd.go:168: No provided or default config file found - proceeding without
Sep 22 16:47:46 bastion.c.commercialtribe-staging.internal fleetd[1627]: FATAL fleetd.go:90: Failed creating Server: mkdir /run/fleet: permission denied
Sep 22 16:47:46 bastion.c.commercialtribe-staging.internal systemd[1]: fleet.service: Main process exited, code=exited, status=1/FAILURE
Sep 22 16:47:46 bastion.c.commercialtribe-staging.internal systemd[1]: fleet.service: Unit entered failed state.
Sep 22 16:47:46 bastion.c.commercialtribe-staging.internal systemd[1]: fleet.service: Failed with result 'exit-code'.

CoreOS Version

$ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1122.2.0
VERSION_ID=1122.2.0
BUILD_ID=2016-09-06-1449
PRETTY_NAME="CoreOS 1122.2.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

Google Cloud Platform

Expected Behavior

  1. Reboot machine from cloud console.
  2. systemd-tmpfiles-setup.service runs.
  3. sudo fleetctl list-machines works as expected.

Actual Behavior

  1. Reboot machine from cloud console.
  2. systemd-tmpfiles-setup.service does not run, as indicated by sudo systemctl status systemd-tmpfiles-setup.service:
● systemd-tmpfiles-setup.service - Create Volatile Files and Directories
   Loaded: loaded (/usr/lib64/systemd/system/systemd-tmpfiles-setup.service; static; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:tmpfiles.d(5)
           man:systemd-tmpfiles(8)
  3. sudo fleetctl list-machines does not work because fleet.service failed to start with a permission error.

Reproduction Steps

  1. Launch a new instance on Google Cloud with fleet and an etcd cluster of 3 or more machines.
  2. Wait for the instance to come up and run sudo fleetctl list-units (works).
  3. From the Google console, stop & start OR reset the host.
  4. Wait for the instance to come up and run sudo fleetctl list-units (no longer works).
  5. Confirm that fleet was not started with sudo journalctl -fu fleet or sudo systemctl status fleet.

Other Information

Sometimes things come back up on reboot, but only about 1 in 5 times. Manually running sudo systemctl start systemd-tmpfiles-setup.service resolves the issue on the host, but this is not sustainable.
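The manual workaround described in this report could be wrapped in a small check. This is only a sketch, assuming it is run on the affected host after boot. The function name `ensure_tmpfiles_setup` is hypothetical. It uses `systemctl is-active --quiet` plus the `systemctl start` command quoted in the report. It papers over the symptom and does not address whatever kept the unit from running at boot.

```shell
# Hypothetical helper: start systemd-tmpfiles-setup.service if it is not active.
UNIT=systemd-tmpfiles-setup.service

ensure_tmpfiles_setup() {
    # "is-active --quiet" exits 0 only when the unit is active. Because the
    # unit is a oneshot with RemainAfterExit=yes, it stays active once it ran.
    if systemctl is-active --quiet "$UNIT"; then
        echo "$UNIT already active"
    else
        echo "$UNIT inactive, starting it"
        sudo systemctl start "$UNIT"
    fi
}
```

Calling `ensure_tmpfiles_setup` after a reboot either reports that the unit already ran or starts it. fleet.service may still need to be started afterwards if it has already failed.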

Note: I tend to SSH into the machine ASAP. Maybe my session is causing an issue if boot is not complete?

@crawford

Member

crawford commented Sep 26, 2016

That's odd. That service should always run on boot. Can you dump out the service contents? Here is what I see on a fresh instance (booted on AWS):

$ systemctl cat systemd-tmpfiles-setup
# /usr/lib64/systemd/system/systemd-tmpfiles-setup.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Create Volatile Files and Directories
Documentation=man:tmpfiles.d(5) man:systemd-tmpfiles(8)
DefaultDependencies=no
Conflicts=shutdown.target
After=local-fs.target systemd-sysusers.service
Before=sysinit.target shutdown.target
RefuseManualStop=yes

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/systemd-tmpfiles --create --remove --boot --exclude-prefix=/dev

@f0

f0 commented Nov 21, 2016

Hi @crawford,
I see the same on 1122.3.0.

Here is the error from my system (but it does not happen each time):

Nov 18 14:37:47 xxx.example.net systemd[1]: systemd-tmpfiles-setup.service: Job systemd-tmpfiles-setup.service/start deleted to break ordering cycle starting with multipathd.socket/start

regards f0

@crawford

Member

crawford commented Nov 21, 2016

Here is the error from my system(but does not happen each time)

That is really weird. Are you using coreos-cloudinit? If so, can you share your config?

@f0

f0 commented Nov 22, 2016

@crawford yes, we use coreos-cloudinit. I need to clean up the config a bit, and then I can share it.

@f0

f0 commented Nov 22, 2016

@crawford here is the file (it is a cleaned-up version, so the syntax may be broken):
cloud-config.yaml.txt

@crawford

Member

crawford commented Nov 22, 2016

This happens sporadically because you are using coreos-cloudinit to start multipathd.service. Can you attach more of the logs around that ordering cycle? systemd should log all of the services involved in the cycle. I'm curious whether this issue is entirely due to coreos-cloudinit or whether it's a problem with one of the service definitions.
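One way to collect those lines is to filter the journal for systemd's cycle messages. This is a rough sketch: the function name `filter_ordering_cycle` is hypothetical, the "ordering cycle" wording comes from the log quoted in this thread, and the "Found dependency on" pattern is an assumption about how this systemd version reports the other units in the cycle.

```shell
# Hypothetical helper: keep only the lines systemd logs while detecting and
# breaking an ordering cycle. Reads log text on stdin, writes matches to stdout.
filter_ordering_cycle() {
    # "ordering cycle" matches the message seen in this thread.
    # "Found dependency on" is an assumed wording for the per-unit lines.
    grep -E 'ordering cycle|Found dependency on'
}
```

On the affected host this could be fed from the current boot's journal, for example `journalctl -b --no-pager | filter_ordering_cycle`.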

@f0

f0 commented Nov 22, 2016

@crawford OK, I will check what I can provide...
If I remove the start command and leave only the enable option, that should fix it, right?

@crawford

Member

crawford commented Nov 22, 2016

I don't recommend using runtime: false or enable: true with coreos-cloudinit. It ends up severely complicating the boot process and makes it such that the first boot is different than subsequent boots. In this case, multipathd wouldn't be started on the first boot but it would start on subsequent boots. Ignition solves this by leveraging systemd from the start (vs trying to be its own init system, like coreos-cloudinit). You'll get reproducible boots, so they might always succeed or they might always fail. We'll know whether it'll work or not from the dependency loop. I suspect it's caused by coreos-cloudinit, which means Ignition will work, but I'm not certain.

@f0

f0 commented Nov 22, 2016

@crawford OK, here is the SVG from systemd-analyze:
systemd-tree

@crawford

Member

crawford commented Nov 22, 2016

Do you have the logs? You provided a snippet of the full output earlier. That analyze graph doesn't show dependencies.
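As a complement to the logs, the ordering edges of the units involved can be listed directly. The following is a sketch, not a definitive diagnostic: the function name `show_ordering` is hypothetical, and it relies on `systemctl show` with the `After` and `Before` properties.

```shell
# Hypothetical helper: print the After=/Before= ordering of each given unit.
show_ordering() {
    for unit in "$@"; do
        echo "== $unit"
        # -p may be repeated to select several properties.
        systemctl show -p After -p Before "$unit"
    done
}
```

For the units named in this thread, a call might look like `show_ordering multipathd.socket multipathd.service systemd-tmpfiles-setup.service sysinit.target`.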

@f0

f0 commented Nov 22, 2016

@crawford I have only this in the logs for systemd-tmpfiles-setup.service:

-- Logs begin at Sun 2016-11-13 23:21:56 CET, end at Tue 2016-11-22 20:39:27 CET. --
Nov 18 14:37:47 xxx.example.net systemd[1]: systemd-tmpfiles-setup.service: Job systemd-tmpfiles-setup.service/start deleted to break ordering cycle starting with multipathd.s
Nov 21 13:16:58 xxx.example.net systemd[1]: Starting Create Volatile Files and Directories...
Nov 21 13:16:58 xxx.example.net systemd-tmpfiles[34968]: [/usr/lib64/tmpfiles.d/trousers.conf:1] Duplicate line for path "/var/lib/tpm", ignoring.
Nov 21 13:16:58 xxx.example.net systemd-tmpfiles[34968]: [/usr/lib64/tmpfiles.d/var.conf:20] Duplicate line for path "/var/lib", ignoring.
Nov 21 13:16:59 xxx.example.net systemd[1]: Started Create Volatile Files and Directories.

@f0

f0 commented Nov 23, 2016

@crawford any hints on where I can find more logs? In the meantime I have modified the cloud-init config to only enable the systemd services, and I plan to migrate to Ignition. I only need a solution for the runtime changes.

@f0

f0 commented Nov 24, 2016

@crawford I was hit by this error again, this time with sysinit.target:

# systemctl status systemd-tmpfiles-setup.service
● systemd-tmpfiles-setup.service - Create Volatile Files and Directories
   Loaded: loaded (/usr/lib64/systemd/system/systemd-tmpfiles-setup.service; static; vendor preset: disabled)
   Active: inactive (dead)
     Docs: man:tmpfiles.d(5)
           man:systemd-tmpfiles(8)

Nov 24 06:35:18 example.net systemd[1]: systemd-tmpfiles-setup.service: Job systemd-tmpfiles-setup.service/start deleted to break ordering cycle starting with sysinit.target/start
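Since the cycle is reported with a different starting unit on different boots, it can help to pull the deleted job and the cycle's starting job out of each message. This is a minimal sketch. The function name `parse_cycle_break` and the `deleted=`/`cycle_start=` output labels are hypothetical, and the pattern is based only on the message wording quoted in this thread.

```shell
# Hypothetical helper: extract the deleted job and the cycle's starting job
# from "deleted to break ordering cycle" messages read on stdin.
parse_cycle_break() {
    sed -n 's/.*Job \([^ ]*\) deleted to break ordering cycle starting with \([^ ]*\).*/deleted=\1 cycle_start=\2/p'
}
```

Fed the line above, this prints `deleted=systemd-tmpfiles-setup.service/start cycle_start=sysinit.target/start`.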

@f0

f0 commented Nov 24, 2016

@crawford maybe this is a multipath dependency problem. I found this:
https://www.redhat.com/archives/dm-devel/2015-March/msg00101.html

@f0

f0 commented Nov 25, 2016

@crawford when I disable multipathd, the problem goes away, but then I have no datastore.

@f0

f0 commented Nov 25, 2016

@crawford with the patch from https://www.redhat.com/archives/dm-devel/2015-March/msg00101.html it's fixed: no dependency cycle and a working multipath :-)

@f0

f0 commented Nov 28, 2016

@crawford ping

@crawford

Member

crawford commented Dec 8, 2016

@f0 sorry for the delay. Since that patch works, it means that cloud-init is not at fault and using Ignition would always trigger the error. We'll have to go ahead and pull that fix into our units. Thank you so much for tracking this down.

@dm0-

Member

dm0- commented Dec 13, 2016

We'll upgrade the multipath-tools package to upstream's 0.6.2, which includes the unit dependency fixes.
