New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd: Add explicit Before=ceph.target #15835

Merged
merged 1 commit into from Jul 3, 2017

Conversation

Projects
None yet
7 participants
@tserong
Contributor

tserong commented Jun 22, 2017

The PartOf= and WantedBy= directives in the various systemd
unit files and targets create the following hierarchy:

Additionally, the ceph-{fuse,mds,mon,osd,radosgw,rbd-mirror}
targets have WantedBy=multi-user.target. This gives the
following behaviour:

  • systemctl {start,stop,restart} of any target will restart
    all dependent services (e.g.: systemctl restart ceph.target
    will restart all services; systemctl restart ceph-mon.target
    will restart all the mons, and so forth).
  • systemctl {enable,disable} for the second level targets
    (ceph-mon.target etc.) will cause depenent services to come
    up on boot, or not (of course the individual services can
    be enabled or disabled as well - for a service to start
    on boot, both the service and its target must be enabled;
    disabling either will cause the service to be disabled).
  • systemctl {enable,disable} ceph.target has no effect on
    whether or not services come up at boot; if the second level
    targets and services are enabled, they'll start regardless of
    whether ceph.target is enabled. This is due to the second
    level targets all having WantedBy=multi-user.target.
  • The OSDs will always start regardless of ceph-osd.target
    (unless they are explicitly masked), thanks to udev magic.

So far, so good. Except, several users have encountered
services not starting with the following error:

Failed to start ceph-osd@5.service: Transaction order is
cyclic. See system logs for details.

I've not reproduced this myself, but have inspected systems
with this problem. It seems that somehow systemd gets
confused, and thinks:

  1. ceph-osd@5.service needs to start after ceph-mon.target
  2. ceph-mon.target needs to start after ceph.target
  3. ceph.target needs to start after ceph-osd.target
  4. ceph-osd.target needs to start after ceph-osd@5.service

i.e. ceph.target has wound up stuck in the middle of a
dependency cycle, rather than being after everything where
it should be. My best guess is that somehow the automatic
enablement and starting of OSDs on boot (thanks to udev
rules) is causing systemd to become confused about what's
meant to happen when.

Like I said, I haven't reproduced this myself, but I do have
a workaround. We can change PartOf= and WantedBy= in the
various service and target files to flatten the hierarchy,
removing all dependencies between the current second level
targets and ceph.target. This gives the following structure:

The behaviour remains the same as described above for the
existing hierarchy (start/stop/restart/enable/disable still
all work the same way), but by splitting ceph.target out, there's
no way for it to end up being stuck at point 3 in the cycle.

There is one caveat: existing systems using the original hierarchy
will have systemd's symlinks set up something like this:

[...]
/etc/systemd/system/ceph.target.wants/ceph-mon.target
/etc/systemd/system/ceph.target.wants/ceph-osd.target
/etc/systemd/system/ceph-mon.target.wants/ceph-mon@ses4-3.service
/etc/systemd/system/ceph-osd.target.wants/ceph-osd@2.service
[...]

But, with this change, the ceph.target.wants symlinks need to point
to services, not the old second level targets, i.e. it needs to
be as follows:

[...]
/etc/systemd/system/ceph.target.wants/ceph-osd@2.service
/etc/systemd/system/ceph.target.wants/ceph-mon@ses4-3.service
/etc/systemd/system/ceph-osd.target.wants/ceph-osd@2.service
/etc/systemd/system/ceph-mon.target.wants/ceph-mon@ses4-3.service
[...]

To fix these symlinks, you need to run systemctl disable then
systemctl enable for each of the target files, and each of the
individual services. Until you do this, systemctl start, stop and
restart on ceph.target will have no effect (start, stop and restart
on the other targets will work just fine though).

Fixes: https://bugzilla.suse.com/show_bug.cgi?id=1042973
Signed-off-by: Tim Serong tserong@suse.com

@smithfarm smithfarm added the build/ops label Jun 22, 2017

@tserong tserong changed the title from systemd: flatten target hierarchy to [DNM] systemd: flatten target hierarchy Jun 23, 2017

@jcsp

This comment has been minimized.

Contributor

jcsp commented Jun 23, 2017

We're only using tracker.ceph.com URLs for Fixes: lines -- at Red Hat we usually end up with tracker tickets that mirror-ish buzilla tickets, and reference the tracker one from the commit.

(I came via twitter for the comically long commit message... I don't think anyone would object to some of that text going into doc/ somewhere too for the benefit of sysadmins who are into their systemd stuff)

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 23, 2017

Can we add the systemd disable/enable calls to the upgrade section of the ceph.spec and debian postinst (or whatever it is) files?

@tserong

This comment has been minimized.

Contributor

tserong commented Jun 23, 2017

I've got an alternate fix, which adds After=ceph-fuse.target ceph-mds.target ceph-mgr.target ceph-mon.target ceph-osd.target ceph-radosgw.target ceph-rbd-mirror.target to ceph.target to force a sane ordering without flattening the hierarchy (see SUSE#129 (comment)). If we do that, we don't need the service re-enablement piece.

If we did go with the flattened hierarchy, then yes, we could do the re-enablement on upgrade on %post/postinst.

I'm happy to update doc/ too :-)

@tserong

This comment has been minimized.

Contributor

tserong commented Jun 27, 2017

Some more information (partially cribbed from SUSE#129).

To reproduce the problem, you need a node hosting both MON and OSD(s). AFAICT the problem occurs because systemd is confused by the combination of a two-level "hierarchy" of targets, plus a "cross-dependency" between ceph-osd@.service and ceph-mon.target (if ceph-osd@.service didn't specify After=ceph-mon.target, I suspect the problem would go away, but we don't want to remove that, so it's a moot point).

# systemctl disable ceph.target
# systemctl disable ceph-mon.target
# systemctl disable ceph-osd.target
# systemctl disable ceph-mon@ses4-2.service
# for n in 8 9 12 6 10 7 1 5 11 ; do systemctl disable ceph-osd@$n.service ; done
# for n in 8 9 12 6 10 7 1 5 11 ; do systemctl disable --runtime ceph-osd@$n.service ; done

Everything is now disabled. A clean slate. So let's re-enable everything:

# systemctl enable ceph.target
# systemctl enable ceph-mon.target
# systemctl enable ceph-osd.target
# systemctl enable ceph-mon@ses4-2.service
# for n in 8 9 12 6 10 7 1 5 11 ; do systemctl enable --runtime ceph-osd@$n.service ; done

Now, check ceph.target:

# systemctl show ceph.target|grep '^Before\|^After'
Before=ceph-mds.target multi-user.target ceph-mgr.target
After=ceph-mon.target ceph-osd.target

Restart ceph.target works fine (no output, and everything gets restarted):

# systemctl restart ceph.target

Now, disable ceph.target, and check the ordering again:

# systemctl disable ceph.target
# systemctl show ceph.target|grep '^Before\|^After'
Before=ceph-mon.target ceph-mgr.target ceph-mds.target
After=ceph-osd.target

Suddenly, ceph.target when disabled is Before ceph-mon.target. If we try to use it now, it'll break:

# systemctl restart ceph.target
Failed to restart ceph.target: Transaction order is cyclic. See system logs for details.

# journalctl |tail
Jun 23 14:11:49 ses4-2 systemd[1]: ceph-mon.target: Found ordering cycle on ceph-mon.target/restart
Jun 23 14:11:49 ses4-2 systemd[1]: ceph-mon.target: Found dependency on ceph.target/restart
Jun 23 14:11:49 ses4-2 systemd[1]: ceph-mon.target: Found dependency on ceph-osd.target/restart
Jun 23 14:11:49 ses4-2 systemd[1]: ceph-mon.target: Found dependency on ceph-osd@7.service/restart
Jun 23 14:11:49 ses4-2 systemd[1]: ceph-mon.target: Found dependency on ceph-mon.target/restart
Jun 23 14:11:49 ses4-2 systemd[1]: Unable to break cycle
Jun 23 14:11:49 ses4-2 systemd[1]: Requested transaction contains an unfixable cyclic ordering dependency: Transaction order is cyclic. See system logs for details.

I've done the above on SLES. I would greatly appreciate if someone with access to RHEL/Fedora/Ubuntu could try the same, and see if they have the same problem.

I have two proposed fixes:

  1. Flatten the target hierarchy. This requires us to add extra code to package postinstall to disable then re-enable all services on upgrade; if this is not done, then subsequent systemctl {start,stop,restart} ceph.target invocations will have no effect.
  2. Leave the hierarchy as-is, but add After=ceph-fuse.target ceph-mds.target ceph-mgr.target ceph-mon.target ceph-osd.target ceph-radosgw.target ceph-rbd-mirror.target to ceph.target. This requires no special action on upgrade; everything just works.

The behaviour of the various services for users invoking systemctl remains unchanged in either case.

I have mixed feelings about both fixes. Mostly it comes down to:

  1. I really dislike adding magic upgrade code (those things never get tested properly).
  2. I somewhat dislike adding an explicit After= line (surely systemd should be able to figure this stuff out by itself without getting tied in knots; it seems ...messy... to need to specify this).

So, what's the best (or least worst) solution here? (Ping @ddiss and @jan--f, who've commented on SUSE#129 already).

@smithfarm

This comment has been minimized.

Contributor

smithfarm commented Jun 27, 2017

In my mind, this is a workaround for a systemd bug (or odd systemd behavior) and the After= option is less intrusive.

Also, I wonder what role udev plays in triggering the bug.

@jan--f

This comment has been minimized.

Member

jan--f commented Jun 27, 2017

AFAICT the problem occurs because systemd is confused by the combination of a two-level "hierarchy" of targets, plus a "cross-dependency" between ceph-osd@.service and ceph-mon.target (if ceph-osd@.service didn't specify After=ceph-mon.target, I suspect the problem would go away, but we don't want to remove that, so it's a moot point).

That is indeed the case in my tests. I also found this detail in the systemd documentation:

Note that Wants= or Requires= must be defined in the target unit itself — if you for example define Wants=some.target in some.service, the implicit ordering will not be added.

Would that suggest that ceph-osd.target doesn not pick up the dependency to ceph-mon.target via the ceph-osd@.service files? Though how that creates a cycle is still a bit foggy to me.
Also adding After=ceph-mon.target to ceph-osd.target didn't work very reliable for me, though I might have messed that up. Will retry.

@tserong

This comment has been minimized.

Contributor

tserong commented Jun 28, 2017

Would that suggest that ceph-osd.target doesn not pick up the dependency to ceph-mon.target via the ceph-osd@.service files?

I think it just means that there won't be an ordering dependency created by magic.

Also adding After=ceph-mon.target to ceph-osd.target didn't work very reliable for me, though I might have messed that up. Will retry.

My guess is that still won't prevent ceph.target potentially ending up in a weird place.

@tserong

This comment has been minimized.

Contributor

tserong commented Jun 28, 2017

Also, I wonder what role udev plays in triggering the bug.

My original guess was that udev triggering systemctl disable and systemctl enable during boot was screwing things up in the cases where OSDs failed to start on boot due to "Transaction order is cyclic". I still think that's plausible, but never managed to reproduce that case myself.

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 28, 2017

Is this fixing the same bug as http://tracker.ceph.com/issues/14839 ?

@liewegas

This comment has been minimized.

Member

liewegas commented Jun 28, 2017

Also, would it makes sense to keep ceph-*.target as PartOf ceph.target so that they also appear to "stop" when ceph.target is stopped? (So there would be 2 paths by which a single daemon should stop.. both via ceph.target directly and indirectly via ceph.target->ceph-foo.target?)

@vasukulkarni

This comment has been minimized.

Member

vasukulkarni commented Jun 28, 2017

There is a systemd test that runs on centos/xenial and tests various stop/starts/reboot, If the build is in chacra I would like to run those 2 tests with this wip* branch.

@smithfarm

This comment has been minimized.

Contributor

smithfarm commented Jun 28, 2017

@vasukulkarni Pushed wip-flatten-systemd-target-hierarchy-master to Shaman.

@vasukulkarni

This comment has been minimized.

Member

vasukulkarni commented Jun 28, 2017

@smithfarm thanks, will post the results here when it completes.

@b-ranto b-ranto self-requested a review Jun 28, 2017

@b-ranto

This comment has been minimized.

Contributor

b-ranto commented Jun 28, 2017

Hmm, ok, this happens because we only define a dependency (WantedBy) and restart/stop behaviour (PartOf), not the order (with Before) in the second level targets. Adding Before=ceph.target to each of the subtargets should help here and should probably be the preferred fix as not all the targets are always installed. You can check e.g. /usr/lib/systemd/system/machines.target which is in a similar situation (it is a target wanted by another target) and needs to have the Before= line defined as well.

@tserong

This comment has been minimized.

Contributor

tserong commented Jun 29, 2017

I tried adding Before=ceph.target to all the second level targets (instead of flattening the hierarchy, and instead of adding After= to ceph.target), and this seems to fix it for me as well.

(Although I don't think having After= for units that aren't installed matters, as that only defines ordering, not a dependency on existence, but I digress).

So, now we have three options:

  1. Flatten the target hierarchy (no explicit ordering)
  2. Add After=[second level targets] to ceph.target
  3. Add Before=ceph.target to second level targets

2 and 3 both don't need any service re-enablement on upgrade.

Preferences, anyone?

@b-ranto

This comment has been minimized.

Contributor

b-ranto commented Jun 29, 2017

@tserong the option 3 should address your concern of 2 not being clean enough, I thought that 2 will print some warnings in the journal log if the targets are not installed but I am not so sure about it any more.

I have a rather bad feeling about the option 1, it looks like it should work but I just think it will have some unexpected consequences that we will hit at some point in the future.

systemd: Add explicit Before=ceph.target
The PartOf= and WantedBy= directives in the various systemd
unit files and targets create the following logical hierarchy:

- ceph.target
  - ceph-fuse.target
    - ceph-fuse@.service
  - ceph-mds.target
    - ceph-mds@.service
  - ceph-mgr.target
    - ceph-mgr@.service
  - ceph-mon.target
    - ceph-mon@.service
  - ceph-osd.target
    - ceph-osd@.service
  - ceph-radosgw.target
    - ceph-radosgw@.service
  - ceph-rbd-mirror.target
    - ceph-rbd-mirror@.service

Additionally, the ceph-{fuse,mds,mon,osd,radosgw,rbd-mirror}
targets have WantedBy=multi-user.target.  This gives the
following behaviour:

- `systemctl {start,stop,restart}` of any target will restart
  all dependent services (e.g.: `systemctl restart ceph.target`
  will restart all services; `systemctl restart ceph-mon.target`
  will restart all the mons, and so forth).
- `systemctl {enable,disable}` for the second level targets
  (ceph-mon.target etc.) will cause depenent services to come
  up on boot, or not (of course the individual services can
  be enabled or disabled as well - for a service to start
  on boot, both the service and its target must be enabled;
  disabling either will cause the service to be disabled).
- `systemctl {enable,disable} ceph.target` has no effect on
  whether or not services come up at boot; if the second level
  targets and services are enabled, they'll start regardless of
  whether ceph.target is enabled.  This is due to the second
  level targets all having WantedBy=multi-user.target.
- The OSDs will always start regardless of ceph-osd.target
  (unless they are explicitly masked), thanks to udev magic.

So far, so good.  Except, several users have encountered
services not starting with the following error:

  Failed to start ceph-osd@5.service: Transaction order is
  cyclic. See system logs for details.

I've not been able to reproduce this myself in such a way as to
cause OSDs to fail to start, but I *have* managed to get systemd
into that same confused state, as follows:

- Disable ceph.target, ceph-mon.target, ceph-osd.target,
  ceph-mon@$(hostname).service and all ceph-osd instances.
- Re-enable all of the above.

At this point, everything is fine, but if I then subseqently
disable ceph.target, *then* try `systemctl restart ceph.target`,
I get "Failed to restart ceph.target: Transaction order is cyclic.
See system logs for details."

Explicitly adding Before=ceph.target to each second level target
prevents systemd from becoming confused in this situation.

Signed-off-by: Tim Serong <tserong@suse.com>

@tserong tserong changed the title from [DNM] systemd: flatten target hierarchy to systemd: Add explicit Before=ceph.target Jun 30, 2017

@tserong

This comment has been minimized.

Contributor

tserong commented Jun 30, 2017

OK, I've redone this per option 3. @smithfarm, @jan--f, any objections?

@smithfarm

This comment has been minimized.

Contributor

smithfarm commented Jun 30, 2017

Jenkins re-test this please

@smithfarm

No objections. LGTM

@jan--f

This comment has been minimized.

Member

jan--f commented Jun 30, 2017

yep...lgtm.

@smithfarm

This comment has been minimized.

Contributor

smithfarm commented Jun 30, 2017

make check failed with a python backtrace while bootstrapping setuptools:

...
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/install-deps-python2.7_tmp/share/python-wheels/urllib3-1.13.1-py2.py3-none-any.whl/urllib3/util/retry.py", line 228, in increment
    total -= 1
TypeError: unsupported operand type(s) for -=: 'Retry' and 'int'
@tserong

This comment has been minimized.

Contributor

tserong commented Jul 3, 2017

I assume that make check failure isn't actually caused by this change?

@smithfarm

This comment has been minimized.

Contributor

smithfarm commented Jul 3, 2017

Jenkins re-test this please

@liewegas liewegas merged commit a96ebf3 into ceph:master Jul 3, 2017

5 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
default Build finished.
Details
make check make check succeeded
Details
make check (arm64) make check succeeded
Details

@tserong tserong deleted the SUSE:wip-flatten-systemd-target-hierarchy-master branch Jul 4, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment