New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd units revision required: After= vs Wants= and Before= vs WantedBy= #2198

Closed
adrelanos opened this Issue Jul 25, 2016 · 16 comments

Comments

Projects
None yet
3 participants
@adrelanos
Member

adrelanos commented Jul 25, 2016

Applies to all Qubes versions and templates. R3.2 as release target is probably good enough. No backport required [and also not advised since this is a huge, non-trivial bug fix].

I think all Qubes and Whonix systemd units should be reviewed and lots of them revised. There might be a misunderstanding about After=, Before=, Wants= and WantedBy=. I think every time we are using After=some.service we also should add Wants=some.service.

As I understand now, After= and Before= is just for ordering. It does not declare strict dependencies. From the systemd manual:

It is a common pattern to include a unit name in both the After= and Requires= option.

So After= and Before= apparently focus more on speed than dependencies.

For at least qubes-network.service, qubes-firewall.service and qubes-iptables.service we should replace WantedBy=multi-user.target by WantedBy=network-pre.target.

Old target wants symlinks such as /etc/systemd//system/multi-user.target.wants/qubes-network.service are not removed during Debian package upgrades. This is a Debian bug:
init-system-helpers: deb-systemd-helper doesn't remove old links when changing WantedBy= at package upgrade.

Having multiple instances of WantedBy= should be fine in theory, but that did not work for me. Also to make upgrades templates more similar to fresh templates, cleaning up the deprecated symlinks is advisable.

The motivation for doing this should be preventing lots of race condition "strange" bug experiences.

This issue has been found by me because I was wondering why in sys-whonix the Tor systemd service is being started for 4 times. The network-proxy-setup script / qubes-whonix-network.service appeared after trying and failing to start to start Tor for the first in syslog. And that even though qubes-whonix-network.service includes Before=network-pre.target. After using WantedBy=network-pre.target and cleaning up the old multi-user wants symlink, that was fixed.

I am not entirely sure about this, but to get more certain I asked on the systemd mailing list so they will probably soon confirm or deny my current understand.
[systemd-devel] understanding systemd ordering vs dependencies
https://lists.freedesktop.org/archives/systemd-devel/2016-July/037221.html

//cc @rustybird

Summary:

    1. lets combine After= with Wants=
    1. lets combine Before= with WantedBy=
    1. make sure that old target wants symlinks such as /etc/systemd//system/multi-user.target.wants/qubes-network.service as properly removed during upgrade
@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Jul 25, 2016

Member

It is bit late in 3.2 release cycle for big changes, so lets focus only on fixing bugs.

For at least qubes-network.service, qubes-firewall.service and qubes-iptables.service we should replace WantedBy=multi-user.target by WantedBy=network-pre.target.

👍

  1. lets combine After= with Wants=

👎 This would always pull in that unit. Even if manually disabled (for any reason).

Member

marmarek commented Jul 25, 2016

It is bit late in 3.2 release cycle for big changes, so lets focus only on fixing bugs.

For at least qubes-network.service, qubes-firewall.service and qubes-iptables.service we should replace WantedBy=multi-user.target by WantedBy=network-pre.target.

👍

  1. lets combine After= with Wants=

👎 This would always pull in that unit. Even if manually disabled (for any reason).

@adrelanos

This comment has been minimized.

Show comment
Hide comment
@adrelanos

adrelanos Jul 25, 2016

Member

Marek Marczykowski-Górecki:

  1. lets combine After= with Wants=
    👎 This would always pull in that unit. Even if manually disabled (for any reason).

I don't think so. Wants= is a weaker form of Requires=. Just now tested.

service_a.service:
Wants=service_b.service

sudo systemctl mask service_b

service_a was still starting.

I guess your concern applies to Requires=, but we do not have to use that.

Member

adrelanos commented Jul 25, 2016

Marek Marczykowski-Górecki:

  1. lets combine After= with Wants=
    👎 This would always pull in that unit. Even if manually disabled (for any reason).

I don't think so. Wants= is a weaker form of Requires=. Just now tested.

service_a.service:
Wants=service_b.service

sudo systemctl mask service_b

service_a was still starting.

I guess your concern applies to Requires=, but we do not have to use that.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Jul 26, 2016

Member

sudo systemctl mask service_b

This isn't canonical way for disabling units. And when you use Wants=, systemctl disable is no longer effective.

Member

marmarek commented Jul 26, 2016

sudo systemctl mask service_b

This isn't canonical way for disabling units. And when you use Wants=, systemctl disable is no longer effective.

@adrelanos

This comment has been minimized.

Show comment
Hide comment
@adrelanos

adrelanos Jul 26, 2016

Member

Marek Marczykowski-Górecki:

sudo systemctl mask service_b

This isn't canonical way for disabling units.

And when you use Wants=, systemctl disable is no longer effective.

Right.

I think we should still go for After=. Let's take for example:
systemd-random-seed.service.d/30_qubes.conf

It uses:
After=qubes-db.service

[1] By not using Wants=qubes-db.service someone could experience a race
condition, that Qubes random seed runs actually before qubes-db, thereby
failing and leaving the user with worse entropy. A pretty severe issue.

[2] On the other hand by using Wants=qubes-db.service, the user looses
the ability to use 'systemctl disable qubes-db'. This is also a problem,
but a much smaller, more more geeky problem. So geeky, we might never
have someone on the Qubes mailing list complain "I cannot systemd
disable qubes-db.service". I'd say such cases should be unsupported
since it is a super minority and invalid use case. And if not
unsupported, we could still say "use systemctl mask qubes-db". Or the
user could use a systemd drop-in file to override/disable the Wants=
directive.

[1] Seems very much more important to me than [2]. So [2] is a
compromise I recommend to make.

Member

adrelanos commented Jul 26, 2016

Marek Marczykowski-Górecki:

sudo systemctl mask service_b

This isn't canonical way for disabling units.

And when you use Wants=, systemctl disable is no longer effective.

Right.

I think we should still go for After=. Let's take for example:
systemd-random-seed.service.d/30_qubes.conf

It uses:
After=qubes-db.service

[1] By not using Wants=qubes-db.service someone could experience a race
condition, that Qubes random seed runs actually before qubes-db, thereby
failing and leaving the user with worse entropy. A pretty severe issue.

[2] On the other hand by using Wants=qubes-db.service, the user looses
the ability to use 'systemctl disable qubes-db'. This is also a problem,
but a much smaller, more more geeky problem. So geeky, we might never
have someone on the Qubes mailing list complain "I cannot systemd
disable qubes-db.service". I'd say such cases should be unsupported
since it is a super minority and invalid use case. And if not
unsupported, we could still say "use systemctl mask qubes-db". Or the
user could use a systemd drop-in file to override/disable the Wants=
directive.

[1] Seems very much more important to me than [2]. So [2] is a
compromise I recommend to make.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Jul 26, 2016

Member

[1] By not using Wants=qubes-db.service someone could experience a race condition, that Qubes random seed runs actually before qubes-db, thereby failing and leaving the user with worse entropy. A pretty severe issue.

No. After=qubes-db means the service will be started after starting qubes-db (if qubes-db is enabled/installed at all). Wants= has nothing to do with start order. Please read systemd.unit man page.

Member

marmarek commented Jul 26, 2016

[1] By not using Wants=qubes-db.service someone could experience a race condition, that Qubes random seed runs actually before qubes-db, thereby failing and leaving the user with worse entropy. A pretty severe issue.

No. After=qubes-db means the service will be started after starting qubes-db (if qubes-db is enabled/installed at all). Wants= has nothing to do with start order. Please read systemd.unit man page.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Jul 26, 2016

Member

The problem you're referring to may be related to #2194 and some attempts discussed there - which resulted in dependency cycles, so some services were not started at all. Wants= would not fix that in any way.

Member

marmarek commented Jul 26, 2016

The problem you're referring to may be related to #2194 and some attempts discussed there - which resulted in dependency cycles, so some services were not started at all. Wants= would not fix that in any way.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Jul 26, 2016

Member

Wants=qubes-db.service may be good idea, but for totally different reason - to mitigate problems of disabled services as a result of some bug (like failed post-install script). But as said before - it is too late to have it in 3.2.

Member

marmarek commented Jul 26, 2016

Wants=qubes-db.service may be good idea, but for totally different reason - to mitigate problems of disabled services as a result of some bug (like failed post-install script). But as said before - it is too late to have it in 3.2.

@andrewdavidwong

This comment has been minimized.

Show comment
Hide comment
@andrewdavidwong

andrewdavidwong Jul 26, 2016

Member

Assigning labels and milestone based on this:

It is bit late in 3.2 release cycle for big changes, so lets focus only on fixing bugs.

Recommend a separate issue for any post-3.2 changes.

Member

andrewdavidwong commented Jul 26, 2016

Assigning labels and milestone based on this:

It is bit late in 3.2 release cycle for big changes, so lets focus only on fixing bugs.

Recommend a separate issue for any post-3.2 changes.

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Jul 27, 2016

Member

Ordering cycles in whonix gateway:

Jul 27 01:04:00 host systemd[1]: Found ordering cycle on basic.target/start
Jul 27 01:04:00 host systemd[1]: Found dependency on sysinit.target/start
Jul 27 01:04:00 host systemd[1]: Found dependency on rpcbind.service/start
Jul 27 01:04:00 host systemd[1]: Found dependency on network-online.target/start
Jul 27 01:04:00 host systemd[1]: Found dependency on NetworkManager-wait-online.service/start
Jul 27 01:04:00 host systemd[1]: Found dependency on basic.target/start
Jul 27 01:04:00 host systemd[1]: Breaking ordering cycle by deleting job rpcbind.service/start
Jul 27 01:04:00 host systemd[1]: Job rpcbind.service/start deleted to break ordering cycle starting with basic.target/start

(this one is also on plain Debian-8 VM)
I think I know how to solve this one.

Jul 27 01:04:00 host systemd[1]: Found ordering cycle on basic.target/start
Jul 27 01:04:00 host systemd[1]: Found dependency on timers.target/start
Jul 27 01:04:00 host systemd[1]: Found dependency on qubes-update-check.timer/start
Jul 27 01:04:00 host systemd[1]: Found dependency on qubes-whonix-network.service/start
Jul 27 01:04:00 host systemd[1]: Found dependency on basic.target/start
Jul 27 01:04:00 host systemd[1]: Breaking ordering cycle by deleting job timers.target/start
Jul 27 01:04:00 host systemd[1]: Job timers.target/start deleted to break ordering cycle starting with basic.target/start

@adrelanos This one looks Whonix-specific. qubes-whonix-network.service is ordered before qubes-update-check.timer (in addition to qubes-update-check.service). Why? timer unit is only about scheduling updates check, but I believe the intention was to have start qubes-whonix-network.service before actual check. I don't see anything in qubes-whonix-network.service interfering with scheduling itself (like disabling it or so).
So, I think that Before=qubes-update-check.timer should be removed.
But if you have valid reason to start it before scheduling timer, you need to modify qubes-whonix-network.service to start before timers.target, so also before basic.target - which means DefaultDependencies=no. Pretty drastic step. Or alternatively I need to modify qubes-update-chec.timer to not be ordered before timers.target, which also means DefaultDependencies=no there...

Anyway I do not consider this a release blocker, since systemd is dealing with it pretty well - by simply delaying that timer startup.

Member

marmarek commented Jul 27, 2016

Ordering cycles in whonix gateway:

Jul 27 01:04:00 host systemd[1]: Found ordering cycle on basic.target/start
Jul 27 01:04:00 host systemd[1]: Found dependency on sysinit.target/start
Jul 27 01:04:00 host systemd[1]: Found dependency on rpcbind.service/start
Jul 27 01:04:00 host systemd[1]: Found dependency on network-online.target/start
Jul 27 01:04:00 host systemd[1]: Found dependency on NetworkManager-wait-online.service/start
Jul 27 01:04:00 host systemd[1]: Found dependency on basic.target/start
Jul 27 01:04:00 host systemd[1]: Breaking ordering cycle by deleting job rpcbind.service/start
Jul 27 01:04:00 host systemd[1]: Job rpcbind.service/start deleted to break ordering cycle starting with basic.target/start

(this one is also on plain Debian-8 VM)
I think I know how to solve this one.

Jul 27 01:04:00 host systemd[1]: Found ordering cycle on basic.target/start
Jul 27 01:04:00 host systemd[1]: Found dependency on timers.target/start
Jul 27 01:04:00 host systemd[1]: Found dependency on qubes-update-check.timer/start
Jul 27 01:04:00 host systemd[1]: Found dependency on qubes-whonix-network.service/start
Jul 27 01:04:00 host systemd[1]: Found dependency on basic.target/start
Jul 27 01:04:00 host systemd[1]: Breaking ordering cycle by deleting job timers.target/start
Jul 27 01:04:00 host systemd[1]: Job timers.target/start deleted to break ordering cycle starting with basic.target/start

@adrelanos This one looks Whonix-specific. qubes-whonix-network.service is ordered before qubes-update-check.timer (in addition to qubes-update-check.service). Why? timer unit is only about scheduling updates check, but I believe the intention was to have start qubes-whonix-network.service before actual check. I don't see anything in qubes-whonix-network.service interfering with scheduling itself (like disabling it or so).
So, I think that Before=qubes-update-check.timer should be removed.
But if you have valid reason to start it before scheduling timer, you need to modify qubes-whonix-network.service to start before timers.target, so also before basic.target - which means DefaultDependencies=no. Pretty drastic step. Or alternatively I need to modify qubes-update-chec.timer to not be ordered before timers.target, which also means DefaultDependencies=no there...

Anyway I do not consider this a release blocker, since systemd is dealing with it pretty well - by simply delaying that timer startup.

marmarek added a commit to marmarek/old-qubes-core-agent-linux that referenced this issue Jul 27, 2016

systemd: improve ordering of systemd units
- qubes-misc-post.service is no longer responsible for mounting /rw
- both qubes-sysinit.service and qubes-mount-dirs.service are part of
  basic.target, so no need to mention them explicitly (as long as
  DefaultDependencies=yes)

QubesOS/qubes-issues#2198

marmarek added a commit to marmarek/old-qubes-core-agent-linux that referenced this issue Jul 27, 2016

WIP systemd: make sure IP address is configured before network.target
setup-ip script can be called in two places:
1. from udev rule
2. from qubes-misc-post.service

The former one may fail if happen before qubes-db daemon is started. The
later one will fix it, but in the meantime some services (like whonix
update proxy sanity check) may expect to have already working network
(before are ordered after network.target or even network-online.target).

Delaying udev rules processing seems unreasonable, so introduce another
systemd service responsible for just configuring IP address. This does
not replace udev rule (but during startup will duplicate the same task),
because it is still needed to handle dynamic network switch. Also
the call in qubes-misc-post may be needed to handle DispVM startup
(network needs to be re-configured after restoring savefile).

Another idea would be to order qubes-misc-post.service after
network-online.target.

This commit is work in progres. It doesn't deal with the case where
initial setup-ip call from udev rule succeed. Second call will fail,
marking qubes-setup-ip service as failed. Also the third call (from
qubes-misc-post.service) will almost always fail - probably in every VM
but DispVM.

QubesOS/qubes-issues#2198
@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Jul 27, 2016

Member

Another problem, described in marmarek/qubes-core-agent-linux@7eb7561:

setup-ip script can be called in two places:

  1. from udev rule
  2. from qubes-misc-post.service

The former one may fail if happen before qubes-db daemon is started. The
later one will fix it, but in the meantime some services (like whonix
update proxy sanity check) may expect to have already working network
(before are ordered after network.target or even network-online.target).

That commit attempts to fix it, but it is incomplete yet - details in the commit message.

But even when fixed, qubes-whonix-torified-updates-proxy-check.service may still fail, because it is ordered after network.target, not network-online.target. Since it tries to access the network, I think it should be ordered after network-online.target. But such a change may be complex and have multiple consequences. /cc @adrelanos

Member

marmarek commented Jul 27, 2016

Another problem, described in marmarek/qubes-core-agent-linux@7eb7561:

setup-ip script can be called in two places:

  1. from udev rule
  2. from qubes-misc-post.service

The former one may fail if happen before qubes-db daemon is started. The
later one will fix it, but in the meantime some services (like whonix
update proxy sanity check) may expect to have already working network
(before are ordered after network.target or even network-online.target).

That commit attempts to fix it, but it is incomplete yet - details in the commit message.

But even when fixed, qubes-whonix-torified-updates-proxy-check.service may still fail, because it is ordered after network.target, not network-online.target. Since it tries to access the network, I think it should be ordered after network-online.target. But such a change may be complex and have multiple consequences. /cc @adrelanos

@marmarek

This comment has been minimized.

Show comment
Hide comment
@marmarek

marmarek Jul 27, 2016

Member

@unman could you help us here?

Member

marmarek commented Jul 27, 2016

@unman could you help us here?

@adrelanos

This comment has been minimized.

Show comment
Hide comment
@adrelanos

adrelanos Jul 27, 2016

Member

Marek Marczykowski-Górecki:

👎 This would always pull in that unit. Even if manually disabled (for any reason).

I finally wrapped my head around After= vs Wants= and I understand and
agree with that now.

@adrelanos This one looks Whonix-specific. qubes-whonix-network.service is ordered before qubes-update-check.timer (in addition to qubes-update-check.service). Why? timer unit is only about scheduling updates check, but I believe the intention was to have start qubes-whonix-network.service before actual check. I don't see anything in qubes-whonix-network.service interfering with scheduling itself (like disabling it or so).
So, I think that Before=qubes-update-check.timer should be removed.
But if you have valid reason to start it before scheduling timer, you need to modify qubes-whonix-network.service to start before timers.target, so also before basic.target - which means DefaultDependencies=no. Pretty drastic step. Or alternatively I need to modify qubes-update-chec.timer to not be ordered before timers.target, which also means DefaultDependencies=no there...

Anyway I do not consider this a release blocker, since systemd is dealing with it pretty well - by simply delaying that timer startup.

Yes. This is fixed in qubes-whonix git master and will be included in
the upcoming update of that package.

But even when fixed, qubes-whonix-torified-updates-proxy-check.service may still fail, because it is ordered after network.target, not network-online.target. Since it tries to access the network, I think it should be ordered after network-online.target. But such a change may be complex and have multiple consequences. /cc @adrelanos

I am testing qubes-whonix-torified-updates-proxy-check.service
After=network-online.target, seems much better to me.

Member

adrelanos commented Jul 27, 2016

Marek Marczykowski-Górecki:

👎 This would always pull in that unit. Even if manually disabled (for any reason).

I finally wrapped my head around After= vs Wants= and I understand and
agree with that now.

@adrelanos This one looks Whonix-specific. qubes-whonix-network.service is ordered before qubes-update-check.timer (in addition to qubes-update-check.service). Why? timer unit is only about scheduling updates check, but I believe the intention was to have start qubes-whonix-network.service before actual check. I don't see anything in qubes-whonix-network.service interfering with scheduling itself (like disabling it or so).
So, I think that Before=qubes-update-check.timer should be removed.
But if you have valid reason to start it before scheduling timer, you need to modify qubes-whonix-network.service to start before timers.target, so also before basic.target - which means DefaultDependencies=no. Pretty drastic step. Or alternatively I need to modify qubes-update-chec.timer to not be ordered before timers.target, which also means DefaultDependencies=no there...

Anyway I do not consider this a release blocker, since systemd is dealing with it pretty well - by simply delaying that timer startup.

Yes. This is fixed in qubes-whonix git master and will be included in
the upcoming update of that package.

But even when fixed, qubes-whonix-torified-updates-proxy-check.service may still fail, because it is ordered after network.target, not network-online.target. Since it tries to access the network, I think it should be ordered after network-online.target. But such a change may be complex and have multiple consequences. /cc @adrelanos

I am testing qubes-whonix-torified-updates-proxy-check.service
After=network-online.target, seems much better to me.

@adrelanos

This comment has been minimized.

Show comment
Hide comment
@adrelanos

adrelanos Jul 29, 2016

Member

@marmarek

Wants=qubes-db.service may be good idea, but for totally different reason - to mitigate problems of disabled services as a result of some bug (like failed post-install script). But as said before - it is too late to have it in 3.2.

@andrewdavidwong

Recommend a separate issue for any post-3.2 changes.

Created #2208 for it.

Member

adrelanos commented Jul 29, 2016

@marmarek

Wants=qubes-db.service may be good idea, but for totally different reason - to mitigate problems of disabled services as a result of some bug (like failed post-install script). But as said before - it is too late to have it in 3.2.

@andrewdavidwong

Recommend a separate issue for any post-3.2 changes.

Created #2208 for it.

@adrelanos

This comment has been minimized.

Show comment
Hide comment
@adrelanos

adrelanos Jul 29, 2016

Member

I am migrating / splitting this ticket into short, concise, clear, actionable follow up tickets.

fix Qubes firewall dependencies / Qubes firewall may be load too late during boot:
#2209

Fixing this in Whonix is being tracked here:
fix qubes-whonix-firewall systemd service start in Q2198 branch
https://phabricator.whonix.org/T528

Member

adrelanos commented Jul 29, 2016

I am migrating / splitting this ticket into short, concise, clear, actionable follow up tickets.

fix Qubes firewall dependencies / Qubes firewall may be load too late during boot:
#2209

Fixing this in Whonix is being tracked here:
fix qubes-whonix-firewall systemd service start in Q2198 branch
https://phabricator.whonix.org/T528

@adrelanos

This comment has been minimized.

Show comment
Hide comment
@adrelanos

adrelanos Jul 29, 2016

Member

make sure IP address is configured before network.target:
#2210

Member

adrelanos commented Jul 29, 2016

make sure IP address is configured before network.target:
#2210

@adrelanos

This comment has been minimized.

Show comment
Hide comment
@adrelanos

adrelanos Jul 29, 2016

Member

I have carefully re-read this whole ticket and believe I have created follow up tasks for everything.

Therefore I suggest to close this ticket.

(To make sure that this approach was sensible...
qubes-devel: restarting tickets / migrating old tickets to new ones
https://groups.google.com/forum/#!topic/qubes-devel/52sM-SlMX2Q )

Member

adrelanos commented Jul 29, 2016

I have carefully re-read this whole ticket and believe I have created follow up tasks for everything.

Therefore I suggest to close this ticket.

(To make sure that this approach was sensible...
qubes-devel: restarting tickets / migrating old tickets to new ones
https://groups.google.com/forum/#!topic/qubes-devel/52sM-SlMX2Q )

@marmarek marmarek closed this Aug 6, 2016

marmarek added a commit to QubesOS/qubes-core-agent-linux that referenced this issue Nov 20, 2016

systemd: improve ordering of systemd units
- qubes-misc-post.service is no longer responsible for mounting /rw
- both qubes-sysinit.service and qubes-mount-dirs.service are part of
  basic.target, so no need to mention them explicitly (as long as
  DefaultDependencies=yes)

QubesOS/qubes-issues#2198

(cherry picked from commit 60d16ea)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment