systemd: 239 -> 242 #61321

Merged
merged 15 commits into from Jun 3, 2019

Member

@andir andir commented May 11, 2019

Motivation for this change

This is a follow-up to #56184.

As explained in #56184 (comment) I rebased @Mic92's attempt for systemd version 242 and added the two patches that I wrote last year.

The current head of my systemd fork is at https://github.com/andir/systemd/commits/nixos-v242. I had to add one additional patch to make udev rules work again (andir/systemd@7283141).

While running all the NixOS release.nix tests I discovered a few issues that required changes to our modules. Those are included in this PR.

Of all the failing tests, only the mysqlReplication test seemed relevant. It requires some refactoring in order to work on v242. The failure log can be seen here: https://gist.github.com/a46e172235ae9066afd758cbf9e86564 It would be great if someone with knowledge in the area could have a look. As it turns out, the test was already broken before this change, so it is nothing we have to take care of here.

I am in the process of switching my notebook to this branch to see if anything that we aren't testing is obviously broken.

Feedback from others is highly appreciated.

My private hydra build of release.nix: https://hydra.h4ck.space/eval/155 (ipv6 only)

If we like these changes it would be nice if someone with access to https://github.com/nixos/systemd could push my systemd changes to the nixos-v242 branch.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@@ -282,6 +278,8 @@ in

services.udev.path = [ pkgs.coreutils pkgs.gnused pkgs.gnugrep pkgs.utillinux udev ];

boot.kernelParams = mkIf (!config.networking.usePredictableInterfaceNames) [ "net.ifnames=0" ];
Member Author

@andir andir May 11, 2019

Does this work for all our cases? Previously we excluded a udev rule, which could have affected containers and similar constructs. Now we only affect "real" systems.

Member

@fpletz fpletz May 17, 2019

tl;dr I don't think this should affect containers or custom network namespace applications.

As I understand it, the interface renaming should effectively only be applied to physical interfaces (or if the kernel declares the name as predictable), because the physical location is encoded into the interface name (see NamePolicy in the systemd.link(5) manpage). When creating virtual netdevs like VLANs or bridges, a custom name has to be supplied anyway. When moving physical interfaces into containers, the interface has already been renamed on the host.
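
As a rough illustration of the option touched by the hunk above (a sketch, not part of this PR): a configuration that opts out of predictable names only has to set networking.usePredictableInterfaceNames. With the changed module line, that setting is expected to end up as the net.ifnames=0 kernel parameter rather than as an excluded udev rule.

```nix
# configuration.nix fragment (illustrative sketch)
{ ... }:
{
  # Opt out of predictable interface names, e.g. keep eth0 instead of enp3s0.
  networking.usePredictableInterfaceNames = false;

  # Per the hunk above, the module then adds "net.ifnames=0" to
  # boot.kernelParams via mkIf, so it does not need to be set by hand here.
}
```

After a reboot, the parameter should be visible in /proc/cmdline on a system built from such a configuration.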

@@ -59,7 +59,14 @@ in
in {
DHCP = override (dhcpStr cfg.useDHCP);
} // optionalAttrs (gateway != [ ]) {
gateway = override gateway;
routes = override [
Member Author

@andir andir May 11, 2019

I am not convinced this is the best way to solve this right now.

The issue was that we would add onlink routes for all network devices on the system. Since systemd/systemd@4912ab7, systemd sets the onlink attribute for all routes that don't come with static addresses.

Member

@fpletz fpletz May 17, 2019

It's certainly not ideal but then again adding the default gateway to every interface is FUBAR anyway. This fix is IMHO fine until we finally deprecate networking.defaultGateway and move it to the per interface configurations.

Member Author

@andir andir May 17, 2019

👍 I created an issue for this (#61629) and also added it to our systemd project board https://github.com/NixOS/nixpkgs/projects/22. (I am trying to populate that more with all the things we come across while working on this PR.)
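
For readers wondering what a per-interface configuration could look like, here is a minimal sketch. It is an assumption about one possible shape, not the fix applied in this PR: the interface name eth0 and the 192.0.2.0/24 documentation addresses are placeholders, and the default route is attached to a single interface through the existing networking.interfaces.<name>.ipv4.routes option instead of the global networking.defaultGateway.

```nix
# configuration.nix fragment (illustrative sketch, placeholder values)
{ ... }:
{
  networking.interfaces.eth0 = {
    # Static address on the interface that actually reaches the gateway.
    ipv4.addresses = [
      { address = "192.0.2.10"; prefixLength = 24; }
    ];

    # Default route bound to this interface only, instead of a gateway
    # that would be added to every interface on the system.
    ipv4.routes = [
      { address = "0.0.0.0"; prefixLength = 0; via = "192.0.2.1"; }
    ];
  };
}
```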

Member

@petabyteboy petabyteboy commented May 12, 2019

I'm testing this now

Member Author

@andir andir commented May 12, 2019

Member

@petabyteboy petabyteboy commented May 12, 2019

Yup, saw that too late, this is what it looked like:
https://termbin.com/scs1
After a reboot everything seems to work fine so far. I will report here if I find any issues.

Contributor

@flokli flokli commented May 12, 2019

We should probably also check why the failure from #56265 (comment) can't be reproduced anymore, and drop the systemd patch referenced there if it's no longer effective.

Contributor

@flokli flokli commented May 12, 2019

I figured out why we can't reproduce the error anymore - see explanation in #56265 (comment).

@andir, could you incorporate flokli@92600a9 into this PR, and drop andir/systemd@51077a9 from your systemd branch?

I'll try having a look at how to best fix nixops send-keys.

Member

@Mic92 Mic92 commented May 13, 2019

@andir losing control as in no longer being able to contact via dbus?

Member

@petabyteboy petabyteboy commented May 14, 2019

Not 100% certain this is related
After switching to a new system configuration, systemd-timesyncd started crashing:

May 14 11:47:12 tachyon.pbb.lc systemd[21126]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
May 14 11:47:12 tachyon.pbb.lc systemd[21126]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /nix/store/5rnzb2cr3vw536px3p156d1g3arf18v0-systemd-242/lib/systemd/systemd-timesyncd: Not a directory

/var/lib exists and is a directory.

Member Author

@andir andir commented May 14, 2019

@andir losing control as in no longer being able to contact via dbus?

Yes, the output looked very similar to #61321 (comment).

Not 100% certain this is related
After switching to a new system configuration, systemd-timesyncd started crashing:

May 14 11:47:12 tachyon.pbb.lc systemd[21126]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
May 14 11:47:12 tachyon.pbb.lc systemd[21126]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /nix/store/5rnzb2cr3vw536px3p156d1g3arf18v0-systemd-242/lib/systemd/systemd-timesyncd: Not a directory

/var/lib exists and is a directory.

I am seeing the same issue. It seems like the daemon never came up after switching to the new systemd version. I have some idea what might cause this and will debug it tonight.

Member

@petabyteboy petabyteboy commented May 14, 2019

The timesyncd issue has been reported for Arch-based distros too:
systemd/systemd#12131


@nixos-discourse nixos-discourse commented May 14, 2019

This pull request has been mentioned on Nix community. There might be relevant details there:

https://discourse.nixos.org/t/what-are-your-goals-for-19-09/2875/22

Contributor

@flokli flokli commented May 14, 2019

There's some more WIP patches over at https://github.com/flokli/nixpkgs/commits/systemd-v242-fk (not merged in yet here).

Planning to add some more tests for LVM and crypted volumes, plus testing of the send-keys functionality, to ensure it doesn't break.

Member Author

@andir andir commented May 14, 2019

Member Author

@andir andir commented May 15, 2019

I dropped the commit @flokli mentioned in #61321 (comment). Rerunning all the tests 🎉

# The interface version prevents NixOS from switching to an
# incompatible systemd at runtime. (Switching across reboots is
# fine, of course.) It should be increased whenever systemd changes
# in a backwards-incompatible way. If the interface version of two
# systemd builds is the same, then we can switch between them at
# runtime; otherwise we can't and we need to reboot.
-passthru.interfaceVersion = 2;
+passthru.interfaceVersion = 3;
Member

@edolstra edolstra Jul 31, 2019

What's the reason for bumping interfaceVersion? This is really undesirable because it requires every user to reboot.

Member Author

@andir andir Jul 31, 2019

After re-exec the userspace was no longer able to talk to the daemon.

See #61321 (comment) & #61321 (comment). Unfortunately the paste has expired :/

If we come up with a better way to handle these kinds of scenarios that would be great.

Rebooting every once in a while after your pid 1 did a major version change doesn't sound unreasonable to me.

Member

@edolstra edolstra Jul 31, 2019

Sounds like an upstream systemd bug that should be reported there. AFAIK systemd is supposed to be able to re-exec itself across version changes.
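
The rule described in the code comment under review can be sketched as a small Nix expression. This is only an illustration of the comparison, assuming two package attribute sets that carry an interfaceVersion field; the helper name canSwitchAtRuntime is hypothetical and is not how the actual switch logic is implemented.

```nix
# Illustrative sketch: the runtime-switch rule expressed as a predicate.
let
  # Hypothetical helper: runtime switching is allowed only when both
  # systemd builds declare the same interface version.
  canSwitchAtRuntime = old: new: old.interfaceVersion == new.interfaceVersion;
in
{
  # Same interface version on both sides: switching at runtime is fine.
  sameVersion = canSwitchAtRuntime { interfaceVersion = 2; } { interfaceVersion = 2; };

  # The bump in this PR (2 -> 3): a reboot is needed instead.
  afterBump = canSwitchAtRuntime { interfaceVersion = 2; } { interfaceVersion = 3; };
}
```

Evaluating the expression gives true for sameVersion and false for afterBump, which is why the bump forces every user through a reboot.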

flokli added a commit to flokli/nixpkgs that referenced this issue Aug 31, 2019
With local-fs.target part of sysinit.target
(NixOS#61321), we don't need to add it
explicitly to certain units anymore, and can change dependencies like
they are in other distros (I picked from Google's official CentOS 7
image here).

Like them, use StandardOutput=journal+console to pipe google-*.service
output to the serial console as well.
flokli added a commit to flokli/nixpkgs that referenced this issue Sep 1, 2019
Since NixOS#61321, local-fs.target is
part of sysinit.target again, meaning units without
DefaultDependencies=no will automatically depend on it, and the manual
set dependencies can be dropped.
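
A minimal sketch of what the commit messages above describe, with hypothetical unit names (example-late and example-early are made up for illustration): a service that keeps systemd's default dependencies is ordered after sysinit.target, and therefore after local-fs.target, while a unit that opts out with DefaultDependencies=no still has to state that ordering itself.

```nix
# configuration.nix fragment (illustrative sketch, hypothetical units)
{ pkgs, ... }:
{
  # Default dependencies apply: the unit is ordered after sysinit.target,
  # which pulls in local-fs.target, so no explicit entry is needed.
  systemd.services.example-late = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig.ExecStart = "${pkgs.coreutils}/bin/true";
  };

  # Default dependencies disabled: ordering against local-fs.target
  # must still be declared by hand.
  systemd.services.example-early = {
    wantedBy = [ "sysinit.target" ];
    unitConfig.DefaultDependencies = false;
    after = [ "local-fs.target" ];
    serviceConfig = {
      Type = "oneshot";
      ExecStart = "${pkgs.coreutils}/bin/true";
    };
  };
}
```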
Member

@Ma27 Ma27 commented Sep 10, 2019

@fpletz @andir I'm not too experienced with udev and systemd internals, but this seems to break when using a predictably named network interface during a stage-1 boot (in my case to provide SSH access in the initrd in order to enter the LUKS passphrase remotely). Until now I've used the 80-net-setup-link.rules from nixpkgs, as described in NixOS discourse, for a predictably named network interface which is configured like this:

{
  boot.kernelParams = [
    "ip=<v4-addr>::<gateway>:255.255.0.0::ens3:none"
  ];
}

May I ask what's the recommended way to implement such a feature in NixOS 19.09? AFAICT only udev is started in stage 1, so there's no predictably named network interface at that point. I also checked the 80-net-setup-link.rules from our systemd fork (https://github.com/NixOS/systemd/blob/nixos-v243/rules/80-net-setup-link.rules), which doesn't rename the eth0 interface in my case (not sure why though, sorry!).

This is why I decided to continue using the old udev rules from nixpkgs (https://github.com/NixOS/nixpkgs/blob/release-19.03/nixos/modules/services/hardware/80-net-setup-link.rules) for now as my network is configured properly in stage-1 then, but I'm not sure if that's a suitable solution :)

Member

@Mic92 Mic92 commented Sep 11, 2019

I think this update might have broken systemd --user for me. I am using lightdm as a login manager:

$ systemctl status --user
Failed to read server status: The name org.freedesktop.systemd1 was not provided by any .service files

Member Author

@andir andir commented Sep 11, 2019

Member

@Mic92 Mic92 commented Sep 11, 2019

I still see my systemd user session, both the process as well as the dbus session in busctl - systemctl currently does not connect to those.

UPDATE: I need to have a look at that later. I am currently at work.

Member Author

@andir andir commented Sep 11, 2019

dtzWill added a commit to dtzWill/nixpkgs that referenced this issue Sep 12, 2019
Since NixOS#61321, local-fs.target is
part of sysinit.target again, meaning units without
DefaultDependencies=no will automatically depend on it, and the manual
set dependencies can be dropped.

(cherry picked from commit f74735c)

nixos-discourse commented on 1f03f6f Sep 16, 2019

This commit has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/predictable-network-interface-names-in-initrd/4055/1


@nixos-discourse nixos-discourse commented Sep 17, 2019

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/predictable-network-interface-names-in-initrd/4055/6

Member

@arianvp arianvp commented Sep 17, 2019

I am having an issue with my systemd --user too, but slightly different. It is certainly running:

● user@0.service - User Manager for UID 0
   Loaded: loaded (/nix/store/niw0jbw29x0rg85m7z8j5gll16y37g5n-systemd-242/example/systemd/system/user@.service; static; vendor preset: enabled)
  Drop-In: /nix/store/kiiihixa3dcbb947sn4pwq5z90sksijn-system-units/user@.service.d
           └─overrides.conf
   Active: active (running) since Fri 2019-09-13 10:19:56 UTC; 4 days ago
     Docs: man:user@.service(5)
 Main PID: 25811 (systemd)
   Status: "Startup finished in 45ms."
       IP: 0B in, 0B out
    Tasks: 2
   Memory: 3.4M
      CPU: 113ms
   CGroup: /user.slice/user-0.slice/user@0.service
           └─init.scope
             ├─25811 /nix/store/4vw3gb6dk116y38vwgjv3ymq5mfdcfli-systemd-243/lib/systemd/systemd --user
             └─25812 (sd-pam)

Sep 13 10:19:56 arianvp.me systemd[1]: Starting User Manager for UID 0...
Sep 13 10:19:56 arianvp.me systemd[25811]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
Sep 13 10:19:56 arianvp.me systemd[25811]: Reached target Paths.
Sep 13 10:19:56 arianvp.me systemd[25811]: Reached target Sockets.
Sep 13 10:19:56 arianvp.me systemd[25811]: Reached target Timers.
Sep 13 10:19:56 arianvp.me systemd[25811]: Reached target Basic System.
Sep 13 10:19:56 arianvp.me systemd[1]: Started User Manager for UID 0.
Sep 13 10:19:56 arianvp.me systemd[25811]: Reached target Main User Target.
Sep 13 10:19:56 arianvp.me systemd[25811]: Startup finished in 45ms.

but it cannot connect to the bus:

[root@arianvp:~]# systemctl status --user
Failed to connect to bus: No such file or directory

and if I look at systemctl status it seems there is no dbus.service running for the systemd --user instance:
strace:

(...snip..)
openat(AT_FDCWD, "/run/current-system/sw/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=128690032, ...}) = 0
mmap(NULL, 128690032, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5331506000
close(3)                                = 0
openat(AT_FDCWD, "/proc/self/stat", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "24430 (systemctl) R 24427 24427 "..., 1024) = 320
read(3, "", 1024)                       = 0
close(3)                                = 0
prlimit64(0, RLIMIT_NOFILE, {rlim_cur=512*1024, rlim_max=512*1024}, NULL) = 0
rt_sigaction(SIGBUS, {sa_handler=0x7f53397a9920, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f53396ac860}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
newfstatat(AT_FDCWD, "/proc/1/root", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
newfstatat(AT_FDCWD, "/", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
getpid()                                = 24430
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
getsockopt(3, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
setsockopt(3, SOL_SOCKET, SO_RCVBUF, [8388608], 4) = 0
getsockopt(3, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
setsockopt(3, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0
connect(3, {sa_family=AF_UNIX, sun_path="/run/user/0/systemd/private"}, 30) = -1 ECONNREFUSED (Connection refused)
(..snip...)
connect(3, {sa_family=AF_UNIX, sun_path="/run/user/0/bus"}, 18) = -1 ENOENT (No such file or directory)
close(3)                                = 0
openat(AT_FDCWD, "/nix/store/6yaj6n8l925xxfbcd65gzqx3dz7idrnn-glibc-2.27/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=2997, ...}) = 0
read(3, "# Locale name alias data base.\n#"..., 4096) = 2997
read(3, "", 4096)                       = 0

writev(2, [{iov_base="Failed to connect to bus: No suc"..., iov_len=51}, {iov_base="\n", iov_len=1}], 2Failed to connect to bus: No such file or directory
) = 52
exit_group(1)                           = ?
+++ exited with 1 +++

Interestingly, it also tries to connect to /run/user/0/systemd/private but fails, even though that file exists on my system.

Member

@arianvp arianvp commented Sep 19, 2019

FYI, rebooting my machine fixed this for me. It broke during a nixos-rebuild switch from 239 to 243. We should probably tell people to reboot when going from 19.03 to 19.09.

Member Author

@andir andir commented Sep 19, 2019

Member

@edolstra edolstra commented Sep 19, 2019

You're already required to reboot, but #68906 reverts that.

dtzWill added a commit to dtzWill/nixpkgs that referenced this issue Jan 21, 2020
With local-fs.target part of sysinit.target
(NixOS#61321), we don't need to add it
explicitly to certain units anymore, and can change dependencies like
they are in other distros (I picked from Google's official CentOS 7
image here).

Like them, use StandardOutput=journal+console to pipe google-*.service
output to the serial console as well.

(cherry picked from commit 106a1fe)