Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd: 239 -> 242 #61321

Merged
merged 15 commits into from Jun 3, 2019

Conversation

@andir
Copy link
Member

commented May 11, 2019

Motivation for this change

This is a follow-up to #56184.

As explained in #56184 (comment) I rebased @Mic92's attempt for systemd version 242 and added the two patches that I wrote last year.

The current head of my systemd fork is at https://github.com/andir/systemd/commits/nixos-v242. I had to add one additional patch to make udev rules work again (andir/systemd@7283141).

While running all the NixOS release.nix tests I discovered a few issues that required changes to our modules. Those are included in this PR.

From all the failing tests only the mysqlReplication test is seemingly relevant. It requires some refactoring in order to work on v242. The failure log can be seen here: https://gist.github.com/a46e172235ae9066afd758cbf9e86564 It would be great if someone with knowledge in the area could have a look. It was broken before. Nothing we've to take care of.

I am in the process of switching my notebook to this branch to see if anything that we aren't testing is obviously broken.

Feedback from others is highly appreciated.

My private hydra build of release.nix: https://hydra.h4ck.space/eval/155 (ipv6 only)

If we like these changes it would be nice if someone with access to https://github.com/nixos/systemd could push my systemd changes to the nixos-v242 branch.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nix-review --run "nix-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Assured whether relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@flokli flokli referenced this pull request May 11, 2019
0 of 10 tasks complete
@@ -282,6 +278,8 @@ in

services.udev.path = [ pkgs.coreutils pkgs.gnused pkgs.gnugrep pkgs.utillinux udev ];

boot.kernelParams = mkIf (!config.networking.usePredictableInterfaceNames) [ "net.ifnames=0" ];

This comment has been minimized.

Copy link
@andir

andir May 11, 2019

Author Member

Does this work for all our cases? Previously we did exclude a udev rule. That could have affected containers and similar constructs. Now we only affect "real" systems.

This comment has been minimized.

Copy link
@fpletz

fpletz May 17, 2019

Member

tl;dr I don't think this should affect containers or custom network namespace applications.

As I understand it, the interface renaming should only effectively applied to physical interfaces (or if the kernel declares it as predictable) because the physical location is coded into the interface name (see NamePolicy in systemd.link(5) manpage). When creating virtual netdevs like vlans or bridges a custom name has to be supplied anyway. When moving physical interfaces to containers the interface has already been renamed on the host.

@@ -59,7 +59,14 @@ in
in {
DHCP = override (dhcpStr cfg.useDHCP);
} // optionalAttrs (gateway != [ ]) {
gateway = override gateway;
routes = override [

This comment has been minimized.

Copy link
@andir

andir May 11, 2019

Author Member

I am not convinced this is the best way to solve this right now.

The issue was that we would add onlink routes for all network devices on the system. Introduced in systemd/systemd@4912ab7 systemd would set the onlink attribute for all routes that didn't come with static addresses.

This comment has been minimized.

Copy link
@fpletz

fpletz May 17, 2019

Member

It's certainly not ideal but then again adding the default gateway to every interface is FUBAR anyway. This fix is IMHO fine until we finally deprecate networking.defaultGateway and move it to the per interface configurations.

This comment has been minimized.

Copy link
@andir

andir May 17, 2019

Author Member

👍 I created an issue for this (#61629) and also added it to our systemd project board https://github.com/NixOS/nixpkgs/projects/22. (I am trying to populate that more with all the things we come across wile working on this PR).

@petabyteboy

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

I'm testing this now

@andir

This comment has been minimized.

Copy link
Member Author

commented May 12, 2019

@petabyteboy

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

Yup, saw that too late, this is what it looked like:
https://termbin.com/scs1
After a reboot everything seems to work fine until now. I will report here if I find any issues.

@flokli

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

we should probably also check why the failure with #56265 (comment) can't be reproduced anymore, and drop the systemd patch referenced there, if it's not effective anymore.

@flokli

This comment has been minimized.

Copy link
Contributor

commented May 12, 2019

I figured out why we can't reproduce the error anymore - see explanation in #56265 (comment).

@andir, could you incorporate flokli@92600a9 into this PR, and drop andir/systemd@51077a9 from your systemd branch?

I'll try having a look at how to best fix nixops send-keys.

@Mic92

This comment has been minimized.

Copy link
Contributor

commented May 13, 2019

@andir loosing control as in no longer being able to contact via dbus?

@petabyteboy

This comment has been minimized.

Copy link
Contributor

commented May 14, 2019

Not 100% certain this is related
After switching to a new system configuration, systemd-timesyncd started crashing:

May 14 11:47:12 tachyon.pbb.lc systemd[21126]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
May 14 11:47:12 tachyon.pbb.lc systemd[21126]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /nix/store/5rnzb2cr3vw536px3p156d1g3arf18v0-systemd-242/lib/systemd/systemd-timesyncd: Not a directory

/var/lib exists and is a directory.

@andir

This comment has been minimized.

Copy link
Member Author

commented May 14, 2019

@andir loosing control as in no longer being able to contact via dbus?

Yes, the output looked very similar to #61321 (comment).

Not 100% certain this is related
After switching to a new system configuration, systemd-timesyncd started crashing:

May 14 11:47:12 tachyon.pbb.lc systemd[21126]: systemd-timesyncd.service: Failed to set up special execution directory in /var/lib: Not a directory
May 14 11:47:12 tachyon.pbb.lc systemd[21126]: systemd-timesyncd.service: Failed at step STATE_DIRECTORY spawning /nix/store/5rnzb2cr3vw536px3p156d1g3arf18v0-systemd-242/lib/systemd/systemd-timesyncd: Not a directory

/var/lib exists and is a directory.

I am having the same. It seems like the daemon never came up after switching to the new systemd version. I have some idea what might cause this and will debug this tonight.

@petabyteboy

This comment has been minimized.

Copy link
Contributor

commented May 14, 2019

The timesyncd issue has been reported for Arch-based distros too:
systemd/systemd#12131

@nixos-discourse

This comment has been minimized.

Copy link

commented May 14, 2019

This pull request has been mentioned on Nix community. There might be relevant details there:

https://discourse.nixos.org/t/what-are-your-goals-for-19-09/2875/22

@flokli

This comment has been minimized.

Copy link
Contributor

commented May 14, 2019

There's some more WIP patches over at https://github.com/flokli/nixpkgs/commits/systemd-v242-fk (not merged in yet here).

Planning to add some more tests for LVM and crypted volumes, plus testing of the send-keys functionality, to ensure it doesn't break.

@andir

This comment has been minimized.

Copy link
Member Author

commented May 14, 2019

@andir andir force-pushed the andir:nixos-v242 branch from 65541a7 to e5a5d08 May 15, 2019

@andir

This comment has been minimized.

Copy link
Member Author

commented May 15, 2019

I dropped the commit @flokli mentioned in #61321 (comment). Rerunning all the tests 🎉

fpletz and others added some commits May 17, 2019

systemd: disable building tests
We are currently not running any tests but building them takes
signitifcant amounts of time since they account to about 40% of all the
compilation targets.
systemd: remove references to $out/lib/systemd/catalog
On aarch64 we "leak" a reference to $out/lib/systemd/catalog in the lib
output. The result of that is a dependency cycle between $out and $lib.
Thus nix (rightfully) marks the build as failed. That reference
originates from an array of strings (catalog_file_dirs) in systemd
(src/src/journal/catalog.{c,h}).  The only consumer (as of v242) of the
symbol is the main function of journalctl.  Still libsystemd.so contains
the VALUE but not the symbol.  Systemd seems to be properly using
function & data sections together with the linker flags to garbage
collect unused sections (-Wl,--gc-sections).  For unknown reasons those
flags do not eliminate the unused string constants, in this case on
aarch64-linux. The hacky way is to just remove the reference after we
finished compiling.  Since it can not be used (there is no symbol to
actually refer to it) there should not be any harm.  It is a bit odd and
I really do not like starting these kind of hacks but there doesn't seem
to be a straight forward way at this point in time.

The reference will be replaced by the same reference the usual nukeRefs
tooling uses.  The standard tooling can not / should not be uesd since
it is a bit too excessive and could potentially do us some (more) harm.
nixos/misc: warn when someone is using the nixops autoLuks module
The autoLuks module is not really compatible with the updated systemd
version anymore. We started dropping NixOS specific patches that caused
unwanted side effects that we had to work around otherwise.

This change points users towards the relevant PR and spits out a bit of
information on how to deal with the situation.
nixos/test: remove the stateVersion statement from the test-instrumen…
…tation

We set stateVersion to `mkDefault 18.03` in
`nixos/modules/testing/test-instrumentation.nix` and in
`modules/installer/cd-dvd/installation-cd-base.nix`.

Accessing the stateVersion in the module system from within the tests
results in the following error:
> The unique option `system.stateVersion' is defined multiple times, in
> `nixpkgs/nixos/modules/installer/cd-dvd/installation-cd-base.nix' and
> `nixpkgs/nixos/modules/testing/test-instrumentation.nix'.

There are other tests that use it as well. Namely the radicale test also
verifies behaviour between state versions is as expected. It switches a
package default value. Others switched on the state directory default.
It seems like having the timesyncd switch as part of every rendered
activationScript might cause this weird error.

Removing this line seems like a reasonable thing to do since we actually
set the default to the very same value in the module system. This line
should have been no-op besides the issue that we've two statements
setting it in this very specific case.

@andir andir force-pushed the andir:nixos-v242 branch from 6fb4061 to 7508490 Jun 3, 2019

@andir andir changed the base branch from master to staging Jun 3, 2019

@andir

This comment has been minimized.

Copy link
Member Author

commented Jun 3, 2019

Rebased onto staging so we can proceed with merging it.

@flokli flokli merged commit 2812b5c into NixOS:staging Jun 3, 2019

15 of 16 checks passed

multipath-tools, systemd on x86_64-darwin No attempt
Details
Evaluation Performance Report Evaluator Performance Report
Details
grahamcofborg-eval ^.^!
Details
grahamcofborg-eval-check-maintainers matching changed paths to changed attrs...
Details
grahamcofborg-eval-check-meta config.nix: checkMeta = true
Details
grahamcofborg-eval-darwin nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A darwin-tested
Details
grahamcofborg-eval-nixos nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./nixos/release-combined.nix -A tested
Details
grahamcofborg-eval-nixos-manual nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./nixos/release.nix -A manual
Details
grahamcofborg-eval-nixos-options nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./nixos/release.nix -A options
Details
grahamcofborg-eval-nixpkgs-manual nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A manual
Details
grahamcofborg-eval-nixpkgs-tarball nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A tarball
Details
grahamcofborg-eval-nixpkgs-unstable-jobset nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="ofborg"; } ./pkgs/top-level/release.nix -A unstable
Details
grahamcofborg-eval-package-list nix-env -qa --json --file .
Details
grahamcofborg-eval-package-list-no-aliases nix-env -qa --json --file . --arg config { allowAliases = false; }
Details
multipath-tools, systemd on aarch64-linux Success
Details
multipath-tools, systemd on x86_64-linux Success
Details
@aristidb aristidb referenced this pull request Jul 23, 2019
2 of 8 tasks complete
# The interface version prevents NixOS from switching to an
# incompatible systemd at runtime. (Switching across reboots is
# fine, of course.) It should be increased whenever systemd changes
# in a backwards-incompatible way. If the interface version of two
# systemd builds is the same, then we can switch between them at
# runtime; otherwise we can't and we need to reboot.
passthru.interfaceVersion = 2;
passthru.interfaceVersion = 3;

This comment has been minimized.

Copy link
@edolstra

edolstra Jul 31, 2019

Member

What's the reason for bumping interfaceVersion? This is really undesirable because it requires every user to reboot.

This comment has been minimized.

Copy link
@andir

andir Jul 31, 2019

Author Member

After re-exec the userspace was no longer able to talk to the daemon.

See #61321 (comment) & #61321 (comment). Unfortunately the paste did expire :/

If we come up with a better way to handle these kinds of scenarios that would be great.

Rebooting every once in a while after your pid 1 did a major version change doesn't sound unreasonable to me.

This comment has been minimized.

Copy link
@edolstra

edolstra Jul 31, 2019

Member

Sounds like an upstream systemd bug that should be reported there. AFAIK systemd is supposed to be able to re-exec itself across version changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.