Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mdadm monitor service #123466

Merged
merged 1 commit into from Sep 10, 2023

Conversation

ctheune
Copy link
Contributor

@ctheune ctheune commented May 18, 2021

Motivation for this change

In #72394 we see a situation where enabling the built-in systemd units for mdadm caused issues for users. I have reviewed the discussion in the thread and am providing a fix that

a) works out of the box without failing units
b) allows to explicitly enable the mdadm monitor service, and
c) takes care to remind the admin to provide an email address.

This can likely be improved further inthe future but should be sufficient to mitigate #72394 and work as a reliable starting point.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • [] macOS
    • [] other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • [] Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

@ctheune ctheune changed the title Work on fix for #72394 Fix mdadm monitor service May 18, 2021
@ctheune ctheune marked this pull request as draft May 18, 2021 08:34
@ofborg ofborg bot requested a review from Ekleog May 18, 2021 08:43
@azazel75
Copy link
Contributor

Nice fixup, seems reasonable to me. Thanks @ctheune

Copy link
Member

@aanderse aanderse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this.

I feel like this PR is throwing the baby out with the bathwater on the package changes. I think the upstream systemd unit provides us what we need via MDADM_MONITOR_ARGS, we just need to take advantage of it.

That being said I have not actually tested my suggestions, so I suppose if you entirely ignore them you don't need to feel guilty...


services.udev.packages = [ pkgs.mdadm ];

systemd.packages = [ pkgs.mdadm ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line does nothing after your change to the package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I thought I just undid the change that introduced the systemd units and didn't check back on whether that change was related to changes in swraid.nix at that time. I'll double-check this.

@@ -12,15 +12,12 @@ stdenv.mkDerivation rec {

makeFlags = [
"NIXOS=1" "INSTALL=install" "BINDIR=$(out)/sbin"
"SYSTEMD_DIR=$(out)/lib/systemd/system"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes all of the following:

├── lib
│   ├── systemd
│   │   ├── system
│   │   │   ├── mdadm-grow-continue@.service
│   │   │   ├── mdadm-last-resort@.service
│   │   │   ├── mdadm-last-resort@.timer
│   │   │   ├── mdmonitor.service
│   │   │   └── mdmon@.service
│   │   └── system-shutdown
│   │       └── mdadm.shutdown

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Unfortunately the original change was kind of opaque and didn't have any tests either - so I'm not really sure whether those are doing anything helpful anyway ... (see below)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmap Would you be available to comment on this topic, as you added this line? My guess would be it might be useful for running Nixpkgs' mdadm on a non-NixOS machine, but not using one myself I couldn't say for sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think we should leave this in and then override the upstream units where necessary. I have already pointed out in another comment how this can be achieved... Let me know if any help is needed.

Comment on lines 41 to 103
systemd.services.mdadm-monitor = {
description = "Monitor RAID disks";
wantedBy = [ "multi-user.target" ];
script = "${pkgs.mdadm}/bin/mdadm --monitor -m ${assert cfg.adminAddr != null; cfg.adminAddr} --scan";
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it have worked equally well to not remove the systemd units from the package, and instead use this:

systemd.services.mdadm-monitor = {
  wantedBy = [ "multi-user.target" ];
  environment.MDADM_MONITOR_ARGS = "-m ${assert cfg.adminAddr != null; cfg.adminAddr} --scan";
};

Additionally we could create an option like systemd.services.monitor.extraArgs and rewrite the above environment directive as such:

environment.MDADM_MONITOR_ARGS = "-m ${assert cfg.adminAddr != null; cfg.adminAddr} --scan " + (lib.concatMapStringsSep " " cfg.monitor.extraArgs);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a possibility, yes. I was under the (probably wrong) impression that the original change that introduced the units didn't really do anything helpful. I'd be happy to do it this way as well. I'm planning to consider adding more tests for the things that we expect to happen so we don't have to scratch our heads in the future. This may likely need a few days until I get around to adding more effort here, but I'll get to it. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be awesome if you could add tests indeed! That said if you don't have the time, I think this PR already makes things better so could probably land :)

systemd.services.mdadm-monitor = {
description = "Monitor RAID disks";
wantedBy = [ "multi-user.target" ];
script = "${pkgs.mdadm}/bin/mdadm --monitor -m ${assert cfg.adminAddr != null; cfg.adminAddr} --scan";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use a NixOS assertion rather than a raw assert, as it makes the user experience better. You can find more information about NixOS assertions here: https://nixos.org/manual/nixos/stable/index.html#sec-assertions

@@ -12,15 +12,12 @@ stdenv.mkDerivation rec {

makeFlags = [
"NIXOS=1" "INSTALL=install" "BINDIR=$(out)/sbin"
"SYSTEMD_DIR=$(out)/lib/systemd/system"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmap Would you be available to comment on this topic, as you added this line? My guess would be it might be useful for running Nixpkgs' mdadm on a non-NixOS machine, but not using one myself I couldn't say for sure.

Comment on lines 41 to 103
systemd.services.mdadm-monitor = {
description = "Monitor RAID disks";
wantedBy = [ "multi-user.target" ];
script = "${pkgs.mdadm}/bin/mdadm --monitor -m ${assert cfg.adminAddr != null; cfg.adminAddr} --scan";
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be awesome if you could add tests indeed! That said if you don't have the time, I think this PR already makes things better so could probably land :)

@devhell
Copy link
Contributor

devhell commented Jun 30, 2021

This looks awesome, can't wait for it to be merged. Thank you for the hard work @ctheune!

@stale
Copy link

stale bot commented Jan 6, 2022

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 6, 2022
@ctheune
Copy link
Contributor Author

ctheune commented Sep 18, 2022

I just stumbled over this again on my personal server and wondered whether we should reactivate this as it was so close to being merged anyway? I'd be happy update the PR.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Sep 18, 2022
@azazel75
Copy link
Contributor

azazel75 commented Sep 19, 2022 via email

@ctheune ctheune force-pushed the 72394-cleanup-mdmonitor-service branch from fe49099 to a534b4c Compare September 9, 2023 19:38
@ofborg ofborg bot requested a review from Ekleog September 9, 2023 20:27
@ctheune ctheune force-pushed the 72394-cleanup-mdmonitor-service branch 2 times, most recently from b7e8c2a to 67a110f Compare September 10, 2023 13:01
@ctheune ctheune marked this pull request as ready for review September 10, 2023 13:02
@ctheune
Copy link
Contributor Author

ctheune commented Sep 10, 2023

@GrahamcOfBorg test systemd-initrd-swraid

@ctheune ctheune force-pushed the 72394-cleanup-mdmonitor-service branch from 67a110f to 380d73c Compare September 10, 2023 13:20
@ctheune
Copy link
Contributor Author

ctheune commented Sep 10, 2023

@GrahamcOfBorg test systemd-initrd-swraid

@mweinelt mweinelt merged commit 7f341bb into NixOS:master Sep 10, 2023
8 of 10 checks passed
@trofi
Copy link
Contributor

trofi commented Sep 10, 2023

I don't knowingly use mdadm on my system (default desktop). Eval started failing on master as:

$ nix build --no-link -f nixos system --keep-going
trace: warning: mdadm: Neither MAILADDR nor PROGRAM has been set. This will cause the `mdmon` service to crash.
error:
       … while calling the 'derivationStrict' builtin

         at /builtin/derivation.nix:9:12: (source not available)

       … while evaluating derivation 'nixos-system-nz-23.11pre-git'
         whose name attribute is located at /mnt/archive/big/git/nixpkgs-master/pkgs/stdenv/generic/make-derivation.nix:300:7

       … while evaluating attribute 'buildCommand' of derivation 'nixos-system-nz-23.11pre-git'

         at /mnt/archive/big/git/nixpkgs-master/nixos/modules/system/activation/top-level.nix:53:5:

           52|     passAsFile = [ "extraDependencies" ];
           53|     buildCommand = systemBuilder;
             |     ^
           54|

       (stack trace truncated; use '--show-trace' to show the full trace)

       error: The option `boot.initrd.extraFiles."/etc/mdadm.conf".enable' does not exist. Definition values:
       - In `/mnt/archive/big/git/nixpkgs-master/nixos/modules/tasks/swraid.nix': true

@@ -59,12 +67,10 @@ in {
$out/bin/mdadm --version
'';

extraFiles."/etc/mdadm.conf".source = pkgs.writeText "mdadm.conf" config.boot.swraid.mdadmConf;
extraFiles."/etc/mdadm.conf" = mdadm_conf;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe because of this. extraFiles."/etc/mdadm.conf".source = mdadm_conf.source; could fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. However, I can't figure out why and I would like to adjust the test so that it doesn't happen again, because we explicitly thought this would work and the tests and my live system agreed with me.

Even when I explicitly set environment.etc."mdadm.conf".enable = true the test doesn't break and I also do not even see why it should, as nothing iterates over the superfluous attributes.

@ajs124
Copy link
Member

ajs124 commented Sep 10, 2023

see #254429 @trofi

@trofi
Copy link
Contributor

trofi commented Sep 10, 2023

see #254429 @trofi

This fixes the eval for me. Thank you!

The only annoyance is that I'm still getting the following warning at eval time: "trace: warning: mdadm: Neither MAILADDR nor PROGRAM has been set. This will cause the mdmon service to crash."

@ajs124
Copy link
Member

ajs124 commented Sep 10, 2023

It's interesting that you're getting this, with your normal system config.
That means something sets boot.swraid.enable = true;, in your config. e.g. if I'm evaluating nixosTests.simple it doesn't print the warning.

There's only one file in nixpkgs that I can find that would do that, which is modules/profiles/installation-device.nix.

@trofi
Copy link
Contributor

trofi commented Sep 10, 2023

Ah, that's probably because I'm using system with an old state version: system.stateVersion = "21.11"; which defaults to swraid.enable in a roundabout way:

# nixos/modules/tasks/swraid.nix
  options.boot.swraid = {
    enable = lib.mkEnableOption (lib.mdDoc "swraid support using mdadm") // {
      default = lib.versionOlder config.system.stateVersion "23.11";
# ...

@ajs124
Copy link
Member

ajs124 commented Sep 10, 2023

Good point… maybe printing this warning (by default) on every system with a stateVersion < 23.11 isn't that great 😬

@johnrichardrinehart
Copy link
Contributor

johnrichardrinehart commented Sep 11, 2023

Ah, that's probably because I'm using system with an old state version: system.stateVersion = "21.11"; which defaults to swraid.enable in a roundabout way:

# nixos/modules/tasks/swraid.nix
  options.boot.swraid = {
    enable = lib.mkEnableOption (lib.mdDoc "swraid support using mdadm") // {
      default = lib.versionOlder config.system.stateVersion "23.11";
# ...

Wow, thanks. I ran into this today, too.

warning: Git tree '/home/john/code/repos/mine/JohnOS' is dirty
building the system configuration...
warning: Git tree '/home/john/code/repos/mine/JohnOS' is dirty
trace: warning: mdadm: Neither MAILADDR nor PROGRAM has been set. This will cause the `mdmon` service to crash.
error:
       … while calling the 'derivationStrict' builtin

         at //builtin/derivation.nix:9:12: (source not available)

       … while evaluating derivation 'nixos-system-framie-23.11.20230911.ca40349'
         whose name attribute is located at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/pkgs/stdenv/generic/make-derivation.nix:300:7

       … while evaluating attribute 'buildCommand' of derivation 'nixos-system-framie-23.11.20230911.ca40349'

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/nixos/modules/system/activation/top-level.nix:53:5:

           52|     passAsFile = [ "extraDependencies" ];
           53|     buildCommand = systemBuilder;
             |     ^
           54|while calling 'g'

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/attrsets.nix:599:19:

          598|           g =
          599|             name: value:
             |                   ^
          600|             if isAttrs value && cond value

       … from call site

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/attrsets.nix:602:20:

          601|               then recurse (path ++ [name]) value
          602|               else f (path ++ [name]) value;
             |                    ^
          603|         in mapAttrs g;while calling anonymous lambda

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/modules.nix:242:72:

          241|           # For definitions that have an associated option
          242|           declaredConfig = mapAttrsRecursiveCond (v: ! isOption v) (_: v: v.value) options;
             |                                                                        ^
          243|while evaluating the option `system.systemBuilderCommands':

       … while calling anonymous lambda

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/modules.nix:815:28:

          814|         # Process mkMerge and mkIf properties.
          815|         defs' = concatMap (m:
             |                            ^
          816|           map (value: { inherit (m) file; inherit value; }) (builtins.addErrorContext "while evaluating definitions from `${m.file}':" (dischargeProperties m.value))

       … while evaluating definitions from `/nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/nixos/modules/system/boot/kernel.nix':

       … from call site

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/modules.nix:816:137:

          815|         defs' = concatMap (m:
          816|           map (value: { inherit (m) file; inherit value; }) (builtins.addErrorContext "while evaluating definitions from `${m.file}':" (dischargeProperties m.value))
             |                                                                                                                                         ^
          817|         ) defs;

       … while calling 'dischargeProperties'

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/modules.nix:887:25:

          886|   */
          887|   dischargeProperties = def:
             |                         ^
          888|     if def._type or "" == "merge" then

       … from call site

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/modules.nix:893:11:

          892|         if def.condition then
          893|           dischargeProperties def.content
             |           ^
          894|         else

       … while calling 'dischargeProperties'

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/modules.nix:887:25:

          886|   */
          887|   dischargeProperties = def:
             |                         ^
          888|     if def._type or "" == "merge" then

       … while evaluating derivation 'initrd-linux-6.5.2'
         whose name attribute is located at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/pkgs/stdenv/generic/make-derivation.nix:300:7

       … while evaluating attribute 'exportReferencesGraph' of derivation 'initrd-linux-6.5.2'

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/pkgs/build-support/kernel/make-initrd.nix:107:3:

          106|   # See #36268.
          107|   exportReferencesGraph =
             |   ^
          108|     lib.zipListsWith

       … while calling anonymous lambda

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/types.nix:541:22:

          540|       merge = loc: defs:
          541|         mapAttrs (n: v: v.value) (filterAttrs (n: v: v ? value) (zipAttrsWith (name: defs:
             |                      ^
          542|             (mergeDefinitions (loc ++ [name]) elemType defs).optionalValue

       … from call site

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/modules.nix:837:59:

          836|       if isDefined then
          837|         if all (def: type.check def.value) defsFinal then type.merge loc defsFinal
             |                                                           ^
          838|         else let allInvalid = filter (def: ! type.check def.value) defsFinal;

       … while calling 'merge'

         at /nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/lib/types.nix:762:22:

          761|         check = x: isAttrs x || isFunction x || path.check x;
          762|         merge = loc: defs:
             |                      ^
          763|           (base.extendModules {

       error: The option `boot.initrd.extraFiles."/etc/mdadm.conf".enable' does not exist. Definition values:
       - In `/nix/store/j6d62blp6npckwcj6cjxf0hyga4w94c1-source/nixos/modules/tasks/swraid.nix': true

Manually editing system.stateVersion to be 23.11 (so, ignoring the scary "Did you read the comment?" message) removed the warning and fixed my build.

@vcunat
Copy link
Member

vcunat commented Sep 11, 2023

nixpkgs-unstable channel seems to be blocked on this, BTW.
https://hydra.nixos.org/build/234796902/nixlog/1/tail

EDIT: oh, also nixos-unstable's jobset won't even evaluate.

@ctheune
Copy link
Contributor Author

ctheune commented Sep 11, 2023

@vcunat yeah, I've already prepared #254429 that fixes this and it's waiting to be merged.

@ncfavier
Copy link
Member

ncfavier commented Oct 2, 2023

The default ISO image configuration now raises a warning.

@SuperSandro2000
Copy link
Member

Maybe we should add an option to disable the mon service sending mail for the iso.

@ctheune
Copy link
Contributor Author

ctheune commented Nov 15, 2023

Maybe we should add an option to disable the mon service sending mail for the iso.

The ISO can simply set a non-sensical e-mail-adress like "nobody@example.com".

Disabling individual services is really a fight against windmills - we'd have to talk to upstream about that as they enable/disable services automatically depending on various udev events...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet