-
-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nixos/slurm: rewrite module to RFC 0042 style settings #161815
base: master
Are you sure you want to change the base?
Conversation
aa15fd0
to
d3283e0
Compare
@GrahamcOfBorg test slurm |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/breaking-changes-announcement-for-unstable/17574/8 |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
b068173
to
007d239
Compare
Rebased for 22.11 release. |
So sorry this did not get any attention, can you fix the conflicts? I'm interested into getting this merged. Though, I do not have extensive experience with Slurm. |
@RaitoBezarius There seems to be a bug that causes an infinite recursion which still needs to be fixed. However, due to the lack of feedback I did not prioritize working on it lately. It would also be good to get some feedback on the "settings" conversion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would advise keeping the legacy options for a release cycle, while deprecating them.
Then, we can proceed to remove them.
I don't know how many NixOS users of slurm are out there, but I think it would be annoying to find out about if some of them do not read the release notes.
''; | ||
}; | ||
|
||
controlMachine = mkOption { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO all the legacy options should be kept at least for one release cycle while putting a deprecation warning.
Then, we can all mkRemovedOption all of them after this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to make a hard change here for the following reason. It may lead to confusion when both, legacy and settings
are used. Which one should be used if it affects one an the same setting? It would make the module more messy trying to handle both. The conversion of a config file should be rather straight forward.
We could use mkRenamedOptionModule
where applicable and mkRemovedOption
for the rest?
EDIT: I added this version now to the module to ease the transition.
cb49a3f
to
7fa2a11
Compare
@posch Are you using |
No, I'm not. I'm using a stand-alone slurm.conf. My Nixos config basically contains only the systemd services:
|
example = literalExpression '' | ||
settings = { | ||
SlurmctldHost = "control"; | ||
nodeName = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that Nix attribute sets seem to be the obvious choice for Node lines, but it looks a bit overengineered to me. I usually copy+paste Node lines from the output of slurmd -C:
# /.../slurmd -C
NodeName=... CPUs=192 Boards=1 SocketsPerBoard=4 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=772645
Is there a concrete advantage of having this split up and rewritten into an attribute set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The advantages that I see is that you can now (1) access the parameters for a node or a partition directly and use them in another place in your config (e.g. for CPUs or memory). (2) The attributes can be merged, when they are defined in separate locations. This is otherwise only possible if you define nodeName
, partitionName
separately (now also nodeSet
exists). (3) It makes the module easier to maintain and can automatically handle new features without the need to modify the module.
It may look over engineered at first glance, but I think it is more consistent to go all the way here.
dbdserver.storagePass and .configFile were already removed in NixOS-21.03. Add moved and removed options.
f144184
to
b68d61b
Compare
b68d61b
to
39f699a
Compare
@infinisil @RaitoBezarius Are good to go, or do you have other comments? |
ProctrackType=${cfg.procTrackType} | ||
${cfg.extraConfig} | ||
''; | ||
configFile = settingsFormat.generate "slurm.conf" cfg.settings; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ease transition and offer escape hatches (as I know of one complicated NixOS module where it would have been great to have this), I wish this could be an actual option overrideable by the user who could just bypass settings
and write the configuration file itself and the settings would be the defaults.
Plus, an assertion could be done to prevent settings & config file being written.
Once this is done, I think it would be fine to merge it. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate the change, but it was not what I suggested — adding a extraConfig
is not the same as adding an escape hatch that replaces the whole configFile
and is an acceptable alternative to RFC42 rather than stringly-typed extraConfig
IMHO (because the user can plug his generator).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. I put the extraConfig
back in. Its content now gets appended. You can now override settings
completely by just setting it to an empty set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you a lot for your patience!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I see what you mean now. Maybe as it is right now, it is a good solution? Using an alternative configFile
adds more complexity as we need to define StateSaveLocation
and SlurmSpoolDir
separately, since they are needed by the service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more or less fine, the only problem with the current solution is that it's not clear we can sidestep completely all the settings, but I think it's fine for now.
@ofborg test slurm |
Motivation for this change
Use free-form
settings
following RFC 0042 to generate config files. This should make the module more flexible, but it also is a breaking change, which requires users to adapt their config. In either case, it should make the module easier to maintain.See also #144575
Things done
slurm.conf
,slurmdbd.cong
, andcgroup.conf
SLURM_CONF
to all binaries. It is now exported directly as a system-wide environment variable.sandbox = true
set innix.conf
? (See Nix manual)nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)nixos/doc/manual/md-to-db.sh
to update generated release notes