Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Use Troika directives in TroikaHost (#31) #34

Merged
merged 9 commits into from
Jul 19, 2024

Conversation

oiffrig
Copy link
Collaborator

@oiffrig oiffrig commented Nov 21, 2023

(#31) Converts the Slurm directives in TroikaHost to Troika directives

WIP: directives kept as plain Slurm for now:

  • distribution, reservation are supported from Troika 0.2.2 only, support disabled for now
  • tmpdir, sthost are ECMWF-specific, and should be in the local Troika configuration rather than here
  • hint is not supported by Troika in its most general form, is it needed past --[no]multithread?

@FussyDuck
Copy link

FussyDuck commented Nov 21, 2023

CLA assistant check
All committers have signed the CLA.

pyflow/host.py Outdated
@@ -1014,35 +1014,70 @@ def script_submit_arguments(self, submit_arguments):
"""
Accepted submit arguments:
"""

TROIKA_0_2_2 = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the goal of this? To set it to True once Troika 0.2.2 is deployed on the HPC? Shouldn't we put this in the constructor to allow users to change it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was the idea. Not sure how best to implement it, to be honest. The nicest would be to wait till Troika 0.2.2 is widely adopted, but if we want to go forward we will need a way for the user to set this flag indeed.

pyflow/host.py Outdated
}

slurm_resources = {
"tmpdir": "--gres=ssdtmp:", # XXX: ECMWF-specific
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be in the troika config? We would then rely here on the troika tmdir_size definition instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should. I left it here for backwards compatibility, but I would very much prefer using the Troika directive with an overload in the local config.

@corentincarton
Copy link
Collaborator

@oiffrig, I just had a chat with some colleagues from production who are running on LUMI and they need new options like "#SBATCH --partition=standard-g" and "#SBATCH --exclusive". At the moment, we always need to add manually the new options in pyflow, which is not very flexible. Couldn't we have a special key in the submit arguments (user_defined for instance) that would give a list of directives. For instance:

execution_contexts:
    parallel:
        nodes: 40
        tasks_per_node: 8 
        user_defined:
             - "#SBATCH --exclusive"
             - "#SBATCH --partition=standard-g"
             - "#TROIKA --my_troika_option=values"

It's the first design that comes out of my mind so happy to discuss if you have a better idea!

@oiffrig
Copy link
Collaborator Author

oiffrig commented Dec 11, 2023

Could be an option, I don't have an issue with that. What I'm doing in Troika is allowing to configure a mapping that extends/overrides the set of directives. We could have something similar in pyflow, although such a mapping is very much host-dependent.

@oiffrig
Copy link
Collaborator Author

oiffrig commented Dec 11, 2023

Also, Troika supports #TROIKA partition=foo already, I just need to add it to the TroikaHost

@corentincarton
Copy link
Collaborator

That's a good point.. maybe we could extend the mapping in the constructor. We could then have a special configuration in wellies for atos or lumi for instance.

@oiffrig
Copy link
Collaborator Author

oiffrig commented Dec 12, 2023

I was wondering, do you think it would be useful to pass all submit arguments to Troika, keeping the overrides where necessary? That way, if somebody wants to use a new Troika directive, they will not be limited by pyflow supporting it. We can easily keep the translation where needed.

@corentincarton
Copy link
Collaborator

corentincarton commented Dec 12, 2023

I'm just wondering if we shouldn't just drop the translation and throw an error to force people to use the troika directives instead... It is the TroikaHost after all!
EDIT: OK, maybe not an error but a deprecated warning ;)

@corentincarton corentincarton marked this pull request as ready for review July 12, 2024 09:26
@corentincarton corentincarton merged commit 0adbcb3 into develop Jul 19, 2024
5 checks passed
@corentincarton corentincarton deleted the feature/troika-directives branch July 19, 2024 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants