Forward arbitrary environment variables over SSH #5709
Allow users to specify environment variables that get forwarded from the submit host to the scheduler and task submission.

Define variables to forward in `global.cylc` like:

```
[platforms]
    [[localhost]]
        ssh forward environment variables = PROJECT, LUSTRE_DISK
```

This will add `PROJECT` and `LUSTRE_DISK` to the list of variables exported in SSH commands to launch the scheduler on remote hosts (if they have been set in the current environment).

Once they are available on the scheduler they can further be forwarded to other platforms, where they may interact with the scheduler to set a default project code or be used to set storage paths:

```
[platforms]
    [[mtu]]
        ssh forward environment variables = PROJECT, LUSTRE_DISK
        install target = mtu
[install]
    [[symlink dirs]]
        [[[mtu]]]
            share = $LUSTRE_DISK/$PROJECT/$USER
```
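As a rough illustration of the behaviour described above (not the code in this branch), the forwarding could be sketched in Python as follows. The helper name `build_ssh_command`, its arguments, the host name, and the use of `env` on the remote side are all assumptions made for the example. The point it shows is that only variables actually set in the local environment get exported.

```python
import shlex


def build_ssh_command(host, remote_cmd, forward_vars, environ):
    """Return an ssh argv that exports selected variables on the remote side.

    Hypothetical helper for illustration only; not part of Cylc's API.
    """
    # Only forward variables that are set in the local environment.
    exports = [
        f"{name}={shlex.quote(environ[name])}"
        for name in forward_vars
        if name in environ
    ]
    # Quote each word so the command survives the remote shell.
    remote = " ".join(["env", *exports, *map(shlex.quote, remote_cmd)])
    return ["ssh", host, remote]


cmd = build_ssh_command(
    "scheduler-host",
    ["cylc", "play", "my-workflow"],
    ["PROJECT", "LUSTRE_DISK"],
    {"PROJECT": "w35", "HOME": "/home/user"},  # LUSTRE_DISK is not set
)
print(cmd)
```

Here `LUSTRE_DISK` is configured for forwarding but unset locally, so it is simply left out of the remote command rather than being exported as an empty value.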
Thanks @ScottWales! This is needed for "project-dir" symlinking if job platforms don't share the local FS (which happily is not the case at my site, but I was aware it would be a problem in some places).
This branch works for run-directory symlinking on job hosts, if the global symlink config is expressed in terms of raw environment variables that will be evaluated on the remote end during job host init (i.e. the first time a task runs on the job platform). However, we don't actually need that, because we can use Jinja2 in the global config to evaluate the (e.g.) PROJECT variable on the scheduler run host and send the result over, rather than sending the local value over and evaluating the variable remotely. Further, I'm pretty sure your branch doesn't do this bit:
because the ssh command for re-invocation of We do need forwarding to the scheduler run host (the bit that I think doesn't work on this branch) IF it is just a transient environment variable on the login host where the user types Sorry if I've caused some confusion by not thinking this through clearly myself! |
Ignoring scheduler run hosts for the moment (imagine we run the scheduler on the login host), consider symlinking run directories to

```
[install]
    [[symlink dirs]]
        [[[remote]]]  # where "remote" is the install target for your job platform
            run = "$LUSTRE_DISK/$PROJECT"
```

If I do it like this, the literal string But that's unnecessary because we can do this instead:

```
#!Jinja2
[install]
    [[symlink dirs]]
        [[[remote]]]  # where "remote" is the install target for your job platform
            run = "$LUSTRE_DISK/{{ environ['PROJECT'] }}"
```

This Jinja2 gets processed when the global config is parsed by the scheduler, so if PROJECT evaluates
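The difference between the two forms is where each variable gets evaluated. A minimal Python sketch of the idea, using `string.Template` purely as a stand-in for Jinja2 (the helper name `expand_on_scheduler` and the value `w35` are made up for the example): `PROJECT` is filled in on the scheduler host, while `$LUSTRE_DISK` stays a literal for the remote shell to expand.

```python
from string import Template


def expand_on_scheduler(value, local_vars, environ):
    """Expand only `local_vars` now; leave other $VARS for the remote shell.

    string.Template stands in for Jinja2 here, for illustration only.
    """
    known = {name: environ[name] for name in local_vars if name in environ}
    # safe_substitute leaves unknown placeholders (e.g. $LUSTRE_DISK) untouched.
    return Template(value).safe_substitute(known)


scheduler_env = {"PROJECT": "w35"}  # hypothetical value on the scheduler host
run_dir = expand_on_scheduler(
    "$LUSTRE_DISK/$PROJECT", ["PROJECT"], scheduler_env
)
print(run_dir)
```

The string sent to the job host then already contains the project code, so nothing needs to be forwarded for that variable.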
Now I'll put scheduler run hosts back on the table. In this case, if Alternatively a pre-configure plugin could (I think) read PROJECT from However, I think we need to support forwarding to run hosts anyway, because if workflow-specific symlinking is needed then setting PROJECT on the fly (rather than in a source file) would be entirely reasonable. But we'll need to shift your new global config settings, to here (not under

```
[scheduler]
    [[run hosts]]
        # environment variables to forward from login host to scheduler run hosts
        ssh forward environment variables = PROJECT, ...
```
So, do you want to rejig this branch a bit? (change the location of the new config item, and move the loading of it to the ssh command that gets used for |
And we're going to need to document this well, so that users (or site admins at least) don't have to figure it out the hard way. |
Thanks @hjoliver. I did also want to have variables forwarded to the job host to be picked up by qsub for setting up accounting defaults, but that can also be handled through directives and it's probably better to be explicit as you say.
The cylc play re-invocation appears to be using the 'localhost' platform at Lines 438 to 443 in 2536745
Could you please clarify if you're meaning something different? I'm happy to move the configuration option to under I agree on the documentation once the change is ironed out. |
Force-pushed from dde4bb9 to 9707f0a
OK interesting, I hadn't appreciated that you can set PBS options via environment rather than job script directives. If that is something you might want to do, we'll need to think more carefully about where to put the new config item.
Ah, do you mean it was picking up the ssh forward variables from the localhost platform, for reinvoking If so, that's a "platforms" detail I had forgotten or missed. @wxtim - should command invocation on run hosts be using localhost platforms config, if run hosts are not also job hosts? |
Looking at the code, I think the only bits of platform information used by the re-invocation (if host is given) are
I think that run hosts should be similar enough to localhost that this is correct? |
@hjoliver, these days the scheduler host uses the config from See: |
Roger that. Sorry if I put you wrong there @ScottWales Still, @oliver-sanders , we have the scheduler So I think we need to definitively decide whether these variables need to be forwarded to the job host as well. From @ScottWales :
I'm not currently a PBS user. Can you set PBS directives via the environment? And if so, do we need to support that, or should we recommend |
I am happy for this to be strictly a run hosts thing, and set any PBS flags explicitly via directives. I'm not aware of any settings that have to come from the environment instead of directives, the environment just gives defaults. In terms of implementation the logical place to put it is with the rest of the environment variables in |
No. I think it makes more sense to put this option in platforms where we're already configuring the It's also easier to implement this way as the |
I've removed forwarding variables to task hosts from the documented behaviour and updated the pull request description. |
```
@@ -64,7 +64,7 @@ requests_).
 - Prasanna Challuri
 - David Matthews
 - Tim Whitcomb
 - (Scott Wales)
```
🎉
cylc/flow/remote.py (outdated)

```
*(glbl_cfg().get(['scheduler'])
    ['run hosts']
    ['ssh forward environment variables']),
```
This configuration will apply to all SSH commands made by Cylc, not just the one made by `cylc play`, and not just SSH connections to or from the scheduler run hosts.
For context, here are some examples of SSH use in Cylc:
- play (client => scheduler-host): Automatic distribution of workflows onto scheduler hosts.
- clean (client => remote-platform): Removal of files on remote platforms.
- job-submission (scheduler-host => remote-platform): Submit jobs to remote platforms.
Suggest moving the configuration into the `[platforms]` section:

```
[platforms]
    [[myplatform]]
        ssh forward environment variables = FOO, BAR, PROJECT
```

It can then be used here like so:

```
- *(glbl_cfg().get(['scheduler'])
-     ['run hosts']
-     ['ssh forward environment variables']),
+ *platform['ssh forward environment variables'],
```
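To make the per-platform idea concrete, here is a small Python sketch of how each kind of SSH call could pick up its own list. The `platforms` dict is a hypothetical stand-in for the parsed `[platforms]` config, and `forwarded_vars` is an illustrative helper, not an existing Cylc function.

```python
def forwarded_vars(platform, environ):
    """Map each configured variable that is set locally to its value."""
    names = platform.get("ssh forward environment variables", [])
    return {name: environ[name] for name in names if name in environ}


# Hypothetical parsed [platforms] config and local environment.
platforms = {
    "localhost": {"ssh forward environment variables": ["PROJECT"]},
    "myplatform": {
        "ssh forward environment variables": ["FOO", "BAR", "PROJECT"]
    },
    "other": {},
}
environ = {"PROJECT": "w35", "FOO": "1"}

# play (client => scheduler-host) would use the localhost platform.
to_scheduler = forwarded_vars(platforms["localhost"], environ)
# job-submission (scheduler-host => remote-platform) uses the job platform.
to_job_host = forwarded_vars(platforms["myplatform"], environ)
# A platform with no setting forwards nothing.
to_other = forwarded_vars(platforms["other"], environ)
```

With this shape, the SSH-building code only needs the platform it is already given, with no extra lookup in the `[scheduler]` section.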
Ping @hjoliver from his earlier comment which led in this direction. In order to configure this in `run hosts` and have it apply only to run host comms, we would need to compare the FQDN of the host name we are contacting to determine whether it is in `run hosts` in the first place. The `[platforms][localhost]` section is used for all run-host SSH calls, where it configures the `ssh command`, etc. for things including `cylc play` and workflow auto-migration (which this feature would also need to cover). So we might as well configure this in platforms, opening this functionality up to other uses, right?
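For reference, the FQDN comparison that a run-hosts-only setting would need might look something like the Python sketch below. The function `is_run_host` and the host names are hypothetical; the resolver is injectable so the example does not depend on real DNS.

```python
import socket


def is_run_host(host, run_hosts, resolve=socket.getfqdn):
    """Return True if `host` resolves to the same FQDN as any run host."""
    target = resolve(host)
    return any(resolve(candidate) == target for candidate in run_hosts)


# A fake resolver keeps the example independent of real DNS.
fake_dns = {"sched1": "sched1.example.com", "login": "login.example.com"}


def fake_resolve(name):
    # Names without an entry are treated as already fully qualified.
    return fake_dns.get(name, name)


print(is_run_host("sched1.example.com", ["sched1"], fake_resolve))
print(is_run_host("login", ["sched1"], fake_resolve))
```

Every SSH call would have to pay for this lookup just to decide whether the setting applies, which is part of why reading it from the platform config is simpler.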
Does it make sense to have different platforms with different forwarded variables? Any variable used by a specific platform will also need to be sent to the scheduler for it to work properly, and I can see things becoming confusing if they are out of sync.
I don't have any use cases in mind for per-platform configuration. It could potentially make sense, e.g. for your use case if the project codes differ from one platform to another. There might potentially be other use cases for this sort of functionality e.g. configuring things at the Cylc level which you might otherwise have to configure in shell profile files.
The options for implementation are either a per-platform configuration, or a global configuration (as implemented). IMO it would make more sense to colocate this with the other SSH/rsync configurations, but a global config is ok too. I think putting the global configuration in the `run hosts` section is a bit too misleading as it also configures SSH commands which are neither to nor from the run hosts.
Note we don't currently have platform inheritance which makes the per-platform configuration a little clunkier to configure than it strictly needs to be. Inheritance was planned as a more convenient way of sharing configuration between multiple platforms, however, we haven't got around to it yet.
Ok @ScottWales, let's go with Oliver's suggestion. |
Not a problem, updated to have the configuration applied per-platform, with cylc/cylc-doc#650 updated to match. Test failures appear unrelated to this change. |
👍
Thanks @ScottWales 👍
Allow users to specify environment variables that get forwarded from the submit host to the workflow host.

Define variables to forward in `global.cylc` like:

This will add `PROJECT` and `LUSTRE_DISK` to the list of variables exported in SSH commands to launch the scheduler on remote hosts (if they have been set in the current environment).

Once they are available on the scheduler they can be used in the `global.cylc` templating, e.g:

See #5418
Check List
- `CONTRIBUTING.md` and added my name as a Code Contributor.
- `setup.cfg` (and `conda-environment.yml` if present).
- `CHANGES.md` entry included if this is a change that can affect users
- `?.?.x` branch.