
Forward arbitrary environment variables over SSH #5709

Merged
merged 7 commits into cylc:master on Oct 29, 2023

Conversation

ScottWales
Contributor

@ScottWales ScottWales commented Aug 29, 2023

Allow users to specify environment variables that get forwarded from the submit host to the workflow host.

Define variables to forward in global.cylc like:

```
[scheduler]
    [[run hosts]]
        ssh forward environment variables = PROJECT, LUSTRE_DISK
```

This will add PROJECT and LUSTRE_DISK to the list of variables exported in SSH commands to launch the scheduler on remote hosts (if they have been set in the current environment).

Once they are available on the scheduler they can be used in the global.cylc templating, e.g:

```
#!Jinja2
[scheduler]
    [[run hosts]]
        ssh forward environment variables = PROJECT, LUSTRE_DISK

[install]
    [[symlink dirs]]
        [[[hpc]]]
            run = {{ environ['LUSTRE_DISK'] }}

[platforms]
    [[hpc]]
        job runner = pbs
        [[[directives]]]
            -P = {{ environ['PROJECT'] }}
```

See #5418

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • CHANGES.md entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required: Document ssh forward environment variables (cylc-doc#650).
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

Allow users to specify environment variables that get forwarded from the
submit host to the scheduler and task submission.

Define variables to forward in `global.cylc` like:
```
[platforms]
    [[localhost]]
        ssh forward environment variables = PROJECT, LUSTRE_DISK
```

This will add `PROJECT` and `LUSTRE_DISK` to the list of variables
exported in SSH commands to launch the scheduler on remote hosts (if
they have been set in the current environment).

Once they are available on the scheduler they can further be forwarded
to other platforms, where they may interact with the batch scheduler to
set a default project code, or be used to set storage paths:
```
[platforms]
    [[mtu]]
        ssh forward environment variables = PROJECT, LUSTRE_DISK
        install target = mtu

[install]
    [[symlink dirs]]
        [[[mtu]]]
            share = $LUSTRE_DISK/$PROJECT/$USER
```
@hjoliver
Member

hjoliver commented Aug 30, 2023

Thanks @ScottWales !

This is needed for "project-dir" symlinking if job platforms don't share the local FS (which happily is not the case at my site, but I was aware it would be a problem in some places).

@hjoliver
Member

hjoliver commented Sep 13, 2023

@ScottWales -

This branch works for run-directory symlinking on job hosts, if the global symlink config is expressed in terms of raw environment variables that will be evaluated on the remote end during job host init (i.e. the first time a task runs on the job platform).

However, we don't actually need that, because we can use Jinja2 in global config to evaluate the (e.g.) PROJECT variable on the scheduler run host and send the result over, rather than sending the local value over and evaluating the variable remotely.

Further, I'm pretty sure your branch doesn't do this bit:

export [the env vars] in SSH commands to launch the scheduler on remote hosts

because the ssh command for re-invocation of cylc play on a scheduler run host gets constructed differently (by another function in the remote module that you've modified), and it doesn't get access to the platforms config (where your ssh forward environment variables setting lives), because "platforms" configure job hosts, not scheduler run hosts.

We do need forwarding to the scheduler run host (the bit that I think doesn't work on this branch) IF the variable is just a transient environment variable on the login host where the user types cylc play, because the global config will be evaluated on the scheduler run host.

Sorry if I've caused some confusion by not thinking this through clearly myself!

@hjoliver
Copy link
Member

hjoliver commented Sep 13, 2023

Ignoring scheduler run hosts for the moment (imagine we run the scheduler on the login host), consider symlinking run directories to $LUSTRE_DISK/$PROJECT where $LUSTRE_DISK is in the default environment and $PROJECT needs to be set on the fly depending on the workflow.

```
[install]
    [[symlink dirs]]
        [[[remote]]]  # where "remote" is the install target for your job platform
            run = "$LUSTRE_DISK/$PROJECT"
```

If I do it like this, the literal string $LUSTRE_DISK/$PROJECT gets forwarded and has to be evaluated at the remote end. That's the right thing to do for $LUSTRE_DISK, which might be platform dependent, but PROJECT is set locally and is workflow dependent. So for that to work we would have to send PROJECT to the remote so that $PROJECT can be evaluated there (which your PR branch in fact does).

But that's unnecessary because we can do this instead:

```
#!Jinja2
[install]
    [[symlink dirs]]
        [[[remote]]]  # where "remote" is the install target for your job platform
            run = "$LUSTRE_DISK/{{ environ['PROJECT'] }}"
```

This Jinja2 gets processed when the global config is parsed by the scheduler, so if PROJECT evaluates to weather (say) on the scheduler run host, we'll send $LUSTRE_DISK/weather to the remote for symlinking - no need to forward the PROJECT variable.

@hjoliver
Member

hjoliver commented Sep 13, 2023

Now I'll put scheduler run hosts back on the table.

In this case, if $PROJECT is defined on the fly by the user on the login host (where they type cylc play) then PROJECT will need to be forwarded to the run host, so that it's present in the environment when the global config gets parsed at scheduler start-up.

Alternatively a pre-configure plugin could (I think) read PROJECT from rose-suite.conf or some other workflow source file at scheduler start-up. Then there's no need to forward PROJECT from the login host.
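For illustration only, the kind of source-file definition imagined here might be a rose-suite.conf entry like the following (a minimal sketch, assuming the cylc-rose pre-configure plugin and that an [env] entry would be read early enough, which is exactly the uncertainty flagged above):
```
# rose-suite.conf (hypothetical sketch): define PROJECT in the workflow
# source file rather than on the fly on the login host.
[env]
PROJECT=weather
```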

However, I think we need to support forwarding to run hosts anyway, because if workflow-specific symlinking is needed then setting PROJECT on the fly (rather than in a source file) would be entirely reasonable.

But we'll need to shift your new global config setting to here (not under platforms):

```
[scheduler]
    [[run hosts]]
        # environment variables to forward from login host to scheduler run hosts
        ssh forward environment variables = PROJECT, ...
```

@hjoliver
Member

hjoliver commented Sep 13, 2023

So, do you want to rejig this branch a bit? (Change the location of the new config item, and move the loading of it to the ssh command used for cylc play re-invocation on run hosts, not the ssh to job platforms.)

@hjoliver
Member

hjoliver commented Sep 13, 2023

And we're going to need to document this well, so that users (or site admins at least) don't have to figure it out the hard way.

@ScottWales
Contributor Author

Thanks @hjoliver. I did also want variables forwarded to the job host to be picked up by qsub for setting up accounting defaults, but that can also be handled through directives, and it's probably better to be explicit, as you say.

because the ssh command for re-invocation of cylc play on a scheduler run host gets constructed differently (another function in the remote module that you've modified) and it doesn't get access to platforms config (for your ssh forward environment variables) because "platforms" configure job hosts, not scheduler run hosts.

The cylc play re-invocation appears to be using the 'localhost' platform at:

```python
return remote_cylc_cmd(
    cmd,
    get_platform(),  # use localhost settings
    host=host,
    **kwargs,
)
```

Could you please clarify if you're meaning something different?

I'm happy to move the configuration option under [run hosts]; it seems sensible to have just one place to set it, since that avoids needing the localhost and run hosts settings to match up.

I agree on the documentation once the change is ironed out.

@hjoliver
Member

hjoliver commented Sep 13, 2023

I did also want to have variables forwarded to the job host to be picked up by qsub for setting up accounting defaults, but that can also be handled through directives and it's probably better to be explicit as you say.

OK interesting, I hadn't appreciated that you can set PBS options via environment rather than job script directives. If that is something you might want to do, we'll need to think more carefully about where to put the new config item.

The cylc play re-invocation appears to be using the 'localhost' platform at

Ah, do you mean it was picking up the ssh forward variables from the localhost platform, for reinvoking cylc play?

If so, that's a "platforms" detail I had forgotten or missed.

@wxtim - should command invocation on run hosts be using localhost platforms config, if run hosts are not also job hosts?

ScottWales added a commit to ScottWales/cylc-doc that referenced this pull request Sep 14, 2023
@wxtim
Member

wxtim commented Sep 20, 2023

@wxtim - should command invocation on run hosts be using localhost platforms config, if run hosts are not also job hosts?

Looking at the code, I think the only bits of platform information used by the re-invocation (if host is given) are

  • ssh command
  • use login shell
  • cylc path

I think that run hosts should be similar enough to localhost that this is correct?
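For reference, those three settings sit under the localhost platform in global.cylc; a minimal sketch with illustrative values (the values shown are examples, not necessarily the shipped defaults):
```
[platforms]
    [[localhost]]
        # settings consulted when re-invoking cylc play on a run host
        ssh command = ssh -oBatchMode=yes -oConnectTimeout=10
        use login shell = True
        cylc path = /opt/cylc/bin
```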

@oliver-sanders
Member

oliver-sanders commented Sep 21, 2023

But we'll need to shift your new global config settings, to here (not under platforms):

```
[scheduler]
    [[run hosts]]
        # environment variables to forward from login host to scheduler run hosts
        ssh forward environment variables = PROJECT, ...
```

@hjoliver, these days the scheduler host uses the config from [platforms][localhost] as loaded on the scheduler host.

See:

@hjoliver
Member

@hjoliver, these days the scheduler host uses the config from [platforms][localhost] as loaded on the scheduler host.

Roger that. Sorry if I put you wrong there @ScottWales

Still, @oliver-sanders, we have the scheduler run hosts section. If this PR is only about forwarding environment variables to the run host (from the login host where the user invokes cylc play) - as I initially thought - then presumably this should not be localhost platforms config, because that is primarily intended for running local jobs. Do you agree?

So I think we need to definitively decide whether these variables need to be forwarded to job hosts as well.

From @ScottWales :

I did also want to have variables forwarded to the job host to be picked up by qsub for setting up accounting defaults, but that can also be handled through directives and it's probably better to be explicit as you say.

I'm not currently a PBS user. Can you set PBS directives via the environment? And if so, do we need to support that, or should we recommend --account = {{ environ["ACCOUNT"] }} (or whatever the PBS directive is), to be evaluated during file parsing at start-up on the scheduler host?
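For concreteness, the directive-based recommendation being floated would look something like this in global.cylc (a sketch; the platform name and the -P directive just follow the PBS example earlier in this thread):
```
#!Jinja2
[platforms]
    [[hpc]]
        [[[directives]]]
            # evaluated when the scheduler parses global.cylc at start-up,
            # so no environment forwarding to the job host is needed
            -P = {{ environ['PROJECT'] }}
```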

@ScottWales
Contributor Author

I am happy for this to be strictly a run hosts thing, and to set any PBS flags explicitly via directives. I'm not aware of any settings that have to come from the environment instead of directives; the environment just gives defaults.

In terms of implementation, the logical place to put it is with the rest of the environment variables in construct_ssh_cmd. These get sent regardless of the target host, so as an implementation detail the variables will be sent to task hosts as well. A filter based on the hostname could be added, but that seems a bit too complex to me.

@oliver-sanders
Member

oliver-sanders commented Sep 22, 2023

this PR is only about forwarding environment variables to the run host
Do you agree?

No.

I think it makes more sense to put this option in [platforms], where we're already configuring the ssh command. In this more general section the config might be able to serve other use cases.

It's also easier to implement this way, as the platform is already passed through to construct_ssh_cmd, whereas pulling it out of [scheduler][run hosts] would presumably require comparing the FQDN of localhost to the hosts available in [scheduler][run hosts] in order to restrict this functionality to the run hosts.

@ScottWales
Contributor Author

I've removed forwarding variables to task hosts from the documented behaviour and updated the pull request description.

cylc/flow/cfgspec/globalcfg.py (outdated review thread, resolved)
```
@@ -64,7 +64,7 @@ requests_).
 - Prasanna Challuri
 - David Matthews
 - Tim Whitcomb
 - (Scott Wales)
```
Member

🎉

Comment on lines 303 to 305:

```python
*(glbl_cfg().get(['scheduler'])
    ['run hosts']
    ['ssh forward environment variables']),
```
Member

@oliver-sanders oliver-sanders Sep 27, 2023


This configuration will apply to all SSH commands made by Cylc, not just the one made by cylc play, and not just SSHes to or from the scheduler run hosts.

For context, here are some examples of SSH use in Cylc:

  • play (client => scheduler-host): Automatic distribution of workflows onto scheduler hosts.
  • clean (client => remote-platform): Removal of files on remote platforms.
  • job-submission (scheduler-host => remote-platform): Submit jobs to remote platforms.

Suggest moving the configuration into the [platforms] section:

```
[platforms]
    [[myplatform]]
        ssh forward environment variables = FOO, BAR, PROJECT
```

It can then be used here like so:

Suggested change:

```diff
-*(glbl_cfg().get(['scheduler'])
-    ['run hosts']
-    ['ssh forward environment variables']),
+*platform['ssh forward environment variables'],
```

Ping @hjoliver, following his earlier comment which led in this direction. In order to configure this in run hosts and have it apply only to run-host comms, we would need to compare the FQDN of the host we are contacting to determine whether it is in run hosts in the first place. The [platforms][localhost] section is already used for all run-host SSHs, where it configures the ssh command etc. for things including cylc play and workflow auto-migration (which this feature would also need to cover). So we might as well configure this in platforms, opening this functionality up to other uses, right?
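To make that concrete: with the per-platform approach, forwarding to scheduler run hosts is governed by the localhost platform, while forwarding to a job platform is set on that platform's own section (a sketch; the job platform name is illustrative):
```
[platforms]
    [[localhost]]
        # used for run-host SSH, e.g. cylc play re-invocation and auto-migration
        ssh forward environment variables = PROJECT
    [[hpc]]
        # used for SSH to this job platform
        ssh forward environment variables = PROJECT, LUSTRE_DISK
```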

Contributor Author


Does it make sense to have different platforms with different forwarded variables? Any variable used by a specific platform will also need to be sent to the scheduler for it to work properly; I can see things becoming confusing if they get out of sync.

Member

@oliver-sanders oliver-sanders Sep 28, 2023


I don't have any use cases in mind for per-platform configuration. It could potentially make sense, e.g. for your use case, if the project codes differ from one platform to another. There might also be other use cases for this sort of functionality, e.g. configuring things at the Cylc level that you might otherwise have to configure in shell profile files.

The options for implementation are either a per-platform configuration, or a global configuration (as implemented). IMO it would make more sense to colocate this with the other SSH/rsync configurations, but a global config is ok too. I think putting the global configuration in the run hosts section is a bit misleading, as it also configures SSH commands which are neither to nor from the run hosts.

Note we don't currently have platform inheritance, which makes the per-platform configuration a little clunkier to configure than it strictly needs to be. Inheritance was planned as a more convenient way of sharing configuration between multiple platforms; however, we haven't got around to it yet.

@hjoliver
Member

Ok @ScottWales, let's go with Oliver's suggestion.

@ScottWales
Contributor Author

Not a problem; I've updated the branch to apply the configuration per-platform, with cylc/cylc-doc#650 updated to match. Test failures appear unrelated to this change.

Member

@oliver-sanders oliver-sanders left a comment


👍

Member

@hjoliver hjoliver left a comment


Thanks @ScottWales 👍

@hjoliver hjoliver merged commit abfe30d into cylc:master Oct 29, 2023
24 of 26 checks passed