Access the config from within a job #3619

Open
multimeric opened this issue May 18, 2021 · 6 comments

Comments

@multimeric
Contributor

multimeric commented May 18, 2021

I'm trying to fix toil-container, with reference to #1768. One key aspect of this is being able to set --singularity or --docker on the command line, store that in the Toil options, and then check this value later when we go to run a container. How can a running job access the Toil options?

┆Issue is synchronized with this Jira Story
┆Issue Number: TOIL-903

@mr-c
Contributor

mr-c commented May 18, 2021

This is for non-toil-cwl-runner scripts, so please make sure that the new --docker option does not flow to toil-cwl-runner and the new --singularity option does not override toil-cwl-runner's --singularity option. Thanks!

@multimeric
Contributor Author

Yes, more likely it would be --container-engine docker or similar.

@mr-c
Contributor

mr-c commented May 18, 2021

> Yes more likely it would be --container-engine docker or similar

The name(s) can overlap, but we need to make sure it doesn't show up as a toil-cwl-runner option, as toil-cwl-runner has its own methods.

@adamnovak
Member

This would also be relevant to #4142 if we want to make the way Toil passes around its config information a little more extensible.

The Toil architecture is to take the options object from argparse and copy a bunch of information into a Config instance. So the original argparse Namespace isn't actually available for jobs to retrieve. Usually when I've written Python pipelines, I've ended up just passing it along to all my jobs as an argument.
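
A minimal sketch of why a custom option gets lost and of the pass-it-as-an-argument workaround, using only argparse. `StandInConfig`, its `KNOWN_FIELDS` and `run_container_job()` are hypothetical stand-ins, not Toil's real `Config` or job code; `--container-engine` is the option name floated earlier in this thread.

```python
import argparse


class StandInConfig:
    """Hypothetical stand-in for Toil's Config: copies only the fields it knows about."""

    KNOWN_FIELDS = ("jobStore", "defaultMemory")

    def __init__(self) -> None:
        self.jobStore = None
        self.defaultMemory = None

    def set_options(self, options: argparse.Namespace) -> None:
        # Only known fields are copied; anything else on the Namespace is dropped.
        for name in self.KNOWN_FIELDS:
            setattr(self, name, getattr(options, name))


def run_container_job(container_engine: str) -> str:
    # Hypothetical job body: the engine arrives as an explicit argument.
    return f"running tool with {container_engine}"


parser = argparse.ArgumentParser()
parser.add_argument("jobStore")
parser.add_argument("--defaultMemory", default="2G")
parser.add_argument("--container-engine", choices=["docker", "singularity"], default="docker")
options = parser.parse_args(["./my-jobstore", "--container-engine", "singularity"])

config = StandInConfig()
config.set_options(options)

print(hasattr(config, "container_engine"))          # False: the user option never reached the config
print(run_container_job(options.container_engine))  # running tool with singularity
```

In an actual Toil workflow, the analogous move would be to hand the value to each job yourself, for example as an extra argument to `Job.wrapJobFn()` or to a custom `Job` subclass's constructor.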

Since the JobDescription refactor, we have jobs keeping references to the Toil Config in their JobDescriptions, which we use for filling in default resource requirements from the config when they are not set at the job level. When the job is deserialized, it is hooked up to the config by calling assignConfig() on it.

So if you want a custom job class to get ahold of the config, you could override assignConfig() and stash it somewhere where it won't get pickled again, or you can look at self.description._config.
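
The "override assignConfig() and stash it where it won't get pickled again" idea can be sketched with plain `pickle`. `StandInJob` is a hypothetical stand-in for `toil.job.Job` that only models the `assignConfig()` hook; `ContainerJob`, the `_local_config` attribute and the argument-less `run()` are illustrative simplifications (Toil's real `run()` takes a file store argument). Dropping the attribute in `__getstate__` is one possible way to keep it out of the pickle, not necessarily how Toil itself handles its config.

```python
import pickle
from types import SimpleNamespace
from typing import Any, Dict, Optional


class StandInJob:
    """Hypothetical stand-in for toil.job.Job; only the assignConfig() hook is modelled."""

    def assignConfig(self, config: Any) -> None:
        # In Toil this hooks the config up to the job's description; here it does nothing.
        pass


class ContainerJob(StandInJob):
    def __init__(self) -> None:
        super().__init__()
        self._local_config: Optional[Any] = None

    def assignConfig(self, config: Any) -> None:
        super().assignConfig(config)
        # Keep our own handle on the config for use inside run().
        self._local_config = config

    def __getstate__(self) -> Dict[str, Any]:
        # Drop the stashed config so it is not pickled along with the job again.
        state = self.__dict__.copy()
        state["_local_config"] = None
        return state

    def run(self) -> str:
        engine = getattr(self._local_config, "container_engine", "docker")
        return f"running tool with {engine}"


job = ContainerJob()
job.assignConfig(SimpleNamespace(container_engine="singularity"))
print(job.run())  # running tool with singularity

restored = pickle.loads(pickle.dumps(job))
print(restored._local_config)  # None until assignConfig() is called again on the worker
restored.assignConfig(SimpleNamespace(container_engine="docker"))
print(restored.run())  # running tool with docker
```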

A real solution to this would probably involve:

  1. A way to get the config from a getter method, without digging into the internals of the JobDescription which might change.
  2. A good way to actually send user data along with the Toil config, maybe letting the user hook their options into a more-unified Toil config-file/option/environment-var/config-object system, or maybe just giving the user a free-form Namespace they can stick stuff in that the Toil Config will carry along.
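
The two parts of that proposal could look roughly like this. Everything here is hypothetical: `ProposedConfig`, its free-form `user` namespace and `ProposedJob.get_config()` only illustrate the idea and are not existing Toil APIs.

```python
import argparse
from types import SimpleNamespace
from typing import Optional


class ProposedConfig:
    """Hypothetical config carrying a free-form namespace for user options."""

    def __init__(self) -> None:
        self.defaultMemory: Optional[str] = None
        # Free-form bag the workflow author can fill; carried along untouched.
        self.user = SimpleNamespace()


class ProposedJob:
    """Hypothetical job exposing the config through a stable getter."""

    def __init__(self) -> None:
        self._config: Optional[ProposedConfig] = None

    def assignConfig(self, config: ProposedConfig) -> None:
        self._config = config

    def get_config(self) -> ProposedConfig:
        # Public accessor, so callers need not reach into JobDescription internals.
        if self._config is None:
            raise RuntimeError("config has not been assigned to this job yet")
        return self._config


parser = argparse.ArgumentParser()
parser.add_argument("--container-engine", default="docker")
options = parser.parse_args(["--container-engine", "singularity"])

config = ProposedConfig()
config.user.container_engine = options.container_engine  # the user hooks their option in

job = ProposedJob()
job.assignConfig(config)
print(job.get_config().user.container_engine)  # singularity
```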

@adamnovak
Member

If you want to get at the object on the leader, it would be in the config field on the Toil context manager, when you are inside it.

@adamnovak
Member

We should come up with a good way to make the config system as updated in #4569 officially user-extensible, and document it with an example in the docs.

The workaround is to just cram more fields into it on the leader, and reach into the internals of Toil to get it from the current job on the worker.
