Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support imported suite modules. #1829

Closed
hjoliver opened this issue May 4, 2016 · 13 comments
Closed

Support imported suite modules. #1829

hjoliver opened this issue May 4, 2016 · 13 comments

Comments

@hjoliver
Copy link
Member

hjoliver commented May 4, 2016

This supersedes #173.

[UPDATE] Description edited to clarify the basic concept - which is sufficient for all use cases I know of - and to remove (my) unnecessary speculation beyond that (sorry if this removes context for some comments below!)

We should support imported suite modules to allow sharing - instead of duplicating - of common (sub-)workflows.

Concept

A suite module

  • defines one or more workflows that can be imported into other suites, e.g. a build workflow to be imported into the R1 graph of another suite, and a processing workflow to be imported into its cycling graph.
  • can be stored like a normal suite (and may contain bin scripts and other files for use by its tasks)
  • (it may be a valid non-cycling suite by itself, for standalone testing purposes)
  • defines a module interface that allows it to be treated as a black box by the importing suite, e.g.:
    • inputs for its initial tasks,
    • output locations for its final products,
    • (some internal configurability could also be exposed to the importing suite)

Proposed Interface

%import MODULE as NAME
[scheduling]
    [[dependencies]]
         graph = get-data => proc-data => NAME
[runtime]
    [[NAME]]
         # (an externally defined workflow containing many tasks)
         [[[module interface]]]
               INPUT_DIR = $CYLC_SUITE_SHARE_DIR
               OUTPUT_DIR = $CYLC_SUITE_SHARE_DIR
               # etc.

When this suite is parsed, the module

  • will be installed into the suite from its source location
  • its [runtime] will be added to that of the importing suite,
    • with names qualified by "NAME" to avoid clashes,
    • and inside a family called "NAME"
  • its graph will be inserted into the suite graph, also with qualified task names
  • (at run time the full module workflow is present, inside a collapsible family).
@hjoliver hjoliver added this to the later milestone May 4, 2016
@hjoliver hjoliver changed the title Modular suites. Support for importable suite modules. May 4, 2016
@hjoliver hjoliver changed the title Support for importable suite modules. Support importable suite modules. May 4, 2016
@arjclark
Copy link
Contributor

arjclark commented May 5, 2016

Things that are occurring to me as to be thought about:

Can (or should) the importer supply or override module runtime config such as task hosting details, or could this be made part of the module interface, or should we just accept that modules aren't portable in that way? (perhaps the latter, at least initially).

Additionally, if something is changed in the imported module, how do we handle that in terms of reloads and restarts? Possible situations being:

  • file in some central location has been changed (non-revision based solution), I want to reload my suite, how does that work?
  • file in some central location has changed (again, non-revision based solution), I want to restart my suite but not pick up changes to that file, just continue as was.

How do we go about validating the imported suite - as "In terms of suite design, the module workflow looks to the importing suite like a single task (albeit without some of the usual runtime settings)." - so how would we capture validating graphing in the imported suite to avoid cases where it might be invalid and would break things at runtime?

Do you envisage us being able to work with graphing brought in from the module in a standard way? i.e. can we do things like PRODUCTS.FAM_FOO => fail-any => some_task and/or PRODUCTS.foo:fail => recovery_task_specific_to_my_suite => !PRODUCTS.foo

Do we need to do something special to avoid:

[runtime]
[[PRODUCTS]]

being mixed up with the result of:

%import <location>:<module-name>@<revision-tag> as PRODUCTS

@matthewrmshin
Copy link
Contributor

Can a module run as a standalone suite, e.g. during development and testing?

  • what to do with the module interface in a standalone run?
  • support an internal cycling section that is ignored/replaced by the importing suite?

We can do something like Python's if __name__ == "__main__" kind of logic.

@matthewrmshin
Copy link
Contributor

Additionally, if something is changed in the imported module, how do we handle that in terms of reloads and restarts? Possible situations being:

  • file in some central location has been changed (non-revision based solution), I want to reload my suite, how does that work?
  • file in some central location has changed (again, non-revision based solution), I want to restart my suite but not pick up changes to that file, just continue as was.

If you want ultimate control of suite configuration, you should only import from version controlled location, or locations that you have full control of.

However, it is sometimes advantageous to import from a moving HEAD or latest stable. In which case:

  • The suite reloads only if the user has to tell the suite to reload, so it should not be problematic.
    I would imagine that the full running suite configuration will be stored under log/suiterc/
    like we do now, so it should be clear what we are running with.
  • Restart is problematic under the current logic.
    In fact, I think we should modify the current logic so that a restart does not automatically loads
    the suite.rc from the registered location, but should loads the relevant suite.rc from the
    log/suiterc/ location, unless a reload is also requested for the restart.

@arjclark
Copy link
Contributor

arjclark commented May 5, 2016

Restart is problematic under the current logic. In fact, I think we should modify the current logic so that a restart does not automatically loads the suite.rc from the registered location, but should loads the relevant suite.rc from the log/suiterc/ location, unless a reload is also requested for the restart.

Which I guess means that we'll need to keep a copy of the processed out contents of the imported content in there rather than a reference to some external (possibly processed) suite files.

@matthewrmshin
Copy link
Contributor

%import <location>:<module-name>@<revision-tag> as PRODUCTS

We should probably support a search path + import from relative path mechanism - so that the import statement can be detached from site-specific locations.

@hjoliver hjoliver changed the title Support importable suite modules. Support imported suite modules. May 5, 2016
@hjoliver
Copy link
Member Author

hjoliver commented May 5, 2016

@arjclark (above #1829 (comment)):

if something is changed in the imported module, how do we handle that in terms of reloads and restarts? ... (non-revision based solution)

I suppose we could design a solution such that restart and reload do not re-import by default, but in my view we shouldn't do that. If using a "non-revision based solution" (or similarly, extracting from HEAD of a branch in other contexts) you get what you paid for - so be aware, and be careful!

How do we go about validating the imported suite ...

By "in terms of suite design" I mean all the suite writer has to worry about assuming that imported modules are not broken or incompatible with the importer. I suppose we could have a validation that doesn't actually do the import, but I was envisaging that validation would do a full suite parse as for a suite start-up, in which case a broken module would cause the validation to fail (c.f. importing a broken Python module).

Do you envisage us being able to work with graphing brought in from the module in a standard way? ...

Yes, although I haven't thought about this in much depth. As described above, the imported module does end up as a regular task family with internal dependencies, so MODULE:succeed-all => foo would actually be equivalent to triggering off the final internal task(s), without having to know (at suite design time) the internal task names.

Do we need to do something special to avoid...

Yes - abort if a name collision is detected.

@hjoliver
Copy link
Member Author

hjoliver commented May 5, 2016

@matthewrmshin (above #1829 (comment)):

Restart is problematic under the current logic. ...

Good point. We also need to take into account our plans to move the "rose suite-run" installed-suite paradigm into cylc, which separates the installed/registered location from the source location.

@hjoliver
Copy link
Member Author

hjoliver commented May 5, 2016

Does import MODULE-URL@MODULE-REV as NAME inside a suite.rc file seem a little odd given that cylc suites have hitherto only been objects of (/subject to) revision control, not actually aware of the concept themselves? Maybe we should only support import MODULE as NAME and require that the location (and potentially revision) of MODULE is somehow provided by external means ?

@arjclark
Copy link
Contributor

arjclark commented May 6, 2016

Does import MODULE-URL@MODULE-REV as NAME inside a suite.rc file seem a little odd given that cylc suites have hitherto only been objects of (/subject to) revision control, not actually aware of the concept themselves?

Yes, to me at least, and kind of prompted my questions about things in relation to non-revision controlled files i.e. ones where you can't use a @MODULE-REV extension.

@matthewrmshin
Copy link
Contributor

(I am thinking about a system similar to Python's import statement and sys.path, or FCM's configuration file's include and include-path.)

@hjoliver
Copy link
Member Author

hjoliver commented May 12, 2016

Module source locations:

This should wait on migration of rose suite-run #1885. Then, the suite.rc syntax should only support import MODULE as NAME, with MODULE expected to be located in a local sub-directory of the suite directory, modules/MODULE say, or perhaps externally via $CYLC_MODULE_PATH or similar. For revision controlled modules, some cylc file analogous to rose-suite.conf can specify URL+revision of the module, to be extracted and installed into the suite installation.

@hjoliver
Copy link
Member Author

[meeting] - we agreed on the proposal as described in the updated issue description, plus previous comment on source locations.

An additional thought was:

  • don't need to wait on Migrate "rose suite-run" functionality into cylc. #1885 for the initial implementation, using modules in a local sub-directory
  • in fact a module could be imported from a suite (where it exists in the modules sub-directory)
    • so you could modularize an existing suite without actually taking the module config out of it, but still make it available to others
    • this automatically solves the problem of standalone testing discussed above, without requiring additional cycling config etc. that is ignored on import: just store a module within a simple suite designed for standalone testing.

@hjoliver
Copy link
Member Author

Closing this issue: the proposed mechanism worked reasonably well within the confines of the config file format, but we decided quite some time ago to wait on the future Python API to get proper modularity via Python itself.

@matthewrmshin matthewrmshin removed this from the soon milestone Jul 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants