Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameterized tasks. #1953

Merged
merged 39 commits into from
Sep 9, 2016
Merged

Conversation

hjoliver
Copy link
Member

@hjoliver hjoliver commented Jul 27, 2016

Close #1937.
Close #1900.

Refactored graph-string parser.

New functionality:

  • parameterized tasks and families - i.e. auto-generate workflow without Jinja2 loops
    • (this also enables "chunking" - splitting model runs into chunks (within a single cycle point) - without Jinja2)
    • parameter values are passed to task environments
    • any number of parameters (foo<m>, bar<m,n>, baz<m,n,o>, qux<m,n,o,p>, etc.)
    • works in cycling and one-off suites
  • member-to-member family semantics, for same-size same-sort-order families (removed)
  • syntax-driven line continuation

Works and the test battery passes, BUT:

  • I removed cylc-5 start-up and cold-start task back-compatibility and associated tests - the associated graph-parsing code looked like a nasty complication
    • (need to either restore this or officially obsolete cylc-5 syntax)

Still TODO:

  • add more doc-strings and comments
  • address a bunch of TODOs that I left in the code
  • add tests for the new functionality
  • add more error-detection for mis-use of the new syntax
  • user guide documentation
  • (address the back-compat issue mentioned above).

@hjoliver hjoliver added this to the soon milestone Jul 27, 2016
@hjoliver
Copy link
Member Author

hjoliver commented Jul 27, 2016

Examples

Paramterized tasks

Basic:

[cylc]
    [[parameters]]
        r = 2
        m = 2
[scheduling]
    [[dependencies]]
        graph = """
            prep => init_run<r> => sim<r,m> => post<r,m> => close_run<r> => done
                """
[runtime]
    [[prep]]
    [[init_run<r>]]
    [[sim<r,m>]]
    [[post<r,m>]]
    [[close_run<r>]]
    [[done]]

params1

With inter-chunk dependence:

[cylc]
    [[parameters]]
        r = 2
        c = 2
[scheduling]
    [[dependencies]]
        graph = """
    prep => init_run<r> => sim<r,c> => post<r,c> => close_run<r> => done
    sim<r,c-1> => sim<r,c>
                """
[runtime]
    [[prep]]
    [[init_run<r>]]
    [[sim<r,c>]]
    [[post<r,c>]]
    [[close_run<r>]]
    [[done]]

chunk1

Similar, but avoid unnecessary triggers between chunks and the init and close tasks:

[cylc]
    [[parameters]]
        r = 2
        c = 2
[scheduling]
    [[dependencies]]
        graph = """
    prep => init_run<r> => sim<r,c=0>
    sim<r,c> => post<r,c>
    sim<r,c-1> => sim<r,c>
    post<r,c-1> => post<r,c>
    post<r,c=1> => close_run<r> => done
                """
[runtime]
    [[prep]]
    [[init_run<r>]]
    [[sim<r,c>]]
    [[post<r,c>]]
    [[close_run<r>]]
    [[done]]

chunk2

How to do three 4-month model runs in each cycle of a yearly-cycling suite:

[cylc]
    cycle point format = %Y
    [[parameters]]
        c = 3
[scheduling]
    initial cycle point = 2010
    [[dependencies]]
        [[[R1]]]
            graph = prep => sim<c=0>
        [[[P1Y]]]
            graph = """sim<c> => post<c>
                       sim<c-1> => sim<c>
                       sim<c=2>[-P1Y] => sim<c=0>"""
[runtime]
    [[prep]]
    [[sim<c>]]
        script = echo $START_DATE
        [[[environment]]]
            INCR=P$((c*4))M
            START_DATE=$(cylc cyclepoint --offset=$INCR)
    [[post<c>]]

chunk3

Member-to-member triggers

[UPDATE: this feature has been removed - see the discussion below]

[scheduling]
    [[dependencies]]
        graph = prep => GET_PETS => FEED_PETS => done
[runtime]
    [[GET_PETS]]
    [[get_cats, get_dogs, get_fish]]
        inherit = GET_PETS
    [[FEED_PETS]]
    [[feed_cats, feed_dogs, feed_fish]]
        inherit = FEED_PETS

mem-to-mem

@matthewrmshin
Copy link
Contributor

(Test battery passes in my environment with master merged in. The conflict is a tiny one in cylc.scheduler.)

@hjoliver hjoliver force-pushed the 1937.parameterized-tasks branch 21 times, most recently from cbc9f41 to d0227f6 Compare August 12, 2016 06:56
@hjoliver
Copy link
Member Author

Branch rebased (I still need to address my own TODOs before review...)

@hjoliver hjoliver self-assigned this Aug 12, 2016
@hjoliver hjoliver force-pushed the 1937.parameterized-tasks branch 2 times, most recently from 3943575 to 07b4de7 Compare August 17, 2016 03:06
@hjoliver
Copy link
Member Author

@matthewrmshin et. al. - this is now ready for review. I'll quickly add documentation to the user guide once the notation and implementation is agreed. Also can you discuss, do I need to add cylc-5 support back in as per remaining TODOs that I've left at the top of config.py:

# NOTE: OBSOLETED:
#  * OLD INITIAL TASKS (start-up and cold-start)
#  * IMPLICIT CYCLING
#  * ASYNC GRAPH IN CYCLING SUITE
#  * Ignored qualifiers on RHS, e.g. foo => FAM:succeed-all => bar
#    now must be "foo => FAM" and "FAM:succeed-all => bar".
#  * note cannot get-config retrieve an async graph anymore, it is moved to R1.
# TODO: CONSIDER OBSOLETING ALL cylc-5 syntax?

(As far as NIWA is concerned, cylc-5 is long dead and I'd rather not mess up the new graph parser with it!)

\caption[Static chunked (top) and dynamic cycled (bottom) versions of the
same workflow.]{\scriptsize Static chunked and dynamic cycled versions of
the same workflow.}
\label{fig-eg2}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not seeing the images for this in my html copy of the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, hadn't checked the HTML - will fix tomorrow NZ.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@arjclark
Copy link
Contributor

arjclark commented Sep 7, 2016

./tests/validate/40-fail-suicide-left.t also failing to pass in my environment

@hjoliver
Copy link
Member Author

hjoliver commented Sep 7, 2016

./tests/validate/40-fail-suicide-left.t also failing to pass in my environment

Huh, it passes here; I don't get the deprecation message (as reported by @matthewrmshin) from Python 2.7.5...

But I do at Python 2.6. Odd that a deprecation warning is removed at later versions!

Fixed.

@hjoliver
Copy link
Member Author

hjoliver commented Sep 8, 2016

New draft CUG section on "cycling" added. I still need to address the remaining feedback above.

@hjoliver
Copy link
Member Author

hjoliver commented Sep 8, 2016

New draft section revised. Will address remaining feedback tomorrow (and review the latest tutorial draft)

@matthewrmshin
Copy link
Contributor

Tests OK in my environment.

@hjoliver
Copy link
Member Author

hjoliver commented Sep 9, 2016

OK:

  • Revised Section 5. Formerly On the meaning of "cycle" and "cycle point" in cylc, now it is Cycling and Cycle Points in cylc, with short subsections on terminology and "ways of cycling" (dynamic vs static)
  • Revised the new Parameterized Tasks section in light of feedback.

I'm personally pretty happy with the documentation now. If you disagree only mildly, maybe consider improvements via later PRs, depending on current levels of urgency (when do you want to get the tutorial VM out?)

@arjclark
Copy link
Contributor

arjclark commented Sep 9, 2016

@matthewrmshin - review 2 ok. In the interests of time, agree with @hjoliver that documentation can be refined at a later date (I still don't like the way "dynamic" is being used here but it's not a blocker on the otherwise good work).

As noted by @hjoliver above, the only thing I've noted is a slight slowdown in validation times with master (at time of investigating) giving a validation time of:

time CYLC_CONF_PATH= cylc validate .
Valid for cylc-6.10.2-191-gadd7f

real    0m29.154s
user    0m27.716s
sys 0m1.001s

vs this branch of:

time CYLC_CONF_PATH= cylc validate .
Valid for cylc-6.10.2-217-g09085

real    0m37.087s
user    0m34.902s
sys 0m0.901s

This is testing with a suite based on the horrible numbers of inter-family triggers as:

#!jinja2
{% set NUM_MEMBERS_FAM1=200 %}
{% set NUM_MEMBERS_FAM2=200 %}
[scheduling]
[[dependencies]]
graph = FAM1:succeed-all => FAM2
[runtime]
[[root]]
script = true
[[FAM1]]
[[FAM2]]
{% for member in range(1,NUM_MEMBERS_FAM1) %}
[[fam1_member_{{member}}]]
inherit = FAM1
{% endfor %}
{% for member in range(1,NUM_MEMBERS_FAM2) %}
[[fam2_member_{{member}}]]
inherit = FAM2
{% endfor %}

I don't think this is a showstopper and not an unexpected slowdown given the increased work being done, just noted here for completeness.

@matthewrmshin - over to you to press the button when you're happy for this to go in.

@matthewrmshin
Copy link
Contributor

Test OK in my environment (with master merged in). New code and code changes are good. I'll merge this to allow it to stabilise on master.

@matthewrmshin matthewrmshin merged commit 42ca6cc into cylc:master Sep 9, 2016
matthewrmshin added a commit to matthewrmshin/rose that referenced this pull request Sep 9, 2016
matthewrmshin added a commit to matthewrmshin/rose that referenced this pull request Sep 9, 2016
@hjoliver
Copy link
Member Author

Woo-hoo, it's merged at last!

For the record, I'm seeing these timings on the suite with "horrible numbers of inter-family triggers"

  • cylc-6.10.2 : 16.5s
  • cylc-master (with this now merged in): 20.3s

This isn't too much of a hit, but I'll do some more profiling in due course...

@hjoliver hjoliver deleted the 1937.parameterized-tasks branch September 10, 2016 04:09
arjclark added a commit to metomi/rose that referenced this pull request Sep 12, 2016
matthewrmshin added a commit to matthewrmshin/rose that referenced this pull request Sep 12, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants