Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config parsing optimization #170

Closed
hjoliver opened this issue Nov 4, 2012 · 7 comments
Closed

config parsing optimization #170

hjoliver opened this issue Nov 4, 2012 · 7 comments

Comments

@hjoliver
Copy link
Member

hjoliver commented Nov 4, 2012

As reported by @matthewrmshin: the following 1000-task suite takes > 40 seconds to validate on his desktop, which is a problem for use of "cylc get-config" in an event handler script. Memory usage is also very high during the suite parsing.

#!jinja2
 {% set N_TASKS = 1000 %}
{% set N_FAMS = 4 %}
 {% set ITEMS = ["egg", "ham", "bacon", "sausage", "beans", "tomato", "chips", "mushrooms",   "blackpudding"] %}
[scheduling]
    [[dependencies]]
        graph = """
        {% for i in range(N_TASKS) %}
        build => run{{ "%04d"|format(i) }}
        {% endfor %}
        """
[runtime]
    [[root]]
        command scripting = "echo $CYLC_TASK_ID"
    [[FAM00]]
        inherit = root
    {% for i in range(1, N_FAMS) %}
    [[FAM{{ "%02d"|format(i) }}]]
        inherit = FAM{{ "%02d"|format(i - 1) }}
        [[[environment]]]
           {% for j in range(10) %}
           {% for item in ITEMS %}
           {{ item|upper }}_{{ "%02d_%02d"|format(i, j) }} = "{{ item }} {{ "%02d %02d"|format(i, j) }}"
           {% endfor %}
           {% endfor %}
    {% endfor %}
    {% for i in range(N_TASKS) %}
    [[run{{ "%04d"|format(i) }}]]
        inherit = FAM{{ "%02d"|format(N_FAMS - 1) }}
    {% endfor %}``` 
@hjoliver
Copy link
Member Author

hjoliver commented Nov 4, 2012

On my desktop box, the suite takes 32s to validate with the current cylc master head. Notably, increasing the number of families has a big impact even though the inheritance graph is just root -> FAM1 -> FAM2 -> FAM3 -> FAM4 -> (all tasks). Here's what I found after some execution profiling with cProfile, and code analysis:

  1. runtime inheritance is currently done using the linear list of ancestors for every namespace rather than by traversing the inheritance tree. So for this suite, copy and override of the runtime nested dict for root -> FAM1 -> FAM2 -> FAM3 -> FAM4 is computed for every task (1000 times) even though it only needs to be done once and the final result used by all tasks (this dates back to my initial cut at the inheritance implementation - oops!) [addendum: in this suite, the number of environment variables supplied to each task also goes up dramatically with the number of families].

  2. moreover, during inheritance computation the full nested dict structure is passed through the entire hierarchy, even though typically only a few runtime items are set in each namespace. When configobj parses the suite.rc file it returns a sparse nested dict containing only those items that are actually set in the file. But subsequently configobj validation , sets defaults for all items (even if 'None') and fleshes out the full dense config structure. We do this to put defaults into the root namespace, and the dense structure is then carried through to all levels of inheritance to the tasks.

  3. copying runtime namespaces (nested dicts) for inheritance (etc.) is done using copy.deepcopy() and an inordinately large amount of time is spent in that function. Turns out deepcopy() is slow because it has lots of safety checks to avoid circular reference loops and similar potential problems with copying complex data structures.

@hjoliver
Copy link
Member Author

hjoliver commented Nov 5, 2012

With the following changes I can get a full validation of the above suite down to less than 5 seconds, and get-config for a single task to less than 2 seconds:

  1. replace deepcopy() of runtime namespaces with a custom recursive nested-dict-copy function

  2. fix inheritance processing so that each parent-child inheritance is only done once

  3. do a full configobj validation for every runtime namespace only via "cylc validate", not every time the suite is parsed (e.g. for get-config).

  4. "cylc get-config SUITE TASK ..." does not need to do inheritance etc. for all the other tasks in the suite

  5. sparse inheritance: instead of using configobj validation to insert defaults into the root namespace, generating a large data structure that persists through every level of the hierarchy, we can inherit just items that have been explicitly set in ancestor namespaces, then at the very end of the process have the resulting task namespaces inherit from a single dense structure containing the defaults. NOTE: even this need not be dense if we hardwire the defaults into the task proxy definition module instead of taking them from conf/suiterc.spec.

  6. use of an ordered dict coded in C (found on the internet, easy to build). The cylc OrderedDict module now uses the C-coded one first, if it is installed, then defaults to the ActiveState version packaged with cylc, then the Python standard library collections version (Python 2.7+).

@hjoliver
Copy link
Member Author

A small step back from the gains reported above, unfortunately - the test battery showed up a problem in my new inheritance implementation. The result now, for the above test suite, is 9s to validate and 5s to get-config a single item from a task, down from 32s (on my desktop box). Still a factor of 3 or more.

This may represent the best we can get, pretty much, without more radical changes to the way cylc parses suites to define task proxy classes, etc.

@hjoliver
Copy link
Member Author

I've merged this work into master: 6fbbfd3

@hjoliver hjoliver reopened this Nov 14, 2012
@hjoliver
Copy link
Member Author

So, the current state of affairs in suite.rc parsing is this: several unnecessary inefficiencies have been removed and inheritance is now done entirely with sparse structures (just what is specified in the suite.rc file), but the full dense structure (including defaults for all possible values) is fleshed out in every dynamically-generated task proxy class definition (one for every task in the graph). Thus we are computing the runtime config data more efficiently but we are still storing a full runtime structure for every task in the suite - which may result in heavy memory requirements for large suites?

It might further improve things, at least in terms of memory usage, to not store the final dense runtime structure for every task, but to generate it on the fly every time it is needed (at run time: just before job submission). Then, memory requirement on parsing the suite would be just be the information in the suite.rc file with inheritance expanded out. Validation and get-config would still have to flesh out the defaults for each task, but would not have to hold all tasks in memory at once.

We could even go one step further and only do inheritance processing on the fly, just before job submission. Then memory requirement on parsing the suite would just be for exactly what's recorded in the suite.rc file (i.e. common values factored out by inheritance would still be stored just once). However, we might still want "inherit out" higher levels of the hierarchy to avoid much repetition (e.g. in "root -> fam -> (m1, m2, m3)" do we want to re-do "root -> fam" for every member task, or just once? - maybe the repetition doesn't matter if it happens just prior to job submission inside the job submission worker thread.

It may even be possible to do absolutely everything on the fly for each task just prior to job submission (or as required for validation and get-config), i.e. parse the suite.rc file anew each time but compute as little as possible, just the data required for the task of interest. This seems a rather extreme solution though!

@hjoliver
Copy link
Member Author

Originally, config.py parsed and validated the suite and computed everything required to populate the initial task pool for running the suite. Then get-config came along, to extract individual items from the end result. Now get-config is a bit more efficient in that it tries to only process the particular task targeted by a request, and its ancestors.

We can probably make get-config even more efficient by taking only requested [runtime] items of targeted tasks through the inheritance process. And also, by not doing any inheritance or graph parsing at all for get-config of non-[runtime] items.

A bit of refactoring of config.py would be nice, to enable this kind of flexibility without loads of if-then constructs embedded in code originally designed to parse the entire suite prior to running it.

@hjoliver
Copy link
Member Author

The changes made already have been deemed sufficient for the 5.0 milestone. I'm closing this now, having moved ideas for further improvements to Issue #184.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant