Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler, task_pool, task_proxy refactor #2157

Merged
merged 27 commits into from May 11, 2017

Conversation

matthewrmshin
Copy link
Contributor

@matthewrmshin matthewrmshin commented Feb 10, 2017

The main drives of this change are:

  • Efficiency.
  • Reduce strongly coupled logic where possible.

Some headlines:

  • Reduce number of attributes in cylc.task_proxy and related objects.
  • Reduce line counts in cylc.scheduler, cylc.task_pool and cylc.task_proxy.
  • Remove get_task_proxy.
    • Replaced by SuiteConfig.get_taskdef and constructor of TaskProxy.
  • Singletons related to job management are now instantiated normally. These include:
    • Batch system manager.
    • Job file writer.
    • Job host initialisation manager.
    • Multi-processing pool for job commands, suite events and task events.
  • New module cylc.task_job_mgr: job management logic from cylc.task_pool and cylc.task_proxy.
  • New module cylc.suite_events: suite events logic from cylc.scheduler.
  • New module cylc.task_events_mgr:
    • Task message and task event handler logic in cylc.scheulder and cylc.task_proxy.
    • Manage task event handlers: these no longer hang onto their originating task proxy objects. The new logic allows task pool to clean up tasks without having to worry about their incomplete event handlers.
  • New module cylc.suite_db_mgr:
    • Database management logic from cylc.scheduler, cylc.task_pool and cylc.task_proxy.
    • Manage queueing of database operations - improve efficiency by using a single set of queues.
    • Manage database recovery, clean up, etc.
  • Rationalise task state reset logic, by reducing the number of entry points to the reset_state and set_state methods. Apart from hold and release, all state change will now call reset_state.
  • Close Allow state reset of individual message outputs.  #1550. Rewrite cylc.task_outputs to allow manual reset of individual outputs to complete/incomplete. (Outputs can be referenced by either the trigger names or by the message strings.)
  • Close "cylc submit" should be able to submit a family #1182. cylc submit can now submit multiple families and tasks in a single command.

[HJO: Close #2228.]

@matthewrmshin matthewrmshin added this to the soon milestone Feb 10, 2017
@matthewrmshin matthewrmshin self-assigned this Feb 10, 2017
@matthewrmshin matthewrmshin force-pushed the refactor-task-proxy branch 5 times, most recently from 04f7a49 to 6df5d60 Compare February 24, 2017 12:59
@matthewrmshin matthewrmshin changed the title Refactor cylc.task_proxy 1 Refactor cylc.scheduler, cylc.task_pool and cylc.task_proxy Mar 6, 2017
@matthewrmshin matthewrmshin changed the title Refactor cylc.scheduler, cylc.task_pool and cylc.task_proxy scheduler, task_pool, task_proxy refactor Mar 6, 2017
@matthewrmshin matthewrmshin force-pushed the refactor-task-proxy branch 2 times, most recently from b62e471 to 38559cb Compare March 13, 2017 11:19
@matthewrmshin matthewrmshin force-pushed the refactor-task-proxy branch 5 times, most recently from e573c0b to 1e06098 Compare March 24, 2017 10:54
@matthewrmshin
Copy link
Contributor Author

matthewrmshin commented Mar 24, 2017

Although I have got some more ideas, I think I have done enough here already. I am running the profile battery at the moment. The 1st set of results based on the complex suite is showing promises.

Version Run Elapsed Time (s) CPU Time - Total (s) Max Memory (kb)
7.2.1 complex suite 1547.7 609.3 90024.0
HEAD complex suite 1380.7 489.2 86108.0

(where HEAD is HEAD of this branch.)

@matthewrmshin
Copy link
Contributor Author

Some other ideas that should be in new issues.

(I'd quite like to have a rethink of our usage of the multi-processing pool. Currently, we start a pool of N child processes before the suite starts, which are kept in memory. This should not be necessary if we can just manage a simple pool of sub-processes - given that all our usages are to run external commands. Not keeping child processes in memory should, in theory, reduces the memory footprint during quiet times while a suite is waiting for events.)

(I'd also like to have a rethink of the poll timer logic. Instead of a poll timer per task, it is probably more efficient to have a poll timer per (host, batch system) combo. If we are polling a task on a given (host, batch system), we might as well poll the other tasks on the same combo as well. On job submission, a task will subscribe its job to the relevant poll timer, which will then manage the poll timing to suit all the subscribed jobs. E.g. if several jobs need to be polled within a small interval of time, they will actually be done together once.)

@matthewrmshin
Copy link
Contributor Author

cylc run of the complex suite today (adding 7.0.0 in the mix) gives the following stats:

Version Run Elapsed Time (s) CPU Time - Total (s) Max Memory (kb)
7.0.0 complex suite 1510.3 588.0 94064.0
7.2.1 complex suite 1584.9 621.8 89876.0
HEAD complex suite 1378.7 488.0 84444.0

cylc validate of the complex suite gives the following stats:

Version Run Elapsed Time (s) CPU Time - Total (s) Max Memory (kb)
7.0.0 complex suite 16.5 16.1 79780.0
7.2.1 complex suite 16.2 15.9 79788.0
HEAD complex suite 15.6 15.4 78452.0

All numbers going the right direction for this branch.

@matthewrmshin
Copy link
Contributor Author

(This branch will conflict with #2220, so I'll suggest that we don't merge this until after #2220. I am happy to deal with the conflicts.)

@hjoliver @oliver-sanders please have an initial look of this change.

Task event handlers no longer hang onto their originating task proxy
objects.
Run time database management moved from scheduler and task_pool to
own module.
Decouple logging and database update between task_proxy and task_state.
Move task message processing logic from `cylc.task_proxy` to
`cylc.task_events_mgr`.
Move task poll timer logic from `cylc.task_proxy` to
`cylc.task_job_mgr`.
Task events setup logic moved from `cylc.task_proxy` to
`cylc.task_events_mgr`.
Remove database logic from `cylc.task_proxy`.
Move task logging logic to `cylc.suite_logging`.
Unify task state reset logic.

Refactor task output logic, and allow resetting completion status of
outputs by trigger string or message string.
Remove broadcast server dependency from task_proxy.
Simplify (re)try timers logic.
`cylc submit` can now submit multiple tasks and families.
Test `cylc reset --output=OUTPUT ...`.
Test `cylc submit FAM.CYCLE` - multi-level family.
Minor tweaks to cylc.scheduler related to review comments.
Improve execution time limit timer logic.
More comments in new modules.
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to approve and not spend several more days staring at the code

Agreed.

This all looks in order to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
efficiency For notable efficiency improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants