⚡️ Update memory and threading estimates #1428

Merged
merged 94 commits into from
Jan 30, 2021

Conversation

shnizzedy
Member

Fixes

Related to #1404 by @shnizzedy (possibly resolves)

Description

  • Increases default memory usage per Node estimate from 0.2 to 2.0
  • Sets default limit of threads for MapNodes to 1
  • Increases several specific Nodes' memory usage estimates
  • For each subject run, generates
    • interactive HTML Gantt chart
    • text report for observed resource usage above estimates/limits

Technical details

Resource estimates and limits

  1. Creates our own Node and MapNode subclasses of the Nipype classes to set new global defaults:
    class Node(pe.Node):
        __doc__ = _doctest_skiplines(
            pe.Node.__doc__,
            {" >>> realign.inputs.in_files = 'functional.nii'"}
        )

        __init__ = partialmethod(pe.Node.__init__, mem_gb=DEFAULT_MEM_GB)

    class MapNode(pe.MapNode):
        __doc__ = _doctest_skiplines(
            f'mem_gb={DEFAULT_MEM_GB}\n\nn_procs=1\n\n{pe.MapNode.__doc__}',
            {" ... 'functional3.nii']"}
        )

        __init__ = partialmethod(pe.MapNode.__init__, mem_gb=DEFAULT_MEM_GB,
                                 n_procs=1)
  2. Sets the global default mem_gb in one place (the DEFAULT_MEM_GB constant).
  3. For our Nipype pipeline engine, import all interfaces from the supermodule and override just Node and MapNode:
    from nipype.pipeline import engine as pe
    # import __all__ from nipype.pipeline.engine
    from nipype.pipeline.engine import *  # noqa F401
    # import DEFAULT_MEM_GB and override Node, MapNode
    from .engine import DEFAULT_MEM_GB, Node, MapNode

    __all__ = [
        interface for interface in dir(pe) if not interface.startswith('_')
    ] + ['DEFAULT_MEM_GB', 'Node', 'MapNode']

    del pe
  4. Use our Nipype pipeline engine anywhere we use a Nipype pipeline engine, e.g.,
    from CPAC.pipeline import nipype_pipeline_engine as pe
    instead of
    import nipype.pipeline.engine as pe
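
The subclassing approach in step 1 can be tried without Nipype installed. The following is a minimal sketch, assuming a stand-in base class: `BaseNode` is hypothetical and only mimics the keyword arguments named above (`mem_gb`, `n_procs`), and `DEFAULT_MEM_GB = 2.0` is taken from the description at the top of this pull request. It shows how `functools.partialmethod` changes a default without rewriting `__init__`.

```python
from functools import partialmethod

DEFAULT_MEM_GB = 2.0  # value stated in the PR description


class BaseNode:
    """Hypothetical stand-in for an upstream node class."""

    def __init__(self, name, mem_gb=0.2, n_procs=None):
        self.name = name
        self.mem_gb = mem_gb
        self.n_procs = n_procs


class Node(BaseNode):
    # Same signature as BaseNode.__init__, but with a new default mem_gb.
    __init__ = partialmethod(BaseNode.__init__, mem_gb=DEFAULT_MEM_GB)


class MapNode(BaseNode):
    # Two defaults overridden at once.
    __init__ = partialmethod(BaseNode.__init__, mem_gb=DEFAULT_MEM_GB,
                             n_procs=1)


default_node = Node('smooth')
custom_node = Node('register', mem_gb=8.0)  # explicit keyword still wins
map_node = MapNode('per_run')
```

Because the new defaults are bound as keywords, callers in this sketch must pass `mem_gb` and `n_procs` by keyword; passing them positionally would raise a `TypeError` for duplicate values.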

Reporting resource usage

  1. Converts CPAC.utils.monitoring from a single-file submodule to a full directory submodule
  2. Imports the Nipype Gantt chart fix from FIX: Restore generate_gantt_chart functionality nipy/nipype#3290
  3. Adds a script to check observed memory and estimated thread usage in each node in callback.log against that node's estimated memory usage and requested thread limit:
    cb_dict_list = log_to_dict(cblog)
    excessive = {node['id']: [
        node['runtime_memory_gb'] if node.get('runtime_memory_gb', 0)
        > node.get('estimated_memory_gb', 1) else None,
        node['estimated_memory_gb'] if node.get('runtime_memory_gb', 0)
        > node.get('estimated_memory_gb', 1) else None,
        node['runtime_threads'] - 1 if node.get('runtime_threads', 0) - 1
        > node.get('num_threads', 1) else None,
        node['num_threads'] if node.get('runtime_threads', 0) - 1
        > node.get('num_threads', 1) else None
    ] for node in [node for node in cb_dict_list if (
        node.get('runtime_memory_gb', 0) > node.get('estimated_memory_gb', 1)
        or node.get('runtime_threads', 0) - 1 > node.get('num_threads', 1)
    )]}
    text_report = ''
    if excessive:
        text_report += 'The following nodes used excessive resources:\n'
        dotted_line = '-' * (len(text_report) - 1) + '\n'
        text_report += dotted_line
        for node in excessive:
            node_id = '\n .'.join(node.split('.'))
            text_report += f'\n{node_id}\n'
            if excessive[node][0]:
                text_report += ' **memory_gb**\n' \
                               ' runtime > estimated\n' \
                               f' {excessive[node][0]} ' \
                               f'> {excessive[node][1]}\n'
            if excessive[node][2]:
                text_report += ' **threads**\n runtime > limit\n' \
                               f' {excessive[node][2]} ' \
                               f'> {excessive[node][3]}\n'
        text_report += dotted_line
    return text_report, excessive
  4. Runs both the Gantt HTML report generation and the text resource-overuse report generation after a subject finishes running. Emits a warning if either report generation fails.
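
The comparison in step 3 can be illustrated on a few hand-written records. This is a sketch rather than the code merged here: `find_excessive` and the sample node ids are hypothetical, and the input is assumed to be a list of dictionaries with the same keys the excerpt reads (`id`, `runtime_memory_gb`, `estimated_memory_gb`, `runtime_threads`, `num_threads`). It keeps the excerpt's defaults and its subtraction of one thread from the observed count.

```python
def find_excessive(nodes):
    """Map node id -> observed/limit pairs for nodes that exceeded a limit."""
    excessive = {}
    for node in nodes:
        runtime_gb = node.get('runtime_memory_gb', 0)
        estimated_gb = node.get('estimated_memory_gb', 1)
        # As in the excerpt, one thread is subtracted from the observed count.
        runtime_threads = node.get('runtime_threads', 0) - 1
        thread_limit = node.get('num_threads', 1)

        over_memory = runtime_gb > estimated_gb
        over_threads = runtime_threads > thread_limit
        if over_memory or over_threads:
            excessive[node['id']] = {
                'memory_gb': (runtime_gb, estimated_gb)
                if over_memory else None,
                'threads': (runtime_threads, thread_limit)
                if over_threads else None,
            }
    return excessive


# Hypothetical records, shaped like parsed callback.log entries.
sample_nodes = [
    {'id': 'wf.anat.n4', 'runtime_memory_gb': 3.5,
     'estimated_memory_gb': 2.0, 'runtime_threads': 2, 'num_threads': 1},
    {'id': 'wf.func.mcflirt', 'runtime_memory_gb': 1.0,
     'estimated_memory_gb': 2.0, 'runtime_threads': 4, 'num_threads': 1},
    {'id': 'wf.func.mean', 'runtime_memory_gb': 0.5,
     'estimated_memory_gb': 2.0, 'runtime_threads': 1, 'num_threads': 1},
]

result = find_excessive(sample_nodes)
```

Returning a dictionary with named fields instead of a four-element list is a readability choice for this sketch; the excerpt above indexes positions 0 to 3 instead.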

Tests

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the develop_v1.8_convergence branch of the repository.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added tests for the changes I made (if applicable).
  • I updated the changelog.
  • I added or updated documentation (if applicable): 📝 Add Nodes to developer documentation fcp-indi.github.io#247
  • I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

jarrodmillman and others added 30 commits July 15, 2009 19:55
Re-basing code with nipype master branch
…untime Bunch object only for runtime stats storage instead of using results dictionary, renamed ResourceMultiProc to MultiProc for backwards-compatiblity
New interfaces update into sgiavasis/nipype
3dAllineate out_matrix output file handling fix
shnizzedy and others added 22 commits January 6, 2021 09:54
* exclude nodes without timing information from Gantt chart
* fall back on "id" or empty string if no "name" in node
@sgiavasis
Collaborator

Long-awaited improvement 🔥
