⚡️ Update memory and threading estimates #1428

shnizzedy · 2021-01-29T20:38:51Z

Fixes

Related to #1404 by @shnizzedy (possibly resolves)

Description

Increases the default per-Node memory usage estimate (mem_gb) from 0.2 to 2.0
Sets the default thread limit (n_procs) for MapNodes to 1
Increases the memory usage estimates of several specific Nodes
For each subject run, generates
- an interactive HTML Gantt chart
- a text report of observed resource usage that exceeds estimates/limits

Technical details

Resource estimates and limits

Creates our own Node and MapNode subclasses of the Nipype classes to set new global defaults:

C-PAC/CPAC/pipeline/nipype_pipeline_engine/engine.py

Lines 46 to 62 in d7cd96e

    
    class Node(pe.Node):
        __doc__ = _doctest_skiplines(
            pe.Node.__doc__,
            {"    >>> realign.inputs.in_files = 'functional.nii'"}
        )

        __init__ = partialmethod(pe.Node.__init__, mem_gb=DEFAULT_MEM_GB)


    class MapNode(pe.MapNode):
        __doc__ = _doctest_skiplines(
            f'mem_gb={DEFAULT_MEM_GB}\n\nn_procs=1\n\n{pe.MapNode.__doc__}',
            {"    ...                           'functional3.nii']"}
        )

        __init__ = partialmethod(pe.MapNode.__init__, mem_gb=DEFAULT_MEM_GB,
                                 n_procs=1)

Sets the global default mem_gb in one place:

C-PAC/CPAC/pipeline/nipype_pipeline_engine/engine.py

Line 10 in d7cd96e

DEFAULT_MEM_GB = 2.0
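As a rough illustration of how the partialmethod override behaves, here is a minimal sketch that does not depend on Nipype. BaseNode is a hypothetical stand-in for pe.Node (its signature is an assumption made for this example), and the node names are made up. The point is that the pre-bound default applies when mem_gb is omitted, while a caller can still pass a larger estimate by keyword.

```python
from functools import partialmethod

DEFAULT_MEM_GB = 2.0  # same value the PR sets


class BaseNode:
    """Hypothetical stand-in for nipype's pe.Node (not the real class)."""

    def __init__(self, name, mem_gb=0.2, n_procs=None):
        self.name = name
        self.mem_gb = mem_gb
        self.n_procs = n_procs


class Node(BaseNode):
    # partialmethod pre-binds mem_gb; an explicit keyword still overrides it
    __init__ = partialmethod(BaseNode.__init__, mem_gb=DEFAULT_MEM_GB)


default_node = Node('realign')                 # falls back to DEFAULT_MEM_GB
heavy_node = Node('registration', mem_gb=8.0)  # node-specific estimate
print(default_node.mem_gb, heavy_node.mem_gb)
```

Keeping the default in a single module-level constant means a later change to the global estimate touches one line, while Nodes that need more memory keep their explicit mem_gb values.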

For our Nipype pipeline engine, imports everything from nipype.pipeline.engine and overrides just Node and MapNode:

C-PAC/CPAC/pipeline/nipype_pipeline_engine/__init__.py

Lines 6 to 16 in d7cd96e

    
    from nipype.pipeline import engine as pe
    # import __all__ from nipype.pipeline.engine
    from nipype.pipeline.engine import *  # noqa F401
    # import DEFAULT_MEM_GB and override Node, MapNode
    from .engine import DEFAULT_MEM_GB, Node, MapNode

    __all__ = [
        interface for interface in dir(pe) if not interface.startswith('_')
    ] + ['DEFAULT_MEM_GB', 'Node', 'MapNode']

    del pe

Use our Nipype pipeline engine anywhere we use a Nipype pipeline engine, e.g.,

C-PAC/CPAC/pipeline/cpac_pipeline.py

Line 18 in d7cd96e

from CPAC.pipeline import nipype_pipeline_engine as pe

instead of

C-PAC/CPAC/pipeline/cpac_pipeline.py

Line 18 in 0583f19

import nipype.pipeline.engine as pe

Reporting resource usage

Converts CPAC.utils.monitoring from a single-file submodule to a full directory submodule
Imports the Nipype Gantt chart fix from nipy/nipype#3290 (FIX: Restore generate_gantt_chart functionality)

Adds a script to check observed memory and estimated thread usage in each node in callback.log against that node's estimated memory usage and requested thread limit:

C-PAC/CPAC/utils/monitoring/draw_gantt_chart.py

Lines 490 to 522 in d7cd96e

    
    cb_dict_list = log_to_dict(cblog)
    excessive = {node['id']: [
        node['runtime_memory_gb'] if node.get('runtime_memory_gb', 0)
        > node.get('estimated_memory_gb', 1) else None,
        node['estimated_memory_gb'] if node.get('runtime_memory_gb', 0)
        > node.get('estimated_memory_gb', 1) else None,
        node['runtime_threads'] - 1 if node.get('runtime_threads', 0) - 1
        > node.get('num_threads', 1) else None,
        node['num_threads'] if node.get('runtime_threads', 0) - 1
        > node.get('num_threads', 1) else None
    ] for node in [node for node in cb_dict_list if (
        node.get('runtime_memory_gb', 0) > node.get('estimated_memory_gb', 1)
        or node.get('runtime_threads', 0) - 1 > node.get('num_threads', 1)
    )]}
    text_report = ''
    if excessive:
        text_report += 'The following nodes used excessive resources:\n'
        dotted_line = '-' * (len(text_report) - 1) + '\n'
        text_report += dotted_line
        for node in excessive:
            node_id = '\n  .'.join(node.split('.'))
            text_report += f'\n{node_id}\n'
            if excessive[node][0]:
                text_report += '      **memory_gb**\n' \
                               '        runtime > estimated\n' \
                               f'        {excessive[node][0]} ' \
                               f'> {excessive[node][1]}\n'
            if excessive[node][2]:
                text_report += '      **threads**\n        runtime > limit\n' \
                               f'        {excessive[node][2]} ' \
                               f'> {excessive[node][3]}\n'
        text_report += dotted_line
    return text_report, excessive
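The comparison in the snippet above can be easier to follow when unrolled into a loop. The following is a simplified sketch, not the code in this PR: find_overusage is a hypothetical name, the result is a nested dict rather than the four-element list used above, and the sample entries are made-up values shaped like callback.log records. The dictionary keys and the "minus one" adjustment to the observed thread count are taken from the snippet.

```python
def find_overusage(nodes):
    """Return {node id: {resource: (observed, limit)}} for nodes over limits."""
    excessive = {}
    for node in nodes:
        over = {}
        runtime_mem = node.get('runtime_memory_gb', 0)
        estimated_mem = node.get('estimated_memory_gb', 1)
        if runtime_mem > estimated_mem:
            over['memory_gb'] = (runtime_mem, estimated_mem)
        # as in the PR's snippet, subtract 1 from the observed thread count
        runtime_threads = node.get('runtime_threads', 0) - 1
        thread_limit = node.get('num_threads', 1)
        if runtime_threads > thread_limit:
            over['threads'] = (runtime_threads, thread_limit)
        if over:
            excessive[node['id']] = over
    return excessive


# Made-up sample entries, for illustration only
sample_log = [
    {'id': 'wf.anat.skullstrip', 'runtime_memory_gb': 3.5,
     'estimated_memory_gb': 2.0, 'runtime_threads': 2, 'num_threads': 1},
    {'id': 'wf.func.realign', 'runtime_memory_gb': 0.8,
     'estimated_memory_gb': 2.0, 'runtime_threads': 4, 'num_threads': 1},
    {'id': 'wf.func.smooth', 'runtime_memory_gb': 1.0,
     'estimated_memory_gb': 2.0, 'runtime_threads': 2, 'num_threads': 1},
]
overused = find_overusage(sample_log)
print(overused)
```

In this sample, the first node exceeds its memory estimate, the second exceeds its thread limit, and the third stays within both and is left out of the result.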

Runs both the HTML Gantt chart generation and the text resource overusage report generation after a subject finishes running. Emits a warning if either report generation fails.
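One way to get "warn, don't fail" behavior for report generation is sketched below. This is an assumption about the general pattern, not the code in this PR: run_reports and the two report callables are hypothetical names, and the failure is simulated.

```python
import warnings


def run_reports(report_functions):
    """Run each report generator; warn instead of raising on failure."""
    completed = []
    for name, func in report_functions.items():
        try:
            func()
        except Exception as error:  # a failed report should not fail the run
            warnings.warn(f'Could not generate {name}: {error}')
        else:
            completed.append(name)
    return completed


def make_gantt_chart():
    """Hypothetical report generator that succeeds."""


def make_text_report():
    """Hypothetical report generator that fails."""
    raise FileNotFoundError('callback.log not found')


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    finished = run_reports({'Gantt chart': make_gantt_chart,
                            'overusage report': make_text_report})
print(finished, [str(w.message) for w in caught])
```

Because each generator runs in its own try block, a failure in one report does not prevent the other from being written.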

Tests

Text resource overusage reports and HTML Gantt charts are now generated upon completion of each run.
Small unit test:

C-PAC/CPAC/pipeline/nipype_pipeline_engine/engine.py

Lines 30 to 31 in d7cd96e

>>> _doctest_skiplines('skip this line', {'skip this line'})
'skip this line # doctest: +SKIP'
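For readers unfamiliar with the idea, a helper with the behavior shown in that doctest could look roughly like the sketch below. This is a guess reconstructed from the doctest's input and output, not the implementation in engine.py; skip_doctest_lines is a hypothetical name, and the exact spacing before the directive is an assumption.

```python
def skip_doctest_lines(docstring, lines_to_skip):
    """Append a doctest SKIP directive to each listed line of a docstring."""
    return '\n'.join(
        f'{line} # doctest: +SKIP' if line in lines_to_skip else line
        for line in docstring.split('\n')
    )


# Hypothetical docstring with one line that needs a file we do not have
example = ">>> a = 1\n>>> load('functional.nii')"
patched = skip_doctest_lines(example, {">>> load('functional.nii')"})
print(patched)
```

This lets a subclass reuse an inherited docstring while marking the lines that depend on unavailable files so that doctest does not execute them.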

Checklist

My pull request has a descriptive title (not a vague title like Update index.md).
My pull request targets the develop_v1.8_convergence branch of the repository.
My commit messages follow best practices.
My code follows the established code style of the repository.
I added tests for the changes I made (if applicable).
I updated the changelog.
I added or updated documentation (if applicable): 📝 Add Nodes to developer documentation fcp-indi.github.io#247
I tried running the project locally and verified that there are no visible errors.

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.


sgiavasis · 2021-01-30T01:22:41Z

Long-awaited improvement 🔥


jarrodmillman and others added 30 commits July 15, 2009 19:55

default directory layout

4e27c23

git-svn-id: https://nipy.svn.sourceforge.net/svnroot/nipy/nipype/trunk@2 ead46cd0-7350-4e37-8683-fc4c6f79bf00

added structure of basic directories to trunk

5b14f74

git-svn-id: https://nipy.svn.sourceforge.net/svnroot/nipy/nipype/trunk@3 ead46cd0-7350-4e37-8683-fc4c6f79bf00

added subdirectories to nipype trunk

bda300e

git-svn-id: https://nipy.svn.sourceforge.net/svnroot/nipy/nipype/trunk@4 ead46cd0-7350-4e37-8683-fc4c6f79bf00

add memory and thread to gantt chart, callback handles errors

e68ed07

add tests

f487de3

add attribute real_memory to interface, change attr memory to estimat…

66f1a2a

…ed_memory

Removed all of the ResourceMultiProc plugin so the S3 datasink

9a1e2a3

change old namespaces

c209f88

Merge pull request #6 from nipy/master

e1946f8

Re-basing code with nipype master branch

Merged resource_multiproc into s3_multiproc

8bf2725

Merge branch 's3_multiproc' into resource_multiproc

8c90deb

Re-pulled in changes from github

bdd7d95

Cleaned up some of the code to PEP8 and checked for errors

8f46956

Changed memory parameters to be memory_gb to be more explicit, used r…

943b34e

…untime Bunch object only for runtime stats storage instead of using results dictionary, renamed ResourceMultiProc to MultiProc for backwards-compatiblity

Fixed exception formatting and import error

de848aa

improve thread draw algorithm

0f04b8a

minor bugs

6ec02e3

partial commit to gantt chart

57eccbe

Fixed up gantt chart to plot real time memory

86a3763

Finished working prototype of gantt chart generator

7212dd7

remove white space, add labels

4d62930

Added global watcher

741cff1

Merge pull request #10 from FCP-INDI/new_interfaces

be49692

New interfaces update into sgiavasis/nipype

Debug code

ae39f26

Removed print debug statement

9ef0da3

Merge pull request #17 from FCP-INDI/debug_runtime_prof

62bd5bf

Debug runtime prof

Merge pull request #15 from sgiavasis/master

13854f6

3dAllineate out_matrix output file handling fix

Modified thread-monitoring logic and ensured unit tests pass

6fb95f8

Lowered memory usage for unit tests

dd4a7ca

Removed resource_multiproc code so only new_interfaces code is left

31ecb49

shnizzedy and others added 22 commits January 6, 2021 09:54

🐛 Convert timing values to datetimes from strings

bfa707b

* exclude nodes without timing information from Gantt chart * fall back on "id" or empty string if no "name" in node

🥅 Reduce double logging from exception to warning

5b46a80

✅ Add test for draw_gantt_chart

8aff589

🚨 Automatic linting by pre-commit

fb092aa

🚸 Don't restrict nan timestamps to predetermined options

c291171

🚚 Copy draw_gantt_chart from nipype into C-PAC

16b2bf3

⚡ Set default Node and MapNode mem_gb to 2.0

69c6f06

✅ Add tests for overridden Node and MapNode

8b45e77

Ref https://miykael.github.io/nipype_tutorial/notebooks/basic_mapnodes.html Co-authored-by: Michael Notter <michaelnotter@hotmail.com>

⚡ Set mem_gb estimates > 2.0

3e6b667

🔥 Remove unused (and circular) imports

1e9d8b4

🚚 Move monitoring into its own submodule

11283e6

🔀 Merge nipype:fix/gantt-chart into covergence/memory-estimates

2e95383

⚡ Set number of threads used to math.ceil(cpu_percent/100)

5883b9a

🚨 Lint monitoring

ee68d1f

♻️ Import unchanged functions directly from nipype

f7f61bf

🚨 Lint draw_gantt_chart

77aebe5

🥅 Generate text and HTML resource usage reports

044370f

⚡ Set default n_procs for MapNode to 1

ac1dfd1

🔊 Save resource overusage report to file

6b6d6f7

🔀 Fix config headers from merge f15fe5f

5aed1ac

🚸 Handle empty creds_path

d7cd96e

Merge branch 'develop_v1.8_convergence' into convergence/memory-estim…

aeae2ae

…ates

sgiavasis merged commit 827ca3e into develop_v1.8_convergence Jan 30, 2021

shnizzedy deleted the convergence/memory-estimates branch February 19, 2021 20:42

shnizzedy mentioned this pull request Apr 22, 2021

⚡️ Make memory estimates data-dependent #1480

Open

shnizzedy pushed a commit that referenced this pull request Nov 5, 2021

Merge pull request #1428 from FCP-INDI/convergence/memory-estimates

1a05d86

⚡️ Update memory and threading estimates
