
Profiling kernels with additional function in psyclone tools #339

Open
Matthew Walker (mattatmet) wants to merge 4 commits into MetOffice:main from mattatmet:profiling-kernels

@mattatmet Matthew Walker (mattatmet) commented Apr 20, 2026

PR Summary

Sci/Tech Reviewer: Oakley Brunt (@oakleybrunt)
Code Reviewer: Ed Hone (@EdHone)

This PR adds a function, profile_loops, to psyclone_tools.py. Users call it from the optimisation scripts in LFRic Apps; it uses PSyclone's ProfileTrans to wrap the outermost loop of each individual kernel in timing callipers. The calliper names are generated within the function from the file name, the invoke name, the kernel name and the index of the kernel within its invoke.
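As a rough illustration of this naming scheme, the sketch below assembles the options dictionary passed to ProfileTrans.apply from those parts. The helper make_region_options and the example names are hypothetical; the "<invoke>:<kernel>_k<index>" format simply mirrors the expression in the diff reviewed below, as does passing region_name as a (file name, region name) pair.

```python
def make_region_options(file_name: str, invoke_name: str,
                        kernel_name: str, count: int) -> dict:
    """Build ProfileTrans-style options for one kernel's profiling region.

    Hypothetical helper for illustration; not part of psyclone_tools.py.
    """
    # "<invoke>:<kernel>_k<index>", where index is the kernel's position
    # within its invoke (format taken from the expression in the PR diff).
    region_name = invoke_name + ":" + kernel_name + "_k" + str(count)
    # region_name is passed as a (module/file name, region name) pair.
    return {"region_name": (file_name, region_name)}


# Illustrative names only.
print(make_region_options("example_psy", "invoke_0", "example_kernel_code", 1))
# {'region_name': ('example_psy', 'invoke_0:example_kernel_code_k1')}
```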

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Core rose-stem suite
  • If required (e.g. API changes) I have also run the LFRic Apps test suite using this branch
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

trac.log

Test Suite Results - lfric_core - profiling_dev_core/run7

Suite Information

| Item | Value |
| --- | --- |
| Suite Name | profiling_dev_core/run7 |
| Suite User | matthew.walker |
| Workflow Start | 2026-04-27T09:11:25 |
| Groups Run | developer |

| Dependency | Reference | Main Like |
| --- | --- | --- |
| lfric_core | mattatmet/lfric_core@profiling-kernels | False |
| SimSys_Scripts | MetOffice/SimSys_Scripts@4387949 | True |

Task Information

✅ succeeded tasks - 388

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating?)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

@github-actions github-actions Bot added the cla-required The CLA has not yet been signed by the author of this PR - added by GA label Apr 20, 2026
@github-actions github-actions Bot added cla-signed The CLA has been signed as part of this PR - added by GA and removed cla-required The CLA has not yet been signed by the author of this PR - added by GA labels Apr 21, 2026
@mattatmet Matthew Walker (mattatmet) marked this pull request as ready for review April 21, 2026 10:19
Collaborator

A couple of comments for your consideration. If, after considering them, you decide not to act on them, close the comment. Otherwise leave it open for me to use as a marker.

I notice there's no code reviewer yet. It will be easier if one is sorted out before I approve this change.


:param psyir: the PSyIR of the PSy-layer.
:param colours_only: profile only the coloured kernels. Default True.
:type psyir: :py:class:`psyclone.psyir.nodes.FileContainer`
Collaborator

You can use type hinting to include this information in a syntactically significant fashion. e.g.

def profile_loops(psyir: FileContainer, colours_only=True):

This is more succinct than the sphinx form and can be used by tools such as mypy for static type checking.

Note that there is no need to specify colours_only: bool = True, although you can, because the type may be inferred from the default.
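A sketch of the suggested signature, with a stand-in class in place of PSyclone's FileContainer so that it runs without PSyclone installed (the stub class and the trivial body are illustrative only). At runtime only the explicitly annotated parameter appears in __annotations__; whether a static checker then infers bool for colours_only from its default depends on the tool.

```python
class FileContainer:
    """Stand-in for psyclone.psyir.nodes.FileContainer (illustration only)."""


def profile_loops(psyir: FileContainer, colours_only=True):
    """Signature sketch only; the real body lives in psyclone_tools.py."""
    return colours_only


# Only the explicitly annotated parameter is recorded at runtime;
# colours_only is absent because it has no annotation.
print(profile_loops.__annotations__)
```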

Author

Thanks for this tip, Matthew. I've added type hints to this function and to the other functions in psyclone_tools.

Comment on lines +118 to +119
# Add timing calipers to coloured loops. This should be done
# before the application of the openmp transformation.
Collaborator

Is there a means to enforce this ordering?

Author

Thank you for this comment, Matthew; it was well worth checking. I hadn't considered how strict this condition should be, so I tested it, and it turns out that this ordering check was definitely warranted, as was a second one (making sure the function isn't called before colour_loops). I've added raise TransformationError statements so that the function fails clearly when it is called out of order.
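A minimal sketch of the two ordering checks described above, built on stand-in classes so it runs without PSyclone. The class names echo PSyclone's, but the tree structure, the ancestor() walk and the check_* function names are hypothetical; in the PR the equivalent checks sit inside profile_loops and colour_loops (and there the OpenMP check also covers OMPParallelDirective and OMPDoDirective).

```python
class TransformationError(Exception):
    """Stand-in for PSyclone's TransformationError (illustration only)."""


class Node:
    """Minimal tree node with a PSyIR-like ancestor() lookup."""

    def __init__(self, parent=None):
        self.parent = parent

    def ancestor(self, node_type):
        # Walk up through the parents; return the first match, else None.
        current = self.parent
        while current is not None:
            if isinstance(current, node_type):
                return current
            current = current.parent
        return None


class Schedule(Node):
    pass


class OMPParallelDoDirective(Node):
    pass


class ProfileNode(Node):
    pass


class Loop(Node):
    pass


def check_profile_before_openmp(loop):
    """Fail if the loop already sits inside an OpenMP directive."""
    if loop.ancestor(OMPParallelDoDirective):
        raise TransformationError(
            "Must apply profile_loops BEFORE the OpenMP transformation.")


def check_colour_before_profile(child):
    """Fail if the node has already been wrapped for profiling."""
    if isinstance(child, ProfileNode):
        raise TransformationError(
            "Must apply colour_loops BEFORE profile_loops.")


# A loop that already sits under an OpenMP directive is rejected.
threaded = Loop(parent=OMPParallelDoDirective(parent=Schedule()))
try:
    check_profile_before_openmp(threaded)
except TransformationError as err:
    print(err)
```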

Author

Hacka Fett (@christophermaynard) Joerg Henrichs (@hiker): one thing that came up here was that PSyclone raises an error when a Profile node is placed between an OMPParallelDoDirective node and a Loop node. Could PSyclone be extended in future to allow this when you want to profile individual threads?


> Hacka Fett (@christophermaynard) Joerg Henrichs (@hiker): one thing that came up here was that PSyclone raises an error when a Profile node is placed between an OMPParallelDoDirective node and a Loop node. An idea for the future could be setting PSyclone up so that it allows it when you want to profile different threads?

I don't think that's valid Fortran: an omp parallel do directive must be immediately followed by a loop, so we can't put a call there.
Tools that support per-thread measurements (well... at least TAU does :) ) either instrument the OpenMP directives (i.e. a source-to-source transformation that inserts calls) or use the OMPT callback interface.
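To make the constraint concrete, the Python sketch below emits Fortran text with the timing calls placed outside the parallel region, then checks that each !$omp parallel do line is directly followed by a DO statement. The start_timer/stop_timer calls are placeholders, not the calls PSyclone actually generates, and the check is a crude string test rather than a Fortran parser (it would, for example, reject a labelled DO). Note that timing outside the region measures the whole threaded loop, not individual threads.

```python
def wrap_parallel_loop(loop_lines, region):
    """Emit Fortran with timer calls placed OUTSIDE the OpenMP region.

    start_timer/stop_timer are placeholder names, not a real profiling API.
    """
    return ([f'call start_timer("{region}")',
             "!$omp parallel do"]
            + loop_lines
            + ["!$omp end parallel do",
               f'call stop_timer("{region}")'])


def directive_followed_by_loop(lines):
    """Rough textual check: each '!$omp parallel do' is followed by a DO."""
    for i, line in enumerate(lines):
        if line.strip().lower().startswith("!$omp parallel do"):
            nxt = lines[i + 1].strip().lower() if i + 1 < len(lines) else ""
            if not nxt.startswith("do "):
                return False
    return True


loop = ["do cell = 1, ncells", "  call kernel_code(cell)", "end do"]

# Valid: the timer calls bracket the whole parallel region.
valid = wrap_parallel_loop(loop, "invoke_0:kernel_k0")
# Invalid: a call sits between the directive and the loop it must precede.
invalid = (["!$omp parallel do", 'call start_timer("r")']
           + loop + ["!$omp end parallel do"])

print(directive_followed_by_loop(valid), directive_followed_by_loop(invalid))
# True False
```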

Comment thread CONTRIBUTORS.md
| mo-lucy-gordon | Lucy Gordon | Met Office | 2026-03-18 |
| shreybh1 | Shrey Bhardwaj | Met Office | 2026-03-26 |

| mattatmet | Matthew Walker | Met Office | 2026-04-21 |
Collaborator

Unless it states otherwise in the developer documentation, it is probably best to insert your name in alphabetical order by family name. This may be difficult if others have not been doing so.

Author

Sadly, there is no clear structure or order to this file, alphabetical or otherwise.

Contributor

Thanks Matt, looking like a very useful change! There are just a few things that could be cleaner and some style changes I'd like to see as well.

Feel free to push back on any of my suggestions :)

Comment on lines +139 to +140
# Insert profiler calls before loop over colours
if (not colours_only and not loop.loop_type in leave_loops) or loop.loop_type == "colours":
Contributor

I think this check could be more clearly signalled, e.g.:

Suggested change
# Insert profiler calls before loop over colours
if (not colours_only and not loop.loop_type in leave_loops) or loop.loop_type == "colours":
# Always profile coloured loops. Optionally profile all loops if
# colours_only=False and loop not over anything in `leave_loops`
if ((loop.loop_type == "colours") or
(colours_only is False and loop.loop_type not in leave_loops)):

I know that I have used an extra pair of parentheses which are not strictly required, but it visually separates the two checks which I think is easier to read.

Please check that this works if you agree with this suggestion. I am trusting myself here and haven't tried to run this.
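One way to do that check without PSyclone is to lift both conditions into plain functions and compare them over a grid of inputs. This is a sketch: the loop-type strings and the leave_loops contents below are illustrative assumptions, not the values used in psyclone_tools.py. The two forms agree whenever colours_only is a genuine bool; they differ only if a falsy non-bool such as None or 0 is passed, because "colours_only is False" is then false while "not colours_only" is true.

```python
from itertools import product


def original_condition(loop_type, colours_only, leave_loops):
    # Condition as written in the PR diff.
    return ((not colours_only and not loop_type in leave_loops)
            or loop_type == "colours")


def suggested_condition(loop_type, colours_only, leave_loops):
    # Always profile coloured loops. Optionally profile all loops if
    # colours_only=False and the loop type is not in leave_loops.
    return ((loop_type == "colours") or
            (colours_only is False and loop_type not in leave_loops))


# Illustrative values only; the real loop types come from PSyclone.
loop_types = ["colours", "colour", "dof", ""]
leave_loops = ["colour"]

mismatches = [
    (lt, co) for lt, co in product(loop_types, [True, False])
    if original_condition(lt, co, leave_loops)
    != suggested_condition(lt, co, leave_loops)
]
print(mismatches)  # [] -> the two forms agree for boolean colours_only
```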

k_name = loop.ancestor(InvokeSchedule).coded_kernels()[count].name
invoke_name = loop.ancestor(InvokeSchedule).invoke.name
file_name = loop.ancestor(Container).name
options = {"region_name": (file_name,invoke_name + ":" + k_name + "_k" + str(count))}
Contributor

This line is too long; it's probably worth creating the region name on a separate line first:

Suggested change
options = {"region_name": (file_name,invoke_name + ":" + k_name + "_k" + str(count))}
# Make region name
region_name = invoke_name + ":" + k_name + "_k" + str(count)
options = {"region_name": (file_name, region_name)}

Comment on lines +146 to +148
k_name = loop.ancestor(InvokeSchedule).coded_kernels()[count].name
invoke_name = loop.ancestor(InvokeSchedule).invoke.name
file_name = loop.ancestor(Container).name
Contributor

A comment would be nice here to let others know you are getting the parts to name the profiling calliper.

invoke_name = loop.ancestor(InvokeSchedule).invoke.name
file_name = loop.ancestor(Container).name
options = {"region_name": (file_name,invoke_name + ":" + k_name + "_k" + str(count))}
profile_trans.apply(loop,options=options)
Contributor

Suggested change
profile_trans.apply(loop,options=options)
profile_trans.apply(loop, options=options)

file_name = loop.ancestor(Container).name
options = {"region_name": (file_name,invoke_name + ":" + k_name + "_k" + str(count))}
profile_trans.apply(loop,options=options)
count += 1
Contributor

A comment about why we are using a counter would go a long way here.

Something like "Allows invokes of the same name to be profiled individually"
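A small sketch shows why the index matters: if the same kernel appears twice in one invoke, the index suffix is what keeps the two region names distinct. The function below is hypothetical and simplified: it takes plain strings instead of PSyIR nodes and uses enumerate in place of a manually incremented counter.

```python
def region_names(file_name, invoke_name, kernel_names):
    """Return one (file, region) pair per kernel call in an invoke."""
    names = []
    # enumerate supplies the per-invoke kernel index, starting at 0.
    for count, kernel_name in enumerate(kernel_names):
        region = invoke_name + ":" + kernel_name + "_k" + str(count)
        names.append((file_name, region))
    return names


# The same (illustrative) kernel called twice in one invoke.
pairs = region_names("example_psy", "invoke_0",
                     ["kernel_a_code", "kernel_b_code", "kernel_a_code"])
for pair in pairs:
    print(pair)

# Every region is unique thanks to the _k<index> suffix.
assert len(set(pairs)) == len(pairs)
```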

if loop.ancestor(OMPParallelDirective) or loop.ancestor(OMPParallelDoDirective) \
or loop.ancestor(OMPDoDirective):
raise TransformationError(
"Must apply profile_loops BEFORE openmp_parellelise_loops function in optimisation script.")
Contributor

Line too long; we should be using an 80-character limit.

Suggested change
"Must apply profile_loops BEFORE openmp_parellelise_loops function in optimisation script.")
"Must apply profile_loops BEFORE "
"openmp_parellelise_loops function in optimisation "
"script.")

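For reference, the wrapped form relies on Python joining adjacent string literals at compile time, so splitting a message over several lines changes only the layout, not the text. The one pitfall is the space at each break, as this small sketch shows (the second message is a deliberately broken example, not code from the PR).

```python
# Adjacent literals inside parentheses are concatenated into one string.
message = ("Must apply colour_loops BEFORE profile_loops function in "
           "optimisation script.")

# Forgetting the trailing space at a break silently joins two words.
broken = ("Must apply colour_loops BEFORE profile_loops function in"
          "optimisation script.")

print(message)  # ends with: function in optimisation script.
print(broken)   # ends with: function inoptimisation script.
```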
# Check if the profiling calipers have been added before the colouring.
if isinstance(child, ProfileNode):
raise TransformationError(
"Must apply colour_loops BEFORE profile_loops function in optimisation script.")
Contributor

Line too long:

Suggested change
"Must apply colour_loops BEFORE profile_loops function in optimisation script.")
"Must apply colour_loops BEFORE profile_loops function in "
"optimisation script.")

ctrans.apply(child, options={"tiling": enable_tiling})

# -----------------------------------------------------------------------------
def profile_loops(psyir: FileContainer,colours_only=True):
Contributor

Suggested change
def profile_loops(psyir: FileContainer,colours_only=True):
def profile_loops(psyir: FileContainer, colours_only=True):

Comment on lines +142 to +143
if loop.ancestor(OMPParallelDirective) or loop.ancestor(OMPParallelDoDirective) \
or loop.ancestor(OMPDoDirective):
Contributor

Line too long:

Suggested change
if loop.ancestor(OMPParallelDirective) or loop.ancestor(OMPParallelDoDirective) \
or loop.ancestor(OMPDoDirective):
if (loop.ancestor(OMPParallelDirective) or
loop.ancestor(OMPParallelDoDirective) or
loop.ancestor(OMPDoDirective)):

@mattatmet
Author

Oakley Brunt (@oakleybrunt): Thanks for those suggested changes; the code looks much better now. I've applied them, tested them with both LFRic Core and LFRic Apps, and committed them. I've also updated the trac.log to reflect the most recent Core test. Cheers.


Labels

cla-signed The CLA has been signed as part of this PR - added by GA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Addition of profiling function for individual kernels

5 participants