Checkpoint the lfric_apps random seed #427

Open
Steve Mullerworth (stevemullerworth) wants to merge 6 commits into MetOffice:main from stevemullerworth:checkpoint_seed

@stevemullerworth Steve Mullerworth (stevemullerworth) commented Apr 10, 2026

PR Summary

Sci/Tech Reviewer: allynt
Code Reviewer: Pierre Siddall (@Pierre-siddall)

Several lfric_atm configurations use random number generators. This PR improves checkpoint/restart for the random seed to support bit-reproducibility across checkpoint/restart boundaries.

The XIOS API does not support writing integers to the checkpoint dump. Previously, the random seed was converted to a real type for checkpointing (using the io_value_type). However, io_value_type is hard-wired to use r_def kind, which offers insufficient precision for storing a random number seed.
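The precision problem can be illustrated with a small sketch. It assumes, purely for illustration, that r_def is an IEEE 754 single-precision real in the affected builds and that the seed is a single 32-bit integer; neither detail is stated above. The helper names are hypothetical and are not part of LFRic or XIOS.

```python
import struct


def round_trip_f32(value: int) -> int:
    """Store an integer as a single-precision real and read it back."""
    packed = struct.pack("<f", float(value))
    (restored,) = struct.unpack("<f", packed)
    return int(restored)


def round_trip_f64(value: int) -> int:
    """Store an integer as a double-precision real and read it back."""
    packed = struct.pack("<d", float(value))
    (restored,) = struct.unpack("<d", packed)
    return int(restored)


# Hypothetical seed values: single precision has a 24-bit significand,
# so integers above 2**24 are not all exactly representable.
small_seed = 2**24        # still exact in single precision
large_seed = 2**24 + 1    # first integer that is not exact

print(round_trip_f32(small_seed) == small_seed)
print(round_trip_f32(large_seed) == large_seed)
print(round_trip_f64(large_seed) == large_seed)
```

A seed that changes value on the way through the checkpoint restarts the generator from a different state, which is enough to break bit-reproducibility.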

This change uses a new integer_io_value_type, developed in the linked lfric_core PR MetOffice/lfric_core#326, which checkpoints integer values regardless of the real precision settings.

The change affects KGOs of CRUNs of configurations that use random numbers.

The change enables ral3-seuk to bit-compare across checkpoint/restart boundaries, so the nrun/crun comparison has been enabled for this test. This requires an extra 2:30-minute run of a 16-node job to generate the long-run checksum that is compared with the matching nrun+crun.
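The idea behind an nrun/crun comparison can be sketched as follows: one uninterrupted run is compared with a first run (nrun) that checkpoints its generator state, followed by a continuation run (crun) that restores it. This is a toy illustration using Python's standard random module, not the LFRic rose-stem test; the function names, step counts, seed value and checksum choice are all assumptions.

```python
import hashlib
import random


def checksum(values):
    """Hash a sequence of floats so that two runs can be bit-compared."""
    digest = hashlib.sha256()
    for value in values:
        digest.update(repr(value).encode("ascii"))
    return digest.hexdigest()


def long_run(seed, steps):
    """Single uninterrupted run."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]


def nrun(seed, steps):
    """First run: returns its output plus the generator state to checkpoint."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(steps)]
    return values, rng.getstate()


def crun(state, steps):
    """Continuation run: restores the checkpointed generator state."""
    rng = random.Random()
    rng.setstate(state)
    return [rng.random() for _ in range(steps)]


SEED, FIRST, SECOND = 12345, 5, 5  # hypothetical values

reference = checksum(long_run(SEED, FIRST + SECOND))

first, state = nrun(SEED, FIRST)
restarted = checksum(first + crun(state, SECOND))

# A restart that loses the state and simply re-seeds replays old numbers.
reseeded = checksum(first + long_run(SEED, SECOND))

print("restored state matches long run:", restarted == reference)
print("re-seeded restart matches long run:", reseeded == reference)
```

Only the restart that restores the exact generator state reproduces the long-run checksum, which is why the seed has to survive the checkpoint unchanged.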

Other configurations with CRUNs are not, as yet, fully fixed by this change as they use other arrays whose evolution depends on the random seed: PR #383 deals with some clim_gal9 stochastic physics SKEB and SPT schemes (where nrun/crun comparison is possible for 64-bit configurations). Issue #426 refers to the stochastic physics random parameter scheme.

MetOffice/lfric_core#326

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Apps rose-stem suite
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

trac.log

Test Suite Results - lfric_apps - chkpt_seed_review/run1

Suite Information

| Item | Value |
| --- | --- |
| Suite Name | chkpt_seed_review/run1 |
| Suite User | steve.mullerworth |
| Workflow Start | 2026-04-30T15:49:06 |
| Groups Run | all |

| Dependency | Reference | Main Like |
| --- | --- | --- |
| casim | MetOffice/casim@2026.03.2 | True |
| jules | MetOffice/jules@2026.03.2 | True |
| lfric_apps | stevemullerworth/lfric_apps@checkpoint_seed | False |
| lfric_core | stevemullerworth/lfric_core@checkpoint_seed | False |
| moci | MetOffice/moci@2026.03.2 | True |
| SimSys_Scripts | MetOffice/SimSys_Scripts@2026.03.2 | True |
| socrates | MetOffice/socrates@2026.03.2 | True |
| socrates-spectral | MetOffice/socrates-spectral@2026.03.2 | True |
| ukca | MetOffice/ukca@2026.03.2 | True |

Task Information

✅ succeeded tasks - 1513

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating?)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

@stevemullerworth Steve Mullerworth (stevemullerworth) added the enhancement (New feature or request), Linked Core (This PR is linked to a MetOffice/lfric_core PR) and KGO (This PR contains changes to KGO) labels May 1, 2026
@stevemullerworth Steve Mullerworth (stevemullerworth) marked this pull request as ready for review May 1, 2026 10:32
@stevemullerworth Steve Mullerworth (stevemullerworth) requested review from mo-lucy-gordon and removed request for a team May 1, 2026 10:32
@github-actions github-actions Bot requested a review from allynt May 1, 2026 10:33

@iboutle iboutle left a comment

Generally looks fine - I'm confused though as to whether this resolves the issue with the test introduced in #383 having to be 64-bit, i.e. could that be changed to 32-bit or is there still more work to do for that?

@stevemullerworth (Collaborator, Author) commented

> Generally looks fine - I'm confused though as to whether this resolves the issue with the test introduced in #383 having to be 64-bit, i.e. could that be changed to 32-bit or is there still more work to do for that?

This branch is developed from stable, so the test won't work anyway till head (including the checkpointing of the Stochastic Physics fields) is merged in. It could be done once sci/tech is passed, or as a separate PR.

The change adds an nrun/crun test for ral3-seuk 32-bit which I think used random numbers.

@iboutle iboutle (Contributor) commented May 1, 2026

> > Generally looks fine - I'm confused though as to whether this resolves the issue with the test introduced in #383 having to be 64-bit, i.e. could that be changed to 32-bit or is there still more work to do for that?
>
> This branch is developed from stable, so the test won't work anyway till head (including the checkpointing of the Stochastic Physics fields) is merged in. It could be done once sci/tech is passed, or as a separate PR.
>
> The change adds an nrun/crun test for ral3-seuk 32-bit which I think used random numbers.

Ah yes, that makes sense - I think it would be worth doing that, either on this PR when merging up to main, or a separate PR, whichever is easier.

@github-actions github-actions Bot added the cla-modified (The CLA has been modified as part of this PR - added by GA) label May 1, 2026

github-actions Bot commented May 1, 2026

⚠️ Hello Steve Mullerworth (@stevemullerworth)!

Your CLA signature was found on the base branch, but you appear to have modified the CONTRIBUTORS.md file in this PR.

Please do not edit the CONTRIBUTORS.md file. If you have already signed the CLA, revert changes to the file and your signature will be picked up.

Labels

  • cla-modified: The CLA has been modified as part of this PR - added by GA
  • enhancement: New feature or request
  • KGO: This PR contains changes to KGO
  • Linked Core: This PR is linked to a MetOffice/lfric_core PR
