Allow extending baseline phase by samsrabin · Pull Request #4951 · ESMCI/cime

samsrabin · 2026-03-20T21:07:02Z

Description

Adds a new SystemTestsCommon method, additional_baseline_generation(), which the user can implement to perform extra operations during baseline generation. For example:

The user might have performed postprocessing during RUN that generated files to be included in the baseline.
The user might want to copy all files from a certain history tape, not just the last one.

I also changed it so that the GENERATE_BASELINE status isn't marked as succeeding or failing until after logs are copied (and now also after additional_baseline_generation()).

I have used this code in development and it seems to work. ~~However, I haven't begun to think about adding CIME tests.~~ Some CIME unit tests have now been added.

Checklist

My code follows the style guidelines of this project (black formatting)
I have performed a self-review of my own code
My changes generate no new warnings
I have added tests that exercise my feature/fix and existing tests continue to pass
I have commented my code, particularly in hard-to-understand areas
I have made corresponding additions and changes to the documentation

… logs.

…baseline_phase() method.

codecov · 2026-03-20T21:38:16Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 28.29%. Comparing base (82493b1) to head (824243a).
⚠️ Report is 169 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4951      +/-   ##
==========================================
- Coverage   31.44%   28.29%   -3.16%     
==========================================
  Files         264      262       -2     
  Lines       38703    38390     -313     
  Branches     8390     8126     -264     
==========================================
- Hits        12172    10863    -1309     
- Misses      25291    26279     +988     
- Partials     1240     1248       +8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

billsacks · 2026-03-22T17:28:25Z

This seems like a useful extension – thanks, @samsrabin ! I have a couple of questions:

(1) It kind of feels like, if a test extends the generate phase, it would also want to extend the compare phase similarly. Do you agree? If so, should that extension point also be added here?

(2) I'd be curious to see an example of how you imagine using this. No need to spend a lot of time putting something together if you don't have something yet, but if you have something already implemented or a specific use case already in mind, I'd be interested to see it. One reason I ask is because I think it could warrant some discussion as to which use cases are appropriate for this mechanism vs. possible other mechanisms.

Specifically, I'm thinking about:

The user might want to copy all files from a certain history tape, not just the last one.

Off-hand, this feels like something that might be less tied to a particular test type and more tied to a specific test, given the test length, testmods, etc. And it also feels like implementing it would be tricky enough that its implementation would probably belong in the core CIME (covered by unit tests) rather than in a specific test. So I'm wondering if this kind of behavior would be more appropriate as a standard option to CIME tests, triggered by a test modifier.

I do see how your first example:

The user might have performed postprocessing during RUN that generated files to be included in the baseline

would be tied to a specific test type - though would be interested in seeing an example just to better understand where you're going with this.

samsrabin · 2026-03-24T19:18:44Z

Thanks for having a look, @billsacks!

I developed this as part of testing the scripts for generating CTSM crop calendars. You may remember me introducing the RXCROPMATURITY test a while ago. At the time, we filed an issue (ESCOMP/CTSM#2166) saying that it'd be good to get this testing into aux_clm (and also clm_pymods, for that matter), but it's hard because the required tests are so long—a CTSM run has to go for a few years, then we have to run the scripts on the outputs, and then ideally we run CTSM again for a few years with the outputs from those scripts.

What I've done now is to make a new kind of test that just runs the scripts based on pre-existing CTSM outputs, then runs CTSM for a few days to make sure it doesn't crash when using the script outputs. (As you might imagine, this is a bit annoying to manage, but it massively improves iteration time when working on the crop calendar generation scripts. And the annoyance only comes when working on those scripts, not every time we run aux_clm or even ctsm_sci.)

The "based on pre-existing CTSM outputs" bit is why I only added an extension for the baseline generation phase and not the comparison phase. This probably would be better handled in core CIME, but I don't have the time to implement that anytime soon, and I need to get my RXCROPMATURITY work done before the CTSM6.0 release.

Ideally I would also like to include the script outputs in the comparison, but I consider that a separate, lower-priority effort that I'm not going to tackle before the CTSM6.0 release. These are already included indirectly, as inputs to the second CTSM run in the full RXCROPMATURITY test that happens at least every time we run ctsm_sci.

billsacks · 2026-03-24T23:13:30Z

Thanks for the explanation, @samsrabin . I still don't feel like I have a clear picture, but probably a discussion will be more efficient. Maybe you can explain this at the next CIME call?

samsrabin · 2026-03-25T16:14:02Z

Sure, we can definitely discuss at the next CIME meeting. In the meantime, here's a diagram showing the workflow for the two versions of the test. RXCROPMATURITY (blue) is the full run that takes a long time; RXCROPMATURITYSKIPFIRSTRUN (red) is the short one solely intended to check the Python scripts. The latter uses pre-existing outputs from the former as inputs to the Python scripts.

billsacks · 2026-03-25T16:36:53Z

Thanks, @samsrabin . The main piece I'm not understanding (which you can try to explain here if you'd like, or we can wait until we can discuss synchronously) is how you're going to leverage the baselines for this. I'm picturing something like:

You'll generate baselines from a run of the test suite where you have run RXCROPMATURITY
You'll then somehow point to these baselines in future runs of RXCROPMATURITYSKIPFIRSTRUN

If so, how would you ensure that the necessary baselines continue to exist for future runs of RXCROPMATURITYSKIPFIRSTRUN given that - at least historically - we have treated baseline directories as scrubbable things after some time? (Again, no need to answer here - just sharing the question in my mind.)

Copilot

Pull request overview

This PR adds an explicit extension point to the CIME system-test baseline generation workflow, allowing subclasses to run additional steps during baseline creation without having to replace the full framework method.

Changes:

Added SystemTestsCommon.generate_baseline_phase(basegen_dir) as a subclass extension hook invoked during _generate_baseline.
Deferred marking the GENERATE phase status until after log copying and the new extension hook completes.
Added unit tests verifying generate_baseline_phase is invoked (including via subclass override).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`CIME/SystemTests/system_tests_common.py`	Introduces the `generate_baseline_phase` extension point and shifts `GENERATE` status setting to the end of `_generate_baseline`.
`CIME/tests/test_unit_system_tests.py`	Refactors baseline-generation test setup and adds assertions/tests covering the new extension hook call path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…e_generation()

billsacks

Thank you for the latest changes, @samsrabin ! I am happy with this now.

billsacks · 2026-04-08T18:13:36Z

+            self._test_status.set_status(
+                GENERATE_PHASE, status, comments=os.path.dirname(baseline_name)
+            )


I see you have moved the set_status for the generate phase to the end of the function. That represents a potential behavior change (in unlikely cases), but feels like the right thing to me.

Yes, sorry, I realize that probably should have been a separate PR... But I did at least call it out in the PR description! It felt weird for PASS to get marked before everything was actually done.

Oops, yeah, I do see that you called it out in the PR description - thanks. I'm personally fine with this being folded in here given that it's called out in the PR description. I'm sorry for not having gone back and re-read the PR description when I re-reviewed this. Yes, I agree that it feels better to wait to mark the phase as pass until everything is done, so I do support this change.

jasonb5

LGTM

billsacks · 2026-04-09T21:21:40Z

@samsrabin - FYI, I edited your top-level comment to bring it up-to-date with the current implementation (changing mention of "generate_baseline_phase" to instead "additional_baseline_generation").

samsrabin added 2 commits February 13, 2026 11:34

SystemTestsCommon _generate_baseline(): Only set status after copying…

2df0153

… logs.

SystemTestsCommon: Optionally extend GENERATE_BASELINE with generate_…

8524cc1

…baseline_phase() method.

samsrabin requested a review from jasonb5 March 20, 2026 21:07

samsrabin self-assigned this Mar 20, 2026

samsrabin added the ty: enhancement label Mar 20, 2026

samsrabin added 2 commits March 20, 2026 15:24

Add unit tests for generate_baseline_phase().

ad58528

Reduce code duplication among test_generate_baseline* tests.

a4fd9df

Simplify testing of generate_baseline_phase().

3c0d623

samsrabin force-pushed the allow-extending-baseline-phase branch from 9b9e3d6 to 3c0d623 Compare March 20, 2026 21:42

jgfouca requested a review from Copilot April 1, 2026 18:15

Copilot started reviewing on behalf of jgfouca April 1, 2026 18:16 View session

samsrabin commented Apr 1, 2026

View reviewed changes

Comment thread CIME/SystemTests/system_tests_common.py Outdated

Copilot AI reviewed Apr 1, 2026

View reviewed changes

Comment thread CIME/SystemTests/system_tests_common.py Outdated

samsrabin commented Apr 1, 2026

View reviewed changes

Comment thread CIME/SystemTests/system_tests_common.py Outdated

samsrabin added 4 commits April 8, 2026 11:37

Rename generate_baseline_phase to additional_baseline_generation.

86aba72

_generate_baseline() docstring now just *suggests* additional_baselin…

31bed3e

…e_generation()

Reformat with black.

64565a1

Add previous commit to .git-blame-ignore-revs.

824243a

billsacks approved these changes Apr 8, 2026

View reviewed changes

jasonb5 approved these changes Apr 9, 2026

View reviewed changes

billsacks merged commit 56ba669 into ESMCI:master Apr 9, 2026
10 of 11 checks passed

Conversation

samsrabin commented Mar 20, 2026 • edited by billsacks Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Uh oh!

codecov Bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

billsacks commented Mar 22, 2026

Uh oh!

samsrabin commented Mar 24, 2026

Uh oh!

billsacks commented Mar 24, 2026

Uh oh!

samsrabin commented Mar 25, 2026

Uh oh!

billsacks commented Mar 25, 2026

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

billsacks left a comment

Choose a reason for hiding this comment

Uh oh!

billsacks Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

samsrabin Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

billsacks Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

jasonb5 left a comment

Choose a reason for hiding this comment

Uh oh!

billsacks commented Apr 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

samsrabin commented Mar 20, 2026 •

edited by billsacks

Loading

codecov Bot commented Mar 20, 2026 •

edited

Loading