Skip to content

Allow extending baseline phase#4951

Merged
billsacks merged 9 commits intoESMCI:masterfrom
samsrabin:allow-extending-baseline-phase
Apr 9, 2026
Merged

Allow extending baseline phase#4951
billsacks merged 9 commits intoESMCI:masterfrom
samsrabin:allow-extending-baseline-phase

Conversation

@samsrabin
Copy link
Copy Markdown
Contributor

@samsrabin samsrabin commented Mar 20, 2026

Description

Adds a new SystemTestsCommon method, additional_baseline_generation(), which the user can implement to perform extra operations during baseline generation. For example:

  1. The user might have performed postprocessing during RUN that generated files to be included in the baseline.
  2. The user might want to copy all files from a certain history tape, not just the last one.

I also changed it so that the GENERATE_BASELINE status isn't marked as succeeding or failing until after logs are copied (and now also after additional_baseline_generation()).

I have used this code in development and it seems to work. However, I haven't begun to think about adding CIME tests. Some CIME unit tests have now been added.

Checklist

  • My code follows the style guidelines of this project (black formatting)
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • I have added tests that exercise my feature/fix and existing tests continue to pass
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding additions and changes to the documentation

@samsrabin samsrabin requested a review from jasonb5 March 20, 2026 21:07
@samsrabin samsrabin self-assigned this Mar 20, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 28.29%. Comparing base (82493b1) to head (824243a).
⚠️ Report is 169 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4951      +/-   ##
==========================================
- Coverage   31.44%   28.29%   -3.16%     
==========================================
  Files         264      262       -2     
  Lines       38703    38390     -313     
  Branches     8390     8126     -264     
==========================================
- Hits        12172    10863    -1309     
- Misses      25291    26279     +988     
- Partials     1240     1248       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@samsrabin samsrabin force-pushed the allow-extending-baseline-phase branch from 9b9e3d6 to 3c0d623 Compare March 20, 2026 21:42
@billsacks
Copy link
Copy Markdown
Member

This seems like a useful extension – thanks, @samsrabin ! I have a couple of questions:

(1) It kind of feels like, if a test extends the generate phase, it would also want to extend the compare phase similarly. Do you agree? If so, should that extension point also be added here?

(2) I'd be curious to see an example of how you imagine using this. No need to spend a lot of time putting something together if you don't have something yet, but if you have something already implemented or a specific use case already in mind, I'd be interested to see it. One reason I ask is because I think it could warrant some discussion as to which use cases are appropriate for this mechanism vs. possible other mechanisms.

Specifically, I'm thinking about:

The user might want to copy all files from a certain history tape, not just the last one.

Off-hand, this feels like something that might be less tied to a particular test type and more tied to a specific test, given the test length, testmods, etc. And it also feels like implementing it would be tricky enough that its implementation would probably belong in the core CIME (covered by unit tests) rather than in a specific test. So I'm wondering if this kind of behavior would be more appropriate as a standard option to CIME tests, triggered by a test modifier.

I do see how your first example:

The user might have performed postprocessing during RUN that generated files to be included in the baseline

would be tied to a specific test type - though would be interested in seeing an example just to better understand where you're going with this.

@samsrabin
Copy link
Copy Markdown
Contributor Author

Thanks for having a look, @billsacks!

I developed this as part of testing the scripts for generating CTSM crop calendars. You may remember me introducing the RXCROPMATURITY test a while ago. At the time, we filed an issue (ESCOMP/CTSM#2166) saying that it'd be good to get this testing into aux_clm (and also clm_pymods, for that matter), but it's hard because the required tests are so long—a CTSM run has to go for a few years, then we have to run the scripts on the outputs, and then ideally we run CTSM again for a few years with the outputs from those scripts.

What I've done now is to make a new kind of test that just runs the scripts based on pre-existing CTSM outputs, then runs CTSM for a few days to make sure it doesn't crash when using the script outputs. (As you might imagine, this is a bit annoying to manage, but it massively improves iteration time when working on the crop calendar generation scripts. And the annoyance only comes when working on those scripts, not every time we run aux_clm or even ctsm_sci.)

The "based on pre-existing CTSM outputs" bit is why I only added an extension for the baseline generation phase and not the comparison phase. This probably would be better handled in core CIME, but I don't have the time to implement that anytime soon, and I need to get my RXCROPMATURITY work done before the CTSM6.0 release.

Ideally I would also like to include the script outputs in the comparison, but I consider that a separate, lower-priority effort that I'm not going to tackle before the CTSM6.0 release. These are already included indirectly, as inputs to the second CTSM run in the full RXCROPMATURITY test that happens at least every time we run ctsm_sci.

@billsacks
Copy link
Copy Markdown
Member

Thanks for the explanation, @samsrabin . I still don't feel like I have a clear picture, but probably a discussion will be more efficient. Maybe you can explain this at the next CIME call?

@samsrabin
Copy link
Copy Markdown
Contributor Author

Sure, we can definitely discuss at the next CIME meeting. In the meantime, here's a diagram showing the workflow for the two versions of the test. RXCROPMATURITY (blue) is the full run that takes a long time; RXCROPMATURITYSKIPFIRSTRUN (red) is the short one solely intended to check the Python scripts. The latter uses pre-existing outputs from the former as inputs to the Python scripts.

Diagram of RXCROPMATURITY* runs

@billsacks
Copy link
Copy Markdown
Member

Thanks, @samsrabin . The main piece I'm not understanding (which you can try to explain here if you'd like, or we can wait until we can discuss synchronously) is how you're going to leverage the baselines for this. I'm picturing something like:

  • You'll generate baselines from a run of the test suite where you have run RXCROPMATURITY
  • You'll then somehow point to these baselines in future runs of RXCROPMATURITYSKIPFIRSTRUN

If so, how would you ensure that the necessary baselines continue to exist for future runs of RXCROPMATURITYSKIPFIRSTRUN given that - at least historically - we have treated baseline directories as scrubbable things after some time? (Again, no need to answer here - just sharing the question in my mind.)

Comment thread CIME/SystemTests/system_tests_common.py Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an explicit extension point to the CIME system-test baseline generation workflow, allowing subclasses to run additional steps during baseline creation without having to replace the full framework method.

Changes:

  • Added SystemTestsCommon.generate_baseline_phase(basegen_dir) as a subclass extension hook invoked during _generate_baseline.
  • Deferred marking the GENERATE phase status until after log copying and the new extension hook completes.
  • Added unit tests verifying generate_baseline_phase is invoked (including via subclass override).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
CIME/SystemTests/system_tests_common.py Introduces the generate_baseline_phase extension point and shifts GENERATE status setting to the end of _generate_baseline.
CIME/tests/test_unit_system_tests.py Refactors baseline-generation test setup and adds assertions/tests covering the new extension hook call path.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread CIME/SystemTests/system_tests_common.py Outdated
Comment thread CIME/SystemTests/system_tests_common.py Outdated
Copy link
Copy Markdown
Member

@billsacks billsacks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the latest changes, @samsrabin ! I am happy with this now.

Comment on lines +1063 to +1065
self._test_status.set_status(
GENERATE_PHASE, status, comments=os.path.dirname(baseline_name)
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you have moved the set_status for the generate phase to the end of the function. That represents a potential behavior change (in unlikely cases), but feels like the right thing to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, I realize that probably should have been a separate PR... But I did at least call it out in the PR description! It felt weird for PASS to get marked before everything was actually done.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yeah, I do see that you called it out in the PR description - thanks. I'm personally fine with this being folded in here given that it's called out in the PR description. I'm sorry for not having gone back and re-read the PR description when I re-reviewed this. Yes, I agree that it feels better to wait to mark the phase as pass until everything is done, so I do support this change.

Copy link
Copy Markdown
Collaborator

@jasonb5 jasonb5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@billsacks
Copy link
Copy Markdown
Member

@samsrabin - FYI, I edited your top-level comment to bring it up-to-date with the current implementation (changing mention of "generate_baseline_phase" to instead "additional_baseline_generation").

@billsacks billsacks merged commit 56ba669 into ESMCI:master Apr 9, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants