Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize the naming and attributes of ancillary time arrays across components #194

Open
billsacks opened this issue Nov 3, 2021 · 31 comments

Comments

@billsacks
Copy link
Member

A request from @phillips-ad, based on user input:

Standardize the naming and attributes of ancillary time arrays across components. (time_bnds in cam, time_bounds in clm, no calendar attribute set for either, etc..)

@klindsay28 points out that well-written scripts shouldn't require uniformity in names across components. However, I feel like this is something that might be relatively easy to standardize and could save significantly more time than it takes us to make the change, so we might as well do it.

I'm not sure exactly what needs to be done. When someone gets a chance to coordinate this, we should open issues in the respective component repositories asking for changes and note those issues here to track them.

@phillips-ad
Copy link
Collaborator

@billsacks thanks for writing this up. I can definitely come up with a list of proposed changes to output ancillary variable attributes as I'm one of the folks who is pushing for this. I will post the list in a future reply to this thread so others can review.

@phillips-ad
Copy link
Collaborator

There are more inconsistencies across the components for the output time/time_bounds arrays beyond the naming of the time_bounds arrays. Below is a summary list of potential changes. Examples by component of the current status of these arrays and what the Tier 1+2 proposed changes would look like are shown in a Google Doc.

Tier 1 proposed changes

  • The time endpoint arrays should all be named "time_bounds", and the two dimensions of this array should be named "time" and "nbnd".
  • time_bounds should have the same calendar and units attributes as time.
  • time_bounds@long_name should be set to "time interval endpoints"
  • GLC output for the time array has a units attribute setting ("common_year since 0000-01-01 0:0:0") different from all other components. This should be changed to either "days since YYYY-MM-DD HH:MM:SS" (set to the beginning of the run, used by atm/land) or "days since 0001-01-01 00:00:00" (used by ocean/ice).
  • At present there is no time_bounds array for GLC output. This should be added.

Tier 2 proposed change

  • The time and equivalent time_bounds arrays are of type double for some components and type float for others. All components should output type double for both arrays.

Tier 3 (Inconsistency should be left alone?)

  • Atmosphere/land models have a time@units setting of "days since YYYY-MM-DD HH:MM:SS" which equates to the start time of the run, while ocean/ice has a setting of "days since 0001-01-01 00:00:00". As long as proper time attributes are set, and subsequent conversions of the time by common tools work, I don't see why this needs to be changed.

@strandwg
Copy link

strandwg commented Nov 5, 2021

Adam nails every issue exactly.

My only comment is that for Tier 3 it would be nice for all components to use the same fixed absolute time start date of "days since 0001-01-01 00:00:00", as ocean and ice already do. Only atm and lnd reset that, as Adam notes, in certain contexts (a branch run, IIRC).

@gold2718
Copy link

If we are standardizing the time units (I am in favor), is there a reason we do not adopt ISO 8601 notation? Can our new standard explicitly spell out a standard for paleo runs (technically, any run with dates before 1583)?

@strandwg
Copy link

There's a discussion of this issue here

cf-convention/cf-conventions#298

@billsacks
Copy link
Member Author

billsacks commented Mar 29, 2022

For CTSM, see ESCOMP/CTSM#1693

For CICE, see ESCOMP/CESM_CICE#14

For CISM, see ESCOMP/CISM-wrapper#75

For CAM, see ESCOMP/CAM#554

For RTM, see ESCOMP/RTM#31

For MOSART, see ESCOMP/MOSART#53

For MOM, see ESCOMP/MOM_interface#107

For WW3, see ESCOMP/WW3-CESM#22

@billsacks
Copy link
Member Author

@phillips-ad @strandwg - Assuming that the resolution of ESCOMP/CAM#159, ESCOMP/CTSM#1059 and similar issues for other components is that we separate time-averaged from instantaneous values so that a given file only contains one or the other: What would you suggest we do for the time_bounds field on files that only contain instantaneous values? Should we leave time_bounds off in this case? Or make it the start and end of the relevant time step, as @ekluzek suggests in ESCOMP/CTSM#1059 (comment) ?

@strandwg
Copy link

"time_bounds" doesn't make sense for instantaneous values, and for our MIP data, instantaneous data aren't required to have it for the time axis. I'd leave it out of instantaneous output streams.

Since averaged data and instantaneous data will be in different streams, what should the streams be and how to differentiate them?

@phillips-ad
Copy link
Collaborator

phillips-ad commented Mar 31, 2022

I'm honestly ambivalent on this, so I will defer on leaving time_bounds off.

In response to @strandwg 's question, I believe average, min and max will be in one stream, and instantaneous will be in other streams. An example of how I thought this would work for CAM/CTSM streams: h0 would contain A, M and X for monthly, h1 would contain the same for daily, and so on, and then (say) h8 would contain instantaneous (=I) for monthly, h9 would contain I for daily and so on. But it would be user customizable, so instead of h8 containing monthly I the user could set the stream to h1 if they want. I haven't seen a discussion of this, but I might've missed it.

@strandwg, were you wondering whether there should be h0, h1, etc streams for A, M, X, but streams named (say) hi0, hi1, etc for instantaneous streams?

@billsacks
Copy link
Member Author

billsacks commented Mar 31, 2022

Thanks @strandwg and @phillips-ad .

Good question about distinguishing between the different files / streams. I can see how naming files with something like hi would make this more clear for users. This might be something that's hard to make consistent across all components, but I'd welcome suggestions for CTSM, and maybe we can at least try to push for CTSM and CAM to be consistent, since their history infrastructure is somewhat similar.

@strandwg
Copy link

"time_bounds" isn't applicable for an instantaneous value, so it's not needed.

The critical issue about differentiating instantaneous and averaged fields is keeping them separate. It's not uncommon to save the same field from CAM as both, but in different "cam.hN" streams. That kind of implementation needs to be kept intact.

It's unfortunate that "cam.i.*" files already exist as initial files, because I think replicating the ".h[0-9]" (averaged) as ".i[0-9]" (instantaneous) would be clear and unambiguous. I don't know if that's reasonable or would cause too much confusion.

@phillips-ad
Copy link
Collaborator

After today's CISM meeting, I realized that we never spoke again of the fact that some components set time units to the start of the run with "days since YYYY-MM-DD HH:MM:SS" (CAM/CTSM/RTM/MOSART/WW3), while others use or are being pushed to use "days since 0001-01-01 00:00:00" (MOM, CICE, CISM). I previously cited this as an optional tier 3 change above. (#194 (comment)) But @strandwg said it would be nice if this was consistent across components. I agree, but I wonder whether it is too big an ask to implement this. Are we OK with different components setting the time units attribute one of these two possible ways? In my recent filed issues for each component I did propose that each can use one of these two methods.

@strandwg
Copy link

strandwg commented Apr 4, 2022

Getting all components to have the same "days since YYYY-MM-DD HH:MM:SS" units for time and time bounds is more important at this point, IMHO. If it's not unreasonable to have all components have the same "days since 0001-01-01 00:00:00", that would be really nice, but I agree that may be asking too much now.

@ekluzek
Copy link
Contributor

ekluzek commented Apr 4, 2022

@strandwg gave me an idea when talking about the cam.i files. What if we change the history files ".h" into either ".ha" for average (and min, max), and ".hi" for instantaneous? It seems like that would be a clearer way to figure out what type of data is on a given file stream? Do others like this idea? There's some precedence for this in CLM as CNDV created ".hv" files.

@billsacks
Copy link
Member Author

What if we change the history files ".h" into either ".ha" for average (and min, max), and ".hi" for instantaneous? It seems like that would be a clearer way to figure out what type of data is on a given file stream? Do others like this idea? There's some precedence for this in CLM as CNDV created ".hv" files.

I like that idea. It will probably mean more changes to the user interface for setting history file variables: I'm imagining that now, instead of hist_fincl1, hist_fincl2, etc., we'd probably have histi_fincl1, histi_fincl2, hista_fincl1, hista_fincl2, etc. And instead of hist_dov2xy = .true.,.false. we'd have histi_dov2xy = .true.,.false. and hista_dov2xy = .true.,.false.; etc. (I guess we could avoid too many changes by dropping the a - so we'd have hist_ variables referring to the ha files and histi_ variables referring to the hi files, but I'd probably argue for just going ahead and ripping the band-aid off, even if it means users will need to adjust their scripts / user_nl files.) But my intuition is that, once those changes are done, that would make this easier to work with moving forward, both for users and for developers mucking in the history code.

@klindsay28
Copy link

I suggest steering clear of embedding more semantic information in filenames than we already do. CF provides the cell methods attribute to provide information about the temporal treatment of variables. This attribute gets propagated with data workflow software like NCO. Semantic information embedded in filenames can get dropped and/or modified unintentionally. Also, having (more) semantic content is multiple places (metadata and filename) is setting us up for having the content from different places being inconsistent, which can lead to confusion.

@billsacks
Copy link
Member Author

@klindsay28 while I definitely see your point, my intuition is that distinguishing between instantaneous vs. time-averaged files might actually make the user interface to setting history-related namelist variables – via user_nl_clm, etc. – more clear and less error-prone both for users and developers needing to work with the relevant code. So I'm thinking about the advantages largely from that perspective.

@gold2718
Copy link

gold2718 commented Apr 4, 2022

A couple of brief comments.

  1. I am getting confused between MUST HAVE, HIGH PRIORITY, and NICE TO HAVE (and possibly other levels). The more we pile on, the most expensive this work gets and I have yet to see any budget for this work, never mind any discussion of what we are going to stop doing to get this work done.
  2. I do not think we have control over how all the components (e.g., MOM, CICE) work in terms of history output.

@strandwg
Copy link

strandwg commented Apr 5, 2022

It's true that while cell_methods is the canonical means to determine if a field is averaged or instantaneous, that can only be determined by opening and inspecting the file. Given that the model now regularly outputs hundreds of fields, checking them all is inefficient and time-consuming, especially once the output has been transposed to timeseries format.

Since we need to have averaged and instantaneous fields in different output streams, adding a character or two to distinguish between the streams isn't too much to ask, given that it will enable users to determine which data are which quickly and easily.

@gold2718
Copy link

gold2718 commented Apr 5, 2022

adding a character or two to distinguish between the streams isn't too much to ask

Have you calculated how much SE work is involved in this? Writing new filenames is not that hard (although not easy since some models such as CAM do not have the infrastructure for multiple history file naming schemes so this is a bunch of work just to get there). However, these name changes must be tracked through every post-processing tool and archiving tool and somehow work with different, incompatible versions and components which implement the changes at different times.

What are we not going to do for CESM3 to get this done? Or are you extending the dates for the CESM3 release to incorporate these new requirements?

@gold2718
Copy link

gold2718 commented Apr 5, 2022

What if we change the history files ".h" into either ".ha" for average (and min, max), and ".hi" for instantaneous?

Does i stand for 'instantaneous' or for 'interval'? We have discussed dropping the term, 'instantaneous', in CAM history because that is not an accurate description of what 'I' does. Currently, that processing style outputs the last captured value, there is nothing particularly 'instantaneous' about it.

If we adopt a new file-naming scheme, let's please spend a bit of time to come up with names that have less potential for confusion.

@phillips-ad
Copy link
Collaborator

I see a few priorities that have the potential to compete with one another:

  • Implement the changes detailed above with the possible exception of the Tier 3 change.
  • Minimize issues data users will have with the proposed changes. (Obviously though renaming some variables within codes will be necessary.)
  • Minimize work required by SE's.

Hopefully we can find a good balance. It is unlikely that we will be able to get complete consistency across component output due to MOM/CICE as @gold2718 correctly notes, and I'm sure there will be some other unforeseen issues that arise. But hopefully there will be a number of tasks that will be easier to implement, and as a result CESM data will be easier for all to use.

So it sounds like at present we have two options. Both would require separating A, M or X output from instantaneous (/last captured value).

Option 1

  • Keep history file names the same, all output goes in h0 (or h), h1, h2, etc.
  • Require instantaneous data to be in a separate stream (that would not house any A/M/X data). Checks would need to be written to enforce this behavior. time_bounds array would not be written to instantaneous streams.
  • Benefits: namelist changes are kept to a minimum (outside of users having to put instantaneous data in its own stream), and file names remain unchanged.
  • Drawbacks: Users would have to inspect the cell_methods attribute to verify the type of calculation, same as it is with CESM2.

Option 2

  • Alter history file names, so that instantaneous streams have their own distinct file names (whether "hi" or some other choice).
  • Require instantaneous data to be in a separate stream (that would not house any A/M/X data). Checks would need to be written to enforce this behavior. time_bounds array would not be written to instantaneous streams. (Same as in 1.)
  • Benefits: Users will know immediately if the history or timeseries file is instantaneous due to the different file name.
  • Drawbacks: More SE time required compared to option 1 (?), the changing of the naming of output streams may affect users more than if option 1 is implemented.

I'm sure I'm missing some benefits/drawbacks. I honestly see both sides here. However, if the SE development time required is considerably more for option 2 than 1, I lean towards option 1.

@billsacks
Copy link
Member Author

Thanks for laying that out clearly @phillips-ad . Question for you and others: How important is it for components to agree on option 1 vs. option 2? My vague sense is that component history naming conventions already differ for some components, though CAM and CTSM are still in pretty close agreement (maybe due to shared history of those components' history infrastructure). So I think that, before we get to option 1 vs 2 from your last comment, we need to decide between:

Option I: Let each component make its own choice on this question – i.e., whether to use h0, h1 etc. or hi0, ha0, hi1, ha1, etc. (or some other letters)

Option II: Try to at least keep CAM and CTSM consistent, but let other components possibly do something different (e.g., MOM & CICE can be harder to impose rules on because they are not CGD-led projects)

Option III: Try to get as much consistency as possible between as many CESM components as possible

@gold2718
Copy link

gold2718 commented Apr 5, 2022

However, if the SE development time required is considerably more for option 2 than 1, I lean towards option 1.

@phillips-ad, thanks for putting this together.

Does anyone have a list of post-processing uses of CESM data? It would be really helpful to have some idea of how big a job this is. A few items that come to mind are:

  • The short-term archiver (which has a history of being brittle when faced with filename changes).
  • The history-restart mechanism (for fields / files that have some processing other than 'last sample' (aka Instantaneous)).
  • Various diagnostics packages (I only really know about the ones used by atmosphere folks).

What else is out there? Discovery tools? Run databases? Post-processing tools such as time-series generators? Others?

@billsacks
Copy link
Member Author

Good question @gold2718 . I don't have an answer to your broad question, though I will say, on the subject of the short-term archiver, that the way it works now is pretty flexible in this respect. In addition @jedwards4b put in place some nice tests of this a few years ago that can give confidence that the archiving & restore functionality is working correctly when you do these renames. I recently had to change the naming convention of CISM's history output files (to enable multiple ice sheets), and this wasn't too hard to do, and I could have confidence that it was working correctly due to the testing @jedwards4b added. But I realize that only addresses the pieces that are integrated with the Case Control System, not pieces in external tools like diagnostics packages and other post-processing tools.

@strandwg
Copy link

strandwg commented Apr 5, 2022

Thanks, @phillips-ad, for succinctly describing the two choices. One issue is that output file naming is only by convention, not enforced requirements. "cam.h0" is traditionally monthly-means, "cam.h1" is daily data, and h2 to h9 are whatever CAM's been configured to output. For CTSM, h0, h1 and h2 are monthlies, h3 and h4 are annual.

I hesitate to use one of the "h[0-9]" for instantaneous/last value output. Much cleaner and more obvious to have another denotation. "ha[0-9]" and "hi[0-9]" I think are the best choices.

@billsacks, there's no current requirement that each component's output has to resemble another's; CAM and CTSM and MOSART have .h[0-9]; CICE and CISM have .h (CICE has .h1 also), and MOM is yet different.

There may be cases in which some component other than CAM writes instantaneous/last values, but I've not observed that. Only CAM has a real need for instantaneous/last output.

@gold2718, at some point some years ago, CAM output was changed from "cam2.h[0-9]" to "cam.h[0-9]", without too much fuss, so I expect something similar will be done to adapt tools outside of CESM's control. Adding regular expressions generally takes care of the problem.

@gold2718
Copy link

gold2718 commented Apr 5, 2022

@gold2718, at some point some years ago, CAM output was changed from "cam2.h[0-9]" to "cam.h[0-9]", without too much fuss, so I expect something similar will be done to adapt tools outside of CESM's control. Adding regular expressions generally takes care of the problem.

Gosh, cam2 was well before CESM1 / CCSM4, nevermind CIME. The external infrastructure has all changed since those days.

But thanks for making our work sound easy. Does this mean you are volunteering to do this? I would be happy to hand over the history-restart problem as we are way overbooked.

@billsacks
Copy link
Member Author

There may be cases in which some component other than CAM writes instantaneous/last values, but I've not observed that. Only CAM has a real need for instantaneous/last output.

CTSM does have a few variables that are instantaneous by default, and the capability for any others to be written as instantaneous. For runs with a bunch of non-default output files (CMIP6 runs and other big production runs), instantaneous values are used for 10 or so variables. I don't think this capability is as widely used in CTSM as in CAM, but we do need to accommodate it.

at some point some years ago, CAM output was changed from "cam2.h[0-9]" to "cam.h[0-9]"

That reminds me... maybe we should take this opportunity to drop the "2" from "clm2.h*"... or even go so far as to rename from "clm2" to "ctsm".

@strandwg
Copy link

strandwg commented Apr 5, 2022

@gold2718, at some point some years ago, CAM output was changed from "cam2.h[0-9]" to "cam.h[0-9]", without too much fuss, so I expect something similar will be done to adapt tools outside of CESM's control. Adding regular expressions generally takes care of the problem.

Gosh, cam2 was well before CESM1 / CCSM4, nevermind CIME. The external infrastructure has all changed since those days.

But thanks for making our work sound easy. Does this mean you are volunteering to do this? I would be happy to hand over the history-restart problem as we are way overbooked.

Apologies for the poor phrasing; what I meant was that when CAM output went from "cam2" to "cam", it wasn't too difficult to adapt postprocessing and other tools to that change.

@phillips-ad
Copy link
Collaborator

That reminds me... maybe we should take this opportunity to drop the "2" from "clm2.h*"... or even go so far as to rename from "clm2" to "ctsm".

Ha, that's been around for a while now. Yes please. Either way (clm or ctsm).

Thanks for laying that out clearly @phillips-ad . Question for you and others: How important is it for components to agree on option 1 vs. option 2? My vague sense is that component history naming conventions already differ for some components, though CAM and CTSM are still in pretty close agreement (maybe due to shared history of those components' history infrastructure). So I think that, before we get to option 1 vs 2 from your last comment, we need to decide between:

Option I: Let each component make its own choice on this question – i.e., whether to use h0, h1 etc. or hi0, ha0, hi1, ha1, etc. (or some other letters)

Option II: Try to at least keep CAM and CTSM consistent, but let other components possibly do something different (e.g., MOM & CICE can be harder to impose rules on because they are not CGD-led projects)

Option III: Try to get as much consistency as possible between as many CESM components as possible

I think we would cause users some (initial) confusion if CAM/CTSM has stream names ha/hi, while other components keep h. If we switch to output file names with ha/hi, I'd rather all components move to that nomenclature. Right now, whether POP says "h" or CAM says "h0" there's at least a consistent file naming system across components. I think users are OK with the (minor) differences in file names across components in CESM2. So, if I'm to answer your question about which option, I guess option 3?

@gold2718 I can somewhat speak for diagnostic packages and timeseries generating packages. The older NCL-based component diagnostic packages are for the most part run as part of the CESM postprocessing suite. Timeseries generation can also be done with this suite, and works quite well. The problem with the suite is that it is hard to port, so it is mostly run on cheyenne. Plus there is no support for the suite, and I get the impression that most folks think this package will not be used for CESM3 development/post-processing.

As you know the replacement for the AMWG suite is the ADF, and development on that is quite active. I am unsure where other components are in the development of next gen diagnostics.

@strandwg also has his own postprocessing (timeseries-creation) suite, +cylc is also used to create timeseries and CMIP-timeseries.

I agree with @strandwg: I do not see any of these changes being that hard for present or future diagnostic/post-processing suites to handle, with the possible exception of the unsupported CESM_postprocessing suite.

@billsacks
Copy link
Member Author

Regarding consistency between components in their file naming: see https://docs.google.com/document/d/1OSDZtsxrfwgcYBqiI19lcFnTf-7teLLnrJgMt1Y2eC8/edit# , where I am trying to gather this from various components. (I have asked component SE liaisons to fill it out; some have, but there are still some missing pieces.)

billsacks added a commit to billsacks/cesm that referenced this issue Mar 3, 2023
7b6d92e Merge pull request ESCOMP#198 from johnpaulalex/gitdir
927ce3a Merge pull request ESCOMP#197 from johnpaulalex/testpath
a04f114 Merge pull request ESCOMP#196 from johnpaulalex/readmod
d9c14bf Change the rest of the methods to use -C. Still some usage of getcwd in test_unit_repository_git.
332b106 Fix incorrect logged path of checkout_externals in test_sys_checkout: it was basically the parent of the current directory, which varies throughout the test. (it called abspath with '{0}/../../', which adds arbitrary and not-interpolated subdir '{0}' to the path, then removes it and removes one more level).
932a749 Remove printlog from read_gitmodules_file since read_externals_description_file() already has a nearly-the-same printlog (but add it to the other caller).
5d13719 Merge pull request ESCOMP#195 from johnpaulalex/check_repo
4233954 Update utest to mock _git_remote_verbose in a new way, since it is now called via the GitRepository class rather than on the specific GitRepository instance.
d7a42ae Check that desired repo was actually checked out.
71596bb Merge pull request ESCOMP#194 from johnpaulalex/manic2
4c96e82 Make the MANIC_TEST_BARE_REPO_ROOT env var special - give it a constant for easy tracking, and automatically tear it down after each test.
259bfc0 test_sys_checkout: use actual paths in on-the-fly configs rather than MANIC_TEST_BARE_REPO_ROOT env var. This will make it easier to test (in the near future) that checkout_externals actually checked out the desired repo dir.
557bbd6 Merge pull request ESCOMP#193 from johnpaulalex/manic
5314eed Remove MANIC_TEST_TMP_REPO_ROOT environment variable in favor of module-level variable.
345fc1e Merge pull request ESCOMP#191 from johnpaulalex/test_doc12
2117b84 test_sys_checkout: verify that basic by-tag/branch/hash tests actually take us to the correct git tag/branch/hash.
94d6e5f Merge pull request ESCOMP#190 from johnpaulalex/test_doc11
3ff33a6 Inline local-path-creation methods
47dea7f Merge pull request ESCOMP#189 from johnpaulalex/test_doc10
9ea75cb Grab-bag of renamings: Remove redundant _NAME from repo constants, and consistently add _REPO suffix (This causes the majority of diffs).
c0c847e Merge pull request ESCOMP#188 from johnpaulalex/test_doc9
2dd5ce0 test_sys_checkout.py: only check for correct 'required' or 'optional' state in the test that exercises required vs optional behavior. Removed a lot of boilerplate.
eb30859 Merge pull request ESCOMP#187 from johnpaulalex/test_doc8
1832e1f test_sys_checkout: Simplify many tests to only use a single external.
8689d61 Merge pull request ESCOMP#186 from johnpaulalex/test_doc7
fbee425 Grab bag of test_sys_checkout cleanups:    Doc inside of each test more clearly/consistently.    TestSysCheckoutSVN didn’t get the inlining-of-helper-methods treatment, now it has that.    Move various standalone repo helper methods (like create_branch) into a RepoUtils class.    README.md was missing newlines when rendered as markdown.    Doc the return value of checkout.main    Fix test_container_exclude_component - it was looking for the wrong key (which is never present); now it looks for the correct key.
f0ed44a Merge pull request ESCOMP#185 from johnpaulalex/test_doc6
a3d59f5 Merge pull request ESCOMP#184 from johnpaulalex/test_doc5
5329c8b test_sys_checkout: Inline config generation functions that are only called once.
464f2c7 test_sys_checkout: Inline another layer (per-config-file checks). Rename the 4 methods that are used multiple times, to reflect what they do rather than what they're called.
8872c0d Merge pull request ESCOMP#183 from johnpaulalex/doc_test4
c045335 Merge pull request ESCOMP#182 from johnpaulalex/doc_test3
c583b95 Merge pull request ESCOMP#181 from johnpaulalex/doc_test2
e01cfe2 test_sys_checkout: less confusing handling of return values from checkout_externals. Specifically, when doing a checkout, don't return tree_status from _before_ the checkout. Make a new wrapper to call checkout_externals a second time, to calculate the new status after a checkout (very frequent pattern).
2328681 test_sys_checkout: Remove another layer (which generates test component names)
c3717b6 Merge pull request ESCOMP#180 from johnpaulalex/doc_test
36d7a44 test_sys_checkout.py: remove one layer of functions (that check for local status enums). No-op.
2c4584b More documentation about tests: * contents of test repositories (n a new README.md) * various constants in test_sys_checkout.py that point to those contents, and terminology like container/simple/mixed. * in each test method, the scenarios being tested. * The coupling between test methods.
55e74bd Merge pull request ESCOMP#179 from johnpaulalex/circ
66be842 Remove circular dependency by making _External stop doing tricky things with sourcetrees.
82d3b24 Merge pull request ESCOMP#178 from johnpaulalex/test_doc
3223f49 Additional documentation of system tests - global variables, method descriptions.
45b7c01 Merge pull request ESCOMP#177 from jedwards4b/git_workflow
ace90b2 try setting credentials this way
f4d6aa9 try setting credentials this way
1d61a69 use this to set git credentials
7f9d330 use this to set git credentials
5ac731b add tmate code
836847b get git workflow working
dcd462d Merge pull request ESCOMP#176 from jedwards4b/add_github_testing
2d2479e Merge pull request ESCOMP#175 from johnpaulalex/fix
711a53f add github testing of prs and automatic tagging of main
cfe0f88 fix typos
5665d61 Fix broken checkout behavior introduced by PR ESCOMP#172.
27909e2 Merge pull request ESCOMP#173 from johnpaulalex/readall
00ad044 Further tiny refactorings and docs of checkout API (no-op).    Remove unused load_all param in _External.checkout().    Rename _External.checkout_externals() to checkout_subexternals(), to remove the ambiguity about whether the main external pointed to by the _External is itelf checked out (it is not)    Clarify load_all documentation - it’s always recursive, but applies different criteria at each level.    Rename variables in checkout.py (e.g. ext_description)  to match the equivalent code in sourcetree.py.
2ea3d1a Merge pull request ESCOMP#172 from johnpaulalex/fixit
43bf809 Merge pull request ESCOMP#171 from johnpaulalex/docstatus
e6aa7d2 Merge pull request ESCOMP#170 from johnpaulalex/printdir
adbd715 On checkout, refresh locally installed optional packages regardless of whether -o is passed in.
add0745 Comment tweaks, and fix 'ppath' typo
696527c Document the format of various status dictionaries, and the various paths and path components within an _External.
c677b94 When processing an external, print out its path in addition to the base filename (to disambiguate all the externals.cfg's)
975d7fd Merge pull request ESCOMP#169 from johnpaulalex/docfix_branch
09709e3 Document _Externals.status().  The original comment was apparently copy-pasted from checkout().
1d880e0 Merge pull request ESCOMP#167 from billsacks/fix_svn_on_windows
3510da8 Tweak a unit test to improve coverage
eb7fc13 Handle the possibility that the URL already ends with '/'
02ea87e Fix svn URLs on Windows
b1c02ab Merge pull request ESCOMP#165 from gold2718/doc_fix
9f4be8c Add documentation about externals = None feature
a3b3a03 Merge pull request ESCOMP#162 from ESMCI/fischer/python3
d4f1b1e Change shebang lines to python3
2fd941a Merge pull request ESCOMP#158 from billsacks/modified_solution
de08dc2 Add another option for when an external is in a modified state
e954582 Merge pull request ESCOMP#156 from billsacks/onbranch_show_hash
952e44d Change output: put tag/hash before branch name
1028843 Fix pre-existing pylint issues
01b13f7 When on a branch, show tag/hash, too
39ad532 Merge pull request ESCOMP#150 from gold2718/fix_combo_config
75f8f02 Merge pull request ESCOMP#152 from jedwards4b/sort_by_local_path
42687bd remove commented code
29e26af fix pylint issues
7c9f3c6 add a test for nested repo checkout
75c5353 fix spacing
24a3726 improve sorting, checkout externals with each comp
29f45b0 remove py2 test and fix super call
880a4e7 remove decode
1c53be8 no need for set call
36c56db simplier fix for issue
dc67cc6 simpler solution
b32c6fc fix to allow submodule name different from path
5b5e1c2 Merge pull request ESCOMP#144 from billsacks/improve_errmsg
c983863 Add another option for dealing with modified externals
59ce252 Add some details to the error message when externals are modified
be5a1a4 Merge pull request ESCOMP#143 from jedwards4b/add_exclude
2aa014a fix lint issue
49cd5e8 fix lint issues
418173f Added tests for ExternalsDescriptionDict
afab352 fix lint issue
be85b7d fix the test
a580a57 push test
d437108 add a test
21affe3 fix formatting issue
72e6b64 add an exclude option
c33a3bd Merge pull request ESCOMP#139 from jedwards4b/ignore_branch_prop
b124a9a ignore this silently, we use a hash so it does not matter

git-subtree-dir: manage_externals
git-subtree-split: 7b6d92e
jedwards4b added a commit that referenced this issue Apr 20, 2023
7b6d92e Merge pull request #198 from johnpaulalex/gitdir
927ce3a Merge pull request #197 from johnpaulalex/testpath
a04f114 Merge pull request #196 from johnpaulalex/readmod
d9c14bf Change the rest of the methods to use -C. Still some usage of getcwd in test_unit_repository_git.
332b106 Fix incorrect logged path of checkout_externals in test_sys_checkout: it was basically the parent of the current directory, which varies throughout the test. (it called abspath with '{0}/../../', which adds arbitrary and not-interpolated subdir '{0}' to the path, then removes it and removes one more level).
932a749 Remove printlog from read_gitmodules_file since read_externals_description_file() already has a nearly-the-same printlog (but add it to the other caller).
5d13719 Merge pull request #195 from johnpaulalex/check_repo
4233954 Update utest to mock _git_remote_verbose in a new way, since it is now called via the GitRepository class rather than on the specific GitRepository instance.
d7a42ae Check that desired repo was actually checked out.
71596bb Merge pull request #194 from johnpaulalex/manic2
4c96e82 Make the MANIC_TEST_BARE_REPO_ROOT env var special - give it a constant for easy tracking, and automatically tear it down after each test.
259bfc0 test_sys_checkout: use actual paths in on-the-fly configs rather than MANIC_TEST_BARE_REPO_ROOT env var. This will make it easier to test (in the near future) that checkout_externals actually checked out the desired repo dir.
557bbd6 Merge pull request #193 from johnpaulalex/manic
5314eed Remove MANIC_TEST_TMP_REPO_ROOT environment variable in favor of module-level variable.
345fc1e Merge pull request #191 from johnpaulalex/test_doc12
2117b84 test_sys_checkout: verify that basic by-tag/branch/hash tests actually take us to the correct git tag/branch/hash.
94d6e5f Merge pull request #190 from johnpaulalex/test_doc11
3ff33a6 Inline local-path-creation methods
47dea7f Merge pull request #189 from johnpaulalex/test_doc10
9ea75cb Grab-bag of renamings: Remove redundant _NAME from repo constants, and consistently add _REPO suffix (This causes the majority of diffs).
c0c847e Merge pull request #188 from johnpaulalex/test_doc9
2dd5ce0 test_sys_checkout.py: only check for correct 'required' or 'optional' state in the test that exercises required vs optional behavior. Removed a lot of boilerplate.
eb30859 Merge pull request #187 from johnpaulalex/test_doc8
1832e1f test_sys_checkout: Simplify many tests to only use a single external.
8689d61 Merge pull request #186 from johnpaulalex/test_doc7
fbee425 Grab bag of test_sys_checkout cleanups:    Doc inside of each test more clearly/consistently.    TestSysCheckoutSVN didn’t get the inlining-of-helper-methods treatment, now it has that.    Move various standalone repo helper methods (like create_branch) into a RepoUtils class.    README.md was missing newlines when rendered as markdown.    Doc the return value of checkout.main    Fix test_container_exclude_component - it was looking for the wrong key (which is never present); now it looks for the correct key.
f0ed44a Merge pull request #185 from johnpaulalex/test_doc6
a3d59f5 Merge pull request #184 from johnpaulalex/test_doc5
5329c8b test_sys_checkout: Inline config generation functions that are only called once.
464f2c7 test_sys_checkout: Inline another layer (per-config-file checks). Rename the 4 methods that are used multiple times, to reflect what they do rather than what they're called.
8872c0d Merge pull request #183 from johnpaulalex/doc_test4
c045335 Merge pull request #182 from johnpaulalex/doc_test3
c583b95 Merge pull request #181 from johnpaulalex/doc_test2
e01cfe2 test_sys_checkout: less confusing handling of return values from checkout_externals. Specifically, when doing a checkout, don't return tree_status from _before_ the checkout. Make a new wrapper to call checkout_externals a second time, to calculate the new status after a checkout (very frequent pattern).
2328681 test_sys_checkout: Remove another layer (which generates test component names)
c3717b6 Merge pull request #180 from johnpaulalex/doc_test
36d7a44 test_sys_checkout.py: remove one layer of functions (that check for local status enums). No-op.
2c4584b More documentation about tests: * contents of test repositories (n a new README.md) * various constants in test_sys_checkout.py that point to those contents, and terminology like container/simple/mixed. * in each test method, the scenarios being tested. * The coupling between test methods.
55e74bd Merge pull request #179 from johnpaulalex/circ
66be842 Remove circular dependency by making _External stop doing tricky things with sourcetrees.
82d3b24 Merge pull request #178 from johnpaulalex/test_doc
3223f49 Additional documentation of system tests - global variables, method descriptions.
45b7c01 Merge pull request #177 from jedwards4b/git_workflow
ace90b2 try setting credentials this way
f4d6aa9 try setting credentials this way
1d61a69 use this to set git credentials
7f9d330 use this to set git credentials
5ac731b add tmate code
836847b get git workflow working
dcd462d Merge pull request #176 from jedwards4b/add_github_testing
2d2479e Merge pull request #175 from johnpaulalex/fix
711a53f add github testing of prs and automatic tagging of main
cfe0f88 fix typos
5665d61 Fix broken checkout behavior introduced by PR #172.
27909e2 Merge pull request #173 from johnpaulalex/readall
00ad044 Further tiny refactorings and docs of checkout API (no-op).    Remove unused load_all param in _External.checkout().    Rename _External.checkout_externals() to checkout_subexternals(), to remove the ambiguity about whether the main external pointed to by the _External is itelf checked out (it is not)    Clarify load_all documentation - it’s always recursive, but applies different criteria at each level.    Rename variables in checkout.py (e.g. ext_description)  to match the equivalent code in sourcetree.py.
2ea3d1a Merge pull request #172 from johnpaulalex/fixit
43bf809 Merge pull request #171 from johnpaulalex/docstatus
e6aa7d2 Merge pull request #170 from johnpaulalex/printdir
adbd715 On checkout, refresh locally installed optional packages regardless of whether -o is passed in.
add0745 Comment tweaks, and fix 'ppath' typo
696527c Document the format of various status dictionaries, and the various paths and path components within an _External.
c677b94 When processing an external, print out its path in addition to the base filename (to disambiguate all the externals.cfg's)
975d7fd Merge pull request #169 from johnpaulalex/docfix_branch
09709e3 Document _Externals.status().  The original comment was apparently copy-pasted from checkout().
1d880e0 Merge pull request #167 from billsacks/fix_svn_on_windows
3510da8 Tweak a unit test to improve coverage
eb7fc13 Handle the possibility that the URL already ends with '/'
02ea87e Fix svn URLs on Windows
b1c02ab Merge pull request #165 from gold2718/doc_fix
9f4be8c Add documentation about externals = None feature
a3b3a03 Merge pull request #162 from ESMCI/fischer/python3
d4f1b1e Change shebang lines to python3
2fd941a Merge pull request #158 from billsacks/modified_solution
de08dc2 Add another option for when an external is in a modified state
e954582 Merge pull request #156 from billsacks/onbranch_show_hash
952e44d Change output: put tag/hash before branch name
1028843 Fix pre-existing pylint issues
01b13f7 When on a branch, show tag/hash, too
39ad532 Merge pull request #150 from gold2718/fix_combo_config
75f8f02 Merge pull request #152 from jedwards4b/sort_by_local_path
42687bd remove commented code
29e26af fix pylint issues
7c9f3c6 add a test for nested repo checkout
75c5353 fix spacing
24a3726 improve sorting, checkout externals with each comp
29f45b0 remove py2 test and fix super call
880a4e7 remove decode
1c53be8 no need for set call
36c56db simplier fix for issue
dc67cc6 simpler solution
b32c6fc fix to allow submodule name different from path
5b5e1c2 Merge pull request #144 from billsacks/improve_errmsg
c983863 Add another option for dealing with modified externals
59ce252 Add some details to the error message when externals are modified
be5a1a4 Merge pull request #143 from jedwards4b/add_exclude
2aa014a fix lint issue
49cd5e8 fix lint issues
418173f Added tests for ExternalsDescriptionDict
afab352 fix lint issue
be85b7d fix the test
a580a57 push test
d437108 add a test
21affe3 fix formatting issue
72e6b64 add an exclude option
c33a3bd Merge pull request #139 from jedwards4b/ignore_branch_prop
b124a9a ignore this silently, we use a hash so it does not matter

git-subtree-dir: manage_externals
git-subtree-split: 84d65c37f7288576f3a5cd7e4d4fd9f12ac0dece
fischer-ncar added a commit that referenced this issue Dec 19, 2023
876b344 Merge pull request #208 from jedwards4b/add_version_file
46b2eb9 improve workflow name
1506bbf Need to update file before creating version
cb06623 use github workflow to update version.txt
3d13177 update version.txt
ba00e50 Merge branch 'main' into add_version_file
757fed1 update version.txt
cd64e76 add version.txt and pre-push hook
0f884bf Merge pull request #205 from jedwards4b/sunset_svn_git_access
82a5edf merge in billsacks:svn_testing_no_github
17532c1 Use a local svn repo for testing
9c90434 different method to determine if in tests
539952e remove debug print statement
cc5434f fix submodule testing
1d7f288 remove broken tests
04e94a5 provide a meaningful error message
38bcc0a Merge pull request #201 from jedwards4b/partial_match
b4466a5 remove debug print statement
c3cf3ec fix issue with partial branch match
7b6d92e Merge pull request #198 from johnpaulalex/gitdir
927ce3a Merge pull request #197 from johnpaulalex/testpath
a04f114 Merge pull request #196 from johnpaulalex/readmod
d9c14bf Change the rest of the methods to use -C. Still some usage of getcwd in test_unit_repository_git.
332b106 Fix incorrect logged path of checkout_externals in test_sys_checkout: it was basically the parent of the current directory, which varies throughout the test. (it called abspath with '{0}/../../', which adds arbitrary and not-interpolated subdir '{0}' to the path, then removes it and removes one more level).
932a749 Remove printlog from read_gitmodules_file since read_externals_description_file() already has a nearly-the-same printlog (but add it to the other caller).
5d13719 Merge pull request #195 from johnpaulalex/check_repo
4233954 Update utest to mock _git_remote_verbose in a new way, since it is now called via the GitRepository class rather than on the specific GitRepository instance.
d7a42ae Check that desired repo was actually checked out.
71596bb Merge pull request #194 from johnpaulalex/manic2
4c96e82 Make the MANIC_TEST_BARE_REPO_ROOT env var special - give it a constant for easy tracking, and automatically tear it down after each test.
259bfc0 test_sys_checkout: use actual paths in on-the-fly configs rather than MANIC_TEST_BARE_REPO_ROOT env var. This will make it easier to test (in the near future) that checkout_externals actually checked out the desired repo dir.
557bbd6 Merge pull request #193 from johnpaulalex/manic
5314eed Remove MANIC_TEST_TMP_REPO_ROOT environment variable in favor of module-level variable.
345fc1e Merge pull request #191 from johnpaulalex/test_doc12
2117b84 test_sys_checkout: verify that basic by-tag/branch/hash tests actually take us to the correct git tag/branch/hash.
94d6e5f Merge pull request #190 from johnpaulalex/test_doc11
3ff33a6 Inline local-path-creation methods
47dea7f Merge pull request #189 from johnpaulalex/test_doc10
9ea75cb Grab-bag of renamings: Remove redundant _NAME from repo constants, and consistently add _REPO suffix (This causes the majority of diffs).
c0c847e Merge pull request #188 from johnpaulalex/test_doc9
2dd5ce0 test_sys_checkout.py: only check for correct 'required' or 'optional' state in the test that exercises required vs optional behavior. Removed a lot of boilerplate.
eb30859 Merge pull request #187 from johnpaulalex/test_doc8
1832e1f test_sys_checkout: Simplify many tests to only use a single external.
8689d61 Merge pull request #186 from johnpaulalex/test_doc7
fbee425 Grab bag of test_sys_checkout cleanups:    Doc inside of each test more clearly/consistently.    TestSysCheckoutSVN didn’t get the inlining-of-helper-methods treatment, now it has that.    Move various standalone repo helper methods (like create_branch) into a RepoUtils class.    README.md was missing newlines when rendered as markdown.    Doc the return value of checkout.main    Fix test_container_exclude_component - it was looking for the wrong key (which is never present); now it looks for the correct key.
f0ed44a Merge pull request #185 from johnpaulalex/test_doc6
a3d59f5 Merge pull request #184 from johnpaulalex/test_doc5
5329c8b test_sys_checkout: Inline config generation functions that are only called once.
464f2c7 test_sys_checkout: Inline another layer (per-config-file checks). Rename the 4 methods that are used multiple times, to reflect what they do rather than what they're called.
8872c0d Merge pull request #183 from johnpaulalex/doc_test4
c045335 Merge pull request #182 from johnpaulalex/doc_test3
c583b95 Merge pull request #181 from johnpaulalex/doc_test2
e01cfe2 test_sys_checkout: less confusing handling of return values from checkout_externals. Specifically, when doing a checkout, don't return tree_status from _before_ the checkout. Make a new wrapper to call checkout_externals a second time, to calculate the new status after a checkout (very frequent pattern).
2328681 test_sys_checkout: Remove another layer (which generates test component names)
c3717b6 Merge pull request #180 from johnpaulalex/doc_test
36d7a44 test_sys_checkout.py: remove one layer of functions (that check for local status enums). No-op.
2c4584b More documentation about tests: * contents of test repositories (n a new README.md) * various constants in test_sys_checkout.py that point to those contents, and terminology like container/simple/mixed. * in each test method, the scenarios being tested. * The coupling between test methods.
55e74bd Merge pull request #179 from johnpaulalex/circ
66be842 Remove circular dependency by making _External stop doing tricky things with sourcetrees.
82d3b24 Merge pull request #178 from johnpaulalex/test_doc
3223f49 Additional documentation of system tests - global variables, method descriptions.
45b7c01 Merge pull request #177 from jedwards4b/git_workflow
ace90b2 try setting credentials this way
f4d6aa9 try setting credentials this way
1d61a69 use this to set git credentials
7f9d330 use this to set git credentials
5ac731b add tmate code
836847b get git workflow working
dcd462d Merge pull request #176 from jedwards4b/add_github_testing
2d2479e Merge pull request #175 from johnpaulalex/fix
711a53f add github testing of prs and automatic tagging of main
cfe0f88 fix typos
5665d61 Fix broken checkout behavior introduced by PR #172.
27909e2 Merge pull request #173 from johnpaulalex/readall
00ad044 Further tiny refactorings and docs of checkout API (no-op).    Remove unused load_all param in _External.checkout().    Rename _External.checkout_externals() to checkout_subexternals(), to remove the ambiguity about whether the main external pointed to by the _External is itelf checked out (it is not)    Clarify load_all documentation - it’s always recursive, but applies different criteria at each level.    Rename variables in checkout.py (e.g. ext_description)  to match the equivalent code in sourcetree.py.
2ea3d1a Merge pull request #172 from johnpaulalex/fixit
43bf809 Merge pull request #171 from johnpaulalex/docstatus
e6aa7d2 Merge pull request #170 from johnpaulalex/printdir
adbd715 On checkout, refresh locally installed optional packages regardless of whether -o is passed in.
add0745 Comment tweaks, and fix 'ppath' typo
696527c Document the format of various status dictionaries, and the various paths and path components within an _External.
c677b94 When processing an external, print out its path in addition to the base filename (to disambiguate all the externals.cfg's)
975d7fd Merge pull request #169 from johnpaulalex/docfix_branch
09709e3 Document _Externals.status().  The original comment was apparently copy-pasted from checkout().
1d880e0 Merge pull request #167 from billsacks/fix_svn_on_windows
3510da8 Tweak a unit test to improve coverage
eb7fc13 Handle the possibility that the URL already ends with '/'
02ea87e Fix svn URLs on Windows
b1c02ab Merge pull request #165 from gold2718/doc_fix
9f4be8c Add documentation about externals = None feature
a3b3a03 Merge pull request #162 from ESMCI/fischer/python3
d4f1b1e Change shebang lines to python3
2fd941a Merge pull request #158 from billsacks/modified_solution
de08dc2 Add another option for when an external is in a modified state
e954582 Merge pull request #156 from billsacks/onbranch_show_hash
952e44d Change output: put tag/hash before branch name
1028843 Fix pre-existing pylint issues
01b13f7 When on a branch, show tag/hash, too
39ad532 Merge pull request #150 from gold2718/fix_combo_config
75f8f02 Merge pull request #152 from jedwards4b/sort_by_local_path
42687bd remove commented code
29e26af fix pylint issues
7c9f3c6 add a test for nested repo checkout
75c5353 fix spacing
24a3726 improve sorting, checkout externals with each comp
29f45b0 remove py2 test and fix super call
880a4e7 remove decode
1c53be8 no need for set call
36c56db simplier fix for issue
dc67cc6 simpler solution
b32c6fc fix to allow submodule name different from path
5b5e1c2 Merge pull request #144 from billsacks/improve_errmsg
c983863 Add another option for dealing with modified externals
59ce252 Add some details to the error message when externals are modified
be5a1a4 Merge pull request #143 from jedwards4b/add_exclude
2aa014a fix lint issue
49cd5e8 fix lint issues
418173f Added tests for ExternalsDescriptionDict
afab352 fix lint issue
be85b7d fix the test
a580a57 push test
d437108 add a test
21affe3 fix formatting issue
72e6b64 add an exclude option
c33a3bd Merge pull request #139 from jedwards4b/ignore_branch_prop
b124a9a ignore this silently, we use a hash so it does not matter

git-subtree-dir: manage_externals
git-subtree-split: 876b3440e7356fed2bf417659a5f7b5527a92a2f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

No branches or pull requests

6 participants