-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Py regex kdr #2338
Py regex kdr #2338
Conversation
… DART files, reduce metacharacters in env_archive.xml, and make some regexes more selective or robust. Potential fix for #2334. I'm not sure that e3sm changes should be in this, but included them to be sure the regression test would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me like you took a portion of the regular expressions out of xml and added it to the code. Why?
scripts/lib/CIME/case_st_archive.py
Outdated
@@ -313,19 +314,20 @@ def _archive_history_files(case, archive, archive_entry, | |||
if ninst_string: | |||
if compname.find('mpas') == 0: | |||
# Not correct, but MPAS' multi-instance name format is unknown. | |||
newsuffix = compname + '.*' + suffix | |||
newsuffix = compname + '\d*' + '\.' + suffix + '\.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why all of the additional whitespace here?
This is actually @kdraeder work but I opened the PR for him. |
Passing the buck to @mfdeakin-sandia |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why I am on the reviewer list. I do not know enough about the short term archiver to be dangerous, just clumsy.
<rest_file_extension>[ri]</rest_file_extension> | ||
<rest_file_extension>rh\d*</rest_file_extension> | ||
<rest_file_extension>rs</rest_file_extension> | ||
<hist_file_extension>[eh]</hist_file_extension> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the 'e' for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an example where ESP files are showing up where CAM or CLM files belong?
scripts/lib/CIME/case_st_archive.py
Outdated
# That could be done with r"\"\.*\S+\s?\"" | ||
# which is '"...{1 or more non-space}{optional space}"' | ||
# An example that matches the simpler pattern: | ||
# "/glade/scratch/raeder/CIME_DA_vars_6/run/CIME_DA_vars_6.cam_0001.rs.2008-08-02-21600.nc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be resolved before merging.
scripts/lib/CIME/case_st_archive.py
Outdated
# pattern = r"{}\.{}\d*.*".format(casename, compname) | ||
# KDR This finds casename.compname[any # digits][any # non-Ret chars] | ||
# Instead want casename.compname[any # digits or _].[any # non-Ret chars] | ||
pattern = r"{}.{}[\d_]*\..*".format(casename, compname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be resolved before merging.
@goldy was nominated as a reviewer because the(?) resident st_archive
expert (Alice)
is unavailable, @jedwards4 is the assignee, @kdraeder is the author, and we
don't know who else is interested in reviewing DART related PRs.
'e' is the file type for ESP files, some of which end up in CAM, CLM, ...
archive directories, but are not any other file type like i, r, h#, ...
I think that the 3rd item Steve flagged doesn't require much discussion.
The multi-instance file names have component parts like cam_0001,
so '_' needs to be included in the allowable characters that follow
$compname
before the next '.' .
Kevin
…On Fri, Mar 2, 2018 at 8:00 PM, goldy ***@***.***> wrote:
***@***.**** requested changes on this pull request.
I'm not sure why I am on the reviewer list. I do not know enough about the
short term archiver to be dangerous, just clumsy.
------------------------------
In config/cesm/config_archive.xml
<#2338 (comment)>:
> @@ -1,10 +1,10 @@
<components version="2.0">
<comp_archive_spec compname="cam" compclass="atm">
- <rest_file_extension>\.[ri]\..*</rest_file_extension>
- <rest_file_extension>\.rh\d\.*</rest_file_extension>
- <rest_file_extension>\.rs\.*</rest_file_extension>
- <hist_file_extension>\.h.*.nc$</hist_file_extension>
+ <rest_file_extension>[ri]</rest_file_extension>
+ <rest_file_extension>rh\d*</rest_file_extension>
+ <rest_file_extension>rs</rest_file_extension>
+ <hist_file_extension>[eh]</hist_file_extension>
What is the 'e' for?
------------------------------
In scripts/lib/CIME/case_st_archive.py
<#2338 (comment)>:
> @@ -366,6 +368,12 @@ def get_histfiles_for_restarts(rundir, archive, archive_entry, restfile):
for item in items:
# the following match has an option of having a './' at the beginning of
# the history filename
+# KDR Does this do what's intended?
+# I think it's looking for '"...//{awordchar}{anythingelse} "'
+# That could be done with r"\"\.*\S+\s?\""
+# which is '"...{1 or more non-space}{optional space}"'
+# An example that matches the simpler pattern:
+# "/glade/scratch/raeder/CIME_DA_vars_6/run/CIME_DA_vars_6.cam_0001.rs.2008-08-02-21600.nc"
This needs to be resolved before merging.
------------------------------
In scripts/lib/CIME/case_st_archive.py
<#2338 (comment)>:
> @@ -465,16 +473,19 @@ def _archive_restarts_date_comp(case, archive, archive_entry,
pfile = re.compile(pattern)
restfiles = [f for f in os.listdir(rundir) if pfile.search(f)]
else:
- pattern = r"{}\.{}\d*.*".format(casename, compname)
+# pattern = r"{}\.{}\d*.*".format(casename, compname)
+# KDR This finds casename.compname[any # digits][any # non-Ret chars]
+# Instead want casename.compname[any # digits or _].[any # non-Ret chars]
+ pattern = r"{}.{}[\d_]*\..*".format(casename, compname)
This needs to be resolved before merging.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2338 (review)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AMpTEAlQFQnlHzjvYkUi_f82IBxyWd0aks5tagdogaJpZM4SairU>
.
|
@kdraeder, it would help if you would enter your comments on the github page where my comments are so I can be sure we are talking about the same thing. Please look there for my updates. |
@goldy, |
…ure issue about DART restart files.
@gold2718 |
@kdraeder, I see the notification about your push right below the last comment which is where it is supposed to be (so to me, it looks like new top-level item). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like case.st_archive.py still has a KDR comment.
config/cesm/config_archive.xml
Outdated
<hist_file_extension>h\d*</hist_file_extension> | ||
<hist_file_extension>initial_hist</hist_file_extension> | ||
<rest_history_varname>unset</rest_history_varname> | ||
<rpointer> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these spaces correct? I think that <rpointer_file>
and <rpointer_content>
should be nested under <rpointer>
but <rpointer>
and <rest_history_varname>
are at the same level as <hist_file_extension>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from @gold2718 's request for the comment to be resolved and the other spacing issues, this looks fine to me
scripts/lib/CIME/case_st_archive.py
Outdated
@@ -313,19 +314,20 @@ def _archive_history_files(case, archive, archive_entry, | |||
if ninst_string: | |||
if compname.find('mpas') == 0: | |||
# Not correct, but MPAS' multi-instance name format is unknown. | |||
newsuffix = compname + '.*' + suffix | |||
newsuffix = compname + r'\d*' + r'\.' + suffix + r'\.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more spacing issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did that to vertically line up the same elements in multiple lines,
to make it easier to see the differences between lines. If this
goes against the CIME style standards, I'll remove the extra spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, I find no issue with these spaces (they do increase readability IMHO).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not my preference (personally I prefer a formatting tool to do my formatting thinking for me), but if others don't have a problem with it, then that's fine. Though the one on line 322 is missing a space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mfdeakin-sandia, does your formatting tool change these spaces? If not, what is the issue?
I think our python standard is 4 spaces for each indent level and no trailing whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issue, I'm fine with this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still would like to understand the need for the [eh]
hist_file_extension
for CAM.
[eh] refers to 2 different file types. 'e' denotes files that were generated by esp (DART) |
So who outputs CIME_DA_vars_11.cam_0001.e.postassim.2008-08-02-64800.nc? |
The program 'filter', which is run by assimilate.csh. |
Then why is the entry under |
Because, as described in earlier discussions about the strategy There are other DART files that describe the whole ensemble; those are |
Ah, thanks. |
Proposed changes to st_archive regular expressions,
which accommodate DART files, reduce metacharacters
in env_archive.xml, and make some regexes more selective
or robust.
Potential fix for #2334.
I'm not sure that e3sm changes should be in this,
but included them to be sure the regression test would work.
Test suite: scripts_regression_tests.py
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #2334
User interface changes?: in env_archive.xml
Update gh-pages html (Y/N)?:
Code review: gold2718, jedwards4b, mfdeakin-sandia