
Py regex kdr #2338

Merged
merged 7 commits into from
Mar 7, 2018

Conversation

jedwards4b
Contributor

@jedwards4b jedwards4b commented Mar 2, 2018

Proposed changes to st_archive regular expressions,
which accommodate DART files, reduce metacharacters
in env_archive.xml, and make some regexes more selective
or robust.
Potential fix for #2334.
I'm not sure the E3SM changes belong in this PR,
but I included them to make sure the regression tests would pass.

Test suite: scripts_regression_tests.py
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes #2334

User interface changes?: in env_archive.xml

Update gh-pages html (Y/N)?:

Code review: gold2718, jedwards4b, mfdeakin-sandia

Contributor Author

@jedwards4b jedwards4b left a comment


It looks to me like you took a portion of the regular expressions out of the XML and added it to the code. Why?

@@ -313,19 +314,20 @@ def _archive_history_files(case, archive, archive_entry,
     if ninst_string:
         if compname.find('mpas') == 0:
             # Not correct, but MPAS' multi-instance name format is unknown.
-            newsuffix = compname + '.*' + suffix
+            newsuffix = compname + '\d*' + '\.' + suffix + '\.'
Contributor Author


Why all the additional whitespace here?

@jedwards4b jedwards4b self-assigned this Mar 2, 2018
@jedwards4b
Contributor Author

This is actually @kdraeder work but I opened the PR for him.

@jgfouca jgfouca requested review from mfdeakin-sandia and removed request for jgfouca March 2, 2018 23:00
@jgfouca
Contributor

jgfouca commented Mar 2, 2018

Passing the buck to @mfdeakin-sandia


@gold2718 gold2718 left a comment


I'm not sure why I am on the reviewer list. I do not know enough about the short term archiver to be dangerous, just clumsy.

<rest_file_extension>[ri]</rest_file_extension>
<rest_file_extension>rh\d*</rest_file_extension>
<rest_file_extension>rs</rest_file_extension>
<hist_file_extension>[eh]</hist_file_extension>

What is the 'e' for?


Do you have an example where ESP files are showing up where CAM or CLM files belong?

# That could be done with r"\"\.*\S+\s?\""
# which is '"...{1 or more non-space}{optional space}"'
# An example that matches the simpler pattern:
# "/glade/scratch/raeder/CIME_DA_vars_6/run/CIME_DA_vars_6.cam_0001.rs.2008-08-02-21600.nc"

This needs to be resolved before merging.
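The alternative regex quoted in the code comment above, r"\"\.*\S+\s?\"", can be checked against the comment's own example path. This is a minimal sketch, assuming the path is tested together with its surrounding double quotes and matched with re.fullmatch; how the archiver would actually apply the regex is not shown in the thread.

```python
import re

# Regex quoted in the code comment: a double quote, zero or more literal
# dots, one or more non-space characters, an optional whitespace
# character, and a closing double quote.
QUOTED = re.compile(r"\"\.*\S+\s?\"")

# Example path from the comment, wrapped in double quotes as shown there.
example = '"/glade/scratch/raeder/CIME_DA_vars_6/run/CIME_DA_vars_6.cam_0001.rs.2008-08-02-21600.nc"'

print(bool(QUOTED.fullmatch(example)))  # True

# A space inside the path breaks the match: \S+ cannot cross it, and \s?
# allows only a single whitespace character right before the closing quote.
print(bool(QUOTED.fullmatch('"/glade/my run/file.nc"')))  # False
```

The backtracking matters here: \S+ first swallows the closing quote too, then gives it back so the final \" can match.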

# pattern = r"{}\.{}\d*.*".format(casename, compname)
# KDR This finds casename.compname[any # digits][any # non-Ret chars]
# Instead want casename.compname[any # digits or _].[any # non-Ret chars]
pattern = r"{}.{}[\d_]*\..*".format(casename, compname)

This needs to be resolved before merging.
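The difference between the commented-out pattern and its replacement in the hunk above can be checked directly. This is a minimal sketch: casename and compname come from the example path quoted earlier in the thread, the other two filenames are hypothetical, and re.match (anchored at the start of the name) is an assumption rather than what case_st_archive.py necessarily calls.

```python
import re

def old_pattern(casename, compname):
    # Commented-out version: casename, a literal '.', compname,
    # any number of digits, then anything at all.
    return r"{}\.{}\d*.*".format(casename, compname)

def new_pattern(casename, compname):
    # Replacement: casename, any single character (this '.' is unescaped),
    # compname, any digits or underscores, then a literal '.'.
    return r"{}.{}[\d_]*\..*".format(casename, compname)

casename, compname = "CIME_DA_vars_6", "cam"
names = [
    "CIME_DA_vars_6.cam_0001.rs.2008-08-02-21600.nc",  # example from the thread
    "CIME_DA_vars_6.cam.h0.2008-08-02.nc",             # hypothetical single-instance name
    "CIME_DA_vars_6.camchem.h0.2008-08-02.nc",         # hypothetical other component
]

# name -> (matched by old pattern, matched by new pattern)
results = {
    name: (bool(re.match(old_pattern(casename, compname), name)),
           bool(re.match(new_pattern(casename, compname), name)))
    for name in names
}
for name, (old, new) in results.items():
    print(f"{name}: old={old} new={new}")
```

The third name shows the intended tightening: the old pattern accepts anything that merely starts with casename.cam, while the new one requires digits or underscores and then a dot right after the component name. Because the dot after casename is unescaped, the new pattern would also accept a hypothetical CIME_DA_vars_6_cam_0001.rs.nc; writing \. there (and, if a casename could contain regex metacharacters, passing it through re.escape) would be stricter.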

@kdraeder
Collaborator

kdraeder commented Mar 5, 2018 via email

@gold2718

gold2718 commented Mar 5, 2018

@kdraeder, it would help if you entered your comments on the GitHub page, next to my comments, so I can be sure we are talking about the same thing. Please look there for my updates.

@kdraeder
Collaborator

kdraeder commented Mar 5, 2018

@goldy,
I thought the discussion would happen under the issue,
so I reloaded #2334, did not see your comments, and figured they were a direct
response to @jedwards4b's original message. I apologize for
being high maintenance again. Now I've found the way to see
the issue comments.

@kdraeder
Collaborator

kdraeder commented Mar 5, 2018

@gold2718
I removed the comments that prompted your 2nd comment,
about the restart history file regex, so that's resolved for now.
After my push, its comments appeared inside my response about not answering
comments in the right way, which seems odd. Is there a way
I can make my push comments appear as new, instead of being attached
to whatever I or someone else last wrote?

@gold2718

gold2718 commented Mar 5, 2018

@kdraeder, I see the notification about your push right below the last comment, which is where it is supposed to be (so to me, it looks like a new top-level item).
Comments that are specifically about the PR go in the PR, not in the issue. I would put it this way: the 'Issue discussion' is for clarifying the issue and discussing proposed solutions, while the 'PR discussion' is for comments on the requested merge. Now I have to figure out a good place to put that on the wiki.


@gold2718 gold2718 left a comment


It looks like case.st_archive.py still has a KDR comment.

<hist_file_extension>h\d*</hist_file_extension>
<hist_file_extension>initial_hist</hist_file_extension>
<rest_history_varname>unset</rest_history_varname>
<rpointer>

Are these spaces correct? I think that <rpointer_file> and <rpointer_content> should be nested under <rpointer>, but <rpointer> and <rest_history_varname> should be at the same level as <hist_file_extension>.
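Whitespace in the XML does not decide the nesting; only the tags do, so the spacing question is about readability rather than meaning. A minimal sketch with Python's xml.etree.ElementTree, assuming a comp_archive_spec fragment laid out as the comment describes (element and attribute names are from this thread; the rpointer_file and rpointer_content values are placeholders):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names are from the review thread,
# the text of rpointer_file and rpointer_content is a placeholder.
SPEC = r"""
<comp_archive_spec compname="cam" compclass="atm">
  <hist_file_extension>h\d*</hist_file_extension>
  <hist_file_extension>initial_hist</hist_file_extension>
  <rest_history_varname>unset</rest_history_varname>
  <rpointer>
    <rpointer_file>placeholder_file</rpointer_file>
    <rpointer_content>placeholder_content</rpointer_content>
  </rpointer>
</comp_archive_spec>
""".strip()

root = ET.fromstring(SPEC)

# Direct children of comp_archive_spec: rpointer sits at the same level
# as hist_file_extension and rest_history_varname.
top_level = [child.tag for child in root]

# rpointer_file and rpointer_content are children of rpointer only.
nested = [child.tag for child in root.find("rpointer")]

print(top_level)
print(nested)
```

Re-indenting the fragment, or collapsing it onto one line, yields the same tree; misplaced closing tags, not spaces, are what would change the nesting.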

Contributor

@mfdeakin-sandia mfdeakin-sandia left a comment


Aside from @gold2718's request for the comment to be resolved and the other spacing issues, this looks fine to me.

@@ -313,19 +314,20 @@ def _archive_history_files(case, archive, archive_entry,
     if ninst_string:
         if compname.find('mpas') == 0:
             # Not correct, but MPAS' multi-instance name format is unknown.
-            newsuffix = compname + '.*' + suffix
+            newsuffix = compname + r'\d*' + r'\.' + suffix + r'\.'
Contributor


Some more spacing issues

Collaborator


I did that to vertically line up the same elements in multiple lines,
to make it easier to see the differences between lines. If this
goes against the CIME style standards, I'll remove the extra spaces.


For the record, I find no issue with these spaces (they do increase readability IMHO).

Contributor


It's not my preference (personally I prefer a formatting tool to do my formatting thinking for me), but if others don't have a problem with it, then that's fine. However, the one on line 322 is missing a space.


@mfdeakin-sandia, does your formatting tool change these spaces? If not, what is the issue?
I think our Python standard is 4 spaces for each indent level and no trailing whitespace.

Contributor


No issue, I'm fine with this PR
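To see what the revised MPAS line in the hunk above changes, the sketch below builds both versions of newsuffix and searches two filenames with them. The compname and suffix values and the filenames are hypothetical (the code comment itself says the MPAS multi-instance format is unknown), and re.search is assumed as the matching call. It also shows that the raw strings in the later revision spell the same regex text as doubled backslashes; the earlier non-raw '\d' only works because Python leaves unrecognized escape sequences in place, which newer Python versions flag with a warning.

```python
import re

# Hypothetical values: the thread does not give a real MPAS compname/suffix.
compname = "mpaso"
suffix = "hist"

# Earlier version: compname, then anything, then the suffix.
old_suffix = compname + '.*' + suffix
# Revised version: compname, optional digits, a literal '.', the suffix,
# and another literal '.'.
new_suffix = compname + r'\d*' + r'\.' + suffix + r'\.'

# A raw string keeps the backslash, so r'\d*' is the same text as '\\d*'.
assert r'\d*' == '\\d*' and r'\.' == '\\.'

names = [
    "case.mpaso.hist.0001-01-01.nc",  # hypothetical history file
    "case.mpaso.rst.hist_backup.nc",  # hypothetical name that merely contains 'hist'
]
# name -> (found by old pattern, found by new pattern)
results = {name: (bool(re.search(old_suffix, name)),
                  bool(re.search(new_suffix, name)))
           for name in names}
for name, (old, new) in results.items():
    print(f"{name}: old={old} new={new}")
```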


@gold2718 gold2718 left a comment


I still would like to understand the need for the [eh] hist_file_extension for CAM.

@kdraeder
Collaborator

kdraeder commented Mar 6, 2018

[eh] refers to 2 different file types. 'e' denotes files that were generated by esp (DART)
and have characteristics of a history file, but are not CAM history files. They are closely
associated with CAM ensemble members (instances).
They are archived the same way as history files, in $DOUT_S_ROOT/atm/hist.
Example filenames are
CIME_DA_vars_11.cam_0001.e.postassim.2008-08-02-64800.nc
CIME_DA_vars_11.cam_0001.e.preassim.2008-08-02-43200.nc
where 'preassim' and 'postassim' refer to output from different stages
of the assimilation process.
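How an [eh] extension separates these DART 'e' files from restart files can be sketched with a pattern of the form casename, '.', compname, optional instance digits or underscores, '.', the extension regex, '.'. That assembly is an assumption for illustration, not the code in case_st_archive.py; the two 'e' filenames are the ones quoted above and the restart name is hypothetical.

```python
import re

def hist_pattern(casename, compname, extension):
    # Hypothetical assembly: casename '.' compname, optional instance digits
    # or underscores, '.', the hist_file_extension regex, then '.'.
    return r"{}\.{}[\d_]*\.{}\.".format(casename, compname, extension)

pattern = re.compile(hist_pattern("CIME_DA_vars_11", "cam", "[eh]"))

names = [
    "CIME_DA_vars_11.cam_0001.e.postassim.2008-08-02-64800.nc",  # from the thread
    "CIME_DA_vars_11.cam_0001.e.preassim.2008-08-02-43200.nc",   # from the thread
    "CIME_DA_vars_11.cam_0001.rs.2008-08-02-64800.nc",           # hypothetical restart file
]
matches = {name: bool(pattern.match(name)) for name in names}
for name, matched in matches.items():
    print(f"{name}: {matched}")
```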

@gold2718

gold2718 commented Mar 6, 2018

So who outputs CIME_DA_vars_11.cam_0001.e.postassim.2008-08-02-64800.nc?

@kdraeder
Collaborator

kdraeder commented Mar 6, 2018

The program 'filter', which is run by assimilate.csh.

@gold2718

gold2718 commented Mar 6, 2018

Then why is the entry under <comp_archive_spec compname="cam" compclass="atm"> instead of <comp_archive_spec compclass="esp" compname="dart">?

@kdraeder
Collaborator

kdraeder commented Mar 6, 2018

Because, as described in earlier discussions about the strategy
for archiving CAM+DART files, I chose to keep DART files that
describe an ensemble member with the CAM files that describe the
same member. Users will want to compare these files, and it will be
helpful to have them in the same directory. Also, the CESM file naming
convention and file name construction in case_st_archive.py put restrictions
on what file names can be archived by which sections of env_archive.xml.
A file name with cam_0001 will not be archived by
<comp_archive_spec compclass="esp" compname="dart">.
The file name would need to have dart_0001 in it, which would make
things more cumbersome when we do coupled assimilations which
will generate cam_0001, pop_0001, clm_0001, ... files. If they all have
dart_0001 in their filenames, then the component part will need to be
wedged in somewhere else, making the names even longer, and no more
informative.

There are other DART files that describe the whole ensemble; those are
archived in esp/hist.
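The point that a cam_0001 file is not picked up by the dart entry can be illustrated with the replacement pattern quoted earlier in the thread, r"{}.{}[\d_]*\..*". This is a minimal sketch, assuming that pattern is filled in with each entry's compname and applied with re.match; the dart_0001 filename is hypothetical.

```python
import re

def component_pattern(casename, compname):
    # Replacement pattern quoted in the thread, filled in per component.
    return r"{}.{}[\d_]*\..*".format(casename, compname)

def selecting_components(casename, name, compnames=("cam", "dart")):
    # Which component entries would claim this filename under the pattern.
    return {comp: bool(re.match(component_pattern(casename, comp), name))
            for comp in compnames}

casename = "CIME_DA_vars_11"
cam_name = "CIME_DA_vars_11.cam_0001.e.preassim.2008-08-02-43200.nc"    # from the thread
dart_name = "CIME_DA_vars_11.dart_0001.e.preassim.2008-08-02-43200.nc"  # hypothetical

print(selecting_components(casename, cam_name))   # {'cam': True, 'dart': False}
print(selecting_components(casename, dart_name))  # {'cam': False, 'dart': True}
```

A name carrying dart_0001 would be claimed by the dart entry and skipped by the cam entry, which is why keeping cam_0001 in the name keeps these member files alongside the CAM output.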

@gold2718

gold2718 commented Mar 7, 2018

Ah, thanks.

@gold2718 gold2718 assigned gold2718 and unassigned jedwards4b Mar 7, 2018
@gold2718 gold2718 merged commit df4f779 into ESMCI:master Mar 7, 2018
@kdraeder kdraeder deleted the py_regex_kdr branch March 7, 2018 05:45
Successfully merging this pull request may close these issues.

case_st_archive.py and env_archive.xml regexes in the context of DART
7 participants