Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capgen cleanup #493

Merged
merged 9 commits into from
Oct 12, 2023
Merged

Capgen cleanup #493

merged 9 commits into from
Oct 12, 2023

Conversation

peverwhee
Copy link
Collaborator

@peverwhee peverwhee commented Sep 1, 2023

Summary

This PR contains mostly clean-up for readability and clarity.

Description

  • Update CODEOWNERS file
  • Add optional database object output for capgen
  • Code cleanup

User interface changes?: Yes
Can now include optional return_db argument to capgen to return a database for use by host model.

Addresses #380

Testing:
All tests passing except for unit conversion test that is expected to fail.

Copy link
Collaborator

@gold2718 gold2718 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yow, lots of nice cleanup!
For the record, I shed no tears for Python2.

scripts/ccpp_state_machine.py Outdated Show resolved Hide resolved
Comment on lines +808 to +810
input_vars = [set(), set(), set()] # leaves, arrays, leaf elements
inout_vars = [set(), set(), set()] # leaves, arrays, leaf elements
output_vars = [set(), set(), set()] # leaves, arrays, leaf elements
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more pirates?

@@ -12,7 +12,7 @@

########################################################################

_UNITS_RE = re.compile(r"^[^/@#$%^&*()\|<>\[\]{}?,.]+$")
_UNITS_RE = re.compile(r"^[^/!@#$%^&*=()\|<>\[\]{}?,.]+$")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? What do examples that use these look like?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's impossible to decipher regular expressions like this, it might be good to add a few examples above what we want to match. This would also help addressing @gold2718's question.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After staring at this for a minute, I think we can actually clean this up a bit if it makes sense with something like:

unitless_regex = "1"
unit_regex = "[a-zA-Z]+[-]?[0-9]+"
units_regex = f"^({unit_regex}(\s{unit_regex})?|{unitless_regex})$"
# >>> print(units_regex)
# ^([a-zA-Z]+[-]?[0-9]+(\s[a-zA-Z]+[-]?[0-9]+)?|1)$
_UNITS_RE = re.compile(units_regex)

Testing locally and on regex101 looks to allow only what we want minus the exponents with leading 0's (like m s-01) but that's easy enough to add in, but I was just curious if this was a reasonable replacement first?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@climbfuji @nusbaume thoughts on @mwaxmonsky 's regex cleanup?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a little more testing and I missed two common cases so I think a more flushed out version could be:

unitless_regex                = "1"
non_leading_zero_num          = "[1-9]\d*"
negative_non_leading_zero_num = f"[-]{non_leading_zero_num}"
unit_exponent                 = f"({negative_non_leading_zero_num}|{non_leading_zero_num})"
unit_regex                    = f"[a-zA-Z]+{unit_exponent}?"
units_regex                   = f"^({unit_regex}(\s{unit_regex})*|{unitless_regex})$"
# >>> print(units_regex)
# ^([a-zA-Z]+([-][1-9]\d*|[1-9]\d*)?(\s[a-zA-Z]+([-][1-9]\d*|[1-9]\d*)?)*|1)$
_UNITS_RE = re.compile(units_regex)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not as familiar with the backwards compatibility needed for the whitespace support but I think we can leave that out of the regex and pass the proposed units str through python's strip() function to remove all surrounding whitespace and then pass it to _UNITS_RE.match(...).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like @mwaxmonsky's solution!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update pushed. Just let me know if the new version has any issues!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also on the documentation front, I noticed that there's a few doctest tests which pass in the units and error on invalid strings. Would we be duplicating documentation by putting in examples outside the doctest for check_units() right below the regex declaration?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mwaxmonsky has pushed his regex updates, which is both cleaner and easier to decipher!

@gold2718 gold2718 mentioned this pull request Sep 2, 2023
Copy link
Collaborator

@dustinswales dustinswales left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peverwhee Looks good to me. Thanks for this contribution.
(And good riddance to python v2!!!)

scripts/ccpp_state_machine.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@climbfuji climbfuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments below. Since this PR removes Python2 support (yay), is this mentioned or encoded somewhere, for example on a top-level README.md and/or CMakeLists.txt?

datatable.xml file created by capgen.
"""
metadata_tables = {}
host_name = "cam"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that host_name is hardcoded to cam. What is the plan for making this configurable in the future?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peverwhee Can you comment on this please?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@climbfuji Since the creating a database object from a file is not yet implemented, this host name is not currently doing anything. I did change it to something more general in the interim, but I think long-term, it will be parsed from the datafile.

scripts/ccpp_state_machine.py Outdated Show resolved Hide resolved
scripts/ccpp_suite.py Show resolved Hide resolved
@@ -12,7 +12,7 @@

########################################################################

_UNITS_RE = re.compile(r"^[^/@#$%^&*()\|<>\[\]{}?,.]+$")
_UNITS_RE = re.compile(r"^[^/!@#$%^&*=()\|<>\[\]{}?,.]+$")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's impossible to decipher regular expressions like this, it might be good to add a few examples above what we want to match. This would also help addressing @gold2718's question.

@mkavulich
Copy link
Collaborator

RE: python2 I'm pretty sure python 2 support was de-facto ended long ago (there are a lot of python-3 specific constructs like f-strings deep in the framework, and have been there for more than a year). I'm not sure that requires any specific notes.

Copy link
Collaborator

@mwaxmonsky mwaxmonsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just two small syntax things I noticed.

scripts/metavar.py Outdated Show resolved Hide resolved
scripts/metavar.py Outdated Show resolved Hide resolved
climbfuji added a commit that referenced this pull request Sep 21, 2023
Merge main into feature/capgen (merge after #493)
Copy link
Collaborator

@climbfuji climbfuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments and for simplifying the regular expressions!

@@ -56,7 +56,7 @@ def db_from_file(self, run_env, database_file):
datatable.xml file created by capgen.
"""
metadata_tables = {}
host_name = "cam"
host_name = "host"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Collaborator

@mwaxmonsky mwaxmonsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@climbfuji
Copy link
Collaborator

@peverwhee @nusbaume This has been approved by a bunch of people. @nusbaume Can you look at your previous comments and approve if ok, so that we can merge this? Thanks!

@nusbaume nusbaume requested review from nusbaume and removed request for nusbaume October 12, 2023 21:42
@peverwhee peverwhee merged commit c2c2aa0 into NCAR:feature/capgen Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants