Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Commit yaml files #768

Merged
merged 11 commits into from Jan 7, 2020
Merged

Conversation

bryanwweber
Copy link
Member

@bryanwweber bryanwweber commented Dec 11, 2019

Thanks for contributing code! Please include a description of your change and
check your PR against the list below (for further questions, refer to the
contributing guide).

  • There is a clear use-case for this code change
  • The commit message has a short title & references relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage

Please fill in the issue number this pull request is fixing

Fixes #662, #669

Changes proposed in this pull request

  • Commit YAML input files into the repository

I also plan to convert all the tests that are currently using CTI/XML input files to use the equivalent YAML version, except of course tests that are testing the behavior of one of those two formats, or the conversion of them.

It occurred to me that ruamel.yaml will still be required to run the tests, so the benefit of removing the build-time dependency is less than I hoped. I still think this is the appropriate resolution of #662 and #669 though.

@ischoegl
Copy link
Member

Some xml/cti occurrences are also replaced in #737 (addressing #736), which is pending resolution.

@speth
Copy link
Member

speth commented Dec 12, 2019

I would suggest limiting the replacement of CTI/XML with YAML in other tests. CTI and XML are still fully supported for Cantera 2.5, and I'd hate to have a bug creep in because we stopped testing these formats.

If you really want to include all of the converted input files, I think this would be a good opportunity to think about which of the existing ones we should actually keep. There are quite a few that are either not used in the test suite or barely used and that I don't think are all that useful to users. A few examples are silicon.cti, silicon_carbide.cti, argon.cti, and nasa.cti (just a combination of nasa_gas and nasa_condensed, as far as I can tell)

@ischoegl
Copy link
Member

If you really want to include all of the converted input files, I think this would be a good opportunity to think about which of the existing ones we should actually keep. There are quite a few that are either not used in the test suite or barely used and that I don't think are all that useful to users.

What is the benefit of no longer supporting those input files (there is next to no development effort involved)? Also, it is hard to know what people are using. speaking for myself, I have been using argon and silicon carbide, mostly when I need a phase that just acts as a cheap reservoir for heat transfer purposes. It’s a minor issue, but I wanted to put it out there.

@bryanwweber
Copy link
Member Author

I think one primary reason is that we don't want to implicitly endorse one set of thermokinetics/transport data over another. By including the file with a Cantera distribution, we're kind of doing that. Obviously, there are several files we use in examples (nDodecane_reitz for one) that aren't generic, and GRI-3.0 seems like an historical exception, but I think in general we'd like to distribute fewer input files.

@ischoegl
Copy link
Member

This is understandable. However, the message that users should use 'modern' input files is already muddled as cantera de facto endorses gri30, nDodecane_reitz, and the h2o2 portion of gri30 (for the latter, there are better h2o2 models around). Are there better ways to weaken this message than making cantera usage less convenient? An alternative would be to expand usage examples that have clear disclaimers regarding models used?

@bryanwweber
Copy link
Member Author

Personally, I see the distinction as related to the example. If the example is associated with a published work, then the example should more than likely use the same input file as the publication (I believe this is the case for the RK phase example that introduced the Reitz file).

As for GRI-3.0, I personally give that an historical exception. At this point, everyone should know not to use GRI-3.0 for any publication-quality work, so I don't mind including it. We need to have some sort of input file for our examples, and having a somewhat realistic (if outdated) file seems like a benefit over having a trivial input file (e.g., 1-2 species, 1 reaction, etc.).

The provenance of many of the other files is questionable, to me. To my understanding, many of them were invented for the purpose of the related example and haven't been peer-reviewed, and we should probably do a better job of clarifying where that's the case. One problem I have is that many of them were added before I started contributing, so I'm not sure what's good and what isn't.

@ischoegl
Copy link
Member

ischoegl commented Dec 15, 2019

As for GRI-3.0, I personally give that an historical exception. At this point, everyone should know not to use GRI-3.0 for any publication-quality work, so I don't mind including it.

I am less optimistic in this regard, and believe that this needs to be pointed out especially to new users that may not be as familiar with the current state of affairs.

The provenance of many of the other files is questionable, to me. To my understanding, many of them were invented for the purpose of the related example and haven't been peer-reviewed, and we should probably do a better job of clarifying where that's the case.

This is my understanding also.

I agree that some simple, semi-realistic example input files are needed (whether or not those are argon or silicon-carbide is beside the point). (EDIT: I guess I misread, and am actually advocating for more scaled down examples). I personally hope that the NSF project will result in improvements here.

One suggestion I'd have for 2.5 is to add fields to all cantera-supplied YAML (and CTI?) files that include a disclaimer and/or clarify provenance (similar to the deprecated field that @speth suggested elsewhere). To that end, ck2yaml etc. could take an optional argument --comment that adds an entry so files do not have to be modified by hand.

@ischoegl
Copy link
Member

ischoegl commented Dec 15, 2019

Adding/referencing an open issue that could be laid to rest here: #339

@bryanwweber
Copy link
Member Author

I am less optimistic in this regard, and believe that this needs to be pointed out especially to new users that may not be as familiar with the current state of affairs.

I'm not sure where the best place to do this is - include a disclaimer in every example that uses GRI-3.0? I guess another question is, does it matter? There's no way anyone could publish a paper these days using GRI-3.0. Are there other outputs from using Cantera? (That question is clearly unanswerable 😄 ) Maybe someone uses it at an engineering firm, wouldn't they validate with experimental results for a design?

I guess, it seems like adding a disclaimer for all the examples that use GRI-3.0 seems like adding quite a bit of noise 🤷‍♂

(EDIT: I guess I misread, and am actually advocating for more scaled down examples)

What do you mean by "scaled down"?

I personally hope that the NSF project will result in improvements here.

As far as I know, we're not planning any work specifically on existing examples. I'm also not sure what you mean by improvements - improvements where? If you have any ideas here, feel free to put an issue at Cantera/improvements.

One suggestion I'd have for 2.5 is to add fields to all cantera-supplied YAML (and CTI?) files that include a disclaimer and/or clarify provenance (similar to the deprecated field that @speth suggested elsewhere)

This is already done, see

description: |-
GRI-Mech Version 3.0 7/30/99 CHEMKIN-II format
See README30 file at anonymous FTP site unix.sri.com, directory gri;
WorldWideWeb home page http://www.me.berkeley.edu/gri_mech/ or
through http://www.gri.org , under 'Basic Research',
for additional information, contacts, and disclaimer

To that end, ck2yaml etc. could take an optional argument --comment that adds an entry so files do not have to be modified by hand.

I believe that ck2yaml currently reads any comments in the CK files and adds them to the description key. I'm not sure that adding a command line flag to read in a long text blob is a great UI, maybe it could be either a short message or a file name. PRs welcome!

Adding/referencing an open issue that could be laid to rest here: #339

I am not planning to address #339 in this pull request.

@ischoegl
Copy link
Member

I'm not sure that adding a command line flag to read in a long text blob is a great UI, maybe it could be either a short message or a file name

The description field is clearly different as it pertains to what the original authors of the mechanism had to say. All I was pointing to here is to add a short official comment by cantera developers to clarify what those input files represent and/or how they should be used (how this gets implemented is secondary).

This suggestion would partially address #339, especially as the future fate of some of the distributed input files is being actively discussed at this point. I am writing this in an attempt to provide constructive suggestions. I am not planning a PR myself if this is just a shot in the dark; this is clearly a higher level decision. It’s not up to me ...

Ad scaled down Input / improvements: I don’t think that existing examples need work, but I was hoping for a more comprehensive suite of examples, some of which could be really basic ... 😉

@ischoegl
Copy link
Member

@bryanwweber ... rather than adding a PR, I posted Cantera/enhancements#15

@bryanwweber bryanwweber force-pushed the commit-yaml-files branch 2 times, most recently from 9aa49d2 to 68f3564 Compare December 18, 2019 04:43
@codecov
Copy link

codecov bot commented Dec 18, 2019

Codecov Report

Merging #768 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #768      +/-   ##
==========================================
+ Coverage   71.44%   71.44%   +<.01%     
==========================================
  Files         372      372              
  Lines       43935    43946      +11     
==========================================
+ Hits        31388    31399      +11     
  Misses      12547    12547
Impacted Files Coverage Δ
test/general/test_containers.cpp 100% <100%> (ø) ⬆️
test/thermo/thermoFromYaml.cpp 100% <100%> (ø) ⬆️
src/thermo/ThermoFactory.cpp 83.72% <100%> (+0.24%) ⬆️
src/base/AnyMap.cpp 87.5% <100%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ba32337...b72a6ca. Read the comment docs.

@bryanwweber
Copy link
Member Author

The description field is clearly different as it pertains to what the original authors of the mechanism had to say.

I don't see the description field in this way. I think it is a more general field that can contain whatever we want. Thank you for adding the enhancement issue for references though, I think that will be valuable if we can come up with a convention there.

This suggestion would partially address #339, especially as the future fate of some of the distributed input files is being actively discussed at this point.

Well, I ended up partially addressed #339 in 89157a4 at least identifying when I didn't know the source of the data.

Ad scaled down Input / improvements: I don’t think that existing examples need work, but I was hoping for a more comprehensive suite of examples, some of which could be really basic ... 😉

If you have something specific in mind (or even if not), feel free to open an issue on Cantera/enhancements. We do have the tutorials on the website which are pretty basic.

@ischoegl
Copy link
Member

Well, I ended up partially addressed #339 in 89157a4 at least identifying when I didn't know the source of the data.

Looks good! Thanks for addressing.

@bryanwweber
Copy link
Member Author

bryanwweber commented Dec 18, 2019

@speth @ischoegl I added a possible deprecated implementation for the YAML files in 546223a with two sample messages in argon.yaml and gri30.yaml.

I don't necessarily think it's worth adding to CTI given the impending deprecation of the whole format, plus I don't know where to add that 😂

@ischoegl
Copy link
Member

ischoegl commented Dec 18, 2019

@bryanwweber ... the way I see this alternative:

Pro:

This implementation is short and won't require further code modifcations.

Con:

@speth: The one advantage of approach [#737] would be that we wouldn't need to start of by introducing some deprecated phase definitions in gri30.yaml.

  • The gri30.yaml file cannot be generated from source via ck2yaml, which is imho a serious drawback.
  • The default loading behavior changes silently / without warning (which is inconsequential)

@bryanwweber
Copy link
Member Author

@ischoegl: The gri30.yaml file cannot be generated from source via ck2yaml, which is imho a serious drawback.

Until we remove those phases after Cantera 2.5.0, i.e., for a limited amount of time. After Cantera 2.5.0, the file will be able to be generated directly from the CK input. As such, I don't see this as a drawback for this approach.

@ischoegl
Copy link
Member

ischoegl commented Dec 18, 2019

I'm ok with this solution (as long as the ck2yaml generation is preserved). The warning message could be more explicit about how transport models are handled in general (which was a nice feature of the warning messages of the alternative approach).

Edit: closed #737

@bryanwweber
Copy link
Member Author

The warning message could be more explicit about how transport models are handled in general (which was a nice feature of the warning messages of the alternative approach).

Yes, these are just a message for the sake of seeing some output is generated. I'm interested in some feedback on the actual implementation for now, and I'll keep working on the messages. I was actually surprised at how easy this was to do, at least to the extent that the feature is available here.

@ischoegl
Copy link
Member

ischoegl commented Dec 18, 2019

I was actually surprised at how easy this was to do, at least to the extent that the feature is available here.

Yup. Seems like your solution hit a sweet spot - short and simple. Wouldn't have closed my PR otherwise!

I'm interested in some feedback on the actual implementation for now, and I'll keep working on the messages.

Sure. I'm wondering whether it wouldn't make sense to move all warnings to AnyMap? You can probably raise warnings for all kinds of other mapped objects (thermo phases, species thermo, etc.) with very little effort using the same idea.

@bryanwweber
Copy link
Member Author

The reason I included it here is that it requires a YAML file with this field to actually function, and this is the only branch that has YAML files committed to the repo.

data/ohn.yaml Outdated
@@ -0,0 +1,18 @@
description: This definition extracts the O/H/N submechanism from GRI-Mech 3.0
Copy link
Member

@ischoegl ischoegl Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hand-copied from nasa.cti in 89157a4 (similar for many other files) ... would it make sense to add parsing of the header block and conversion to the description field to ctml2yaml? (this would also benefit users converting their own files)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it would be good to add this functionality... I think it should be possible in ctml2yaml, I believe comments are accessible in the tree. I'm not sure about cti2yaml, because that actually execs the input file, so I'm not sure if comments are available. I believe a docstring would be available and could be added though...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I recall, CTI files typically have a commented # header. The simplest way would be just to parse this block using with open(...) as cti_file: etc.?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctml2yaml doesn't really stand a chance here of having a relevant description to copy, since the comment header in the CTI file would have been lost when running ctml_writer, and there are very few XML files with any header comment blocks, at least in the cantera repository.

I spent a little time trying to get access to the comments in a CTI file when writing cti2yaml, but couldn't find a way to get at them in a way that provided correspondence to the code itself -- they don't seem to be available even through the ast module. If we're only after the header comment to use for this field, then I think @ischoegl's suggestion is probably the best we can do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@speth I think it's probably worth seeing how hard this would be to add to ctml2yaml, in case anyone has hand-edited XML files that include comments, at least for a top-level comment. Not a high priority, I'll add an issue at Cantera/enhancements eventually.

data/air.yaml Outdated
@@ -0,0 +1,177 @@
description: |-
Copy link
Member

@ischoegl ischoegl Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on discussion in commit 89157a4 (which is not displayed outside of the commit):

I believe it would be preferable to add the description to the initial source air.inp instead of adding it by hand after conversion. I.e. ck2yaml will generate this field automatically (I did as much in a small addition added to #777).

It is a near duplicate of prandtl1.m
Python best practices are to use the with statement to manage opening
and closing files. The Python docs recommend specifying newline=''
to avoid certain problems with Windows-style carriage returns. See
https://docs.python.org/3/library/csv.html#id3
Remove newlines and strip leading and trailing spaces to reformat the
comment into a single line.
@bryanwweber bryanwweber marked this pull request as ready for review January 5, 2020 17:20
@bryanwweber
Copy link
Member Author

@speth I think this is finished now (finally). I reverted most of the controversial formatting changes, keeping mostly the ones adding spaces and the version requirements to the docstring.

Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few final suggestions.

data/argon.yaml Outdated Show resolved Hide resolved
data/lithium_ion_battery.yaml Outdated Show resolved Hide resolved
test/general/test_containers.cpp Outdated Show resolved Hide resolved
test/thermo/thermoFromYaml.cpp Outdated Show resolved Hide resolved
In many cases, these are copied from the corresponding CTI file. In some
cases, the source of the data is unknown and these are also noted.
Add requirements to all Python examples. Update formatting to resolve
some PEP8 problems.
@bryanwweber
Copy link
Member Author

@speth Thanks for all the comments! Updated again.

@speth speth merged commit 6ccefcb into Cantera:master Jan 7, 2020
@ischoegl
Copy link
Member

ischoegl commented Jan 7, 2020

@bryanwweber ... it looks like the chemkin to yaml conversion is no longer used in SConstruct, but the chemkin inputs persist in the data folders. Was this by intent or is it an oversight?

@bryanwweber
Copy link
Member Author

@ischoegl By intent, we decided to handle/discuss other input files in a separate PR to avoid extending the scope of this one even further.

@ischoegl
Copy link
Member

ischoegl commented Jan 7, 2020

@bryanwweber ... as #777 depends on chemkin input, I (temporarily?) added the chemkin->yaml conversion back into SConstruct.

@bryanwweber
Copy link
Member Author

@ischoegl OK, we can discuss that in #777

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The build script requires NumPy and ruamel_yaml for YAML conversions
3 participants