Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YamlWriter / yaml2ck issues with 'note' field parsed as integer #1610

Open
speth opened this issue Aug 31, 2023 · 5 comments
Open

YamlWriter / yaml2ck issues with 'note' field parsed as integer #1610

speth opened this issue Aug 31, 2023 · 5 comments
Labels
Input Input parsing and conversion (for example, ck2yaml)

Comments

@speth
Copy link
Member

speth commented Aug 31, 2023

Problem description

There are a pair of coupled problems with species' note fields in cases where that field could be (but should not be) interpreted as an integer, for example 120186, which if you're psychic you might guess is the date December 1, 1986. Or maybe January 12, 1986. Or maybe something else entirely. In any case, Cantera should just preserve this as a string.

Steps to reproduce

In Python:

>>> import cantera as ct
>>> gas = ct.Solution('nDodecane_Reitz.yaml')
>>> gas.species('h2o2').input_data['note']
'120186'  # preserved as a string
>>> gas.write_yaml('test-note.yaml')

Looking at the resulting YAML file shows that this value has been converted to an integer:

  - name: h2o2
    composition: {H: 2.0, O: 2.0}
    ...
    note: 120186

which is inconsistent, but not necessarily a significant problem. However, trying to convert this to a Chemkin file:

gas2 = ct.Solution('test-note.yaml')
gas2.write_chemkin('test-note.ck', sort_species=None, sort_elements=None)

fails with the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [3], line 1
----> 1 gas2.write_chemkin('test-note.ck', sort_species=None, sort_elements=None)

File ~/src/cantera/build/python/cantera/solutionbase.pyx:345, in cantera.solutionbase._SolutionBase.write_chemkin()
    343 
    344         from cantera import yaml2ck
--> 345         output_paths = yaml2ck.convert(
    346             self,
    347             mechanism_path=mechanism_path,

File ~/src/cantera/build/python/cantera/yaml2ck.py:681, in convert(solution, phase_name, mechanism_path, thermo_path, transport_path, sort_elements, sort_species, overwrite)
    674 else:
    675     header_text = header_wrapper.fill("") + "\n"
    676 mechanism_text = [
    677     header_text,
    678     build_elements_text(
    679         all_elements,
    680     ),
--> 681     build_species_text(
    682         all_species,
    683     ),
    684 ]
    685 if thermo_path is None:
    686     mechanism_text.append(
    687         build_thermodynamics_text(
    688             all_species,
    689             separate_file=False,
    690         ),
    691     )

File ~/src/cantera/build/python/cantera/yaml2ck.py:177, in build_species_text(species, max_width)
    173 species_names = {s.name: s.input_data.get("note", "") for s in species}
    175 # Notes shorter than 7 characters are probably related to thermo entries
    176 # and will be included with the thermo entry.
--> 177 if any(len(s) > 6 for s in species_names.values()):
    178     max_species_len = max(len(s) for s in species_names.keys())
    179     max_species_len = max(5, max_species_len)

File ~/src/cantera/build/python/cantera/yaml2ck.py:177, in <genexpr>(.0)
    173 species_names = {s.name: s.input_data.get("note", "") for s in species}
    175 # Notes shorter than 7 characters are probably related to thermo entries
    176 # and will be included with the thermo entry.
--> 177 if any(len(s) > 6 for s in species_names.values()):
    178     max_species_len = max(len(s) for s in species_names.keys())
    179     max_species_len = max(5, max_species_len)

TypeError: object of type 'int' has no len()

System information

  • Cantera version: 3.0.0
  • OS: macOS 13.5.1
  • Python 3.11

Additional context

Originally reported on the Cantera Users' Group: https://groups.google.com/g/cantera-users/c/gn_TJt1gspc

@speth speth added bug Input Input parsing and conversion (for example, ck2yaml) labels Aug 31, 2023
@ischoegl
Copy link
Member

ischoegl commented Sep 5, 2023

In any case, Cantera should just preserve this as a string.

I am not convinced of this conclusion. Per discussion when YAML was first introduced, there should not be anything canonical in terms of reserved field names (the field note was specifically discussed). From that perspective, it's hard to 'divine' a string out of an integer without knowing the context? I don't think that YamlWriter has enough information.

The alternative would be to convert integers to a string in yaml2ck when notes are written (or when an entry is first created via ck2yaml, although with ck2yaml being around for a while there may be many "pre-existing" YAML conditions). In this case, the context of conversion from/to Chemkin is clear. This would be an easy fix and would be my preference.

Just 2 cents of course ...

@bryanwweber
Copy link
Member

bryanwweber commented Sep 5, 2023

From that perspective, it's hard to 'divine' a string out of an integer without knowing the context?

I believe the original NASA spec for this format calls for the field in these positions to be a string. In that sense, the int conversion is an artifact of the YAML parser we use/the YAML specification when the file is loaded in the first place. I think the fix is to write that field out as a string always (that is, with explicit quotes in the YAML file) and explicitly cast to a string in the yaml2ck side.

See also, putting quotes around major.minor semantic version numbers in YAML files, lest they be read as floats by a parser.

Edit: based on your edit, I believe we agree @ischoegl 😀

@ischoegl
Copy link
Member

ischoegl commented Sep 5, 2023

@bryanwweber ... happy to agree!

Fwiw, I had a really curious 'bug' in a YAML file once, where a git hash ended up being a valid integer (I guess I was pretty 'lucky'). So string round-trip conversion in YAML is not reliable - you can write out a string and get back an integer any time you don't use quotes. PS: this is exactly what happens in the example at the top.

@ischoegl
Copy link
Member

ischoegl commented Sep 5, 2023

I added a (partial/band-aid) fix to #1616, which takes care of the exception, but not the inadvertent type conversion. One thing that is now clear to me is is that YamlWriter does indeed know the correct type, but the serialization doesn't handle this edge case. I guess it would be possible to handle these using regex expressions, but I don't think that this is consequential enough.

@ischoegl
Copy link
Member

Removing ‘bug’ label as the band-aid fix prevents failures.

@ischoegl ischoegl removed the bug label Sep 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Input Input parsing and conversion (for example, ck2yaml)
Projects
None yet
Development

No branches or pull requests

3 participants