Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maps with numeric keys in YAML files are not parsed correctly #21

Closed
wants to merge 1 commit into from

Conversation

wsmigaj
Copy link
Contributor

@wsmigaj wsmigaj commented Jan 15, 2020

The YAMLConfiguration class does not handle correctly maps containing numeric keys. These keys are stored as Value objects storing an instance of NumberContent; as a result, their values can't be accessed via the Configuration::get() function, since it expects the keys to be strings. Out of the four ways to write a map in a YAML file (in the block or flow style, with or without quoted keys), the only way that works is the JSON-like flow style with quoted keys.

my_map: { "1": 0.5, "2": 0.75 }

I'm not sure what the best way to fix this is; this pull request simply adds a few test cases demonstrating the problem.

The test_eckit_yaml_21 test actually expects numeric keys to be treated as numbers, but I think this is undesirable, at least when YAML is parsed to build a Configuration object, since the latter requires keys to be strings.

YAMLConfiguration conf(yaml);
}

CASE("JSON-style map with unquoted numeric keys") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON keys should always be strings, since objects are composed of name/value pairs.
Please see https://www.json.org/json-en.html
So this case invalid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction, this case is valid if we consider that you are parsing is as YAML. YAML does accept numeric values as keys

@tlmquintino
Copy link
Member

tlmquintino commented Apr 28, 2020

I think it is important to distinguish between parsing the YAML and the interface in the Configuration class.

Configuration is meant to be used with keys that are strings, in the JSON sense of named values.
We should fix the YAML parser if we aren't parsing the YAML correctly, but it seems we are, as you discovered that NumericContent is generated from the parsing.
However, it appears to me that you are trying to use Configuration for a purpose it wasn't designed for. My suggestion is that you don't pass numeric values in the keys.

If you are trying to use YAML configuration to input numeric data into an application, then Configuration is not the interface for that purpose.

@tlmquintino
Copy link
Member

I have added a test to YAML parsing that quoted keys are correctly parsed as Numeric values.
This should be sufficient..

@wsmigaj
Copy link
Contributor Author

wsmigaj commented Apr 29, 2020

I think it is important to distinguish between parsing the YAML and the interface in the Configuration class.

I agree.

Configuration is meant to be used with keys that are strings, in the JSON sense of named values.

We should fix the YAML parser if we aren't parsing the YAML correctly, but it seems we are, as you discovered that NumericContent is generated from the parsing.
However, it appears to me that you are trying to use Configuration for a purpose it wasn't designed for. My suggestion is that you don't pass numeric values in the keys.

If you are trying to use YAML configuration to input numeric data into an application, then Configuration is not the interface for that purpose.

But isn't input of numeric data precisely what methods such as Configuration::getInt() or getFloatVector() are for?

I would agree that the testNumericKeys() function defined in this PR uses the LocalConfiguration object in a way it was probably not intended to be used. However, it seems to me that wanting to retrieve a number-to-number mapping from a Configuration is not in itself a bad idea: it is no different in principle from wanting to retrieve a sequence of numbers, for which Configuration provides ample support through getFloatVector() etc. Would you agree?

Perhaps the most elegant way to retrieve such maps would be through dedicated methods such as getFloatToFloatMap(); however, given how many such methods would need to be added to support maps in the same way as vectors, this may not be practical. Therefore a more realistic solution might be to build on the existing support for maps with string keys and just make sure that it works in the same way regardless of whether the keys happen to represent numeric values or not, as I tried to suggest in the description of this issue.

@tlmquintino
Copy link
Member

I think it is important to distinguish between parsing the YAML and the interface in the Configuration class.

I agree.

Configuration is meant to be used with keys that are strings, in the JSON sense of named values.

We should fix the YAML parser if we aren't parsing the YAML correctly, but it seems we are, as you discovered that NumericContent is generated from the parsing.
However, it appears to me that you are trying to use Configuration for a purpose it wasn't designed for. My suggestion is that you don't pass numeric values in the keys.

If you are trying to use YAML configuration to input numeric data into an application, then Configuration is not the interface for that purpose.

But isn't input of numeric data precisely what methods such as Configuration::getInt() or getFloatVector() are for?

No. There is a difference between the types for the keys and the types for the values.
Keys are always strings, because they are named keys. That is how you identify what piece of configuration you want.
getInt() and getFloatVector() are used to return named numeric values.
So the design is used as intended, e.g.:
"number_of_iterations" : 42, "steps" : [ 3, 14, 15, 92 ]

I would agree that the testNumericKeys() function defined in this PR uses the LocalConfiguration object in a way it was probably not intended to be used. However, it seems to me that wanting to retrieve a number-to-number mapping from a Configuration is not in itself a bad idea: it is no different in principle from wanting to retrieve a sequence of numbers, for which Configuration provides ample support through getFloatVector() etc. Would you agree?

See above for example of sequence of numbers.

Perhaps the most elegant way to retrieve such maps would be through dedicated methods such as getFloatToFloatMap(); however, given how many such methods would need to be added to support maps in the same way as vectors, this may not be practical. Therefore a more realistic solution might be to build on the existing support for maps with string keys and just make sure that it works in the same way regardless of whether the keys happen to represent numeric values or not, as I tried to suggest in the description of this issue.

Maps with strings as keys is a consequence of the named keys concept. Configuration is meant to be used with named entries. If you are trying to load a map of numbered index to values, I suggest use another abstraction, or create a new abstraction that loads numeric tables from files (which can be YAML). Again, please recall you can use the YAML parser separate from the Configuration class.

@wsmigaj
Copy link
Contributor Author

wsmigaj commented Apr 29, 2020

I think

"pairs" : { 3: 14, 15: 92 }

would be conceptually no different from the examples you've given if it was possible to tell Configuration to treat { 3: 14, 15: 92 } as a single piece of configuration (analogous to a scalar or a sequence).

However, I see that this feature is unlikely to be added to eckit, so I will probably have to encode these kinds of settings as

pairs:
  x: [ 3, 15]
  y: [14, 92]

instead.

@wsmigaj
Copy link
Contributor Author

wsmigaj commented Apr 29, 2020

In any case, many thanks for your feedback on this issue.

@wsmigaj wsmigaj closed this May 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants