Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserving scalar type information when emitting #30

Closed
leoetlino opened this issue Jan 22, 2020 · 6 comments
Closed

Preserving scalar type information when emitting #30

leoetlino opened this issue Jan 22, 2020 · 6 comments

Comments

@leoetlino
Copy link
Contributor

(I hope you'll forgive me for opening three issues in a row...)

Consider the following document:

- actorType: WeaponSmallSword
  actors:
  - {name: Weapon_Sword_031, plus: -1, value: 15.0}
  - {name: Weapon_Sword_031, plus: 0, value: 20.0}
  - {name: Weapon_Sword_031, plus: 1, value: 24.0}
  not_rank_up: false
  series: DragonRoostSeries
# ...

Emitting it with ryml gives the following:

  -
    actorType: WeaponSmallSword
    actors:
      -
        name: Weapon_Sword_031
        plus: '-1'
        value: 15
      -
        name: Weapon_Sword_031
        plus: 0
        value: 20
      -
        name: Weapon_Sword_031
        plus: 1
        value: 24
    not_rank_up: 0
    series: DragonRoostSeries

Ignoring the strange newlines and the different flow styles (#29), something that's problematic for loading ryml-emitted documents with PyYAML -- or any other parser that relies on detecting types -- is that value types don't seem to be preserved.

In particular, -1 (an integer) gets turned into '-1', which would cause any such parser to load the value as a string and not as an integer.

The value that is associated with not_rank_up, which is supposed to be a boolean, seems to have been implicitly converted to an integer (0). Other than looking somewhat worse, this causes parsers to load it as an integer and not as a boolean.

Something similar happens for floats that also happen to be integers; 24.0 is emitted as 24, which causes the value to be loaded as an integer and not as a float.

I've tried overloading the c4::yml::write function for bool, but that doesn't seem to work as it's not called. More generally, do you have any ideas or suggestions for getting output that matches the existing document?

@biojppm
Copy link
Owner

biojppm commented Jan 26, 2020

(I hope you'll forgive me for opening three issues in a row...)

Actually, it's the other way around -- thank you for submitting; that let's me know what needs to improve!

And as you may surmise, the emitter code is a bit dumb, so it's no wonder you're facing these problems (like #28 #29 or #31). So far I've focused more on parsing, but of course emission is also something that matters, and the bugs you reported help me realize that the current approach is not as good as it should be.

In particular, -1 (an integer) gets turned into '-1', which would cause any such parser to load the value as a string and not as an integer.

Ha, that's a bug indeed. Need to fix.

The value that is associated with not_rank_up, which is supposed to be a boolean, seems to have been implicitly converted to an integer (0). Other than looking somewhat worse, this causes parsers to load it as an integer and not as a boolean.
Something similar happens for floats that also happen to be integers; 24.0 is emitted as 24, which causes the value to be loaded as an integer and not as a float.

How did you create the tree which was emitted? There are two cases here:

  • you used ryml to parse YAML to create the tree, then you used ryml to emit the code above. This case will not cause any string conversions, because the parsed YAML values are stored as strings in the tree. So I don't think this was what you did.
  • you built the tree directly from your types, using ryml's serialization functions. This is likely to use NodeRef::operator<<(T const&), which calls Tree::to_arena() which in turn calls c4::to_chars() (defined in this header). This is the overload set that turns your values into strings. And indeed, the bool overload will turn true/false into "1"/"0". Also, the float overload will default to the %g printf format which will result in "24" rather than "24.0". These strings then end up in the ryml tree, and the emitter prints them verbatim when doing its stuff.

The right way for me to fix this is to refine the Tree::to_arena so that these common types are correctly treated. I will eventually do this, but until then, know that you can tweak the serialization or do it yourself, and then use the correctly formatted string to add to the tree.

If you want to use ryml/c4's facilities for this, there are some facilities to help you, but be warned they're still a little rough on the edges.

In particular, take a look at the real formatting utilities. I think (sorry, can't confirm now) that it's something like this:

float val = 24.0f;
node << val; // BAD, result: "24"

int precision = 2; // print floats with two digits.
node << c4::fmt::real(val, precision); // OK, result: "24.00"

As an alternative, you can do a direct call to the ftoa/dtoa functions. Actually if you look at the code, c4::fmt::real is just syntactic sugar for calling ftoa.

As for the bool problem, you can work around it with something like this:

bool good = true;
node << good; // BAD, result: "1"
node << (good ? "true" : "false"); // OK! result: "true"

Again, these are interim solutions that you can use to tweak your code so that it works with the current ryml version. Let me know if they work for you, or if you have trouble finding how to adopt these.

And of course, this bug will stay open until this is fixed/improved in ryml.

@biojppm
Copy link
Owner

biojppm commented Jan 27, 2020

@leoetlino Is the temp solution working for you?

@leoetlino
Copy link
Contributor Author

Whoops sorry, I didn't realise my comment didn't get sent.

Generating the string representations manually does work, and since it's what I've already been doing for libyaml anyway it's barely a workaround for me :)

@biojppm
Copy link
Owner

biojppm commented Jan 27, 2020

Since you mention you're formatting for libyaml, I guess you are using some other formatter. Just for me to get an idea, what are you using?

Depending on how much data you are formatting, if you are using eg std::stringstream and the like, then using the c4 formatters will likely net you a considerable runtime gain, as formatting is done on the final destination, so on each formatting you save an allocation and/or a temporary and one or two string copies as compared to a std::stringstream-type approach. Of course, it all depends on how much data you are formatting.

So that's why I am curious. I have a pretty good idea of my needs, but I'd love to get some views into the needs of people using this library (or considering use of this library).

@leoetlino
Copy link
Contributor Author

leoetlino commented Jan 27, 2020

Yeah, my current method of formatting does result in at least one extra allocation per formatting, since I'm using std::string (with std::to_string and abseil's formatting utils). However, the cost is probably dwarfed by the inefficiency of libyaml and its numerous allocations ;)

The use case is converting a Nintendo binary YAML format to and from regular YAML. Here's a sample document: https://raw.githubusercontent.com/zeldamods/byml-v2/master/test_data/ActorInfo.product.yml (one of the largest documents -- which takes 2.8s to parse with pyyaml and 0.15s with ryml; including type conversions in both cases).

I'll probably move to using this library for emitting as well when it becomes possible to control the output style, since currently libyaml's performance is acceptable and lets me emit YAML that looks almost identical to the already existing ones.

@biojppm
Copy link
Owner

biojppm commented Feb 2, 2020

The real (and only) bug in this issue is that numbers get quoted when emitting. As for formatting, that is something which anyway ryml already provides. I did a fix for the emission of numbers, and will close this in that commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants