Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(yaml) Possible performance regression in snakeyaml 1.18 #67

Closed
tlrx opened this issue Feb 2, 2018 · 16 comments
Closed

(yaml) Possible performance regression in snakeyaml 1.18 #67

tlrx opened this issue Feb 2, 2018 · 16 comments
Labels
yaml Issue related to YAML format backend

Comments

@tlrx
Copy link

tlrx commented Feb 2, 2018

Hi,

I spotted a performance regression in snakeyaml introduced in version 1.18+. This issue has been reported on the snakeyaml mailing list here and seems to affect the parsing of large text values.

It affects jackson-dataformats-text starting version 2.9.1 as it depends on snakeyaml 1.18. Version 1.19 is also problematic but version 1.17 is ok.

I understand that this is not a jackson-dataformats-text issue but I'd like 1) to raise awareness on this regression and 2) to ensure that jackson-dataformats-text dependency on snakeyaml will be updated to a newer version that fixes the regression (if such version is released)

@cowtowncoder
Copy link
Member

Ok. Thank you for reporting this. I don't remember if 1.18 was specifically needed for a bug fix or not. If not, could gracefully downgrade until 1.20 (or whatever with fix) comes along.

It is too bad SnakeYAML has severe performance issues in general (it's literally 10-20x slower than, say, json parsing, or 10x xml); otherwise it works very well.

@chrisco484
Copy link

"10x [slower than] xml [parsing]?"

With XML being so verbose it's hard to imagine anything being slower than XML parsing. That's amazing.

@cowtowncoder
Copy link
Member

@chrisco484 there is no fundamental reason it should be, since at low level textual codecs really should converge to similar speeds relative to raw size (that is, more verbose formats would be bit slower just since there's more text to decode). On the other hand, XML parsers have been rather aggressively optimized so parsers like Aalto get pretty decent throughput, in 10s of megabytes per second per core. It would be nice if YAML part could be improved.

@chrisco484
Copy link

chrisco484 commented Apr 5, 2018 via email

@cowtowncoder
Copy link
Member

Right. And one problem with YAML is that the rules for decoding are rather complicated (specification is huge, and with all the variations to support wiki-style, or not, it gets hairy quick), which makes it much more work to optimize.
But I don't think that's where most of the time is spent right now: I think it's just SnakeYAML using quite complex processing pipeline.

As long as YAML is mostly used for configuration handling this isn't much of a problem.
But it could be a problem for anyone using YAML for data interchange (request/response).

@chrisco484
Copy link

chrisco484 commented Apr 5, 2018 via email

@cowtowncoder cowtowncoder added the yaml Issue related to YAML format backend label Apr 11, 2018
@cowtowncoder
Copy link
Member

cowtowncoder commented Apr 12, 2018

@tlrx Looks like 1.20 is out. Not sure if that differs from 1.19 in this aspect; based on discussion it sounds like probably not?

@yborovikov
Copy link

yborovikov commented Apr 25, 2018

fyi: snakeyaml 1.20 introduced a breaking (for jackson 2.9.5) change in this commit https://bitbucket.org/asomov/snakeyaml/commits/6cb855839ebc1f3796cff26aa6e5423f1395b975.

java.lang.NoSuchMethodError: org.yaml.snakeyaml.events.MappingStartEvent.<init>(Ljava/lang/String;Ljava/lang/String;ZLorg/yaml/snakeyaml/error/Mark;Lorg/yaml/snakeyaml/error/Mark;Ljava/lang/Boolean;)V

    at com.fasterxml.jackson.dataformat.yaml.YAMLGenerator.writeStartObject(YAMLGenerator.java:489)
    at com.fasterxml.jackson.core.base.GeneratorBase.writeStartObject(GeneratorBase.java:286)
    at com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:626)
    at com.fasterxml.jackson.databind.ser.std.MapSerializer.serialize(MapSerializer.java:33)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider._serialize(DefaultSerializerProvider.java:480)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:319)
    at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3893)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3207)
    ...

cc: @asomov

@40ft
Copy link

40ft commented May 24, 2018

I just ran into this performance problem (parsing a 30KB string on Android) and spent a couple of hours looking at the snakeyaml code, before finding my way here.

The main problem in 1.18 is a O(n-squared) loop in the Reader.peek() function, which has indeed been completely rewritten and fixed in later versions. Forcing snakeyaml to 1.21 (with jackson 2.9.5) in my build gives acceptable parsing performance on Android. It does break serialization though, as mentioned above.

Generally speaking, there is a lot of seemingly unnecessary string buffering/duplication/copying going on in the snakeyaml parsing code (optimisation is hard!), so I am not at all surprised if performance is bad.

@asomov
Copy link
Contributor

asomov commented May 24, 2018

@chrisco484 : the YAML 1.2 spec is very big. Chomping, flow styles, scalar styles (especially the folded style), implicit types, anchors/aliases/recursive structures, complex keys are not present in XML and they are not often used. Nevertheless the parser must always take care of all these details.

@asomov
Copy link
Contributor

asomov commented May 24, 2018

@yborovikov: can I help to adapt jackson 2.9.5 to work with SnakeYAML 1.21 ?

@cowtowncoder
Copy link
Member

Quick note: next patch version would be 2.9.6, but I have decided that there probably should be minor upgrade of 2.10 for Jackson, before bigger changes in 3.0.

I have no objections to changes in 2.9(.6) that allow use of newer version (and dependency to newer version), as long as change would still work with older version(s) that Jackson 2.9 has relied on.
This because users sometimes have slightly inconsistent set of dependencies, and I don't want to add risk in patch versions.

For 2.10, new version should definitely be used, so above caveat is only wrt 2.9.x patches.

@yborovikov
Copy link

@asomov Theoretically, reinstating the ScalarEvent(String anchor, String tag, ImplicitTuple implicit, String value, Mark startMark, Mark endMark, Character style) constructor (and marking it as @Deprecated) should restore snakeyaml's compatibility with jackson's YAMLGenerator.

This way Jackson 2.9.x would be able to use both older and newer versions of snakeyaml (with the exception of 1.20 and 1.21).

@asomov
Copy link
Contributor

asomov commented May 29, 2018

I am confused now. I thought that this issue is all about migrating to SnakeYAML 1.21

@yborovikov
Copy link

@asomov Well, migrating to 1.21 is fairly simple and is covered by this PR: https://github.com/FasterXML/jackson-dataformats-text/pull/82/files

However, it would prevent consumers from using SnakeYAML 1.19 (and older - which might be desirable for some people due to the alleged performance regression) with the newer Jackson 2.9.x.

There's an option of dynamically finding the right constructor at runtime (to maintain compatibility with older SnakeYAML versions as @cowtowncoder requested above), yet, arguably, it's a bit ugly.

So, simply applying #82 and releasing it as a part of Jackson 2.10 might be what the maintainers prefer. Ultimately, I'm confused now, too :)

@cowtowncoder
Copy link
Member

SnakeYAML version was upgraded, as per #81, so I think this is resolved now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
yaml Issue related to YAML format backend
Projects
None yet
Development

No branches or pull requests

6 participants