The incoming YAML document exceeds the limit: 3145728 code points. #94

Closed
kwladyka opened this issue Mar 4, 2023 · 10 comments · Fixed by #103

@kwladyka

kwladyka commented Mar 4, 2023

When reading a large YAML file, I get:

The incoming YAML document exceeds the limit: 3145728 code points.

The code_point_limit needs to be overridden, but I didn’t find a way to do this with clj-yaml.
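For scale, the number in the error message is the limit SnakeYAML applies when nothing is configured. It counts code points (characters), not bytes, so for plain ASCII input it corresponds to 3 MiB:

```latex
3145728 = 3 \times 1024 \times 1024 = 3 \times 2^{20}
```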

How do you read large YAML files?

From Slack #clj-yaml:

There isn't currently an option, but you can call make-loader followed by (.setCodePointLimit ...) on it.
But I don't see where you can then pass that loader. That should probably be added as well, along with an explicit option for the :code-point-limit.
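Until such an option exists, one possible workaround is to bypass clj-yaml and call SnakeYAML directly through Java interop. The sketch below is an assumption-laden illustration, not clj-yaml's API: it uses SnakeYAML's own LoaderOptions.setCodePointLimit and the Yaml(LoaderOptions) constructor instead of clj-yaml's make-loader, assumes a SnakeYAML version that already has the code point limit (i.e. one that raises this error), and load-all-with-limit is a hypothetical helper name. Unlike clj-yaml, it returns raw Java collections (LinkedHashMap, ArrayList, ...) rather than Clojure data.

```clojure
(ns example.large-yaml
  (:require [clojure.java.io :as io])
  (:import (java.io Reader)
           (org.yaml.snakeyaml LoaderOptions Yaml)))

(defn load-all-with-limit
  "Hypothetical helper: lazily loads every YAML document from `rdr` with
  plain SnakeYAML, raising the code point limit to `limit`.
  Returns raw Java objects, not clj-yaml's Clojure data structures."
  [^Reader rdr limit]
  (let [opts (doto (LoaderOptions.)
               ;; without this call the limit is 3145728 code points
               (.setCodePointLimit (int limit)))
        yaml (Yaml. ^LoaderOptions opts)]
    ;; loadAll parses documents on demand, so `rdr` must stay open
    ;; until the returned seq has been fully consumed
    (seq (.loadAll yaml rdr))))

(comment
  ;; Force the result inside with-open so the reader is still open.
  (with-open [rdr (io/reader "a-file.yaml")]
    (doall (load-all-with-limit rdr (* 10 1024 1024)))))
```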

@PetrGlad

PetrGlad commented Mar 6, 2023

I have a large file, loaded as:

(require '[clj-yaml.core :as yaml] '[clojure.java.io :as javaio])
(def all-of-it (yaml/parse-stream (javaio/reader "a-file.yaml") :load-all true))

Curiously, when I evaluate these in a REPL, (map identity all-of-it) raises the exception, but (take 1000000 all-of-it) does not (1000000 is larger than the number of documents in the input file).

@borkdude
Collaborator

borkdude commented Mar 6, 2023

@PetrGlad Can you wrap that take in a doall?

@PetrGlad

PetrGlad commented Mar 6, 2023

Sorry, it looks like the reproduction requires attempting an operation on the sequence first: that first attempt fails, and subsequent operations succeed. For example:

(def all-of-it (yaml/parse-stream (javaio/reader "a-file.yaml") :load-all true))
(doall (map identity all-of-it)) ; <-- FAILS
(doall (map identity all-of-it)) ; <-- OK

It does not seem to matter which operation is tried first. These are evaluated in the REPL, which realizes the result in order to print it, so I think doall should not change the behavior.

@borkdude
Collaborator

borkdude commented Mar 6, 2023

If anyone wants to do a PR, we're open to that. It should be relatively straightforward to add:

  • an option for the size limit
  • an option to work with a pre-defined loader
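For the size option, a call site might look roughly like the sketch below. It assumes the option ends up named :code-point-limit, as suggested in the Slack discussion above, so check the README of the clj-yaml version you use; it does not cover the pre-defined loader, whose shape is not specified here. parse-all-documents is a hypothetical helper name.

```clojure
(ns example.parse-large
  (:require [clj-yaml.core :as yaml]
            [clojure.java.io :as io]))

(defn parse-all-documents
  "Hypothetical helper: parses every YAML document in `source` (anything
  clojure.java.io/reader accepts) with a raised code point limit.
  ASSUMPTION: this clj-yaml version supports the :code-point-limit option."
  [source limit]
  (with-open [rdr (io/reader source)]
    ;; doall realizes the whole result while the reader is still open
    (doall (yaml/parse-stream rdr :load-all true :code-point-limit limit))))

(comment
  ;; 10 * 1024 * 1024 code points, well above the 3145728 default
  (parse-all-documents "a-file.yaml" (* 10 1024 1024)))
```

Forcing the result inside with-open matters here: if the documents were consumed lazily after the reader closed, parsing would fail partway through.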

@PetrGlad

Just wanted to note that the actual problem (in my case) is in SnakeYAML. I have already reported it.
SnakeYAML has enforced an input size limit, but it actually limits the size of the whole input stream, whereas it only makes sense to limit the size of each document. This makes a difference, for example, when the input stream contains many small documents.
Making the limit configurable would be a workaround, nonetheless.

@lread
Collaborator

lread commented Apr 24, 2023

Thanks for following up @PetrGlad!

> Just wanted to note that the actual problem (in my case) is in the snakeyaml. I have already reported that.

Was it this issue here?

> Snakeyaml have enforced the input size limit, but it actually limits the whole input stream size, while it only makes sense to limit document size instead. For example this makes difference when input stream contains many small documents. Making the limit configurable would be a workaround, nonetheless.

Is there a separate SnakeYAML issue to address this too?

@PetrGlad

Yes, that was the change.
I sent a message to Google Groups because other services were locked down due to attacks.
They acknowledged that it is likely a problem, but I do not know whether a ticket was created (here).

@lread
Collaborator

lread commented Apr 25, 2023

@PetrGlad, I don't see a SnakeYAML issue created for that either.
I think SnakeYAML issues on Bitbucket might still be a bit wonky. We can see them now, but maybe not create new issues yet.
A friendly reply/reminder on your thread in the SnakeYAML mailing list would probably be helpful to Andrey.

@lread
Collaborator

lread commented May 3, 2023

I pinged Andrey and he responded:

It was fixed without the ticket.
Feel free to create one - we can check how it works (it should be pre-moderated now)

https://bitbucket.org/snakeyaml/snakeyaml/wiki/Changes

Andrey

@lread
Collaborator

lread commented May 9, 2023

@PetrGlad, FYI: because Andrey asked me to, in the spirit of being a good citizen, I went ahead and created a SnakeYAML ticket with a repro.

Projects
None yet

4 participants