While scanning a literal block scalar, found extra spaces in first line. #523

Open
LaughingJohn opened this issue Aug 24, 2020 · 11 comments

@LaughingJohn

Hi,

I have some YAML files from a 3rd party which I'm reading and converting to JSON to make it easier to process. For a few files on the deserialize step it is failing with "While scanning a literal block scalar, found extra spaces in first line".

Trace:

at YamlDotNet.Core.Scanner.ScanBlockScalarBreaks(Int32 currentIndent, StringBuilder breaks, Boolean isLiteral, Mark& end, Nullable`1& isFirstLine)
at YamlDotNet.Core.Scanner.ScanBlockScalar(Boolean isLiteral)
at YamlDotNet.Core.Scanner.FetchBlockScalar(Boolean isLiteral)

Code:

using YamlDotNet.Serialization;

var yamlDeserializer = new DeserializerBuilder().Build();
var yamlObject = yamlDeserializer.Deserialize(sr); // FAILS HERE

Looking at the data it appears that the problem is the files have a multi-line literal with additional carriage returns at the beginning. I'm new to YAML, but I'm wondering why it isn't considered reasonable to have more than one CRLF or indeed more than one space at the start of some literal text? Apart from editing the files is there any way around this?

Example literal with extra CRLF at the start:

    Body: |+
      
      Begin forwarded message:

I presume it is to do with trying to establish the indentation - are the files invalid or should the scanner be reading until it finds a non-blank line?

Thanks.

@wspresto

I think this issue warrants a response. We are grappling with the same problem manifesting in a different way. Whenever we have empty lines (auto-indented by text editors), our literal style is silently forced into double-quoted style. I found where this happens in the code: line 911 of Emitter.cs.

Rather than throwing an error, the Emitter quietly changes the style to double-quoted if it decides the scalar in question does not qualify as a block literal. Suggestion: allow forced styles, or throw an error if the requested style is not allowed.

@aaubry added the bug label Oct 20, 2020
@aaubry
Owner

aaubry commented Oct 20, 2020

Of course a response is warranted. This is certainly a bug, but I haven't had a chance to look into it. Do you want to help, @wspresto?

@LaughingJohn
Author

LaughingJohn commented Oct 27, 2020

Thanks for the response @aaubry. I did start to look at the code, but unfortunately, like you, I don't really have the time to dedicate either. In the end I used some regex to fix up most of the files and did a few manually so that the parser could read them (this lost the first empty line, but in this case it doesn't really matter). I was doing a data migration, so it's a one-off exercise for me (hopefully). We've never been given YAML files as a data extract before, and having seen YAML, and in particular the quality of the files we were given (the format was weird even for YAML), I hope to never see them again ;)
Thank you for this library though, it definitely got me through and would have been impossible without it!
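
A regex clean-up of the kind mentioned above might look like the following. This is a hedged sketch in Python, not the code that was actually used for the migration (that code isn't shown in this thread). The function name `drop_leading_blank_lines` and the header pattern are assumptions for illustration. The pattern only recognises simple `key: |` / `key: >` headers with optional chomping and indentation indicators, and it can misfire on text inside another block scalar that happens to look like such a header.

```python
import re

# A "key: |" or "key: >" header line (optionally with chomping/indentation
# indicators such as "+", "-" or "4"), followed by one or more lines that
# contain nothing but spaces or tabs.
_HEADER_THEN_BLANKS = re.compile(
    r"^(?P<header>[^\n#]*:[ \t]*[|>][-+1-9]{0,2}[ \t]*\r?\n)"
    r"(?:[ \t]*\r?\n)+",
    re.MULTILINE,
)


def drop_leading_blank_lines(yaml_text: str) -> str:
    """Remove whitespace-only lines that directly follow a block scalar header.

    Note: this changes the scalar's value, because the leading empty
    line(s) are lost, as described in the comment above.
    """
    return _HEADER_THEN_BLANKS.sub(lambda m: m.group("header"), yaml_text)


if __name__ == "__main__":
    sample = "Body: |+\n      \n      Begin forwarded message:\n"
    print(drop_leading_blank_lines(sample))
```

The text would be passed through this step before being handed to the parser. Because the leading empty lines are dropped rather than preserved, it only suits cases where that loss is acceptable.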

@uanvas

uanvas commented May 11, 2023

Hello @EdwardCooke and @JuergenGutsch,

I've encountered the same issue described here when trying to parse a valid YAML document containing an empty line after a block scalar indicator (|-). This has caused some trouble in my application, as it relies on parsing YAML files that might include this particular case.

Given that this issue has been open for some time, I wanted to kindly ask if there is any update or progress on addressing it? This problem significantly impacts the usability of the YamlDotNet library for certain use cases, and it would be great to have a resolution in the near future.

Thank you for your attention and for your work on this library. I appreciate your efforts to make YamlDotNet a reliable and robust tool for the community.

@JuergenGutsch
Collaborator

@uanvas Let me have a look during the weekend.

@JuergenGutsch
Collaborator

@uanvas Looking at the YAML specification, it seems it is not valid to start a literal block scalar with an empty line. A scalar followed by an empty line is an empty scalar, and if the line after the leading line break has a child indentation, that seems to be wrong. Actually, the specification doesn't mention leading empty lines, and I would have liked it to spell out more of the error cases.

This means the specific files seem to be invalid.

@JuergenGutsch
Collaborator

If it helps, quoting the content will make the leading empty line valid.

@perlpunk

The example by @LaughingJohn is valid.
See also one of the spec examples: https://matrix.yaml.info/details/4QFQ.html
However, it is invalid for the first, "empty" line to contain more spaces than the indentation of the content that follows, as in this test case: https://matrix.yaml.info/details/5LLU.html
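
For reference, the rule described above can be sketched in a few lines. This is a minimal illustration in Python, not YamlDotNet's scanner code. The function name `detect_block_indent` is hypothetical. It only looks at the lines that follow the block scalar header, it ignores tabs, and it does not check the result against the parent node's indentation.

```python
from typing import List, Optional


def detect_block_indent(body_lines: List[str]) -> Optional[int]:
    """Auto-detect the content indentation of a block scalar body.

    The indentation is taken from the first non-empty line. Leading
    lines that contain only spaces are allowed, but none of them may
    contain more spaces than that first non-empty line.
    """
    widest_blank = 0
    for line in body_lines:
        rest = line.lstrip(" ")
        width = len(line) - len(rest)
        if rest == "":
            # Whitespace-only line: remember how wide it was and keep looking.
            widest_blank = max(widest_blank, width)
            continue
        if widest_blank > width:
            raise ValueError(
                "leading empty line has more spaces than the first non-empty line"
            )
        return width
    return None  # no non-empty line found, so nothing to detect


if __name__ == "__main__":
    # The shape of the example from the original report: accepted.
    print(detect_block_indent(["      ", "      Begin forwarded message:"]))
```

Under this reading, a leading whitespace-only line is fine as long as it is no wider than the content indentation, which matches the two linked test cases.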

@uanvas

uanvas commented May 24, 2023

@JuergenGutsch, thank you for checking the issue. I utilized common validators like https://www.yamllint.com/ to validate the YAML, which led me to believe that it is valid.

However, when it comes to quoting the content, I'm unable to do so since I am validating the provided YAML.

Body: |-
      
      Begin forwarded message:

@gruenedd

gruenedd commented Jul 19, 2023

I ran into the same issue. Isn't the following valid YAML?

Body: |-4
        <-- space until here (8 spaces in total)
    Foo

Should result in " Foo" because with an explicit indentation of 4 the string is clearly defined.
Online YAML-to-JSON converters convert this to the expected string value.

Note that adding a single tab in the string, say " \t\n Foo" works perfectly, so this feels like a bug.

Edit:
js-yaml also parses the given YAML without any error.
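
To make the explicit-indentation case concrete, here is a small sketch in Python of how a literal block body could be read once the content indentation is known. It is an illustration only, not how YamlDotNet or js-yaml implement it. The function name `read_literal_block_strip` is hypothetical, the caller is assumed to have already isolated the body lines, and only strip chomping (`-`) is modelled.

```python
from typing import List


def read_literal_block_strip(body_lines: List[str], content_indent: int) -> str:
    """Read a literal block scalar body with a known content indentation.

    Whitespace-only lines no wider than the indentation count as empty
    lines. Every other line must start with `content_indent` spaces,
    which are removed; any further spaces are kept as content.
    Trailing line breaks are stripped, as with the "-" indicator.
    """
    prefix = " " * content_indent
    out = []
    for line in body_lines:
        if line.strip(" ") == "" and len(line) <= content_indent:
            out.append("")
        elif line.startswith(prefix):
            out.append(line[content_indent:])
        else:
            raise ValueError("line is less indented than the block content")
    return ("\n".join(out) + "\n").rstrip("\n")


if __name__ == "__main__":
    # Indentation 4: a line of 8 spaces, then "Foo" indented by 4 spaces.
    print(repr(read_literal_block_strip(["        ", "    Foo"], 4)))
    # prints '    \nFoo'
```

With the indentation fixed at 4, the 8-space line is not an "empty" line at all. It is a content line whose text is the four spaces left over, so there is nothing ambiguous for the scanner to detect.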

@EdwardCooke
Collaborator

I’ll have to check, but I’m pretty sure there’s nothing in the YamlDotNet parser that will handle that.
