
In YAML, treat a:b (without space) as text, not key-value pair #15

Closed


sjohannes
Contributor

Currently a:b in YAML is parsed as if it's a key:value pair in a mapping, but instead it should be a text scalar.

Example:

colon:in:key: value

Current styling:

{2}colon{9}:{0}in:key: value

Expected styling:

{2}colon:in:key{9}:{0} value
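The expected styling follows from one rule: in a block mapping, a `:` only separates key from value when it is followed by whitespace or ends the line. Below is a minimal sketch of that rule. It assumes a single line with no quotes, flow collections, or comments. `FindKeySeparator` and `IsSeparatorAfterColon` are hypothetical helpers, not LexYAML's actual code.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: the characters YAML accepts after a mapping ':'
// (space, tab, line feed, carriage return).
constexpr bool IsSeparatorAfterColon(char ch) noexcept {
    return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r';
}

// Hypothetical helper: index of the ':' that ends a block-mapping key,
// or std::string_view::npos when the whole line is a plain scalar.
// Assumes the line has no quotes, flow collections, or comments.
constexpr std::size_t FindKeySeparator(std::string_view line) noexcept {
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] != ':')
            continue;
        // A colon ends the key only if it is the last character on the
        // line or is immediately followed by whitespace.
        if (i + 1 == line.size() || IsSeparatorAfterColon(line[i + 1]))
            return i;
    }
    return std::string_view::npos;
}

// Only the colon before " value" separates key and value.
static_assert(FindKeySeparator("colon:in:key: value") == 12);
// "a:b" contains no separating colon, so the whole line is text.
static_assert(FindKeySeparator("a:b") == std::string_view::npos);
```

Under this rule, `colon:in:key` is styled as one key and only the colon at index 12 gets operator styling, which matches the expected output.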

There are two commits in this merge request. The first commit adds a YAML test directory containing some simple test cases for the sake of having that structure there. The second commit is the actual bugfix, including the test case mentioned above. I haven't added an entry in LexillaHistory because I'm not sure if you prefer to write it yourself, but let me know if I should supply it as well.

@zufuliu
Contributor

zufuliu commented Jun 13, 2021

It's likely this will break (rarely used) JSON-like quoted keys:
http://ben-kiki.org/ypaste/data/3051/index.html

'key':value
"key":value

@sjohannes
Contributor Author

Is that really allowed? I've tried a few parsers (PyYAML, JS-YAML, ruamel.yaml, and yaml (npm)) and all of them fail on "key":value, so I think at least in practice it's ok to ignore that particular syntax.

@sjohannes
Contributor Author

Now that I've looked it up on the spec, I don't think your example is valid YAML.

Mappings use a colon and space (“: ”) to mark each key: value pair.
§2.1. Collections

Note however that in block mappings the value must never be adjacent to the “:”, as this greatly reduces readability and is not required for JSON compatibility (unlike the case in flow mappings).
§8.2.2. Block Mappings

That kind of syntax is only allowed within flow syntax (e.g. {"key":value}), which LexYAML doesn't really support correctly anyway.

@nyamatongwe
Member

Shouldn't the test use an explicit ' ' (space) instead of isspacechar? isspacechar also matches other whitespace (HT, LF, VT, FF, CR):

constexpr bool isspacechar(int ch) noexcept {
    return (ch == ' ') || ((ch >= 0x09) && (ch <= 0x0d));
}
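To see exactly where isspacechar is too broad for this check, the sketch below compares it with a predicate limited to YAML 1.2 white space (SP, HT) and line breaks (LF, CR). `IsYamlWhiteOrBreak` and `CountDisagreements` are hypothetical names used only for this comparison.

```cpp
// As quoted above: true for SP and for HT, LF, VT, FF, CR (0x09-0x0d).
constexpr bool isspacechar(int ch) noexcept {
    return (ch == ' ') || ((ch >= 0x09) && (ch <= 0x0d));
}

// Hypothetical predicate: YAML 1.2 white space and line breaks only.
constexpr bool IsYamlWhiteOrBreak(int ch) noexcept {
    return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r';
}

// Count the ASCII values (0-127) on which the two predicates disagree.
constexpr int CountDisagreements() noexcept {
    int count = 0;
    for (int ch = 0; ch < 128; ++ch)
        if (isspacechar(ch) != IsYamlWhiteOrBreak(ch))
            ++count;
    return count;
}

// They differ only on VT (0x0b) and FF (0x0c).
static_assert(CountDisagreements() == 2);
static_assert(isspacechar('\v') && !IsYamlWhiteOrBreak('\v'));
static_assert(isspacechar('\f') && !IsYamlWhiteOrBreak('\f'));
```

So the only practical difference is that isspacechar would also accept vertical tab and form feed after the colon, which YAML does not treat as separators.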

@zufuliu
Contributor

zufuliu commented Jun 14, 2021

https://yaml.org/spec/1.2/spec.html#Characters
5.4. Line Break Characters and 5.5. White Space Characters
At least space, tab, and line breaks can follow the colon (:).

@sjohannes
Contributor Author

Yeah, it should just be space, tab, newlines, or nothing (end of file). I've modified the patch to check for

IsWhiteSpaceOrEOL(lineBuffer[i + 1]) || i == lengthLine - 1

where IsWhiteSpaceOrEOL checks for SP/HT/LF/CR, and i == lengthLine - 1 checks for EOF.
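The patch's condition can be exercised in isolation. In the sketch below, IsWhiteSpaceOrEOL restates the SP/HT/LF/CR check as described, and `ColonEndsKey` is a hypothetical wrapper; neither is the committed LexYAML source. It assumes lineBuffer is NUL-terminated, so reading lineBuffer[i + 1] is safe even when the colon is the last character of the file.

```cpp
#include <cstddef>

// Restatement of the described predicate: SP, HT, LF, CR.
constexpr bool IsWhiteSpaceOrEOL(char ch) noexcept {
    return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r';
}

// Hypothetical wrapper around the patch's condition. lineBuffer holds one
// line of lengthLine characters and i indexes a ':' within it.
// Assumption: lineBuffer is NUL-terminated, so lineBuffer[i + 1] is
// readable even when i == lengthLine - 1 (colon at end of file).
constexpr bool ColonEndsKey(const char *lineBuffer, std::size_t i,
                            std::size_t lengthLine) noexcept {
    return IsWhiteSpaceOrEOL(lineBuffer[i + 1]) || i == lengthLine - 1;
}

static_assert(ColonEndsKey("key: value", 3, 10)); // followed by a space
static_assert(ColonEndsKey("key:\n", 3, 5));      // followed by a newline
static_assert(ColonEndsKey("key:", 3, 4));        // last character of the file
static_assert(!ColonEndsKey("a:b", 1, 3));        // a:b stays text
```

The i == lengthLine - 1 branch is what covers a final line without a trailing newline, where the character after the colon is the terminator rather than a line break.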

@nyamatongwe
Member

Committed the changes. There were some problems with committing the first patch so the change to .gitattributes was extracted as a separate commit.

@nyamatongwe
Member

Fix included in 5.1.0 release.

@nyamatongwe nyamatongwe added the yaml Caused by the yaml lexer label Aug 5, 2021
@nyamatongwe nyamatongwe mentioned this pull request Jul 18, 2022