Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/yaml: unable to parse number value in YAML with leading zero #2578

Closed
jesselang opened this issue Sep 5, 2023 · 3 comments
Closed
Assignees

Comments

@jesselang
Copy link

What version of CUE are you using (cue version)?

$ cue version
cue version v0.6.0

go version go1.20.6
      -buildmode exe
       -compiler gc
       -trimpath true
     CGO_ENABLED 0
          GOARCH amd64
            GOOS darwin
         GOAMD64 v1

Does this issue reproduce with the latest stable release?

Yes. v0.6.0 appears to be the latest release.

What did you do?

I'm trying to parse a YAML value that contains a number with a leading zero.

exec cue vet a.cue a.yaml

-- a.cue --
account: uint
-- a.yaml --
# number without leading zero parses correctly.
account: 1123456778901
# number with leading zero results is parsing error.
account: 0123456778901

What did you expect to see?

The number is parsed. Perhaps this is challenging because some parsers parse the value as base 8, others might strip the leading zero, etc.

What did you see instead?

> exec cue vet a.cue a.yaml
[stderr]
account: parse error: illegal integer number "0123456778901":
    ./a.yaml:4:11
[exit status 1]
FAIL: /var/folders/yd/6gthgn951hlbr9xmst_p_bb1smsz_1/T/testscript715473880/leading-zero.txtar/script.txtar:1: unexpected command failure
error running leading-zero.txtar in /var/folders/yd/6gthgn951hlbr9xmst_p_bb1smsz_1/T/testscript715473880/leading-zero.txtar
@jesselang jesselang added NeedsInvestigation Triage Requires triage/attention labels Sep 5, 2023
@mvdan
Copy link
Member

mvdan commented Sep 6, 2023

This appears to be a bug in upstream go-yaml; see go-yaml/yaml#157. See @rogpeppe's replies in that thread, in particular - each YAML implementation seems to solve this ambiguity a different way, since 0129 could be interpreted as a valid base-10 integer, an invalid base-8 integer, or a valid unquoted string.

It seems to me like the YAML v1.2 spec is unambiguous in that 0o is the octal prefix, and not simply 0, so perhaps go-yaml should decode 0129 as a base-10 integer. However, its README says:

Octals encode and decode as 0777 per YAML 1.1, rather than 0o777 as specified in YAML 1.2, because most parsers still use the old format. Octals in the 0o777 format are supported though, so new files work.

Either way, this needs to be fixed upstream - then we can pick up the fix.

@mvdan mvdan added NeedsFix encoding and removed NeedsInvestigation Triage Requires triage/attention labels Sep 6, 2023
@myitcv myitcv changed the title Unable to parse number value in YAML with leading zero encoding/yaml: unable to parse number value in YAML with leading zero Sep 6, 2023
@jesselang
Copy link
Author

Thanks @mvdan. I was able to use go-yaml to decode the value as a string, while CUE will not. Here's a reproducer that I'd love your feedback on. Thanks!

go mod tidy
go run .
exec cue vet a.cue a.yaml

-- a.cue --
account: string
-- a.yaml --
# number without leading zero parses correctly.
account: 1123456778901
# number with leading zero results in parsing error.
account: 0123456778901
-- go.mod --
module b

go 1.20

require gopkg.in/yaml.v3 v3.0.1
-- b.yaml --
# number with leading zero loads fine with go-yaml
account: 0123456778901
-- b.go --
package main

import (
  "os"
  "fmt"

  "gopkg.in/yaml.v3"
)

type data struct {
    Account string `yaml:"account"`
}

func main() {
    f, err := os.Open("b.yaml")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    dec := yaml.NewDecoder(f)

    var d data
    if err = dec.Decode(&d); err != nil {
        panic(err)
    }

    fmt.Printf("%s\n", d.Account)
}

Output:

> go mod tidy
[stderr]
go: downloading gopkg.in/yaml.v3 v3.0.1
go: downloading gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405

> go run .
[stdout]
0123456778901

> exec cue vet a.cue a.yaml
[stderr]
account: parse error: illegal integer number "0123456778901":
    ./a.yaml:4:11
[exit status 1]
FAIL: /var/folders/yd/6gthgn951hlbr9xmst_p_bb1smsz_1/T/testscript3557253812/leading-zero.txtar/script.txtar:3: unexpected command failure
error running leading-zero.txtar in /var/folders/yd/6gthgn951hlbr9xmst_p_bb1smsz_1/T/testscript3557253812/leading-zero.txtar

@mvdan
Copy link
Member

mvdan commented May 22, 2024

Thank you @jesselang, indeed it seems like the error comes from CUE and not go-yaml, and we can tweak our YAML decoding logic to not produce an invalid/illegal CUE literal. I'll send a fix shortly.

@mvdan mvdan self-assigned this May 22, 2024
cueckoo pushed a commit that referenced this issue May 22, 2024
YAML 1.1 has octal numbers with the "0" prefix,
whereas YAML 1.2 has octal numbers with the "0o" prefix.
Both are used in the wild, so we must be able to handle both.

Right now, a non-octal number beginning with "0" results in an error,
whereas most YAML decoders, including go-yaml's Unmarshal,
fall back to decoding the value as a string.
We will implement a fix like that in a follow-up commit;
this commit simply adds test cases to show the current behavior.

For #2578.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I46f54f43d7f47c83ec907e421a37b049ad9d3137
cueckoo pushed a commit that referenced this issue May 22, 2024
That is, the input

    not_octal_yaml11: 0123456789

used to give the error

    cannot decode "0123456789" as !!float: illegal integer number "0123456789"

given that YAML would hint that the value was float rather than int
(given that it is not made up of octal digits), and we would try to
decode as the CUE decimal literal 0123456789, which is invalid
since a decimal can only start with "0" when it is the zero number:

    decimal_lit = "0" | ( "1" … "9" ) { [ "_" ] decimal_digit } .

Much like other YAML decoders when they interpret this edge case,
including go-yaml itself, we now decode the input value as a CUE string:

    not_octal_yaml11: "0123456789"

Note that an alternative would be decoding as the valid CUE integer

    not_octal_yaml11: 123456789

and, while this may be OK for some programs using YAML 1.2 only
that meant the value to be a decimal integer, some other programs
may be depending on the string semantics where the leading 0 is kept.

Fixes #2578.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: If21bd469a2f75e07b76020451633627887868338
cueckoo pushed a commit that referenced this issue May 23, 2024
YAML 1.1 has octal numbers with the "0" prefix,
whereas YAML 1.2 has octal numbers with the "0o" prefix.
Both are used in the wild, so we must be able to handle both.

Right now, a non-octal number beginning with "0" results in an error,
whereas most YAML decoders, including go-yaml's Unmarshal,
fall back to decoding the value as a string.
We will implement a fix like that in a follow-up commit;
this commit simply adds test cases to show the current behavior.

Also add cases where YAML 1.1 and 1.2 octal integer literals
are prefixed with an explicit !!float tag, where the user definitely
does not mean the input to be an octal integer.
We don't want any change in our handling of octal integers to change
these cases, even if one of the cases is broken per the added TODO.

For #2578.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I46f54f43d7f47c83ec907e421a37b049ad9d3137
cueckoo pushed a commit that referenced this issue May 23, 2024
That is, the input

    not_octal_yaml11: 0123456789

used to give the error

    cannot decode "0123456789" as !!float: illegal integer number "0123456789"

given that YAML would hint that the value was float rather than int
(given that it is not made up of octal digits), and we would try to
decode as the CUE decimal literal 0123456789, which is invalid
since a decimal can only start with "0" when it is the zero number:

    decimal_lit = "0" | ( "1" … "9" ) { [ "_" ] decimal_digit } .

Much like other YAML decoders when they interpret this edge case,
including go-yaml itself, we now decode the input value as a CUE string:

    not_octal_yaml11: "0123456789"

Note that an alternative would be decoding as the valid CUE integer

    not_octal_yaml11: 123456789

and, while this may be OK for some programs using YAML 1.2 only
that meant the value to be a decimal integer, some other programs
may be depending on the string semantics where the leading 0 is kept.

Fixes #2578.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: If21bd469a2f75e07b76020451633627887868338
cueckoo pushed a commit that referenced this issue May 23, 2024
YAML 1.1 has octal numbers with the "0" prefix,
whereas YAML 1.2 has octal numbers with the "0o" prefix.
Both are used in the wild, so we must be able to handle both.

Right now, a non-octal number beginning with "0" results in an error,
whereas most YAML decoders, including go-yaml's Unmarshal,
fall back to decoding the value as a string.
We will implement a fix like that in a follow-up commit;
this commit simply adds test cases to show the current behavior.

Also add cases where YAML 1.1 and 1.2 octal integer literals
are prefixed with an explicit !!float tag, where the user definitely
does not mean the input to be an octal integer.
We don't want any change in our handling of octal integers to change
these cases, even if one of the cases is broken per the added TODO.

For #2578.

Signed-off-by: Daniel Martí <mvdan@mvdan.cc>
Change-Id: I46f54f43d7f47c83ec907e421a37b049ad9d3137
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1195046
TryBot-Result: CUEcueckoo <cueckoo@cuelang.org>
Reviewed-by: Roger Peppe <rogpeppe@gmail.com>
Unity-Result: CUE porcuepine <cue.porcuepine@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants