Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json.decode is too strict; fails on duplicate keys #15605

Closed
alexeagle opened this issue Jun 2, 2022 · 12 comments
Closed

json.decode is too strict; fails on duplicate keys #15605

alexeagle opened this issue Jun 2, 2022 · 12 comments
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel type: bug

Comments

@alexeagle
Copy link
Contributor

alexeagle commented Jun 2, 2022

Description of the bug:

A couple of packages are published to npm with a JSON descriptor which is valid in every context it's used.

However json.decode fails on this part of the file:

  "typeCoverage": {
    "atLeast": 100,
    "detail": true,
    "strict": true
  },
  "typeCoverage": {
    "atLeast": 100,
    "detail": true,
    "strict": true,
    "ignoreCatch": true
  }

More context here:
aspect-build/rules_js#148

I agree with the library author that it's really a Bazel bug, which should follow the https://en.wikipedia.org/wiki/Robustness_principle of "be permissive in what you accept". Author points this out:
wooorm/decode-named-character-reference#2
wooorm/stringify-entities#11

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

json.decode("""
{
  "typeCoverage": {
    "atLeast": 100,
    "detail": true,
    "strict": true
  },
  "typeCoverage": {
    "atLeast": 100,
    "detail": true,
    "strict": true,
    "ignoreCatch": true
  }
}
""")

What is the output of bazel info release?

5.x

@emidln-imc
Copy link

What is the most useful and still rational behavior here? You could plausibly argue four approaches:

  1. keep the value of the first key seen
  2. keep the value of the last key seen
  3. merge the values of duplicates if the value is an object
  4. error

@alexeagle
Copy link
Contributor Author

alexeagle commented Jun 7, 2022

experiment:

$ node -e 'console.log(JSON.parse("{\"a\": 1, \"a\": 2}"))'
{ a: 2 }

So the answer I'd most like coming from a NodeJS ecosystem is to take your option 2.

I dunno if that's formally documented in the nodejs spec.

@brandjon
Copy link
Member

Taking the last entry seems preferable over taking the first, since it's more consistent with dict.update, the dict() constructor, etc. But it's not clear whether that's better or worse than failing. We could also make it an option, i.e. a named arg to decode, but I'm not sure I want to complicate that API.

@brandjon brandjon added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) and removed untriaged labels Oct 26, 2022
@alexeagle
Copy link
Contributor Author

ping! @brandjon will you take PRs for this?

@aherrmann
Copy link
Contributor

ECMA-262 says under 25.5.1.1

Note 2 In the case where there are duplicate name Strings within an object, lexically preceding values for the same key shall be overwritten.

This aligns with option 2.

@alexeagle
Copy link
Contributor Author

ping! @brandjon would you accept a PR? Another user on Slack tripped over this.

@tetromino
Copy link
Contributor

Different JSON parsers diverge on this point: https://stackoverflow.com/questions/21832701/does-json-syntax-allow-duplicate-keys-in-an-object

Javascript, and anything that tries to be compatible with Javascript, of course allows duplicate keys and keeps the last value.

C and C++ based JSON parsers typically don't allow duplicate keys.

I have no strong opinion on which way to go, but given the need to integrate with npm, we may as well follow Javascript's behavior.

@alexeagle
Copy link
Contributor Author

I think the critical observation is that Bazel is meant to be the orchestration for a bunch of third-party package managers and tools, so it should follow the https://en.wikipedia.org/wiki/Robustness_principle
Meaning, even if parsers typically reject some input, Bazel's job is to "be permissive in what it accepts" to maximize compatibility.
Same argument I made for c9e2be5

@brandjon
Copy link
Member

brandjon commented Mar 6, 2023

Rationale for accepting the change:

  • The JSON RFC apparently leaves this unspecified, so we're not prohibited from doing it.
  • JavaScript will allow it, with the unsurprising (and consistent with dict) semantics that last key wins.
  • It supports this Node-related use case.

(The "Robustness principle" can be problematic -- think HTML. But we're maintaining a parser, not writing a new standard.)

@LavaToaster
Copy link
Contributor

@brandjon @tetromino I don't suppose you guys know which release this will hit?

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Mar 23, 2023
@fmeum fmeum removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Mar 23, 2023
@fmeum
Copy link
Collaborator

fmeum commented Mar 23, 2023

Given that this doesn't change the behavior on inputs that didn't hard fail before this change, I would propose to cherry-pick it into 6.2.0. Otherwise it would naturally make its way into 7.0.0.

@tetromino
Copy link
Contributor

I am asking for a cherry-pick at #17868

ShreeM01 pushed a commit that referenced this issue Mar 23, 2023
Resolves #15605

Closes #17645.

PiperOrigin-RevId: 514491743
Change-Id: I17ea9fb57682b668bff02bc64fefd75edb2cf2ee

Co-authored-by: Adam Lavin <adam@lavoaster.co.uk>
fweikert pushed a commit to fweikert/bazel that referenced this issue May 25, 2023
Resolves bazelbuild#15605

Closes bazelbuild#17645.

PiperOrigin-RevId: 514491743
Change-Id: I17ea9fb57682b668bff02bc64fefd75edb2cf2ee
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel type: bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants