Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does JSONPath guarantee an ordered list for multiple responses? #3

Closed
cburgmer opened this issue Jun 25, 2019 · 11 comments
Closed

Does JSONPath guarantee an ordered list for multiple responses? #3

cburgmer opened this issue Jun 25, 2019 · 11 comments

Comments

@cburgmer
Copy link
Owner

Not all implementations seem to return results in order of occurrence (depth-first) for recursive descending queries in the JSON structure, e.g. Java (com.github.jsurfer) for https://cburgmer.github.io/json-path-comparison/results/recursive_key.html and https://cburgmer.github.io/json-path-comparison/results/recursive_wildcard.html).

Is this a bug or just not guaranteed?

@remorhaz
Copy link
Collaborator

remorhaz commented Aug 29, 2019

Another interesting question is behaviour of slice with negative step. In Python negative step reverses iteration, like this:

>>> x = [1, 2, 3, 4, 5]
>>> x[::-1]
[5, 4, 3, 2, 1]

I decided that my JsonPath implementation should always return results in depth-first mode, so it will return [1, 2, 3, 4, 5] for $[::-1]. So, there are even more options:

  1. Results order is not deterministic.
  2. Results order is undefined but deterministic (several runs on same data give same results order).
  3. Results order is always depth-first with no exclusions.
  4. Results order is always depth-first excluding slice with negative step.
  5. Results order is always breadth-first with no exclusions.
  6. Results order is always breadth-first excluding slice with negative step.

@cburgmer
Copy link
Owner Author

(Thanks for the negative step idea, I added three more queries, e.g. https://cburgmer.github.io/json-path-comparison/results/array_index_slice_negative_step.html)

@cburgmer
Copy link
Owner Author

We should probably review the need for https://github.com/cburgmer/json-path-comparison/blob/master/src/canonical_json.py which re-orders results before comparison.

@remorhaz
Copy link
Collaborator

I thought alot about this case recently and concluded that we should split results comparation into several levels:

  1. Selected nodes (ignoring order).
  2. The order of selection (matched with some of the possible strategies I mentioned above).

Thus we can have several different consensuses for the same result set:

  • Selection consensus.
  • Result order consensus.

I don't see any reliable way to detect non-deterministic order (maybe just running the tests repeatedly, but that's time consuming and doesn't give any garantee), so I suggest just set it as a flag for those libraries that showed non-deterministic order at least once.

We can use similar approach with errors, configuring result post-processing for each library to detect following states:

  • Can be an error (false, null and so on for those libraries where it cannot be reliably differed from corresponding JSON node value).
  • Is an error (100% not a node value).
  • Is a query parse error
  • Is not a query parse error

In this case we can calculate error consensus more precisely, irrespective of implementation details.

@cburgmer
Copy link
Owner Author

I like your thinking. For the error status that will help us navigate around the different levels of expressiveness.
I'm hoping that we can still keep it simple for the ordering issue, at least for now.

@cburgmer
Copy link
Owner Author

Just to quote ECMA-404 "The JSON Data Interchange Syntax" here:

The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

@remorhaz
Copy link
Collaborator

remorhaz commented Sep 22, 2019

I think the next steps we should do next are:

  • Stop doing "regression" in a blocking way (if you didn't yet), because in fact it doesn't show any regression. Instead of it I suggest calculating a ratio of "successful" (passed+open) tests to total tests amount (maybe available as a badge). We will think about more convenient way for library owners to get notifications later.
  • Ignore order at all for all implementations - we'll return to this problem later. This should eliminate lack of consencus when the only difference is results order.
  • We should start consider errors as a valid result, because errors can form a consensus, too. In final table we can just show e link together with ✓/✗/?.

@cburgmer
Copy link
Owner Author

Happy to improve! Would you mind opening new issues for those points other than result ordering?

As for the ordering:
How do you feel about flagging just the problematic queries as WITHOUT_ORDER or something similar and re-order those results for a canonical representation?

@remorhaz
Copy link
Collaborator

I don't think we can detect "problematic" queries. "Problematic" are the implementations, and later we will find the way to detect approaches for different libraries. But it should be quite separate investigation.

@cburgmer
Copy link
Owner Author

Agree. We do know the queries that are disputed though: anything with recursive descent ($..*) and star operator on objects ($[*]). Did I miss anything?

cburgmer added a commit that referenced this issue Oct 3, 2019
This offers a temporary solution for #3, until we have a better understanding on what ordering can be guaranteed. See #15
Fixes #12.
@cburgmer
Copy link
Owner Author

I would close this issue for now and track pending open questions separately:

  1. Should implementations uniformly support a certain ordering? That's a questions towards a standard. That can be discussed with all other current disagreements.
  2. Do we want to flag the status of ordering in the table? I feel that's a hard one and would ignore it for now. Happy for ideas and code though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants