Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Text.Json: add ability to do semantic comparisons of JSON values à la JToken.DeepEquals() #33388

Closed
dbc2 opened this issue Mar 9, 2020 · 30 comments · Fixed by #104579
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@dbc2
Copy link

dbc2 commented Mar 9, 2020

API proposal

namespace System.Text.Json;

public partial struct JsonElement
{
    public static bool DeepEquals(JsonElement element1, JsonElement element2);
}

// Existing API
public partial class JsonNode
{
    public static bool DeepEquals(JsonNode? node1, JsonNode? node2);
}

Alternative Designs

The static method approach mirrors the shape used by JsonNode.DeepEquals, however in that case we are forced to use a static because null is a valid representation for the case of JsonNode (representing JSON null). This concern does not exist here so might as well just use an instance method instead.

In terms of implementation, property comparison always uses case sensitive ordinal comparison however we might consider introducing case insensitive comparison as well.

Original post Json.NET has the ability to do a deep semantic comparison of two JSON tokens via [`JToken.DeepEquals()`](https://www.newtonsoft.com/json/help/html/M_Newtonsoft_Json_Linq_JToken_DeepEquals.htm):

Compares the values of two tokens, including the values of all descendant tokens.

It also provides JTokenEqualityComparer for hashing and comparing of JSON tokens.

There does not appear to be an equivalent functionality in System.Text.Json for comparing two JsonElement or JsonDocument objects. As this is a common requirement (e.g. in writing unit tests, or checking for differences between object versions) it would be useful to have.

Sample Stack Overflow questions:

Issues:

  1. Formatting should be ignored.

  2. Differences due to string escaping probably should be ignored. (E.g. "a" should be equivalent to "\u0061".)

  3. Differences due to trailing zeroes probably should be significant. When deserializing to decimal trailing zeroes are preserved; financial, engineering and scientific apps sometimes make use of this.

    Example Stack Overflow questions:

    Utf8JsonReader preserves the underlying character representation when parsing a number so you have the advantage over JsonTextReader here, as the latter discards the character representation after recognizing a token as a number.

  4. Differences due to ordering of unique property names should be ignored since the original JSON proposal states, An object is an unordered set of name/value pairs.

  5. Differences due to ordering of duplicate property names require some thought.

    I have noticed that, surprisingly, JsonDocument fully supports duplicate property names! I.e. it's perfectly happy to parse {"Value":"a", "Value" : "b"} and will store both key/value pairs inside the document. (Thanks for that, I guess.)

    A close reading of https://tools.ietf.org/html/rfc8259#section-4 seems to indicate that such objects are allowed but not recommended, and when they arise, interpretation of identically-named properties may be order-dependent. So I'd propose that relative order of identically-named properties should be significant while relative order of distinctly named properties should not. Such a proposal could be implemented by stably sorting the properties by name, then comparing the sorted property lists in order.

  6. Differences due to casing of property names should be significant by default, but possibly there should be a configuration argument to ignore property name casing. (JToken.DeepEquals() doesn't support this.)

One possible demo comparer can be found here, but better performance may be possible by using internal methods.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Mar 9, 2020
@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Mar 13, 2020
@layomia layomia added this to the Future milestone Mar 13, 2020
@layomia
Copy link
Contributor

layomia commented Mar 13, 2020

Triage: this isn't on our roadmap right now.

It would require discussion on what the behavior would be. It would be good to know what scenarios this would be used for. You provided a set of exceptions here, would you hardcode these or would you want options to control these?

@bjbr-dev
Copy link

bjbr-dev commented Jun 4, 2020

I'd appreciate this being considered for the next iteration. We use Newtonsoft's JToken.DeepEqual extensively in unit test code, and we were hoping to see the equivalent in System.Text.Json.

This stack overflow question also highlights community desire for this feature: https://stackoverflow.com/questions/60580743/what-is-equivalent-in-jtoken-deepequal-in-system-text-json

@mattdarwin
Copy link

mattdarwin commented Nov 5, 2020

me too please. I want to be able to compare two JsonDocument objects for logical equality, ignoring formatting and property ordering.
Maybe we should be able to vote on these issue, so you can see how much demand there is for this fix.

@svick
Copy link
Contributor

svick commented Nov 5, 2020

@mattdarwin

Maybe we should be able to vote on these issue, so you can see how much demand there is for this fix.

You can, you do that but giving the issue the "+1" (👍) reaction.

@gregsdennis
Copy link
Contributor

gregsdennis commented Feb 24, 2021

I've been able to define some deep equality functionality in my Json.More package, but doing .GetHashCode() efficiently still eludes me (although I really like that first SO answer).

JsonElement doesn't override this, which means that different instances (remember it's a value type, so you can only practically get different instances unless you start using pointers, which... ew) return different hash values, even if they are otherwise exactly equal.

In trying to implement my own hash code generation, I'd like to use .GetRawText(), but it only returns the input string. This presents a problem for objects and arrays. For example, [ 1, 2, 3 ] and [1,2,3] return those values, respectively, so they don't work for getting a "JSON hash code." And there's not another way (that I'm aware of) to get a consistent output short of putting the element through the serializer.

So that leaves us with performing a serialization in the midst of a method that is supposed to be as efficient as possible.

(Marginally related to #42502.)

@sashafp10
Copy link

sashafp10 commented Apr 22, 2021

As a workaround, you can remove all non-valuable symbols like spaces, tabs, newlines, and so on. Then just sort all symbols in alphabetical order. There is a very small chance that for big JSON files such strings will be equal to each other.

@eiriktsarpalis
Copy link
Member

Related to #56592, which proposes a DeepEquals method for JsonNode.

@steveharter would it make sense to also include this functionality for JsonElement? I suspect we might need to implement it since certain JsonNode instances are backed by JsonElement representations.

@bd
Copy link

bd commented Jan 13, 2022

I, too, could really use the ability to compare JsonDocuments for content.

@AdamMil
Copy link

AdamMil commented Jan 28, 2022

Differences due to string escaping probably should be ignored. (E.g. "a" should be equivalent to "\u0061".)

They should definitely be ignored. Those are two ways of writing the exact same string.

Differences due to trailing zeroes probably should be significant. When deserializing to decimal trailing zeroes are preserved; financial, engineering and scientific apps sometimes make use of this.

I disagree. Applications that rely on trailing zeroes are like applications that rely on "\u0061" versus "a". 1.0m and 1.00m are equal, even though they print differently. (1.0m == 1.00m is true.) DeepEquals should implement semantic equality, not syntactic equality. (You can already do syntactic equality by comparing the JSON strings, just as you'd have to compare the binary or string values to distinguish between 1.0m and 1.00m.) In fact, the JSON numbers "11", "11.0", and "1.1e1" should all be considered equal: they are all ways to represent the integer 11.

If applications care about trailing zeros, they should store their numbers in strings, just like applications that want to represent NaN or Infinity do. Not doing so is asking for trouble.

JSON numbers are defined as arbitrary-precision numbers with a finite decimal representation, but they can be compared efficiently just by looking at the strings without resorting to any complicated arbitrary-precision arithmetic. (They effectively consist of a negation flag, a whole part, and a fractional part. The exponent, if any, simply shifts the boundary between the whole and fractional parts. Two numbers are equal if they have equal whole parts (excluding leading zeros) and equal fractional parts (excluding trailing zeros) and if either both are zero or they have equal negation flags.

All that said, I definitely support the call for a DeepEquals function. Unfortunately it is practically impossible to write one without resorting to serializing to JSON and interpreting the JSON according to its defined semantics all by yourself. (See #64472).

@weichch
Copy link
Contributor

weichch commented Apr 21, 2022

I rolled my own version of semantic comparison in my SystemTextJson.JsonDiffPatch package. It would be good to have an official version.

@chemistrytocode
Copy link

chemistrytocode commented Aug 1, 2022

Triage: this isn't on our roadmap right now.

How? There are tests in this codebase isn't there? How do you test your JsonDocuments if you can't do a comparison?

@eiriktsarpalis
Copy link
Member

Triage: this isn't on our roadmap right now.

How? There are tests in this codebase isn't there? How do you test your JsonDocuments if you can't do a comparison?

I don't see the connection. It's possible to define an equality comparer (for a certain notion of JSON value equality) as a test helper.

@ebarnard
Copy link

ebarnard commented Sep 8, 2022

I think what @chemistrytocode is saying is that, given there is a method for equality comparison of JsonElements in the System.Text.Json tests, it would be straightforward to expose this as a method on JsonDocument/JsonElement.

It is not ideal that the recommended way to compare JsonDocument for equality is for everyone to copy 70 lines of untested code into their codebase.

@gregsdennis
Copy link
Contributor

It is not ideal that the recommended way to compare JsonDocument for equality is for everyone to copy 70 lines of untested code into their codebase.

... or reference a library that has.

That said, I utilize the comparison functions I have in Json.More.Net throughout my suite of libs in json-everything. I'd say it's fairly well-tested.

@eiriktsarpalis
Copy link
Member

I think what @chemistrytocode is saying is that, given there is a method for equality comparison of JsonElements in the System.Text.Json tests, it would be straightforward to expose this as a method on JsonDocument/JsonElement.

I wouldn't think so. For one, it is probably too inefficient for any use beyond testing (e.g. as an equality comparer in a dictionary). Fundamentally, this reflects the type's design and implementation: unlike JToken/JsonNode objects aren't represented using dictionaries so these need to be constructed on each comparison.

If we did add a built-in implementation of equality, I think JsonDocument/JsonElement might be more conducive to a weaker byte-by-byte comparison, although I think this might violate expectations for certain users (property order, indentation issues, etc.). Ultimately it might be better if we let users define the precise notion of equality relevant to their use case.

@gregsdennis
Copy link
Contributor

Ultimately it might be better if we let users define the precise notion of equality relevant to their use case.

The entire point of this issue, and everyone's argument, is that we need some sort of JSON-equality comparison (the model, not the text). That's the use case that everyone is describing. No one in this thread has requested some other kind of equality.

@eiriktsarpalis
Copy link
Member

@gregsdennis that much is clear -- at risk of repeating my earlier point, the particular type of equality comparison is expensive for the representation employed by JsonDocument. It might be preferable if we simply direct users to use JsonNode instead or have them write their own IEqualityComparer implementation if they really need to.

@weichch
Copy link
Contributor

weichch commented Sep 13, 2022

It might be preferable if we simply direct users to use JsonNode instead

AFAIK, JsonNode can also be backed by a JsonElement, and implementing semantic equality is probably as much expensive as implementing it for JsonElement. Also JsonNode does not offer a cheap way to retrieve the underlying value when the type is not known.

I've implemented my own DeepEquals for both JsonNode and JsonElement, for one-off comparisons, it is relatively okay as ideally each property value is only materialized once per object comparison/call to DeepEquals.

However, performance will become a concern when implementing advanced comparison such as determining the longest common sequence of two JSON arrays. In this case, because JToken essentially parses/converts the value once, the performance is way better than a JsonNode backed by JsonElement. I can cache the values in a dictionary, but it's way slower than caching the values in JsonNode itself.

Also to your earlier point @eiriktsarpalis

I think JsonDocument/JsonElement might be more conducive to a weaker byte-by-byte comparison

I think this is a good thing to have. The closest method JsonElement currently offers is ValueEquals method which only allows comparing with a Span<char> or string. For two JsonElement that are strings, one must be materialized to use the ValueEquals method, which is probably not necessary. For two JsonElement s of other types, the only way to do byte-by-byte comparison is to use GetRawText then compare the two raw strings. Both are expensive operations IMO.

@eiriktsarpalis
Copy link
Member

AFAIK, JsonNode can also be backed by a JsonElement, and implementing semantic equality is probably as much expensive as implementing it for JsonElement.

That's right, in fact JsonNode (in particular JsonValue) is capable of encapsulating arbitrary .NET objects. As such, the concept of JSON equality becomes problematic in general (presumably delegating to Equals() is the best we can do here). The JsonNode parsing functionality is implemented using JsonValue<JsonElement>, but that's just an implementation detail that can be changed in the future.

@georg-eckert-zeiss
Copy link

Any news on this feature request?

@eiriktsarpalis
Copy link
Member

We do all our planning in the open, so any new developments on this particular issue will be posted here first thing.

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Jun 20, 2024

I recently had to implement this internally as part of #103733. The API shape looks as follows:

namespace System.Text.Json;

public partial struct JsonElement
{
    public static bool DeepEquals(JsonElement element1, JsonElement element2);
}

// Existing API
public partial class JsonNode
{
    public static bool DeepEquals(JsonNode? node1, JsonNode? node2);
}

Alternative Designs

The static method approach mirrors the shape used by JsonNode.DeepEquals, however in that case we are forced to use a static because null is a valid representation for the case of JsonNode (representing JSON null). This concern does not exist here so might as well just use an instance method instead.

In terms of implementation, property comparison always uses case sensitive ordinal comparison however we might consider introducing case insensitive comparison as well. JSON numbers are considered equal if and only if their JSON string representations are equal.

@eiriktsarpalis eiriktsarpalis self-assigned this Jun 20, 2024
@eiriktsarpalis eiriktsarpalis added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Jun 20, 2024
@gregsdennis
Copy link
Contributor

gregsdennis commented Jun 20, 2024

JSON numbers are considered equal if and only if their JSON string representations are equal.

No, JSON numbers are considered equal if their numeric values are equal. I have an open issue about how decimal screws up the string comparison approach because it encodes precision (4 vs 4.0).

If you can guarantee the numbers are always serialized to the same format, then sure, but it's apparent that that doesn't hold for decimal.

@eiriktsarpalis
Copy link
Member

I suspect it might be expensive to precisely define equality for JSON numbers given that they can be arbitrarily large (including arbitrarily large exponentiation). The syntactic approach to numeric equality is more conservative, but it's what ended up being used in JsonNode.DeepEquals as well.

Perhaps there is prior art that we can follow, a JsonNumber struct that defines an appropriate definition equality might be useful in this context.

cc @tannergooding

@eiriktsarpalis eiriktsarpalis modified the milestones: Future, 9.0.0 Jun 25, 2024
@eiriktsarpalis eiriktsarpalis added blocking Marks issues that we want to fast track in order to unblock other important work and removed blocking Marks issues that we want to fast track in order to unblock other important work labels Jun 25, 2024
@eiriktsarpalis eiriktsarpalis added the blocking Marks issues that we want to fast track in order to unblock other important work label Jul 2, 2024
@weichch
Copy link
Contributor

weichch commented Jul 6, 2024

A JsonNumber struct is what I ended up having in my own project.

It’s a bit of challenge to implement semantic equality for the JSON nodes that backed by an element. Another thing I found interesting was when a node was backed by element, we have to make sure the semantic values are cached for re-comparison, e.g. when you compare the same nodes multiple times, you don’t want to deserialize the texts again.

@eiriktsarpalis
Copy link
Member

FWIW I have an open PR that improves numeric equality for JsonElement/JsonNode. Turns out you can do arbitrary precision equality comparison if you stick to the underlying span representation.

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Jul 8, 2024
@bartonjs
Copy link
Member

bartonjs commented Jul 9, 2024

Video

Looks good as proposed.

namespace System.Text.Json;

public partial struct JsonElement
{
    public static bool DeepEquals(JsonElement element1, JsonElement element2);
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 9, 2024
@RenderMichael
Copy link
Contributor

Is there any chance that a DeepEquals(JsonElement, JsonNode?) method naturally falls out of what's already been implemented internally, without too much extra work? It's a comparison I've had to make, and it would be nice to have support out of the box.

@eiriktsarpalis
Copy link
Member

Is converting one of the two values a possibility? It wouldn't be trivial to add and the two representations aren't equivalent (e.g. JsonElement supports duplicate properties)

@RenderMichael
Copy link
Contributor

It's not a huge deal to convert between the two, I just wanted to ask as someone without any knowledge of STJ internals and esoterica.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.