System.Text.Json: add ability to do semantic comparisons of JSON values à la JToken.DeepEquals() #33388
Comments
Triage: this isn't on our roadmap right now. It would require discussion on what the behavior would be. It would be good to know what scenarios this would be used for. You provided a set of exceptions here; would you hardcode these, or would you want options to control them?
I'd appreciate this being considered for the next iteration. We use Newtonsoft's JToken.DeepEquals extensively in unit test code, and we were hoping to see the equivalent in System.Text.Json. This Stack Overflow question also highlights community desire for this feature: https://stackoverflow.com/questions/60580743/what-is-equivalent-in-jtoken-deepequal-in-system-text-json
Me too, please. I want to be able to compare two JsonDocument objects for logical equality, ignoring formatting and property ordering.
You can do that by giving the issue the "+1" (👍) reaction.
I've been able to define some deep equality functionality in my Json.More package, but doing
In trying to implement my own hash code generation, I'd like to use So that leaves us with performing a serialization in the midst of a method that is supposed to be as efficient as possible. (Marginally related to #42502.)
As a workaround, you can remove all non-valuable symbols like spaces, tabs, newlines, and so on. Then just sort all symbols in alphabetical order. There is a very small chance that for big JSON files such strings will be equal to each other.
Related to #56592, which proposes a @steveharter would it make sense to also include this functionality for
I, too, could really use the ability to compare JsonDocuments for content.
They should definitely be ignored. Those are two ways of writing the exact same string.
I disagree. Applications that rely on trailing zeroes are like applications that rely on "\u0061" versus "a". 1.0m and 1.00m are equal, even though they print differently. (1.0m == 1.00m is true.) DeepEquals should implement semantic equality, not syntactic equality. (You can already do syntactic equality by comparing the JSON strings, just as you'd have to compare the binary or string values to distinguish between 1.0m and 1.00m.) In fact, the JSON numbers "11", "11.0", and "1.1e1" should all be considered equal: they are all ways to represent the integer 11. If applications care about trailing zeros, they should store their numbers in strings, just like applications that want to represent NaN or Infinity do. Not doing so is asking for trouble. JSON numbers are defined as arbitrary-precision numbers with a finite decimal representation, but they can be compared efficiently just by looking at the strings without resorting to any complicated arbitrary-precision arithmetic. (They effectively consist of a negation flag, a whole part, and a fractional part. The exponent, if any, simply shifts the boundary between the whole and fractional parts. Two numbers are equal if they have equal whole parts (excluding leading zeros) and equal fractional parts (excluding trailing zeros) and if either both are zero or they have equal negation flags.) All that said, I definitely support the call for a DeepEquals function. Unfortunately, it is practically impossible to write one without resorting to serializing to JSON and interpreting the JSON according to its defined semantics all by yourself. (See #64472.)
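The string-only comparison described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the approach taken by System.Text.Json itself: the `JsonNumberText` class and its `Normalize` and `SemanticEquals` methods are hypothetical names, the input is assumed to be an already-validated JSON number literal, and the exponent is assumed to fit in a `long`.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of System.Text.Json): compares JSON number
// literals by value using only their text, as outlined in the comment above.
public static class JsonNumberText
{
    // Reduces a literal to (negative, digits, pointPosition) so that
    // value = +/- 0.digits x 10^pointPosition, with no leading or trailing
    // zeros in digits. Zero always normalizes to (false, "", 0).
    public static (bool Negative, string Digits, long PointPosition) Normalize(string literal)
    {
        bool negative = literal.Length > 0 && literal[0] == '-';
        string s = negative ? literal.Substring(1) : literal;

        // Split off the exponent, if any. Assumption: it fits in a long.
        long exponent = 0;
        int e = s.IndexOfAny(new[] { 'e', 'E' });
        if (e >= 0)
        {
            exponent = long.Parse(s.Substring(e + 1), CultureInfo.InvariantCulture);
            s = s.Substring(0, e);
        }

        // Split the mantissa into whole and fractional parts.
        int dot = s.IndexOf('.');
        string whole = dot >= 0 ? s.Substring(0, dot) : s;
        string fraction = dot >= 0 ? s.Substring(dot + 1) : "";

        // The exponent only shifts the boundary between whole and fraction.
        string digits = whole + fraction;
        long pointPosition = whole.Length + exponent;

        // Dropping a leading zero moves the boundary one place to the left.
        string withoutLeading = digits.TrimStart('0');
        pointPosition -= digits.Length - withoutLeading.Length;
        string significant = withoutLeading.TrimEnd('0');

        if (significant.Length == 0)
        {
            return (false, "", 0); // zero: sign and exponent are irrelevant
        }
        return (negative, significant, pointPosition);
    }

    public static bool SemanticEquals(string left, string right) =>
        Normalize(left) == Normalize(right);
}
```

Under these assumptions, `SemanticEquals("11", "1.1e1")` and `SemanticEquals("-0", "0.0")` both compare equal, while `SemanticEquals("1", "10")` does not. A production version would presumably work on UTF-8 spans rather than allocating strings.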
I rolled my own version of semantic comparison in my
How? There are tests in this codebase, aren't there? How do you test your
I don't see the connection. It's possible to define an equality comparer (for a certain notion of JSON value equality) as a test helper.
I think what @chemistrytocode is saying is that, given there is a method for equality comparison of It is not ideal that the recommended way to compare
... or reference a library that has. That said, I utilize the comparison functions I have in Json.More.Net throughout my suite of libs in json-everything. I'd say it's fairly well-tested.
I wouldn't think so. For one, it is probably too inefficient for any use beyond testing (e.g. as an equality comparer in a dictionary). Fundamentally, this reflects the type's design and implementation: unlike If we did add a built-in implementation of equality, I think
The entire point of this issue, and everyone's argument, is that we need some sort of JSON-equality comparison (the model, not the text). That's the use case that everyone is describing. No one in this thread has requested some other kind of equality.
@gregsdennis that much is clear -- at risk of repeating my earlier point, the particular type of equality comparison is expensive for the representation employed by
AFAIK, I've implemented my own However, performance will become a concern when implementing advanced comparison such as determining the longest common sequence of two JSON arrays. In this case, because Also to your earlier point @eiriktsarpalis
I think this is a good thing to have. The closest method
That's right, in fact
Any news on this feature request?
We do all our planning in the open, so any new developments on this particular issue will be posted here first thing.
I recently had to implement this internally as part of #103733. The API shape looks as follows:
namespace System.Text.Json;
public partial struct JsonElement
{
    public static bool DeepEquals(JsonElement element1, JsonElement element2);
}
// Existing API
public partial class JsonNode
{
    public static bool DeepEquals(JsonNode? node1, JsonNode? node2);
}
Alternative Designs
The static method approach mirrors the shape used by
In terms of implementation, property comparison always uses case-sensitive ordinal comparison; however, we might consider introducing case-insensitive comparison as well. JSON numbers are considered equal if and only if their JSON string representations are equal.
No, JSON numbers are considered equal if their numeric values are equal. I have an open issue about how decimal screws up the string comparison approach because it encodes precision (4 vs 4.0). If you can guarantee the numbers are always serialized to the same format, then sure, but it's apparent that that doesn't hold for decimal.
I suspect it might be expensive to precisely define equality for JSON numbers given that they can be arbitrarily large (including arbitrarily large exponentiation). The syntactic approach to numeric equality is more conservative, but it's what ended up being used in JsonNode.DeepEquals as well. Perhaps there is prior art that we can follow; a JsonNumber struct that defines an appropriate notion of equality might be useful in this context.
A JsonNumber struct is what I ended up having in my own project. It's a bit of a challenge to implement semantic equality for the JSON nodes that are backed by an element. Another thing I found interesting was that when a node is backed by an element, we have to make sure the semantic values are cached for re-comparison, e.g. when you compare the same nodes multiple times, you don't want to deserialize the text again.
FWIW I have an open PR that improves numeric equality for JsonElement/JsonNode. Turns out you can do arbitrary-precision equality comparison if you stick to the underlying span representation.
Looks good as proposed.
namespace System.Text.Json;
public partial struct JsonElement
{
    public static bool DeepEquals(JsonElement element1, JsonElement element2);
}
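For readers who want to try the existing `JsonNode.DeepEquals` API listed earlier in this thread, a minimal usage sketch might look like the following. It assumes a runtime where that method is available; the `DeepEqualsDemo` class and `SameJson` method are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical wrapper around the existing JsonNode.DeepEquals API.
public static class DeepEqualsDemo
{
    // Parses both inputs into JsonNode trees and compares them structurally.
    // JsonNode.Parse returns null for the JSON literal "null", which
    // DeepEquals accepts because both of its parameters are nullable.
    public static bool SameJson(string left, string right) =>
        JsonNode.DeepEquals(JsonNode.Parse(left), JsonNode.Parse(right));
}
```

With this helper, `SameJson("{\"a\":1,\"b\":[1,2]}", "{ \"b\": [1, 2], \"a\": 1 }")` should report equality despite the different whitespace and property order, whereas `SameJson("[1,2]", "[2,1]")` should not, since array order is significant. How numbers such as `1` and `1.0` compare depends on the runtime version, as the surrounding discussion explains.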
Is there any chance that a
Is converting one of the two values a possibility? It wouldn't be trivial to add and the two representations aren't equivalent (e.g. JsonElement supports duplicate properties).
It's not a huge deal to convert between the two, I just wanted to ask as someone without any knowledge of STJ internals and esoterica.
API proposal
Alternative Designs
The static method approach mirrors the shape used by `JsonNode.DeepEquals`; however, in that case we are forced to use a static because `null` is a valid representation for the case of `JsonNode` (representing JSON null). This concern does not exist here, so we might as well just use an instance method instead.
In terms of implementation, property comparison always uses case-sensitive ordinal comparison; however, we might consider introducing case-insensitive comparison as well.
Original post
Json.NET has the ability to do a deep semantic comparison of two JSON tokens via [`JToken.DeepEquals()`](https://www.newtonsoft.com/json/help/html/M_Newtonsoft_Json_Linq_JToken_DeepEquals.htm). It also provides `JTokenEqualityComparer` for hashing and comparing of JSON tokens.
There does not appear to be an equivalent functionality in `System.Text.Json` for comparing two `JsonElement` or `JsonDocument` objects. As this is a common requirement (e.g. in writing unit tests, or checking for differences between object versions) it would be useful to have.
Sample Stack Overflow questions:
Issues:
Formatting should be ignored.
Differences due to string escaping probably should be ignored. (E.g. `"a"` should be equivalent to `"\u0061"`.)
Differences due to trailing zeroes probably should be significant. When deserializing to `decimal`, trailing zeroes are preserved; financial, engineering and scientific apps sometimes make use of this.
Example Stack Overflow questions:
`Utf8JsonReader` preserves the underlying character representation when parsing a number, so you have the advantage over `JsonTextReader` here, as the latter discards the character representation after recognizing a token as a number.
Differences due to ordering of unique property names should be ignored, since the original JSON proposal states, "An object is an unordered set of name/value pairs."
Differences due to ordering of duplicate property names require some thought.
I have noticed that, surprisingly, `JsonDocument` fully supports duplicate property names! I.e. it's perfectly happy to parse `{"Value":"a", "Value" : "b"}` and will store both key/value pairs inside the document. (Thanks for that, I guess.)
A close reading of https://tools.ietf.org/html/rfc8259#section-4 seems to indicate that such objects are allowed but not recommended, and when they arise, interpretation of identically-named properties may be order-dependent. So I'd propose that relative order of identically-named properties should be significant while relative order of distinctly named properties should not. Such a proposal could be implemented by stably sorting the properties by name, then comparing the sorted property lists in order.
Differences due to casing of property names should be significant by default, but possibly there should be a configuration argument to ignore property name casing. (`JToken.DeepEquals()` doesn't support this.)
One possible demo comparer can be found here, but better performance may be possible by using internal methods.
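The rules listed above, including the stable-sort proposal for duplicate property names, could be sketched along these lines. This is an illustrative approximation rather than the linked demo comparer or any built-in implementation: `JsonElementComparer` is a hypothetical name, and numbers are compared by raw text, which keeps trailing zeroes significant but also treats `1` and `1.0` as different.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical comparer sketching the rules from the original post.
public static class JsonElementComparer
{
    public static bool DeepEquals(JsonElement x, JsonElement y)
    {
        if (x.ValueKind != y.ValueKind) return false;

        switch (x.ValueKind)
        {
            case JsonValueKind.Object:
            {
                // Enumerable.OrderBy is a stable sort, so duplicate names keep
                // their relative order while distinct names are reordered.
                var xs = x.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal).ToList();
                var ys = y.EnumerateObject().OrderBy(p => p.Name, StringComparer.Ordinal).ToList();
                if (xs.Count != ys.Count) return false;
                for (int i = 0; i < xs.Count; i++)
                {
                    // Property names are compared case-sensitively (ordinal).
                    if (xs[i].Name != ys[i].Name) return false;
                    if (!DeepEquals(xs[i].Value, ys[i].Value)) return false;
                }
                return true;
            }
            case JsonValueKind.Array:
            {
                var xa = x.EnumerateArray().ToList();
                var ya = y.EnumerateArray().ToList();
                if (xa.Count != ya.Count) return false;
                for (int i = 0; i < xa.Count; i++)
                {
                    if (!DeepEquals(xa[i], ya[i])) return false;
                }
                return true;
            }
            case JsonValueKind.String:
                // GetString unescapes, so "a" and "\u0061" compare equal.
                return x.GetString() == y.GetString();
            case JsonValueKind.Number:
                // Assumption: raw-text comparison, so 1.0 and 1.00 differ.
                return x.GetRawText() == y.GetRawText();
            default:
                // True, False, Null: equal ValueKind means equal value.
                return true;
        }
    }

    // Convenience overload for comparing two JSON texts.
    public static bool DeepEquals(string left, string right)
    {
        using var l = JsonDocument.Parse(left);
        using var r = JsonDocument.Parse(right);
        return DeepEquals(l.RootElement, r.RootElement);
    }
}
```

Sorting on every object comparison keeps the sketch short at the cost of allocations, which is in line with the remark above that better performance may be possible with internal methods.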