Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for canonical JSON output in Jackson 2.16 #3967

Draft
wants to merge 11 commits into
base: 2.16
Choose a base branch
from

Conversation

digulla
Copy link

@digulla digulla commented Jun 6, 2023

Canonical JSON based on Jackson 2.16.

The code for setting up the JsonMapper has been extracted in a builder and I've started with a JsonAssert helper class that can be used in tests to load and verify JSON data.

EDIT: (by @cowtowncoder)

I kept all classes in the tests folder since there will be many more changes and it's easier when everything is in a single place.
Added JsonAssert with several assertion methods that can be used similar to assertEquals().
@digulla digulla changed the title Canonical json 2 16 Support for canonical JSON output in Jackson 2.16 Jun 6, 2023
@JooHyukKim
Copy link
Member

Just to keep things connected, leaving here the original issue FasterXML/jackson-core#1037

import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.databind.json.JsonMapper;

public class CanonicalJsonMapper { // TODO It would be great if we could extend JsonMapper but the return type of builder() is incompatible
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not going to be possible since with 3.0 especially there are be per-format sub-classes of ObjectMapper (2.x actually has these too, but with 3.0 it is more fundamentally so, to match particular TokenStreamFactory with matching ObjectMapper subtype).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... I feel that the implementation of the builders as is leads to a lot of code duplication. Any ideas how to improve this?

@cowtowncoder
Copy link
Member

Implemented #3965 wherein JsonNodeFeature.WRITE_SORTED_PROPERTIES can be enabled (disabled by default) to force sorting on serialization.

Replaced hack to sort properties with JsonNodeFeature.WRITE_PROPERTIES_SORTED.
Deleted several TODOs that are obsolete.
Added tests for @JsonTypeInfo. Here, the sorting is still broken.
@digulla
Copy link
Author

digulla commented Jun 19, 2023

I've updated the code with JsonNodeFeature.WRITE_SORTED_PROPERTIES and other things that were merged in the last few days.

Most things are solved. Please have a look at the remainign // TODO. Any suggestions?


@Override
public void writeNumber(double v) throws IOException {
BigDecimal wrapper = BigDecimal.valueOf(v);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This likely changes some values, fwtw (double is 2-based, BigDecimal 10-based; lossy conversion)?
Or would that only be problematic in reverse direction (I know 0.1 etc are exact in 10-based and transcendental in 2-based; not sure if reverse cases exist).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bug in BigDecimal(double) but that has been fixed with BigDecimal.valueOf(). valueOf() works correctly and always creates a correct BigDecimal, even for numbers that can't be represented with 2-based / IEEE math.

Since my code to canonicalize numbers is string based, we could maybe skip the conversion to BigDecimal but the string manipulation was ugly hack until we have a better solution.

The other option would be to use Number instead of BigDecimal in ValueToString and rename it to NumberToString. Then we could wrap the primitive in a Double instance here and both code paths would look the same plus it would also be possible to process BigInteger if someone wanted to.

@cowtowncoder
Copy link
Member

Ok, couple of notes:

First: to understand logic, it'd be good to have link to specific canonical JSON definition/description -- esp. logic on numbers. I am not sure I can reverse engineer rules from changes here.

Second some concerns:

  • Is there still need to override BigDecimal serializer? Ideally JsonGenerator override would be sufficient
  • Sub-classing of JsonMapper is not sustainable, in my opinion.

@digulla
Copy link
Author

digulla commented Jun 20, 2023

Ok, couple of notes:

First: to understand logic, it'd be good to have link to specific canonical JSON definition/description -- esp. logic on numbers. I am not sure I can reverse engineer rules from changes here.

See here: http://gibson042.github.io/canonicaljson-spec/ plus my unit tests.

In a nutshell, the standard requires to convert 10.01 into 1.001E1 - which I don't personally like; this number is hard to understand. If precision was an issue, the standard should require the opposite: 1001E-2 (i.e. no fraction and the exponent just moves the decimal point to the right place).

For my own tests, I prefer "BigDecimal.toPlainString() without trailing zeros" (i.e. 10.01, never 10.0100 and never an exponent). The problem with the trailing zeros is that the order of math operations determines whether you get them and how many, so changes in the code can break the test even though the number is mathematically the same. That's why we have to strip them for canonical output.

I saw that you had a feature to strip trailing zeros from BigDecimal which was removed.

Second some concerns:

  • Is there still need to override BigDecimal serializer? Ideally JsonGenerator override would be sufficient

Yep, you're right. I removed the module and tests still pass. Deleted the obsolete code and simplified the rest.

  • Sub-classing of JsonMapper is not sustainable, in my opinion.

In my MR, I didn't want to change existing code too much before I get your OK.

How do I get a custom JsonFactory into JsonMapper.Builder after calling builder()? It's a ctor argument.

I see three options:

  1. Collect all parameters for the mapper and make the mapper really immutable (not going to happen in 2.16, I guess. Also would be a big change).
  2. JsonMapper.Builder.build() could copy the mapper again, allowing to set a different JsonFactory at that point in time. Not sure what else that would break.
  3. Add canonicalBuilder() to JsonMapper as a factory method which returns a pre-configured builder.

@cowtowncoder
Copy link
Member

Ok, couple of notes:
First: to understand logic, it'd be good to have link to specific canonical JSON definition/description -- esp. logic on numbers. I am not sure I can reverse engineer rules from changes here.

See here: http://gibson042.github.io/canonicaljson-spec/ plus my unit tests.

In a nutshell, the standard requires to convert 10.01 into 1.001E1 - which I don't personally like; this number is hard to understand. If precision was an issue, the standard should require the opposite: 1001E-2 (i.e. no fraction and the exponent just moves the decimal point to the right place).

Thanks.

...

I saw that you had a feature to strip trailing zeros from BigDecimal which was removed.

Was this for case of JsonNode, or am I thinking something else? For that case there is/was an alternative, I think.

Ah, yes, JsonNodeFeature.STRIP_TRAILING_BIGDECIMAL_ZEROES instead of JsonNodeFactory setting.

But either way, it only applied to JsonNode handling.

If it helped, however, we could have StreamWriteFeature equivalent for jackson-core which would then apply to everything.

Second some concerns:

  • Is there still need to override BigDecimal serializer? Ideally JsonGenerator override would be sufficient

Yep, you're right. I removed the module and tests still pass. Deleted the obsolete code and simplified the rest.

Ok that makes sense. Thanks!

  • Sub-classing of JsonMapper is not sustainable, in my opinion.

In my MR, I didn't want to change existing code too much before I get your OK.

That makes sense as well.

How do I get a custom JsonFactory into JsonMapper.Builder after calling builder()? It's a ctor argument.

Right, I thought about that too. Not a simple thing...

There are now mapper.rebuild() and streamFactory().rebuild() methods, which would allow reconfiguration (they sort of fake things for 2.x, to be honest, as true immutability isn't there but...)
3.0 will change things and maybe complicate matters, as mapper and token stream factory types must match.

I see three options:

1. Collect all parameters for the mapper and make the mapper really immutable (not going to happen in 2.16, I guess. Also would be a big change).

Agreed, can't really make.

2. `JsonMapper.Builder.build()` could copy the mapper again, allowing to set a different `JsonFactory` at that point in time. Not sure what else that would break.

We could perhaps add an overload for JsonMapper.rebuild(); one that takes JsonFactory? Or, I think there has been the idea of "shape-shifting", something like JsonMapper.with(JsonFactory).

3. Add `canonicalBuilder()` to `JsonMapper` as a factory method which returns a pre-configured builder.

I wonder if another approach is necessary: changing things so that Canonical JSON helper object (whatever it is called) would actually produce and return a JsonMapper.Builder() with properly configured JsonFactory -- and allowing caller to pass JsonFactory.builder() for different configurations?
That is, caller either passes builder for JsonFactory (if it needs specific changes) or not (for default settings); and receives a JsonMapper.Builder if it wants to make further changes.

@cowtowncoder
Copy link
Member

Ok, one further thought on "Builder in, Builder out": there would be question of whether this could be applied to other format backends. I think it should be possible, probably by passing a closure, but it's probably something to experiment with.

And specifically I'd be interested in Binary JSON backends; CBOR and Smile (and possibly MsgPack, BSON). But who knows, perhaps canonicalization would work nicely for even things like Properties :)

@digulla
Copy link
Author

digulla commented Jun 21, 2023

Okay, in that case, the Spring approach to use "configurers" (i.e. an interface which consumes builders) then sounds like the best bet:

interface JsonMapperBuilderConfigurer {
    void configure(JsonMapper.Builder builder);
}

or simply java.util.Consumer<JsonMapper.Builder>. One predefined configurer would then apply the canonical setup.

The nice thing here is that a developer can define a second such configurer to "fix" certain aspects of the canonical JSON which don't work in their case. So they get all the defaults which they can then patch as needed.

It's not perfect since you need a builder which is already initialized with the correct JsonFactory but that should be fine for 2.16 and can then be fixed properly in 3.0.

@digulla
Copy link
Author

digulla commented Jun 26, 2023

Build is failing because it uses new classes that are in jackson-core 2.16 SNAPSHOT.

@digulla
Copy link
Author

digulla commented Aug 8, 2023

Any update on this one?

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.json.JsonMapper;

public class JsonAssert {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like name here; JsonAssert sounds like a class from unit test framework.
Class could use Javadoc to explain the intent.

@@ -0,0 +1,43 @@
package com.fasterxml.jackson.databind.ser;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably be put under new package (not sure if databind.canonical or databind.ser.canonical).

I realize all code is under src/test and not necessarily meant to be in specific location.

@@ -0,0 +1,5 @@
package com.fasterxml.jackson.databind.ser;

public interface ValueToString<T> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of a new type could just use Function<T, String>?

@cowtowncoder
Copy link
Member

I added some notes before realizing that I should really take more time to think of how I'd like things packaged, and then comment based on that.

So feel free to ignore comments for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants