Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Polymorphic [de]serialization #147

Closed
mdzaebel opened this issue Jul 1, 2019 · 141 comments · Fixed by #284
Closed

Polymorphic [de]serialization #147

mdzaebel opened this issue Jul 1, 2019 · 141 comments · Fixed by #284
Labels
enhancement New feature or request
Milestone

Comments

@mdzaebel
Copy link

mdzaebel commented Jul 1, 2019

I'd like to repeat the feature proposal from eclipse-ee4j/yasson#133 here. There should at least be a description of how to implement Serializer/Deserializer that maintain runtime types. As there are more possible strategies, they should be defined too.

@rmannibucau
Copy link

rmannibucau commented Jul 1, 2019

+1, we have this kind of implementation in Apache Johnzon as well - it is portable: https://github.com/apache/johnzon/blob/master/johnzon-json-extras/src/test/java/org/apache/johnzon/jsonb/extras/polymorphism/PolymorphicTest.java#L84

@mdzaebel
Copy link
Author

mdzaebel commented Jul 1, 2019

Yes, thats helpfull. Many implementations have a concepts with annotations or other strategies. I'd prefer also one without annotations, as this would enable a transformation without changing legacy classes. Also interesting, why did the expert group omit this feature in 1.0?

@rmannibucau
Copy link

@mdzaebel What is likely not possible is to make it transparent without any config (type=fully qualified name) so it requires to have explicit aliases for all children and whitelist the allowed children otherwise you get immediately a security issue. Therefore it is not strictly omitted in 1.0 since it is trivial to handle it in a (de)serializer. Hope it helps.

@mdzaebel
Copy link
Author

mdzaebel commented Jul 2, 2019

At first, thanks so much for your help, which is really important for us (big insurance company). E.g. I did not know about the security aspect. I'm not an expert, so could you show me a serializer, that inserts a $type attribute and a deserializer that creates the instances out of $type? Currently, the security aspect could be solved by testing an instanceof relation in our case.

@mdzaebel
Copy link
Author

mdzaebel commented Jul 2, 2019

May be JSON Binding 1.0 rather focus on the wrapping strategy, documented in https://javaee.github.io/javaee-spec/javadocs/javax/json/bind/serializer/JsonbDeserializer.html. So the usecase above seems to be difficult, but should be supported in future.

@rmannibucau
Copy link

rmannibucau commented Jul 2, 2019

@mdzaebel all the code is contained in this class: https://github.com/apache/johnzon/blob/master/johnzon-json-extras/src/main/java/org/apache/johnzon/jsonb/extras/polymorphism/Polymorphic.java - you can also import johnzon-extras module in your project and not fork it. Then just configure it as mentionned in previous comment (with the example link).

@mdzaebel
Copy link
Author

mdzaebel commented Jul 3, 2019

Great stuff. That should be sufficient. Thank you Romain!

@mdzaebel
Copy link
Author

mdzaebel commented Jul 3, 2019

With Yasson, your serializer doesn't work (Yasson Issue) e.g.

public void serialize(Dog obj, JsonGenerator generator, SerializationContext ctx) {
        ctx.serialize(new Wrapper(obj.getClass().getName(), obj), generator); // shortened
}

leads to "Recursive reference has been found in class ...", but isn't this correct, as the identical object has to be serialized in a deeper level recursively?

@mdzaebel
Copy link
Author

Roman Grigoriadi now explained, that Yasson has to be used slighty different (see Issue)

@aguibert aguibert added the enhancement New feature or request label Jul 22, 2019
@Verdent
Copy link
Contributor

Verdent commented Jul 23, 2019

This feature is definitely a must to have so I thought about it and these are my suggestions which should be covered in terms of polymorphic (de)serialization.

  1. Global property
    Every serialized instance should have one additional property with the class name. This would be disabled by default and allowed to be configured or set by a property.

  2. Annotation based
    I have not yet come up with specific annotation name but this annotation could be added to the parent type of the classes user wants to (de)serialize. This will add new property to json with the class name of the instance. It should also have possibility to set "alias" name to each sub class and should not require to set exact class names in resulting json.

  3. Polymorphic configuration
    This would be part of the configuration which would allow user to add the same configuration possibility as in 2. even for classes one doesn't control or if he doesn't want to change existing classes.

It would also be possible to specify the name of the new property in each option mentioned above.

I would also suggest two levels of how strict it should be. For example aliases are set for some set of classes and during serialization is "not mentioned" class found.

  • Exception is thrown because it was not mentioned in the aliases (default)
  • No exception is thrown and class name is set to the property instead of an alias

What do you guys think about it? Any suggestions?

@rmannibucau
Copy link

@Verdent -10000 for your "1", this is exactly the 0-day vulnerability java serialization got and jackson is still trying to fix. 2 requires to use aliases - and it is not an option - known from the server (and not class names) for the same reason. Isn't 3 a custom deserializer - so already built-in?

@Verdent
Copy link
Contributor

Verdent commented Jul 23, 2019

Yes I know about this. This would be just for those who want to use it internally. And there should be also warning that this might be potential security problem. But I wrote it here because it could be fairly simple to set up for a user.

In case of that 2. I would say that aliases are not that bad way to go. You have only several possible values and those are not invoked on the server. They are matched with corresponding class. What is wrong with that option? :-)

The 3. option was more or less 2. but in configuration way. You can create deserializer, but I was trying to find some way how to do it easier and more "automatic".

@rmannibucau
Copy link

My point on 2 was that only aliases are an option (being required and not just one option) since classname is a no go.

@m0mus
Copy link
Contributor

m0mus commented Jul 23, 2019

@rmannibucau I agree that option 1 adds a vulnerability. But if it's switched off by default and users know that enabling it opens a security hole it's fine. Basic HTTP Authentication is an even bigger security hole, but still used. The same I can say for option 2. It's more secured, but opens a vulnerability in case alias is not specified. Again, users should know about it before enabling it. Do we have other options to make it more secured?

@rmannibucau
Copy link

We dont have to provide it at all, basic is still there cause it had been possible if you follow me.
Off by default = on in prod (dont ask me how many spring dev actuators are in prod).
What is limiting with aliases? Nothing except having the meta model we work in another issue to decorate external classes (a bit like cdi metamodel but lighter) so let's do it safe?

@m0mus
Copy link
Contributor

m0mus commented Jul 23, 2019

What about defining a list of classes which we allow to serialize/deserialize in configuration. If JSONB tries to deserialize a class which is not in configuration it will throw an exception. It should solve the security issue, is it?

JsonbConfig config = new JsonbConfig()
    .serializable("com.test.model").
    .serializableAndDeserializable(com.test.SerializableClass);

// Create Jsonb with custom configuration
Jsonb jsonb = JsonbBuilder.create(config);

// Works fine
SerializableClass c = Jsonb.fromJson("some json", SerializableClass.class);

// Throws exception
NotSerializableClass c = Jsonb.fromJson("some json", NotSerializableClass.class);

// Works fine
Jsonb.toJson(com.test.model.Model);

// Throws exception
Jsonb.toJson(com.test.otherpackage.Model);

@aguibert aguibert added this to the 2.0 milestone Jul 23, 2019
@rmannibucau
Copy link

toJson can pass but the fromJson is the one which must be controlled. Your solution works.

@mdzaebel
Copy link
Author

@m0mus serializable(...) is an approach to handle security problems for polymorphic instantiation. It's only one aspect of it and should be defined within a polymorphic configuration. serializable(...) could be missunderstood as a general constraint, rather than a special one for polymorphic instantiation. I would drop duplicate methods like serializableAndDeserializable and use one method in both directions. Differences can always be defined with new JsonbConfig instances and should not occur often. So I would prefer a whitelisting approach like:

/** Polymorphic [de]serialization */
public JsonbConfig polymorphic(Polymorphic polymorphic);

Where Polymorphic could be defined like e.g.:

class Polymorphic {
   String typeAttribute="@type";
   Function<Class<?>, String>stringResolver; // String alias of typeAttribute 
   Function<String, Class<?>>classResolver;  // Deserialzation of alias value
   Strategy strategy;                        // How to serialize class types
   /** Classes/interfaces of which subclasses/implementations are handled:
     * Serialization: typeAttribute added.
     * Deserialization: Instantiation by typeAttribute */
   Class<?>[] classes;
}

This solves the security issue and the polymorphic configuration.

The proposals of @Verdent have to be decided. I would support them completely. It might be helpful to analyse the concepts already used by other implementations.

@rmannibucau
Copy link

jsonio used kinf of that and it failed in some of our project cause the lack of evolutivity in time (too rigid) + payloads where too big. Adapters/deserializers enable to be more clever and bypass type attributes IMHO since they can identify the type per attributes directly and not change the shape of the serialization which breaks js clients for example. So I'd prefer a lighter solution/abstraction if generic and not adapter based and not something against js. So at the end it is just providing a default serializer/adapter impl so let s keep it simple without new concept maybe? The whitelist is still needed though.

@mdzaebel
Copy link
Author

mdzaebel commented Aug 1, 2019

Automatic type detection via attributes could also be a candidate for the "strategy". However, in many cases it's not sufficient, as attributes/values might be the same for different instance types.
The concept has to be evolutive, thats an important point (what do you mean by "too rigid"?). E.g., if classnames change, it should be possible to use older and newer formats, which would emphasize the necessity of a strong alias concept. For internal communication, payload is not a problem but of course, JS-requirements have to be solvable too. Why does a type attribute break JS clients?

@rmannibucau
Copy link

@mdzaebel there is no interoperability standard at all (in the sense "commonly used standard") so if we change the shape of the payload then the clients will not consume it right (missing wrapper, custom attribute). It is likel producing a record following a schema without control. Therefore it must only be activable but never activated without a stringly explicit flag (annotation, no config probably).

@mdzaebel
Copy link
Author

mdzaebel commented Aug 2, 2019

@rmannibucau Why no config? This would be more dynamic, while annotations are fixed at classes/methods/parameters. Each strategy has it's own settings, that have to be configurable by preferably one polymorphic configuration.

@mkarg
Copy link
Contributor

mkarg commented Aug 22, 2019

@rmannibucau Just like Yasson, also Johnzon 1.1.13 fails at the same location:

public void serialize(Dog obj, JsonGenerator generator, SerializationContext ctx) {
ctx.serialize(new Wrapper(obj.getClass().getName(), obj), generator); // shortened
}

What actually happens is a Stack Overflow due to endless recursion:

Caused by: java.lang.StackOverflowError
	at org.apache.johnzon.mapper.MapperConfig.findObjectConverter(MapperConfig.java:186)
	at org.apache.johnzon.mapper.MapperConfig.findObjectConverterWriter(MapperConfig.java:175)
	at org.apache.johnzon.mapper.MappingGeneratorImpl.doWriteObject(MappingGeneratorImpl.java:153)
	at org.apache.johnzon.mapper.MappingGeneratorImpl.writeObject(MappingGeneratorImpl.java:111)
	at org.apache.johnzon.jsonb.serializer.JohnzonSerializationContext.serialize(JohnzonSerializationContext.java:41)
	at org.apache.johnzon.jsonb.extras.polymorphism.Polymorphic$Serializer.serialize(Polymorphic.java:64)
	at org.apache.johnzon.jsonb.JsonbAccessMode$WriterConverters.lambda$new$0(JsonbAccessMode.java:831)
	at org.apache.johnzon.mapper.MappingGeneratorImpl.doWriteObjectBody(MappingGeneratorImpl.java:306)
	at org.apache.johnzon.mapper.MappingGeneratorImpl.writeValue(MappingGeneratorImpl.java:463)
	at org.apache.johnzon.mapper.MappingGeneratorImpl.doWriteObjectBody(MappingGeneratorImpl.java:345)
	at org.apache.johnzon.mapper.MappingGeneratorImpl.doWriteObject(MappingGeneratorImpl.java:167)
	at org.apache.johnzon.mapper.MappingGeneratorImpl.writeObject(MappingGeneratorImpl.java:111)

Apparently even latest Johnzon tries to wrap again and again and again as soon once it serializes the wrapper itself but then sees the wrapper's payload and jumps into the polymorphic deserializer again. Romain, can you please decide whether this is a bug in Johnzon or simply a missing rule in the JSON-B spec?

@rmannibucau
Copy link

@mkarg it looks expected and normal with current spec - thought throwing a JsonbException("invalid") could have been saner. This code works using annotations and not global converters which leads to that. We likely don't want to skip serializer/deserializer/adapter in SerializationContext. If you set your impl globally (on jsonbconfig), it must likely have the standard threadlocal to prevent the loop if possible. That said, this use case does not fit serializer API which is really about how to serializer, it fits adapter API which is about changing the shape of a model then the new shape can have a serializer if you want I think.

@amoscatelli
Copy link

@rmannibucau
Ok I just verified that Yasson doesn't work with our solution.
First problem is that it loops.
If you call ctx.deserialize(Child1.class, parser) the same deserializer will trigger.
:(

@rmannibucau
Copy link

@amoscatelli that's what I thought so let's explicit it in the spec then we should be cleaner :)

@amoscatelli
Copy link

Ok, can we also be more explicit regarding the JsonParser instance in the spec ?
I want to have something more stating that's a bug.

Also, I would like to give Johnzon a try (I am using Wildfly).
Does Johnzon support generics, type variables, parameterizedtypes at the current state ?

@rmannibucau
Copy link

@amoscatelli agree for jsonparser but can be worth another issue/PR.

about johnzon: it has some typevariable support even if not 100% complete and parameterized type support yes.

@Verdent
Copy link
Contributor

Verdent commented Mar 5, 2021

@amoscatelli I am currently rewriting and improving Yasson serialization and deserialization more or less from scratch and I am also adding support for polymorphism (so far implementation specific, but it will change later possibly) You will be able to use config and also annotations to enable polymorphism and customize it as desired, so you do not need to write anything extra by yourself and Yasson will do that for you. Hopefully I will be able to create the first RC soon.

@rmannibucau
Copy link

@Verdent we still need a portable solution since org.yasson internal imports are rarely desired when you use a spec which is primarly intended to write vendor independent apps. Can you ensure yasson works with previous code?

@Verdent
Copy link
Contributor

Verdent commented Mar 5, 2021

@rmannibucau By previous code you mean code which users created before my rework? Absolutely, only classes from internal packages are changed/removed. And I am currently even passing all of the tests (without the need to change them) and TCKs so I would say it will not break any previously created user code.

I completely agree here that implementation specific imports are usually not what users what to have and as soon as it is described in the spec, it will be changed to support annotations and interfaces from the API. If you want, I can describe here in more details how I am implementing it there and we can have a talk here if it is how we want to add it into spec or what needs to be changed.

@rmannibucau
Copy link

@Verdent this one #147 (comment)

@Verdent
Copy link
Contributor

Verdent commented Mar 5, 2021

@rmannibucau Yes I have to test that, but I would say it will work as expected. I see no reason why it shouldn't with the new version. From my point of view it might make sense to even fix that in the current version, but I will have to go though the code first. Perhaps only line ensureNext(parser, START_OBJECT); would fail since the parser will already be on the desired event. I will test that when I am back home.

@rmannibucau
Copy link

@Verdent it is one of the challenge if deserializers, if used at root it must be at start_object but then in nested objects it is not (but it is another issue than this one).

@Verdent
Copy link
Contributor

Verdent commented Mar 5, 2021

@rmannibucau But your deserializer is used at root object in this case so parser should be at START_OBJECT upon entering your deserializer. That is why I am mentioning that calling ensureNext with START_OBJECT is probably incorrect there in this particular case.

Anyway with that mentioned polymorphism support, we do support handling of the mentioned {\"type\":1,\"name\":\"first\",\"uno\":true,\"duo\":true} if you make proper configuration. In case of deserialization when custom deserializer is used to implement polymorphism, we do not support this kind of behavior yet.

@amoscatelli
Copy link

@Verdent thank you. Please let's just stick with official interfaces, please don't assume the use of any internal implementation. Frankly, I don't think Yasson should include its own JSONP implementation, let's not reinvent the wheel. Even if you have to, do not assume its utilization. Jsonb and JsonP should talk to each other just using the spec interfaces. Imho org.eclipse.yasson.internal.JsonbParser should never been there, if you have a state, it should have been kept in the Yasson DeserializationContext implementation.

DeserializationContext should accept ANY Parser created via JsonParserFactory#createParser(JsonObject).

This alone would be a step further.

Now, if you have in mind something to support Polymorphism please describe it here. I created the first topic about supporting polymorphism and it was three years ago. I believe it's important to bring it to the spec as soon as possible as we don't want to use any implementation specific API.

Johnzon has already its own Polymoprphism support so maybe we can extract a good abstraction from both of them and make it for the next spec release.

As I already stressed out, other specifications are going to use Jsonb also, and they'll need good polymorphism support.
jakarta.nosql spec will probably (and the actual implementation actually do so) use jsonb to serialize java objects for the KeyValue abstraction.

Again, thank you.

@amoscatelli
Copy link

@Verdent also, is the 2.x branch of Yasson the rework you are talking about ?

@Verdent
Copy link
Contributor

Verdent commented Mar 5, 2021

@amoscatelli I am not adding anything like JSON-P implementation. Why would I do that? :-)

Yeah I have removed org.eclipse.yasson.internal.JsonbParser, I could not get my head around why it was there in the first place.

Actually I have already described it here. But I changed it a bit in form of structure now, but the idea is the same. I guess I can make a new full description of how I feel it should look like :-)

No the reworked version will be 3.x. It is not there yet.

@m0mus
Copy link
Contributor

m0mus commented Mar 5, 2021

Folks, first of all thanks for your feedback and interest in evolving JSONB spec. It's great and I appreciate it. Let me clarify what's going on. As @Verdent mentioned already, he is working on new version of Yasson. It's a huge rework. The new version has better internal design and it's much faster than the current version. This work is in progress, but he is close to present a draft. When it's ready, he will create a new branch in Yasson project and push his work there, so all of you can review it. The new version has deserialisation support and some other new features such as externalised configuration, etc. We are trying to use implementation first approach. If these features are well accepted by the community, we will add them to the next version of JSONB spec. We plan to do it for Jakarta EE 10.

@rmannibucau
Copy link

@Verdent how does a deserializer know it is in an array or object when set on a list then? Thisbis the main issue skipping events, deserializers handle the struct behind the equivalent struct.

@Verdent
Copy link
Contributor

Verdent commented Mar 6, 2021

@rmannibucau What I am trying to say is that you are making the check which would fail in your example since parser should be on that event already. I am talking here about this particular JSON "{\"type\":1,\"name\":\"first\",\"uno\":true,\"duo\":true}" and the check in moveToType method ensureNext(parser, START_OBJECT);. You should be already on that first event. No matter what it is since implementation should check the first evet if it is not NULL_VALUE. In case of null value this custom deserializer will not be executed.

But I feel like we are getting out of the scope of this issue here.

@rmannibucau
Copy link

@Verdent I understood - and agree we can move it to another issue - but checking null does not mean not serving the full model to the deserializer (pushback logic), other impl lead to broken deserializers.

@mkarg
Copy link
Contributor

mkarg commented May 12, 2021

Is there already a public proposal how to user's view on polymorphism is like, e. g. what annotations to set etc?

@Verdent
Copy link
Contributor

Verdent commented May 12, 2021

Not yet. But I am currently making one :-)

@mkarg
Copy link
Contributor

mkarg commented May 29, 2021

OpenJDK 17 comes with sealed classes, including an extension to the Reflection API which allows to compute a complete hierarchy of possible subclasses at runtime. It would be great if JSON-B follows the "feels-like-java" approach and automatically provides polymorphism for all sealed classes without the need for any additional annotations in that case (CoC). That would be terrific!

@rmannibucau
Copy link

-1, polymorphism is about modifying the serialization shape to add a discriminator with wrapping the payload or not. Sealed classes are not only about this use case - most other trivial one is about pattern matching only, no subattribute so no discriminator in terms of shape. I'd stick on the standard solution which enables the user to decide.

@t1
Copy link

t1 commented May 30, 2021

+1 Sealed classes lend themselves perfectly for polymorphic (de)serialization.

We'd need to answer a number of questions. How do we derive the discriminator from the class name? Maybe just the simple class name? How can I override it? Maybe with an annotation @PolymorphicTypeName. And what field do we use to discriminate? type is probably a good default; what happens if there is already a type field? How can I specify a different one? When deserializing, how can I ignore, e.g., messages with an unknown type? Catching the exception is probably too slow if this happens often, or extremely cumbersome when deserializing collections or streams. Maybe by returning null or by skipping elements if it's a collection or stream, if a config option failOnUnknownPolymorphicType is set to false? Many more questions will probably pop up.

And I think that users may wonder why it works when serializing some classes but not on others, and there's no error message that could explain this (only the type field is missing). So maybe we should make this explicit with an annotation like @Polymorphic?

If we later choose to add support also for non-sealed classes, we can reuse all the annotations and conventions defined for the sealed classes.

@rmannibucau
Copy link

@t1 the opposite will happen, polymorphism is under work, sealed classes will just work as others.

@Javaexperter
Copy link

Javaexperter commented Dec 26, 2021

Using the full class name from the (Serializer) JSON is a serious security issue. An attacker could make you deserialize an arbitrary class, and some classes can cause remote code execution.
How I can have just the name for the class. like so?

[
{
"className": "Square",
"shape": {
"side": 2.0
}
},
{
"className": "Circle",
"shape": {
"radius": 5.0
}
}
]

I usesd switch but not working. can someone help me

@bmarwell
Copy link

Using the full class name from the (Serializer) JSON is a serious security issue. An attacker could make you deserialize an arbitrary class, and some classes can cause remote code execution. How I can have just the name for the class. like so?

[ { "className": "Square", "shape": { "side": 2.0 } }, { "className": "Circle", "shape": { "radius": 5.0 } } ]

I usesd switch but not working. can someone help me

Please ask on Gitter, stack overflow etc (or open a question or discussion). Hijacking issues with new topics is not the way to go.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.