Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON/YAML: add `use_discriminator` #8406

Merged
merged 3 commits into from Nov 14, 2019

Conversation

@asterite
Copy link
Member

asterite commented Oct 30, 2019

Note: This is a PR that also serves as an RFC. I decided to go the PR route because I wanted to provide code that proves that this is possible, and show how it could be done.

What

The idea of this PR is to provide a way to deserialize a group of objects based on a field. This is very typical in JSON APIs. For example:

[
  {
    "type": "point",
    "x": 1,
    "y": 2
  },
  {
    "type": "circle",
    "x": 3,
    "y": 4,
    "radius": 5
  }
]

Here the parser expects to decide what properties each object has based on the value of the "type" property.

There's no nice way to do this. It's definitely possible but since this use case is so common it would be great to provide built-in support for it.

With this PR it would work like this (it already works ga:

require "json"

abstract class Shape
  include JSON::Serializable

  use_discriminator "type", {
    point: Point,
    circle: Circle,
  }

  property type : String
end

class Point < Shape
  property x : Int32
  property y : Int32
end

class Circle < Shape
  property x : Int32
  property y : Int32
  property radius : Int32
end

shapes = Array(Shape).from_json(%(
[
  {
    "type": "point",
    "x": 1,
    "y": 2
  },
  {
    "type": "circle",
    "x": 3,
    "y": 4,
    "radius": 5
  }
]
))
pp shapes

Output:

[#<Point:0x109b16dc0 @type="point", @x=1, @y=2>,
 #<Circle:0x109b20e70 @radius=5, @type="circle", @x=3, @y=4>]
@straight-shoota

This comment has been minimized.

Copy link
Member

straight-shoota commented Oct 30, 2019

To me this seems too advanced for the standard library. This PR only provides a very basic implementation for such a discrimination feature. There can be quite more complicated cases than just a 1:1 mapping from an identifier in a single property to a type. Identifiers can be more complex values, or split over multiple properties and you might need handling for undefined values, etc. It feels like if we start providing this, it will inevitably need to be enhanced more and more because new use cases ask for new features.

In my opinion the standard library should have a basic implementation of a JSON mapping. And that's good as it stands (obviously improvements can be made here and there). But it's better to keep it at some basic level without fancy features that may be desirable for some use cases but add lots of complexity to the base implementation.

@lribeiro

This comment has been minimized.

Copy link

lribeiro commented Oct 30, 2019

I like it.
Is there a way to leverage @type.subclasses,
And do convention over configuration here?

So in the simplest case we could define the discriminator just with?

@[JSON::Discriminator]

Nirvana would be going a step further and having

include JSON::Serializable 

Automatically select the field type if availability to be the discriminator.

Don't like the name, but nothing better comes to mind

@wontruefree

This comment has been minimized.

Copy link
Contributor

wontruefree commented Oct 30, 2019

This is a cool idea I like it. I have not used this feature before.

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Oct 30, 2019

I propose just this because the only time people wanted to do a custom deserialization was because of a discriminator. Maybe we can explore existing JSON APIs, or someone can provide the more complex cases that this simple discriminator doesn't cover.

@RX14
RX14 approved these changes Oct 31, 2019
Copy link
Member

RX14 left a comment

I love the basic concept. This is indeed a common problem, and one which should be able to be solved inside the stdlib.

{% unless mapping.is_a?(HashLiteral) || mapping.is_a?(NamedTupleLiteral) %}
{% raise "Argument to JSON::Discriminator must be a HashLiteral ot NamedTupleLiteral, not #{mapping}" %}
{% end %}
any = JSON::Any.new(pull)

This comment has been minimized.

Copy link
@RX14

RX14 Oct 31, 2019

Member

The best way to do this is to do

data = IO::Memory.new
JSON.build(data) do |builder|
  builder.start_object
  pull.read_object do |key|
    if key == {{discriminator.id.stringify}}
      discriminator_value = pull.read_string
    end
    builder.field(key) { pull.read_raw(builder) }
  end
  builder.end_object    
end
data.rewind

...

This comment has been minimized.

Copy link
@jhass

jhass Oct 31, 2019

Member

In theory we could optimize further by building a IO::Union (basically extract the IO concatenation feature from ARGF). Then break the above loop as soon as the discriminator is found and pass a JSON::PullParser.new(IO::Union.new(data, pull.io)) to the child.

This comment has been minimized.

Copy link
@asterite

asterite Oct 31, 2019

Author Member

@RX14 I think your way is the best way to do it because we retain the original JSON string. My way of using JSON::Any is a bit broken with, for example, large numbers.

This comment has been minimized.

Copy link
@RX14

RX14 Oct 31, 2019

Member

@jhass surely concatenation would be a better term for this operation? IO.concat(*ios)?

Can be a later optimization, though.

This comment has been minimized.

Copy link
@jhass

jhass Nov 1, 2019

Member

Sure, don't feel strong about the naming.

This comment has been minimized.

Copy link
@straight-shoota

straight-shoota Nov 1, 2019

Member

I suppose the discriminator field is likely the first one 80% of the time. In that case, optimization would be pretty simple, because we could essentially just delegate the pull parser to the subtype after the discriminator field has been consumed. There only needs to be some way to skip/repeat the start object event, but that's doable.
This completely avoids the entire parse-build loop.

@Blacksmoke16

This comment has been minimized.

Copy link
Contributor

Blacksmoke16 commented Oct 31, 2019

Adding my thoughts from gitter.

I would rather see it be at the type level, it's feature of the type, not a specific property. You could even implement it where it would default to type if one is not provided. I don't think the argument of "you won't forget adding the property" is enough. A compile time error could be raised if you forget it.

@vlazar

This comment has been minimized.

Copy link
Contributor

vlazar commented Oct 31, 2019

Names like serialize or stringify are quire common, how about JSON::Typeize or better yet JSON::Typify?

@watzon

This comment has been minimized.

Copy link

watzon commented Oct 31, 2019

Doesn't seem to work with this example:

module Types
  abstract class Base
    include JSON::Serializable

    @[JSON::Discriminator({
      updateAuthorizationState: Update::AuthorizationState,
      authorizationStateWaitTdlibParameters: AuthorizationState::WaitTdlibParameters
    })]
    property type : String
  end
  # authorizationState
  class AuthorizationState < Types::Base
  end

  # update
  class Update < Types::Base
  end

  # updateAuthorizationState
  class Update::AuthorizationState < Types::Update
    property authorization_state : Types::AuthorizationState
  end

  # authorizationStateWaitTdlibParameters
  class AuthorizationState::WaitTdlibParameters < Types::AuthorizationState
  end
end

json = <<-JSON
{
  "type": "updateAuthorizationState",
  "authorization_state": {
    "type": "authorizationStateWaitTdlibParameters"
  }
}
JSON

pp Types::Base.from_json(json)

Rather than being parsed as Update::AuthorizationState(authorization_state: AuthorizationState::WaitTdlibParameters) it ends up being Update::AuthorizationState(authorization_state: AuthorizationState)

@j8r

This comment has been minimized.

Copy link
Contributor

j8r commented Oct 31, 2019

Agree with @straight-shoota.
This feature will be implemented, then others too for JSON and perhaps SOAP/XML, etc.
The stdlib won't be a base set of utilities anymore.

What we definitely need is a Crystal serialization framework for this kind advanced stuff 🙏

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Oct 31, 2019

I think I found a way to do this without an annotation and without modifying the existing code base (well, just slightly to fix something else). We'll see.

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Oct 31, 2019

@watzon Could you provide working code that I can compile and run? I did some tests and all seem to work fine. I don't know what's the full hierarchy you are working with and what's the input JSON you are trying it with. It doesn't have to big an entire hierarchy, just enough to reproduce the problem.

@watzon

This comment has been minimized.

Copy link

watzon commented Oct 31, 2019

@asterite updated with a working example. Here's the runnable code https://carc.in/#/r/7xd0

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Oct 31, 2019

@watzon AuthorizationState is not abstract plus it doesn't have a Discriminator field, so the JSON serializer will deserialize it as usual. You'd have to say that this type should also use a discriminator to deserialize. In the final code that I have you would solve this by calling use_discriminator on those nested types too.

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Oct 31, 2019

It also seems authorization state is abstract but doesn't derive from any other type: https://core.telegram.org/tdlib/docs/classtd_1_1td__api_1_1_authorization_state.html . So I think it would make sense to use a discriminator just there, and avoid having that Types::Base type. Or is there such a thing in the telegram API where type can be any object in the system?

@watzon

This comment has been minimized.

Copy link

watzon commented Oct 31, 2019

Ahh ok. So make it abstract and add the discriminator to it as well?

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Oct 31, 2019

Yes, though I'm not sure it will work with the code in this PR. But it does work with the code I have in my machine.

@watzon

This comment has been minimized.

Copy link

watzon commented Oct 31, 2019

Hmm yeah, just making AuthorizationState abstract breaks things
https://carc.in/#/r/7xdb

@asterite asterite changed the title [WIP] JSON::Discriminator JSON/YAML: add `use_discriminator` Oct 31, 2019
@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Oct 31, 2019

I pushed a new commit with a totally different API. I also updated the original PR's text.

Now the PR introduces a use_discriminator macro where you pass the key and mapping. This in turn redefines the new method to deserialize based on the discriminator.

I believe this way is superior to the annotation-based one I proposed before because:

  • No need for a new annotation
  • It's unobtrusive: it will redefine a method to behave in a different way (the old code modified existing code)
  • It serves as an example for people wanting to do custom deserialization: just look at what use_discriminator does (and there's no need to use a macro in all cases, but here's it's convenient)

I also added docs and examples.

As for adding this or not, I believe this pattern is very, very common in the wild. For example it's used in GeoJSON. @watzon already showed it's used in Telegram's tdlib API. It's so common that OpenAPI 3.0 (formerly Swagger) supports it too . I'm sure there are other examples, if you know of them please let me know to support merging this PR 🙏

@wontruefree

This comment has been minimized.

Copy link
Contributor

wontruefree commented Nov 1, 2019

What about using VARY or VARY_ON kind of like a Vary header.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary

require "json"

abstract class Shape
  include JSON::Serializable

  JSON::VARY_ON "type", {
    point: Point,
    circle: Circle,
  }

  property type : String
end

class Point < Shape
  property x : Int32
  property y : Int32
end

class Circle < Shape
  property x : Int32
  property y : Int32
  property radius : Int32
end
@kostya

This comment has been minimized.

Copy link
Contributor

kostya commented Nov 1, 2019

use_discriminator in class text not so informative when you read code, may be json_discriminator. and also would it work when json and yaml together?

@straight-shoota

This comment has been minimized.

Copy link
Member

straight-shoota commented Nov 1, 2019

Maybe a better name would be def_json_discriminator similar to all the other def_ macros for defining methods.

would it work when json and yaml together?

Yes, they generate completely different method overloads.

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Nov 2, 2019

I agree about choosing a better name. But usually def_x defines an x method (def_hash, etc.) But I'm open to suggestions. use_json_discriminator is good too.

@watzon

This comment has been minimized.

Copy link

watzon commented Nov 3, 2019

@asterite with the recent change to using use_discriminator I'm getting an error:

Invalid memory access (signal 11) at address 0x7ffc4d469f18
[0x7fbb22891fc6] ???
[0x7fbb227e7f4e] ???
[0x7fbb2365f461] ???

with my massive hash. Any idea why that might be? Here is the class containing the hash, and here is where I'm trying to parse the JSON.

Edit: I deleted all but 76 of the entries in the hash and it stopped crashing. No idea why. Now I'm getting another error:

In /home/watzon/.asdf/installs/crystal/0.31.1/share/crystal/src/json/from_json.cr:11:38

 11 | def Object.from_json(string_or_io) : self
                                           ^
Error: method must return Proton::Types::RichText::Plain but it is returning Proton::Types::Base+
@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Nov 3, 2019

@watzon Two things:

  • I tried compiling the project but got:
    In src/proton.cr:4:1
    
     4 | require "./core_extensions/*"
         ^
    Error: can't find file './core_extensions/*' relative to '/Users/asterite/Sandbox/proton/src'
    
    I guess you have core_extensions locally to override how JSON works. But this will not work because in JSON::Serializable there are inherited and included hooks and there's no way to replace this. Maybe this is why you are getting a strange crash?
  • I removed that require line and used bin/crystal from the compiler at this branch, and that code compiles and runs fine.

Now I'm getting another error:

For this you need to use_discriminator in intermediate types, like Proton::Types::RichText, otherwise I think in your case it's using the parent's discriminator (which it shouldn't, but it does because you are trying to reopen and change JSON::Serializable which won't work).

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Nov 3, 2019

@kostya

and also would it work when json and yaml together?

You can do:

class SomeClass
  JSON::Serializable.use_discriminator
  YAML::Serializable.use_discriminator
end

I chose use_discriminator because in most cases you won't have both serialization strategies and use_discriminator is short and straight-forward. But I agree that a name that include json or yaml is better.

Maybe def_new_using_json_discriminator, etc.

@RX14
RX14 approved these changes Nov 13, 2019
Copy link
Member

RX14 left a comment

❤️

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Nov 13, 2019

Cool!

I think I'll rename them to use_json_discriminator and use_yaml_discriminator to make it clearer they are related to JSON/YAML, and also to avoid collisions in case you want to use both (though you could always do JSON::Serializable.use_discriminator, etc.)

`use_yaml_disciminator`
@asterite asterite added this to the 0.32.0 milestone Nov 14, 2019
@asterite asterite merged commit ef10db1 into crystal-lang:master Nov 14, 2019
5 of 6 checks passed
5 of 6 checks passed
ci/circleci: test_darwin Your tests failed on CircleCI
Details
ci/circleci: check_format Your tests passed on CircleCI!
Details
ci/circleci: test_linux Your tests passed on CircleCI!
Details
ci/circleci: test_linux32_std Your tests passed on CircleCI!
Details
ci/circleci: test_preview_mt Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@asterite asterite deleted the asterite:feature/json-discriminator branch Nov 14, 2019
@straight-shoota

This comment has been minimized.

Copy link
Member

straight-shoota commented Nov 14, 2019

In chat, @yxhuvud mentioned that all the examples (and specs) use a common superclass for the different choices. If that was necessary it might be a limiting factor for being useful in practice.

But the implementation should really work without an inheritance relationship between the individual types You just need any type to define the discriminator method on, the mapping types don't need to be related.
Maybe there should be an example in the docs and/or a spec for that?

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Nov 14, 2019

It's a good point, but I can't see how it would work without at least some inheritance relationship.

Can you provide an example, together with code for serializing and deserializing?

For example if we take the Point and Circle example, let's say I have an object with a property of that type... how can I talk about these types without enumerating them? I need a supertype. So for example if I do Array(Shape).from_json I would expect that to return Array(Shape) not an Array(Point | Circle)... if it's the latter then it'll be incompatible when assigning it to Array(Shape) if those types are not Shape themselves.

Maybe you'd like the base type to be a module? That could work, but it currently doesn't. But PRs are welcome!

@straight-shoota

This comment has been minimized.

Copy link
Member

straight-shoota commented Nov 14, 2019

This kind of discrimination is completely independent of type relationships. It's common to implement this using inheritance, but that's actually not really required.

In fact, the discriminator implementation doesn't need to be tied to a single type's from_json/from_yaml deserializer. It can just be implemented at runtime and you can call it like JSON::Serializable.discriminate(pull, "type", {point: Point, circle: Circle}). This method would just return Point | Circle and these types don't need to have a shared ancestor.

It's trivial to change the implementation to runtime. There is actually no real need for a macro.
straight-shoota@581d0ad?w=1

This is a little change that gives more options. You can use discrimination rules for different context without having to declare a common supertype for each discrimination type per context.

@asterite

This comment has been minimized.

Copy link
Member Author

asterite commented Nov 14, 2019

Nice! Well, if it works equally fine you can send a PR. I'm still not exactly sure it will always work fine at runtime, but we'll see.

@straight-shoota

This comment has been minimized.

Copy link
Member

straight-shoota commented Nov 14, 2019

The heavy lifting is actually all happening at runtime. The only difference the macro makes is to unfold the mappings so the keys are matched in a case vs. a lookup using #[]? at runtime. When supplying a NamedTuple, it results in exactly the same kind of unfold.

straight-shoota added a commit to straight-shoota/crystal that referenced this pull request Nov 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.