Overhaul ValuesJsonConverter #109
Seems to have a slight memory and allocation bonus
There is probably a little more maintainability (and maybe in some performance) once a lot of the switching from
foreach (var type in thisAssembly.ExportedTypes)
{
    var typeInfo = type.GetTypeInfo();
What if someone has a custom type? There are devs who have extended Schema.NET with custom schema types built on top of the ones in this project.
Yeah, that is a fair call. The old converter did support any type in the "Schema.Net" namespace so if they added any types in a different assembly to that namespace they would be missed by that type caching.
Going through how the old support worked, it is a little interesting. Technically it would only support custom types if the generic arguments to `Values<>` or `OneOrMany<>` listed that type specifically. That is, the custom type wouldn't work if an interface or any parent type was used instead.
There was a loophole (I say loophole as I'm not sure it was intended this way) but if you added your type to the namespace "Schema.NET" and put the assembly name in the `@type` property (eg. "MyFancyType, MyAssembly"), it would load your custom type. I didn't pick that up at first as it just seemed like that code was just for loading built-in types. It does however work as a good security measure by requiring the namespace as the idea of loading ANY type and instantiating it could be a big security problem (some poorly written types might actually do work in their constructor or at least be very allocate-y).
So the code as it stood when you saw it supported the former - if a concrete type is used as a generic argument, it would correctly deserialize to it. To have a concrete type as the generic argument is a bit of a chicken-and-egg scenario as to have a custom property with your concrete type requires a concrete type.
What I've done in 536a53a is extend support to allow the latter - if a type is in the "Schema.NET" namespace, it will look the type up the old way. I've kept the cached type lookup as a means to mitigate some allocations from string concatenation, and lookups from the dictionary are likely still faster (though I might try a benchmark testing that specific theory out).
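A rough sketch of the lookup order described above (cache first, then the namespace-restricted "old way"), in case it helps anyone following along. `TypeLookupSketch`, `BuildCache` and `Resolve` are hypothetical names for illustration and are not the code in this PR; the namespace is passed in as a parameter so the sketch stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper, not the converter's actual implementation.
public static class TypeLookupSketch
{
    // Builds a name -> type cache once, from the exported concrete classes of one assembly.
    public static Dictionary<string, Type> BuildCache(Assembly assembly, string allowedNamespace)
    {
        var cache = new Dictionary<string, Type>(StringComparer.Ordinal);
        foreach (var type in assembly.ExportedTypes)
        {
            var typeInfo = type.GetTypeInfo();
            if (typeInfo.IsClass && !typeInfo.IsAbstract && type.Namespace == allowedNamespace)
            {
                cache[type.Name] = type;
            }
        }

        return cache;
    }

    // typeName is the raw "@type" value, e.g. "Book" or "MyFancyType, MyAssembly".
    public static Type Resolve(Dictionary<string, Type> cache, string allowedNamespace, string typeName)
    {
        if (cache.TryGetValue(typeName, out var cached))
        {
            return cached; // fast path: no string concatenation, no reflection
        }

        // Fallback: prefixing the namespace means only types declared inside it can be loaded.
        // Returns null when nothing matches.
        return Type.GetType(allowedNamespace + "." + typeName, throwOnError: false);
    }
}
```

The cached path avoids building the qualified name for every token, which is where the allocation saving would come from; whether it is measurably faster is still the open benchmark question.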
Interesting. I have never tried adding custom types myself but I have seen a few devs mention that they do so, so it would be good to continue to support them.
If we could extend support and lift the limitation on the Schema.NET namespace, I think that would be a nice improvement.
Yeah, happy to try and make support better for custom types though want to be careful we don't start instantiating any-old type otherwise it might open up security issues as we are deriving the type from the payload - might need to look into this more.
As long as we check that types inherit from `Thing`, I don't think there should be any security issues.
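A minimal sketch of that check, assuming the type has already been resolved from the payload. `IThingSketch`, `ThingSketch`, `CustomEventSketch` and `SafeActivatorSketch` are stand-ins invented for this example, not Schema.NET types. The point is that the assignability test runs before anything is instantiated.

```csharp
using System;

// Stand-ins for IThing / Thing and a third-party custom type (all hypothetical).
public interface IThingSketch { }
public class ThingSketch : IThingSketch { }
public class CustomEventSketch : ThingSketch { }

public static class SafeActivatorSketch
{
    // Only instantiates types that are concrete and assignable to IThingSketch.
    public static bool TryCreate(Type candidate, out IThingSketch instance)
    {
        instance = null;
        if (candidate == null || candidate.IsAbstract || candidate.IsInterface)
        {
            return false;
        }

        if (!typeof(IThingSketch).IsAssignableFrom(candidate))
        {
            return false; // rejected without ever running its constructor
        }

        // Assumes a public parameterless constructor; CreateInstance throws otherwise.
        instance = (IThingSketch)Activator.CreateInstance(candidate);
        return true;
    }
}
```

This still runs the constructor of any accepted type, so it narrows the risk to types that opted in by deriving from the base type rather than removing it entirely.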
}
else
else if (targetType == typeof(Uri))
I wonder if `Guid` should also be here, if we are checking a bunch of types? Not that schema.org has any Guid types (that I know of) but people might want to create custom types with Guids.
Yeah, GUID might be a good one to add and wouldn't be hard to do.
Added in d4f08f5 including test.
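For readers without the commit open, a `Guid` branch next to the `Uri` one could look roughly like this. It is a sketch only: `PrimitiveReaderSketch.TryRead` is a made-up helper, not the method changed in d4f08f5.

```csharp
using System;

// Hypothetical helper showing a Guid branch alongside the Uri branch.
public static class PrimitiveReaderSketch
{
    public static bool TryRead(string raw, Type targetType, out object value)
    {
        value = null;
        if (targetType == typeof(Uri))
        {
            if (Uri.TryCreate(raw, UriKind.Absolute, out var uri))
            {
                value = uri;
                return true;
            }
        }
        else if (targetType == typeof(Guid))
        {
            // TryParse avoids exceptions so the caller can move on to the next candidate type.
            if (Guid.TryParse(raw, out var guid))
            {
                value = guid;
                return true;
            }
        }

        return false;
    }
}
```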
Looks like AppVeyor is having some issue with its Ubuntu image which is why that is failing. Having 2 CI servers is really paying off. Switching from Linq to foreach is all good for performance. That was never a major concern for me in the beginning but I guess it should be now that the project has matured a little. There are a lot of scary looking changes in the converter. As long as the tests pass though and we can get the new tests passing, all is well. A few questions inline.
Implicit external types refers to properties where `Values` or `OneOrMany` refers to a shared interface and not the concrete type. Types must be assignable to `IThing` and will need the full namespace and assembly.
@nickevansuk - with OpenActive.NET, I noticed you've got a lot of your own types etc as well as your own converters. Is there anything I can do in Schema.NET through this overhaul of the For example: Would you find it easier to specify custom types concretely (eg.
Overriding WriteObject shouldn't be concerned if it is a single item or an array.
The duration string for timespans is handled natively in ValuesJsonConverter.
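As an illustration of what handling the duration string involves, here is a small sketch assuming the ISO 8601 duration format (for example `PT1H30M`) and using `System.Xml.XmlConvert` from the BCL. `DurationSketch` is a hypothetical helper and not necessarily how `ValuesJsonConverter` does it.

```csharp
using System;
using System.Xml;

// Hypothetical helper; shows one BCL route for ISO 8601 durations.
public static class DurationSketch
{
    // TimeSpan -> duration string.
    public static string Write(TimeSpan value) => XmlConvert.ToString(value);

    // Duration string -> TimeSpan, without letting a bad value throw to the caller.
    public static bool TryRead(string raw, out TimeSpan value)
    {
        value = default;
        if (string.IsNullOrEmpty(raw))
        {
            return false;
        }

        try
        {
            value = XmlConvert.ToTimeSpan(raw);
            return true;
        }
        catch (Exception ex) when (ex is FormatException || ex is OverflowException)
        {
            return false;
        }
    }
}
```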
I've done a few more tweaks to the reading support, adding Another change is the handling of the
This is getting very good. What's left to do?
Thank you. Now to look at your other PRs and update mine.
@Turnerj I'm really sorry I totally missed your message above as I was on annual leave at the time - it must have got buried in the notifications. Only just stumbled across this now!
Looking at this PR, it looks like it will support JSON of the following form for external types:
{
  "@type":"ExternalSchemaModelSharedNamespace, Schema.NET.Test",
  "name":"Property from Thing",
  "myCustomProperty":"My Test String"
}
I just wanted to quickly make the point that the types we're using are all defined in a JSON-LD namespace. So for OpenActive, we have a defined vocabulary https://openactive.io/ns, which is akin to the https://schema.org/ vocabulary (and in fact builds on top of it). As you can see from that link above, the terms in the vocab are defined by the OpenActive specifications, and there are libraries similar to OpenActive.NET in other languages too (PHP and Ruby currently). We also have a validator and test suite that checks for conformance. All this is to say we can't change the JSON that we're outputting - we'll need to output e.g.
This is likely to be the same requirement as anyone else who maintains a custom vocab on top of schema.org, as they're likely adhering to the same conventions. Does that make sense? Apologies if I've missed something, and sorry again for missing your message back in December!
We're looking to bump our dependency version of Schema.NET soon to stay up-to-date, so will take a look into the great work you've been doing here in more detail when we come to that. Very much appreciate you considering our use case!
Ahhhh yeah, responding just a little after I messaged you. 😂 Sometimes we simply didn't know what So the next thought might be to use Rather than keep this limitation in place (and because the core Schema.NET types were no longer going to use In the comment you've quoted from me, there is another option I proposed:
In hindsight, while that wasn't necessarily a bad suggestion, even that would have been a little cumbersome as libraries don't have an entry point to initialise something like that - you likely would have needed to have a static constructor trigger it. Thinking through this again now, we could do something a little different to better support everything and especially take into account things like vocabularies defined in the Currently (from this PR), we maintain a type dictionary we look up for Schema.NET types. If we open the cache up to hold more than built-in types and use the vocabulary as part of the key (to support conflicting names for types in different vocabs), we could do an all-assembly type search for the specified type and effectively replace our
The magic part still needs work as there doesn't seem to be a nice way to discover this without instantiating the object and even then, OpenActive (for example) doesn't actually set the Context object on instantiation to be anything but "https://schema.org" set from Schema.NET's instantiation of All this said, it is probably only worth doing if implementations extending Schema.NET (like OpenActive.NET) use our
To summarise - while yes we effectively opened the door for "MyNamespace.MyType, MyAssembly" to be supported, it does have a lot of drawbacks limiting its use. To have a more comprehensive way to find concrete types, we need to be able to know the vocabulary through reflection, ideally through a method that doesn't require instantiation - so maybe an attribute. If we did this, and OpenActive.NET used it, we could effectively load any compatible type that is in memory while also taking into account any name conflicts (eg. like how OpenActive.NET and Schema.NET both have an
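To make the attribute idea concrete, a rough sketch follows. Everything in it is hypothetical: `VocabularySketchAttribute`, the nested `Course` demo types and the context strings are invented for illustration, and neither Schema.NET nor OpenActive.NET defines such an attribute today. It reads the vocabulary through reflection, without instantiating the type, and keys the cache on vocabulary plus type name so that same-named types can coexist.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute declaring which vocabulary a type belongs to.
[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class VocabularySketchAttribute : Attribute
{
    public VocabularySketchAttribute(string context) => Context = context;

    public string Context { get; }
}

// Two demo types sharing a simple name but living in different vocabularies.
public static class SchemaOrgSketch
{
    [VocabularySketch("https://schema.org")]
    public class Course { }
}

public static class OpenActiveSketch
{
    [VocabularySketch("https://openactive.io/ns")]
    public class Course { }
}

public static class VocabularyCacheSketch
{
    public static Dictionary<(string Context, string Name), Type> Build(IEnumerable<Type> candidates)
    {
        var cache = new Dictionary<(string Context, string Name), Type>();
        foreach (var type in candidates)
        {
            // Attribute metadata is read via reflection; no instance of the type is created.
            var vocabulary = type.GetCustomAttribute<VocabularySketchAttribute>(inherit: false);
            if (vocabulary != null)
            {
                cache[(vocabulary.Context, type.Name)] = type;
            }
        }

        return cache;
    }
}
```

In a real implementation the candidates would come from scanning loaded assemblies; the sketch takes them as a parameter to stay small.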
Interesting - ok so some background on naming conflicts: the The It would be bad modelling practice for OpenActive to define a distinct term with the same name as something already in schema.org (you'll notice that https://openactive.io/ns/ does not include This makes conflict resolution straightforward (and likely the same for other similar libraries), as it's just that preference is given to the extending lib.
Sharing the I haven't yet had a chance to catch up with the rest of progress here, but it sounds like the cache has created efficiencies in deserialisation, which is great! Just thinking aloud, building on your suggestion above: If every library that inherited from schema.org also extended the SchemaSerializer class (e.g. "OpenActiveSerializer"), an additional cache could be set in the class and passed up to its base with the deserialisation call. Logic in the So more concretely, this code could run in OpenActive.NET, and then the resulting OA-specific BuiltInThingTypeLookup passed up:
Schema.NET/Source/Schema.NET/ValuesJsonConverter.cs Lines 23 to 34 in 5310072
Another option (which I've not validated for performance): to get around the registration issues you noted in your comment (which otherwise sounds like a great solution!) could the code above not run in Schema.NET to check all assemblies for anything that inherits from JsonLdObject (not all our classes inherit Thing), and add them to the cache? The preference in the cache (as per the previous comment) would mean that just the leaf classes of the same name get stored in the type tree (so OpenActive.NET.Action would be referenced over Schema.NET.Action, as it is derived from it). This might not be as efficient as the pre-populated cache plan above however... The latter suggestion would also more easily allow library users to add custom properties from their own namespaces by deriving from classes in the OA.NET/Schema.NET model and adding further properties.
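A sketch of that all-assembly scan with the leaf-class preference, under stated assumptions: `JsonLdObjectSketch` and the nested `Action` demo types are stand-ins invented for this example, and `LeafTypeCacheSketch.Build` is a hypothetical helper rather than anything in Schema.NET or OpenActive.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-in base type and two same-named classes, the second deriving from the first.
public abstract class JsonLdObjectSketch { }

public static class SchemaNetSketch
{
    public class Action : JsonLdObjectSketch { }
}

public static class OpenActiveNetSketch
{
    public class Action : SchemaNetSketch.Action { }
}

public static class LeafTypeCacheSketch
{
    public static Dictionary<string, Type> Build(Type baseType, IEnumerable<Assembly> assemblies)
    {
        var lookup = new Dictionary<string, Type>(StringComparer.Ordinal);
        foreach (var assembly in assemblies)
        {
            // Note: GetTypes can throw ReflectionTypeLoadException for partially loadable assemblies.
            foreach (var type in assembly.GetTypes())
            {
                if (!type.IsClass || type.IsAbstract || !baseType.IsAssignableFrom(type))
                {
                    continue;
                }

                // Keep the most derived class for each simple name: a new type replaces the
                // stored one only when it derives from it; unrelated clashes keep the first seen.
                if (!lookup.TryGetValue(type.Name, out var existing) || existing.IsAssignableFrom(type))
                {
                    lookup[type.Name] = type;
                }
            }
        }

        return lookup;
    }
}
```

The result does not depend on scan order for a base/derived pair, since the base type never replaces a stored derived type. In practice the assemblies would likely come from `AppDomain.CurrentDomain.GetAssemblies()`, which is where the cost concern mentioned above comes in.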
Well it definitely would make it easier to program if you reckon that is the best approach for extending it. Just to be clear though, with the case of "Action", the cache is a
Yeah there are a lot of changes - a complete refactor of how the objects are deserialized. I don't know how much you'll be able to use directly as having a look at your implementation, you've got some different functionality than we do. If Schema.NET can do the heaviest lifting of the deserialization and your code do specifics for OpenActive - even if you are still extending
Yeah that's basically what I was thinking now just combined with what you said about the 3rd party library (eg. OA) being the preference in a type lookup.
Rebuilds the core logic of the `ValuesJsonConverter` for maintainability, performance and memory use. While the changes primarily focus on reads, there are a few other things fixed too.

Firstly, this overhaul addresses the following problems outlined in #108:
- `Values<>` types (eg. `Values<int, string>` - an array can have both types)
- `OneOrMany` reading the values of an array of nullable types
- `DateTime` and `DateTimeOffset`. While the MS date format with offset (`\/Date(946730040000-0100)\/`) is still not supported, the logic for handling date values (whether as a Date token from JSON.Net or as a string) is simplified.
- `Values<>` is generally ignored. I say generally as I am still applying the same "right-to-left" rule of the generic types that the previous `ValuesJsonConverter` used. It will however attempt each type until success without throwing an exception.

Some additional changes:
- `Values<>` handled the `IEnumerable<object>` constructor which could lead it to having the same items listed multiple times if they could be assigned to multiple interfaces. Following the same "right-to-left" rule, it now systematically only adds the item to the first matching type it finds.
- `Values<string, IBook>` and the JSON didn't actually have the `@type` property, it wouldn't actually read the value in. With this overhaul, it now attempts to convert this not-known type to `IBook`'s concrete type. This resulted in test `ReadJson_Values_SingleValue_ThingInterfaceWithNoTypeToken` needing to be updated.
- `Values4` now supports implicit arrays for each of the types like the other variety of `Values<>` types do.

Performance
Serialization: Same performance with ~1% fewer allocations
Deserialization: ~30% faster with ~9% fewer allocations
BookBenchmark (Pre-Refactor)
WebsiteBenchmark (Pre-Refactor)
BookBenchmark (Post-Refactor)
WebsiteBenchmark (Post-Refactor)
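For reference, the "right-to-left" rule for the `IEnumerable<object>` constructor mentioned in the description can be pictured with a two-type sketch. `ValuesSketch<T1, T2>` is a simplified, hypothetical stand-in for `Values<T1, T2>`, and it assumes right-to-left means the last generic argument is tried first.

```csharp
using System.Collections.Generic;

// Simplified, hypothetical stand-in for Values<T1, T2>.
public sealed class ValuesSketch<T1, T2>
{
    public ValuesSketch(IEnumerable<object> items)
    {
        foreach (var item in items)
        {
            // Right-to-left: try T2 before T1 and stop at the first match,
            // so an item assignable to both is stored exactly once.
            if (item is T2 second)
            {
                Second.Add(second);
            }
            else if (item is T1 first)
            {
                First.Add(first);
            }
        }
    }

    public List<T1> First { get; } = new List<T1>();

    public List<T2> Second { get; } = new List<T2>();
}
```

With `ValuesSketch<object, string>`, a string item matches both type arguments but lands only in `Second`, which is the de-duplication behaviour the description refers to.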