Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul ValuesJsonConverter #109

Merged
merged 19 commits into from Dec 19, 2019

Conversation

Turnerj
Copy link
Collaborator

@Turnerj Turnerj commented Dec 13, 2019

Rebuilds the core logic of the ValuesJsonConverter for maintainability, performance and memory use. While the changes primarily focus on reads, there are a few other things fixed too.

Firstly, this overhaul addresses the following problems outlined in #108:

  • Supports reading mixed array values for Values<> types (eg. Values<int, string> - an array can have both types)
  • OneOrMany reading the values of an array of nullable types
  • (Some) better handling around DateTime and DateTimeOffset. While the MS date format format with offset (\/Date(946730040000-0100)\/) is still not supported, the logic for handling date values (whether as a Date token from JSON.Net or as a string) is simplified.
  • The ordering of generic types in Values<> is generally ignored. I say generally as I am still applying the same "right-to-left" rule of the generic types that the previous ValuesJsonConverter used. It will however attempt each type until success without throwing an exception.

Some additional changes:

  • There was a bug around how Values<> handled the IEnumerable<object> constructor which could lead it to having the same items listed multiple times if they could be assigned to multiple interfaces. Following the same "right-to-left" rule, it now systematically only adds the item to the first matching type it finds.
  • Previously if there was something like Values<string, IBook> and the JSON didn't actually have the @type property, it wouldn't actually read the value in. With this overhaul, it now attempts to convert this not-known type to IBook's concrete type. This resulted in test ReadJson_Values_SingleValue_ThingInterfaceWithNoTypeToken needing to be updated.
  • Values4 now supports implicit arrays for each of the types like the other variety of Values<> types do.

Performance
Serialization: Same performance with ~1% fewer allocations
Deserialization: ~30% faster with ~9% fewer allocations

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4075.0), X64 RyuJIT
  Job-VTGAFA : .NET Framework 4.8 (4.8.4075.0), X64 RyuJIT
  Job-ZVNHQX : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT

BookBenchmark (Pre-Refactor)

Method Runtime Mean Error StdDev Min Max Gen 0 Gen 1 Gen 2 Allocated
Serialize .NET 4.7.2 181.2 us 2.75 us 2.29 us 177.4 us 184.2 us 15.3809 - - 47.52 KB
Deserialize .NET 4.7.2 466.6 us 10.64 us 21.97 us 438.6 us 572.1 us 38.0859 - - 119.51 KB
Serialize .NET Core 3.0 154.9 us 3.05 us 2.85 us 151.7 us 160.1 us 15.3809 - - 47.27 KB
Deserialize .NET Core 3.0 392.4 us 6.23 us 5.83 us 379.9 us 402.3 us 37.1094 - - 116.11 KB

WebsiteBenchmark (Pre-Refactor)

Method Runtime Mean Error StdDev Min Max Gen 0 Gen 1 Gen 2 Allocated
Serialize .NET 4.7.2 25.35 us 0.141 us 0.125 us 25.17 us 25.57 us 2.9907 - - 9.25 KB
Deserialize .NET 4.7.2 49.79 us 0.316 us 0.296 us 49.34 us 50.41 us 4.8218 - - 14.96 KB
Serialize .NET Core 3.0 23.42 us 0.443 us 0.455 us 22.63 us 24.01 us 2.9602 - - 9.16 KB
Deserialize .NET Core 3.0 40.84 us 0.285 us 0.238 us 40.43 us 41.20 us 4.7607 - - 14.65 KB

BookBenchmark (Post-Refactor)

Method Runtime Mean Error StdDev Min Max Gen 0 Gen 1 Gen 2 Allocated
Serialize .NET 4.7.2 166.1 us 2.79 us 2.48 us 159.4 us 170.5 us 15.1367 - - 47.12 KB
Deserialize .NET 4.7.2 327.7 us 1.33 us 1.18 us 325.6 us 330.1 us 34.6680 - - 106.57 KB
Serialize .NET Core 3.0 154.7 us 1.65 us 1.54 us 151.2 us 157.6 us 15.1367 - - 46.9 KB
Deserialize .NET Core 3.0 276.4 us 1.90 us 1.78 us 272.6 us 280.0 us 33.6914 - - 103.28 KB

WebsiteBenchmark (Post-Refactor)

Method Runtime Mean Error StdDev Min Max Gen 0 Gen 1 Gen 2 Allocated
Serialize .NET 4.7.2 24.70 us 0.373 us 0.349 us 23.98 us 25.11 us 2.9907 - - 9.25 KB
Deserialize .NET 4.7.2 35.61 us 0.632 us 0.591 us 34.40 us 36.66 us 4.4556 - - 13.75 KB
Serialize .NET Core 3.0 23.68 us 0.359 us 0.336 us 23.07 us 24.45 us 2.9602 - - 9.16 KB
Deserialize .NET Core 3.0 29.17 us 0.131 us 0.116 us 28.98 us 29.38 us 4.3640 - - 13.4 KB

@Turnerj Turnerj added the enhancement Issues describing an enhancement or pull requests adding an enhancement. label Dec 13, 2019
@Turnerj
Copy link
Collaborator Author

Turnerj commented Dec 13, 2019

There is probably a little more maintainability (and maybe in some performance) once a lot of the switching from Type and TypeInfo goes away - I might actually test that.


foreach (var type in thisAssembly.ExportedTypes)
{
var typeInfo = type.GetTypeInfo();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if someone has a custom type? There are devs who have extended Schema.NET with custom schema types built on top of the ones in this project.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is a fair call. The old converter did support any type in the "Schema.Net" namespace so if they added any types in a different assembly to that namespace they would be missed by that type caching.

Copy link
Collaborator Author

@Turnerj Turnerj Dec 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going through how the old support worked, it is a little interesting. Technically it would only support custom types if the generic arguments to Values<> or OneOrMany<> listed that type specifically. That is, the custom type wouldn't work if an interface or any parent type was used instead.

There was a loophole (I say loophole as I'm not sure it was intended this way) but if you added your type to the namespace "Schema.NET" and put the assembly name in the @type property (eg. "MyFancyType, MyAssembly"), it would load your custom type. I didn't pick that up at first as it just seemed like that code was just for loading built-in types. It does however work as a good security measure by requiring the namespace as the idea of loading ANY type and instantiating it could be a big security problem (some poorly written types might actually do work in their constructor or at least be very allocate-y).

So the code as it stood when you saw it supported the former - if a concrete type is used as a generic argument, it would correctly deserialize to it. To have a concrete type as the generic argument is a bit of a chicken-and-egg scenario as to have a custom property with your concrete type requires a concrete type.

What I've done in 536a53a is extend support to allow the latter - if a type is in the "Schema.NET" namespace, it will look the type up the old way. I've kept the cached type lookup as a means to mitigate some allocations from string concatenation and likely are still faster lookups from the dictionary (though I might try a benchmark testing that specific theory out).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I have never tried adding custom types myself but I have seen a few devs mention that they do so, so it would be good to continue to support them.

If we could extend support and lift the limitation on the Schema.NET namespace, I think that would be a nice improvement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, happy to try and make support better for custom types though want to be careful we don't start instantiating any-old type otherwise it might open up security issues as we are deriving the type from the payload - might need to look into this more.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long we check that types inherit from Thing, I don't think there should be any security issues.

}
else
else if (targetType == typeof(Uri))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if Guid should also be here, if we are checking a bunch of types? Not that schema.org has any Guid types (that I know of) but people might want to create custom types with Guids.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, GUID might be a good one to add and wouldn't be hard to do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in d4f08f5 including test.

@RehanSaeed
Copy link
Owner

Looks like AppVeyor is having some issue with it's Ubuntu image which is why that is failing. Having 2 CI servers is really paying off.

Switching from Linq to foreach is all good for performance. That was never a major concern for me in the beginning but I guess it should now be now that the project has matured a little.

There are a lot of scary looking changes in the converter. As long as the tests pass though and we can get the new tests passing, all is well.

A few questions inline.

Implicit external types refers to properties where `Values` or `OneOrMany` refers to a shared interface and not the concrete type.

Types must be assignable to `IThing` and will need the full namespace and assembly.
@Turnerj
Copy link
Collaborator Author

Turnerj commented Dec 17, 2019

@nickevansuk - with OpenActive.NET, I noticed you've got a lot of your own types etc as well as your own converters. Is there anything I can do in Schema.NET through this overhaul of the ValuesJsonConveter that will make your life easier?

For example: Would you find it easier to specify custom types concretely (eg. Values<string, MyConcreteType>), values implicitly (eg. Values<string, IThing> with the @type JSON property as "MyNamespace.MyType, MyAssembly") or would a registration system work best (where you register an assembly and it grabs all the types that are assignable to IThing - this allows you to still have the @type JSON property as simple as "MyType" rather than all the namespace and assembly nonsense)?

@Turnerj
Copy link
Collaborator Author

Turnerj commented Dec 17, 2019

I've done a few more tweaks to the reading support, adding TimeSpan support natively in ValuesJsonConverter. This support includes both time formats (eg. "12:34:56") and duration formats (eg. "PT12H34M56S"). This support allows these values to also be in an array which wasn't possible with the read support via the current TimeSpan converter. The TimeSpan converter is still used as that handles writing the duration format. Without it, TimeSpan will write as "HH:MM:SS".

Another change is the handling of the WriteObject method so that both the TimeSpan and DateTime converters are used properly even if there are multiple items. Previously, if there was an array of DateTime items to write with the DateTime converter, they would include the full date and time rather than the date only.

Copy link
Owner

@RehanSaeed RehanSaeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting very good. What's left to do?

Tests/Schema.NET.Test/ValuesJsonConverterTest.cs Outdated Show resolved Hide resolved
Source/Schema.NET/Values{T1,T2,T3,T4}.cs Show resolved Hide resolved
@RehanSaeed
Copy link
Owner

Thank you. Now to look at your other PR's and update mine.

@Turnerj Turnerj deleted the refactoring-valuesjsonconverter branch December 19, 2019 13:07
@nickevansuk
Copy link
Contributor

nickevansuk commented Jul 3, 2020

@Turnerj I'm really sorry I totally missed your message above as I was on annual leave at the time - it must have got buried in the notifications. Only just stumbled across this now!

With OpenActive.NET, I noticed you've got a lot of your own types etc as well as your own converters. Is there anything I can do in Schema.NET through this overhaul of the ValuesJsonConveter that will make your life easier?

Looking at this PR, it looks like it will support JSON of the following form for external types:

{
    "@type":"ExternalSchemaModelSharedNamespace, Schema.NET.Test",
    "name":"Property from Thing",
    "myCustomProperty":"My Test String"
}

I just wanted to quickly make the point that the types we're using are all defined in a JSON-LD namespace. So for OpenActive, we have a defined vocabulary https://openactive.io/ns, which is akin to the https://schema.org/ vocabulary (and in fact builds on top of it).

As you can see from that link above, the terms in the vocab are defined by the OpenActive specifications, and there are libraries similar to OpenActive.NET in other languages too (PHP and Ruby currently). We also have a validator and test suite that checks for conformance.

All this is to say we can't change the JSON that we're outputting - we'll need to output e.g. "@type": "SessionSeries" - as that's strictly defined within the OpenActive specifications in line with JSON-LD conventions. Instead the requirement here is to build on top of Schema.NET with additional terms (those defined here https://openactive.io/ns), which is what we do with OpenActive.NET.

This is likely to be the same requirement as anyone else who maintains a custom vocab on top of schema.org, as they're likely adhering to the same conventions.

Does that make sense? Apologies if I've missed something, and sorry again for missing your message back in December!

We're looking to bump our dependency version of Schema.NET soon to stay up-to-date, so will take a look into the great work you've been doing here in more detail when we come to that. Very much appreciate you considering our use case!

@Turnerj
Copy link
Collaborator Author

Turnerj commented Jul 4, 2020

Ahhhh yeah, responding just a little after I messaged you. 😂

Sometimes we simply didn't know what Type object is actually appropriate when deserializing - this is particularly true when deserializing to interfaces. When you have something like OneOrMany<IBook> - sure we can guess Book as a safe target type as that implements IBook but what if your @type was CustomBook which doesn't extend Book? We've effectively deserialized to the wrong type - a closely compatible but still, the wrong type. If we were to re-serialize the type, we would lose any specificity that CustomBook might have had.

So the next thought might be to use Type.GetType("{NamespacePrefix}.CustomBook"). This would be fine if that type was defined in the namespace prefix used. Unfortunately that namespace prefix is static - for Schema.NET it used to be "Schema.NET" and for OpenActive.NET (as you re-implemented ValuesJsonConverter), it is "OpenActive.NET". Having Schema.NET only support types defined in the Schema.NET namespace automatically excluded OpenActive.NET being able to use our ValuesJsonConverter as we could never deserialize to any of your custom types.

Rather than keep this limitation in place (and because the core Schema.NET types were no longer going to use Type.GetType(string typeName) anyway), I figured why have the restriction at all. This opened the door for what you've described - if you specify "MyFullNamespace.MyType, MyAssembly" in @type, we could now deserialize to it. Yes, it is cumbersome (see my comment here about details) but it was better than nothing.

In the comment you've quoted from me, there is another option I proposed:

or would a registration system work best (where you register an assembly and it grabs all the types that are assignable to IThing - this allows you to still have the @type JSON property as simple as "MyType" rather than all the namespace and assembly nonsense)?

In hindsight, while that wasn't necessarily a bad suggestion, even that would have been a little cumbersome as libraries don't have an entry point to initialise something like that - you likely would have needed to have a static constructor trigger it.

Thinking through this again now, we could do something a little different to better support everything and especially take into account things like vocabularies defined in the @context.

Currently (from this PR), we maintain a type dictionary we look up for Schema.NET types. If we open the cache up to hold more than built-in types and use the vocabulary as part of the key (to support conflicting names for types in different vocabs), we could do an all-assembly type search for the specified type and effectively replace our TryGetConcreteType method.

  • Check cache for matching vocab (@context) and type (@type)
  • On miss, search all non-System assemblies for types that match the name
  • For all types that match the name, check that IThing is assignable from that type
  • Do some magic to determine the vocab of the actual type
  • For the first type that matches the vocab and name, store the vocab, type name and type object into our cache

The magic part still needs work as there doesn't seem to be a nice way to discover this without instantiating the object and even then, OpenActive (for example) doesn't actually set the Context object on instantiation to be anything but "https://schema.org" set from Schema.NET's instantiation of JsonLdContext. I think an inherited attribute might work - we set it in Schema.NET against Thing and if OpenActive.NET set it on any of its types that extend ours, we would be able to read the context (thus the vocab) at runtime.

All this said, it is probably only worth doing if implementations extending Schema.NET (like OpenActive.NET) use our ValuesJsonConverter and then implements whatever that "magic" is for us to read the vocab from the .NET type.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Jul 4, 2020

To summarise - while yes we effectively opened the door for "MyNamespace.MyType, MyAssembly" to be supported, it does have a lot of drawbacks limiting its use.

To have a more comprehensive way to find concrete types, we need to be able to know the vocabulary through reflection, ideally through a method that doesn't require instantiation - so maybe an attribute. If we did this, and OpenActive.NET used it, we could effectively load any compatible type that is in memory while also taking into account any name conflicts (eg. like how OpenActive.NET and Schema.NET both have an Action type).

@nickevansuk
Copy link
Contributor

nickevansuk commented Jul 4, 2020

Interesting - ok so some background on naming conflicts: the @context aliases names so they don't require a namespace. For example: oa:SessionSeries is aliased to SessionSeries.

The Action type is actually schema:Action in both OpenActive.NET and Schema.NET - it's just that the OpenActive.NET class includes the additional properties that OpenActive defines against schema:Action, and restricts the properties that OpenActive use within schema.org to a stricter set of types (e.g. OpenActive defines everything as an array, or singular - in schema.org everything can be either). Hence why OpenActive.NET.Action is derived from Schema.NET.Action.

It would be bad modelling practice for OpenActive to define a distinct term with the same name as something already in schema.org (you'll notice that https://openactive.io/ns/ does not include Action - and even if it did, that type would need to be referenced in the JSON using the oa: prefix defined within the @context i.e "@type": "oa:Action" to differentiate.)

This makes conflict resolution straightforward (and likely the same for other similar libraries), as it's just that preference is given to the extending lib.

@nickevansuk
Copy link
Contributor

nickevansuk commented Jul 4, 2020

Sharing the ValuesJsonConverter (and as much of the rest of the serialisation framework as possible) certainly makes sense from a maintainability point of view.

I haven't yet had chance to catch up with the rest of progress here, but it sounds like the cache has created efficiencies in deserialisation, which is great!

Just thinking aloud, building on your suggestion above:

If every library that inherited from schema.org also extended the SchemaSerializer class (e.g. "OpenActiveSerializer"), an additional cache could be set in the class and passed up to its base with the deserialisation call. Logic in the ValuesJsonConverter could just attempt to match against each extending lib's cache in order of preference (as per previous comment).

So more concretely, this code could run in OpenActive.NET, and then the resulting OA-specific BuiltInThingTypeLookup passed up:

static ValuesJsonConverter()
{
var thisAssembly = ThingInterfaceTypeInfo.Assembly;
foreach (var type in thisAssembly.ExportedTypes)
{
var typeInfo = type.GetTypeInfo();
if (typeInfo.IsClass && ThingInterfaceTypeInfo.IsAssignableFrom(typeInfo))
{
BuiltInThingTypeLookup.Add(type.Name, type);
}
}
}

Another option (which I've not validated for performance): to get around the registration issues you noted in your comment (which otherwise sounds like a great solution!) could the code above not run in Schema.NET to check all assemblies for anything that inherits from JsonLdObject (not all our classes inherit Thing), and add them to the cache? The preference in the cache (as previous comment) would mean that just the leaf classes of the same name get stored in the type tree (so OpenActive.NET.Action would be referenced over Schema.NET.Action, as it is derived from it)? This might not be as efficient as the pre-populated cache plan above however...

The latter suggestion would also more easily allows for library users to add custom properties from their own namespaces by deriving from classes in the model OA.NET/scheme.net model and adding further properties

@Turnerj
Copy link
Collaborator Author

Turnerj commented Jul 4, 2020

This makes conflict resolution straightforward (and likely the same for other similar libraries), as it's just that preference is given to the extending lib.

Well it definitely would make it easier to program if you reckon that is the best approach for extending it. Just to be clear though, with the case of "Action", the cache is a Dictionary<string, Type> so if we preference the extending library, we could be inadvertently deserializing to say OA "Action" rather than Schema.NET "Action". Would it actually be beneficial to then have conflicts actually be deserialization targets? Like we could "cache" all OA types that don't conflict with existing ones which should then technically work (unless you see a problem here).

Sharing the ValuesJsonConverter (and as much of the rest of the serialisation framework as possible) certainly makes sense from a maintainability point of view.

I haven't yet had chance to catch up with the rest of progress here, but it sounds like the cache has created efficiencies in deserialisation, which is great!

Yeah there are a lot of changes - a complete refactor of how the objects are deserialized. I don't know how much you'll be able to use directly as having a look at your implementation, you've got some different functionality than we do.

If Schema.NET can do the heaviest lifting of the deserialization and your code do specifics for OpenActive - even if you are still extending ValuesJsonConverter itself - then you can hopefully more "effortlessly" pull in some of the improvements and fixes we do on our end.

Another option (which I've not validated for performance): to get around the registration issues you noted in your comment (which otherwise sounds like a great solution!) could the code above not run in Schema.NET to check all assemblies for anything that inherits from JsonLdObject (not all our classes inherit Thing), and add them to the cache? The preference in the cache (as previous comment) would mean that just the leaf classes of the same name get stored in the type tree (so OpenActive.NET.Action would be referenced over Schema.NET.Action, as it is derived from it)? This might not be as efficient as the pre-populated cache plan above however...

The latter suggestion would also more easily allows for library users to add custom properties from their own namespaces by deriving from classes in the model OA.NET/scheme.net model and adding further properties

Yeah that's basically what I was thinking now just combined with what you said about the 3rd party library (eg. OA) being the preference in a type lookup.

@Turnerj Turnerj mentioned this pull request Sep 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Issues describing an enhancement or pull requests adding an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants