Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert Schema classes generation to use source generators #252

Merged
merged 27 commits into from
Mar 15, 2021

Conversation

Turnerj
Copy link
Collaborator

@Turnerj Turnerj commented Feb 27, 2021

Switches from the previous file-based system for Schema classes to use source generators. Fixes up a bunch of tooling bugs along the way thanks to unknown/unexpected changes in the JSON files we consume.

Also closes #227 because why not! This was a fun issue because the source generator struggled to find System.Text.Json until I did some weird referencing for it - see Schema.NET.Tool.csproj for what I mean. I think though this would have happened for Newtonsoft.Json too.

At the point of time of writing this description, it does generate the classes but am having some issues referencing them. Some of our existing code is commented out, some of the old generation code still remains etc.

There is a relatively large lag time when building, about 13 seconds on my machine, which most of it comes from processing the JSON files. The actual generation of the final source is only 100ms. It only takes a second or two to download the JSON so it seems to come down to how some of the nested property loops work. We will want to speed this up to make the process easier to use.

Debugging the source generator is a pain as well. I currently have settings in the Schema.NET.csproj which have it create the raw files in the obj directory which is helpful. To actually debug the source generator, you need a snippet like the following in it:

#if DEBUG
if (!System.Diagnostics.Debugger.IsAttached)
{
    System.Diagnostics.Debugger.Launch();
}
#endif

You don't always want this there as it seems to prompt you often, even when you're not actually building (might be some weird debugging issue I was having). Also that debugging it seems to crash some behind-the-scenes processes for Visual Studio.

@Turnerj Turnerj added enhancement Issues describing an enhancement or pull requests adding an enhancement. minor Pull requests requiring a minor version update according to semantic versioning. labels Feb 27, 2021
@Turnerj
Copy link
Collaborator Author

Turnerj commented Feb 27, 2021

And it looks like I screwed up the commit I started from. I wrote git pull upstream master and not git pull upstream main - probably should update my local branches too! doh

@Turnerj
Copy link
Collaborator Author

Turnerj commented Feb 28, 2021

I've managed to improve processing performance by nearly 5x according to my profiling, primarily avoiding loops within loops within loops.

  • Schema properties now use a hash map to find existing enumerations rather than iterate all schema classes (5ca53fd)
  • Schema classes no longer search every single schema property to find theirs, it is performed once and looked up (09069a3)
  • Schema classes now leverage a hash map to find valid parents rather than iterate all other schema classes (09069a3)

While there is probably more performance we can manage to extract, the main time now is just downloading the JSON files.

@Turnerj Turnerj marked this pull request as ready for review February 28, 2021 13:21
@Turnerj Turnerj changed the title WIP: Convert Schema classes generation to use source generators Convert Schema classes generation to use source generators Feb 28, 2021
@Turnerj
Copy link
Collaborator Author

Turnerj commented Feb 28, 2021

I think all the format warnings in the build relate to the interface generation, it has a blank line at the end before the closing brace.

Copy link
Owner

@RehanSaeed RehanSaeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Turnerj, this looks like nice work.

I haven't messed with source generators much yet partly because the tooling is not ready. Is it bad enough that we should wait for better VS tooling?

There seem to be a few remaining warnings to fix. I suspect the generated code also has potential warnings, do these actually show up?

Tests/Schema.NET.Test/core/BookTest.cs Outdated Show resolved Hide resolved
Tools/Schema.NET.Tool/Schema.NET.Tool.csproj Show resolved Hide resolved
@Turnerj
Copy link
Collaborator Author

Turnerj commented Mar 1, 2021

I haven't messed with source generators much yet partly because the tooling is not ready. Is it bad enough that we should wait for better VS tooling?

Honestly, for debugging the source generator, you need to be pretty patient and do some weird hacks for debugging. For working with the library that has a working source generator, it seems to be pretty easy going.

I think even with the debugging issues, the changes make understanding the generator logic easier while opening more possibilities. I think while we will have a rough patch for tooling to catch up, I think the source generator is a net positive.

There seem to be a few remaining warnings to fix. I suspect the generated code also has potential warnings, do these actually show up?

The warnings don't show up in VS but you can see them via the command line using dotnet build. Should be able to fix them without too many issues.

@RehanSaeed
Copy link
Owner

Honestly, for debugging the source generator, you need to be pretty patient and do some weird hacks for debugging. For working with the library that has a working source generator, it seems to be pretty easy going.

I think even with the debugging issues, the changes make understanding the generator logic easier while opening more possibilities. I think while we will have a rough patch for tooling to catch up, I think the source generator is a net positive.

Have you heard anything from the VS team about when they're going to add better support? It seems pretty hacky at the moment. I really like that we no longer checkin all of the types, which will make PR's going forward a lot simpler to navigate.

The warnings don't show up in VS but you can see them via the command line using dotnet build. Should be able to fix them without too many issues.

That's not great, I've seen this occasionally but not sure what causes it. Ideally, we should get to the bottom of that.

Overall, this looks good. Tooling will get better. My main concern is the new enum format which we should get to the bottom of.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Mar 2, 2021

Have you heard anything from the VS team about when they're going to add better support? It seems pretty hacky at the moment.

In VS 16.9 Preview 3, they added the ability to view generated files through solution explorer, that is one big step to helping debug source generators. https://docs.microsoft.com/en-us/visualstudio/releases/2019/release-notes-preview#NET-Productivity3

Generated source in solution explorer

Besides that, I did find that a Roslyn dev suggests using a console application for debugging. We could do something similar, having a "Schema.NET.Generator.Debugger" console application that is similar to this: https://github.com/davidwengier/SourceGeneratorTemplate/blob/master/TestConsoleApp/Program.cs#L71

Don't know if we will have issues because of other dependencies like Newtonsoft.Json as this method seems to build an entire assembly from scratch rather than just add to an existing one.

@RehanSaeed
Copy link
Owner

Since 16.9 has been released, I'm more comfortable with going with this.

@RehanSaeed RehanSaeed added major Pull requests requiring a major version update according to semantic versioning. and removed minor Pull requests requiring a minor version update according to semantic versioning. labels Mar 4, 2021
@Turnerj
Copy link
Collaborator Author

Turnerj commented Mar 5, 2021

I just realised the tool previously included properties that are pending. For example, BankAccount has 3 custom properties, all of which are pending on Schema.org but we have them generated.

Is this intentional? Should we be generating pending properties on our normal types? We even have a test against it here:

public void ToString_SerializesBankAccountTypeFromStringAndUri_BankAccountTypeHasTwoValues()
{
var bankAccount = new BankAccount()
{
BankAccountType = new List<object>()
{
"https://example.com/1",
new Uri("https://example.com/2"),
},
};
var json =

Only noticed it while trying to "fix" my code that was meant to prevent pending types.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Mar 5, 2021

On one hand, it is cleaner to have two distinct libraries where one is non-pending and one is everything. The problem with that is that the duplicates types across the libraries can't be shared between each other.

The alternative is then to have all non-pending types but keep their pending properties. That way the "pending types" can just reference the main library and everything can be shared more easily.

The problem here is that we don't just have pending types and pending properties, we have pending property types - the values that properties can have. For example, VirtualLocation is a pending type but Action.Location is a non-pending property BUT Action.Location is allowed to be VirtualLocation. If we don't update the property type in the main library, we can't ever set a VirtualLocation as a location. If we do update the property type in the main library, it won't compile because the VirtualLocation class won't exist (and we can't reference it in our pending library because that makes a circular reference).

I think the only way to actually "solve" this is have everything in a single library. Maybe we can get the CI to auto-release new versions on a particular feed so that pending types can be up-to-date often?

@RehanSaeed
Copy link
Owner

I like your idea to have all pending types in a separate library.

Yes, I'd like to automate the versioning. Perhaps the XML feed contains a version number we can run the GitHub actions from.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Mar 6, 2021

So having pending types in a separate library isn't a problem but how do we handle pending properties or properties that can take a pending type? If the property is pending on a type that isn't pending, what do we do? If a property is updated to take a pending type, what do we do?

Splitting the assemblies covers us for pending types that extend existing types but doesn't cover us for pending properties or properties that are updated to take pending types. If we do 2 assemblies where one has no pending types/property and the other has everything (all pending and non-pending types and properties), the types won't be compatible across assemblies.

Does that make sense?

@RehanSaeed
Copy link
Owner

It certainly does make sense. I think the least worst approach is to go with two assemblies containing the all relevant types. You'd have to choose from one assembly or the other. If we go with the shared assembly approach, we either end up with missing pending properties or pending properties on released types. Neither of which seems like a good idea to me.

The enumeration list wasn't actually used and just repeated the initial operation that was done to the classes list.
Previously, for every property in every schema class, this was told to find every enumeration in the list of classes with a matching label. Instead, we can process the list once instead of N^3. A hashset also provides fast lookups of the data itself.

This alone dramatically improves performance of processing the schema data.
Previous each schema class would need to look at every other schema class. Now that processing is done once and then checked again via a HashSet. Similarly with properties, each property was iterated for every class - now that happens once with each class only looping over its properties.
Changes to ID URLs broke the tests for enumeration. URLs now start with "schema:"
This fixes the Schema URLs to be as intended with "https://schema.org/" instead of "schema:".
The "tree" JSON file no longer is needed and actually breaks the layer logic. Explains why the generated file names were missing their layers!
Note: Pending properties are no longer being generated
BankAccountType is a pending property and shouldn't be part of the main assembly
@Turnerj
Copy link
Collaborator Author

Turnerj commented Mar 15, 2021

I've removed the now invalid test that was testing a pending property (that is now no longer generated). I also looked into the warnings I mentioned earlier, they don't seem to happen on Windows but they appear on the CI under Ubuntu and MacOS.

You can see it annotated on the changed files here: https://github.com/RehanSaeed/Schema.NET/pull/252/files#diff-d9f4bad47ff6976da03a740c615555119f094fa6550ea47ce813809aaffdff9f

The warnings are just "fix formatting" without any additional information. Visual Studio doesn't mention anything and looking at the annotated code, I can't see what the warning would be about.

Besides that, I'm happy with the changes at the moment. Getting this in sets us up well for a separate PR to work on adding a second assembly with all types (inc. pending).

@RehanSaeed
Copy link
Owner

@Turnerj I'm seeing those warnings on all my repos. I'm not certain what they're all about. In theory they should fail the build since we treat warnings as errors but they don't. The warning code starts with IDE, so it's probably a warning caused by the .editorconfig file. So far, I've been putting it down to a .NET tooling bug since .editorconfig is still being worked on.

@RehanSaeed
Copy link
Owner

If you run dotnet build from CI, you do get the warning and can get an idea of what it might be. Personally, I'm happy to fix the warning once .NET fixes it's tooling and the warning also shows up in VS.

@Turnerj
Copy link
Collaborator Author

Turnerj commented Mar 15, 2021

Shall we merge these changes in then?

@RehanSaeed RehanSaeed merged commit 0cf861b into RehanSaeed:main Mar 15, 2021
@RehanSaeed
Copy link
Owner

Sure! I guess we can release now, although I've been adding nullable support to all my repos and might do this one too to fix #127.

@Turnerj Turnerj deleted the source-generators branch March 15, 2021 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Issues describing an enhancement or pull requests adding an enhancement. major Pull requests requiring a major version update according to semantic versioning.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert Schema.NET.Tool From Newtonsoft.Json to System.Text.Json
2 participants