-
-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert Schema classes generation to use source generators #252
Conversation
And it looks like I screwed up the commit I started from. I wrote |
702c819
to
29cbd5b
Compare
I've managed to improve processing performance by nearly 5x according to my profiling, primarily avoiding loops within loops within loops.
While there is probably more performance we can manage to extract, the main time now is just downloading the JSON files. |
I think all the format warnings in the build relate to the interface generation, it has a blank line at the end before the closing brace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Turnerj, this looks like nice work.
I haven't messed with source generators much yet partly because the tooling is not ready. Is it bad enough that we should wait for better VS tooling?
There seem to be a few remaining warnings to fix. I suspect the generated code also has potential warnings, do these actually show up?
Honestly, for debugging the source generator, you need to be pretty patient and do some weird hacks for debugging. For working with the library that has a working source generator, it seems to be pretty easy going. I think even with the debugging issues, the changes make understanding the generator logic easier while opening more possibilities. I think while we will have a rough patch for tooling to catch up, I think the source generator is a net positive.
The warnings don't show up in VS but you can see them via the command line using |
Have you heard anything from the VS team about when they're going to add better support? It seems pretty hacky at the moment. I really like that we no longer checkin all of the types, which will make PR's going forward a lot simpler to navigate.
That's not great, I've seen this occasionally but not sure what causes it. Ideally, we should get to the bottom of that. Overall, this looks good. Tooling will get better. My main concern is the new enum format which we should get to the bottom of. |
In VS 16.9 Preview 3, they added the ability to view generated files through solution explorer, that is one big step to helping debug source generators. https://docs.microsoft.com/en-us/visualstudio/releases/2019/release-notes-preview#NET-Productivity3 Besides that, I did find that a Roslyn dev suggests using a console application for debugging. We could do something similar, having a "Schema.NET.Generator.Debugger" console application that is similar to this: https://github.com/davidwengier/SourceGeneratorTemplate/blob/master/TestConsoleApp/Program.cs#L71 Don't know if we will have issues because of other dependencies like Newtonsoft.Json as this method seems to build an entire assembly from scratch rather than just add to an existing one. |
Since 16.9 has been released, I'm more comfortable with going with this. |
I just realised the tool previously included properties that are pending. For example, Is this intentional? Should we be generating pending properties on our normal types? We even have a test against it here: Schema.NET/Tests/Schema.NET.Test/MixedTypesTest.cs Lines 73 to 83 in 30068b7
Only noticed it while trying to "fix" my code that was meant to prevent pending types. |
On one hand, it is cleaner to have two distinct libraries where one is non-pending and one is everything. The problem with that is that the duplicates types across the libraries can't be shared between each other. The alternative is then to have all non-pending types but keep their pending properties. That way the "pending types" can just reference the main library and everything can be shared more easily. The problem here is that we don't just have pending types and pending properties, we have pending property types - the values that properties can have. For example, I think the only way to actually "solve" this is have everything in a single library. Maybe we can get the CI to auto-release new versions on a particular feed so that pending types can be up-to-date often? |
I like your idea to have all pending types in a separate library. Yes, I'd like to automate the versioning. Perhaps the XML feed contains a version number we can run the GitHub actions from. |
So having pending types in a separate library isn't a problem but how do we handle pending properties or properties that can take a pending type? If the property is pending on a type that isn't pending, what do we do? If a property is updated to take a pending type, what do we do? Splitting the assemblies covers us for pending types that extend existing types but doesn't cover us for pending properties or properties that are updated to take pending types. If we do 2 assemblies where one has no pending types/property and the other has everything (all pending and non-pending types and properties), the types won't be compatible across assemblies. Does that make sense? |
It certainly does make sense. I think the least worst approach is to go with two assemblies containing the all relevant types. You'd have to choose from one assembly or the other. If we go with the shared assembly approach, we either end up with missing pending properties or pending properties on released types. Neither of which seems like a good idea to me. |
The source generator references seem to be applying now
"Generator" classes are the objects the generator directly consumes
Source generators don't have a console to output to so we couldn't see the message if we wanted to
The enumeration list wasn't actually used and just repeated the initial operation that was done to the classes list.
Previously, for every property in every schema class, this was told to find every enumeration in the list of classes with a matching label. Instead, we can process the list once instead of N^3. A hashset also provides fast lookups of the data itself. This alone dramatically improves performance of processing the schema data.
Previous each schema class would need to look at every other schema class. Now that processing is done once and then checked again via a HashSet. Similarly with properties, each property was iterated for every class - now that happens once with each class only looping over its properties.
Changes to ID URLs broke the tests for enumeration. URLs now start with "schema:"
This fixes the Schema URLs to be as intended with "https://schema.org/" instead of "schema:".
The "tree" JSON file no longer is needed and actually breaks the layer logic. Explains why the generated file names were missing their layers!
Note: Pending properties are no longer being generated
BankAccountType is a pending property and shouldn't be part of the main assembly
274f45e
to
f5f46af
Compare
I've removed the now invalid test that was testing a pending property (that is now no longer generated). I also looked into the warnings I mentioned earlier, they don't seem to happen on Windows but they appear on the CI under Ubuntu and MacOS. You can see it annotated on the changed files here: https://github.com/RehanSaeed/Schema.NET/pull/252/files#diff-d9f4bad47ff6976da03a740c615555119f094fa6550ea47ce813809aaffdff9f The warnings are just "fix formatting" without any additional information. Visual Studio doesn't mention anything and looking at the annotated code, I can't see what the warning would be about. Besides that, I'm happy with the changes at the moment. Getting this in sets us up well for a separate PR to work on adding a second assembly with all types (inc. pending). |
@Turnerj I'm seeing those warnings on all my repos. I'm not certain what they're all about. In theory they should fail the build since we treat warnings as errors but they don't. The warning code starts with |
If you run |
Shall we merge these changes in then? |
Sure! I guess we can release now, although I've been adding nullable support to all my repos and might do this one too to fix #127. |
Switches from the previous file-based system for Schema classes to use source generators. Fixes up a bunch of tooling bugs along the way thanks to unknown/unexpected changes in the JSON files we consume.
Also closes #227 because why not! This was a fun issue because the source generator struggled to find System.Text.Json until I did some weird referencing for it - see
Schema.NET.Tool.csproj
for what I mean. I think though this would have happened for Newtonsoft.Json too.At the point of time of writing this description, it does generate the classes but am having some issues referencing them. Some of our existing code is commented out, some of the old generation code still remains etc.
There is a relatively large lag time when building, about 13 seconds on my machine, which most of it comes from processing the JSON files. The actual generation of the final source is only 100ms. It only takes a second or two to download the JSON so it seems to come down to how some of the nested property loops work. We will want to speed this up to make the process easier to use.
Debugging the source generator is a pain as well. I currently have settings in the
Schema.NET.csproj
which have it create the raw files in theobj
directory which is helpful. To actually debug the source generator, you need a snippet like the following in it:You don't always want this there as it seems to prompt you often, even when you're not actually building (might be some weird debugging issue I was having). Also that debugging it seems to crash some behind-the-scenes processes for Visual Studio.