Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance by removing sync-over-async by generating sync methods using Zomp.SyncMethodGenerator #2136

Open
wants to merge 4 commits into
base: 12.x-dev
Choose a base branch
from

Conversation

virzak
Copy link
Contributor

@virzak virzak commented Aug 8, 2023

The main advantage of this approach is performance.

The performance is increased by 26%, 21%, 22%, 28%, 30% and 7% for the following benchmarks in ValidationBenchmark.

useAsync flag

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.22621
AMD Ryzen 9 5900HS with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=7.0.400-preview.23330.10
[Host] : .NET Core 6.0.20 (CoreCLR 6.0.2023.32017, CoreFX 6.0.2023.32017), X64 RyuJIT
DefaultJob : .NET Core 6.0.20 (CoreCLR 6.0.2023.32017, CoreFX 6.0.2023.32017), X64 RyuJIT

Method DataSet Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
FailFast HalfErrors 452.61 ms 19.804 ms 57.456 ms 31000.0000 - - 249.08 MB
Validate HalfErrors 613.89 ms 23.585 ms 68.425 ms 45000.0000 - - 362.54 MB
FailFast ManyErrors 23.21 ms 0.981 ms 2.766 ms 2562.5000 - - 20.5 MB
Validate ManyErrors 996.51 ms 41.464 ms 120.952 ms 55000.0000 3000.0000 - 444.06 MB
FailFast NoErrors 865.26 ms 29.054 ms 83.360 ms 44000.0000 - - 355.72 MB
Validate NoErrors 678.55 ms 42.455 ms 119.745 ms 44000.0000 - - 355.72 MB

SyncMethodGenerator

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.22621
AMD Ryzen 9 5900HS with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=7.0.400-preview.23330.10
[Host] : .NET Core 6.0.20 (CoreCLR 6.0.2023.32017, CoreFX 6.0.2023.32017), X64 RyuJIT
DefaultJob : .NET Core 6.0.20 (CoreCLR 6.0.2023.32017, CoreFX 6.0.2023.32017), X64 RyuJIT

Method DataSet Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
FailFast HalfErrors 333.36 ms 19.547 ms 57.329 ms 322.04 ms 31000.0000 - - 249.08 MB
Validate HalfErrors 484.47 ms 23.499 ms 68.547 ms 475.97 ms 45000.0000 - - 362.54 MB
FailFast ManyErrors 18.55 ms 0.754 ms 2.150 ms 17.90 ms 2562.5000 - - 20.5 MB
Validate ManyErrors 713.78 ms 36.968 ms 108.420 ms 693.34 ms 55000.0000 3000.0000 - 444.06 MB
FailFast NoErrors 603.87 ms 25.970 ms 75.756 ms 602.52 ms 44000.0000 - - 355.72 MB
Validate NoErrors 632.17 ms 39.229 ms 115.053 ms 626.33 ms 44000.0000 - - 355.72 MB

Other than that, there is less lines of code and no "acceptable" sync-over-async, which still generates a state machine. It was also throwing me off seeing "Async" (eg ValidateAsync and ValidateInternalAsync) in the stack trace when I wasn't doing anything asynchronously.

The only downside is that the DLL grew from 458 KB to 466 KB, but that's less than 2%.

@JeremySkinner
Copy link
Member

JeremySkinner commented Aug 8, 2023

Thanks for the suggestion - this is an interesting idea. My initial reaction was to say that this isn't something I'm keen on, but thinking about it further I'm open to have a discussion about it but I have a few concerns:

  • Makes it much less obvious what's happening, I prefer the explicitness of the current approach.
  • I want to avoid forcing consumers to take on additional dependency. Does this work with PrivateAssets=all in the package reference?
  • Build fails currently with the error Zomp.SyncMethodGenerator.dll' references version '4.6.0.0' of the compiler, which is newer than the currently running version '4.4.0.0'.. FluentValidation compilation is done with the .NET 7 SDK (7.0.100), targeting following TFMs: netstandard2.0;netstandard2.1;net5.0;net6.0;net7.0 (although netstandard2.0 and net5.0 will be removed in the next major release). I see that's been fixed now

@virzak
Copy link
Contributor Author

virzak commented Aug 8, 2023

In reverse order: library users won't know the difference other than performance improvement and perhaps stack trace. Looking at the new package using NuGet Package Explorer is this:

image

It seems that no source generator needs to add PrivateAssets=all

About the code being less obvious. Since library users won't know the difference, the question is how less obvious is it for contributors?

You would be correct to say that any source generator is less obvious, because sometimes you can't see a method / property unless you F12 from Visual Studio. Previous contributors might wonder what is CreateSyncVersion attribute.

On the other hand, this sync-over-async approach also had to have a paragraph with explanation of why it is acceptable and any new contributor who knows that Async shouldn't be used in synchronized methods will have a learning curve. Which is the steeper learning curve? Also the end users who see 'Async' in the stack trace might be confused. Btw, that's the exact reason for me submitting this PR.

IMO, while code clarity and obviousness can be argued either way, the boost in performance tips the scale for this change.

I'd be curious to know what was the most confusing part for you in this approach, so perhaps it could be remedied.

For example one idea I had (which isn't yet implemented):

Instead of:

#if SYNC_ONLY
    throw new AsyncValidatorInvokedSynchronouslyException();
#endif

The syntax could be rewritten as

[ThrowWhenInvokedSynchronously<AsyncValidatorInvokedSynchronouslyException>]
internal Func<ValidationContext<T>, CancellationToken, Task<bool>> AsyncCondition => _asyncCondition;

@JeremySkinner
Copy link
Member

hi @virzak thanks for getting back to me

Looking at the new package using NuGet Package Explorer is this:

What does the list of dependent packages look like?

On the other hand, this sync-over-async approach also had to have a paragraph with explanation of why it is acceptable and any new contributor who knows that Async shouldn't be used in synchronized methods will have a learning curve.

Yep all good points, it's a tradeoff, although personally I prefer having the explicitness of being able to see the method with an explanatory comment, and think this is more contributor friendly.

Also the end users who see 'Async' in the stack trace might be confused. Btw, that's the exact reason for me submitting this PR.

That could be partially solved by dropping the async suffix from the method name if its causing confusion. I'm not precious about using the async suffix for internal methods if it clears up ambiguity.

IMO, while code clarity and obviousness can be argued either way, the boost in performance tips the scale for this change.

At the end of the day I think it'll be a case of whether the performance boost balances out my discomfort or not. I used to maintain 2 explicit code paths (sync & async), although this technically performed better than the current approach, it was a massive pain to maintain and the minor performance degredation was balanced out by other performance savings made elsewhere. I'll be the one working on the codebase and maintaining this approach, so I need to be completely comfortable with it, so it needs to be something I'm comfortable with.

I'd be curious to know what was the most confusing part for you in this approach, so perhaps it could be remedied.

Generally I don't like not being able to see the method implementation, although that's a broader issue with source generators as a whole, and why I don't generally like to use them as I want to actually see the code that I'm compiling. I'm also not fond of the conditional SYNC_ONLY symbol; if the method is now purely async then I wouldn't expect to see any sync code paths in there at all. The attribute as an alternative approach is interesting, although again the more rewriting it's doing behind the scenes the less comfortable I am with it.

Also how battle tested is this in production? It looks pretty new, and for a library like FluentValidation with such a large userbase over such a long period of time I try to prioritise stability and am very cautious about taking a dependency on anything that's not been extensively battle tested.

@virzak
Copy link
Contributor Author

virzak commented Aug 8, 2023

hi @virzak thanks for getting back to me

Looking at the new package using NuGet Package Explorer is this:

What does the list of dependent packages look like?

FluentValidation only references these ones. The Zomp.SyncMethodGenerator source generator does not become a dependency.

System.Collections, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Collections.Concurrent, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Linq, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Linq.Expressions, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Runtime, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Text.RegularExpressions, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a
System.Threading, Version=7.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a

On the other hand, this sync-over-async approach also had to have a paragraph with explanation of why it is acceptable and any new contributor who knows that Async shouldn't be used in synchronized methods will have a learning curve.

Yep all good points, it's a tradeoff, although personally I prefer having the explicitness of being able to see the method with an explanatory comment, and think this is more contributor friendly.

Not sure if this is something you're aware of, but you can explicitly see the synchronized method. Generated files are created under dependencies > analysers

image

And as you edit the asynchronous one, the other gets updated right away:

image

Also the end users who see 'Async' in the stack trace might be confused. Btw, that's the exact reason for me submitting this PR.

That could be partially solved by dropping the async suffix from the method name if its causing confusion. I'm not precious about using the async suffix for internal methods if it clears up ambiguity.

Async suffix convention could be adhered to with the source generator. That way the stack trace shows Async for asynchronous calls.

IMO, while code clarity and obviousness can be argued either way, the boost in performance tips the scale for this change.

At the end of the day I think it'll be a case of whether the performance boost balances out my discomfort or not. I used to maintain 2 explicit code paths (sync & async), although this technically performed better than the current approach, it was a massive pain to maintain and the minor performance degredation was balanced out by other performance savings made elsewhere. I'll be the one working on the codebase and maintaining this approach, so I need to be completely comfortable with it, so it needs to be something I'm comfortable with.

The proposed approach is much closer to your original solution, except that the massive pain is no longer there. With you having the two editor windows side by side, you're making changes, while ensuring correctness as well.

I'd be curious to know what was the most confusing part for you in this approach, so perhaps it could be remedied.

Generally I don't like not being able to see the method implementation, although that's a broader issue with source generators as a whole, and why I don't generally like to use them as I want to actually see the code that I'm compiling. I'm also not fond of the conditional SYNC_ONLY symbol; if the method is now purely async then I wouldn't expect to see any sync code paths in there at all. The attribute as an alternative approach is interesting, although again the more rewriting it's doing behind the scenes the less comfortable I am with it.

So not seeing the compiled code has been addressed above. SYNC_ONLY symbol was a solution to where a slight variation between sync and async versions occur. It is kinda all encompassing solution, but for FluentValidation an attribute would make more sense and I'll be happy to implement it.

Also how battle tested is this in production? It looks pretty new, and for a library like FluentValidation with such a large userbase over such a long period of time I try to prioritise stability and am very cautious about taking a dependency on anything that's not been extensively battle tested.

Like all 3rd party source generators, this package is pretty new. However, it has been battle tested in my client project (closed source) for a bit less than a year. It has been a great investment, since performance is crucial there. Other than how it is used in FluentValidation, there's a lot of IAsyncEnumerable rewrites into IEnumerable.

It has nearly 100% code coverage and was recently integrated into atldotnet. Also it seems like this source generator appeals mostly to people who aren't aware of the sync-over-async approach. For example here is the discussion with the maintainer of atldotnet.

Totally understand the reluctance of integrating a library with 17 stars into a library with 8.3k stars, but this is merely a source generator, so of all 17 star projects this would probably be the safest. You also have my commitment to resolve any future issues that might occur.

@JeremySkinner
Copy link
Member

JeremySkinner commented Aug 8, 2023

Thanks for getting back to me, I’ll have a further think about it and do some experimenting too and I’ll give you a shout if I have any questions. If we go down this route then it wouldn’t be outside of a major version release anyway which would be a while away which gives me some time to fully consider any implications.

FluentValidation only references these ones.

That’s a list of assembly references. I was referring to the list of dependent package references and whether it shows up in there. Within the generated package they’re listed inside the nuspec file inside the dependencies element.

Not sure if this is something you're aware of, but you can explicitly see the synchronized method

Do you know if that’s a Visual Studio feature or if it works in Rider too? (I don’t use windows/VS)

@virzak
Copy link
Contributor Author

virzak commented Aug 8, 2023

You're right. As it is right now, dependencies include it. Not sure about the ramifications but will look into it.

<dependencies>
  <group targetFramework="net5.0">
    <dependency id="Zomp.SyncMethodGenerator" version="1.1.2" exclude="Build,Analyzers" />
  </group>
  <group targetFramework="net6.0">
    <dependency id="Zomp.SyncMethodGenerator" version="1.1.2" exclude="Build,Analyzers" />
  </group>
  <group targetFramework="net7.0">
    <dependency id="Zomp.SyncMethodGenerator" version="1.1.2" exclude="Build,Analyzers" />
  </group>
  <group targetFramework=".NETStandard2.0">
    <dependency id="System.Threading.Tasks.Extensions" version="4.5.2" exclude="Build,Analyzers" />
    <dependency id="Zomp.SyncMethodGenerator" version="1.1.2" exclude="Build,Analyzers" />
  </group>
  <group targetFramework=".NETStandard2.1">
    <dependency id="Zomp.SyncMethodGenerator" version="1.1.2" exclude="Build,Analyzers" />
  </group>
</dependencies>

Rider supports source generators, but seems a touch slower than Visual Studio.

As I typed new code in the async version, the read-only sync version updated after a slight delay.

image

Visual Studio Code does not have that support at this time, but I filed an issue.

@virzak
Copy link
Contributor Author

virzak commented Aug 8, 2023

One other consideration regarding this is: what's the ratio of Validate to ValidateAsync in a usual project? The heavier it is towards Validate, the more impactful this change will be. In my main project only Validate is being used.

@JeremySkinner
Copy link
Member

JeremySkinner commented Aug 8, 2023

My general recommendation is to use ValidateAsync as it ensures that all features are usable, and I consider the synchronous API to be legacy. My ideal would be to eventually deprecate and remove the synchronous overloads entirely, but I'm not sure that would ever actually be possible due to its prevalence (plus asp.net's validation pipeline is still synchronous, hence the recommendation not to use it anymore as per #1959)

@cremor
Copy link
Contributor

cremor commented Aug 11, 2023

You're right. As it is right now, dependencies include it. Not sure about the ramifications but will look into it.

@virzak Zomp.SyncMethodGenerator should be flagged as a developmentDependency. If you then re-add it to FluentValidation VS should create correct attributes in the <PackageReference> tag to not include it as a dependency.

My general recommendation is to use ValidateAsync as it ensures that all features are usable, and I consider the synchronous API to be legacy.

@JeremySkinner This statement, and seeing the sync-over-async code in this PR, surprised me. I always thought that having a validator that is async would be a rare case. Therefore I always preferred the sync Validate methods because I thought that this will prevent running the unnecessary async state machine code. Seems like my assumption was wrong 😓 (And therefore I support this PR 😃 )

From a code style point of view I'd also prefer using sync Validate calls by default. That way I would immediately notice that something "special" happens in a validator if it is called with ValidateAsync.

I've even already thought about how I will handle the case when the Roslyn Analyzer "CA1849: Call async methods when in an async method" is (hopefully) fixed in .NET 8 and will finally warn about async overloads that take a CancellationToken as an optional parameter. My current plan is to suppress those warnings for ValidateAsync because of the reasons listed above.

Do you know if that’s a Visual Studio feature or if it works in Rider too? (I don’t use windows/VS)

With VS you can also use "Go to definition" or "Find all references" on a partial class declaration and it will show the generated code as one of the found results. I don't use Rider, but maybe it does this too?

@JeremySkinner
Copy link
Member

hi @cremor thanks for the feedback

This statement, and seeing the sync-over-async code in this PR, surprised me

Just to be clear, this isn't "sync over async" in the way that this term is usually used. There is no risk of deadlocks, and ValueTask is optimised for synchronous completion. This approach has made the library significantly more maintainable and has eliminated several bugs that came about due to subtle divergences in the previous implementation with separate sync & async methods. From my point of view as the sole person maintaining this library, the use of the state machine is an acceptable tradeoff for the increased maintainability. From a performance perspective, my own analysis doesn't show this to be an issue, and if it had I wouldn't have switched to this approach. Also this technique is used by Microsoft internally and if it's good enough for them it's good enough for me.

The use of source generators to solve this an alternative way is interesting, but at present I have too many concerns with this approach. I will re-evaluate when I get round to working on v12.0 again.

@JeremySkinner
Copy link
Member

hi @virzak I'm revisiting this as I'm starting to think about the 12.0 release again. I've rebased your PR into the 12.x branch, but it doesn't currently build as it looks like the generated code forcibly enables nullable with a #nullable enable. Is there a way to turn this off?

@virzak
Copy link
Contributor Author

virzak commented Feb 12, 2024

Hi @JeremySkinner,

All instances of CreateSyncVersion include OmitNullableDirective. Will be able to do that automatically once dotnet/roslyn#49555 is resolved.

More on that here.
zompinc/sync-method-generator#39

@JeremySkinner JeremySkinner changed the base branch from main to 12.x-dev February 12, 2024 09:11
@JeremySkinner
Copy link
Member

@virzak thanks! Build is passing now, I'll continue to review this as part of the 12.0 development

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants