Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax classification taking up to 12% of CPU time #68996

Closed
Youssef1313 opened this issue Jul 12, 2023 · 19 comments
Closed

Syntax classification taking up to 12% of CPU time #68996

Youssef1313 opened this issue Jul 12, 2023 · 19 comments
Assignees
Labels
Area-IDE Tenet-Performance Regression in measured performance of the product from goals.
Milestone

Comments

@Youssef1313
Copy link
Member

image

I'm not sure if addressing this comment would improve it.

// TODO: we could construct the result's ConstructedFrom lazily by using a "deep"
// construct operation here (as VB does), thereby avoiding alpha renaming in most cases.
// Aleksey has shown that would reduce GC pressure if substitutions of deeply nested generics are common.

I'll try to upload the trace soon.

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Jul 12, 2023
@Youssef1313
Copy link
Member Author

Aside from CPU time, there are lots of allocations of TypeWithAnnotations[] in AbstractTypeMap.SubstituteNamedType. That is 6.5% of allocations.

image

@Youssef1313
Copy link
Member Author

@arkalyanms arkalyanms added Tenet-Performance Regression in measured performance of the product from goals. and removed untriaged Issues and PRs which have not yet been triaged by a lead labels Aug 2, 2023
@arkalyanms arkalyanms added this to the 17.8 P4 milestone Aug 2, 2023
ToddGrun added a commit to ToddGrun/roslyn that referenced this issue Mar 18, 2024
Partial fix for dotnet#68996

Both SubstituteTypesWithoutModifiers and SubstituteNamedTypes allocate a temporary array that is used to potentially return an immutable array. Previously, this immutable array was allocated if needed, whereas we can not perform the extra allocation by utilizing ImmutableCollectionsMarshal as the array being wrapped is local to this method.
ToddGrun added a commit that referenced this issue Mar 19, 2024
Partial fix for #68996

Both SubstituteTypesWithoutModifiers and SubstituteNamedTypes allocate a temporary array that is used to potentially return an immutable array. Previously, this immutable array was allocated if needed, whereas we can not perform the extra allocation by utilizing ImmutableCollectionsMarshal as the array being wrapped is local to this method.
@ToddGrun
Copy link
Contributor

Allocations under AbstractTypeMap improved by #72588 . CPU characteristics from classification have improved significantly (due to merging classifiers and reducing frequency invalidated) since 17.8. If you are able to gather a profile against Roslyn with #72588, or against stock VS >= 17.10 Preview 3, that still shows this as an area of concern, I will gladly dig into it more to see if I can find other improvements. Thanks!

@Youssef1313
Copy link
Member Author

Youssef1313 commented Apr 5, 2024

@ToddGrun It's still super significant in my case (using 17.10 P1).

image

9.5% (over 1GB) of TypeWithAnnotations[] is too much.

@ToddGrun
Copy link
Contributor

ToddGrun commented Apr 5, 2024

@Youssef1313 -- Any chance you can share your profile so I can dig into it a bit?

@Youssef1313
Copy link
Member Author

Repro project:

UnoSampleSolution.zip

Scenario: Typing the following:

image

I'm uploading the trace. But upload speed is too slow here for me.

@Youssef1313
Copy link
Member Author

This is also the same repro that was used for #67926

@Youssef1313
Copy link
Member Author

@ToddGrun
Copy link
Contributor

ToddGrun commented Apr 5, 2024

@Youssef1313 -- I've tried reproducing your scenario with your project and I don't see nearly the allocations that you are hitting. I'm not sure if it's because the project isn't fully working for me, as I don't have a couple of the net7.0 tfms used installed or whether it's because I can't resolve a couple of the Uno.* packages. I'll keep looking, but I'm not seeing anything obvious from my debugging or looking at your trace about how to improve this.

mentioned @CyrusNajmabadi for visibility

@Youssef1313
Copy link
Member Author

@ToddGrun I think the issue should be apparent/reproducible if you got the project to build properly. Let me know if there is anything I can help with so you can repro.

@Youssef1313
Copy link
Member Author

From the trace, I'm seeing DiagnosticInfo.GetHashCode showing up taking 6.7% of CPU time. I don't think the issue is that GetHashCode itself is slow, but it's probably that it's called excessively.

My understanding is that they are created while checking whether an overload is applicable or not, and likely non-applicable overloads produce diagnostics. But, I think those diagnostics are thrown away in the end anyway, and this is probably wasted time. So it feels like some work can be saved, but I don't really see exactly what should be done.

There is also #59733 which may help with the scenario above, but seems like the PR got stale :'(

@CyrusNajmabadi
Copy link
Member

i think this is being approached from the wrong direction. Looking at teh code, there's a lambda involved, and completion is Exceedingly slow. My guess is that there's something about thsi lambda and lambda binding in general, making this rough (perhaps some sort of large overload resolution case).

@Youssef1313
Copy link
Member Author

@CyrusNajmabadi Yeah we have multiple overloads for the extension methods and overload resolution definitely has to do more work. Still, I think it shouldn't be slow to that extent. I'm not sure how to best approach the issue here.

@CyrusNajmabadi
Copy link
Member

Generally speaking, extension methods, with complex lambdas, can explode very quickly into a perf morass. even if we do any work here, it's likely that even trivial changes you make to your api could increase costs by an order of magnitude or more. So, generally speaking, i'd advise restructuring the API.

@Youssef1313
Copy link
Member Author

@CyrusNajmabadi Well, the API already shipped, but I think we can consider changes for the next major release. However, I think we'll want API changes to still be source-compatible (will be okay to only break binary compatibility I guess). I don't see though how the modified API should look like.

#68996 (comment) is also still concerning as it feels like unnecessary work being done, but not entirely sure if my concern is right, and how much improvement it will bring in practice.

@CyrusNajmabadi
Copy link
Member

Right, my point is that that is happenign due to combinatorial explosion a not-uncommon problem with complex overload resolution scenarios and lambdas.

While we could attempt to squeeze down some of these scenarios, your performance will be dominated by this. And even a single other nesting, or overload, etc. will likely introduce more orders of magnitude perf cliffs.

It's largely because of this problem-space that we added the ability for lambdas to state both their argument type and return type. And yes, the recommendatoin would be for users of the API to supply that to ensure the compiler doesn't have to go through exponential computations to figure that stuff out.

@ToddGrun
Copy link
Contributor

ToddGrun commented Apr 8, 2024

@cston from compiler who has more context on this area

@CyrusNajmabadi
Copy link
Member

However, I think we'll want API changes to still be source-compatible

Sure. But if you are using patterns that really exacerbate things, then you might need to tell your customers that their perf experience will be greatly improved if they explicitly specify the arg-types and return-types of their lambdas.

I believe (and @cston may confirm) that we may even aggressively push people to this at the compiler level for when they start running into exponential blowup.

@cston
Copy link
Member

cston commented Apr 15, 2024

We are adding an info severity diagnostic reported for lambda expression bodies that are bound many times, typically in cases where nested lambda expressions are used as arguments to overloaded or generic method calls and where the lambda parameter types are inferred: see #72823.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE Tenet-Performance Regression in measured performance of the product from goals.
Projects
None yet
Development

No branches or pull requests

5 participants