
[Question] Tips and tricks for performance #25259

Closed · Evangelink opened this issue Mar 6, 2018 · 20 comments

@Evangelink (Member)

Hi folks,

We have noticed quite a lot of performance issues with our analyzers, but we cannot find any good way to measure or profile them. I know there is a benchmarking project in progress, but in the meantime I am wondering if you would have some tips on things to avoid, or practices to enforce, when developing analyzers so that we avoid the big performance pitfalls.

For example, when checking for a type, is it recommended to check the string value before getting to the symbol? Should we avoid using FirstAncestorOrSelf<T>, ChildNodes().Where, or DescendantNodes().Where...?

Cheers

@Evangelink (Member, Author)

So far I have only found the following:

  • Moving from ImmutableHashSet to HashSet to store members to check resulted in a pretty big performance improvement.
  • Using Lazy<T> instead of recomputing the result of GetHashCode multiple times also improved performance.
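A minimal sketch of both ideas (the class and member names here are hypothetical, not from the original comment):

using System;
using System.Collections.Generic;

internal static class KnownMembers
{
    // A plain HashSet<T> answers Contains faster than ImmutableHashSet<T>,
    // which matters when the set is built once and only ever read.
    // Lazy<T> defers the construction cost to first use and ensures it is
    // paid only once rather than in every analyzer callback.
    private static readonly Lazy<HashSet<string>> s_names =
        new Lazy<HashSet<string>>(() =>
            new HashSet<string>(StringComparer.Ordinal) { "Equals", "GetHashCode", "ToString" });

    public static bool IsKnown(string name) => s_names.Value.Contains(name);
}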

@CyrusNajmabadi (Member)

Can you just run a .NET profiler to see what's expensive in your scenarios? I'm personally a fan of dotTrace.

@jmarolf (Contributor) commented Mar 26, 2018

Analyzer Performance

How to profile

You can look at StyleCopTester, a program written for testing the performance of the StyleCopAnalyzers project.

You can also attach a profiler to Visual Studio, as @CyrusNajmabadi suggested.

General Tips

Enable Concurrent Execution

There is an API in newer versions of Roslyn that allows you to do analysis in parallel. You need to call the EnableConcurrentExecution method on the context object to let the analysis engine know that it is OK to run your analyzer in parallel.
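For reference, opting in looks like this inside a DiagnosticAnalyzer (a minimal sketch; the rule ID and callback are placeholders):

using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class MyAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor s_rule = new DiagnosticDescriptor(
        "EX0001", "Example title", "Example message", "Usage",
        DiagnosticSeverity.Warning, isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics =>
        ImmutableArray.Create(s_rule);

    public override void Initialize(AnalysisContext context)
    {
        // Let the engine run this analyzer's callbacks on multiple threads.
        context.EnableConcurrentExecution();

        // Commonly set alongside it: don't analyze generated code.
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);

        context.RegisterSyntaxNodeAction(
            ctx => { /* analysis here */ },
            SyntaxKind.InvocationExpression);
    }
}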

Syntax over Semantics

The syntax APIs are far less expensive than the semantic APIs. This means that if you can do all or most of your analysis on syntax nodes, things will be fast. Most of our analyzers follow the pattern of doing quick syntactic checks so we bail out early before doing more expensive semantic work.
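As a sketch of that pattern (the "Dispose" rule here is hypothetical), a node callback can reject most candidates on raw syntax before ever touching the semantic model:

using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

// Lives inside a DiagnosticAnalyzer; registered in Initialize via
// context.RegisterSyntaxNodeAction(AnalyzeInvocation, SyntaxKind.InvocationExpression).
private static void AnalyzeInvocation(SyntaxNodeAnalysisContext context)
{
    var invocation = (InvocationExpressionSyntax)context.Node;

    // Purely syntactic check: no semantic model involved yet. The vast
    // majority of invocations exit here almost for free.
    if (!(invocation.Expression is MemberAccessExpressionSyntax memberAccess) ||
        memberAccess.Name.Identifier.ValueText != "Dispose")
    {
        return;
    }

    // Only the few surviving candidates pay for binding.
    var symbol = context.SemanticModel
        .GetSymbolInfo(invocation, context.CancellationToken).Symbol;
    // ... semantic checks on 'symbol' go here ...
}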

Prefer Stateless Analyzers over Stateful Analyzers

If you register a CompilationStartAction and then get called back on your CompilationAction, you are allowed to collect stateful information such as symbols in between. The problem is that this is also the slowest way to gather information. CompilationActions are expected to be very expensive, so they are called infrequently in the IDE. If you have a non-stateful way of analyzing the code, things will be faster (see the sketch after the list below). Below is the general cadence for when analyzers are called, based on what they have registered:

Very Fast, only parsing needs to be done

  • SyntaxNodeAction
  • SyntaxTreeAction

Not as Fast, some binding will need to be done

  • SymbolAction
  • SemanticModelAction
  • OperationAction
  • CodeBlockAction
  • OperationBlockAction

Slow, everything needs to be done

  • CompilationAction
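For completeness, the stateful shape described above looks roughly like this inside Initialize (a sketch; the collect-all-named-types logic is purely illustrative):

using System.Collections.Concurrent;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

// Stateful pattern (the slow path described above): collect information
// across the whole compilation, then report at the end. The end action
// cannot run until everything else has finished.
context.RegisterCompilationStartAction(startContext =>
{
    var namedTypes = new ConcurrentBag<INamedTypeSymbol>();

    startContext.RegisterSymbolAction(
        symbolContext => namedTypes.Add((INamedTypeSymbol)symbolContext.Symbol),
        SymbolKind.NamedType);

    startContext.RegisterCompilationEndAction(endContext =>
    {
        // Diagnostics based on everything collected above go here.
        // ...
    });
});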

@SamPruden

@jmarolf that information is great, but horribly buried here in an obscure issue. Is it available in any other documentation anywhere? It really should be, but I've never found it.

I have a related question. I've seen vague references to the operation API being faster than symbols because it somehow requires less binding, but never found a good resource that discusses this in any detail. Is that true, and if so, in what way? The operations seem to mostly just provide references to symbols, so I don't understand how they can really be an alternative.

I prefer to use IOperation whenever I can anyway because the API is nicer, but am I potentially missing out on any performance tricks where I should be doing specific things to avoid referencing symbols or something?

@jmarolf (Contributor) commented Sep 25, 2019

@SamPruden IOperation will be slower than syntax-only analysis. If you know you are going to analyze symbols, IOperation is perfectly fine to use.

@jmarolf added the Question and Resolution-Answered (the question has been answered) labels Sep 25, 2019
@jmarolf closed this as completed Sep 25, 2019
@jmarolf removed this from the Backlog milestone Sep 25, 2019
@SamPruden commented Sep 25, 2019

> @SamPruden IOperation will be slower than syntax-only analysis. If you know you are going to analyze symbols, IOperation is perfectly fine to use.

I wasn't comparing it to syntax, which is obviously the fastest. I've seen it suggested that the IOperation APIs can be significantly faster than the symbol APIs. Is that not true? It seemed a little implausible to me, as operations appear to be wrappers around symbols, but I've heard a number of people suggest it and there doesn't seem to be clear documentation on these things.

@CyrusNajmabadi (Member) commented Sep 25, 2019

> I've seen it suggested that IOperation APIs can be significantly faster than the symbol APIs

> but I've heard a number of people suggest it and there doesn't seem to be clear documentation on these things.

Who has suggested it? Please point out the places where this has been stated so we can investigate.

@SamPruden

I've come across a few passing references when googling discussions around Roslyn analyzer performance. These two are some I can remember:

roslyn-analyzers/1075

> There may be performance gains with using IOperation APIs over syntax / semantic APIs.

roslyn-analyzers/2103

> This analyzer currently invokes GetSymbolInfo on every single invocation in the compilation, which seems to be very costly. We should just convert it to an IOperation analyzer, as it just cares about the containing type of every bound invocation.

Without being intimately familiar with the implementation details of these things, it's hard to judge whether these imply that there are some specific guidelines we should be following for performance reasons.

It's not clear to me which (if any) APIs trigger additional expensive binding operations or other forms of lazy loading. The text of 2103 seems to imply that less binding is required when using the IOperation API. I've not seen clear documentation on how incremental updates to the semantic model work in VS, so it's hard to judge how costly things are.

@jmarolf (Contributor) commented Sep 25, 2019

@CyrusNajmabadi I believe that comment was here.

@SamPruden IOperation and the symbol APIs are sufficiently different that I don't think it's fair to say that one is faster than the other. We need to bind to give you the IOperation nodes. There may be cases where using IOperation is faster than calling GetSymbolInfo many times and forcing us to bind many times. However, there are certainly cases where creating the IOperation nodes is going to be more expensive than getting a single SymbolInfo.

@CyrusNajmabadi (Member)

> I've come across a few passing references when googling discussions around Roslyn analyzer performance. These two are some I can remember:

Looking at one case, this appears to be algorithmic. I.e., in the original code it needed to do this:

// For every node visited, walk the entire parent chain above it,
// binding each ancestor along the way.
while (cur.Parent != null)
{
    SyntaxNode pNode = cur.Parent;
    ISymbol sym = pNode.GetDeclaredOrReferencedSymbol(model);
    // ... examine sym ...
    cur = pNode;
}

meaning that not only did it hit every node and ask it for the symbol, it would then walk up the entire node chain above it. That's at minimum O(n log n) (or O(n²)) to look at the 'tree of information'. The point here is that IOp already gives you the tree, so instead of having to go figure it out again, you can just use it in its entirety.

So, if you need to just look at a node and its possible meaning, do so. If you need to look at a whole lot of surrounding nodes to figure out what's going on, you might as well just use IOp since it literally just gives that to you :)

@SamPruden

> I believe that comment was here.

I forgot about that, but I'd seen that too, yes.

Thanks for that info, that's helpful. I'm particularly curious whether accessing the symbols provided by operations has any significant cost.

As an example, I have an OperationVisitor called for every operation in blockContext.OperationBlocks[i].Descendants(), with a method that checks some things about uses of overloaded operators. It looks something like this:

public override void VisitBinaryOperator(IBinaryOperation operation)
{
    var method = operation.OperatorMethod;

    // Skip built-in operators
    if (method == null) return;

    // Actual analysis happens here
}

I know that visiting every operation in the block probably seems crazy, but trust me, I do need to do that.

Might reading operation.OperatorMethod have any significant cost in this scenario? I'm wondering whether I should be falling back to operation.Syntax to do some basic early-out checks before accessing the symbols. I could, for example, immediately give up if the syntax refers to an operator that is not overloadable.

This general pattern is one that comes up with many different operations throughout my analyzers, so I'm curious about general tips rather than just this specific example.

@CyrusNajmabadi (Member)

Other explanations in dotnet/roslyn-analyzers#2103

> Seems like CA3075 is doing an extremely large number of invocations of context.Compilation.GetSemanticModel, GetSymbolInfo, and the GetDeclaredOrReferencedSymbol helper. All of these have extremely bad performance implications.

The primary concern here is that semantic models are our caching mechanism, and the analyzer infrastructure for node and operation analysis is already written to only look at one at a time. If, during its work, an analyzer goes and creates other semantic models, this is not good for perf: it causes all the information to be recreated for that other semantic model, only to be thrown away when the analyzer returns to the analysis engine, before the engine calls into the next node. Doing this over and over again defeats the purpose of our analysis passes, where we attempt to create a semantic model only once per file and then use that same semantic model for all the analyzers that want to examine that file. This approach allows the information computed by any analyzer to be cached and reused by other analyzers.

Note: this benefit applies to analyzers doing GetSymbolInfo or GetOperation. It is primarily about ensuring that we only create one semantic model per document, not unnecessary ones over and over again.
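In code, the difference is between manufacturing a fresh model inside a callback and reusing the one the engine hands you (a sketch; registered from Initialize):

using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Diagnostics;

context.RegisterSyntaxNodeAction(ctx =>
{
    // Anti-pattern: manufactures a brand-new semantic model whose cached
    // binding information is thrown away when this callback returns.
    SemanticModel freshModel = ctx.SemanticModel.Compilation.GetSemanticModel(ctx.Node.SyntaxTree);

    // Preferred: the engine hands every analyzer the same cached model for
    // this file, so binding work done here is shared across analyzers.
    SemanticModel sharedModel = ctx.SemanticModel;

    SymbolInfo info = sharedModel.GetSymbolInfo(ctx.Node, ctx.CancellationToken);
    // ...
}, SyntaxKind.InvocationExpression);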

@CyrusNajmabadi (Member)

> Thanks for that info, that's helpful. I'm particularly curious whether accessing the symbols provided by operations has any significant cost.

It does not.

@CyrusNajmabadi (Member)

> I know that visiting every operation in the block probably seems crazy

It's not crazy. All this will do is allocate some memory as the operation-tree is hydrated out of the internal compiler 'bound nodes'.

@CyrusNajmabadi (Member)

> I'm wondering whether I should be falling back to operation.Syntax to do some basic early-out checks before accessing the symbols

Yes. If you can avoid semantic work based on easy-to-check things in syntax, that's almost always a good idea.

@CyrusNajmabadi (Member)

> I could for example immediately give up if the syntax refers to an operator that is not overloadable.

Yes. If you can algorithmically do less work, then do less work :)

@CyrusNajmabadi (Member)

> This general pattern is one that comes up with many different operations throughout my analyzers, so I'm curious about general tips rather than just this specific example.

General tips:

  1. Try to be entirely syntax-based, if possible.
  2. If you need syntax and semantics, do as much syntax checking as possible, then move to semantics.
  3. If you can do easy checks that will cause you to bail out, front-load them before doing more complex algorithmic work.
  4. Optimize for the common case. If the common case is that nearly all code you check won't trigger your diagnostic, then check and bail. In other words, only once you are pretty confident the code needs to be checked, and needs to incur a more algorithmically expensive walk, should you do that.

In general, checking syntax local to you is nearly free. If you do need to walk, know that trees can be arbitrarily large.

In general, doing a few local semantic checks is not that expensive. If you do need to walk, though, this will scale up accordingly.

In general, things are affected by the number of calls you make, not the type of check you're doing. So if you need to make 1000 checks because of the size of the tree, that's more expensive than a handful of checks done locally. The more checks you scale to, the worse your system will do.
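A sketch of tips 3 and 4 combined (the specific conditions are hypothetical): front-load cheap, local checks so the common case bails out before any expensive walk:

using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

private static void AnalyzeMethod(SyntaxNodeAnalysisContext context)
{
    var method = (MethodDeclarationSyntax)context.Node;

    // Cheap, local, syntactic bail-outs first: the common case is that a
    // method is not interesting, and it should exit here almost for free.
    if (method.Body is null || method.AttributeLists.Count == 0)
    {
        return;
    }

    // Only the rare candidates incur the algorithmically expensive part:
    // walking a subtree that can be arbitrarily large.
    foreach (SyntaxNode node in method.Body.DescendantNodes())
    {
        // ... heavier analysis ...
    }
}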

@SamPruden

Thank you for all of the info!

> Thanks for that info, that's helpful. I'm particularly curious whether accessing the symbols provided by operations has any significant cost.
>
> It does not.

> I'm wondering whether I should be falling back to operation.Syntax to do some basic early-out checks before accessing the symbols
>
> Yes. If you can avoid semantic work based on easy-to-check things in syntax, that's almost always a good idea.

You've got me a little bit confused here. If accessing operation.OperatorMethod doesn't do any work, why is it less work to access operation.Syntax and do that check than to just null-check operation.OperatorMethod? What I'm still unclear on is whether falling back to syntax checks still makes sense when I've already got the IOperation.

In my particular case I do need to be at the operation level for everything, because I unavoidably access and assess operation.Type for every single operation in the block. I definitely can't be syntax based for everything.

@CyrusNajmabadi (Member)

> You've got me a little bit confused here. If accessing operation.OperatorMethod doesn't do any work, why is it less work to access operation.Syntax

Sorry, I read it as "I'm wondering whether I should be falling back to operation.Syntax"

@CyrusNajmabadi (Member)

> What I'm still unclear on is whether falling back to syntax checks still makes sense when I've already got the IOperation.

Depends. If your check is a syntactic check, then certainly use syntax for it...
