Fix #12210 (Cppcheck hang in SymbolDatabase::createSymbolDatabaseExprIds) #5699
|
Hmm... for the 12210 code this is significantly faster. For 11885 I see a slowdown; however, it's ValueFlow that gets slower. I am investigating. |
|
I think it would be better to first assign an exprid to all expressions and build a dataflow-like graph, then apply CSE to combine duplicate expressions. |
Are you saying that would speed this up even further, or that the analysis would be better? I do not understand what you mean by a "dataflow-like" graph. But I fear that a CSE pass that uses isSameExpression again will be slower. Would it be possible to make a proof of concept that we can use to test the performance difference? |
Well, mainly a speedup over the original implementation, and the analysis should be about the same. I don't know if it would be faster than what's done here (but perhaps the analysis is not as thorough).
In a compiler, the SSA graph lets you find everywhere a variable is used. In this case we would have a map from exprids to where each one is used. As we combine exprids, we update this map to find new usages.
Yes I will try to create a proof of concept. |
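
The map from exprids to their uses described above could look roughly like the sketch below. It is only an illustration, not code from this PR or from the prototype: `ExprIdUses`, `TokenIndex`, `addUse`, `merge` and `usesOf` are hypothetical names, and a plain index stands in for Cppcheck's `Token`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins: an index into the token list and an expression id.
using TokenIndex = std::size_t;
using ExprId = unsigned;

// Tracks, for each exprid, the positions where that expression is used.
class ExprIdUses {
public:
    // Record that the expression at position `tok` currently has id `id`.
    void addUse(ExprId id, TokenIndex tok) { uses_[id].push_back(tok); }

    // Combine two exprids: every use of `from` becomes a use of `into`.
    void merge(ExprId into, ExprId from) {
        if (into == from)
            return;
        auto it = uses_.find(from);
        if (it == uses_.end())
            return;
        // Move the uses out before erasing so nothing dangles.
        std::vector<TokenIndex> moved = std::move(it->second);
        uses_.erase(it);
        std::vector<TokenIndex>& dst = uses_[into];
        dst.insert(dst.end(), moved.begin(), moved.end());
    }

    // All known uses of `id` (empty if the id is unknown or was merged away).
    const std::vector<TokenIndex>& usesOf(ExprId id) const {
        static const std::vector<TokenIndex> empty;
        auto it = uses_.find(id);
        return it == uses_.end() ? empty : it->second;
    }

private:
    std::unordered_map<ExprId, std::vector<TokenIndex>> uses_;
};
```

After a merge, the uses of the surviving id are the places whose parent expressions might now have become identical, so those are the only candidates that would need to be revisited.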
|
So we probably need some validation to make sure we aren't missing same expressions. Traversing up the graph to find same exprids massively reduces the number of comparisons, especially for large functions. However, trying to short-circuit I should probably add some tests for these different cases I found. The initial prototype I did here does show significant improvement. 11885 is about 2x faster doing a 1/6th of Currently we are setting The I'll try to do some cleanup on my prototype. |
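
One way to see why working upward from operands that already share an exprid cuts down the comparisons: two parent expressions can only be the same if their operator and their operand ids match, so a keyed lookup can replace pairwise comparison. The sketch below uses plain textbook value numbering to show this. It is a swapped-in technique for illustration, not the prototype's code, and `ExprNumbering`, `leaf` and `binary` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

using ExprId = unsigned;

// Assigns ids bottom-up: equal (operator, lhs id, rhs id) keys share one id.
class ExprNumbering {
public:
    // Id for a leaf such as a variable name or literal.
    ExprId leaf(const std::string& name) { return intern(Key(name, 0u, 0u)); }

    // Id for a binary expression whose operands are already numbered.
    // Operand order matters here; commutativity is deliberately not handled.
    ExprId binary(const std::string& op, ExprId lhs, ExprId rhs) {
        return intern(Key(op, lhs, rhs));
    }

private:
    using Key = std::tuple<std::string, ExprId, ExprId>;

    // Return the existing id for `key`, or hand out a fresh one.
    ExprId intern(const Key& key) {
        auto res = ids_.insert(std::make_pair(key, next_));
        if (res.second)
            ++next_;
        return res.first->second;
    }

    std::map<Key, ExprId> ids_;
    ExprId next_ = 1; // 0 is reserved to mean "no operand" in leaf keys
};
```

Each expression costs one map lookup instead of a comparison against every other expression. What this sketch cannot capture are the cases that need real semantic knowledge, such as following references, which is where a validation pass would help.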
|
@pfultz2 Thanks! I am starting to feel ready to merge this. I have tested it with test-my-pr.py for a couple of days and the results are good. I'm not sure why, but some false negatives are fixed. But if you are working on an alternative approach I will wait a little.
I don't know if it's interesting but I have added a macroName attribute to Token so we can determine which macro was expanded. I have the feeling the isSameExpression could use that attribute to improve the results. For example: If we have a condition |
|
So I created some unit tests to check the previous behavior. There are some errors with following references with this PR (but I have the same issues in my prototype). This is also significantly faster than my prototype. So it's probably better to merge this PR, maybe after #5722 is merged. It would still be good to have a validation function that checks whether we missed any same expressions by applying the older O(n^2) algorithm.
I don't know if that really matters for ValueFlow analysis. We would want to identify the same expression across multiple macros of the same name. The checkers can apply such constraints, but we would still want to find null pointer dereferences or uninitialized memory inside macros. |
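
The validation function mentioned above could be sketched as follows: run the quadratic pairwise comparison as a debug-only check and report every pair that compares equal but ended up with different exprids. This is a minimal sketch under stated assumptions: `Expr`, `sameExpr` and `findMissedDuplicates` are hypothetical, and the string comparison is only a placeholder for whatever the real `isSameExpression` check does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened expression: a canonical spelling plus the exprid
// that the fast algorithm assigned to it.
struct Expr {
    std::string text;
    unsigned exprId;
};

// Placeholder for the real same-expression check (an assumption, not
// Cppcheck's isSameExpression signature).
inline bool sameExpr(const Expr& a, const Expr& b) { return a.text == b.text; }

// O(n^2) reference check: returns index pairs (i, j), i < j, of expressions
// that compare equal but were given different exprids.
std::vector<std::pair<std::size_t, std::size_t>>
findMissedDuplicates(const std::vector<Expr>& exprs) {
    std::vector<std::pair<std::size_t, std::size_t>> missed;
    for (std::size_t i = 0; i < exprs.size(); ++i) {
        for (std::size_t j = i + 1; j < exprs.size(); ++j) {
            if (sameExpr(exprs[i], exprs[j]) && exprs[i].exprId != exprs[j].exprId)
                missed.emplace_back(i, j);
        }
    }
    return missed;
}
```

Because it is quadratic, a check like this would only be practical in unit tests or a debug build, not in normal analysis runs.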
|
FYI this greatly reduced the Ir count in our callgrind CI job: |