Skip to content

Fix #12210 (Cppcheck hang in SymbolDatabase::createSymbolDatabaseExprIds)#5699

Merged
danmar merged 7 commits intocppcheck-opensource:mainfrom
cppchecksolutions:fix-12210
Dec 5, 2023
Merged

Fix #12210 (Cppcheck hang in SymbolDatabase::createSymbolDatabaseExprIds)#5699
danmar merged 7 commits intocppcheck-opensource:mainfrom
cppchecksolutions:fix-12210

Conversation

@danmar
Copy link
Copy Markdown
Collaborator

@danmar danmar commented Nov 24, 2023

No description provided.

Comment thread lib/symboldatabase.cpp Outdated
Comment thread lib/symboldatabase.cpp Outdated
@danmar
Copy link
Copy Markdown
Collaborator Author

danmar commented Nov 25, 2023

hmm... for the 12210 code this is significantly faster. For 11885 I see a slowdown however its ValueFlow that gets slower.. I am investigating..

Comment thread lib/symboldatabase.cpp Outdated
Comment thread lib/symboldatabase.cpp Outdated
Comment thread lib/symboldatabase.cpp
@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Nov 25, 2023

I think it would be first better to assign an exprid to all expressions and build a dataflow-like graph to then apply CSE to combine duplicate expressions.

@danmar
Copy link
Copy Markdown
Collaborator Author

danmar commented Nov 26, 2023

I think it would be first better to assign an exprid to all expressions and build a dataflow-like graph to then apply CSE to combine duplicate expressions.

Are you saying that would speedup this even further or that the analysis would be better?

I do not understand what you mean with "dataflow-like" graph. But I fear some CSE that uses isSameExpression again will be slower.

Would it be possible to make some proof of concept that we can use to test the performance difference?

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Nov 26, 2023

Are you saying that would speedup this even further or that the analysis would be better?

Well mainly speedup from the original implementation, and the analysis should be about the same. I dont know if it would be faster than whats done here(but perhaps the analysis is not as thorough though).

I do not understand what you mean with "dataflow-like" graph.

In the compiler the SSA graph lets you find everywhere a variable is used. In this case we would have a map from exprids to where its used. As we combine exprids we update this map to find new usages.

But I fear some CSE that uses isSameExpression again will be slower.

Well isSameExpression class would be much less since it would only be used on likely same expressions. However, we can probably write a special isSameExpression that short-circuits so it is much faster. The problem with the current isSameExpression is that it will traverse all the way down and we usually only need traverse down one step(except for literals and function names) since we would be processing up the tree.

Would it be possible to make some proof of concept that we can use to test the performance difference?

Yes I will try to create a proof of concept.

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Nov 30, 2023

So we probably need some validation to make sure we aren't missing same expressions. Traversing up the graph to find same exprids does massively reduce down the number comparisons especially for large functions. However, trying to short-circuit isSameExpression will miss finding same expressions for things like (b-(a))-(a) or (a)-(b-(a)), so more work is needed to get that to work.

I should probably add some tests for these different cases I found.

The initial prototype I did here does show significant improvement. 11885 is about 2x faster doing a 1/6th of isSameExpression calls. For 12210, is way faster because it does no isSameExpression calls because its all inside of expanded macros.

Currently we are setting macro=true for isSameExpression so for 12210 it was always returning false. On my updated branch I skip over tokens that have expanded macros since it will be false.

The exprId is important for valueflow analysis so it would be good to find issues inside of macros. Probably for normal analysis we can skip macros for large functions, but keep it for exhaustive mode.

I'll try to do some cleanup on my prototype.

@danmar
Copy link
Copy Markdown
Collaborator Author

danmar commented Dec 1, 2023

@pfultz2 Thanks! I am starting to feel ready to merge this. I have tested it with test-my-pr.py for couple of days and there are good results. Not sure why but some false negatives are fixed. But if you are working on an alternative approach I will wait a little..

The exprId is important for valueflow analysis so it would be good to find issues inside of macros.

I don't know if it's interesting but I have added a macroName attribute to Token so we can determine which macro was expanded. I have the feeling the isSameExpression could use that attribute to improve the results. For example: If we have a condition (X==X) and we don't warn because X is a macro then using the macroName we can determine that lhs and rhs are exactly the same.

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Dec 3, 2023

So I created some unit tests to check the previous behavior. There are some errors with following references with this PR(but I have the same issues in my prototype). This is also significantly faster than my prototype. So its probably better to merge this PR in, maybe after #5722 is merged.

It would still be good to have a validation function that checks if there were any same expressions we missed by applying the older O(n^2) algorithm.

If we have a condition (X==X) and we don't warn because X is a macro then using the macroName we can determine that lhs and rhs are exactly the same.

I dont know if that really matters for valueflow analysis. We would want to identify same expression across multiple macros of the same name. The checkers can do such constraints, but we would still want to find null pointer references or unintialized memory inside macros.

@danmar danmar merged commit 70745b5 into cppcheck-opensource:main Dec 5, 2023
@danmar danmar deleted the fix-12210 branch December 5, 2023 13:25
@firewave
Copy link
Copy Markdown
Collaborator

FYI this greatly reduced the Ir count in our callgrind CI job: 72,026,952,737 -> 59,372,614,043. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants