Skip to content

Check for functions calls in sizeof calculations#1111

Merged
danmar merged 11 commits into
cppcheck-opensource:masterfrom
pfultz2:sizeof-call
Mar 15, 2018
Merged

Check for functions calls in sizeof calculations#1111
danmar merged 11 commits into
cppcheck-opensource:masterfrom
pfultz2:sizeof-call

Conversation

@pfultz2
Copy link
Copy Markdown
Contributor

@pfultz2 pfultz2 commented Mar 9, 2018

A common mistake that I have found in our codebase is calling a function to get an integer or enum that represents the type such as:

int numBytes = numElements * sizeof(x.GetType());

This extends cppcheck for sizeof calculation to check for function calls as well.

Copy link
Copy Markdown
Collaborator

@amai2012 amai2012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I want to keep numElements return codes of GetType()? What's wrong with it? I don't see an obvious error...

Even if it would indicate an error one should introduce another message probably.

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 10, 2018

Thanks! Could you tweak the error message so it says something like "found function call within sizeof()". A separate error id would help also. Before this is accepted I want that we test this on real code and see how much noise it generates (I will test it with the daca2 script).

@pfultz2
Copy link
Copy Markdown
Contributor Author

pfultz2 commented Mar 11, 2018

Maybe I want to keep numElements return codes of GetType()?

What do you mean?

What's wrong with it? I don't see an obvious error...

Because GetType() returns an enum which represents the type such as float or double. In this case it will compute the number of bytes based on the size of the enum not on the size of float or double(which is the intention of the code). We see a lot of those errors(for example here).

@pfultz2
Copy link
Copy Markdown
Contributor Author

pfultz2 commented Mar 11, 2018

A separate error id would help also.

Sure, is there a dev document that explains how to add new ids?

Before this is accepted I want that we test this on real code and see how much noise it generates

I did a simple grep(with grep -E '\bsizeof\([^()"*]+\([^()]*\)') over several codebases(like qtbase, llvm, cocos2d, rethinkdb, redis, and php-src) and found that most things were macros that did not do a function call. If cppcheck expands the macro it shouldn't be a problem. The only other issue was that it is common to use sizeof(f()) to create a type trait. Is there a way in cppcheck to detect if its in a template? Or perhaps the function it refers to is an overloaded function?

I will test it with the daca2 script

What is the daca2 script?

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 12, 2018

Sure, is there a dev document that explains how to add new ids?

No sadly not. But I think your solution is ok. Spontaneously I thought we would still have 1 check function but it could call both sizeofCalculationError() and sizeofFunctionError(). But if the logic is too different it's better that it is splitted.

most things were macros that did not do a function call. If cppcheck expands the macro it shouldn't be a problem. The only other issue was that it is common to use sizeof(f()) to create a type trait. Is there a way in cppcheck to detect if its in a template?

Maybe unexpanded macros could be a problem. It's normal that macro definitions are missing. you might want to ensure that it's really a function (the Token::function() is non-null, or _settings->library has function configuration). EDIT: I see, you use Token::function(), then unexpanded macros are not a problem.

There is no way now to detect if it is in a template. But we have Token::isTemplateArg() so you can check if the function return type depends on a template argument.

What is the daca2 script?

It's like your grep but a more powerful. The script runs latest Cppcheck on all debian source code (thousands of projects, 100+ GB of source code).

The results are shown here:

http://cppcheck.sourceforge.net/devinfo/daca2-report/daca2.html

If you want to look at some particular warnings and have a error id you want to look at you can use this cgi script (this takes several seconds to load):

http://cppcheck.sourceforge.net/cgi-bin/daca2-search.cgi?id=sizeofCalculation

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 12, 2018

I assume you will fix the test failure soonish:

test/testsizeof.cpp:113: Assertion failed.
Expected:
[test.cpp:1]: (warning) Calling 'sizeof' on 'sizeof'.\n

Actual:
[test.cpp:1]: (warning) Calling 'sizeof' on 'sizeof'.\n
[test.cpp:1]: (warning) Found function call inside sizeof().\n

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 12, 2018

I believe the code looks pretty ok. if the test is fixed I will apply this and then the daca2 script can test it.

@amai2012
Copy link
Copy Markdown
Collaborator

Maybe I want to keep numElements return codes of GetType()?

What do you mean?

What's wrong with it? I don't see an obvious error...
Because GetType() returns an enum which represents the type such as float or double. In this case it will compute the number of bytes based on the size of the enum not on the size of float or double(which is the intention of the code). We see a lot of those errors(for example here).

Would that change trigger a warning on

#include <cstdlib>
short foo() {
  return 1;
}
int main() {
  const int NUM=42;
  short* buffer = (short*)malloc(NUM*sizeof foo() );
  for (int i=0; i<NUM; ++i)
  {
     buffer[i] = foo();
  }
  return 0;
}

? That would be a false positive IMHO.

Even the small example from the PR

int foo() { return 1; }; int a,sizeof(sizeof(foo()))

is a message I don't want to see at all.

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 12, 2018

Would that change trigger a warning on

You have of course simplified your code. But it looks weird to me; why not use sizeof(short) instead in such cases? Are you talking about generic code (expanded macro/template) that ends up in this? Or do you see sizeof(foo()) in normal code?

Even the small example from the PR

Yes as far as I see in the code, the intention seems to be to only report the "Calling 'sizeof' on 'sizeof'." warning and not warn about the function call.

@pfultz2
Copy link
Copy Markdown
Contributor Author

pfultz2 commented Mar 12, 2018

Would that change trigger a warning on

Yes it will, but grepping over qtbase, llvm, cocos2d, rethinkdb, redis, and php-src, I have only seen two cases for this. So its not very common.

@pfultz2
Copy link
Copy Markdown
Contributor Author

pfultz2 commented Mar 12, 2018

Yes as far as I see in the code, the intention seems to be to only report the "Calling 'sizeof' on 'sizeof'." warning and not warn about the function call.

Actually that's not really the case. The code bails on sizeof because Token::match will match the inner sizeof. I have updated the codebase and removed the faulty test case.

I also found a problem with code like this:

template<class T>
struct A
{
    static B f(const B &);
    static A f(const A &);
    static A &g();
    static T &h();

    enum {
        X = sizeof(f(g() >> h())) == sizeof(A),
        Y = sizeof(f(g() << h())) == sizeof(A),
        Z = X & Y
    };
};

It seems like cppcheck cant resolve the function f. And then the code moves to g which is not overloaded, so it then reports an error. For now, I have it stop at the first eName found, so it won't report an error for this case.

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 13, 2018

It seems like cppcheck cant resolve the function f.

Hmm it is not conclusive how f should be resolved as far as I see. The type of the >> and << depends on how those are overloaded. However it is the convention that the result type for bit shifts is the same as the LHS type.

And then the code moves to g which is not overloaded, so it then reports an error. For now, I have it stop at the first eName found, so it won't report an error for this case.

It is a bit weird implementation. it is preferable to look at the ast tree instead of walking the token list. I would suggest that you only look at the astOperand2() for the "(" token.

 if (Token::simpleMatch(tok, "sizeof ("))
     const Token *checkToken = tok->next()->astOperand2();

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 13, 2018

Unfortunately, our ValueType handling has gaps, it does not always succeed. If you see some problems that is very interesting and should be fixed if possible. There are many checkers that rely on ValueType.

@pfultz2
Copy link
Copy Markdown
Contributor Author

pfultz2 commented Mar 13, 2018

It is a bit weird implementation. it is preferable to look at the ast tree instead of walking the token list.

How do you walk the ast? I didn't see any ast matchers.

I would suggest that you only look at the astOperand2() for the "(" token.

But I don't know if this would catch the x.foo() case?

@pfultz2
Copy link
Copy Markdown
Contributor Author

pfultz2 commented Mar 13, 2018

Unfortunately, our ValueType handling has gaps, it does not always succeed. If you see some problems that is very interesting and should be fixed if possible. There are many checkers that rely on ValueType.

Is this meant for PR #1114?

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 13, 2018

How do you walk the ast? I didn't see any ast matchers.

We don't have it. I would like to have it someday.

But I don't know if this would catch the x.foo() case?

It would.

It would not catch sizeof(1+foo()) but then you have a "calculation in sizeof()" warning instead.

Copy link
Copy Markdown
Collaborator

@danmar danmar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more stylistic nits

Comment thread test/testsizeof.cpp Outdated
"{\n"
" int bar() { return 1; };\n"
"}\n"
"Foo f;int a,sizeof(f.bar())");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test code is weird. it is not valid syntax. how about replacing int a,sizeof(f.bar()) with int a=sizeof(f.bar());.

Comment thread test/testsizeof.cpp

check("#define foo() int\n"
"int a,sizeof(foo())");
ASSERT_EQUALS("", errout.str());
Copy link
Copy Markdown
Collaborator

@danmar danmar Mar 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not technically correct. The testcases are not preprocessed. so I believe that during testing the checker would see # define foo ( ) int .... But when you actually run it on this code, it will not see the define at all. Then the checker will just see sizeof ( int ).

Imho you can just remove the first line and assert that the checker will not warn when a sizeof(foo()) is seen and it does not know what foo is.

Comment thread test/testsizeof.cpp

check("#define M(x) sizeof(x)\n"
"int foo() { return 1; }; int a,M(foo())");
ASSERT_EQUALS("", errout.str());
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not properly tested because we don't run the preprocessor in the tests. you can write a test code and indicate that certain tokens are expanded macro(s) using the special character $.

int a = $sizeof $( foo() $) ;

that would mean that the Token::isExpandedMacro() would be true for the outer "sizeof (" and ")" tokens. And it would be false for foo().

Comment thread lib/checksizeof.cpp Outdated
void CheckSizeof::sizeofFunctionError(const Token *tok, bool inconclusive)
{
reportError(tok, Severity::warning,
"sizeofFunction", "Found function call inside sizeof().", CWE682, inconclusive);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the id sizeofFunctionCall would be a bit better

Comment thread lib/checksizeof.cpp
}

for (const Token *argument = tok->next()->astOperand2(); argument; argument = argument->astOperand2()) {
const Token *checkToken = argument->previous();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you have a loop, is there a test case where it is not enough to only look at the first token?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when doing sizeof(x.foo()), but let me double check.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can rewrite that as ’sizeof((x.foo)())’. Then you can intuitively see that the operands for ’.’ are x and foo.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I was doing something else wrong. I updated to just check the first token.

Comment thread lib/checksizeof.cpp Outdated
}
}

void CheckSizeof::sizeofFunctionError(const Token *tok, bool inconclusive)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inconclusive argument can be removed, it's always false. Unless you plan to start using it someday.

@amai2012
Copy link
Copy Markdown
Collaborator

Would that change trigger a warning on

You have of course simplified your code. But it looks weird to me; why not use sizeof(short) instead in such cases? Are you talking about generic code (expanded macro/template) that ends up in this? Or do you see sizeof(foo()) in normal code?

I think that quite reasonable generic code could be written that way.
A quick scan on my disc however reveals no examples except in boost.

Even the small example from the PR

Yes as far as I see in the code, the intention seems to be to only report the "Calling 'sizeof' on 'sizeof'." warning and not warn about the function call.

Yes, that intention I would support.

@pfultz2
Copy link
Copy Markdown
Contributor Author

pfultz2 commented Mar 15, 2018

I think that quite reasonable generic code could be written that way.
A quick scan on my disc however reveals no examples except in boost.

A common case is to call an overloaded function in order to build a type trait. However, I avoid this case by making sure the function is not overloaded.

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 15, 2018

I think that quite reasonable generic code could be written that way.

A generic code example (invented, not real code):

template<class T>
struct Foo {
    auto foo1(T t) -> decltype(t + 1L) { return t + 1L; }
    void foo2() { int x = sizeof(foo1()); }
};
Foo<int> intfoo;
Foo<float> floatfoo;

I doubt that ValueType determines the foo1 return type properly now. But the checker should not warn for such code.

Well I think that when daca2 is running, we will see lots of real test cases.

@danmar danmar merged commit 166e4ca into cppcheck-opensource:master Mar 15, 2018
@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 15, 2018

I merged this. Now daca2 will be running this checker. Results from this checker will be available here:

http://cppcheck.sourceforge.net/cgi-bin/daca2-search.cgi?id=sizeofFunctionCall

The first results could come in a few hours.

@pfultz2
Copy link
Copy Markdown
Contributor Author

pfultz2 commented Mar 15, 2018

I merged this. Now daca2 will be running this checker. Results from this checker will be available here:

How long does this take? I checked and there still aren't any results.

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 15, 2018

I would guess that tomorrow evening it has run through all the code.

We can already see that this is not noisy.

It is not a problem for me if it doesn't catch many bugs.

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Mar 18, 2018

Still no results. Weird. One explanation is that in the daca2 script, no include paths are set and therefore the function pointer will often be NULL. Maybe we could start trying to set the include paths.

@pfultz2
Copy link
Copy Markdown
Contributor Author

pfultz2 commented Mar 18, 2018

I have gotten results from some codebases without using include. Maybe those codebases are not part of daca2. I am not sure why there isn't any results.

@pfultz2 pfultz2 deleted the sizeof-call branch May 19, 2019 05:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants