Skip to content

Fix #10412 FN useStlAlgorithm with iterators#4157

Merged
danmar merged 53 commits intocppcheck-opensource:mainfrom
chrchr-github:chr_Fix10412
Oct 16, 2022
Merged

Fix #10412 FN useStlAlgorithm with iterators#4157
danmar merged 53 commits intocppcheck-opensource:mainfrom
chrchr-github:chr_Fix10412

Conversation

@chrchr-github
Copy link
Copy Markdown
Collaborator

No description provided.

Comment thread lib/checkstl.cpp Outdated
if (!Token::Match(loopVar, "%var%") || !loopVar->valueType() || loopVar->valueType()->type != ValueType::Type::ITERATOR)
continue;
const Token* initAssign = splitTok->astOperand1();
if (!Token::simpleMatch(initAssign, "=") || !Token::Match(initAssign->astOperand1(), "%var%", loopVar->varId()))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%var% => %varid%

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really should add an internal check for that - see https://trac.cppcheck.net/ticket/10987.

@firewave
Copy link
Copy Markdown
Collaborator

Sorry for going off-topic but we really need to take a strong look at this check. We currently do not apply these findings to our code and if we are not able to dogfood it we cannot expect others to use it.

IIRC correctly there are various issues (these also apply to readability-use-anyofallof and related checks from clang-tidy):

  • the readability might be worse
  • a conversion is not straight forward or not even possible at all
  • it might actually perform worse (unfortunately several C++ implementations are slower than the raw or C ones)

@chrchr-github
Copy link
Copy Markdown
Collaborator Author

chrchr-github commented May 31, 2022

Sorry for going off-topic but we really need to take a strong look at this check. We currently do not apply these findings to our code and if we are not able to dogfood it we cannot expect others to use it.

Then maybe we should enable it? There are some pretty verbose and ugly loops in our code base...

IIRC correctly there are various issues (these also apply to readability-use-anyofallof and related checks from clang-tidy):

  • the readability might be worse

I agree, especially in relation to std::transform.

  • a conversion is not straight forward or not even possible at all

Then that's a FP.

  • it might actually perform worse (unfortunately several C++ implementations are slower than the raw or C ones)

This could also be called an optimization/implementation bug.

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented May 31, 2022

We currently do not apply these findings to our code and if we are not able to dogfood it we cannot expect others to use it.

We should. I am not sure the reason why. Perhaps we need generic lambdas or something?

I use it on my code bases and it works quite well. And I have had emails from users that found it quite useful.
It helped to improve readability and even found some bugs.

these also apply to readability-use-anyofallof and related checks from clang-tidy

readability-use-anyofallof just checks for escape and non-mutation, which can find loops that might not be straighforward to transform.

the readability might be worse

It is confusing when there is transform_reduce, since older C++ versions does not provide this, and technically you can use std::accumulate.

However, overall it does improve readability, and you can fix the other issues by providing your own transform_reduce.

a conversion is not straight forward or not even possible at all

That maybe applicable to clang-tidy, but I dont that is the case here. Plus, I haven't seen an issue like this either.

Copy link
Copy Markdown
Contributor

@pfultz2 pfultz2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add comment about this in the releas-notes.txt.

@firewave
Copy link
Copy Markdown
Collaborator

firewave commented May 31, 2022

Then maybe we should enable it? There are some pretty verbose and ugly loops in our code base...

+1
We should also enable the clang-tidy checks and take a look at them. And profile the version with the changes to see if there is some impact at all. After the past few months I don't trust C++ (convenience) anymore...

I agree, especially in relation to std::transform.

Maybe we should drop these or make them "inconclusive"?

Then that's a FP.

From experience (and faint memory) the findings make sense but trying to fix some of them will drive you mad or the code is simply not the same or it just doesn't work. That could just be the ugly std::transform ones though...or ones from clang-tidy - it's been a while...

  • it might actually perform worse (unfortunately several C++ implementations are slower than the raw or C ones)

This could also be called an optimization/implementation bug.

Yes.

std::clamp() (also a pattern to detect) - https://trac.cppcheck.net/ticket/10693 / llvm/llvm-project#55098

And no.

std::string::find_first_of(): - https://trac.cppcheck.net/ticket/10777
std::copy() - https://trac.cppcheck.net/ticket/10706

It seems there are certain restrictions for C++ algorithms which do not apply to C implementations and obviously not to self-rolled implementations.

@firewave
Copy link
Copy Markdown
Collaborator

readability-use-anyofallof just checks for escape and non-mutation, which can find loops that might not be straighforward to transform.

Probably something to report upstream if we come across it in our code.

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Jun 1, 2022

That could just be the ugly std::transform ones though

Whats the problem with std::transform?

std::string::find_first_of(): - https://trac.cppcheck.net/ticket/10777

We dont suggest this.

std::copy() - https://trac.cppcheck.net/ticket/10706

We suggest std::copy as it is the safest choice, and it will work functionally the same as the loop. Users can still replace it with unsafe versions if it proves to be faster.

That doesn't seems to be reason enough to stop suggesting std::copy.

@firewave
Copy link
Copy Markdown
Collaborator

firewave commented Jun 1, 2022

std::string::find_first_of(): - https://trac.cppcheck.net/ticket/10777

We dont suggest this.

This was a list of examples where modern code is not an improvement.

std::regex (unfortunately no simple C or raw counterpart) and std::stringstream also come to mind.

We suggest std::copy as it is the safest choice, and it will work functionally the same as the loop. Users can still replace it with unsafe versions if it proves to be faster.

We do suggest that? I wasn't aware of that.

That doesn't seems to be reason enough to stop suggesting std::copy.

In the past I had a slow function which was in the red hot path which was using that function several times (we modernized it from memcpy to std::copy at some point) - there were bigger offenders. Somehow I didn't see or missed that it made it slower.

I think if is there is a known trade-off (like security for performance) that should be documented. But in case of std::copy it also changes the behavior since you essentially replace memcpy with memmove.

Or if you use an algorithm that improves readability but suddenly uses double-loops (like std::string::find_first_of()) or goes from a linear to exponential growth it is not a "safe" replacement since it also changes the behavior.

Similar to range-based for loop usage based on modernize-loop-convert when const_iterator is involved. The resulting loop will still have a const object but will utilize a regular iterator which might break if that is not available code or change the behavior.

modernize-use-equals-default is another example as = default changes the behavior and can lead to code breakage since it is not the same as an empty implementation.

Unlike clang-tidy there's no way to fine-tune such checks with Cppcheck - which I think is a good thing but we also need to be careful what we report.

If we need to have performant code (which you always should have) and suddenly need to plaster code with suppressions you won't be happy about it. clang-tidy recently added several checks which you simply cannot enable since it fires for generic pattern which might cause problem but are actually extremely unlikely to do so and need very careful review and possibly lots of suppressions. They seem more like some very specific internal requirement built as a clang-tidy check.

You should never blindly just change your code but if a finding results in a major headscratcher even for quite experienced devs it might be the best to report it.

We really need to open the discussion section of the repo so tickets are not being hijacked for discussions...@danmar

@chrchr-github
Copy link
Copy Markdown
Collaborator Author

That could just be the ugly std::transform ones though

Whats the problem with std::transform?

    for (auto& i : v)
        i += 5;

vs.

    std::transform(v.begin(), v.end(), v.begin(), [](auto i) {
        return i + 5;
    });

doesn't seem to improve readability.

@firewave
Copy link
Copy Markdown
Collaborator

firewave commented Jun 1, 2022

    std::transform(v.begin(), v.end(), v.begin(), [](auto i) {
        return i + 5;
    });

doesn't seem to improve readability.

This also seems prone to messing up the parameter order - especially in the four parameter version.

The first example on https://en.cppreference.com/w/cpp/algorithm/transform is particular cringeworthy...

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Jun 1, 2022

In the past I had a slow function which was in the red hot path which was using that function several times (we modernized it from memcpy to std::copy at some point)

We dont suggest to use std::copy instead of memcpy, so I am not sure how this is relevant.

clang-tidy recently added several checks which you simply cannot enable

Also, I dont see how all these clang-tidy issues are relevant here at all, since we do not have these equivalent "modernize" checks.

This also seems prone to messing up the parameter order - especially in the four parameter version.

Which is not relevant here since we never suggest std::transform when the four parameter version is needed.

The first example on https://en.cppreference.com/w/cpp/algorithm/transform is particular cringeworthy...

I dont find it cringeworthy. It would be better to make it more explicit its doing a transform in-place, that could improve readability. However, it still makes the intention of the code quite clear. Plus its easy to make it run in parallel.

@firewave
Copy link
Copy Markdown
Collaborator

firewave commented Jun 1, 2022

I dont find it cringeworthy. It would be better to make it more explicit its doing a transform in-place, that could improve readability. However, it still makes the intention of the code quite clear. Plus its easy to make it run in parallel.

A parallel_for would have been sufficient...

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Jun 3, 2022

Imho the stl algorithms are not that readable. It depends on the reader.

for (const std::string & p: suffixes) {
    if (isPrefixStringCharLiteral(str, q, p))
        return true;
}

=>

return std::any_of(suffixes.begin(), suffixes.end(), 
    [q](const std::string& p) {
        return isPrefixStringCharLiteral(str, q, p);
});

It would be more readable if the replacement was more like:

return any_of(suffixes, p, isPrefixStringCharLiteral(str,q,p));

@firewave
Copy link
Copy Markdown
Collaborator

firewave commented Jun 3, 2022

It would be more readable if the replacement was more like:

return any_of(suffixes, p, isPrefixStringCharLiteral(str,q,p));

LLVM has such helpers for many algorithms. Probably because of the readability among other things.

@chrchr-github
Copy link
Copy Markdown
Collaborator Author

C++20 improves things a little:

bool f(const std::vector<std::string>& v) {
    return std::ranges::any_of(v, [](const auto& s) { return s.empty(); });
}

@chrchr-github
Copy link
Copy Markdown
Collaborator Author

But this PR is still about flagging iterators, which also make for unreadable code.

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Jun 3, 2022

It would be more readable if the replacement was more like:

return any_of(suffixes, p, isPrefixStringCharLiteral(str,q,p));

I dont that would be more readable as it not clear that you are declaring p as a variable.

Do you think a range-based any_of is less readable?

return any_of(suffixes, [&](const std::string& p) {
    return isPrefixStringCharLiteral(str, q, p);
});

This doesn't seem anymore verbose than the original for loop plus it makes the intention clearer.

HOFs could make it even more succinct. Such as boost::hof::partial:

return any_of(suffixes, partial(&isPrefixStringCharLiteral)(str, q));

Or boost::hof::lazy:

return any_of(suffixes, lazy(&isPrefixStringCharLiteral)(str, q, _1));

Which is like std::bind so it could be written like this in C++11:

return any_of(suffixes, std::bind(&isPrefixStringCharLiteral, str, q, _1));

Since we dont have C++20 yet to use a range-based any_of we could add simple wrappers around it:

template <class C, class Predicate>
bool any_of(const C& c, const Predicate& p)
{
    return std::any_of(c.begin(), c.end(), p);
}

Although, the goal of this check is not just readability. Its to find bugs in code. If the intention of the loop is not any_of then there is a mistake in the code(and I have made this mistake in the past). Of course it could be a little more verbose(for iterator-based algorithms at least), but at the same time its more verbose to use const or explicit(which we warn about). The readability improvement comes from making the intentions of the code clear, and if that wasn't the intention of the code then cppcheck has found a bug.

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Jun 3, 2022

But this PR is still about flagging iterators, which also make for unreadable code.

This true, using the algorithm in this case is not anymore verbose than the original loop.

@danmar
Copy link
Copy Markdown
Collaborator

danmar commented Jun 3, 2022

Do you think a range-based any_of is less readable?

That is much better. I guess the readability for that code is similar to the readability of the for loop to my eyes. It is annoying that the lambda stuff is needed.

Those boost::hof and std::bind examples are confusing and not readable at all to my eyes. :-) So the lambda is better than those options.

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Jun 6, 2022

That is much better. I guess the readability for that code is similar to the readability of the for loop to my eyes.

Sure, I guess its a matter of opinion, although I disagree. I think its fine we disable this for cppcheck code then, but I dont see that reason enough to remove it from cppcheck since it can find bugs and is highly recommended from many prominent C++ developers.

It is annoying that the lambda stuff is needed.

Yea, I dont see it as annoying. In fact, since I have started using stl algorithms on a regular basis especially with cppcheck scolding me about it, it has become quite natural to write it this way. Perhaps others may not be as familiar with algorithms or lambdas so would not see this as better.

We have flags about which C++ version to use, and we only warn about the loops with C++14 or later. Perhaps we could have flags about certain feature users would like to avoid or are less familiar with so we could adjust the checks based on that.

Those boost::hof and std::bind examples are confusing and not readable at all to my eyes.

Sure, another idea is to assign the lambda to a variable:

auto hasStrPrefix = [&](const std::string& p) {
    return isPrefixStringCharLiteral(str, q, p);
};
return any_of(suffixes, hasStrPrefix);

@pfultz2
Copy link
Copy Markdown
Contributor

pfultz2 commented Jun 6, 2022

I think if is there is a known trade-off (like security for performance) that should be documented. But in case of std::copy it also changes the behavior since you essentially replace memcpy with memmove.

I assume its easier to find places to replace std::copy with memcpy then trying to find raw loops. Of course, they can't be blindly replaced so an advanced developer that understand the constraints can make the memcpy replacement(we could add a check for this as well in the future).

If its very important to use memcpy or strbk, then it would be important to warn about raw loops like we do, so an advanced developer who understand the code and constraints can replace the raw loop with the unsafe version.

I am sure if you see cppcheck complaining about a raw loop needing to use std::copy then you will consider memcpy if its possible. So ultimately, it seems like an argument to keep this check.

Comment thread lib/checkstl.cpp Outdated
Comment thread lib/checkstl.cpp Outdated
@chrchr-github chrchr-github marked this pull request as ready for review June 23, 2022 22:31
@chrchr-github
Copy link
Copy Markdown
Collaborator Author

Any more feedback on this?

@firewave
Copy link
Copy Markdown
Collaborator

Any more feedback on this?

My opinion is still the same. Looking at the code I also wonder if this might impact the performance.

@chrchr-github
Copy link
Copy Markdown
Collaborator Author

So I made this mergeable once more. How about it?

@danmar danmar merged commit 3273e51 into cppcheck-opensource:main Oct 16, 2022
@chrchr-github chrchr-github deleted the chr_Fix10412 branch October 22, 2022 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants