Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asan issue with Clang 6 and std::stable_sort w/ indirect_adapter #139

Closed
Morwenn opened this issue Dec 21, 2018 · 15 comments
Closed

asan issue with Clang 6 and std::stable_sort w/ indirect_adapter #139

Morwenn opened this issue Dec 21, 2018 · 15 comments

Comments

@Morwenn
Copy link
Owner

Morwenn commented Dec 21, 2018

Asan finds a stack buffer overflow with the following test and the std::stable_sort equivalent:

Test #135: every sorter with indirect adapter - cppsort::std_sorter

I've no idea why. Instinctively I would say that it's linked to projection_compare because std_sorter has to use if, but it could also be linked to the new indirect_adapter that accepts projection-only sorters. It's probably a mix of both, but I can't find the actual issue.

Here is the failing job issue with the full asan log: https://travis-ci.org/Morwenn/cpp-sort/jobs/471096685

@Morwenn
Copy link
Owner Author

Morwenn commented Dec 31, 2018

I switched the C++17 branch to Clang 7 and it seems that the error disappeared, which is quite surprising considering that it occurred at every single run with Clang 6. Two possibilities come to my mind:

  • It was a false positive inherent to ASAN with Clang 6
  • I got super lucky with the RNG seed and didn't trigger the error

Whatever it is I'm keeping this issue open for now.

@Morwenn Morwenn changed the title asan issue with Clang 6 and asan issue with Clang 6 and std::stable_sort w/ indirect_adapter Feb 20, 2019
@sehe
Copy link

sehe commented Mar 17, 2021

On my ubuntu 18.04 with clang

clang++-6.0 --version
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

I cannot reproduce this error:

CXX=clang++-6.0 CPPSORT_SANITIZE=undefined cmake -B build . -DCPPSORT_SANITIZE=undefined,address -DCMAKE_CXX_COMPILER=/usr/bin/clang++-6.0 -DCMAKE_EXPORT_COMPILE_COMMANDS=On
(cd build && ctest -I 304,305)

Prints

ct /home/sehe/Projects/stackoverflow/cpp-sort/build
    Start 304: every random-access sorter with indirect adapter - cppsort::std_sorter
1/2 Test #304: every random-access sorter with indirect adapter - cppsort::std_sorter ............................   Passed    0.08 sec
    Start 305: every random-access sorter with indirect adapter - cppsort::stable_adapter<cppsort::std_sorter>
2/2 Test #305: every random-access sorter with indirect adapter - cppsort::stable_adapter<cppsort::std_sorter> ...   Passed    0.08 sec

100% tests passed, 0 tests failed out of 2

Total Test time (real) =   0.18 sec

Full output is:
test.log

Like you said, with later versions it already wasn't happening.

I assume that - unless I'm missing some obvious flag - this is no longer an issue on develop?

@sehe
Copy link

sehe commented Mar 17, 2021

I also repeated using clang-6.0 on that old commit:

commit 2b9bb7ea79536e9f65d3a320d39cabb335444317 (HEAD)
Author: Morwenn <morwenn29@hotmail.fr>
Date:   Fri Dec 21 19:00:57 2018 +0100

    scope_exit is always available in C++17

Of course with the old variable naming in cmake invocation

Now it repros

The following tests FAILED:
        135 - every sorter with indirect adapter - cppsort::std_sorter (Failed)
        136 - every sorter with indirect adapter - cppsort::stable_adapter<cppsort::std_sorter> (Failed)
Errors while running CTest

Looking at it

@Morwenn
Copy link
Owner Author

Morwenn commented Mar 17, 2021

Interesting, so some magic change made the bug disappear since then?

@sehe
Copy link

sehe commented Mar 18, 2021

The bug or false-positive. I'll keep you posted.

@sehe
Copy link

sehe commented Mar 22, 2021

Okay, so I sat with it first (before just bisecting, which will likely find that problem is gone with 2ab06c1). For many hours.

There's bad news. The problem seems traceable to the indirection projection. This is an immutable lambda, but still it ostensibly changes during execution (as part of project_compare).

I really can't decide yet whether that's a miscompilation, or I'm missing some source of UB lurking there.

I added a unit test issue_139 on branch issue_139. The unit test does not repro the ASAN diagnostic, but it does manifest the source of the UB, where the captured &proj reference (originally include/cpp-sort/adapters/indirect_adapter.h:L123) gets clobbered.

The good news is you can see this happening. The bad news is it's not just with Clang6. ((In fact with GCC-10 + ASAN it actually reports AddressSanitizer: stack-buffer-overflow).

What follows is some of the thinking/observations that went into making the repro:


  • I figured it might have been down to lambda codegen, but rewriting it as a callable struct made no difference. (Omitted from the unit test for brevity).

  • Next I figured it might be due to some weird codegen issue with empty-element tuple<> optimization in projection_compare (because replacing the tuple by it's elements did remove the symptom).

    But sketching an "adhoc" projection_compare in terms of tuple does not display the same symptom. [Included in the repro code in case you want to check]

Among the weirder observations (if that's not enough for you)

  • depending on how I sequence the tests, the first invocation of projection_compare::compare() does the right thing (&proj == projptr), but a second invocation fails.
  • depending on whether I phrase taking the address of proj as &proj or std::addressof(proj) can give different diagnostics some of the time: at times ASAN croaked about invoking a member on a null pointer when using std::addressof. ¯\(ツ)
  • It's weird that this manifested with std_sort only. Then again, the repro case fails with other compilers/clang versions just as well, even though that never tripped ASAN before.

I fail to see something wrong with the actual code. Could it really be just... miscompilation? Pretty scary. Maybe I'm just going crazy and the error is right before our eyes. I'd appreciate a second pair of eyes. Meanwhile, I'm sleeping on it. Maybe tomorrow brings fresh ideas.


Apparent workarounds:

  • capturing proj by value
  • replacing tuple projection_compare::data by its parts works
  • perhaps it is gone with 2ab06c1 (I still wish to understand the phenomenon I'm seeing)

sehe added a commit to sehe/cpp-sort that referenced this issue Mar 22, 2021
Okay, so I sat with it first (before just bisecting, which will likely find that problem is gone with 2ab06c1). For many hours.

There's bad news. The problem seems traceable to the indirection projection. This is an immutable lambda, but still it ostensibly changes during execution (as part of `project_compare`).

I really can't decide yet whether that's a miscompilation, or I'm missing some source of UB lurking there.

I added a unit test `issue_139` on branch `issue_139`. The unit test does not repro the ASAN diagnostic, but it *does* manifest the source of the UB, where the captured `&proj` reference (originally include/cpp-sort/adapters/indirect_adapter.h:L123) gets clobbered.

The good news is you can see this happening. The bad news is it's not just with Clang6. ((In fact with GCC-10 + ASAN it actually reports _AddressSanitizer: stack-buffer-overflow_).

What follows is some of the thinking/observations that went into making the repro:

-----

- I figured it might have been down to lambda codegen, but rewriting it as a callable struct made no difference. (Omitted from the unit test for brevity).

- Next I figured it might be due to some weird codegen issue with empty-element `tuple<>` optimization in `projection_compare` (because replacing the tuple by it's elements **did** remove the symptom).

  But sketching an "adhoc" projection_compare in terms of `tuple` does **not** display the same symptom. [Included in the repro code in case you want to check]

Among the weirder observations (if that's not enough for you)

 - depending on how I sequence the tests, the first invocation of `projection_compare::compare()` does the right thing (`&proj == projptr`), but a second invocation fails.
 - depending on whether I phrase taking the address of `proj` as `&proj` or `std::addressof(proj)` can give different diagnostics some of the time: at times ASAN croaked about invoking a member on a null pointer when using `std::addressof`. ¯\\_(ツ)_/¯
 -  It's weird that this manifested with std_sort only. Then again, the repro case fails with other compilers/clang versions just as well, even though that never tripped ASAN before.

I fail to see something wrong with the actual code. Could it really be just... miscompilation? Pretty scary. Maybe I'm just going crazy and the error is right before our eyes. I'd appreciate a second pair of eyes. Meanwhile, I'm sleeping on it. Maybe tomorrow brings fresh ideas.

-----

Apparent workarounds:

 - capturing `proj` by value
 - replacing `tuple` projection_compare::data by its parts works
 - perhaps it is gone with 2ab06c1 (I still wish to understand the phenomenon I'm seeing)
@sehe
Copy link

sehe commented Mar 22, 2021

Sleep brought me the convincing realization that there /must/ be UB if so many compilers/versions agree. Is it worth putting this through CI? I want to rule out mis-builds due to my build system.

@sehe
Copy link

sehe commented Mar 22, 2021

Ah. Got it. I think.

The tuple elements in projection_compare are deduced wrongly:

Also checking back:

        using compare_t = decltype(utility::as_function(std::declval<Compare&>()));
        using projection_t = decltype(utility::as_function(std::declval<Projection&>()));
        static_assert(not std::is_reference_v<projection_t>);
        static_assert(not std::is_reference_v<compare_t>);

Fires the second static_assert.

I did check earlier what the actual types deduced were. However (a) it unsuspect (b) just tracing the names made the ASAN diagnostic go away.

By now, since I've established my own way of detecting the badness in the UT, I don't need the ASAN diagnostic to know it went bad. So I can see it.

Will prepare a PR if still relevant to existing release(s)

sehe added a commit to sehe/cpp-sort that referenced this issue Mar 22, 2021
Okay, so I sat with it first (before just bisecting, which will likely find that problem is gone with 2ab06c1). For many hours.

There's bad news. The problem seems traceable to the indirection projection. This is an immutable lambda, but still it ostensibly changes during execution (as part of `project_compare`).

I really can't decide yet whether that's a miscompilation, or I'm missing some source of UB lurking there.

I added a unit test `issue_139` on branch `issue_139`. The unit test does not repro the ASAN diagnostic, but it *does* manifest the source of the UB, where the captured `&proj` reference (originally include/cpp-sort/adapters/indirect_adapter.h:L123) gets clobbered.

The good news is you can see this happening. The bad news is it's not just with Clang6. ((In fact with GCC-10 + ASAN it actually reports _AddressSanitizer: stack-buffer-overflow_).

What follows is some of the thinking/observations that went into making the repro:

-----

- I figured it might have been down to lambda codegen, but rewriting it as a callable struct made no difference. (Omitted from the unit test for brevity).

- Next I figured it might be due to some weird codegen issue with empty-element `tuple<>` optimization in `projection_compare` (because replacing the tuple by it's elements **did** remove the symptom).

  But sketching an "adhoc" projection_compare in terms of `tuple` does **not** display the same symptom. [Included in the repro code in case you want to check]

Among the weirder observations (if that's not enough for you)

 - depending on how I sequence the tests, the first invocation of `projection_compare::compare()` does the right thing (`&proj == projptr`), but a second invocation fails.
 - depending on whether I phrase taking the address of `proj` as `&proj` or `std::addressof(proj)` can give different diagnostics some of the time: at times ASAN croaked about invoking a member on a null pointer when using `std::addressof`. ¯\\_(ツ)_/¯
 -  It's weird that this manifested with std_sort only. Then again, the repro case fails with other compilers/clang versions just as well, even though that never tripped ASAN before.

I fail to see something wrong with the actual code. Could it really be just... miscompilation? Pretty scary. Maybe I'm just going crazy and the error is right before our eyes. I'd appreciate a second pair of eyes. Meanwhile, I'm sleeping on it. Maybe tomorrow brings fresh ideas.

-----

Apparent workarounds:

 - capturing `proj` by value
 - replacing `tuple` projection_compare::data by its parts works
 - perhaps it is gone with 2ab06c1 (I still wish to understand the phenomenon I'm seeing)
@Morwenn
Copy link
Owner Author

Morwenn commented Mar 22, 2021

Thanks a lot for analyzing this so throroughly. After taking a look at the library, I likely have the issue in the following places:

  • stable_compare in <cpp-sort/adapters/stable_adapter.h>
  • Possibly indirect_t in <cpp-sort/detail/functional.h>

Apparently the bug in projection_compare was fixed: I already added remove_cvref_t where it mattered.

@sehe
Copy link

sehe commented Mar 22, 2021

It is not relevant anymore, you fixed it in 89eb52d


tree ac29349e6cf9bed28acf476308ca0bb7e88ac7ff
parent c769e514afd1c6e194bd706b4f49b08aed4ce987
author Morwenn <morwenn29@hotmail.fr> Sun Aug 30 01:06:19 2020 +0200
committer Morwenn <morwenn29@hotmail.fr> Sun Aug 30 01:06:19 2020 +0200

Fix bug with projection_compare

It's not the fix I had in mind though. I had this:

    using compare_t = decltype(utility::as_function(std::declval<Compare>()));
    using projection_t = decltype(utility::as_function(std::declval<Projection>()));

Why remove a reference that shouldn't be there? And in the process unqualify cv as well

@Morwenn
Copy link
Owner Author

Morwenn commented Mar 22, 2021

I guess that the & in std::declval might have been an issue of habit: I used to mostly use std::declval on types that weren't guaranteed to be copyable or even default-constructible (I don't remember the specifics), so the & helped to handle all of those types. It indeed does not make sense here, especially since comparison and projection functions are always supposed to be copyable.

Cargo cult if you will.

@sehe
Copy link

sehe commented Mar 22, 2021

Yeah I had a hunch that declval<T&>() makes sense because it's what one usually needs :) Anyhoops, this settles it. I'd say this bug can be closed, unless you "support" any pre-89eb52d releases?

@Morwenn
Copy link
Owner Author

Morwenn commented Mar 22, 2021

I virtually don't have users, so I don't plan to fix old versions unless someone specifically asks for it :p

I guess that your solution is strictly better than the one using remove_cvref_t though? If so I might change projection_compare to use it, and fix stable_compare at the same time. I will likely try borrow some of your unit tests if you feel they are relevant: projection_compare is a public type now, so having a regression test for that would be nice.

Thanks again for the investigation, without that the issue might have remained open forever 😄

@sehe
Copy link

sehe commented Mar 22, 2021

Cheers. I would not retain the unit test. Even though it was able to pick up on this UB before ASAN did some of the time, it will not detect unexpected issues anyways.

It was a narrow tool for closed analysis :)

Likewise, I thought better of adding the static_assert to assert that the types aren't reference types. People will find a way to pass std::cref(my_projection) or something and you'd never know (granted, if that were a problem, that means the bug is in their code anyways).

If you are positive that there will not be a use for storing projection/comparator by reference I'm actually happier with the remove_reference_t approach¹. I guess it comes down to the question whether decaying plays well with e.g. constant_function etc. I didn't delve into that territory of the library, so I don't know all the ramifications.

¹(/OT is there a practical difference between std::decay_t and remove_cvref_t?)

@Morwenn
Copy link
Owner Author

Morwenn commented Mar 22, 2021

To answer the last question: I replaced std::decay_t with remove_cvref_t where it was clear that there was no difference between both since remove_cvref_t typically instantiates fewer tempates - it's not extremely noticeable, but it's still less job for the compiler without much effort on my part, so I'm willing to take that.

I don't have a use case for storing comparator/projection by reference, so I'll probably keep using remove_cvref_t until I have an interesting use case for those. I'm gonna take your word for the usefulness of the test and just close the issue, I'm really glad that there is finally an explanation for that old weird issue 😄

@Morwenn Morwenn closed this as completed Mar 22, 2021
Morwenn added a commit that referenced this issue Mar 22, 2021
@Morwenn Morwenn added this to the 1.10.0 milestone Mar 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants