Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r23~b6 miscompiles some coroutines (fixed upstream, mere days after point of last merge) #1540

Closed
saurik opened this issue Jul 20, 2021 · 12 comments
Assignees
Labels
Projects

Comments

@saurik
Copy link

saurik commented Jul 20, 2021

r23~b6 is https://android.googlesource.com/toolchain/llvm-project/+/c935d99d7cf2016289302412d708641d52d2f7ee, aka upstream llvm/llvm-project@8456c3a, made on Fri Jan 15 22:53:25 2021 -0800.

My project is crashing with this build of the NDK (and r23b5; earlier betas untested), but not with any other copy of clang I've tried (including other clang-12s). I'm hoping this r23 doesn't become the final build ;P.

I whittled down a still-very-complex test case and bisected the compiler to figure out that this was fixed by upstream mere days later, in llvm/llvm-project@c1bc798, made on Wed Jan 20 18:59:00 2021 +0800.

The upstream bug tracking this issue was https://bugs.llvm.org/show_bug.cgi?id=48712 and the commit which caused the issue was llvm/llvm-project@b3a722e, made on Mon Sep 28 15:48:00 2020 +0800.

(I've uploaded my probably-unhelpful reproduction--which I failed to whittle cppcoro entirely out of--mostly for my own organizational purposes; but, given the LLVM issue, I hope it doesn't matter: unhelpful.zip.)

@saurik saurik added the bug label Jul 20, 2021
@DanAlbert DanAlbert added this to Triaged in r23 via automation Jul 20, 2021
@DanAlbert
Copy link
Member

Thanks for all the details! Coroutines support is still incomplete in Clang, so we won't spin another RC for this but it's unavoidable that we'll need an r23b release and we'll pick this up then :)

@DanAlbert DanAlbert added this to Awaiting triage in LLVM via automation Jul 20, 2021
@DanAlbert DanAlbert moved this from Awaiting triage to Awaiting update in LLVM Jul 20, 2021
@DanAlbert DanAlbert moved this from Triaged to Needs prebuilt update in r23 Jul 20, 2021
@saurik
Copy link
Author

saurik commented Jul 20, 2021

FWIW, it is "incomplete", but definitely usable? Like, I realize it says "partial", but I feel like they are being pedantic there (as they should, being compiler people): AFAIK Facebook has been using C++ coroutines in production as part of their Folly library for a long while now (though probably only on servers). My project makes rather heavy use of coroutines, on Android even in specific, since 2019; so, I can attest to them seemingly working great at least as far back as r19.

Regardless: I mostly have everyone compile our code using r21 anyway--as it is an LTS, which I'm super glad you provide!! :D--and have only recently been dipping my feet in r22, so I'm not like, super sad at your response or anything ;P. Thanks for paying attention!

@DanAlbert DanAlbert removed this from Needs prebuilt update in r23 Aug 13, 2021
@DanAlbert DanAlbert added this to Triaged in r23b via automation Aug 13, 2021
@pirama-arumuga-nainar
Copy link
Collaborator

https://android-review.googlesource.com/c/toolchain/llvm_android/+/1817582 cherry-picks the fix to r23b toolchain.

@DanAlbert DanAlbert moved this from Triaged to Needs prebuilt update in r23b Sep 14, 2021
@DanAlbert DanAlbert moved this from Needs prebuilt update to Needs cherry-pick in r23b Sep 16, 2021
@DanAlbert
Copy link
Member

https://ci.android.com/builds/branches/aosp-master-ndk/grid?head=7732824&tail=7732824 has the fix. I wasn't able to repro the crash (guessing at build flags didn't get me there) so I can't verify that it fixed the issue.

@saurik
Copy link
Author

saurik commented Sep 17, 2021

OK, so before I had just been using my project's giant build system to do the compile, and I sadly deleted the folder that I had been doing the bisection work in a month later (and so a month ago), so I experienced first-hand the complexity of my "unhelpful.zip" ;P. I'm sorry I didn't spend the time before to whittle down the build system (as it didn't even take me long to do just now... I was just feeling burned out after spending days bisecting before).

@DanAlbert The secret is (apparently) using -Os.

So, here, using lewissbaker/cppcoro@a87e97f (the current version) and llvm-mirror/libcxx@2076f53 (the tip of the release_90 branch, which I appreciate is an old version), here is a single call of clang that replicates the issue.

clang++ -o unhelpful coro.cpp main.cpp \
    -Ilibcxx/include libcxx/src/{charconv,exception,stdexcept,string}.cpp \
    -Icppcoro/include cppcoro/lib/lightweight_manual_reset_event.cpp \
    -std=c++2a -fcoroutines-ts -Os -nostdinc++ -D__GLIBCXX__

I've re-verified that this works on r21/r22--as well as Ubuntu's clang++-1[012]--and fails on r23~b6 (as before); I then verified it failed on the official release of r23 (as expected)... but, sadly, it is still failing on this r24 canary build :(.

@pirama-arumuga-nainar
Copy link
Collaborator

Can you double check? For me (on Linux), unhelpful built with clang-r416183c1 (which should be in the canary build) runs successfully but when built with clang-r416183b (in r23), it segfaults.

@saurik
Copy link
Author

saurik commented Sep 17, 2021

@pirama-arumuga-nainar The canary build @DanAlbert linked to (which is the first canary I've ever downloaded: I didn't know this was a thing I could do before, so I couldn't have anything mixed up locally) is reporting as "r416183c", which is missing the "1" at the end?

/home/saurik/opt/android-ndk-r24-canary/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --version
Android (7416131, based on r416183c) clang version 12.0.6 (https://android.googlesource.com/toolchain/llvm-project c935d99d7cf2016289302412d708641d52d2f7ee)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/saurik/opt/android-ndk-r24-canary/toolchains/llvm/prebuilt/linux-x86_64/bin

FWIW, it just doesn't seem to have that patch (so at least that's consistent? ;P).

# git log c935d99d7cf2016289302412d708641d52d2f7ee | grep 'Remain alignment' # (sic)
# git log c935d99d7cf2016289302412d708641d52d2f7ee | grep 'c1bc7981bab'
# git log c935d99d7cf2016289302412d708641d52d2f7ee | grep 'I1dc66055502a4eaace36c2ed6aab05a1f0f68c22'
# git blame c935d99d7cf2016289302412d708641d52d2f7ee -- llvm/lib/Transforms/Coroutines/CoroFrame.cpp | grep -nC3 CouldMerge
Blaming lines: 100% (2304/2304), done.
596-b3a722e66b753 (Chuanqi Xu           2020-09-28 15:40:35 +0800  596)     // NonOverlappedAllocaSet.
597-b3a722e66b753 (Chuanqi Xu           2020-09-28 15:40:35 +0800  597)     for (auto &AllocaSet : NonOverlapedAllocas) {
598-b3a722e66b753 (Chuanqi Xu           2020-09-28 15:40:35 +0800  598)       assert(!AllocaSet.empty() && "Processing Alloca Set is not empty.\n");
599:b3a722e66b753 (Chuanqi Xu           2020-09-28 15:40:35 +0800  599)       bool CouldMerge = none_of(AllocaSet, [&](auto Iter) {
600-b3a722e66b753 (Chuanqi Xu           2020-09-28 15:40:35 +0800  600)         return IsAllocaInferenre(Alloca, Iter);
601-b3a722e66b753 (Chuanqi Xu           2020-09-28 15:40:35 +0800  601)       });
602:b3a722e66b753 (Chuanqi Xu           2020-09-28 15:40:35 +0800  602)       if (!CouldMerge)
603-b3a722e66b753 (Chuanqi Xu           2020-09-28 15:40:35 +0800  603)         continue;
604-b3a722e66b753 (Chuanqi Xu           2020-09-28 15:40:35 +0800  604)       AllocaIndex[Alloca] = AllocaIndex[*AllocaSet.begin()];
605-b3a722e66b753 (Chuanqi Xu           2020-09-28 15:40:35 +0800  605)       AllocaSet.push_back(Alloca);

^ This is the old code, per the patch. v

https://android-review.googlesource.com/c/toolchain/llvm_android/+/1817582/3/patches/cherry/c1bc7981babcae20247650a4b8adab9c0c97890a.patch#35

diff --git a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
index 26d4ea482ffb..e2642234ac9b 100644
--- a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
@@ -596,9 +596,21 @@ void FrameTypeBuilder::addFieldForAllocas(const Function &F,
     // NonOverlappedAllocaSet.
     for (auto &AllocaSet : NonOverlapedAllocas) {
       assert(!AllocaSet.empty() && "Processing Alloca Set is not empty.\n");
-      bool CouldMerge = none_of(AllocaSet, [&](auto Iter) {
+      bool NoInference = none_of(AllocaSet, [&](auto Iter) {
         return IsAllocaInferenre(Alloca, Iter);
       });
+      // If the alignment of A is multiple of the alignment of B, the address
+      // of A should satisfy the requirement for aligning for B.
+      //
+      // There may be other more fine-grained strategies to handle the alignment
+      // infomation during the merging process. But it seems hard to handle
+      // these strategies and benefit little.
+      bool Alignable = [&]() -> bool {
+        auto *LargestAlloca = *AllocaSet.begin();
+        return LargestAlloca->getAlign().value() % Alloca->getAlign().value() ==
+               0;
+      }();
+      bool CouldMerge = NoInference && Alignable;
       if (!CouldMerge)
         continue;
       AllocaIndex[Alloca] = AllocaIndex[*AllocaSet.begin()];

@saurik
Copy link
Author

saurik commented Sep 17, 2021

Huh. That canary build is actually showing as the same commit as r23.

# /usr/local/lib/android/sdk/ndk/23.0.7599858/toolchains/llvm/prebuilt/linux-x86_64/bin/clang-12 --version
Android (7284624, based on r416183b) clang version 12.0.5 (https://android.googlesource.com/toolchain/llvm-project c935d99d7cf2016289302412d708641d52d2f7ee)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/lib/android/sdk/ndk/23.0.7599858/toolchains/llvm/prebuilt/linux-x86_64/bin
# android-ndk-r24-canary/toolchains/llvm/prebuilt/linux-x86_64/bin/clang-12 --version
Android (7416131, based on r416183c) clang version 12.0.6 (https://android.googlesource.com/toolchain/llvm-project c935d99d7cf2016289302412d708641d52d2f7ee)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/saurik/opt/android-ndk-r24-canary/toolchains/llvm/prebuilt/linux-x86_64/bin
# sha256sum android-ndk-r24-canary/toolchains/llvm/prebuilt/linux-x86_64/bin/clang-12
50df359209e625f6a1d4a423571a25732e9f34484aecbc7e81d987b4051906cb  android-ndk-r24-canary/toolchains/llvm/prebuilt/linux-x86_64/bin/clang-12
# unzip -p android-ndk-7732824-linux-x86_64.zip android-ndk-r24-canary/toolchains/llvm/prebuilt/linux-x86_64/bin/clang-12 | sha256sum
50df359209e625f6a1d4a423571a25732e9f34484aecbc7e81d987b4051906cb  -

@saurik
Copy link
Author

saurik commented Sep 17, 2021

OK, so I am not sure I understand what that commit is referencing anymore? :( Shouldn't I be able to rebuild the toolchain using that commit? I was just digging around in your CI system and pulled an artifact for the most recent finished build (7740932), which 1) has the "1" after the "c"; 2) works! \o/; but... 3) also claims to be the same llvm-project commit?...

# unzip android-ndk-7740932-linux-x86_64.zip
...
# ./android-ndk-r24-canary/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --version
Android (7714059, based on r416183c1) clang version 12.0.8 (https://android.googlesource.com/toolchain/llvm-project c935d99d7cf2016289302412d708641d52d2f7ee)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/saurik/opt/./android-ndk-r24-canary/toolchains/llvm/prebuilt/linux-x86_64/bin

@pirama-arumuga-nainar Regardless: it does seem like "r416183c1" (with the "1") has the fix for the issue, and I can confirm it works for me ;P.

@pirama-arumuga-nainar
Copy link
Collaborator

@DanAlbert I think you may have to fix the link to the canary build here :)

  1. also claims to be the same llvm-project commit?

That's expected - it's the base revision where we branched llvm-project. We apply cherry-picks, including the one for this issue, on top of that revision during build.

@DanAlbert
Copy link
Member

@DanAlbert I think you may have to fix the link to the canary build here :)

D'oh! Yeah, my mistake. That build updates the sysroot. https://ci.android.com/builds/branches/aosp-master-ndk/grid?head=7740141&tail=7740141 updates the compiler. Sorry about that.

chenguoyin-alibaba pushed a commit to riscv-android-src/platform-ndk that referenced this issue Oct 21, 2021
Test: ./checkbuild.py && ./run_tests.py
Bug: android/ndk#1540
Bug: android/ndk#1551
Bug: android/ndk#1555
Change-Id: I2d345546457d8c461b603be4ea9e725230f11af9
(cherry picked from commit 2af6bc5)
chenguoyin-alibaba pushed a commit to riscv-android-src/toolchain-llvm_android that referenced this issue Oct 21, 2021
Bug: android/ndk#1540

c1bc7981bab [Coroutine] Remain alignment information when merging frame variables

Test: N/A
Change-Id: I1dc66055502a4eaace36c2ed6aab05a1f0f68c22
@rprichard
Copy link
Collaborator

Fixed in r23b.

r23b automation moved this from Needs cherry-pick to Merged Oct 29, 2021
@DanAlbert DanAlbert moved this from Awaiting update to Prebuilts submitted in LLVM Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
LLVM
  
Prebuilts submitted
r23b
  
Merged
Development

No branches or pull requests

5 participants