
Endless AddressSanitizer:DEADLYSIGNAL loop with GCC 13 since Ubuntu 22.04 202403*10*.1.0 but not before #9524

Closed
2 of 13 tasks
hartwork opened this issue Mar 16, 2024 · 6 comments · Fixed by eBay/NuRaft#492

Comments

@hartwork

hartwork commented Mar 16, 2024

Description

Workflow https://github.com/cpptest/cpptest/blob/master/.github/workflows/linux.yml was working fine until a few days ago with image 20240304.1.0, and now the same commit (cpptest/cpptest@4430bb3 if curious) with image 20240310.1.0 is failing with GCC 13 (but not Clang 18!): it loops on the runtime error AddressSanitizer:DEADLYSIGNAL and then times out.

Could be the same cause as #9491 but #9491 seems to be about Clang instead of GCC.

CC @hannob

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 20.04
  • Ubuntu 22.04
  • macOS 11
  • macOS 12
  • macOS 13
  • macOS 13 Arm64
  • macOS 14
  • macOS 14 Arm64
  • Windows Server 2019
  • Windows Server 2022

Image version and build link

Good: 20240304.1.0
Bad: 20240310.1.0

Good: https://github.com/cpptest/cpptest/actions/runs/8198156643/job/22421247085
Bad: https://github.com/cpptest/cpptest/actions/runs/8290749936/job/22726589918

Is it regression?

20240304.1.0

Expected behavior

CI finishes in under 2 minutes

Actual behavior

CI times out after a seemingly endless loop of GCC 13 ASan printing AddressSanitizer:DEADLYSIGNAL.

Repro steps

Re-run https://github.com/cpptest/cpptest/actions/runs/8290749936/job/22726589918 and watch output of "make all".
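Since the hang only shows up once the job hits its timeout, one way to make this failure surface quickly is to cap the step that runs the instrumented tests. A minimal sketch, assuming the tests run in a step invoking "make all" (the step name and the 5-minute limit are illustrative choices, not taken from the cpptest workflow). timeout-minutes is a standard GitHub Actions step setting; without it, a job runs until the default limit of 360 minutes:

```yaml
- name: make all
  # Illustrative step: the real workflow's step name and command may differ.
  # CI normally finishes in under 2 minutes, so 5 minutes is a generous cap
  # that turns an endless AddressSanitizer:DEADLYSIGNAL loop into a quick failure.
  timeout-minutes: 5
  run: make all
```

The same key can also be set at job level (jobs.<job_id>.timeout-minutes) to cap the whole job.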

@phil-blain

phil-blain commented Mar 16, 2024

It's the same cause: GCC uses the same sanitizers as Clang does [source].

In stock GCC 13.1.0, which is the version in the ubuntu-22.04 images, the sanitizers are merged from LLVM commit llvm/llvm-project@ae59131 [source]. The same version is used in GCC 13.2.0 [source].

This commit was released as part of LLVM 16.0.0 and thus predates the AddressSanitizer fix that makes it compatible with the change in the Ubuntu Azure kernel package (version 6.5.0-1016-azure) bumping the ASLR entropy, which is discussed here (#9491 (comment)). That fix is llvm/llvm-project@fb77ca0, which went into LLVM 17.0.0.

Note that LeakSanitizer was also affected; that was fixed in llvm/llvm-project@5ffe955, which also made it into LLVM 17.0.0.

For completeness, here is the commit in the Ubuntu kernel repo which made the changes to ARCH_MMAP_RND_BITS: 6b522637c6a7dabd8530026ae933fb5ff17e877f (and associated Launchpad bug: #1983357). This change was propagated to the linux-azure-6.5 repository, which is the source repo for the Azure-infused kernel used by the GitHub actions image, in ff755408f6f81000a45e2b2e54a15a28d6a8cff2.
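To check whether a given runner has the higher-entropy kernel, a diagnostic step along these lines can print the kernel release and the mmap ASLR entropy currently in effect. This is a sketch, not part of any workflow mentioned here; it uses sudo because, as far as I know, vm.mmap_rnd_bits is readable only by root:

```yaml
- name: Show kernel and mmap ASLR entropy
  run: |
    # Kernel release, e.g. 6.5.0-1016-azure on the affected images
    uname -r
    # Current number of random bits used for mmap base randomization;
    # it defaults to the kernel's CONFIG_ARCH_MMAP_RND_BITS value
    sudo sysctl vm.mmap_rnd_bits
```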

Finally, note that the Ubuntu GCC package (gcc-13) was updated yesterday to include the LLVM patch for ASan: 6c5be2a496335c513dbe6fa85df2402cfc0f0a8b, following Debian: d7d908e4d4da4b181d2e875e75c4f804c8a1691e. However, that package is not what provides the GCC 13.1.0 compiler in the ubuntu-22.04 GitHub Actions image, since it is not available for 22.04. Instead, the image uses the "Toolchain test build" PPA [source], which as of this writing has not been updated yet.
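To see which gcc-13 build a runner actually provides, and from which repository, a step like the following can help. It is a sketch assuming the compiler is installed under the name gcc-13; apt-cache policy lists the installed package version together with the archive or PPA offering each version:

```yaml
- name: Show gcc-13 version and package origin
  run: |
    gcc-13 --version
    # Shows the installed gcc-13 package version and the repositories
    # offering it (Ubuntu archive vs. the toolchain test PPA)
    apt-cache policy gcc-13
```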

andy5995 added a commit to andy5995/canfigger that referenced this issue Mar 17, 2024
These weird issues are probably related to
actions/runner-images#9524
@andy5995

with GCC 13 (but not Clang 18!) looping runtime error AddressSanitizer:DEADLYSIGNAL and then timing out.

This is happening to me, too, but with gcc 11 (and now 13, which I am trying after seeing this ticket).

CI tests that were passing are now suddenly failing. With Clang I don't get the endless loop mentioned above, but some unit tests fail. Which unit tests fail differs from run to run, and sometimes they pass; usually they all pass.

https://github.com/andy5995/canfigger/actions/runs/8312710880/job/22747867106
logs_21799559162.zip

https://github.com/theimpossibleastronaut/rmw/actions/runs/8312663731/job/22747768800

https://productionresultssa5.blob.core.windows.net/actions-results/5ec2690f-2ed9-4417-88c4-c3575b6db226/workflow-job-run-c4625b57-391a-5be5-f30a-26a1e0a53cce/logs/job/job-logs.txt?rsct=text%2Fplain&se=2024-03-17T03%3A54%3A16Z&sig=%2BMP9BSQPY1ljAAZVEsYWSsdEC2YR3vnjvz7kDZDQDG8%3D&sp=r&spr=https&sr=b&st=2024-03-17T03%3A44%3A11Z&sv=2021-12-02

@hartwork
Author

Regarding a workaround: is there a way to force the CI to use version "20240304.1.0" of image "Ubuntu 22.04" again in my workflow? Do I have any sane option other than a CI stuck at red until the image is updated?

@mikhailkoliada
Contributor

Duplicate of #9491

@mikhailkoliada mikhailkoliada marked this as a duplicate of #9491 Mar 17, 2024
@phil-blain

I don't think it's possible to request the older image. In the meantime, the workaround is (cf. #9491 (comment)):

    - name: Fix kernel mmap rnd bits
      # Asan in llvm 14 provided in ubuntu 22.04 is incompatible with
      # high-entropy ASLR in much newer kernels that GitHub runners are
      # using leading to random crashes: https://reviews.llvm.org/D148280
      run: sudo sysctl vm.mmap_rnd_bits=28
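For workflows whose build matrix also includes macOS or Windows runners, a variant of the same step can be limited to Linux (vm.mmap_rnd_bits is a Linux sysctl) and can log the previous value for later debugging. A sketch under those assumptions; it has to run before any ASan-instrumented binary in the job, and the setting only lasts for that runner VM:

```yaml
- name: Fix kernel mmap rnd bits
  # vm.mmap_rnd_bits only exists on Linux; skip the step on other runners
  if: runner.os == 'Linux'
  run: |
    # Log the value before changing it (reading it requires root)
    echo "vm.mmap_rnd_bits before: $(sudo sysctl -n vm.mmap_rnd_bits)"
    # Lower the mmap ASLR entropy so the older ASan runtime works again
    sudo sysctl -w vm.mmap_rnd_bits=28
```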

@hartwork
Author

In the meantime, the workaround is (cf. #9491 (comment))

@phil-blain from what I can see, this seems to work well. Thank you!

I don't think it's possible to request the older image.

That's a design problem, then, that will need fixing in the mid term. I'll try to write up a dedicated issue explaining why the current practice of updating images and magically moving them under users has been a maintenance nightmare on the GitHub Actions user side for a while already.

@hartwork hartwork changed the title Endless AddressSanitizer:DEADLYSIGNAL loop with GCC 13 since Ubuntu 22.04 202403**10**.1.0 but not before Endless AddressSanitizer:DEADLYSIGNAL loop with GCC 13 since Ubuntu 22.04 202403*10*.1.0 but not before Mar 17, 2024
andy5995 added a commit to andy5995/canfigger that referenced this issue Mar 18, 2024
These weird issues are probably related to
actions/runner-images#9524
andy5995 added a commit to andy5995/canfigger that referenced this issue Mar 18, 2024
* Fail-fast false, add gcc-9 and clang-13

* meson_options.txt: Add single quotes around boolean values

* Modify concurrency

These weird issues are probably related to
actions/runner-images#9524

* Configure meson with sanitize

This was removed from the default options in
abb9da2
Dennisbonke added a commit to Dennisbonke/mlibc that referenced this issue Mar 18, 2024
Dennisbonke added a commit to Dennisbonke/mlibc that referenced this issue Mar 18, 2024
FedeDP added a commit to falcosecurity/libs that referenced this issue Mar 18, 2024
See actions/runner-images#9524 (comment) for the fix.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
FedeDP added a commit to falcosecurity/libs that referenced this issue Mar 18, 2024
See actions/runner-images#9524 (comment) for the fix.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
poiana pushed a commit to falcosecurity/libs that referenced this issue Mar 18, 2024
See actions/runner-images#9524 (comment) for the fix.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
no92 added a commit to no92/mlibc that referenced this issue Mar 18, 2024
no92 added a commit to Dennisbonke/mlibc that referenced this issue Mar 18, 2024
FedeDP added a commit to falcosecurity/libs that referenced this issue Mar 19, 2024
See actions/runner-images#9524 (comment) for the fix.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
greensky00 added a commit to greensky00/NuRaft that referenced this issue Mar 20, 2024
greensky00 added a commit to eBay/NuRaft that referenced this issue Mar 20, 2024
poiana pushed a commit to falcosecurity/libs that referenced this issue Mar 22, 2024
See actions/runner-images#9524 (comment) for the fix.

Signed-off-by: Federico Di Pierro <nierro92@gmail.com>
mlindgren added a commit to microsoft/SymCrypt that referenced this issue Apr 5, 2024
When running our unit tests on the address sanitizer build using the GLIBC_TUNABLES to disable use of AVX, we intermittently hit a bug where the console outputs AddressSanitizer:DEADLYSIGNAL in an infinite loop. This appears to be caused by an incompatibility between certain versions of GCC and certain Linux kernels. See e.g. this GitHub issue: actions/runner-images#9524

Currently the OneBranch build pipeline uses an Ubuntu container running on a Mariner kernel. Using an Ubuntu kernel instead should resolve the issue.

Tested: pipeline builds

4 participants