
Conversation

@vadorovsky
Member

@vadorovsky vadorovsky commented Aug 3, 2023

Co-authored-by: Tamir Duberstein <tamird@gmail.com>

@vadorovsky vadorovsky force-pushed the aya/ci-build-artifacts branch 2 times, most recently from 5e06e10 to cf231e6 Compare August 3, 2023 12:27
Member

I think we probably want to do something smarter now. Unlike the usage in the bpf-linker repo, we should always use the cache here. The cache key should be just an encoding of the arguments we pass to cmake.

Put differently: imagine that we are applying our changes to a new LLVM branch from rustc -- we want to make use of whatever cached build products are available to us.

Then we should always run the LLVM build and upload the result.
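One way to get a cache key that is "just an encoding of the arguments we pass to cmake" is to hash the flag string. A minimal sketch (the variable names and flag set are illustrative, not the actual workflow configuration):

```shell
# Illustrative only: derive a stable cache key from the cmake invocation,
# so any branch built with the same flags can reuse cached build products.
CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=BPF;X86"
CACHE_KEY="llvm-$(printf '%s' "$CMAKE_ARGS" | sha256sum | cut -d' ' -f1 | cut -c1-16)"
echo "$CACHE_KEY"
```

Because the key depends only on the flags, a new `rustc/*` branch built with the same configuration would hit the existing cache.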

There are also some vestiges of the reusable workflow here (see workflow_call above), which we probably no longer need.

Lastly: how are we going to consume the build product generated here from:

  • the bpf-linker automation
  • a user's local machine?


We could probably put a link in bpf-linker's README and tell them how to set the env var to pick it up. And we should probably also build for mac :P

Also I don't think we need to build dynamic libs? Nothing uses them and their usage is discouraged

Member

dynamic libs are very useful when building the linker in CI because the binaries are way way smaller.

Member Author

Dynamic libs are actually disabled here - dylibs are enabled by the -DBUILD_SHARED_LIBS option, which is not used here. I would prefer to stick to static ones even though they are bigger - as Alessandro said, dylibs are discouraged and I don't think we will hit any storage limits with the static ones.
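For reference, the switch under discussion is CMake's standard `BUILD_SHARED_LIBS` option; `OFF` (the default) produces static archives, `ON` produces dylibs. A sketch of the invocation (printed rather than executed, since a real LLVM build takes hours, and the exact flag set here is illustrative):

```shell
# BUILD_SHARED_LIBS=OFF (the default) yields static archives; ON yields dylibs.
echo cmake -S llvm -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
```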

About consuming the artifacts, I was thinking about having a cargo xtask build command which would by default pull the newest artifact from GitHub (we could just look for the newest artifact from the newest rustc/* branch). Of course only on the feature/fix-di branch for now. Once everything is ready and we are going to release bpf-linker with BTF support, I think the best option would be to release binaries with the newest LLVM artifact, statically linked - in CI, on every tag. WDYT?


I must have hallucinated I swear I saw those options 😂 Yeah I don't have a strong opinion, personally I'd just download the tarball somewhere and then set LLVM_SYS_BLA to point to it, but up to you
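The env var being alluded to is llvm-sys's `LLVM_SYS_<version>_PREFIX`; for LLVM 17 it is `LLVM_SYS_170_PREFIX`. A sketch of the manual "download the tarball and point at it" flow (the paths are hypothetical, and the extraction step is elided since the tarball name depends on how CI publishes it):

```shell
# Hypothetical prefix directory; in practice you would extract the downloaded
# LLVM tarball here, e.g.: tar -xf llvm-artifact.tar.xz -C "$HOME/llvm-build"
mkdir -p "$HOME/llvm-build"
# llvm-sys (and therefore bpf-linker) picks up the prefix from this variable.
export LLVM_SYS_170_PREFIX="$HOME/llvm-build"
echo "$LLVM_SYS_170_PREFIX"
```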

Member Author

@vadorovsky vadorovsky Aug 4, 2023

I must have hallucinated I swear I saw those options 😂

It wasn't a microdose! 😏 Seriously speaking, we have that option in bpf-linker CI, but I removed it here. I guess we could also remove it there?

Yeah I don't have a strong opinion, personally I'd just download the tarball somewhere and then set LLVM_SYS_BLA to point to it, but up to you

Yeah, I was thinking about doing exactly that in xtask. I have a strong preference for that, because checking what the newest rustc/* branch is and what its newest artifact is isn't something I want to do manually. And I would also prefer not to do it in Bash. 😛
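The branch-selection step that xtask would automate can be approximated with a version sort; a sketch (the branch names below are the two mentioned in this PR, and a real implementation would list them via the GitHub API):

```shell
# Given a list of rustc/* branch names, pick the newest by version sort.
branches="rustc/16.0-2023-06-05
rustc/17.0-2023-07-29"
newest="$(printf '%s\n' "$branches" | sort -V | tail -n1)"
echo "$newest"   # rustc/17.0-2023-07-29
```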

And we should probably also build for mac :P

Good call, although I will have to figure out how to cross-build for M1.

Member

The equivalent script in bpf-linker enables dynamic libs because I did in fact observe running out of space. Remember that each feature permutation links another copy of LLVM, and again for test binaries. It's not unusual for this to reach several gigabytes.

Regarding building for Mac: no need to cross compile, just run this workflow a second time on a Mac using a matrix strategy?

Member Author

@vadorovsky vadorovsky Aug 4, 2023

Regarding building for Mac: no need to cross compile, just run this workflow a second time on a Mac using a matrix strategy?

That won't work, because GitHub Actions has Intel Mac runners, not M1. So I think I will have to cross-compile. I mean, I can also do that with a matrix strategy. Maybe I could even use a Mac runner to compile the Mac version for both architectures. But again, compiling for aarch64-darwin will require cross-compilation for sure, no matter which kind of runner we do it on.
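For cross-building to Apple Silicon from an Intel Mac runner, the standard CMake knob is `CMAKE_OSX_ARCHITECTURES`. A sketch (printed rather than executed, and the rest of the flag set is illustrative, not the exact invocation this workflow would use):

```shell
# CMAKE_OSX_ARCHITECTURES=arm64 asks the Apple toolchain to emit arm64 code
# even when the build host is x86_64.
CROSS_FLAGS="-DCMAKE_OSX_ARCHITECTURES=arm64"
echo cmake -S llvm -B build-arm64 -G Ninja -DCMAKE_BUILD_TYPE=Release $CROSS_FLAGS
```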

Member Author

The equivalent script in bpf-linker enables dynamic libs because I did in fact observe running out of space. Remember that each feature permutation links another copy of LLVM, and again for test binaries. It's not unusual for this to reach several gigabytes.

To be precise - you were running out of space while building? That doesn't seem to happen here. And I don't think the size of the artifacts at the end exceeds 2 GB?

Member

Good point about cross compilation.

To be precise - you were running out of space when building? Here it doesn't seem to happen.

It doesn't happen here because you're not building bpf-linker.

And I don't think that the size of artifacts at the end exceeds 2 GB?

Best to check. Also, as I said, it is the size of the bpf-linker artifacts that is much bigger without dynamic linking.

I spent a lot of time making this work, so I strongly encourage you to test thoroughly; it's about to become much harder to iterate once these things span two repos.

@vadorovsky vadorovsky force-pushed the aya/ci-build-artifacts branch 4 times, most recently from a40ebe2 to c344d3d Compare August 10, 2023 11:55
@tamird
Member

tamird commented Aug 10, 2023

FYI rustc is now on LLVM 17 and we use https://github.com/aya-rs/llvm-project/tree/rustc/17.0-2023-07-29 in bpf-linker CI

@vadorovsky vadorovsky force-pushed the aya/ci-build-artifacts branch from c344d3d to fac40e1 Compare August 10, 2023 21:52
@vadorovsky vadorovsky changed the base branch from rustc/16.0-2023-06-05 to rustc/17.0-2023-07-29 August 10, 2023 23:00
Co-authored-by: Tamir Duberstein <tamird@gmail.com>
@vadorovsky vadorovsky force-pushed the aya/ci-build-artifacts branch from fac40e1 to 4dd261d Compare August 10, 2023 23:04
@vadorovsky
Member Author

I'm closing this. It seems like we cannot use GitHub artifacts in the way we intended.

It's impossible to download artifacts without a PAT. When you access the zip link directly - with curl, a browser, or any other HTTP client - without being authenticated:

https://api.github.com/repos/aya-rs/llvm-project/actions/artifacts/856190639/zip (that link is shown in https://api.github.com/repos/aya-rs/llvm-project/actions/artifacts/856190639)

it always responds with:

```
{
  "message": "You must have the actions scope to download artifacts.",
  "documentation_url": "https://docs.github.com/rest/actions/artifacts#download-an-artifact"
}
```
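For completeness, the request GitHub expects is the same URL with a token that carries the actions scope. A sketch (the command is printed, not run, and `$GITHUB_TOKEN` is a placeholder for a real PAT):

```shell
# Unauthenticated requests to this endpoint get the error shown above;
# a token with the actions scope is required.
ARTIFACT_URL="https://api.github.com/repos/aya-rs/llvm-project/actions/artifacts/856190639/zip"
echo curl -L -H "Authorization: Bearer \$GITHUB_TOKEN" -o artifact.zip "$ARTIFACT_URL"
```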

And since LLVM 17 is in rustc already, I think we can just use it. We can revisit static linking with LLVM when we are about to release bpf-linker, so that we can link against official LLVM builds.

@vadorovsky vadorovsky closed this Aug 25, 2023
vadorovsky pushed a commit that referenced this pull request Sep 15, 2024
```
  UBSan-Standalone-sparc :: TestCases/Misc/Linux/diag-stacktrace.cpp
```
`FAIL`s on 32 and 64-bit Linux/sparc64 (and on Solaris/sparcv9, too: the
test isn't Linux-specific at all). With
`UBSAN_OPTIONS=fast_unwind_on_fatal=1`, the stack trace shows a
duplicate innermost frame:
```
compiler-rt/test/ubsan/TestCases/Misc/Linux/diag-stacktrace.cpp:14:31: runtime error: execution reached the end of a value-returning function without returning a value
    #0 0x7003a708 in f() compiler-rt/test/ubsan/TestCases/Misc/Linux/diag-stacktrace.cpp:14:35
    #1 0x7003a708 in f() compiler-rt/test/ubsan/TestCases/Misc/Linux/diag-stacktrace.cpp:14:35
    #2 0x7003a714 in g() compiler-rt/test/ubsan/TestCases/Misc/Linux/diag-stacktrace.cpp:17:38
```
which isn't seen with `fast_unwind_on_fatal=0`.

This turns out to be another fallout from fixing
`__builtin_return_address`/`__builtin_extract_return_addr` on SPARC. In
`sanitizer_stacktrace_sparc.cpp` (`BufferedStackTrace::UnwindFast`) the
`pc` arg is the return address, while `pc1` from the stack frame
(`fr_savpc`) is the address of the `call` insn, leading to a double
entry for the innermost frame in `trace_buffer[]`.

This patch fixes this by moving the adjustment before all uses.

Tested on `sparc64-unknown-linux-gnu` and `sparcv9-sun-solaris2.11`
(with the `ubsan/TestCases/Misc/Linux` tests enabled).

(cherry picked from commit 3368a32)
vadorovsky pushed a commit that referenced this pull request May 30, 2025
`clang-repl --cuda` was previously crashing with a segmentation fault,
instead of reporting a clean error
```
(base) anutosh491@Anutoshs-MacBook-Air bin % ./clang-repl --cuda
#0 0x0000000111da4fbc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x150fbc)
#1 0x0000000111da31dc llvm::sys::RunSignalHandlers() (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x14f1dc)
#2 0x0000000111da5628 SignalHandler(int) (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x151628)
#3 0x000000019b242de4 (/usr/lib/system/libsystem_platform.dylib+0x180482de4)
#4 0x0000000107f638d0 clang::IncrementalCUDADeviceParser::IncrementalCUDADeviceParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, clang::CompilerInstance&, llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>, llvm::Error&, std::__1::list<clang::PartialTranslationUnit, std::__1::allocator<clang::PartialTranslationUnit>> const&) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x216b8d0)
#5 0x0000000107f638d0 clang::IncrementalCUDADeviceParser::IncrementalCUDADeviceParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, clang::CompilerInstance&, llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>, llvm::Error&, std::__1::list<clang::PartialTranslationUnit, std::__1::allocator<clang::PartialTranslationUnit>> const&) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x216b8d0)
#6 0x0000000107f6bac8 clang::Interpreter::createWithCUDA(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x2173ac8)
#7 0x000000010206f8a8 main (/opt/local/libexec/llvm-20/bin/clang-repl+0x1000038a8)
#8 0x000000019ae8c274
Segmentation fault: 11
```

The underlying issue was that the `DeviceCompilerInstance` (used for
device-side CUDA compilation) was never initialized with a `Sema`, which
is required before constructing the `IncrementalCUDADeviceParser`.

https://github.com/llvm/llvm-project/blob/89687e6f383b742a3c6542dc673a84d9f82d02de/clang/lib/Interpreter/DeviceOffload.cpp#L32

https://github.com/llvm/llvm-project/blob/89687e6f383b742a3c6542dc673a84d9f82d02de/clang/lib/Interpreter/IncrementalParser.cpp#L31

Unlike the host-side `CompilerInstance` which runs `ExecuteAction`
inside the Interpreter constructor (thereby setting up Sema), the
device-side CI was passed into the parser uninitialized, leading to an
assertion or crash when accessing its internals.

To fix this, I refactored the `Interpreter::create` method to include an
optional `DeviceCI` parameter. If provided, we know we need to take care
of this instance too. Only then do we construct the
`IncrementalCUDADeviceParser`.

(cherry picked from commit 21fb19f)
vadorovsky pushed a commit that referenced this pull request May 30, 2025
llvm#138091)

Check this error for more context
(https://github.com/compiler-research/CppInterOp/actions/runs/14749797085/job/41407625681?pr=491#step:10:531)

This fails with
```
* thread #1, name = 'CppInterOpTests', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x55500356d6d3)
  * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0gitclang::PragmaNamespace::~PragmaNamespace() + 99
    frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0gitclang::Preprocessor::~Preprocessor() + 3830
    frame #2: 0x00007fffee20917a libclangCppInterOp.so.21.0gitstd::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 58
    frame #3: 0x00007fffee224796 libclangCppInterOp.so.21.0gitclang::CompilerInstance::~CompilerInstance() + 838
    frame #4: 0x00007fffee22494d libclangCppInterOp.so.21.0gitclang::CompilerInstance::~CompilerInstance() + 13
    frame #5: 0x00007fffed95ec62 libclangCppInterOp.so.21.0gitclang::IncrementalCUDADeviceParser::~IncrementalCUDADeviceParser() + 98
    frame #6: 0x00007fffed9551b6 libclangCppInterOp.so.21.0gitclang::Interpreter::~Interpreter() + 102
    frame #7: 0x00007fffed95598d libclangCppInterOp.so.21.0gitclang::Interpreter::~Interpreter() + 13
    frame #8: 0x00007fffed9181e7 libclangCppInterOp.so.21.0gitcompat::createClangInterpreter(std::vector<char const*, std::allocator<char const*>>&) + 2919
```

Problem:

1) The destructor currently performs no cleanup for the DeviceParser and
the DeviceAct. We currently only have this

https://github.com/llvm/llvm-project/blob/976493822443c52a71ed3c67aaca9a555b20c55d/clang/lib/Interpreter/Interpreter.cpp#L416-L419

2) The ownership for DeviceCI currently is present in
IncrementalCudaDeviceParser. But this should be similar to how the
combination for hostCI, hostAction and hostParser are managed by the
Interpreter. As on master the DeviceAct and DeviceParser are managed by
the Interpreter but not DeviceCI. This is problematic because :
IncrementalParser holds a Sema& which points into the DeviceCI. On
master, DeviceCI is destroyed before the base class ~IncrementalParser()
runs, causing Parser::reset() to access a dangling Sema (and as Sema
holds a reference to Preprocessor which owns PragmaNamespace) we see
this
```
  * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0gitclang::PragmaNamespace::~PragmaNamespace() + 99
    frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0gitclang::Preprocessor::~Preprocessor() + 3830

```

(cherry picked from commit 529b6fc)