Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLVM bug with StaticArray#sort_by! on aarch64-linux-musl and aarch64-darwin #11358

Open
straight-shoota opened this issue Oct 24, 2021 · 9 comments

Comments

@straight-shoota
Copy link
Member

Two of our aarch64 CI jobs consistently fail with an invalid memory access while building std_spec since #10889.

Latest master runs:

The invalid memory access comes from LLVM. With a debug compiler we get more info. The backtrace is identical for both targets:

Invalid memory access (signal 11) at address 0x2e
[0x5638f657f176] *Exception::CallStack::print_backtrace:Nil +118
[0x5638f654fcda] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +330
[0x7fd8e12603c0] ???
[0x7fd8e24f1110] ???
[0x7fd8e250baaf] _ZNK4llvm14TargetLowering11LowerCallToERNS0_16CallLoweringInfoE +6607
[0x7fd8e2528cfa] _ZN4llvm19SelectionDAGBuilder14lowerInvokableERNS_14TargetLowering16CallLoweringInfoEPKNS_10BasicBlockE +1034
[0x7fd8e2513aea] _ZN4llvm19SelectionDAGBuilder11LowerCallToENS_17ImmutableCallSiteENS_7SDValueEbPKNS_10BasicBlockE +2442
[0x7fd8e24feed1] _ZN4llvm19SelectionDAGBuilder9visitCallERKNS_8CallInstE +1441
[0x7fd8e24f41c9] _ZN4llvm19SelectionDAGBuilder5visitERKNS_11InstructionE +105
[0x7fd8e2577c10] _ZN4llvm16SelectionDAGISel16SelectBasicBlockENS_14ilist_iteratorINS_12ilist_detail12node_optionsINS_11InstructionELb0ELb0EvEELb0ELb1EEES6_Rb +112
[0x7fd8e2577a07] _ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE +6487
[0x7fd8e2575396] _ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE +1942
[0x7fd8e222e5e8] _ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE +280
[0x7fd8e2098d76] _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE +1126
[0x7fd8e2098ff3] _ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE +51
[0x7fd8e20994a0] _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE +960
[0x7fd8e326f523] ???
[0x7fd8e326f35f] LLVMTargetMachineEmitToFile +175
[0x5638f736d019] *LLVM::TargetMachine#emit_to_file<LLVM::Module, String, LLVM::CodeGenFileType>:Bool +89
[0x5638f736d05b] *LLVM::TargetMachine#emit_obj_to_file<LLVM::Module, String>:Bool +11
[0x5638f7363c19] *Crystal::Compiler#cross_compile<Crystal::Program, Array(Crystal::Compiler::CompilationUnit), String>:Nil +313
[0x5638f73637ce] *Crystal::Compiler#codegen<Crystal::Program, Crystal::ASTNode+, Array(Crystal::Compiler::Source), String>:(Tuple(Array(Crystal::Compiler::CompilationUnit), Array(String)) | Nil) +1774
[0x5638f736789c] *Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result +188
[0x5638f760bab9] *Crystal::Command::CompilerConfig#compile<String>:Crystal::Compiler::Result +57
[0x5638f760ba64] *Crystal::Command::CompilerConfig#compile:Crystal::Compiler::Result +36
[0x5638f75fade2] *Crystal::Command#build:Crystal::Compiler::Result +290
[0x5638f75f9fed] *Crystal::Command#run:(Bool | Nil) +413
[0x5638f75f9d19] *Crystal::Command::run<Array(String)>:(Bool | Nil) +25
[0x5638f75f9cdd] *Crystal::Command::run:(Bool | Nil) +29
[0x5638f6530984] __crystal_main +2820
[0x5638f711fe16] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +6
[0x5638f711fc25] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +53
[0x5638f653c8e6] main +6
[0x7fd8e10070b3] __libc_start_main +243
[0x5638f652fdbe] _start +46
[0x0] ???

I've been able to reduce the code to this:

StaticArray["foo"].map {|e| {e, 1}}.sort!

The generic argument type must be a reference type. Inlining sort! as to_slice.sort! makes the error go away.
The backtrace is a little bit different from the one from static_array_spec, though:

Invalid memory access (signal 11) at address 0x28
[0x559a4371c176] *Exception::CallStack::print_backtrace:Nil +118
[0x559a436eccda] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +330
[0x7fad219ea3c0] ???
[0x7fad22bea02e] ???
[0x7fad22c74088] ???
[0x7fad22c7375c] _ZN4llvm18ScheduleDAGSDNodes12EmitScheduleERNS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEE +748
[0x7fad22d02466] _ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv +1702
[0x7fad22d01a07] _ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE +6487
[0x7fad22cff396] _ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE +1942
[0x7fad229b85e8] _ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE +280
[0x7fad22822d76] _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE +1126
[0x7fad22822ff3] _ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE +51
[0x7fad228234a0] _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE +960
[0x7fad239f9523] ???
[0x7fad239f935f] LLVMTargetMachineEmitToFile +175
[0x559a4450a019] *LLVM::TargetMachine#emit_to_file<LLVM::Module, String, LLVM::CodeGenFileType>:Bool +89
[0x559a4450a05b] *LLVM::TargetMachine#emit_obj_to_file<LLVM::Module, String>:Bool +11
[0x559a44500c19] *Crystal::Compiler#cross_compile<Crystal::Program, Array(Crystal::Compiler::CompilationUnit), String>:Nil +313
[0x559a445007ce] *Crystal::Compiler#codegen<Crystal::Program, Crystal::ASTNode+, Array(Crystal::Compiler::Source), String>:(Tuple(Array(Crystal::Compiler::CompilationUnit), Array(String)) | Nil) +1774
[0x559a4450489c] *Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result +188
[0x559a447a8ab9] *Crystal::Command::CompilerConfig#compile<String>:Crystal::Compiler::Result +57
[0x559a447a8a64] *Crystal::Command::CompilerConfig#compile:Crystal::Compiler::Result +36
[0x559a44797de2] *Crystal::Command#build:Crystal::Compiler::Result +290
[0x559a44796fed] *Crystal::Command#run:(Bool | Nil) +413
[0x559a44796d19] *Crystal::Command::run<Array(String)>:(Bool | Nil) +25
[0x559a44796cdd] *Crystal::Command::run:(Bool | Nil) +29
[0x559a436cd984] __crystal_main +2820
[0x559a442bce16] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +6
[0x559a442bcc25] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +53
[0x559a436d98e6] main +6
[0x7fad217910b3] __libc_start_main +243
[0x559a436ccdbe] _start +46
[0x0] ???
@straight-shoota straight-shoota changed the title LLVM bug with StaticArray sort on aarch64-linux-musl and aarch64-darwin LLVM bug with StaticArray#sort_by! on aarch64-linux-musl and aarch64-darwin Oct 24, 2021
@asterite
Copy link
Member

Did you mean with a debug LLVM?

@straight-shoota
Copy link
Member Author

No, a compiler build with debug info. Just a local make crystal vs. the distribution compiler.

@asterite
Copy link
Member

If you compile LLVM with debug mode on (assertions on) you'll find out the issue.

How would a compiler build with debug info help in this case? LLVM is dying when generating an obj file.

@straight-shoota
Copy link
Member Author

straight-shoota commented Oct 24, 2021

Without debug info, the stacktrace is just this:

Invalid memory access (signal 11) at address 0x0
[0x7f3480cd30e6] ???
[0x7f3480cd30b3] ???
[0x7f3481d469c4] ???

So you wouldn't even know it's caused by LLVM.

@asterite
Copy link
Member

Oh, I missed that it's already using a compiler with debug info on.

That said, to understand the problem I suggest using an LLVM debug build.

@straight-shoota
Copy link
Member Author

straight-shoota commented Oct 24, 2021

This does not reproduce with LLVM 13. So I'd assume it's a legitimate LLVM bug that has been fixed by now. I didn't check with 12 (original compiler was built with patched 11.1 from homebrew locally and 10.0 in CI). Maybe it's even a different face of one of the bugs we've already known about.

Anyways, I'm not sure we need to do much about this. Upcoming Crystal releases will presumablty be built with LLVM 13. When we release StaticArray#sort_by in the next release, the respective compiler will be able to handle that.

For the record, somewhen in between I also saw this: LLVM ERROR: Unknown mismatch in getCopyFromParts!

@beta-ziliani
Copy link
Member

Since this is now handled by the compiler, I think we can close it. WDYT @straight-shoota ?

@straight-shoota
Copy link
Member Author

Sorry, what is handled by the compiler? The workaround from #11359 is still in place:

# StaticArray#sort_by and #sort_by! don't compile on aarch64-darwin and
# aarch64-linux-musl due to a codegen error caused by LLVM < 13.0.0.
# See https://github.com/crystal-lang/crystal/issues/11358 for details.
{% unless compare_versions(Crystal::LLVM_VERSION, "13.0.0") < 0 && flag?(:aarch64) && (flag?(:musl) || flag?(:darwin)) %}

@HertzDevil
Copy link
Contributor

This is also reproducible for AArch64 Android, by the way: https://github.com/crystal-lang/crystal/actions/runs/4331137915/jobs/7562790343

HertzDevil added a commit to HertzDevil/crystal that referenced this issue Mar 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants