minimum makes Julia crash on A64FX #44401

Closed
giordano opened this issue Mar 2, 2022 · 4 comments
Labels
compiler:llvm (For issues that relate to LLVM)
kind:upstream (The issue is with an upstream dependency, e.g. LLVM)
system:arm (ARMv7 and AArch64)

Comments

@giordano
Contributor

giordano commented Mar 2, 2022

$ JULIA_LLVM_ARGS="--aarch64-sve-vector-bits-min=512" julia -q
julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (aarch64-unknown-linux-gnu)
  CPU: unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, a64fx)
Environment:
  JULIA_LLVM_ARGS = --aarch64-sve-vector-bits-min=512

julia> minimum([1])
LLVM ERROR: Cannot select: 0xadee70: v2i64 = AArch64ISD::DUPLANE64 0xada7e0, Constant:i64<1>, reduce.jl:638
  0xada7e0: nxv2i64 = AArch64ISD::SMIN_PRED 0xaaf958, 0x917e98, 0xadd400, reduce.jl:638
    0xaaf958: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
      0xadeed8: i64 = TargetConstant<2>
    0x917e98: nxv2i64 = AArch64ISD::SMIN_PRED 0xaaf958, 0xae1c30, 0xadce50, reduce.jl:638
      0xaaf958: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
        0xadeed8: i64 = TargetConstant<2>
      0xae1c30: nxv2i64 = AArch64ISD::SMIN_PRED 0xaaf958, 0xae24b8, 0xadcf20, reduce.jl:638
        0xaaf958: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
          0xadeed8: i64 = TargetConstant<2>
        0xae24b8: nxv2i64 = insert_subvector undef:nxv2i64, 0xada230, Constant:i64<0>, reduce.jl:638
          0xadefa8: nxv2i64 = undef
          0xada230: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %58, reduce.jl:638
            0xadf280: v2i64 = Register %58
          0xada640: i64 = Constant<0>
        0xadcf20: nxv2i64 = insert_subvector undef:nxv2i64, 0xadcf88, Constant:i64<0>, reduce.jl:638
          0xadefa8: nxv2i64 = undef
          0xadcf88: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %59, reduce.jl:638
            0xadd878: v2i64 = Register %59
          0xada640: i64 = Constant<0>
      0xadce50: nxv2i64 = insert_subvector undef:nxv2i64, 0xadcff0, Constant:i64<0>, reduce.jl:638
        0xadefa8: nxv2i64 = undef
        0xadcff0: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %60, reduce.jl:638
          0xada298: v2i64 = Register %60
        0xada640: i64 = Constant<0>
    0xadd400: nxv2i64 = insert_subvector undef:nxv2i64, 0xaaf548, Constant:i64<0>, reduce.jl:638
      0xadefa8: nxv2i64 = undef
      0xaaf548: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %61, reduce.jl:638
        0xadd6d8: v2i64 = Register %61
      0xada640: i64 = Constant<0>
  0xada438: i64 = Constant<1>
In function: julia_mapreduce_impl_65

signal (6): Aborted
in expression starting at REPL[2]:1
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so (unknown line)
Allocations: 649721 (Pool: 649264; Big: 457); GC: 1
Aborted

This is a reduced reproducer of the crashes you get with BenchmarkTools.@benchmark (in particular the BenchmarkTools.asciihist function).

Edit: looking at the error message referencing mapreduce_impl, I've got a more basic reproducer:

julia> Base.mapreduce_impl(identity, min, [1], 1, 1)
LLVM ERROR: Cannot select: 0xad9c80: v2i64 = AArch64ISD::DUPLANE64 0x917dc8, Constant:i64<1>, reduce.jl:638
  0x917dc8: nxv2i64 = AArch64ISD::SMIN_PRED 0xae1fd8, 0xadd608, 0xadf968, reduce.jl:638
    0xae1fd8: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
      0xae5210: i64 = TargetConstant<2>
    0xadd608: nxv2i64 = AArch64ISD::SMIN_PRED 0xae1fd8, 0xadd878, 0xadeda0, reduce.jl:638
      0xae1fd8: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
        0xae5210: i64 = TargetConstant<2>
      0xadd878: nxv2i64 = AArch64ISD::SMIN_PRED 0xae1fd8, 0xae57c0, 0xadf760, reduce.jl:638
        0xae1fd8: nxv2i1 = AArch64ISD::PTRUE TargetConstant:i64<2>, reduce.jl:638
          0xae5210: i64 = TargetConstant<2>
        0xae57c0: nxv2i64 = insert_subvector undef:nxv2i64, 0xadf558, Constant:i64<0>, reduce.jl:638
          0xad9c18: nxv2i64 = undef
          0xadf558: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %58, reduce.jl:638
            0xaddae8: v2i64 = Register %58
          0xae1ea0: i64 = Constant<0>
        0xadf760: nxv2i64 = insert_subvector undef:nxv2i64, 0xae28c8, Constant:i64<0>, reduce.jl:638
          0xad9c18: nxv2i64 = undef
          0xae28c8: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %59, reduce.jl:638
            0x9477b8: v2i64 = Register %59
          0xae1ea0: i64 = Constant<0>
      0xadeda0: nxv2i64 = insert_subvector undef:nxv2i64, 0xae22b0, Constant:i64<0>, reduce.jl:638
        0xad9c18: nxv2i64 = undef
        0xae22b0: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %60, reduce.jl:638
          0xada7e0: v2i64 = Register %60
        0xae1ea0: i64 = Constant<0>
    0xadf968: nxv2i64 = insert_subvector undef:nxv2i64, 0xad9d50, Constant:i64<0>, reduce.jl:638
      0xad9c18: nxv2i64 = undef
      0xad9d50: v2i64,ch = CopyFromReg 0x6ad628, Register:v2i64 %61, reduce.jl:638
        0xae4fa0: v2i64 = Register %61
      0xae1ea0: i64 = Constant<0>
  0xae5688: i64 = Constant<1>
In function: julia_mapreduce_impl_277

First part of the backtrace in GDB:

(gdb) bt
#0  0x0000400000132bec in raise () from /lib64/libc.so.6
#1  0x000040000012096c in abort () from /lib64/libc.so.6
#2  0x0000400001199c60 in llvm::report_fatal_error(llvm::Twine const&, bool) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#3  0x0000400001199d98 in llvm::report_fatal_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
   from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#4  0x00004000019c2698 in llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#5  0x00004000019c5320 in llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#6  0x0000400003017994 in (anonymous namespace)::AArch64DAGToDAGISel::Select(llvm::SDNode*) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#7  0x00004000019c1310 in llvm::SelectionDAGISel::DoInstructionSelection() () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#8  0x00004000019c7fa8 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#9  0x00004000019cab30 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#10 0x00004000019cc414 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) [clone .part.869] () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#11 0x00004000015b7ad4 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#12 0x00004000013705c4 in llvm::FPPassManager::runOnFunction(llvm::Function&) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#13 0x0000400001370d08 in llvm::FPPassManager::runOnModule(llvm::Module&) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#14 0x000040000136fa54 in llvm::legacy::PassManagerImpl::run(llvm::Module&) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#15 0x00004000004741d0 in JuliaOJIT::CompilerT::operator() (this=0x0, M=...) at /buildworker/worker/package_linuxaarch64/build/src/jitlayers.cpp:612
#16 0x0000400002b0f394 in llvm::orc::IRCompileLayer::emit(std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >, llvm::orc::ThreadSafeModule)
    () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#17 0x0000400002b197fc in llvm::orc::BasicIRLayerMaterializationUnit::materialize(std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >) ()
   from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#18 0x0000400002af0fb0 in llvm::orc::ExecutionSession::materializeOnCurrentThread(std::unique_ptr<llvm::orc::MaterializationUnit, std::default_delete<llvm::orc::MaterializationUnit> >, std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#19 0x0000400002aef2e4 in std::_Function_handler<void (std::unique_ptr<llvm::orc::MaterializationUnit, std::default_delete<llvm::orc::MaterializationUnit> >, std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >), void (*)(std::unique_ptr<llvm::orc::MaterializationUnit, std::default_delete<llvm::orc::MaterializationUnit> >, std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >)>::_M_invoke(std::_Any_data const&, std::unique_ptr<llvm::orc::MaterializationUnit, std::default_delete<llvm::orc::MaterializationUnit> >&&, std::unique_ptr<llvm::orc::MaterializationResponsibility, std::default_delete<llvm::orc::MaterializationResponsibility> >&&) ()
   from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#20 0x0000400002aefa98 in llvm::orc::ExecutionSession::dispatchOutstandingMUs() () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#21 0x0000400002af75cc in llvm::orc::ExecutionSession::OL_completeLookup(std::unique_ptr<llvm::orc::InProgressLookupState, std::default_delete<llvm::orc::InProgressLookupState> >, std::shared_ptr<llvm::orc::AsynchronousSymbolQuery>, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> >, llvm::DenseMapInfo<llvm::orc::JITDylib*>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> > > > const&)>) ()
   from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#22 0x0000400002af7ac4 in llvm::orc::InProgressFullLookupState::complete(std::unique_ptr<llvm::orc::InProgressLookupState, std::default_delete<llvm::orc::InProgressLookupState> >) ()
   from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#23 0x0000400002ae8f7c in llvm::orc::ExecutionSession::OL_applyQueryPhase1(std::unique_ptr<llvm::orc::InProgressLookupState, std::default_delete<llvm::orc::InProgressLookupState> >, llvm::Error) ()
   from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#24 0x0000400002aefd64 in llvm::orc::ExecutionSession::lookup(llvm::orc::LookupKind, std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags> > > const&, llvm::orc::SymbolLookupSet, llvm::orc::SymbolState, llvm::unique_function<void (llvm::Expected<llvm::DenseMap<llvm::orc::SymbolStringPtr, llvm::JITEvaluatedSymbol, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr>, llvm::detail::DenseMapPair<llvm::orc::SymbolStringPtr, llvm::JITEvaluatedSymbol> > >)>, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> >, llvm::DenseMapInfo<llvm::orc::JITDylib*>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> > > > const&)>) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#25 0x0000400002af03f0 in llvm::orc::ExecutionSession::lookup(std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags> > > const&, llvm::orc::SymbolLookupSet const&, llvm::orc::LookupKind, llvm::orc::SymbolState, std::function<void (llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> >, llvm::DenseMapInfo<llvm::orc::JITDylib*>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr> > > > const&)>) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so
#26 0x0000400002af074c in llvm::orc::ExecutionSession::lookup(std::vector<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags>, std::allocator<std::pair<llvm::orc::JITDylib*, llvm::orc::JITDylibLookupFlags> > > const&, llvm::orc::SymbolStringPtr, llvm::orc::SymbolState) () from /vol0003/ra000019/a04463/julia-1.7.2-aarch64/bin/../lib/julia/libLLVM-12jl.so

Contrary to #44263, LLVM can generate the code:

julia> @code_llvm debuginfo=:none Base.mapreduce_impl(identity, min, [1], 1, 1)
define i64 @julia_mapreduce_impl_186({}* nonnull align 16 dereferenceable(40) %0, i64 signext %1, i64 signext %2) #0 {
top:
  %3 = alloca [1 x i64], align 8
  %4 = add i64 %1, -1
  %5 = bitcast {}* %0 to i64**
  %6 = load i64*, i64** %5, align 8
  %7 = getelementptr inbounds i64, i64* %6, i64 %4
  %8 = load i64, i64* %7, align 8
  %9 = add i64 %1, 1
  %10 = add i64 %1, 253
  %11 = add i64 %2, -3
  %.not63 = icmp sgt i64 %10, %11
  br i1 %.not63, label %L107, label %L30.lr.ph

L30.lr.ph:                                        ; preds = %top
  %12 = getelementptr inbounds [1 x i64], [1 x i64]* %3, i64 0, i64 0
  %13 = bitcast {}* %0 to {}**
  %14 = getelementptr inbounds {}*, {}** %13, i64 3
  %15 = bitcast {}** %14 to i64*
  br label %L30

L30:                                              ; preds = %L98, %L30.lr.ph
  %value_phi570 = phi i64 [ %8, %L30.lr.ph ], [ %value_phi21, %L98 ]
  %value_phi469 = phi i64 [ %8, %L30.lr.ph ], [ %value_phi20, %L98 ]
  %value_phi368 = phi i64 [ %8, %L30.lr.ph ], [ %value_phi19, %L98 ]
  %value_phi267 = phi i64 [ %8, %L30.lr.ph ], [ %value_phi18, %L98 ]
  %value_phi165 = phi i64 [ %9, %L30.lr.ph ], [ %40, %L98 ]
  %value_phi64 = phi i64 [ %10, %L30.lr.ph ], [ %41, %L98 ]
  %16 = call i64 @j_steprange_last_188(i64 signext %value_phi165, i64 signext 4, i64 signext %value_phi64) #0
  %.not41 = icmp sgt i64 %value_phi165, %16
  br i1 %.not41, label %L81, label %L47.preheader

L47.preheader:                                    ; preds = %L30
  %17 = load i64*, i64** %5, align 8
  br label %L47

L47:                                              ; preds = %L47, %L47.preheader
  %value_phi9 = phi i64 [ %24, %L47 ], [ %value_phi267, %L47.preheader ]
  %value_phi10 = phi i64 [ %28, %L47 ], [ %value_phi368, %L47.preheader ]
  %value_phi11 = phi i64 [ %32, %L47 ], [ %value_phi469, %L47.preheader ]
  %value_phi12 = phi i64 [ %21, %L47 ], [ %value_phi570, %L47.preheader ]
  %value_phi13 = phi i64 [ %33, %L47 ], [ %value_phi165, %L47.preheader ]
  %18 = add i64 %value_phi13, -1
  %19 = getelementptr inbounds i64, i64* %17, i64 %18
  %20 = load i64, i64* %19, align 8
  %.not42 = icmp slt i64 %20, %value_phi12
  %21 = select i1 %.not42, i64 %20, i64 %value_phi12
  %22 = getelementptr inbounds i64, i64* %17, i64 %value_phi13
  %23 = load i64, i64* %22, align 8
  %.not43 = icmp slt i64 %23, %value_phi9
  %24 = select i1 %.not43, i64 %23, i64 %value_phi9
  %25 = add i64 %value_phi13, 1
  %26 = getelementptr inbounds i64, i64* %17, i64 %25
  %27 = load i64, i64* %26, align 8
  %.not44 = icmp slt i64 %27, %value_phi10
  %28 = select i1 %.not44, i64 %27, i64 %value_phi10
  %29 = add i64 %value_phi13, 2
  %30 = getelementptr inbounds i64, i64* %17, i64 %29
  %31 = load i64, i64* %30, align 8
  %.not45 = icmp slt i64 %31, %value_phi11
  %32 = select i1 %.not45, i64 %31, i64 %value_phi11
  %.not46 = icmp eq i64 %value_phi13, %16
  %33 = add i64 %value_phi13, 4
  br i1 %.not46, label %L81, label %L47

L81:                                              ; preds = %L47, %L30
  %value_phi18 = phi i64 [ %value_phi267, %L30 ], [ %24, %L47 ]
  %value_phi19 = phi i64 [ %value_phi368, %L30 ], [ %28, %L47 ]
  %value_phi20 = phi i64 [ %value_phi469, %L30 ], [ %32, %L47 ]
  %value_phi21 = phi i64 [ %value_phi570, %L30 ], [ %21, %L47 ]
  %34 = add i64 %value_phi64, 3
  %35 = load i64, i64* %15, align 8
  %36 = icmp slt i64 %34, 1
  %37 = icmp sgt i64 %34, %35
  %38 = or i1 %36, %37
  br i1 %38, label %L96, label %L98

L96:                                              ; preds = %L81
  store i64 %34, i64* %12, align 8
  %39 = call nonnull {}* @j_throw_boundserror_189({}* nonnull %0, [1 x i64]* nocapture readonly %3) #0
  call void @llvm.trap()
  unreachable

L98:                                              ; preds = %L81
  %40 = add i64 %value_phi165, 256
  %41 = add i64 %value_phi64, 256
  %.not = icmp sgt i64 %41, %11
  br i1 %.not, label %L6.L107_crit_edge, label %L30

L103:                                             ; preds = %L126, %middle.block, %L107
  %merge = phi i64 [ %44, %L107 ], [ %68, %middle.block ], [ %72, %L126 ]
  ret i64 %merge

L6.L107_crit_edge:                                ; preds = %L98
  store i64 %34, i64* %12, align 8
  br label %L107

L107:                                             ; preds = %L6.L107_crit_edge, %top
  %value_phi1.lcssa = phi i64 [ %40, %L6.L107_crit_edge ], [ %9, %top ]
  %value_phi2.lcssa = phi i64 [ %value_phi18, %L6.L107_crit_edge ], [ %8, %top ]
  %value_phi3.lcssa = phi i64 [ %value_phi19, %L6.L107_crit_edge ], [ %8, %top ]
  %value_phi4.lcssa = phi i64 [ %value_phi20, %L6.L107_crit_edge ], [ %8, %top ]
  %value_phi5.lcssa = phi i64 [ %value_phi21, %L6.L107_crit_edge ], [ %8, %top ]
  %.not47 = icmp slt i64 %value_phi2.lcssa, %value_phi5.lcssa
  %42 = select i1 %.not47, i64 %value_phi2.lcssa, i64 %value_phi5.lcssa
  %.not48 = icmp slt i64 %value_phi4.lcssa, %value_phi3.lcssa
  %43 = select i1 %.not48, i64 %value_phi4.lcssa, i64 %value_phi3.lcssa
  %.not49 = icmp slt i64 %43, %42
  %44 = select i1 %.not49, i64 %43, i64 %42
  %.not50 = icmp sgt i64 %value_phi1.lcssa, %2
  %45 = add i64 %value_phi1.lcssa, -1
  %46 = select i1 %.not50, i64 %45, i64 %2
  %.not51 = icmp slt i64 %46, %value_phi1.lcssa
  br i1 %.not51, label %L103, label %L126.preheader

L126.preheader:                                   ; preds = %L107
  %47 = load i64*, i64** %5, align 8
  %48 = add i64 %46, 1
  %49 = sub i64 %48, %value_phi1.lcssa
  %min.iters.check = icmp ult i64 %49, 8
  br i1 %min.iters.check, label %L126, label %vector.ph

vector.ph:                                        ; preds = %L126.preheader
  %n.vec = and i64 %49, -8
  %ind.end = add i64 %value_phi1.lcssa, %n.vec
  %minmax.ident.splatinsert = insertelement <2 x i64> poison, i64 %44, i32 0
  %minmax.ident.splat = shufflevector <2 x i64> %minmax.ident.splatinsert, <2 x i64> poison, <2 x i32> zeroinitializer
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.phi = phi <2 x i64> [ %minmax.ident.splat, %vector.ph ], [ %63, %vector.body ]
  %vec.phi103 = phi <2 x i64> [ %minmax.ident.splat, %vector.ph ], [ %64, %vector.body ]
  %vec.phi104 = phi <2 x i64> [ %minmax.ident.splat, %vector.ph ], [ %65, %vector.body ]
  %vec.phi105 = phi <2 x i64> [ %minmax.ident.splat, %vector.ph ], [ %66, %vector.body ]
  %offset.idx = add i64 %value_phi1.lcssa, %index
  %50 = add i64 %offset.idx, -1
  %51 = getelementptr inbounds i64, i64* %47, i64 %50
  %52 = bitcast i64* %51 to <2 x i64>*
  %wide.load = load <2 x i64>, <2 x i64>* %52, align 8
  %53 = getelementptr inbounds i64, i64* %51, i64 2
  %54 = bitcast i64* %53 to <2 x i64>*
  %wide.load106 = load <2 x i64>, <2 x i64>* %54, align 8
  %55 = getelementptr inbounds i64, i64* %51, i64 4
  %56 = bitcast i64* %55 to <2 x i64>*
  %wide.load107 = load <2 x i64>, <2 x i64>* %56, align 8
  %57 = getelementptr inbounds i64, i64* %51, i64 6
  %58 = bitcast i64* %57 to <2 x i64>*
  %wide.load108 = load <2 x i64>, <2 x i64>* %58, align 8
  %59 = icmp slt <2 x i64> %wide.load, %vec.phi
  %60 = icmp slt <2 x i64> %wide.load106, %vec.phi103
  %61 = icmp slt <2 x i64> %wide.load107, %vec.phi104
  %62 = icmp slt <2 x i64> %wide.load108, %vec.phi105
  %63 = select <2 x i1> %59, <2 x i64> %wide.load, <2 x i64> %vec.phi
  %64 = select <2 x i1> %60, <2 x i64> %wide.load106, <2 x i64> %vec.phi103
  %65 = select <2 x i1> %61, <2 x i64> %wide.load107, <2 x i64> %vec.phi104
  %66 = select <2 x i1> %62, <2 x i64> %wide.load108, <2 x i64> %vec.phi105
  %index.next = add i64 %index, 8
  %67 = icmp eq i64 %index.next, %n.vec
  br i1 %67, label %middle.block, label %vector.body

middle.block:                                     ; preds = %vector.body
  %rdx.minmax.cmp = icmp slt <2 x i64> %63, %64
  %rdx.minmax.select = select <2 x i1> %rdx.minmax.cmp, <2 x i64> %63, <2 x i64> %64
  %rdx.minmax.cmp109 = icmp slt <2 x i64> %rdx.minmax.select, %65
  %rdx.minmax.select110 = select <2 x i1> %rdx.minmax.cmp109, <2 x i64> %rdx.minmax.select, <2 x i64> %65
  %rdx.minmax.cmp111 = icmp slt <2 x i64> %rdx.minmax.select110, %66
  %rdx.minmax.select112 = select <2 x i1> %rdx.minmax.cmp111, <2 x i64> %rdx.minmax.select110, <2 x i64> %66
  %rdx.shuf = shufflevector <2 x i64> %rdx.minmax.select112, <2 x i64> poison, <2 x i32> <i32 1, i32 undef>
  %rdx.minmax.cmp113 = icmp slt <2 x i64> %rdx.minmax.select112, %rdx.shuf
  %rdx.minmax.select114 = select <2 x i1> %rdx.minmax.cmp113, <2 x i64> %rdx.minmax.select112, <2 x i64> %rdx.shuf
  %68 = extractelement <2 x i64> %rdx.minmax.select114, i32 0
  %cmp.n = icmp eq i64 %49, %n.vec
  br i1 %cmp.n, label %L103, label %L126

L126:                                             ; preds = %L126, %middle.block, %L126.preheader
  %value_phi25 = phi i64 [ %73, %L126 ], [ %ind.end, %middle.block ], [ %value_phi1.lcssa, %L126.preheader ]
  %value_phi27 = phi i64 [ %72, %L126 ], [ %68, %middle.block ], [ %44, %L126.preheader ]
  %69 = add i64 %value_phi25, -1
  %70 = getelementptr inbounds i64, i64* %47, i64 %69
  %71 = load i64, i64* %70, align 8
  %.not52 = icmp slt i64 %71, %value_phi27
  %72 = select i1 %.not52, i64 %71, i64 %value_phi27
  %.not53 = icmp eq i64 %value_phi25, %46
  %73 = add i64 %value_phi25, 1
  br i1 %.not53, label %L103, label %L126
}
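
In the IR above, `vector.body` and `middle.block` are the vectorized form of the scalar tail loop `L126`, and the `icmp slt` / `select` pairs on `<2 x i64>` appear to be what the backend later turns into the `SMIN_PRED` nodes in the error message. As a rough illustration only, a loop of the following shape produces the same kind of signed-integer min reduction. `scalar_min` is a hypothetical helper, not code from Base, and whether it triggers the same crash on A64FX has not been checked here.

```julia
# Hypothetical standalone kernel (not part of Base): a scalar signed-integer
# min reduction over x[lo:hi]. For Int, `min` lowers to a compare and a select,
# which the loop vectorizer may widen into vector compare/select pairs.
function scalar_min(x::Vector{Int}, lo::Int=1, hi::Int=length(x))
    m = x[lo]
    @inbounds for i in lo+1:hi
        m = min(m, x[i])
    end
    return m
end

println(scalar_min([3, 1, 2]))
```

Inspecting it with `@code_llvm debuginfo=:none scalar_min([1])` should only generate IR, as in the listing above; native code generation is the step that aborted in this report.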

giordano added the system:arm (ARMv7 and AArch64) label Mar 2, 2022

@giordano
Contributor Author

giordano commented Mar 2, 2022

Ok, the title of the issue may not be accurate: minimum([1]) works on the latest nightly:

julia> versioninfo()
Julia Version 1.9.0-DEV.106
Commit 394af38501 (2022-02-28 23:39 UTC)
Platform Info:
  OS: Linux (aarch64-unknown-linux-gnu)
  CPU: 50 × unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, a64fx)
  Threads: 1 on 50 virtual cores
Environment:
  JULIA_LLVM_ARGS = --aarch64-sve-vector-bits-min=512

However, BenchmarkTools.asciihist([1]) still crashes with the same error reported above. The code in the original report was a reduced reproducer, on Julia v1.7, of the crash I always get with the ASCII histogram generated by BenchmarkTools.@benchmark.
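
Since the behaviour differs between Julia 1.7.2 (LLVM 12.0.1) and the nightly (LLVM 13.0.1), it can help to record the relevant settings next to each test run. A minimal sketch, assuming only the values that `versioninfo()` already reports; the variable names are illustrative:

```julia
# Collect the details that matter for this report in one place.
llvm_version = Base.libllvm_version              # VersionNumber of the bundled LLVM
cpu_name     = Sys.CPU_NAME                      # CPU name as detected by LLVM
llvm_args    = get(ENV, "JULIA_LLVM_ARGS", "")   # empty string if the variable is unset

# True only when the SVE vector-length flag used in this issue is set.
sve_512 = occursin("--aarch64-sve-vector-bits-min=512", llvm_args)

println("Julia $(VERSION), LLVM $(llvm_version), CPU $(cpu_name), SVE-512 flag: $(sve_512)")
```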

@Keno
Member

Keno commented Mar 3, 2022

Try to reproduce on LLVM master with llc and then file the reproducer upstream?
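
One possible way to get an input file for `llc` is to write the LLVM module for the failing call to disk with `code_llvm`, which only generates IR and so should not hit the instruction-selection abort. This is a sketch under assumptions: the file name is arbitrary, and the `llc` command in the final comment assumes that the `-mcpu` and SVE options mirror what Julia passes to LLVM here.

```julia
using InteractiveUtils

# Write the optimized LLVM module for the reduced reproducer to a file.
# The directory and file name are arbitrary choices for this sketch.
path = joinpath(mktempdir(), "mapreduce_impl.ll")
open(path, "w") do io
    code_llvm(io, Base.mapreduce_impl,
              Tuple{typeof(identity), typeof(min), Vector{Int}, Int, Int};
              raw=true, dump_module=true, optimize=true, debuginfo=:none)
end
println("IR written to ", path)

# Then, outside Julia (assumed invocation, to be adjusted to the llc build in use):
#   llc -mcpu=a64fx -aarch64-sve-vector-bits-min=512 mapreduce_impl.ll -o /dev/null
```

The module dumped by Julia may need minor edits before a stock `llc` from a different LLVM version accepts it.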

@giordano
Contributor Author

I still haven't had the time to (re)build Julia with LLVM master, because compiling anything on A64FX is excruciatingly slow (and compiling LLVM even more so). However, the error message looks like llvm/llvm-project#53331.

giordano added the kind:upstream (The issue is with an upstream dependency, e.g. LLVM) and compiler:llvm (For issues that relate to LLVM) labels Mar 19, 2022
@giordano
Contributor Author

It appears @benchmark finally works on Julia master with LLVM 14 (although llvm/llvm-project#53331 is still open):

$ JULIA_LLVM_ARGS="--aarch64-sve-vector-bits-min=512" ./julia -q
julia> using BenchmarkTools

julia> function sumsimd(x)
           s = zero(eltype(x))
           @simd for xi in x
               s += xi
           end
           s
       end
sumsimd (generic function with 1 method)

julia> @benchmark sumsimd(x) setup=(x = [randn(Float64) for _ in 1:1_000_000])
BenchmarkTools.Trial: 94 samples with 1 evaluation.
 Range (min … max):  185.532 μs … 250.102 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     206.148 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   205.783 μs ±  12.503 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▁▄▁           █  ▃  ▁   ▆ ▁                               
  ▇▄▁▄▄███▄▆▇▆▄▁▄▄▄▆▆█▁▆█▇▆█▆▇▄█▆█▁▄▆▆▆▄▁▁▁▇▁▁▁▁▁▁▁▁▄▁▁▁▁▄▁▁▁▁▄ ▁
  186 μs           Histogram: frequency by time          242 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sumsimd(x) setup=(x = [randn(Float32) for _ in 1:1_000_000])
BenchmarkTools.Trial: 92 samples with 1 evaluation.
 Range (min … max):  78.991 μs … 88.341 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     80.206 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   80.670 μs ±  1.603 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂    ▁█                                                      
  █▇▁▆▃██▇▆▆▇▇▇▆▄▆▃▄▆▄▄▁▁▃▄▁▄▁▁▄▁▃▁▃▄▁▃▁▁▃▃▁▁▁▃▁▃▃▁▁▁▁▁▁▁▁▁▁▃ ▁
  79 μs           Histogram: frequency by time        85.6 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark sumsimd(x) setup=(x = [randn(Float16) for _ in 1:1_000_000])
BenchmarkTools.Trial: 92 samples with 1 evaluation.
 Range (min … max):  41.970 μs … 47.470 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     42.961 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   43.265 μs ±  1.025 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▆▄▆▄▆█  ▂                                           
  ▆▄▆▆▆▁▆█████████████▄█▆▆▁▁▁▄▁▁▁▁▄▄▄▁▁▁▁▄▆▁▁▁▆▁▄▁▁▁▆▁▁▄▁▄▁▁▄ ▁
  42 μs           Histogram: frequency by time        46.1 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> versioninfo()
Julia Version 1.9.0-DEV.809
Commit 9b83dd8920 (2022-06-19 19:31 UTC)
Platform Info:
  OS: Linux (aarch64-unknown-linux-gnu)
  CPU: 48 × unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.3 (ORCJIT, a64fx)
  Threads: 1 on 48 virtual cores

Performance is the same as in #40308 (comment), which is good, but issue #44263 is still relevant.
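
To put the three medians above on a common scale, they can be converted into effective read bandwidth. The sketch below uses only the median times and the array length from the benchmark output; the helper name `bandwidth_gbs` is illustrative. It gives roughly 39, 50 and 47 GB/s for Float64, Float32 and Float16 respectively.

```julia
# Median times (in microseconds) copied from the three benchmark runs above,
# each summing 1_000_000 elements.
medians_us = [(Float64, 206.148), (Float32, 80.206), (Float16, 42.961)]
n = 1_000_000

# Effective read bandwidth in GB/s: bytes read divided by elapsed seconds.
bandwidth_gbs(T, t_us) = n * sizeof(T) / (t_us * 1e-6) / 1e9

for (T, t_us) in medians_us
    println(rpad(string(T), 8),
            round(bandwidth_gbs(T, t_us); digits=1), " GB/s, ",
            round(n / t_us / 1e3; digits=2), " elements/ns")
end
```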
