Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntelLLVM cannot handle DEAL_II_RESTRICT keyword: internal compiler error in clang fronted #16722

Open
ryanstoner1 opened this issue Mar 6, 2024 · 9 comments
Labels
Milestone

Comments

@ryanstoner1
Copy link

I can't install deal.II master using the intel compiler on TACC Stampede3 without it crashing. However, deal.II 9.5.2 compiles and installs without issue. This was first noticed in in geodynamics/aspect#5594.

I've found the source of the error and would like to open a pull request, but I wanted to open an issue before a pull request because I wasn't sure if there's a flag I could pass that would prevent this behavior and potentially make a change in candi instead.

The offending line is:

apply_matrix_vector_product(const Number2 *DEAL_II_RESTRICT matrix,

Without the DEAL_II_RESTRICT I don't have an error compiling.

With the current master (including DEAL_II_RESTRICT) I get the error:

0.	Program arguments: /opt/intel/oneapi/compiler/2024.0/bin/compiler/clang++ @/tmp/icpx08425178640TVCLz/icpxarg3UffBl
1.	<eof> parser at end of file
2.	Optimizer
 #0 0x000055fbb491a573 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5144573)
 #1 0x000055fbb4918a60 llvm::sys::RunSignalHandlers() (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5142a60)
 #2 0x000055fbb49199ef llvm::sys::CleanupOnSignal(unsigned long) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x51439ef)
 #3 0x000055fbb48aec45 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) CrashRecoveryContext.cpp:0:0
 #4 0x000055fbb48aee89 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #5 0x0000154fc7454df0 __restore_rt (/lib64/libc.so.6+0x54df0)
 #6 0x000055fbb45b1032 combineInstructionsOverFunction(llvm::Function&, llvm::InstructionWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, bool, bool, bool, bool, bool, llvm::LoopInfo*) InstructionCombining.cpp:0:0
 #7 0x000055fbb45afd25 llvm::InstCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x4dd9d25)
 #8 0x000055fbb535c2ed llvm::detail::PassModel<llvm::Function, llvm::InstCombinePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) BackendUtil.cpp:0:0
 #9 0x000055fbb44693a3 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x4c933a3)
#10 0x000055fbb39d145d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) NVPTXTargetMachine.cpp:0:0
#11 0x000055fbb446fd22 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x4c99d22)
#12 0x000055fbb39d11ed llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) NVPTXTargetMachine.cpp:0:0
#13 0x000055fbb44683a3 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x4c923a3)
#14 0x000055fbb5353a66 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>&, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>&) BackendUtil.cpp:0:0
#15 0x000055fbb534f05b clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream>>) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5b7905b)
#16 0x000055fbb56f7323 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) CodeGenAction.cpp:0:0
#17 0x000055fbb685412b clang::ParseAST(clang::Sema&, bool, bool) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x707e12b)
#18 0x000055fbb56f5d68 clang::CodeGenAction::ExecuteAction() (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5f1fd68)
#19 0x000055fbb566752a clang::FrontendAction::Execute() (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5e9152a)
#20 0x000055fbb55f5e72 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5e1fe72)
#21 0x000055fbb56f26ff clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5f1c6ff)
#22 0x000055fbb36921b1 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x3ebc1b1)
#23 0x000055fbb368e4d3 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#24 0x000055fbb54e4239 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#25 0x000055fbb48aebde llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x50d8bde)
#26 0x000055fbb54e3319 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::optional<llvm::StringRef>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>*, bool*) const (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5d0d319)
#27 0x000055fbb5485690 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5caf690)
#28 0x000055fbb5485b02 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&, bool) const (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5cafb02)
#29 0x000055fbb54a90dc clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*>>&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x5cd30dc)
#30 0x000055fbb368c785 clang_main(int, char**, llvm::ToolContext const&) (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x3eb6785)
#31 0x000055fbb369bb5e main (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x3ec5b5e)
#32 0x0000154fc743feb0 __libc_start_call_main (/lib64/libc.so.6+0x3feb0)
#33 0x0000154fc743ff60 __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3ff60)
#34 0x000055fbb368b329 _start (/opt/intel/oneapi/compiler/2024.0/bin/compiler/clang+++0x3eb5329)
icpx: error: clang frontend command failed with exit code 139 (use -v to see invocation)
Intel(R) oneAPI DPC++/C++ Compiler 2024.0.0 (2024.0.0.20231017)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2024.0/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2024.0/bin/compiler/../icpx.cfg

Attaching the detailed.log file.

detailed.log

@kronbichler
Copy link
Member

As far as I can see, the error comes from the __restrict keyword that hides behind the DEAL_II_RESTRICT macro. This appears to be an internal error in the compiler, so it should definitely get inspected by the clang/llvm people. It seems that it really is the combination of multiple call sites and optimizations done across functions if I read clang's call stack correctly.

That said, what would the suggested fix be? We could remove the DEAL_II_RESTRICT macro at this spot, which might have a performance impact. (In my experience, I did not observe a difference when removing it, but I only have data from a limited set of experiments at this point.) We could also generate a way to disable the macro, as I don't think compilers really care about it too much. Maybe aliasing analysis has become good enough that compilers don't care much anymore.

@bangerth
Copy link
Member

bangerth commented Mar 6, 2024

In the current case, with a declaration of

    apply_matrix_vector_product(const Number2 *DEAL_II_RESTRICT matrix,
                                const Number                   *in,
                                Number                         *out,

it's unclear what restrict could actually achieve. First, it will only ever apply if Number2 == Number since objects of different type cannot conflict. Second, with restrict, the compiler can create local temporary variables that store parts of the input data and reference those instead of the original input data because it knows that the input data will not be overwritten. It's not clear to me what the compiler might use this freedom for in the current case -- in fact, the community's broader consensus seems to be that compilers don't actually benefit from this freedom, which is why the keyword is not widely used.

It would not at all surprise me if removing the 21 occurrences of the keyword in our code base leads to no measurable differences in performance. It would even not very much surprise me if there were no differences in assembly code at all, but perhaps I'm being overly pessimistic about what compilers could do :-)

@tamiko
Copy link
Member

tamiko commented Mar 6, 2024

@bangerth: One place where it made a difference was when we were evaluating the performance impact of the contraction templates in base/tensor_accessors.h. I remember that __restrict lead to better and more compact code - when explicitly instantiating some contraction functions to evaluate what assembly code we would get. But note, that we still don't need the __restrict keyword here because we include this header file into all compilation units.

So, I guess, I am "concurring in part, and dissenting in part".

@ryanstoner1: Would you mind to configure by adding -DDEAL_II_COMPILER_HAS_RESTRICT_KEYWORD=no to your cmake invocation? Please check that after the configuration run DEAL_II_RESTRICT is set to the empty string in <BUILD DIR>/include/deal.II/base/config.h.

@tamiko
Copy link
Member

tamiko commented Mar 6, 2024

@kronbichler I have not observed such an issue with clang/llvm at all. The compiler in question is actually IntelLLVM 2024.0.0 on platform Linux x86_64. So my immediate response is "whatever patchset Intel is carrying on top of LLVM."

Let me check out the usage terms of the intel oneAPI stuff. Maybe we can set up a regression testsuite configuration for it.

@kronbichler
Copy link
Member

it's unclear what restrict could actually achieve. First, it will only ever apply if Number2 == Number since objects of different type cannot conflict. Second, with restrict, the compiler can create local temporary variables that store parts of the input data and reference those instead of the original input data because it knows that the input data will not be overwritten. It's not clear to me what the compiler might use this freedom for in the current case -- in fact, the community's broader consensus seems to be that compilers don't actually benefit from this freedom, which is why the keyword is not widely used.

We often call this with Number2 == Number, so that means overlap. Unfortunately, the API of intrinsics also means that VectorizedArray<double> * and double * pointers can alias, so the compiler needs to consider those as well. The main intent here and in other places of this file is the following: we have matrix as an input in this function and two vectors. But we use this in a loop over several "vectors" in the code that calls this small matrix-vector product. To be precise, this is then not a matrix-vector product but a matrix-matrix product or even more general a contraction between tensors when in 3D. The main way to achieve efficiency is to perform register blocking, i.e., to use the matrix entries loaded for one matrix-vector product also for the other products in the nearby loop, and avoid loading them again. This is an essential part of achieving more than 30% of the arithmetic peak performance in matrix-matrix multiplications (and we do target to get substantially more than that). If we decorate the pointer with restrict, we tell the compiler that the matrix will not overlap with the output array of the underlying operation, and so it can be sure that entries once loaded will not need to be updated again when results are written back. If we don't tell it, the compiler will need to figure out by other means that this is not the case, and it can get complicated. The other good case is when the output variable is an array allocated locally on the stack: Then the compiler also knows that a pointer coming from outside cannot overlap with the local array.

That basic explanation given, I believe that the __restrict keyword in the surrounding function calling apply_matrix_vector_product is the crucial one, and we could skip it from this function, which seems to be the problematic case.

As to:

It would not at all surprise me if removing the 21 occurrences of the keyword in our code base leads to no measurable differences in performance. It would even not very much surprise me if there were no differences in assembly code at all, but perhaps I'm being overly pessimistic about what compilers could do :-)

I would aim to measure things and inspect the code in question. As I said, the main intent is to not re-load entries of an interpolation matrix. We should have enough benchmarks that keep track of this. I volunteer to look at it, if that is what we want.

@bangerth
Copy link
Member

bangerth commented Mar 9, 2024

No need to volunteer to time this! I wasn't proposing that we should get rid of the keywords, I was just saying that I wouldn't have been surprised if they make no difference. But with your explanation, I'm perhaps not so sure about that any more ;-)

@ryanstoner1
Copy link
Author

@tamiko I added that to the cmake invocation, and now it compiles without error. Thank you everyone for the explanations and comments!

My original issue is solved now. Unless anyone has any objections or plans to do benchmarking I'll close this issue next week.

@tamiko tamiko added the Compiler label Mar 9, 2024
@tamiko tamiko added this to the Release 9.6 milestone Mar 9, 2024
@tamiko tamiko changed the title Error compiling master on Stampede3 IntelLLVM cannot handle DEAL_II_RESTRICT keyword: internal compiler error in clang fronted Mar 9, 2024
@tamiko
Copy link
Member

tamiko commented Mar 9, 2024

@ryanstoner1 Let's keep the issue open. We might decide to work around this issue in various different ways.

@tamiko
Copy link
Member

tamiko commented Mar 9, 2024

@kronbichler @bangerth I agree. Let's not overreact here. Our use of DEAL_II_RESTRICT is limited. And we verified that the individual us eis safe and substantiated by benchmarks that indicate that it helps (at least when we wrote the code). Also, I have not observed an ICE with any clang/llvm version so far. No idea what Intel is doing here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants