Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang error compiling included C++ headers with GPU locale model #19754

Closed
milthorpe opened this issue May 4, 2022 · 8 comments · Fixed by #20239
Closed

Clang error compiling included C++ headers with GPU locale model #19754

milthorpe opened this issue May 4, 2022 · 8 comments · Fixed by #20239

Comments

@milthorpe
Copy link
Contributor

milthorpe commented May 4, 2022

Summary of Problem

A Chapel package module (ZMQ) that includes C++ code does not compile with the GPU locale model.

Steps to Reproduce

If I try to compile the arkouda code to check for ZeroMQ, which requires a C library wrapper, it works fine with the regular locale model:

milthorpe@equinox:/noback/milthorpe/chapel[main ?]$ chpl ../arkouda/dep/checkZMQ.chpl -I`spack location -i libzmq%gcc@11.1.0 arch=linux-ubuntu20.04-broadwell`/include -L`spack location -i libzmq%gcc@11.1.0 arch=linux-ubuntu20.04-broadwell`/lib
milthorpe@equinox:/noback/milthorpe/chapel[main !?]$ ./checkZMQ
Found ZMQ version: 4.3.3

but crashes when compiled with the GPU locale model:

milthorpe@equinox:/noback/milthorpe/chapel[main !?]$ CHPL_LLVM=bundled CHPL_CUDA_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/21.7/cuda/11.4 CHPL_LOCALE_MODEL=gpu chpl --gpu-arch sm_60 --no-checks ../arkouda/dep/checkZMQ.chpl -I`spack location -i libzmq%gcc@11.1.0 arch=linux-ubuntu20.04-broadwell`/include -L`spack location -i libzmq%gcc@11.1.0 arch=linux-ubuntu20.04-broadwell`/lib
warning: Unknown CUDA version. cuda.h: CUDA_VERSION=11040. Assuming the latest supported version 10.1
In file included from <built-in>:3:
In file included from /tmp/chpl-milthorpe.deleteme-aIK9XQ/command-line-includes.h:7:
In file included from /noback/milthorpe/chapel/modules/packages/ZMQHelper/zmq_helper.h:26:
In file included from /noback/milthorpe/chapel/runtime/include/qio/qio.h:26:
In file included from /noback/milthorpe/chapel/runtime/include/qio/qbuffer.h:39:
In file included from /noback/milthorpe/chapel/runtime/include/qio/deque.h:44:
In file included from /noback/milthorpe/chapel/runtime/include/chpl-mem.h:30:
In file included from /noback/milthorpe/chapel/runtime/include/arg.h:24:
In file included from /noback/milthorpe/chapel/runtime/include/chpltypes.h:39:
In file included from /noback/milthorpe/chapel/third-party/llvm/install/linux64-x86_64-clang/lib/clang/13.0.0/include/cuda_wrappers/complex:33:
/noback/milthorpe/spack/opt/spack/linux-ubuntu20.04-broadwell/gcc-9.4.0/gcc-11.1.0-wnfs2uguhyb3qe4tnvgy3ff43hruxwsw/lib/gcc/x86_64-pc-linux-gnu/11.1.0/../../../../include/c++/11.1.0/type_traits:56:3: error: templates must have C++ linkage
  template<typename _Tp, _Tp __v>
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/chpl-milthorpe.deleteme-aIK9XQ/command-line-includes.h:2:1: note: extern "C" language linkage specification begins here
extern "C" {
^
...
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated when compiling for sm_60.
error: error running clang during code generation

The clang error indicates that the extern "C" in the generated file command-line-includes.h is the problem. If I remove the extern declaration, I can compile and run with the GPU locale model, but I'm not sure if this is the correct solution.

From what I can see, command-line-includes.h is including a bunch of mostly-C header files, but one include chain runs from modules/packages/ZMQHelper/zmq_helper.h down to runtime/include/chpltypes.h, which includes a C++ header file clang/13.0.0/include/cuda_wrappers/complex that includes templated functions. This won't compile in the context of an extern "C" include, but I'm not sure of the best solution to ensure C++ code is correctly included as such.

The compiler function isCHeader just checks whether the filename suffix is .h, which won't work for files like the runtime headers chpltypes.h, chplcast.h which function either as C or C++ headers depending on how they're compiled (#ifdef __cplusplus).

Configuration Information

  • Output of chpl --version:
chpl version 1.27.0 pre-release (298724cbcb)
  built with LLVM version 13.0.0
  • Output of $CHPL_HOME/util/printchplenv --anonymize:
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: llvm
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native
CHPL_LOCALE_MODEL: gpu *
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_RE2: bundled
CHPL_LLVM: bundled
CHPL_AUX_FILESYS: none
  • Back-end compiler and version, e.g. gcc --version or clang --version:
clang version 13.0.0 (git@github.com:milthorpe/chapel.git 298724cbcb3712d300a98d4734d6e233b87ad201)
@bradcray

This comment was marked as resolved.

@milthorpe

This comment was marked as resolved.

@bradcray
Copy link
Member

Based on today's discussion over on #19891, I'm wondering whether the right thing to do here would be to remove the code from compiler/llvm/clang.cpp that adds the extern "C" block wholesale to the command-line-includes.h file and rely instead on the header files themselves to be appropriately guarded with extern "C" and/or #ifdef __cplusplus blocks (?).

Something that's unclear to me in the case of the complex includes in ctypes.h is whether part of the problem here is that the header file for C vs. C++ complex numbers is one and the same? Or is it two different files in two different search paths that a compiler will choose between for C vs. C++? Maybe put another way, is there a way we can simply have this code path use the C complex types and version of the header for all compiles (since this is one of the few places I'm aware of where our runtime tries to inject C++ code into the mix?). This SO post gives me some slight hope that perhaps the answer is "yes", but I'm not confident: https://stackoverflow.com/questions/23414270/c-complex-and-complex-h-in-the-same-file and won't have time to try it tonight (and perhaps it's too g++-specific? though does any C/C++ compiler not use the gnu headers these days?)

@mppf
Copy link
Member

mppf commented May 27, 2022

We might need to provide a flag or something so that you can indicate if a header file is a C or C++ header.

Or, maybe, we just need to #include the usual Chapel runtime stuff before the command line includes, so that something like chpltypes.h will be included first in C++ mode (where it can do the right thing w.r.t. extern "C" and C++ headers), and then the include guard #define would prevent it from being included again within the extern "C" block in command-line-includes.h. That seems like a less drastic solution if we can get it to work.

@hokiegeek2
Copy link

@mppf @bradcray this is what I am seeing when I attempt to compile Arkouda w GPU locales:

image

@hokiegeek2
Copy link

ah, wonder if it's the compute capability, will set that, recompile Chapel, and see what happens:

image

@e-kayrakli
Copy link
Contributor

The solution to the immediate problem is to put some #includes that are in #ifdef __cplusplus in extern "C++" blocks. Outer extern "C" blocks break linkage for those functions that require C++ linkage.

However, this'll bump into lack of complex support in the GPU locale model after adding missing extern "C++"s. Which is something we're aware of (https://github.com/chapel-lang/chapel/blob/main/test/gpu/native/math.chpl#L20-L23).

diff --git a/runtime/include/atomics/cstdlib/chpl-atomics.h b/runtime/include/atomics/cstdlib/chpl-atomics.h
index 962a90e6cd..76d3322db6 100644
--- a/runtime/include/atomics/cstdlib/chpl-atomics.h
+++ b/runtime/include/atomics/cstdlib/chpl-atomics.h
@@ -32,6 +32,7 @@
 #endif

 #if defined(__cplusplus)
+extern "C++" {
   #if __cplusplus >= 201103L
     #include <atomic>
     #define Atomic(T) std::atomic<T>
@@ -55,9 +56,12 @@
   #else
     #error "The cstdlib atomics need at least C++11.  Use intrinsics or locks."
   #endif
+}
 #elif __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_ATOMICS__)
+extern "C++" {
   #include <stdatomic.h>
   #define Atomic(T) _Atomic T
+}
 #else
   #error "The cstdlib atomics need at least C11.  Use intrinsics or locks."
 #endif
diff --git a/runtime/include/chpltypes.h b/runtime/include/chpltypes.h
index ebb21c82fb..ab2b3de379 100644
--- a/runtime/include/chpltypes.h
+++ b/runtime/include/chpltypes.h
@@ -36,9 +36,11 @@
 typedef float complex        _complex64;
 typedef double complex       _complex128;
 #else
+extern "C++" {
 #include <complex>
 typedef std::complex<float>  _complex64;
 typedef std::complex<double> _complex128;
+}
 #endif

 #ifdef __cplusplus

This patch makes chpl test/library/packages/ZMQ/hello.chpl compile beyond the error reported in this issue (until it fails with known lack of complex support).


I don't think the lack of complex support is something too challenging to handle, but it is not near the top of our priorities. We can make it a priority, though. Is this a blocker @milthorpe and/or @hokiegeek2 ?

@hokiegeek2
Copy link

@e-kayrakli thanks for the follow-up! Um, I'd really appreciate getting this issue resolved as I am trying to prototype Arkouda on GPUs where downselected pdarrays can be shared with Python tools such as Ray and RAPIDS vi locales being on GPUs. Even if it's just a code change I need to grab and patch my 1.27.0 codebase, that would be awesome!

e-kayrakli added a commit that referenced this issue Jul 28, 2022
Fix some issues with the GPU support to enable simple Arkouda builds

This branch makes bunch of small-ish changes in an attempt to get Arkouda to
compile with the GPU locale model. I can separate this into multiple PRs, but I
want to test it some more as a whole. Fixes with this PR:

- Resolves #19754
  - Stops adding `extern "C"` in the C headers included at the command line.
    This was proposed by @milthorhpe.
  - Fixes an interop test broken by this.
- To be able to do that makes runtime's generated code interface for complex
  numbers C-based even if we're compiling with C++. (A prior PR that made some
  improvements in that direction is
  #16847)
- This enables defining `c_string_to_complex*` functions to be included in the
  generated executable, because their return types are no longer
  `std::complex<>` which you can't link with C linkage.
- Adds a `no gpu codegen` pragma to thwart gpuization within some internal
  functions. Currently this only applies to:
  - `chpl__initCopy_shapeHelp`: Has PRIM_ASSIGN that gets normalized after
    gpuization, but not resolved. We probably need this function, and the lack
    might be preventing us from using loop expressions to initialize arrays on
    gpus.
- Fixes `.name` implementation of the `GPULocale` type.
  - Adds a relevant test.
- Adds support for `BitOps` module.
  - Adds a relevant test.
- While there, cleans up `gpuTransforms.cpp` a bit.

[Reviewed by @stonea]

Test:

- [x] gpu/native
- [x] gpu/interop
- [x] `make check` + spot checks in `types/complex` with `intel` as the host+target compiler
- [x] `make check` + spot checks in `types/complex` with `cray` as the host+target compiler
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants