Error while building rocALUTION for rocm5.1.x #144

Closed
dipietrantonio opened this issue May 16, 2022 · 14 comments

Comments

@dipietrantonio

Hi,
I am trying to build ROCm 5.1 from source, and this time I am blocked by a build error in rocALUTION.
Here is the version I am using, followed by the build error:

ubuntu@cdp-rocmbuild:~/rocm-from-source/build/rocALUTION$ git log
commit 7c8611fd4e86fcfb7a73f2cb14c4a5f43db87765 (HEAD, tag: rocm-5.1.1, tag: rocm-5.1.0, rocm-swplat/release/rocm-rel-5.1, m/roc-5.1.x)
Installing rocALUTION ..
mkdir: cannot create directory ‘build’: File exists
Running command cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/rocm-dev2  -DCMAKE_CXX_COMPILER=hipcc -DAMDGPU_TARGETS=gfx908 -DBUILD_CLIENTS_SAMPLES=OFF -DCMAKE_MODULE_PATH=/opt/rocm-dev2/hip/cmake;/opt/rocm-dev2 ..
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES) 
-- Could NOT find OpenMP (missing: OpenMP_CXX_FOUND) 
-- OpenMP not found. Compiling WITHOUT OpenMP support.
-- Checking for module 'mpi-cxx'
--   No package 'mpi-cxx' found
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS) 
-- Could NOT find MPI (missing: MPI_CXX_FOUND) 
-- MPI not found. Compiling WITHOUT MPI support.
-- Found HIP: /opt/rocm-dev2/hip (found version "5.1.20531-") 
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
-- hip::amdhip64 is SHARED_LIBRARY
CMake Warning at /opt/rocm-dev2/share/rocm/cmake/ROCMUtilities.cmake:50 (message):
  Could not determine the version of program rpmbuild.
Call Stack (most recent call first):
  /opt/rocm-dev2/share/rocm/cmake/ROCMCreatePackage.cmake:284 (rocm_find_program_version)
  src/CMakeLists.txt:221 (rocm_create_package)


INFOrocm_set_cpack_gen didn't find ROCM_PKGTYPE in environment
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/rocm-from-source/build/rocALUTION/build
Running command make -j 8 install
[  1%] Building HIPCC object src/CMakeFiles/rocalution_hip.dir/base/hip/rocalution_hip_generated_hip_vector.cpp.o
[  2%] Building HIPCC object src/CMakeFiles/rocalution_hip.dir/base/hip/rocalution_hip_generated_backend_hip.cpp.o
[  3%] Building HIPCC object src/CMakeFiles/rocalution_hip.dir/base/hip/rocalution_hip_generated_hip_allocate_free.cpp.o
[  5%] Building HIPCC object src/CMakeFiles/rocalution_hip.dir/base/hip/rocalution_hip_generated_hip_conversion.cpp.o
[  6%] Building HIPCC object src/CMakeFiles/rocalution_hip.dir/base/hip/rocalution_hip_generated_hip_blas.cpp.o
[  7%] Building HIPCC object src/CMakeFiles/rocalution_hip.dir/base/hip/rocalution_hip_generated_hip_matrix_bcsr.cpp.o
[  8%] Building HIPCC object src/CMakeFiles/rocalution_hip.dir/base/hip/rocalution_hip_generated_hip_matrix_csr.cpp.o
[ 10%] Building HIPCC object src/CMakeFiles/rocalution_hip.dir/base/hip/rocalution_hip_generated_hip_matrix_coo.cpp.o
[ 11%] Building HIPCC object src/CMakeFiles/rocalution_hip.dir/base/hip/rocalution_hip_generated_hip_matrix_dense.cpp.o
[ 12%] Building HIPCC object src/CMakeFiles/rocalution_hip.dir/base/hip/rocalution_hip_generated_hip_matrix_dia.cpp.o
In file included from /home/ubuntu/rocm-from-source/build/rocALUTION/src/base/hip/hip_matrix_csr.cpp:53:
/home/ubuntu/rocm-from-source/build/rocALUTION/src/base/hip/hip_kernels_csr.hpp:53:33: error: invalid operands to binary expression ('std::complex<double>' and 'std::complex<double>')
                val[aj] = alpha * val[aj];
                          ~~~~~ ^ ~~~~~~~
/home/ubuntu/rocm-from-source/build/rocALUTION/src/base/hip/hip_matrix_csr.cpp:2880:33: note: in instantiation of function template specialization 'rocalution::kernel_csr_scale_diagonal<std::complex<double>, int>' requested here
            hipLaunchKernelGGL((kernel_csr_scale_diagonal<ValueType, int>),
                                ^
/opt/rocm-dev2/hip/include/hip/amd_detail/amd_hip_runtime.h:349:15: note: candidate function not viable: no known conversion from 'std::complex<double>' to '__HIP_Coordinates<__HIP_GridDim>::__X' for 1st argument
std::uint32_t operator*(__HIP_Coordinates<__HIP_GridDim>::__X,
              ^
/opt/rocm-dev2/hip/include/hip/amd_detail/amd_hip_runtime.h:355:15: note: candidate function not viable: no known conversion from 'std::complex<double>' to '__HIP_Coordinates<__HIP_BlockDim>::__X' for 1st argument
std::uint32_t operator*(__HIP_Coordinates<__HIP_BlockDim>::__X,
              ^
/opt/rocm-dev2/hip/include/hip/amd_detail/amd_hip_runtime.h:361:15: note: candidate function not viable: no known conversion from 'std::complex<double>' to '__HIP_Coordinates<__HIP_GridDim>::__Y' for 1st argument
std::uint32_t operator*(__HIP_Coordinates<__HIP_GridDim>::__Y,
              ^
/opt/rocm-dev2/hip/include/hip/amd_detail/amd_hip_runtime.h:367:15: note: candidat
@ntrost57
Contributor

This looks like an issue with your environment. Can you compile other ROCm libraries successfully?

@cgmb
Contributor

cgmb commented May 16, 2022

-- Found HIP: /opt/rocm-dev2/hip (found version "5.1.20531-")

I don't think it's the cause of the build failures you're describing here, but that version number looks like it may suffer from the problem described in ROCm/HIP#2218. You may want to apply the second patch in this gist before building hipamd.

@dipietrantonio
Author

@ntrost57 Thanks for helping me with this one. What makes you think so? I have managed to build several other ROCm libraries (using hipcc) without issues, for instance rocRAND, rocSPARSE and rocSOLVER. I am also having issues with rocFFT, but again it looks like a programming error to me.

Maybe there is some CMake option I am missing? It would be very helpful if you could share the CMake options you use to build this library :)

@cgmb Thank you Cory for your prompt help, once again! I took note of that patch and added it to my build process.

@ntrost57
Contributor

Can you try the install.sh script for compilation?

@dipietrantonio
Author

I can't use the install script for the reasons listed in this other similar issue. But I went through your install script, and I believe the issue is the option

[line 21]   echo "    [--host] build library for host backend only"

Could it be that ${build_host} is set to true in your testing, so that the code

  # HIP / Host only
  if [[ "${build_host}" == true ]]; then
    cmake_common_options="${cmake_common_options} -DSUPPORT_HIP=OFF"
  else
    cmake_common_options="${cmake_common_options} -DSUPPORT_HIP=ON"
  fi

disables HIP support? In that case you would never compile the code above that gives me the error.

To confirm this, I will try to compile with -DSUPPORT_HIP=OFF. But what are the implications? The code will only run on the CPU, right?

@ntrost57
Contributor

We test both host and device builds. If you compile with -DSUPPORT_HIP=OFF, the code will only run on the CPU.
I am not able to reproduce the error you see using rocm-5.1.1 with the posted commit id and the given cmake command on Ubuntu 20.04.

@dipietrantonio
Author

@ntrost57 would you post your CMake configuration here? I have noticed that my version of HIP is 5.1.2xx; you might be using 5.1.1? To obtain the ROCm repositories I followed the instructions on your website and used the repo command. It checked out the 5.1.x branch of the ROCm repository, where the instructions to download all repositories are stored. So I might be using a newer version of HIP than you. I am not on my laptop now, but I'll post the exact version (commit hash) of HIP I am using; maybe the bug is there.

@ntrost57
Contributor

I have been using HIP 5.1.20531-cacfa990 (rocm-5.1.1), which is exactly what you pasted in the logs.
You can use the rocm/dev-ubuntu-20.04:latest docker image for a clean rocm-5.1.1 installation. After installing rocsparse, rocblas, rocprim and rocrand, you should be able to successfully build rocalution.
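The log in the original report shows `5.1.20531-` with nothing after the dash, while the rocm-5.1.1 build quoted just above reports `5.1.20531-cacfa990`. That difference looks like the missing-suffix symptom cgmb pointed out. A minimal C++ sketch for telling the two apart when scripting a from-source build; `split_hip_version` and `has_githash` are hypothetical helpers, not part of HIP, and the sketch only assumes the `<numeric>-<githash>` layout visible in these two strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical split of a HIP version string such as "5.1.20531-cacfa990"
// into the numeric part and the git-hash suffix after the first '-'.
struct HipVersionParts
{
    std::string numeric; // e.g. "5.1.20531"
    std::string githash; // e.g. "cacfa990"; empty when nothing follows '-'
};

HipVersionParts split_hip_version(const std::string& version)
{
    const std::string::size_type dash = version.find('-');
    if(dash == std::string::npos)
    {
        // No dash at all: treat the whole string as the numeric part.
        return {version, ""};
    }
    // substr(dash + 1) yields "" when the dash is the last character.
    return {version.substr(0, dash), version.substr(dash + 1)};
}

// True only when a non-empty hash follows the dash.
bool has_githash(const std::string& version)
{
    return !split_hip_version(version).githash.empty();
}
```

In a build script, a `false` result on the string CMake prints after `Found HIP` could serve as a cue to revisit the hipamd patch before building the math libraries.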

@dipietrantonio
Author

@ntrost57 that is very interesting. I am currently using a VM to try the build, but ultimately I need to install ROCm on a supercomputer, so I would like to avoid containers. Coming back to the error I get, it looks like the problem is that the * operator is called on two objects of type std::complex<double>. The overloaded function that handles this operation is defined in the standard library and is host code, which can't be called within a kernel unless it has been redefined to also carry the __device__ attribute. So it seems you have such a definition in your HIP installation, and I don't. I will dig more into it; hopefully I will find a way to solve this.
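A minimal sketch of the idea in the paragraph above, assuming the failing kernel body is the `val[aj] = alpha * val[aj];` line from the log: route the multiplication through a function that is itself marked `__host__ __device__` and only touches `real()`/`imag()`. `hd_mul` and `scale_diagonal_host` are hypothetical names, and the diagonal search is only a guess at what `kernel_csr_scale_diagonal` does, written as a host loop so it runs without a GPU. Under hipcc the sketch also assumes that the `constexpr` members of `std::complex` are usable in device code.

```cpp
#include <cassert>
#include <complex>
#include <vector>

#if defined(__HIPCC__)
#include <hip/hip_runtime.h> // defines __host__ / __device__
#else
// Plain host compiler: turn the HIP annotations into no-ops.
#define __host__
#define __device__
#endif

// Hypothetical helper: complex multiply written only in terms of real() and
// imag(), so it does not depend on std::complex's operator*.
template <typename T>
__host__ __device__ inline std::complex<T> hd_mul(const std::complex<T>& a,
                                                  const std::complex<T>& b)
{
    return std::complex<T>(a.real() * b.real() - a.imag() * b.imag(),
                           a.real() * b.imag() + a.imag() * b.real());
}

// Host loop standing in for a CSR "scale diagonal" kernel: for each row,
// find the entry whose column index equals the row and scale it by alpha.
// On the GPU each thread would typically handle one row instead.
template <typename T>
void scale_diagonal_host(int                           nrow,
                         const std::vector<int>&       row_offset,
                         const std::vector<int>&       col,
                         std::vector<std::complex<T>>& val,
                         const std::complex<T>&        alpha)
{
    assert(static_cast<int>(row_offset.size()) == nrow + 1);
    for(int ai = 0; ai < nrow; ++ai)
    {
        for(int aj = row_offset[ai]; aj < row_offset[ai + 1]; ++aj)
        {
            if(col[aj] == ai)
            {
                val[aj] = hd_mul(alpha, val[aj]); // instead of alpha * val[aj]
            }
        }
    }
}
```

For a 2x2 matrix stored as `row_offset = {0, 2, 4}`, `col = {0, 1, 0, 1}`, scaling by `alpha = i` touches only entries 0 and 3.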

@stepannassyr

Ran into the same issue with 5.1.3. Wrote this workaround:

diff --git a/src/base/hip/hip_vector.hpp b/src/base/hip/hip_vector.hpp
index 0444528..a432359 100644
--- a/src/base/hip/hip_vector.hpp
+++ b/src/base/hip/hip_vector.hpp
@@ -30,11 +30,77 @@
 #include "../base_vector.hpp"
 
 #include <hip/hip_runtime.h>
+#include <hip/amd_detail/amd_hip_complex.h>
 
 #include <complex>
 
 #include "hip_rand.hpp"
 
+#if defined(__HIPCC_RTC__)
+#define __HOST_DEVICE__ __device__
+#else
+#define __HOST_DEVICE__ __host__ __device__
+#endif // !defined(__HIPCC_RTC__)
+
+// Gotta put these somewhere
+__HOST_DEVICE__ inline std::complex<float> operator+(const std::complex<float> a, const std::complex<float> b)
+{
+    auto ahip = make_hipFloatComplex(a.real(), a.imag());
+    auto bhip = make_hipFloatComplex(b.real(), b.imag());
+    auto res = a+b;
+    return res;
+}
+__HOST_DEVICE__ inline std::complex<float> operator-(const std::complex<float> a, const std::complex<float> b)
+{
+    auto ahip = make_hipFloatComplex(a.real(), a.imag());
+    auto bhip = make_hipFloatComplex(b.real(), b.imag());
+    auto res = a-b;
+    return res;
+}
+__HOST_DEVICE__ inline std::complex<float> operator*(const std::complex<float> a, const std::complex<float> b)
+{
+    auto ahip = make_hipFloatComplex(a.real(), a.imag());
+    auto bhip = make_hipFloatComplex(b.real(), b.imag());
+    auto res = a*b;
+    return res;
+}
+__HOST_DEVICE__ inline std::complex<float> operator/(const std::complex<float> a, const std::complex<float> b)
+{
+    auto ahip = make_hipFloatComplex(a.real(), a.imag());
+    auto bhip = make_hipFloatComplex(b.real(), b.imag());
+    auto res = a/b;
+    return res;
+}
+
+__HOST_DEVICE__ inline std::complex<double> operator+(const std::complex<double> a, const std::complex<double> b)
+{
+    auto ahip = make_hipDoubleComplex(a.real(), a.imag());
+    auto bhip = make_hipDoubleComplex(b.real(), b.imag());
+    auto res = a+b;
+    return res;
+}
+__HOST_DEVICE__ inline std::complex<double> operator-(const std::complex<double> a, const std::complex<double> b)
+{
+    auto ahip = make_hipDoubleComplex(a.real(), a.imag());
+    auto bhip = make_hipDoubleComplex(b.real(), b.imag());
+    auto res = a-b;
+    return res;
+}
+__HOST_DEVICE__ inline std::complex<double> operator*(const std::complex<double> a, const std::complex<double> b)
+{
+    auto ahip = make_hipDoubleComplex(a.real(), a.imag());
+    auto bhip = make_hipDoubleComplex(b.real(), b.imag());
+    auto res = a*b;
+    return res;
+}
+__HOST_DEVICE__ inline std::complex<double> operator/(const std::complex<double> a, const std::complex<double> b)
+{
+    auto ahip = make_hipDoubleComplex(a.real(), a.imag());
+    auto bhip = make_hipDoubleComplex(b.real(), b.imag());
+    auto res = a/b;
+    return res;
+}
+
 namespace rocalution
 {
     template <typename ValueType>

@stepannassyr

And now I'm realizing that what I wrote doesn't even use the converted ahip = ... values. Hmm, but it does compile.

@dipietrantonio
Author

Thanks for your input @stepannassyr. It is indeed very strange; to be honest it looks like a recursive definition that never ends, so I am not sure it will work at runtime. I think there is definitely an issue somewhere in ROCm.
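The recursion suspicion seems well founded. Inside the body of the global `operator+` for `std::complex<float>`, the expression `a+b` sees both that function (already declared at that point) and the `operator+` template from `<complex>`. The two are equally good matches, and overload resolution prefers a non-template over a template, so each operator calls itself: it compiles cleanly but recurses without end if it is ever called. A minimal non-recursive sketch, assuming the approach of non-template global overloads is kept, spells the arithmetic out on `real()`/`imag()`. `cadd`, `csub`, `cmul` and `cdiv` are hypothetical helpers, and `cdiv` uses the textbook formula without overflow scaling. Converting to `hipDoubleComplex` and calling HIP's `hipCmul`-style functions, which the unused `ahip`/`bhip` conversions may have been heading toward, would be another option.

```cpp
#include <cassert>
#include <complex>

#if defined(__HIPCC__)
#include <hip/hip_runtime.h> // defines __host__ / __device__
#else
// Plain host compiler: turn the HIP annotations into no-ops.
#define __host__
#define __device__
#endif

// Hypothetical helpers: all arithmetic is done on real()/imag() scalars,
// so nothing here can resolve back to a std::complex operator.
template <typename T>
__host__ __device__ inline std::complex<T> cadd(const std::complex<T>& a, const std::complex<T>& b)
{
    return std::complex<T>(a.real() + b.real(), a.imag() + b.imag());
}
template <typename T>
__host__ __device__ inline std::complex<T> csub(const std::complex<T>& a, const std::complex<T>& b)
{
    return std::complex<T>(a.real() - b.real(), a.imag() - b.imag());
}
template <typename T>
__host__ __device__ inline std::complex<T> cmul(const std::complex<T>& a, const std::complex<T>& b)
{
    return std::complex<T>(a.real() * b.real() - a.imag() * b.imag(),
                           a.real() * b.imag() + a.imag() * b.real());
}
template <typename T>
__host__ __device__ inline std::complex<T> cdiv(const std::complex<T>& a, const std::complex<T>& b)
{
    // a / b = a * conj(b) / |b|^2 (no rescaling, so extreme |b| can overflow).
    const T denom = b.real() * b.real() + b.imag() * b.imag();
    return std::complex<T>((a.real() * b.real() + a.imag() * b.imag()) / denom,
                           (a.imag() * b.real() - a.real() * b.imag()) / denom);
}

// Non-template global overloads, same shape as the workaround above, but
// forwarding to the helpers instead of to themselves.
using cf = std::complex<float>;
using cd = std::complex<double>;
__host__ __device__ inline cf operator+(const cf& a, const cf& b) { return cadd(a, b); }
__host__ __device__ inline cf operator-(const cf& a, const cf& b) { return csub(a, b); }
__host__ __device__ inline cf operator*(const cf& a, const cf& b) { return cmul(a, b); }
__host__ __device__ inline cf operator/(const cf& a, const cf& b) { return cdiv(a, b); }
__host__ __device__ inline cd operator+(const cd& a, const cd& b) { return cadd(a, b); }
__host__ __device__ inline cd operator-(const cd& a, const cd& b) { return csub(a, b); }
__host__ __device__ inline cd operator*(const cd& a, const cd& b) { return cmul(a, b); }
__host__ __device__ inline cd operator/(const cd& a, const cd& b) { return cdiv(a, b); }
```

These global overloads are only found by unqualified lookup when no `operator*` declared in an enclosing namespace such as `rocalution` hides them, so this remains a stopgap rather than a fix.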

@dipietrantonio
Author

dipietrantonio commented Jul 12, 2022

This issue is no longer present in ROCm 5.2, although I found another one in rocALUTION/src/solvers/multigrid/ruge_stueben_amg.hpp:

#if defined(WIN32) || defined(_WIN32) || defined(__WIN32)
#else
        [[deprecated("This function will be removed in a future release. Use "
                     "SetStrengthThreshold() instead")]]
#endif

I had to remove the code in the else block because it wouldn't compile using hipcc (clang-14).

@doctorcolinsmith
Collaborator

Hello @dipietrantonio. As the original issue is resolved, I am closing this issue.

The new comment about ruge_stueben_amg.hpp is copied to a new issue (#151).

5 participants