Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Files tagged with HIP_SOURCE_PROPERTY_FORMAT generate wrong HIP_CLANG_PATH #174

Open
tpkessler opened this issue May 27, 2023 · 6 comments
Assignees

Comments

@tpkessler
Copy link

Hi! I'm packaging ROCm for Arch Linux. With HIP 5.5.1 I encountered the issue that HIP_CLANG_PATH was pointing to /opt/rocm/llvm (instead of the correct /opt/rocm/llvm/bin) as the env var with the same name is overwritten by the cmake macro HIP_PREPARE_TARGET_COMMANDS declared in FindHIP.cmake. If HIP_SOURCE_PROPERTY_FORMAT is set to TRUE, cmake modifies the env vars for hipcc in an inconsistent way. Removing the tags in src/CMakeLists.txt resolves this issue for me.

--- rocALUTION-rocm-5.5.1/src/CMakeLists.txt.bak	2023-05-27 18:12:40.853740486 +0200
+++ rocALUTION-rocm-5.5.1/src/CMakeLists.txt	2023-05-27 18:12:49.834025607 +0200
@@ -115,11 +115,6 @@

 # Create rocALUTION hip library
 if(SUPPORT_HIP)
-  # Flag source file as a hip source file
-  foreach(i ${HIP_SOURCES})
-    set_source_files_properties(${i} PROPERTIES HIP_SOURCE_PROPERTY_FORMAT TRUE)
-  endforeach()
-
   # HIP flags workaround while target_compile_options do not work
   # list(APPEND HIP_HIPCC_FLAGS "-O3 -march=native -Wno-unused-command-line-argument -fPIC -std=c++14")
   list(APPEND HIP_HIPCC_FLAGS "-O3 -Wno-unused-command-line-argument -fPIC -std=c++14")

I couldn't find any documentation for HIP_SOURCE_PROPERTY_FORMAT, and it isn't used in similar libraries like rocBLAS or rocSOLVER. It's only mentioned in a comment on building with rocBLAS with CUDA

# ########################################################################
# NOTE:  CUDA compiling path
# ########################################################################
# I have tried compiling rocBLAS library source with multiple methods,
# and ended up using the approach where we set the CXX compiler to hipcc.
# I didn't like using the HIP_ADD_LIBRARY or CUDA_ADD_LIBRARY approaches,
# for the reasons I list here.
# 1.  Adding header include directories is through HIP_INCLUDE_DIRECTORIES(), which
# is global to a directory and affects all targets
# 2.  You must add HIP_SOURCE_PROPERTY_FORMAT OBJ properties to .cpp files
# to get HIP_ADD_LIBRARY to recognize the file
# 3.  HIP_ADD_LIBRARY invokes a call to add_custom_command() to compile files,
# and rocBLAS does the same.  The order in which custom commands execute is
# undefined, and sometimes a file is attempted to be compiled before it has
# been generated.  The fix for this is to create 'PHONY' targets, which I
# don't desire.
@cgmb
Copy link
Contributor

cgmb commented Jun 5, 2023

Hi @tpkessler, I'm not an expert on the rocALUTION build system, but my understanding is that it uses FindHIP.cmake because when MPI is enabled, rocALUTION needs to build some files as C++ and other files as HIP. The HIP_SOURCE_PROPERTY_FORMAT property is used in FindHIP.cmake to distinguish the two. If you remove those lines, I think you can build everything with hipcc, but only if you don't enable MPI.

To me, it sounds like the real problem is whatever the code that is setting that incorrect path to clang. When looking through this history of FindHIP.cmake, I noticed a commit titled "Change to get correct clang path in findhip". I wonder if backporting that change would fix the path to clang? ROCm/HIP@e15925e

These are just my initial thoughts before we properly debug the problem. Once we reproduce the issue and debug, it should be possible to answer with more confidence.

@tpkessler
Copy link
Author

Hi Cory, thanks for looking into my issue. The linked line is a good starting point. I recompiled rocalution with the patched hip but I get the same error messages.

I think you can build everything with hipcc, but only if you don't enable MPI.

I could build rocalution with MPI and the patch without problems.

@ntrost57
Copy link
Contributor

Closing this issue. Thanks @cgmb

@cgmb
Copy link
Contributor

cgmb commented Jun 26, 2023

@ntrost57, I don't think I've resolved their problem.

@tpkessler
Copy link
Author

No, the problem still exists.

@cgmb
Copy link
Contributor

cgmb commented Jul 12, 2023

It's merely a workaround, but I've added an option to build without FindHIP.cmake in 5bd014f. It relies on CMake's HIP language support instead, and can be enabled with -DUSE_HIPCXX=ON.

Note that you will need to specify your build flags differently if you use that option (e.g., CXXFLAGS vs HIPFLAGS) and some of the default CMake options may differ as well (e.g., AMDGPU_TARGETS vs. CMAKE_HIP_ARCHITECTURES). They will require some adjustments to your build scripts and that is why it is an opt-in flag.

ROCmMathLibrariesBot pushed a commit that referenced this issue Nov 21, 2023
* PMIS MPI support (#100)

* mpi to hip backend

* MPI enablement for PMIS

* MPI RS example

* clang-format

* example

* fix

* global Ext+I interpolation (#112)

* unused function

* Added PM as argument for aggregation function

* added pm for each level to MG class

* Added PM as argument for aggregation function #2

* Added PM as argument for aggregation function #3

* RS Ext+I added to global matrix

* modified example to dump some data for validation - work in progress

* RS Ext+I function added to headers

* RS Ext+I HIP implementation

* RS Ext+I host implementation

* Improved PM

* global Ext+I kernel update

* some multinode improvements (#118)

* added some more useful guards to parallel manager

* added CopyFromHostData, CopyToHostData, ExclusiveSum and Sort functionality for vector class ; moved boundary information from vector to matrix ; added GetFormat() to GlobalMatrix class

* clang-format

* P and R should be OperatorType, not LocalMatrix

* clang-format

* check if PM is valid when required

* added extra function for triple matrix product for simplicity

* clang-format

* clang-format

* skip free when ptr is nullptr

* fix memory leak in BaseAMG class

* rocsparse_csrgeam added

* renamed csr ext+i kernel because it is generally usable

* allowing CSR zero matrices with row_offset != nullptr, as well as zero vectors with size == 0

* clang-format

* duplicated row column entries throw a warning

* copy_x2x() functions added for readability and simplicity

* search and replace memcpy with copy fct

* search and replace memcpy with copy fct #2 ; fix for random csr generator to not generate duplicated row col entries

* clang-format

* fixes

* those asserts are wrong

* major version bump

* OpenMP parallel loop threshold need to be int64_t in order to work with larger structures

* Allow basic structures with 64bit entries, e.g. for global indices

* nnz should be 64bit ; also restructured RSAMG for readability

* vector size need to be 64bit locally - also added inclusive and exclusive sum functionality

* 64bit sizes for stencils

* host vector implementation changes for 64bit sizes and in/exclusive sum ; host I/O changed to always write 64bit sizes

* host stencil 64bit changes

* max residual index changed to 64bit accordingly

* solvers adjusted for 64bit nnz

* int64_t to double conversion

* allocation size should always be 64bit ; also added copy_h2h() for simplicity

* long and long long communication support added

* cleaned up types ; IndexType2 was a stupid name anyway

* removed deprecations (major release); enabled global structure support in RSAMG

* major changes to PM; added guards for transfers; removed deprecations; fixed int overflows; functionality to generate a PM from global ghost column ids, and a parent PM

* matrix conversions 64bit nnz support with guards

* host matrix I/O changed to always write 64bit sizes ; backward compatible

* host matrix implementations changed to 64bit nnz

* RSAMG restructured - global communication should not happen in local implementations ; switched to 64bit sizes

* host CSR matrix implementation

* hip implementation ; added copy_d2h/h2d/d2d for simplicity, with async flag

* adjusted unit tests to removed deprecated functions

* RSAMG MPI example updated

* fixed sanity assert

* doc update

* example should work with only 1 process

* global routines should work with single process

* global transpose operator

* using copy_h2h()

* _rocalution_sync should force a global barrier, too

* improved asynchronous apply / comm / halo apply

* accelerator must be available for pinned alloc/free

* fixing few compiler warnings

* readability

* removed the flood of printf on multi gpu systems

* adjusted openmp nested (deprecation) to v5.0

* weak scaling examples

* distributed laplacian generator

* updated rsamg example

* updated rsamg mpi example

* should use OperatorType, nothing else

* fixed RSDirectInterpolation(); fixed const PM issue

* updated unit tests

* types.hpp generated by cmake ; CSR(64/32) added on host ; moved RSPMIS communication into global matrix class

* removed old types.hpp

* initial implementation for unordered set and map on hip backend

* outsourced RSAMG to improve compilation performance; added async communication for multinode; moved multinode rspmis into globalmatrix; outsourced atomics

* clang-format

* fix for streams when not building for mpi

* SA amg merge fix

* fixed missing shared memory size

* clang-format

* clang-format

* typo

* clang format

* add blockdim to UAAMG benchmark

* adjusting unit tests for removed deprecated functions

* clang format

* test fix

* clang-format

* std::sort required algorithm header

* fixing merge error

* merge fix #2

* header cleaned up

* header cleaned up #2

* fix issue with HIP not being found

* free_pinned() does nothing on nullptr

* global triple matrix product

* proper error message when coarsening fails

* fixed a bug in global triplematrixproduct

* fixed a typo

* fixed compilation issue when HIP=off

* fixes COO and CSR conversions on both host and device, and ELL on host only (#211)

* empty matrix conversion fix

* host fallback fix for rsamg and triplematprod

* Fix documentation failures (#214)

Co-authored-by: jsandham <james.sandham@amd.com>

* Add Smoothed Aggregation to amgmpi branch (#213)

* Adding global aggregation to SAAMG (#166)

Co-authored-by: jsandham <james.sandham@amd.com>

* Add MPI support for global prolongation to SAAMG (#171)

Co-authored-by: jsandham <james.sandham@amd.com>

* Add MPI support for SAAMG global transpose (#172)

* Add MPI support for SAAMG global transpose

* Fix failures in greedy aggregation caused by unfilled aggregate_root_nodes array

---------

Co-authored-by: jsandham <james.sandham@amd.com>

* Add MPI unsmoothed aggregation (#174)

Co-authored-by: jsandham <james.sandham@amd.com>

* Adding debug printing to test triple product

* Adding debug print statements for testing

* Adding more debug printing

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Fix floating point fault caused by division by zero

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Testing

* Fix failures in local matrix when max_nnz_per_row is too high

* Testing

* Fix bug where we were not using a large enough hash table size

* Fix discrepency in host and hip assert in ExtractSubMatrix

* Fixing hangs in multinode hip backend

* Fix RSAMG documentation warnings

* Testing MPI uaamg

* Fix testing_local_matrix failure

* Remove comments and temporary testing code

* PR fixes

* PR fixes

* PR fixes

* Clang formatting

---------

Co-authored-by: jsandham <james.sandham@amd.com>

* removed unused variables

* Add back functions that cannot be removed until next major release (#216)

Co-authored-by: jsandham <james.sandham@amd.com>

* fix for very large sizes where local ext matrix exceeds int32

* Remove print statements from saamg testing file

---------

Co-authored-by: James Sandham <33790278+jsandham@users.noreply.github.com>
Co-authored-by: jsandham <james.sandham@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants