Update Frontier installation #1208

Merged
merged 11 commits into from
Apr 29, 2024

Conversation

sethrj
Member

@sethrj sethrj commented Apr 26, 2024

This updates the build on Frontier to use the new hep143 allocation and installation with ROCm 5.7.1.

The only weird thing was that Thrust now somehow assumes it is building for CUDA when we compile with clang (and include it via device_runtime_api.h):

In file included from /ccs/home/s3j/Code/celeritas-frontier/src/corecel/sys/Device.cc:21:
In file included from /ccs/home/s3j/Code/celeritas-frontier/src/corecel/device_runtime_api.h:28:
In file included from /opt/rocm-5.7.1/include/thrust/mr/memory_resource.h:25:
In file included from /opt/rocm-5.7.1/include/thrust/detail/config/memory_resource.h:22:
In file included from /opt/rocm-5.7.1/include/thrust/detail/alignment.h:24:
/opt/rocm-5.7.1/include/thrust/detail/type_traits.h:31:10: fatal error: 'cuda/std/type_traits' file not found
#include <cuda/std/type_traits>
         ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

@sethrj sethrj added documentation Improvements or additions to documentation, examples, and tests core Software engineering infrastructure labels Apr 26, 2024
@sethrj sethrj requested a review from esseivaju April 26, 2024 20:24
sethrj added a commit to sethrj/celeritas that referenced this pull request Apr 26, 2024
@esseivaju
Contributor

Are you using clang directly or hipcc? Looking at rocThrust's compiler.h and device_system.h, if __hip__ isn't defined then it picks CUDA. Wouldn't you also have to define __THRUST_DEVICE_SYSTEM_NAMESPACE?

@sethrj
Member Author

sethrj commented Apr 28, 2024

@esseivaju This was happening through the .cc files compiled by clang++. Thrust was setting THRUST_DEVICE_COMPILER to THRUST_DEVICE_COMPILER_CLANG and then defaulting THRUST_DEVICE_SYSTEM to THRUST_DEVICE_SYSTEM_CUDA. By overriding THRUST_DEVICE_SYSTEM in device_runtime_api.h, we give Thrust the correct "device system", and it then sets __THRUST_DEVICE_SYSTEM_NAMESPACE automatically.

The change is only to provide Thrust more information when going into device_system.h, not to replace that header.
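
For readers unfamiliar with this pattern, here is a minimal sketch of the "select before the library defaults" mechanism described above. It is not the actual Celeritas or rocThrust code: every DEMO_* macro and the demo namespace are hypothetical stand-ins for the THRUST_* macros, chosen so the example compiles without rocThrust installed, and it assumes the library header only falls back to CUDA when no device system has been selected beforehand.

```cpp
// Sketch only: every DEMO_* name is a hypothetical stand-in for the
// corresponding THRUST_* macro, so this compiles without rocThrust.
#define DEMO_DEVICE_SYSTEM_CUDA 1
#define DEMO_DEVICE_SYSTEM_HIP 2

// Step 1: what a wrapper header (the role device_runtime_api.h plays) does.
// DEMO_USE_HIP is an assumed build-configuration switch.
#define DEMO_USE_HIP 1
#if DEMO_USE_HIP && !defined(DEMO_DEVICE_SYSTEM)
#    define DEMO_DEVICE_SYSTEM DEMO_DEVICE_SYSTEM_HIP
#endif

// Step 2: what the library's own configuration header does afterwards.
// The CUDA fallback applies only if nothing was selected in step 1.
#ifndef DEMO_DEVICE_SYSTEM
#    define DEMO_DEVICE_SYSTEM DEMO_DEVICE_SYSTEM_CUDA
#endif

// Step 3: derived settings follow from the selection, so the wrapper
// never has to define them itself.
#if DEMO_DEVICE_SYSTEM == DEMO_DEVICE_SYSTEM_HIP
#    define DEMO_DEVICE_SYSTEM_NAMESPACE hip
#else
#    define DEMO_DEVICE_SYSTEM_NAMESPACE cuda
#endif

namespace demo
{
namespace cuda
{
constexpr int system_id = DEMO_DEVICE_SYSTEM_CUDA;
}
namespace hip
{
constexpr int system_id = DEMO_DEVICE_SYSTEM_HIP;
}
// Resolves to demo::hip::system_id or demo::cuda::system_id.
constexpr int selected_system_id = DEMO_DEVICE_SYSTEM_NAMESPACE::system_id;
}  // namespace demo

// Compile-time check that the override wins over the CUDA fallback.
static_assert(demo::selected_system_id == DEMO_DEVICE_SYSTEM_HIP,
              "the override must win over the CUDA fallback");
```

The ordering is the whole point: if step 2 ran before step 1 (that is, if a library header were included before the wrapper), the fallback would already have fixed the choice and the override would have no effect.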

Contributor

@esseivaju esseivaju left a comment

The rocThrust README recommends using hipcc to compile .cc files, but I guess this workaround works as long as we include device_runtime_api.h before any Thrust headers.

@sethrj
Member Author

sethrj commented Apr 28, 2024

OLCF recommends using their wacky Cray compiler wrappers... and those guys forward to llvm directly apparently

@sethrj sethrj merged commit 69cdb1a into celeritas-project:develop Apr 29, 2024
28 checks passed
@sethrj sethrj deleted the frontier-update branch April 29, 2024 12:15
sethrj added a commit that referenced this pull request Apr 29, 2024
* Fix thrust build with rocm 5.7.1
* Fix non-agnostic test name
* Update frontier environment
* Load miniforge for python
* Ignore pr workflow for unrelated scripts
* Fix loaded data and cmake flags
* Use more cores
* Use conda path
* Unload darshan
sethrj added a commit that referenced this pull request May 6, 2024