Update Frontier installation #1208

sethrj · 2024-04-26T20:24:41Z

This updates the build on Frontier to use the new hep143 allocation and installation with ROCm 5.7.1.

The only weird thing was that Thrust now somehow assumes it's building for CUDA when we compile with clang (and include Thrust via device_runtime_api.h):

In file included from /ccs/home/s3j/Code/celeritas-frontier/src/corecel/sys/Device.cc:21:
In file included from /ccs/home/s3j/Code/celeritas-frontier/src/corecel/device_runtime_api.h:28:
In file included from /opt/rocm-5.7.1/include/thrust/mr/memory_resource.h:25:
In file included from /opt/rocm-5.7.1/include/thrust/detail/config/memory_resource.h:22:
In file included from /opt/rocm-5.7.1/include/thrust/detail/alignment.h:24:
/opt/rocm-5.7.1/include/thrust/detail/type_traits.h:31:10: fatal error: 'cuda/std/type_traits' file not found
#include <cuda/std/type_traits>
         ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.

esseivaju · 2024-04-27T21:08:19Z

Are you using clang directly or hipcc? Looking at rocThrust's compiler.h and device_system.h, if __hip__ isn't defined then it picks CUDA. Wouldn't you also have to define __THRUST_DEVICE_SYSTEM_NAMESPACE?

sethrj · 2024-04-28T12:21:47Z

@esseivaju This was happening through the .cc files compiled by clang++. Thrust was setting THRUST_DEVICE_COMPILER to THRUST_DEVICE_COMPILER_CLANG, and then defaulting THRUST_DEVICE_SYSTEM to THRUST_DEVICE_SYSTEM_CUDA. By overriding THRUST_DEVICE_SYSTEM in device_runtime_api.h we give Thrust the correct "device system", and then it will automatically set __THRUST_DEVICE_SYSTEM_NAMESPACE.

The change is only to provide Thrust more information when going into device_system.h, not to replace that header.
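A minimal sketch of the selection logic described above, for readers who want to see the ordering spelled out. It uses stand-in `DEMO_*` macros instead of the real Thrust ones: the names, the numeric values, and the `DEMO_USE_HIP` switch are assumptions made for illustration and are not copied from rocThrust or from device_runtime_api.h.

```cpp
#include <cassert>
#include <string>

// Stand-in identifiers for the available device systems.
// Names and numeric values are hypothetical, chosen only for this sketch.
#define DEMO_DEVICE_SYSTEM_CUDA 1
#define DEMO_DEVICE_SYSTEM_HIP 2

// Stand-in for a project-level "built with HIP" configuration switch.
#ifndef DEMO_USE_HIP
#    define DEMO_USE_HIP 1
#endif

// Step 1: what a wrapper header can do before any library header is seen.
// A plain host compiler does not define __HIP__, so pick the device system
// explicitly instead of letting the library guess.
#if DEMO_USE_HIP && !defined(__HIP__)
#    define DEMO_DEVICE_SYSTEM DEMO_DEVICE_SYSTEM_HIP
#endif

// Step 2: what a library configuration header does afterwards.
// It only chooses a default if nothing was selected yet.
#ifndef DEMO_DEVICE_SYSTEM
#    if defined(__HIP__)
#        define DEMO_DEVICE_SYSTEM DEMO_DEVICE_SYSTEM_HIP
#    else
#        define DEMO_DEVICE_SYSTEM DEMO_DEVICE_SYSTEM_CUDA
#    endif
#endif

// Step 3: the namespace follows from the device system automatically.
#if DEMO_DEVICE_SYSTEM == DEMO_DEVICE_SYSTEM_HIP
#    define DEMO_DEVICE_SYSTEM_NAMESPACE hip
#else
#    define DEMO_DEVICE_SYSTEM_NAMESPACE cuda
#endif

// Two-level stringification so the macro is expanded before being quoted.
#define DEMO_STRINGIFY_IMPL(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_IMPL(x)

// Name of the namespace the configuration above settled on.
inline std::string demo_device_namespace()
{
    return DEMO_STRINGIFY(DEMO_DEVICE_SYSTEM_NAMESPACE);
}

// With the default DEMO_USE_HIP=1, the override in step 1 wins.
inline void check_demo_selection()
{
    assert(demo_device_namespace() == "hip");
}
```

Removing step 1 and compiling with an ordinary host compiler makes the same code fall through to the CUDA default in step 2, which mirrors the failure mode in the error log above. The override only adds information before the library's own selection runs; it does not replace that selection.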

esseivaju

In the rocThrust README they recommend using hipcc to compile .cc files, but I guess this workaround works as long as we include device_runtime_api.h before any Thrust headers.

sethrj · 2024-04-28T21:12:42Z

OLCF recommends using their wacky Cray compiler wrappers... and those guys forward to llvm directly apparently

* Fix thrust build with rocm 5.7.1
* Fix non-agnostic test name
* Update frontier environment
* Load miniforge for python
* Ignore pr workflow for unrelated scripts
* Fix loaded data and cmake flags
* Use more cores
* Use conda path
* Unload darshan

sethrj added 3 commits April 26, 2024 15:46

Fix thrust build with rocm 5.7.1

ec035d0

Fix non-agnostic test name

e74f8b1

Update frontier environment

d0eb89b

sethrj added the documentation (Improvements or additions to documentation, examples, and tests) and core (Software engineering infrastructure) labels Apr 26, 2024

sethrj requested a review from esseivaju April 26, 2024 20:24

Clang format

f7a2474

sethrj added a commit to sethrj/celeritas that referenced this pull request Apr 26, 2024

Update Frontier installation (celeritas-project#1208)

ec10cb3

sethrj added 3 commits April 26, 2024 17:32

Load miniforge for python

52dcb21

Ignore pr workflow for unrelated scripts

9553557

Fix loaded data and cmake flags

e337842

sethrj added 3 commits April 28, 2024 08:15

Use more cores

928c3f5

Use conda path

4114ded

Merge remote-tracking branch 'upstream/develop' into frontier-update

1d0fc39

esseivaju approved these changes Apr 28, 2024

Unload darshan

48305b9

sethrj merged commit 69cdb1a into celeritas-project:develop Apr 29, 2024
28 checks passed

sethrj deleted the frontier-update branch April 29, 2024 12:15
