Problems building DeepSpeed from source on AMD MI300A APU #7843

Hello, I am trying to build the DeepSpeed ops from source on an AMD MI300A APU, using `rocm-6.3.3`.

DeepSpeed environment variables. I source these before building. I am using the amdclang compiler.
```
# Point to ROCm
export ROCM_PATH=/opt/rocm-6.3.3/
export cc=amdclang
export CC=amdclang++

export PYTORCH_ROCM_ARCH="gfx942"
export GPU_ARCHS="gfx942"

# Configure DeepSpeed Build Flags
export DS_BUILD_OPS=1
export DS_BUILD_CUTLASS_OPS=0
export DS_BUILD_EVOFORMER_ATTN=0
export DS_BUILD_FP_QUANTIZER=0
export DS_BUILD_GDS=0
export DS_BUILD_RAGGED_DEVICE_OPS=0
export DS_BUILD_SPARSE_ATTN=0
export DS_BUILD_DEEP_COMPILE=0
```

Commands for building:
```
cd /path/to/Deepspeed/
source venv/bin/activate
pip install --no-build-isolation -e .
```

Error message:
```
            |   ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/c10/util/Deprecated.h:24:43: note: expanded from macro 'C10_DEPRECATED_MESSAGE'
         24 | #define C10_DEPRECATED_MESSAGE(message) [[deprecated(message)]]
            |                                           ^
      csrc/lamb/fused_lamb_hip_kernel.hip:437:27: warning: 'data' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
        437 |                         g.data<scalar_t>(),
            |                           ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:7: note: 'data' has been explicitly marked deprecated here
        247 |   T * data() const {
            |       ^
      csrc/lamb/fused_lamb_hip_kernel.hip:437:27: warning: 'data<float>' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
        437 |                         g.data<scalar_t>(),
            |                           ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:246:3: note: 'data<float>' has been explicitly marked deprecated here
        246 |   C10_DEPRECATED_MESSAGE("Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.")
            |   ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/c10/util/Deprecated.h:24:43: note: expanded from macro 'C10_DEPRECATED_MESSAGE'
         24 | #define C10_DEPRECATED_MESSAGE(message) [[deprecated(message)]]
            |                                           ^
      csrc/lamb/fused_lamb_hip_kernel.hip:446:32: warning: 'data' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
        446 |                         w_l2_i.data<scalar_t>(),
            |                                ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:7: note: 'data' has been explicitly marked deprecated here
        247 |   T * data() const {
            |       ^
      csrc/lamb/fused_lamb_hip_kernel.hip:446:32: warning: 'data<float>' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
        446 |                         w_l2_i.data<scalar_t>(),
            |                                ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:246:3: note: 'data<float>' has been explicitly marked deprecated here
        246 |   C10_DEPRECATED_MESSAGE("Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.")
            |   ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/c10/util/Deprecated.h:24:43: note: expanded from macro 'C10_DEPRECATED_MESSAGE'
         24 | #define C10_DEPRECATED_MESSAGE(message) [[deprecated(message)]]
            |                                           ^
      csrc/lamb/fused_lamb_hip_kernel.hip:447:32: warning: 'data' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
        447 |                         u_l2_i.data<scalar_t>());
            |                                ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:7: note: 'data' has been explicitly marked deprecated here
        247 |   T * data() const {
            |       ^
      csrc/lamb/fused_lamb_hip_kernel.hip:447:32: warning: 'data<float>' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
        447 |                         u_l2_i.data<scalar_t>());
            |                                ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:246:3: note: 'data<float>' has been explicitly marked deprecated here
        246 |   C10_DEPRECATED_MESSAGE("Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.")
            |   ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/c10/util/Deprecated.h:24:43: note: expanded from macro 'C10_DEPRECATED_MESSAGE'
         24 | #define C10_DEPRECATED_MESSAGE(message) [[deprecated(message)]]
            |                                           ^
      csrc/lamb/fused_lamb_hip_kernel.hip:451:44: warning: 'data' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
        451 |                         num_blocks, w_l2_i.data<scalar_t>(), u_l2_i.data<scalar_t>());
            |                                            ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:7: note: 'data' has been explicitly marked deprecated here
        247 |   T * data() const {
            |       ^
      csrc/lamb/fused_lamb_hip_kernel.hip:451:44: warning: 'data<float>' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
        451 |                         num_blocks, w_l2_i.data<scalar_t>(), u_l2_i.data<scalar_t>());
            |                                            ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:246:3: note: 'data<float>' has been explicitly [[deprecated(message)]]
            |                                           ^
      csrc/lamb/fused_lamb_hip_kernel.hip:471:32: warning: 'data' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
        471 |                         u_l2_i.data<scalar_t>(),
            |                                ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:247:7: note: 'data' has been explicitly marked deprecated here
        247 |   T * data() const {
            |       ^
      csrc/lamb/fused_lamb_hip_kernel.hip:471:32: warning: 'data<float>' is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
        471 |                         u_l2_i.data<scalar_t>(),
            |                                ^
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/ATen/core/TensorBody.h:246:3: note: 'data<float>' has been explicitly marked deprecated here
        246 |   C10_DEPRECATED_MESSAGE("Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead.")
            |   ^
      /lustrds_benchmarks/DeepSpeed/csrc/transformer/inference/includes -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/csrc/includes -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-DBF16_AVAILABLE -DROCM_WAVEFRONT_SIZE=64 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -D_GLIBCXX_USE_CXX11_ABI=1 --offload-arch=gfx942 -fno-gpu-rdc
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:367:12: error: use of undeclared identifier '__ll2bfloat16_rn'; did you mean '__ll2float_rn'?
        367 |     return __ll2bfloat16_rn(val);
            |            ^~~~~~~~~~~~~~~~
            |            __ll2float_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_device_functions.h:576:32: note: '__ll2float_rn' declared here
        576 | __device__ static inline float __ll2float_rn(long long int x) { return (float)x; }
            |                                ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:372:12: error: use of undeclared identifier '__int2bfloat16_rn'; did you mean '__int2float_rn'?
        372 |     return __int2bfloat16_rn(val);
            |            ^~~~~~~~~~~~~~~~~
            |            __int2float_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_device_functions.h:545:32: note: '__int2float_rn' declared here
        545 | __device__ static inline float __int2float_rn(int x) { return (float)x; }
            |                                ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:377:12: error: use of undeclared identifier '__short2bfloat16_rn'; did you mean '__float22bfloat162_rn'?
        377 |     return __short2bfloat16_rn(val);
            |            ^~~~~~~~~~~~~~~~~~~
            |            __float22bfloat162_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bf16.h:574:45: note: '__float22bfloat162_rn' declared here
        574 | __BF16_HOST_DEVICE_STATIC__ __hip_bfloat162 __float22bfloat162_rn(const float2 a) {
            |                                             ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:377:32: error: no viable conversion from 'int16_t' (aka 'short') to 'float2' (aka 'HIP_vector_type<float, 2>')
        377 |     return __short2bfloat16_rn(val);
            |                                ^~~
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_vector_types.h:471:9: note: candidate constructor not viable: no known conversion from 'int16_t' (aka 'short') to 'const HIP_vector_type<float, 2> &' for 1st argument
        471 |         HIP_vector_type(const HIP_vector_type&) = default;
            |         ^               ~~~~~~~~~~~~~~~~~~~~~~
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_vector_types.h:474:9: note: candidate constructor not viable: no known conversion from 'int16_t' (aka 'short') to 'HIP_vector_type<float, 2> &&' for 1st argument
        474 |         HIP_vector_type(HIP_vector_type&&) = default;
            |         ^               ~~~~~~~~~~~~~~~~~
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_vector_types.h:466:9: note: candidate template ignored: requirement 'sizeof...(Us) == 2U' was not satisfied [with Us = <int16_t>]
        466 |         HIP_vector_type(Us... xs) noexcept
            |         ^
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_vector_types.h:457:9: note: explicit constructor is not a candidate
        457 |         HIP_vector_type(U x_) noexcept
            |         ^
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bf16.h:574:80: note: passing argument to parameter 'a' here
        574 | __BF16_HOST_DEVICE_STATIC__ __hip_bfloat162 __float22bfloat162_rn(const float2 a) {
            |                                                                                ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:382:12: error: use of undeclared identifier '__int2bfloat16_rn'; did you mean '__int2float_rn'?
        382 |     return __int2bfloat16_rn(val);
            |            ^~~~~~~~~~~~~~~~~
            |            __int2float_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_device_functions.h:545:32: note: '__int2float_rn' declared here
        545 | __device__ static inline float __int2float_rn(int x) { return (float)x; }
            |                                ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:387:12: error: use of undeclared identifier '__ull2bfloat16_rn'; did you mean '__ull2float_rn'?
        387 |     return __ull2bfloat16_rn(val);
            |            ^~~~~~~~~~~~~~~~~
            |            __ull2float_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_device_functions.h:629:32: note: '__ull2float_rn' declared here
        629 | __device__ static inline float __ull2float_rn(unsigned long long int x) { return (float)x; }
            |                                ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:392:12: error: use of undeclared identifier '__uint2bfloat16_rn'; did you mean '__uint2float_rn'?
        392 |     return __uint2bfloat16_rn(val);
            |            ^~~~~~~~~~~~~~~~~~
            |            __uint2float_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_device_functions.h:598:32: note: '__uint2float_rn' declared here
        598 | __device__ static inline float __uint2float_rn(unsigned int x) { return (float)x; }
            |                                ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:397:12: error: use of undeclared identifier '__ushort2bfloat16_rn'; did you mean '__float22bfloat162_rn'?
        397 |     return __ushort2bfloat16_rn(val);
            |            ^~~~~~~~~~~~~~~~~~~~
            |            __float22bfloat162_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bf16.h:574:45: note: '__float22bfloat162_rn' declared here
        574 | __BF16_HOST_DEVICE_STATIC__ __hip_bfloat162 __float22bfloat162_rn(const float2 a) {
            |                                             ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:397:33: error: no viable conversion from 'uint16_t' (aka 'unsigned short') to 'float2' (aka 'HIP_vector_type<float, 2>')
        397 |     return __ushort2bfloat16_rn(val);
            |                                 ^~~
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_vector_types.h:471:9: note: candidate constructor not viable: no known conversion from 'uint16_t' (aka 'unsigned short') to 'const HIP_vector_type<float, 2> &' for 1st argument
        471 |         HIP_vector_type(const HIP_vector_type&) = default;
            |         ^               ~~~~~~~~~~~~~~~~~~~~~~
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_vector_types.h:474:9: note: candidate constructor not viable: no known conversion from 'uint16_t' (aka 'unsigned short') to 'HIP_vector_type<float, 2> &&' for 1st argument
        474 |         HIP_vector_type(HIP_vector_type&&) = default;
            |         ^               ~~~~~~~~~~~~~~~~~
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_vector_types.h:466:9: note: candidate template ignored: requirement 'sizeof...(Us) == 2U' was not satisfied [with Us = <uint16_t>]
        466 |         HIP_vector_type(Us... xs) noexcept
            |         ^
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_vector_types.h:457:9: note: explicit constructor is not a candidate
        457 |         HIP_vector_type(U x_) noexcept
            |         ^
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bf16.h:574:80: note: passing argument to parameter 'a' here
        574 | __BF16_HOST_DEVICE_STATIC__ __hip_bfloat162 __float22bfloat162_rn(const float2 a) {
            |                                                                                ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:402:12: error: use of undeclared identifier '__uint2bfloat16_rn'; did you mean '__uint2float_rn'?
        402 |     return __uint2bfloat16_rn(val);
            |            ^~~~~~~~~~~~~~~~~~
            |            __uint2float_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_device_functions.h:598:32: note: '__uint2float_rn' declared here
        598 | __device__ static inline float __uint2float_rn(unsigned int x) { return (float)x; }
            |                                ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:416:12: error: use of undeclared identifier '__float2bfloat162_rn'; did you mean '__float22bfloat162_rn'?
        416 |     return __float2bfloat162_rn(val);
            |            ^~~~~~~~~~~~~~~~~~~~
            |            __float22bfloat162_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_bf16.h:574:45: note: 
            |            __float2int_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_device_functions.h:473:30: note: '__float2int_rn' declared here
        473 | __device__ static inline int __float2int_rn(float x) { return (int)__ocml_rint_f32(x); }
            |                              ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:556:12: error: use of undeclared identifier '__bfloat162ull_rn'; did you mean '__float2ull_rn'?
        556 |     return __bfloat162ull_rn(val);
            |            ^~~~~~~~~~~~~~~~~
            |            __float2ull_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_device_functions.h:502:49: note: '__float2ull_rn' declared here
        502 | __device__ static inline unsigned long long int __float2ull_rn(float x) {
            |                                                 ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:583:12: error: use of undeclared identifier '__bfloat162uint_rn'; did you mean '__float2uint_rn'?
        583 |     return __bfloat162uint_rn(val);
            |            ^~~~~~~~~~~~~~~~~~
            |            __float2uint_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_device_functions.h:491:39: note: '__float2uint_rn' declared here
        491 | __device__ static inline unsigned int __float2uint_rn(float x) {
            |                                       ^
      In file included from deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip:10:
      /lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes/conversion_utils_hip.h:610:12: error: use of undeclared identifier '__bfloat162uint_rn'; did you mean '__float2uint_rn'?
        610 |     return __bfloat162uint_rn(val);
            |            ^~~~~~~~~~~~~~~~~~
            |            __float2uint_rn
      /opt/rocm-6.3.3/lib/llvm/bin/../../../include/hip/amd_detail/amd_device_functions.h:491:39: note: '__float2uint_rn' declared here
        491 | __device__ static inline unsigned int __float2uint_rn(float x) {
            |                                       ^
      fatal error: too many errors emitted, stopping now [-ferror-limit=]
      20 errors generated when compiling for gfx942.
      failed to execute:/opt/rocm-6.3.3/lib/llvm/bin/clang++  --offload-arch=gfx942  -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/core_ops/bias_activations -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/core_ops/blas_kernels -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/core_ops/cuda_layer_norm -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/core_ops/cuda_rms_norm -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/core_ops/gated_activations -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/core_ops/cuda_linear -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/DeepSpeed/deepspeed/inference/v2/kernels/includes -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.3.3/include -I/lustre/hpe/ws13/ws13.a/ws/hpchpate-ds_benchmarks/venv-deepspeed/include -I/opt/cray/pe/python/3.11.7/include/python3.11 -c -x hip deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.hip -o "/tmp/tmpvj8fusoa.build-temp/deepspeed/inference/v2/kernels/core_ops/bias_activations/bias_activation_hip.o" -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -U__HIP_NO_HALF_OPERATORS__ -U__HIP_NO_HALF_CONVERSIONS__ -U__HIP_NO_HALF2_OPERATORS__ -DROCM_VERSION_MAJOR=6 -DROCM_VERSION_MINOR=3 -DBF16_AVAILABLE -DROCM_WAVEFRONT_SIZE=64 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" 
-DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -DTORCH_EXTENSION_NAME=kernelsinference_core_ops -D_GLIBCXX_USE_CXX11_ABI=1 -fno-gpu-rdc
      error: command '/opt/rocm-6.3.3/bin/hipcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building editable for deepspeed
Failed to build deepspeed

[notice] A new release of pip is available: 25.3 -> 26.0.1
[notice] To update, run: pip install --upgrade pip
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> deepspeed

```
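Most of the output above is deprecation warnings, so it can help to pull out only the distinct identifiers the compiler rejected. The following is a minimal sketch, assuming the build output was saved to a file first (for example with `pip install --no-build-isolation -e . 2>&1 | tee build.log`). The file name `build.log` and the function name `undeclared_identifiers` are hypothetical and not part of DeepSpeed or ROCm.

```shell
# List each distinct identifier that clang reported as undeclared
# in a saved build log (one name per line, sorted, duplicates removed).
undeclared_identifiers() {
    local log="$1"
    # Keep only the "use of undeclared identifier '<name>'" fragments,
    # then strip everything except <name>.
    grep -o "use of undeclared identifier '[^']*'" "$log" \
        | sed "s/.*identifier '\(.*\)'/\1/" \
        | sort -u
}
```

Calling `undeclared_identifiers build.log` on a log like the one above should list names such as `__ll2bfloat16_rn` and `__int2bfloat16_rn`, which all appear to be bfloat16 conversion intrinsics.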
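The errors suggest that several bfloat16 conversion intrinsics used in `conversion_utils_hip.h` (for example `__ll2bfloat16_rn`, `__int2bfloat16_rn`, `__short2bfloat16_rn` and `__bfloat162uint_rn`) are not declared in the HIP headers shipped with ROCm 6.3.3, possibly because HIP uses a different naming convention for them. Could you please look into this and suggest a possible solution?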
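One way to confirm this is to check whether the header named in the compiler notes mentions those identifiers at all. This is a rough sketch only: the function name `check_intrinsics` is hypothetical, and a plain text search cannot tell a real declaration from a comment or a disabled `#if` branch.

```shell
# Report which identifiers are mentioned anywhere in a header file.
# Usage: check_intrinsics <header> <name>...
# Returns 0 if all names were found, 1 if any is missing, 2 if the header is unreadable.
check_intrinsics() {
    local header="$1"
    shift
    if [ ! -r "$header" ]; then
        echo "cannot read $header" >&2
        return 2
    fi
    local name missing=0
    for name in "$@"; do
        # -w matches whole identifiers only, -F treats the name as a literal string
        if grep -qwF -- "$name" "$header"; then
            echo "found    $name"
        else
            echo "missing  $name"
            missing=1
        fi
    done
    return "$missing"
}
```

For the setup in this report, the call would look like `check_intrinsics /opt/rocm-6.3.3/include/hip/amd_detail/amd_hip_bf16.h __ll2bfloat16_rn __int2bfloat16_rn __float22bfloat162_rn`, using the header path from the compiler notes above.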
