Add `cuda::ptx::cp_reduce_async_bulk` #1445

ahendriksen · 2024-02-27T17:14:40Z

Description

closes #1444

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx.h

1. Add the `#ifdef`.
2. Add min, max support for f16 and bf16 (I overlooked this initially).

miscco · 2024-02-28T16:21:05Z

libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx.h

+}
+#endif // __cccl_ptx_isa >= 800
+
+#ifdef _LIBCUDACXX_HAS_NVF16


Note: the PR that introduces this macro has not been merged yet, so the guarded code will always be disabled until we merge #1140.

Okay. This probably caused the tests to fail, so I have guarded the tests on this macro as well.

I am okay with the `__half` and bfloat16 variants not being available immediately. I have tested the generated PTX offline, so I know it works.

ahendriksen requested review from a team as code owners February 27, 2024 17:14

ahendriksen requested review from miscco and griwes February 27, 2024 17:14

miscco approved these changes Feb 28, 2024

miscco reviewed Feb 28, 2024

libcudacxx/include/cuda/std/detail/libcxx/include/__cuda/ptx.h

ahendriksen added 2 commits February 28, 2024 17:01

Add cuda::ptx::cp_reduce_async_bulk

34b639c

Fix f16 and bf16 support

312c5b6

ahendriksen force-pushed the add-ptx-cp-reduce-async-bulk branch from 44aa1af to 312c5b6 Compare February 28, 2024 16:05

miscco reviewed Feb 28, 2024

cp.reduce.async.bulk: guard {b}f16 tests

d17a6c8

miscco enabled auto-merge (squash) February 29, 2024 07:15

auto-merge was automatically disabled February 29, 2024 07:25
Pull Request is not mergeable

miscco merged commit 4495154 into NVIDIA:main Mar 1, 2024
562 checks passed
