Update interfaces to CUB 1.16 (#192)

* test_hipcub_device_radix_sort.cpp: Correctly test -NaN.
* `test_utils::native_half` -NaN to `float` fix
* `hipcub::WarpExchange` interface to `::rocprim::warp_exchange`
* Fix after review
* Default CUDA architecture is 53 to fix __half
* Apply 1 suggestion(s) to 1 file(s)
* Added NVGPU_TARGETS to gitlab-ci
* Update .gitlab-ci.yml file
* Changes from [PR346](NVIDIA/cub#346)
* Add deprecation warnings.
* Update of deprecated statement.
* Adding constants from [PR418](NVIDIA/cub#418).
* Fix deprecation warnings.
* Fix forgotten deprecation warnings.
* Fix deprecation warnings.
* Fix deprecation warnings for nvcc.
* Replace '__host__ __device__' with 'HIPCUB_HOST_DEVICE'
* Added CUDA standard
* Bumped referenced CUB and thrust version to 1.16
* Download thrust in test/extra
* Added the interface for UniqueByKey
* Added test for UniqueByKey
* Added benchmark for UniqueByKey
* Add UniqueByKey interface
* Fix alignment of UniqueByKey parameters
* Use 'unsigned int' instead of a one-element vector for selected_count_output in UniqueByKey benchmark
* Update interface
* Update tests, add test for int64_t size
* Update CUB interface
* Apply 1 suggestion(s) to 1 file(s)
* Add interfaces for subtract
* Ignore deprecation warnings from rocPRIM for flags API
* Add deprecation warnings for Flags API
* Ignore deprecation warnings for Flags API tests
* Fix Subtract interfaces
* Fix SubtractRightPartial not using the right method
* Add benchmark for AdjacentDifference (Subtract)
* Add test for AdjacentDifference (Subtract)
* Use 'HIPCUB_HOST_DEVICE' macro
* Fix a typo
* Fix interfaces of Subtract not matching the CUB one
* Update the tests and benchmarks to the fixed interfaces of Subtract
* Fix to use temp_storage_ in subtract call
* Fix the tests of Subtract to work with the CUB interfaces
* Add the macros to ignore warnings in config.hpp and remove them from the block_adjacent_difference file and from the tests
* Device adjacent difference CUB backend
* New thread operators [skip ci]
* Test device adjacent difference [skip ci]
* Device adjacent difference rocPRIM backend
* Added new headers to the hipcub.hpp-s
* Benchmark for device adjacent difference
* Added missing thread operators
* Updated changelog for CUB 1.16
* Updating changelog for hipCUB 1.16 in next release

Co-authored-by: Vince <vince@streamhpc.com>
Co-authored-by: Gergely Mészáros <gergely@streamhpc.com>
Co-authored-by: Théo Battrel <theo@streamhpc.com>
Co-authored-by: Balint Soproni <balint@streamhpc.com>
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
ROCm · Apr 12, 2022 · 488b6c8
1 parent f8839ad

Showing 36 changed files with 2,791 additions and 252 deletions.
diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
@@ -206,6 +206,7 @@ test:rocm_install:
 .nvcc:
   extends:
     - .deps:nvcc
+    - .gpus:nvcc-gpus
     - .deps:cmake-minimum
   before_script:
     - !reference [".deps:nvcc", before_script]
@@ -220,6 +221,7 @@ build:nvcc:
       -D CMAKE_BUILD_TYPE=Release
       -D BUILD_TEST=ON
       -D BUILD_EXAMPLE=ON
+      -D NVGPU_TARGETS="$GPU_TARGETS"
       -B build
       -S .
     - cmake --build build
@@ -251,6 +253,7 @@ build:nvcc-benchmark:
       -D BUILD_BENCHMARK=ON
       -D CMAKE_CXX_COMPILER=g++-8
       -D CMAKE_C_COMPILER=gcc-8
+      -D NVGPU_TARGETS="$GPU_TARGETS"
       -B build
       -S .
     - cmake --build build

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,10 +1,21 @@
 # Change Log for hipCUB
 
 See README.md on how to build the hipCUB documentation using Doxygen.
-## (Unreleased) hipCUB-2.11.1 for ROCm 5.2.0
+
+## (Unreleased) hipCUB-2.12.0 for ROCm 5.2.0
 ### Added
-- Packages for tests and benchmark executable on all supported OSes using CPack.
-## (Unreleased) hipCUB-2.11.0 for ROCm 5.1.0
+- UniqueByKey device algorithm
+- SubtractLeft, SubtractLeftPartialTile, SubtractRight, SubtractRightPartialTile overloads in BlockAdjacentDifference.
+  - The old overloads (FlagHeads, FlagTails, FlagHeadsAndTails) are deprecated.
+- DeviceAdjacentDifference algorithm.
+### Changed
+- Deprecated type traits defined in util_type.hpp. Use the standard library equivalents instead.
+- CUB backend references CUB and thrust version 1.16.0.
+- DeviceRadixSort's num_items parameter's type is now templated instead of being an int.
+  - If an integral type with a size of at most 4 bytes is passed (e.g. an int), the former logic applies.
+  - Otherwise the algorithm uses a larger indexing type that makes it possible to sort input data over 2**32 elements.
+
+## (Released) hipCUB-2.11.0 for ROCm 5.1.0
 ### Added
 - Device segmented sort
 - Warp merge sort, WarpMask and thread sort from cub 1.15.0 supported in hipCUB
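The changelog adds a UniqueByKey device algorithm. As we read the CUB 1.16 documentation for `DeviceSelect::UniqueByKey`, it keeps only the first key and its paired value from every run of consecutive equal keys and reports how many pairs were selected. The following is a minimal host-side sketch of that behavior. It assumes `std::vector` inputs and uses `operator==` for key equality. `unique_by_key_reference` is a hypothetical helper, not part of hipCUB.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the UniqueByKey semantics (not the hipCUB API):
// for every run of consecutive equal keys, keep the first key and its value.
// Returns the number of selected pairs (the role of num_selected_out).
template <typename Key, typename Value>
std::size_t unique_by_key_reference(const std::vector<Key>& keys_in,
                                    const std::vector<Value>& values_in,
                                    std::vector<Key>& keys_out,
                                    std::vector<Value>& values_out)
{
    // Keys and values are paired element-wise, so the sizes must match.
    assert(keys_in.size() == values_in.size());
    keys_out.clear();
    values_out.clear();
    for (std::size_t i = 0; i < keys_in.size(); ++i)
    {
        // An item starts a new run if it is the first item or its key
        // differs from the key immediately before it.
        if (i == 0 || !(keys_in[i] == keys_in[i - 1]))
        {
            keys_out.push_back(keys_in[i]);
            values_out.push_back(values_in[i]);
        }
    }
    return keys_out.size();
}
```

Only consecutive duplicates are collapsed. For keys `{0, 0, 1, 1, 1, 2, 0}` with values `{10, 11, 12, 13, 14, 15, 16}`, the sketch selects 4 pairs: keys `{0, 1, 2, 0}` and values `{10, 12, 15, 16}`. The trailing key `0` survives because it starts a new run.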

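The changelog entry for DeviceRadixSort describes a compile-time choice. A `num_items` type of at most 4 bytes keeps the former indexing, and a wider type switches to a larger index so inputs over 2**32 elements can be sorted. The following sketch shows that kind of dispatch with `std::conditional`. The names `offset_for` and `offset_for_t` are hypothetical. The concrete types (`std::uint32_t` and `std::uint64_t`) are assumptions, not the types hipCUB or rocPRIM necessarily use internally.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: pick an indexing type from the type of num_items.
// Types of at most 4 bytes keep a 32-bit offset (standing in for the former logic).
// Wider types get a 64-bit offset so more than 2^32 items can be addressed.
template <typename NumItemsT>
struct offset_for
{
    static_assert(std::is_integral<NumItemsT>::value,
                  "num_items must be an integral type");
    using type = typename std::conditional<(sizeof(NumItemsT) <= 4),
                                           std::uint32_t,
                                           std::uint64_t>::type;
};

template <typename NumItemsT>
using offset_for_t = typename offset_for<NumItemsT>::type;
```

Resolving the offset type at compile time keeps the existing 32-bit path for `int` callers. Only callers that pass a 64-bit count pay for wider index arithmetic.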
diff --git a/benchmark/CMakeLists.txt b/benchmark/CMakeLists.txt
@@ -68,6 +68,7 @@ endfunction()
 # ****************************************************************************
 # Benchmarks
 # ****************************************************************************
+add_hipcub_benchmark(benchmark_block_adjacent_difference.cpp)
 add_hipcub_benchmark(benchmark_block_discontinuity.cpp)
 add_hipcub_benchmark(benchmark_block_exchange.cpp)
 add_hipcub_benchmark(benchmark_block_histogram.cpp)
@@ -76,6 +77,7 @@ add_hipcub_benchmark(benchmark_block_radix_sort.cpp)
 add_hipcub_benchmark(benchmark_block_reduce.cpp)
 add_hipcub_benchmark(benchmark_block_run_length_decode.cpp)
 add_hipcub_benchmark(benchmark_block_scan.cpp)
+add_hipcub_benchmark(benchmark_device_adjacent_difference.cpp)
 add_hipcub_benchmark(benchmark_device_histogram.cpp)
 add_hipcub_benchmark(benchmark_device_partition.cpp)
 add_hipcub_benchmark(benchmark_device_radix_sort.cpp)
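The new adjacent-difference benchmarks exercise the DeviceAdjacentDifference algorithm and the SubtractLeft/SubtractRight overloads listed in the changelog. As we understand the CUB 1.16 documentation, SubtractLeft copies the first item and writes `op(in[i], in[i - 1])` for every later item. SubtractRight writes `op(in[i], in[i + 1])` and copies the last item. The following is a minimal host-side sketch of those semantics. It assumes subtraction (`std::minus`) as the default operator and a preallocated output, similar to the "Copy" variants. The `*_reference` helpers are hypothetical and are not hipCUB's API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical reference for SubtractLeft semantics (not the hipCUB API):
// out[0] = in[0]; out[i] = op(in[i], in[i - 1]) for i >= 1.
template <typename T, typename Op = std::minus<T>>
void subtract_left_copy_reference(const std::vector<T>& in,
                                  std::vector<T>& out,
                                  Op op = Op{})
{
    assert(out.size() == in.size()); // output must be preallocated
    if (in.empty()) return;
    out[0] = in[0];
    for (std::size_t i = 1; i < in.size(); ++i)
        out[i] = op(in[i], in[i - 1]);
}

// Hypothetical reference for SubtractRight semantics (not the hipCUB API):
// out[i] = op(in[i], in[i + 1]) for i < n - 1; out[n - 1] = in[n - 1].
template <typename T, typename Op = std::minus<T>>
void subtract_right_copy_reference(const std::vector<T>& in,
                                   std::vector<T>& out,
                                   Op op = Op{})
{
    assert(out.size() == in.size()); // output must be preallocated
    if (in.empty()) return;
    const std::size_t n = in.size();
    for (std::size_t i = 0; i + 1 < n; ++i)
        out[i] = op(in[i], in[i + 1]);
    out[n - 1] = in[n - 1];
}
```

For the input `{1, 4, 9, 16}`, the left variant yields `{1, 3, 5, 7}` and the right variant yields `{-3, -5, -7, 16}`. Each output item depends only on its neighbor, so a device implementation can compute all items independently. That independence is what the new benchmarks measure.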