Update interfaces to CUB 1.16 (#192)
* test_hipcub_device_radix_sort.cpp Correctly test -NaN.

* `test_utils::native_half` -NaN to `float` fix

* `hipcub::WarpExchange` interface to `::rocprim::warp_exchange`

* Fix after review

* Default CUDA architecture is 53 to fix __half

* Apply 1 suggestion(s) to 1 file(s)

* Added NVGPU_TARGETS to gitlab-ci

* Update .gitlab-ci.yml file

* Changes from [PR346](NVIDIA/cub#346)

* Add deprecation warnings.

* Update of deprecated statement.

* Adding constants from [PR418](NVIDIA/cub#418).

* Fix deprecation warnings.

* Fix a forgotten deprecation warning.

* Fix deprecation warnings.

* Fix deprecation warnings for nvcc.

* Replace '__host__ __device__' by 'HIPCUB_HOST_DEVICE'

* Added CUDA standard

* Bumped referenced CUB and thrust version to 1.16

* Download thrust in test/extra

* Added the interface for UniqueByKey

* Added test for UniqueByKey

* Added benchmark for UniqueByKey

* Add UniqueByKey interface

* Fix alignment of UniqueByKey parameters

* Use 'unsigned int' instead of a one element vector for selected_count_output in UniqueByKey benchmark

* Update interface

* Update tests, add test for int64_t size

* Update CUB interface

* Apply 1 suggestion(s) to 1 file(s)

* Add interfaces for subtract

* Ignore deprecation warnings from rocPRIM for flags API

* Add deprecation warnings for Flags API

* Ignore deprecation warnings for Flags API tests

* Fix Subtract interfaces

* Fix SubtractRightPartial not using the right method

* Add benchmark for AdjacentDifference (Subtract)

* Add test for AdjacentDifference (Subtract)

* Use 'HIPCUB_HOST_DEVICE' macro

* Fix a typo

* Fix interfaces of Subtract not matching the CUB one

* Update the tests and benchmarks to the fixed interfaces of Subtract

* Fix to use temp_storage_ in subtract call

* Fix the tests of Subtract to work with the CUB interfaces

* Add the macros to ignore warnings in config.hpp and remove them from the block_adjacent_difference file and from the tests

* Device adjacent difference CUB backend

* New thread operators [skip ci]

* Test device adjacent difference [skip ci]

* Device adjacent difference rocPRIM backend

* Added new headers to the hipcub.hpp-s

* Benchmark for device adjacent difference

* Added missing thread operators

* Updated changelog for CUB 1.16

* Updating changelog for hipCUB 1.16 in next release

Co-authored-by: Vince <vince@streamhpc.com>
Co-authored-by: Gergely Mészáros <gergely@streamhpc.com>
Co-authored-by: Théo Battrel <theo@streamhpc.com>
Co-authored-by: Balint Soproni <balint@streamhpc.com>
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
6 people authored Apr 12, 2022
1 parent f8839ad commit 488b6c8
Showing 36 changed files with 2,791 additions and 252 deletions.
3 changes: 3 additions & 0 deletions .gitlab-ci.yml
@@ -206,6 +206,7 @@ test:rocm_install:
.nvcc:
extends:
- .deps:nvcc
- .gpus:nvcc-gpus
- .deps:cmake-minimum
before_script:
- !reference [".deps:nvcc", before_script]
@@ -220,6 +221,7 @@ build:nvcc:
-D CMAKE_BUILD_TYPE=Release
-D BUILD_TEST=ON
-D BUILD_EXAMPLE=ON
-D NVGPU_TARGETS="$GPU_TARGETS"
-B build
-S .
- cmake --build build
@@ -251,6 +253,7 @@ build:nvcc-benchmark:
-D BUILD_BENCHMARK=ON
-D CMAKE_CXX_COMPILER=g++-8
-D CMAKE_C_COMPILER=gcc-8
-D NVGPU_TARGETS="$GPU_TARGETS"
-B build
-S .
- cmake --build build
17 changes: 14 additions & 3 deletions CHANGELOG.md
@@ -1,10 +1,21 @@
# Change Log for hipCUB

See README.md on how to build the hipCUB documentation using Doxygen.
## (Unreleased) hipCUB-2.11.1 for ROCm 5.2.0

## (Unreleased) hipCUB-2.12.0 for ROCm 5.2.0
### Added
- Packages for tests and benchmark executable on all supported OSes using CPack.
## (Unreleased) hipCUB-2.11.0 for ROCm 5.1.0
- UniqueByKey device algorithm
- SubtractLeft, SubtractLeftPartialTile, SubtractRight, SubtractRightPartialTile overloads in BlockAdjacentDifference.
- The old overloads (FlagHeads, FlagTails, FlagHeadsAndTails) are deprecated.
- DeviceAdjacentDifference algorithm.
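
To make the new `SubtractLeft` / `SubtractRight` entries concrete, here is a minimal host-side reference sketch of the element-wise semantics, assuming the CUB convention that the first item (left variant) or last item (right variant) is copied through unchanged when it has no neighbour. The function names `subtract_left_reference` and `subtract_right_reference` are hypothetical and are not part of hipCUB; the real block and device algorithms run on the GPU with different signatures.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical CPU reference (not a hipCUB API):
// out[i] = op(in[i], in[i - 1]); the first item has no left neighbour
// and is copied unchanged.
template <typename T, typename DifferenceOp = std::minus<T>>
std::vector<T> subtract_left_reference(const std::vector<T>& in,
                                       DifferenceOp op = DifferenceOp{})
{
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        out[i] = (i == 0) ? in[i] : op(in[i], in[i - 1]);
    }
    return out;
}

// Hypothetical CPU reference (not a hipCUB API):
// out[i] = op(in[i], in[i + 1]); the last item has no right neighbour
// and is copied unchanged.
template <typename T, typename DifferenceOp = std::minus<T>>
std::vector<T> subtract_right_reference(const std::vector<T>& in,
                                        DifferenceOp op = DifferenceOp{})
{
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        out[i] = (i + 1 == in.size()) ? in[i] : op(in[i], in[i + 1]);
    }
    return out;
}
```

A reference of this kind is the sort of thing a test can compare device output against; whether the tests in this commit are structured that way is not shown in the excerpt.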
### Changed
- Obsoleted type traits defined in util_type.hpp. Use the standard library equivalents instead.
- CUB backend references CUB and thrust version 1.16.0.
- DeviceRadixSort's num_items parameter's type is now templated instead of being an int.
- If an integral type with a size at most 4 bytes is passed (i.e. an int), the former logic applies.
- Otherwise the algorithm uses a larger indexing type that makes it possible to sort input data over 2**32 elements.
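
The index-type rule described for `DeviceRadixSort` can be sketched at compile time as follows. This is only an illustration under stated assumptions: the alias name `radix_sort_offset_t` is hypothetical, and the concrete types chosen here (`unsigned int` for the narrow case, `unsigned long long` for the wide case) are assumptions, since the changelog only says that the former logic applies to types of at most 4 bytes and that a larger indexing type is used otherwise.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical alias (not a hipCUB or CUB name): choose the internal
// indexing type from the type the caller passed as num_items.
//   - at most 4 bytes: keep a 32-bit index (assumed to be the former logic)
//   - otherwise:       switch to a 64-bit index
template <typename NumItemsT>
using radix_sort_offset_t =
    std::conditional_t<sizeof(NumItemsT) <= 4, unsigned int, unsigned long long>;

// The two cases named in the changelog entry.
static_assert(std::is_same<radix_sort_offset_t<int>, unsigned int>::value,
              "int keeps the 32-bit path");
static_assert(std::is_same<radix_sort_offset_t<std::int64_t>, unsigned long long>::value,
              "64-bit sizes select the wider index");
```

Dispatching on the size of the argument type rather than on its run-time value keeps existing callers that pass an `int` on the narrower path without any source change.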

## (Released) hipCUB-2.11.0 for ROCm 5.1.0
### Added
- Device segmented sort
- Warp merge sort, WarpMask and thread sort from cub 1.15.0 supported in hipCUB
2 changes: 2 additions & 0 deletions benchmark/CMakeLists.txt
@@ -68,6 +68,7 @@ endfunction()
# ****************************************************************************
# Benchmarks
# ****************************************************************************
add_hipcub_benchmark(benchmark_block_adjacent_difference.cpp)
add_hipcub_benchmark(benchmark_block_discontinuity.cpp)
add_hipcub_benchmark(benchmark_block_exchange.cpp)
add_hipcub_benchmark(benchmark_block_histogram.cpp)
@@ -76,6 +77,7 @@ add_hipcub_benchmark(benchmark_block_radix_sort.cpp)
add_hipcub_benchmark(benchmark_block_reduce.cpp)
add_hipcub_benchmark(benchmark_block_run_length_decode.cpp)
add_hipcub_benchmark(benchmark_block_scan.cpp)
add_hipcub_benchmark(benchmark_device_adjacent_difference.cpp)
add_hipcub_benchmark(benchmark_device_histogram.cpp)
add_hipcub_benchmark(benchmark_device_partition.cpp)
add_hipcub_benchmark(benchmark_device_radix_sort.cpp)