Migrate cuco HLL #6666
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
pre-commit.ci autofix

/ok to test 6862c49
/ok to test 2fe8809
fbusato left a comment:

Just started with one file. Please propagate the suggestions and refine the implementation. After that, I will review the other files.
Resolved review threads (outdated):
- cudax/include/cuda/experimental/__cuco/detail/hyperloglog/finalizer.cuh (×3)
- cudax/include/cuda/experimental/__cuco/__hyperloglog/finalizer.cuh
Reviewed code:

```
_CCCL_API constexpr _Finalizer(int __precision_)
    : __precision{__precision_}
    , __m{1 << __precision_}
{}
```

add an assertion to check the precision range

Follow-up: the assertion is still missing
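For context, a minimal host-compilable sketch of what the requested range check could look like, using a plain `assert` in place of CCCL's `_CCCL_ASSERT`. The lower bound of 4 matches a check that appears elsewhere in this PR; the upper bound of 18 is an illustrative assumption, not taken from the PR.

```cpp
#include <cassert>

// Illustrative stand-in for the _Finalizer constructor discussed above.
// A plain assert replaces _CCCL_ASSERT; the 4..18 range is assumed.
struct Finalizer
{
  constexpr Finalizer(int precision)
      : precision_{precision}
      , m_{1 << precision}
  {
    assert(precision >= 4 && precision <= 18 && "precision out of range");
  }

  int precision_;
  int m_; // number of HLL registers: 2^precision
};
```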
Five further resolved review threads (outdated) on cudax/include/cuda/experimental/__cuco/detail/hyperloglog/finalizer.cuh
- uses cuda::contiguous_iterator instead of thrust::*
- uses `_CCCL_TRY_CUDA_API` instead of depending on `stf::cuda_safe_call`
- adds missing include thrust/raw_pointer_cast.h
Resolved review threads (outdated):
- cudax/include/cuda/experimental/__cuco/__hyperloglog/hyperloglog_impl.cuh (×6)
- cudax/include/cuda/experimental/__cuco/__hyperloglog/tuning.cuh (×2)
- cudax/include/cuda/experimental/__cuco/__hyperloglog/kernels.cuh (×2)
/ok to test 0e46dcd
Resolved review threads (outdated):
- cudax/include/cuda/experimental/__cuco/__hyperloglog/kernels.cuh (×3)
- cudax/include/cuda/experimental/__cuco/__hyperloglog/hyperloglog_impl.cuh (×4)
Resolved review thread (outdated): cudax/include/cuda/experimental/__cuco/__utility/strong_type.cuh
FYI there's a bugfix PR in cuco to match Spark's HLL behavior. We should also integrate it into this PR: NVIDIA/cuCollections#792
/ok to test 76c8417

/ok to test 5153b0b
Resolved review threads (outdated):
- cudax/include/cuda/experimental/__cuco/__hyperloglog/finalizer.cuh (×2)
- cudax/include/cuda/experimental/__cuco/__hyperloglog/hyperloglog_impl.cuh (×2)
Reviewed code:

```
//! @param hash The hash function used to hash items
_CCCL_API constexpr _HyperLogLog_Impl(::cuda::std::span<::cuda::std::byte> sketch_span, const _Hash& hash)
    : __hash{hash}
    , __precision{::cuda::std::countr_zero(
```

I don't see the assertion
Reviewed code:

```
//! @brief Adds an item to the estimator.
//!
//! @param item The item to be counted
_CCCL_DEVICE constexpr void __add(const _Tp& __item) noexcept
```

I still see a mix of them
Resolved review threads (outdated):
- cudax/include/cuda/experimental/__cuco/__hyperloglog/hyperloglog_impl.cuh
- cudax/include/cuda/experimental/__cuco/__hyperloglog/kernels.cuh (×2)
Resolved review threads on cudax/include/cuda/experimental/__cuco/__hyperloglog/hyperloglog_impl.cuh (×6, five outdated)
Reviewed code:

```
int __device = -1;
_CCCL_TRY_CUDA_API(::cudaGetDevice, "cudaGetDevice failed", &__device);
```

@davebayer I believe there are some new cudart functions for that?
Reviewed code:

```
#ifndef CUDAX_CUCO_HLL_TUNING_ARR_DECL
#  define CUDAX_CUCO_HLL_TUNING_ARR_DECL __device__ static constexpr ::cuda::std::array
#endif
```

I believe this should just be a `_CCCL_DEVICE static inline constexpr float __meow[] = {...}`.

Note that AFAIK this will fail for Windows, because it treats floating-point constants differently, so there it would need to be `_CCCL_GLOBAL_CONSTANT`.
/ok to test 72b589f

/ok to test 507d169

😬 CI Workflow Results
🟥 Finished in 15m 34s: Pass: 92%/39 | Total: 2h 34m | Max: 14m 05s | Hits: 99%/18089
Reviewed code:

```
_CCCL_API constexpr _Finalizer(int __precision_)
    : __precision{__precision_}
    , __m{1 << __precision_}
{}
```

the assertion is still missing
Reviewed code:

```
//! @tparam _Scope The scope in which operations will be performed by individual threads
//! @tparam _Hash Hash function used to hash items
template <class _Tp, ::cuda::thread_scope _Scope, class _Hash>
class _HyperLogLog_Impl
```

what is the decision on the class name?
Reviewed code:

```
  _CCCL_THROW(::std::invalid_argument{"Sketch storage has insufficient alignment"});
}

if (__precision < 4)
```

we also need to check the upper bound, at least internally with _CCCL_ASSERT
Reviewed code:

```
#include <cuda/experimental/__cuco/hash_functions.cuh>
#include <cuda/experimental/memory_resource.cuh>

#include <cooperative_groups.h>
```

we need a final decision for CG
Reviewed code:

```
// https://github.com/apache/spark/blob/6a27789ad7d59cd133653a49be0bb49729542abe/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala#L43

auto const __precision_from_sd = static_cast<int>(
  ::cuda::std::ceil(2.0 * ::cuda::std::log(1.106 / __standard_deviation) / ::cuda::std::numbers::ln2));
```

is it not equivalent to log2(x)?
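For context (not part of the PR): since log(x)/ln 2 = log2(x), the quoted expression can indeed be rewritten with log2. A quick standalone check using `std::` instead of `::cuda::std::`; the standard-deviation values are illustrative.

```cpp
#include <cmath>

// The expression as written in the PR, with std:: substituted for cuda::std::.
int precision_from_sd(double sd)
{
  return static_cast<int>(std::ceil(2.0 * std::log(1.106 / sd) / std::log(2.0)));
}

// The log2 form the reviewer suggests: 2*log(x)/ln2 == 2*log2(x).
int precision_from_sd_log2(double sd)
{
  return static_cast<int>(std::ceil(2.0 * std::log2(1.106 / sd)));
}
```

Both forms agree for typical inputs, e.g. sd = 0.05 gives precision 9 either way.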
Reviewed code:

```
  __idx += __loop_stride;
}
// a single thread processes the remaining items
#if defined(CUDART_VERSION) && (CUDART_VERSION >= 12010)
```

could we use the _CCCL_CTK macro here?
Reviewed code:

```
{
  if (__other.__precision != __precision)
  {
    _CCCL_THROW(::std::invalid_argument{"Cannot merge estimators with different sketch sizes"});
```

we recently changed this to:

```diff
- _CCCL_THROW(::std::invalid_argument{"Cannot merge estimators with different sketch sizes"});
+ _CCCL_THROW(::std::invalid_argument, "Cannot merge estimators with different sketch sizes");
```
Reviewed code:

```
static constexpr auto __thread_scope = _Scope; ///< CUDA thread scope

template <::cuda::thread_scope _NewScope>
using with_scope = _HyperLogLog_Impl<_Tp, _NewScope, _Hash>; ///< Ref type with different thread scope
```

is it internal or external?

Reply: It's public-facing. The idea is to have an easy way for the user to switch to a different thread scope, e.g., when migrating an HLL sketch from global to shared memory. In this case, getting the new type is as simple as `using block_scope_type = device_scope_type::with_scope<cuda::thread_scope_block>;`.
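The rebinding pattern described in the reply can be sketched with a simplified, host-compilable stand-in for the CUDA thread-scope enum and the HLL class (the names below are illustrative, not the actual cudax types):

```cpp
#include <type_traits>

// Simplified stand-ins, just to illustrate the with_scope rebinding pattern.
enum class thread_scope { device, block, thread };

template <class T, thread_scope Scope>
struct HyperLogLogImpl
{
  static constexpr auto scope = Scope;

  // Rebind this estimator type to a different thread scope.
  template <thread_scope NewScope>
  using with_scope = HyperLogLogImpl<T, NewScope>;
};

using device_scope_type = HyperLogLogImpl<int, thread_scope::device>;
using block_scope_type  = device_scope_type::with_scope<thread_scope::block>;

static_assert(std::is_same_v<block_scope_type,
                             HyperLogLogImpl<int, thread_scope::block>>);
```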
Reviewed code:

```
friend struct _HyperLogLog_Impl;

public:
  static constexpr auto __thread_scope = _Scope; ///< CUDA thread scope
```

this is probably private
Description
closes #6506
Things to consider / TODO:
- `memory_resource` defaults to type `cuda::device_memory_pool_ref` instead of using the old allocator's. However, the user is always expected to pass a `memory_resource` object. The code could break when only the template parameter is passed and not the object.
- `sketch_kb` and `standard_deviation`: `sketch_kb` is a strong type aliased to `double` representing the size of the sketch (memory used for the data structure) in KB. Besides `sketch_kb`, there is also `standard_deviation`, a strong type aliased to `double`.