[ENHANCEMENT]: Add count_if and retrieve_if APIs to static_multiset #800
Open
Labels
helps: rapids, type: feature request
Description
Is your feature request related to a problem? Please describe.
static_multiset currently has insert_if and contains_if with stencil/predicate support, but the count, count_outer, retrieve, and retrieve_outer APIs lack corresponding _if variants.
In cuDF's hash join, we want to use a bloom filter to pre-filter probe rows before counting/retrieving matches. The bloom filter produces a per-row boolean predicate. With count_if / retrieve_if, we could skip probe rows that the bloom filter rejects, avoiding unnecessary hash table lookups.
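To make the requested behavior concrete, here is a minimal host-side reference model. It is only a sketch: it uses `std::unordered_multiset` rather than cuco, and the names `reference_count_if` and `reference_retrieve_if` are hypothetical, not existing or proposed cuco functions. It assumes that a probe row rejected by the predicate contributes 0 matches to an inner count and 1 to an outer count, the same as a probe row with no match.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Host-side reference model only (hypothetical helper, not the cuco API).
// Counts matches for probe keys whose stencil entry satisfies pred.
// With outer == true, a probe key that is rejected by pred or has no match
// contributes 1, mirroring left/full join semantics.
template <typename Key, typename Stencil, typename Predicate>
std::size_t reference_count_if(std::unordered_multiset<Key> const& set,
                               std::vector<Key> const& probe,
                               std::vector<Stencil> const& stencil,
                               Predicate pred,
                               bool outer = false)
{
  std::size_t total = 0;
  for (std::size_t i = 0; i < probe.size(); ++i) {
    // A rejected row never touches the container: this is the lookup we want to skip.
    std::size_t const matches = pred(stencil[i]) ? set.count(probe[i]) : 0;
    total += (outer && matches == 0) ? 1 : matches;
  }
  return total;
}

// Host-side reference model only (hypothetical helper, not the cuco API).
// Returns (probe key, matched key) pairs for probe keys accepted by pred.
template <typename Key, typename Stencil, typename Predicate>
std::vector<std::pair<Key, Key>> reference_retrieve_if(
  std::unordered_multiset<Key> const& set,
  std::vector<Key> const& probe,
  std::vector<Stencil> const& stencil,
  Predicate pred)
{
  std::vector<std::pair<Key, Key>> out;
  for (std::size_t i = 0; i < probe.size(); ++i) {
    if (!pred(stencil[i])) { continue; }  // skipped row: no lookup, no output
    auto const range = set.equal_range(probe[i]);
    for (auto it = range.first; it != range.second; ++it) {
      out.emplace_back(probe[i], *it);
    }
  }
  return out;
}
```

In the hash join use case, the stencil would be the per-row result of the bloom filter and the predicate would simply test that flag.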
Describe the solution you'd like
Proposed API (following the existing insert_if / contains_if pattern):
// Count matches only for probe keys where pred(*(stencil + i)) is true.
// Keys where the predicate is false contribute 0 to the count (inner)
// or 1 (outer, for left/full join semantics).
size_type count_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);
size_type count_outer_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);
// Retrieve matches only for probe keys where pred(*(stencil + i)) is true.
retrieve_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);
retrieve_outer_if(InputIt first, InputIt last, StencilIt stencil, Predicate pred, ...);
Describe alternatives you've considered
No response
Additional context
No response