Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE]: Add multiset host-bulk retrieve APIs #465

Open
PointKernel opened this issue Apr 17, 2024 · 1 comment · May be fixed by #537
Open

[FEATURE]: Add multiset host-bulk retrieve APIs #465

PointKernel opened this issue Apr 17, 2024 · 1 comment · May be fixed by #537
Assignees
Labels
helps: rapids Helps or needed by RAPIDS In Progress Currently a work in progress topic: static_multiset Issue related to the static_multiset type: feature request New feature request

Comments

@PointKernel
Copy link
Member

PointKernel commented Apr 17, 2024

Is your feature request related to a problem? Please describe.

Add multiset host-bulk retrieve APIs

Describe the solution you'd like

The basic API to add:

  /**
   * @brief Retrieves the matched key in the multiset corresponding to all probe keys in the range
   * `[first, last)`
   *
   * For key `k = *(first + i)` has matches in the multiset, copies `k` and each of its associated values to
   * unspecified locations in `output_probe + *` and `output_match + *`. Else, does nothing.
   *
   * @note This function synchronizes the given stream. For asynchronous execution use
   * `retrieve_async`.
   * @note Behavior is undefined if the size of the output range exceeds
   * `std::distance(output_*_begin, output_*_end)`.
   *
   * @tparam InputIt Device accessible input iterator
   * @tparam OutputIt1 Device accessible output iterator whose `value_type` can be constructed from
   * probe key type
   * @tparam OutputIt2 Device accessible output iterator whose `value_type` can be constructed from
   * `Key`
   *
   * @param first Beginning of the sequence of probe keys
   * @param last End of the sequence of probe keys
   * @param output_probe Beginning of the sequence of the probe keys that have a match
   * @param output_match Beginning of the sequence of the matched keys
   * @param stream CUDA stream used to retrieve
   *
   * @return Pair of the iterators indicating the last elements in the output
   */
  template <typename InputIt, typename OutputIt1, typename OutputIt2>
  std::pair<OutputIt1, OutputIt2> retrieve(InputIt first,
                                           InputIt last,
                                           OutputIt1 output_probe,
                                           OutputIt2 output_match,
                                           cuda_stream_ref stream = {}) const;

Two APIs desired by libcudf:

  /**
   * @brief Asynchronously retrieves the matched key in the set corresponding to all probe keys in
   * the range `[first, last)`
   *
   * For key `k = *(first + i)` has matches in the multiset, copies `k` and each of its associated values to
   * unspecified locations in `output_probe + *` and `output_match + *`. Else, does nothing.
   *
   * @note Behavior is undefined if the size of the output range exceeds
   * `std::distance(output_*_begin, output_*_end)`.
   *
   * @tparam InputIt Device accessible input iterator
   * @tparam OutputIt1 Device accessible output iterator whose `value_type` can be constructed from
   * probe key type
   * @tparam OutputIt2 Device accessible output iterator whose `value_type` can be constructed from
   * `Key`
   * @tparam ProbeKeyEqual Binary callable equal type
   * @tparam ProbeHash Unary callable hasher type that can be constructed from
   * an integer value
   *
   * @param first Beginning of the sequence of probe keys
   * @param last End of the sequence of probe keys
   * @param probe_key_equal The binary function to compare set keys and probe keys for equality
   * @param probe_hash The unary function to hash probe keys
   * @param output_probe Beginning of the sequence of the probe keys that have a match
   * @param output_match Beginning of the sequence of the matched keys
   * @param stream CUDA stream used to retrieve
   *
   * @return Pair of the iterators indicating the last elements in the output
   */
  template <typename InputIt, typename OutputIt, typename ProbeKeyEqual, typename ProbeHash>
  std::pair<OutputIt1, OutputIt2>  retrieve(InputIt first,
                                            InputIt last,
                                            ProbeKeyEqual const& probe_key_equal,
                                            ProbeHash const& probe_hash,
                                            OutputIt1 output_probe,
                                            OutputIt2 output_match,
                                            cuda_stream_ref stream = {}) const;

  /**
   * @brief Asynchronously retrieves the matched key in the set corresponding to all probe keys in
   * the range `[first, last)`
   *
   * For key `k = *(first + i)` has matches in the multiset, copies `k` and each of its associated
   * values to unspecified locations in `output_probe + *` and `output_match + *`. Otherwise,
   * copies `key` and `empty_key_sentinel`.
   *
   * @note Behavior is undefined if the size of the output range exceeds
   * `std::distance(output_*_begin, output_*_end)`.
   *
   * @tparam InputIt Device accessible input iterator
   * @tparam OutputIt1 Device accessible output iterator whose `value_type` can be constructed from
   * probe key type
   * @tparam OutputIt2 Device accessible output iterator whose `value_type` can be constructed from
   * `Key`
   * @tparam ProbeKeyEqual Binary callable equal type
   * @tparam ProbeHash Unary callable hasher type that can be constructed from
   * an integer value
   *
   * @param first Beginning of the sequence of probe keys
   * @param last End of the sequence of probe keys
   * @param probe_key_equal The binary function to compare set keys and probe keys for equality
   * @param probe_hash The unary function to hash probe keys
   * @param output_probe Beginning of the sequence of the probe keys that have a match
   * @param output_match Beginning of the sequence of the matched keys
   * @param stream CUDA stream used to retrieve
   *
   * @return Pair of the iterators indicating the last elements in the output
   */
  template <typename InputIt, typename OutputIt, typename ProbeKeyEqual, typename ProbeHash>
  std::pair<OutputIt1, OutputIt2>  retrieve_outer(InputIt first,
                                                  InputIt last,
                                                  ProbeKeyEqual const& probe_key_equal,
                                                  ProbeHash const& probe_hash,
                                                  OutputIt1 output_probe,
                                                  OutputIt2 output_match,
                                                  cuda_stream_ref stream = {}) const;

Describe alternatives you've considered

  • Can the _outer variant be represented in a more natural expression?

Can we use cub::DeviceSelect in this case?

Additional context

  • retrieve APIs are always synchronous
  • The most challenging PR requires extensive profiling and benchmarking work: launching parameter (choosing optimal block size, shared memory sizes, etc), buffer size, probing/flushing CG sizes to tune
@PointKernel PointKernel added type: feature request New feature request helps: rapids Helps or needed by RAPIDS topic: static_multiset Issue related to the static_multiset labels Apr 17, 2024
@PointKernel PointKernel added this to the static_multiset milestone Apr 17, 2024
@PointKernel
Copy link
Member Author

The return pair should be std::pair instead of cuda::std::pair since they host APIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
helps: rapids Helps or needed by RAPIDS In Progress Currently a work in progress topic: static_multiset Issue related to the static_multiset type: feature request New feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants