changed is_partitioned impl to make it similar to pstl's is_partitioned #8130
dhruvilmehta wants to merge 4 commits into NVIDIA:main from
Thx for the PR! We need a benchmark that shows the benefits of the new approach. Can you please add the benchmark from #8084 to Thrust, measure the performance before and after your changes, and report the nvbench diff? Instructions for benchmarking are in our documentation: https://nvidia.github.io/cccl/unstable/cub/benchmarking.html
```cpp
auto result = thrust::transform_reduce(
  exec, first, last, detail::is_partitioned_unary_op<Predicate>{pred}, identity, detail::partition_binary_op());
```
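The definitions of `is_partitioned_unary_op` and `partition_binary_op` are not part of this excerpt, so the following is only a host-side C++17 sketch of how an `is_partitioned` can be phrased as a transform followed by a reduction, loosely modeled on the state-machine reduction pstl uses. The names `PartState`, `to_state`, `combine`, `reduce_chunk`, and `is_partitioned_reduce` are hypothetical. Each element maps to "all true" or "all false", and `combine` merges two adjacent ranges; the operator is associative but not commutative, so any parallel reduction using it must preserve the left-to-right order of the ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-range state for an order-preserving reduction.
enum class PartState { Empty, AllTrue, AllFalse, TrueFalse, Broken };

// Transform step: a single element is either "all true" or "all false".
template <class T, class Pred>
PartState to_state(const T& x, Pred pred) {
  return pred(x) ? PartState::AllTrue : PartState::AllFalse;
}

// Reduce step: `lhs` describes the range immediately before `rhs`.
// Associative (so chunks can be reduced independently) but NOT commutative.
inline PartState combine(PartState lhs, PartState rhs) {
  if (lhs == PartState::Empty) return rhs;  // Empty is the identity
  if (rhs == PartState::Empty) return lhs;
  if (lhs == PartState::Broken || rhs == PartState::Broken) return PartState::Broken;
  switch (lhs) {
    case PartState::AllTrue:  // trues followed by trues, falses, or trues-then-falses
      return rhs == PartState::AllTrue ? PartState::AllTrue : PartState::TrueFalse;
    case PartState::AllFalse:  // falses may only be followed by more falses
      return rhs == PartState::AllFalse ? PartState::AllFalse : PartState::Broken;
    case PartState::TrueFalse:  // trues-then-falses may only be followed by more falses
      return rhs == PartState::AllFalse ? PartState::TrueFalse : PartState::Broken;
    default:
      assert(false && "unreachable: Empty and Broken are handled above");
      return PartState::Broken;
  }
}

// Sequential left-to-right fold over one chunk.
template <class It, class Pred>
PartState reduce_chunk(It first, It last, Pred pred) {
  PartState acc = PartState::Empty;
  for (; first != last; ++first) acc = combine(acc, to_state(*first, pred));
  return acc;
}

// Two chunks reduced independently (as parallel workers would), then combined in order.
template <class Pred>
bool is_partitioned_reduce(const std::vector<int>& v, Pred pred) {
  auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
  PartState left = reduce_chunk(v.begin(), mid, pred);
  PartState right = reduce_chunk(mid, v.end(), pred);
  return combine(left, right) != PartState::Broken;
}
```

Because every element is transformed and combined, this formulation always does work proportional to the input size, no matter how early a violation appears.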
I believe this is not the right approach here.
The reason is that transform_reduce does not short-circuit on a negative match, so, especially for larger arrays, you spend considerable time searching for a result you already know.
In #8084 I used our DeviceFind::If machinery, because that stops searching after the first hit.
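To illustrate the short-circuiting argument, here is a host-side C++ sketch of a find-based `is_partitioned`. `std::find_if_not` and `std::find_if` stand in for a device-wide "find first" search such as the DeviceFind machinery mentioned above; this is not the code from #8084, and no CUB API is shown or implied. The function name `is_partitioned_find` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Short-circuiting is_partitioned built from two "find first" searches.
template <class It, class Pred>
bool is_partitioned_find(It first, It last, Pred pred) {
  // 1) Partition point: first element that does NOT satisfy pred (stops at the first hit).
  It split = std::find_if_not(first, last, pred);
  if (split == last) return true;  // every element satisfies pred
  // 2) *split is already known to fail pred, so resume after it. Any later element
  //    that satisfies pred is a violation, and find_if stops at the first one.
  return std::find_if(std::next(split), last, pred) == last;
}
```

For a 1002-element input whose only violation sits at index 1, this evaluates the predicate twice, whereas a full reduction visits every element. Whether that translates into a measurable speedup on the GPU is exactly what the requested benchmark would show.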
I made the implementation similar to pstl's implementation. There's no major difference in the benchmark results I got.
Description
Changed the implementation of thrust::is_partitioned to make it similar to libcudacxx::is_partitioned (#8084).
closes #8085
Made the implementation similar to pstl's implementation
Checklist