Closed
Description
Thank you @SergeyKopienko for pointing this out in #2200 (comment).
We have two implementations of `__pattern_min_element_reduce_fn`. The first is here in `algorithm_ranges_impl_hetero.h` and the second here in `algorithm_impl_hetero.h`.
The definitions differ: the second has separate codepaths for SPIR-V and non-SPIR-V targets. The underlying reduce implementation has changed since this was added. We should reevaluate the need for these separate implementations and consolidate `__pattern_min_element_reduce_fn` into a single instance.
Title changed from "Remove duplicated `__pattern_min_element_reduce_fn` function" to "Remove duplicated `__pattern_min_element_reduce_fn` functor"
mmichel11 commented on Apr 22, 2025
After experimentation, the non-commutative specialization still performs best on the Intel Data Center GPU Max 1550, so it should remain as is. Coalescing is needed to achieve the best memory access pattern on other architectures, so the non-SPIR-V path should also remain.
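For readers unfamiliar with why two reduce functors can coexist, here is a minimal, self-contained sketch. It is not the oneDPL source: every name in it (`IndexedValue`, `MinReduceOrdered`, `MinReduceAnyOrder`, and the two fold helpers) is hypothetical, and the host-side strided loop only imitates a coalesced access pattern. The assumption illustrated is that a functor relying on operand order can skip the index comparison on ties, while a functor used with a strided pattern has to break ties by index to still return the first minimum.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <tuple>
#include <vector>

// Hypothetical names throughout; this is an illustration, not oneDPL code.
using IndexedValue = std::tuple<std::size_t, int>; // (index, value)

// Order-dependent variant: assumes `a` always precedes `b` in the input,
// so on a tie it keeps `a` without comparing indices.
template <typename Compare>
struct MinReduceOrdered
{
    Compare comp;
    IndexedValue operator()(const IndexedValue& a, const IndexedValue& b) const
    {
        return comp(std::get<1>(b), std::get<1>(a)) ? b : a;
    }
};

// Order-independent variant: operands may arrive in any order,
// so ties are broken explicitly in favor of the lower index.
template <typename Compare>
struct MinReduceAnyOrder
{
    Compare comp;
    IndexedValue operator()(const IndexedValue& a, const IndexedValue& b) const
    {
        if (comp(std::get<1>(b), std::get<1>(a)))
            return b;
        if (comp(std::get<1>(a), std::get<1>(b)))
            return a;
        return std::get<0>(b) < std::get<0>(a) ? b : a;
    }
};

// Sequential left fold: operands always arrive in index order.
template <typename ReduceFn>
std::size_t min_element_index(const std::vector<int>& data, ReduceFn fn)
{
    assert(!data.empty());
    IndexedValue acc{std::size_t{0}, data[0]};
    for (std::size_t i = 1; i < data.size(); ++i)
        acc = fn(acc, IndexedValue{i, data[i]});
    return std::get<0>(acc);
}

// Strided fold: lane l visits l, l + lanes, l + 2 * lanes, ...
// The per-lane partial results are then combined, and at that point
// the left operand no longer necessarily has the lower index.
template <typename ReduceFn>
std::size_t min_element_index_strided(const std::vector<int>& data, std::size_t lanes, ReduceFn fn)
{
    assert(!data.empty() && lanes > 0 && lanes <= data.size());
    IndexedValue result{std::size_t{0}, data[0]};
    for (std::size_t lane = 0; lane < lanes; ++lane)
    {
        IndexedValue partial{lane, data[lane]};
        for (std::size_t i = lane + lanes; i < data.size(); i += lanes)
            partial = fn(partial, IndexedValue{i, data[i]});
        result = (lane == 0) ? partial : fn(result, partial);
    }
    return std::get<0>(result);
}
```

With the input `{3, 1, 2, 1, 1, 5}` and two lanes, the order-independent functor still reports index 1 under the strided fold, whereas the order-dependent functor reports index 4 there, because the lane partials are combined out of index order. Under the sequential fold both return index 1. Whether the real implementations differ in exactly this way should be checked against the two headers named in the description.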
`__pattern_min_element_reduce_fn` implementations #2205