thrust::reduce's documentation states:
> Note that reduce also assumes that the binary reduction operator (in this case binary_op) is commutative. If the reduction operator is not commutative then thrust::reduce should not be used. Instead, one could use inclusive_scan (which does not require commutativity) and select the last element of the output array.
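For reference, the workaround the documentation suggests looks roughly like this. A minimal sketch, assuming an associative but non-commutative operator; the affine and compose types are illustrative stand-ins, not part of Thrust:

```cpp
#include <thrust/device_vector.h>
#include <thrust/scan.h>
#include <vector>

// Associative but non-commutative: composition of affine maps x -> a*x + b
// over unsigned ints (exact modular arithmetic, so regrouping is harmless).
struct affine { unsigned a, b; };

struct compose {
  __host__ __device__ affine operator()(affine f, affine g) const {
    // apply f first, then g: g(f(x)) = g.a*(f.a*x + f.b) + g.b
    return affine{g.a * f.a, g.a * f.b + g.b};
  }
};

int main() {
  std::vector<affine> h = {{2u, 1u}, {3u, 0u}, {5u, 7u}, {1u, 4u}};
  thrust::device_vector<affine> d(h.begin(), h.end());
  thrust::device_vector<affine> prefixes(d.size());

  // inclusive_scan does not require commutativity; the last prefix is the
  // full left-to-right "reduction".
  thrust::inclusive_scan(d.begin(), d.end(), prefixes.begin(), compose{});
  affine total = prefixes.back();  // device-to-host copy of one element
  (void)total;
}
```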
In practice, for the CUDA backend, thrust::reduce will happen to respect non-commutative operators, i.e. the global order of the reduction will be respected, from left to right (see the sketch after the list below). This happens because:
- Within a thread, all items are reduced from left to right
- Within a warp, all aggregates are reduced from left to right (is this true?)
- Within a block, all aggregates are reduced from left to right (is this true?)
- Across the grid, all aggregates are reduced from left to right because blocks are rasterized in linear monotonically increasing order.
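A quick way to observe this (it is observed behavior, not a documented guarantee) is to compare thrust::reduce against a serial left fold, again using an illustrative associative, non-commutative operator:

```cpp
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <cstdio>
#include <numeric>
#include <vector>

// Associative, non-commutative, and exact (unsigned wraparound), so any
// regrouping that preserves left-to-right order reproduces the serial result.
struct affine { unsigned a, b; };

struct compose {
  __host__ __device__ affine operator()(affine f, affine g) const {
    return affine{g.a * f.a, g.a * f.b + g.b};  // apply f, then g
  }
};

int main() {
  std::vector<affine> h(1 << 20);
  for (std::size_t i = 0; i < h.size(); ++i)
    h[i] = affine{1u + unsigned(i % 7), unsigned(i % 13)};

  thrust::device_vector<affine> d(h.begin(), h.end());
  const affine id{1u, 0u};  // identity map

  affine parallel = thrust::reduce(d.begin(), d.end(), id, compose{});
  affine serial   = std::accumulate(h.begin(), h.end(), id, compose{});

  // Matches on the CUDA backend today because the global order is preserved,
  // but thrust::reduce's documentation does not promise this.
  std::printf("match: %s\n",
              (parallel.a == serial.a && parallel.b == serial.b) ? "yes" : "no");
}
```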
I am not sure what is true for the other backends. std::reduce makes no ordering guarantee for non-commutative operators, and in practice there are implementations that do not respect the order (such as libstdc++'s default non-execution-policy overload, I believe).
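On the standard-library side, the contrast looks like this (it only illustrates what the specification permits; a given implementation may still happen to produce the ordered result):

```cpp
#include <iostream>
#include <numeric>
#include <string>
#include <vector>

int main() {
  std::vector<std::string> v = {"a", "b", "c", "d"};

  // std::accumulate is specified as a strict left fold: always "abcd".
  std::string left = std::accumulate(v.begin(), v.end(), std::string{});

  // std::reduce is allowed to commute and regroup its operands, so with a
  // non-commutative operator (string concatenation) the result is
  // nondeterministic per the standard, even without an execution policy.
  std::string maybe = std::reduce(v.begin(), v.end(), std::string{});

  std::cout << left << ' ' << maybe << '\n';
}
```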
A reduction that works for non-commutative operators is quite useful. Conor and I have discussed this on ADSP in detail.
C++23 added std::ranges::fold*, range-based reductions that specify the accumulation order but do not prescribe it in a way that forces serial execution the way std::accumulate does. So instead of changing the docs for thrust::reduce, we could just add thrust::fold* and say those work for non-commutative operators.
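Roughly, the C++23 facility and a hypothetical Thrust counterpart (the thrust::fold_left call in the comment does not exist today; it is just what the proposal might look like in use):

```cpp
#include <algorithm>   // std::ranges::fold_left (C++23)
#include <functional>
#include <string>
#include <vector>

int main() {
  std::vector<std::string> v = {"a", "b", "c", "d"};

  // Ordered left fold: the result is specified to be (((init+a)+b)+c)+d,
  // so non-commutative operators are fine.
  std::string s = std::ranges::fold_left(v, std::string{}, std::plus<>{});
  // s == "abcd"

  // Hypothetical Thrust counterpart (not an existing API):
  //   auto r = thrust::fold_left(d_vec.begin(), d_vec.end(), init, op);
  (void)s;
}
```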
At the very least, we should probably not advise people to use inclusive_scan as an alternative, as that's pretty heavyweight.