https://github.com/NVIDIA/cccl/issues/7741 highlights need in load balancing solutions in segmented scan. We should benchmark [tmol](https://github.com/uw-ipd/tmol) use case.