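This will require some restructuring of the rollout logic internals, especially if we want to allow interleaving rollout and reward computation, i.e. scoring each rollout as soon as it finishes instead of waiting for the whole batch. I think that's a good feature to add, but it isn't fully compatible with some pairwise reward strategies, which need all completions in a group before they can score any of them, so it will have to be optional.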
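To make the trade-off concrete, here is a rough sketch of what I have in mind, not a proposal for the actual API. Every name in it (`generate_rollout`, `score_single`, `score_group`, `run_group`, the `interleave` flag) is hypothetical, and the model and reward calls are trivial stand-ins. It assumes an asyncio-based rollout loop, which may not match the current internals.

```python
import asyncio


async def generate_rollout(prompt: str) -> str:
    # Hypothetical stand-in for a model call that produces one completion.
    await asyncio.sleep(0)
    return prompt.upper()


async def score_single(completion: str) -> float:
    # Hypothetical pointwise reward: depends on a single completion only,
    # so it can run as soon as that completion exists.
    await asyncio.sleep(0)
    return float(len(completion))


def score_group(completions: list[str]) -> list[float]:
    # Hypothetical pairwise-style reward: the fraction of other completions
    # that each one beats on length. It needs the whole group at once.
    n = len(completions)
    return [
        sum(len(c) > len(other) for other in completions) / max(n - 1, 1)
        for c in completions
    ]


async def rollout_and_score(prompt: str) -> tuple[str, float]:
    # Interleaved path: reward computation starts right after this rollout,
    # while other rollouts may still be in flight.
    completion = await generate_rollout(prompt)
    return completion, await score_single(completion)


async def run_group(prompts: list[str], interleave: bool = True) -> list[tuple[str, float]]:
    if interleave:
        return list(await asyncio.gather(*(rollout_and_score(p) for p in prompts)))
    # Non-interleaved path: wait for every rollout, then score the group.
    completions = list(await asyncio.gather(*(generate_rollout(p) for p in prompts)))
    return list(zip(completions, score_group(completions)))
```

With `interleave=True`, scoring overlaps with generation; with `interleave=False`, there is a barrier after generation so a group-level reward can see every completion. That barrier is why the feature would need to be opt-in (or at least something a reward strategy can opt out of).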