* Allocating across multiple gpus [and ideally auto sharding] * Ensuring multiple GPUs are used for computation x/ref https://github.com/EnzymeAD/Reactant.jl/issues/380