If a param_group has fewer parameter elements than the data-parallel size (the number of data-parallel ranks), some GPUs end up with empty partitions, which causes a crash.
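The condition can be illustrated with a minimal sketch. It assumes the elements of a group are split as evenly as possible across ranks. The helper names `partition_sizes` and `empty_ranks` are hypothetical, chosen for illustration, and are not taken from any library, whose actual partitioning scheme may differ.

```python
def partition_sizes(numel, dp_size):
    """Return how many elements each data-parallel rank would own.

    Assumption: an even split, where the first `numel % dp_size` ranks
    each receive one extra element.
    """
    base, extra = divmod(numel, dp_size)
    return [base + (1 if rank < extra else 0) for rank in range(dp_size)]


def empty_ranks(numel, dp_size):
    """Return the ranks whose partition would hold zero elements."""
    sizes = partition_sizes(numel, dp_size)
    return [rank for rank, size in enumerate(sizes) if size == 0]


# A group with 3 elements split across 4 ranks leaves rank 3 empty.
print(partition_sizes(3, 4))  # [1, 1, 1, 0]
print(empty_ranks(3, 4))      # [3]

# A group with at least as many elements as ranks leaves no rank empty.
print(empty_ranks(10, 4))     # []
```

A check like `empty_ranks` could be run over each param_group before training starts, so that a group that is too small is reported up front instead of failing later on the ranks with nothing to hold.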