The model takes a long time to generate moderately long sequences. How can I make use of multiple GPUs in this case?
Using DistributedDataParallel (from torch.nn.parallel import DistributedDataParallel) also didn't help.
It takes 12 minutes to generate a sequence of length 16,384.
Is there any way to improve this using multiple GPUs?
