Split one model's different parts on different gpus #7162
-
PyTorch Lightning has support for sequential model parallelism via the `RPCSequentialPlugin`: you express the model as an `nn.Sequential` and pass a `balance` describing how many of its stages go on each GPU.

```python
self.model = nn.Sequential(Bert(), nn.Linear(10, 20))  # __init__()
...
self.model(x)  # forward()
...
plugin = RPCSequentialPlugin(balance=[1, 1])  # one stage on each of two GPUs
trainer = Trainer(gpus=2, plugins=[plugin])
```
-
Hey @dalek-who, I wouldn't recommend using the `RPCSequentialPlugin`, as it is being deprecated. Instead, you can use the DeepSpeed integration: https://pytorch-lightning.readthedocs.io/en/stable/advanced/multi_gpu.html?highlight=deepspeed#deepspeed. We managed to scale crazily large models with it. It can also be used on only 1 GPU with CPU offloading. Give it a try and give us feedback. Best,
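For reference, a minimal sketch of what enabling the integration might look like with the 1.3-era API. `MyLightningModule` is a placeholder, and the `cpu_offload` flag for the single-GPU case is an assumption here; check the linked docs for the exact arguments:

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins import DeepSpeedPlugin

model = MyLightningModule()  # placeholder: any LightningModule

# Multi-GPU: ZeRO Stage 3 shards parameters, gradients and optimizer state
trainer = pl.Trainer(gpus=4, precision=16, plugins="deepspeed_stage_3")

# Single GPU: CPU offloading (assumed flag) moves state into CPU memory
trainer = pl.Trainer(
    gpus=1,
    precision=16,
    plugins=DeepSpeedPlugin(stage=3, cpu_offload=True),
)
trainer.fit(model)
```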
-
@tchaton Can you provide a simple example?
-
Oh, I wasn't aware of the deprecation. Sorry about that.
-
Hey guys :) Regarding the deprecation of the `RPCSequentialPlugin`: DeepSpeed Stage 3 offers the same practice, and we already have it within Lightning. A minimal example of how all this can work can be found here: https://github.com/SeanNaren/minGPT/tree/stage3

Regarding a layer that is too large to fit on a single GPU (in this case the classifier), you can define it inside `configure_sharded_model`, so it is sharded across all GPUs as soon as it is created rather than being materialized on one device first.

We are planning a refresh of the documentation to make it easier to find these tidbits, as things have become a bit complex in the ecosystem. For a small example:

```python
class MyLargeModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # a large backbone like bert
        self.bert = Bert()

    def configure_sharded_model(self):
        # a very very large classifier layer with 6 million classes,
        # now sharded instantly onto all GPUs using DeepSpeed Stage 3
        self.classifier = nn.Linear(768, 6_000_000)

    def forward(self, x):
        emb = self.bert(x)
        score = self.classifier(emb)
        return score


model = MyLargeModel()
trainer = pl.Trainer(
    gpus=4,
    plugins='deepspeed_stage_3'
)
trainer.fit(model)
```

DeepSpeed Stage 3 shards the model across all GPUs, but defining the largest layers in `configure_sharded_model` means they never have to fit on a single device in the first place.
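To make the sketch above trainable end to end, the pieces it leaves out (a loss, an optimizer, some data) could be filled in roughly as below. These are standard Lightning hooks added to the same class; the shapes, learning rate, and dummy data are purely illustrative:

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

class MyLargeModel(pl.LightningModule):
    # __init__, configure_sharded_model and forward as in the snippet above ...

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        return F.cross_entropy(logits, y)

    def configure_optimizers(self):
        # Lightning hands this optimizer to DeepSpeed, which wraps it internally
        return torch.optim.Adam(self.parameters(), lr=1e-4)

    def train_dataloader(self):
        # dummy token ids and labels, just to keep the sketch self-contained
        x = torch.randint(0, 30_000, (64, 128))
        y = torch.randint(0, 6_000_000, (64,))
        return DataLoader(TensorDataset(x, y), batch_size=8)
```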
-
@SeanNaren which torch and pytorch-lightning version should I use?
-
Dear @dalek-who, You should use PyTorch Lightning 1.3.0rc1 and the latest PyTorch. Best,
-
@tchaton I use pl-1.3.0rc1 and torch-1.8.1. Some problems with this solution:

```
  File "/home/projects/long_tail_link/link_main.py", line 479, in main
    trainer.test(model=pl_module, verbose=False)
  File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 956, in test
    results = self.fit(model)
  File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 485, in fit
    self.pre_dispatch()
  File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 512, in pre_dispatch
    self.accelerator.pre_dispatch(self)
  File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 105, in pre_dispatch
    self.training_type_plugin.pre_dispatch()
  File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 234, in pre_dispatch
    self.init_deepspeed()
  File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 239, in init_deepspeed
    self._format_config()
  File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 395, in _format_config
    self._format_batch_size_and_grad_accum_config()
  File "/home/anaconda3/envs/conda-long-tail-link/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/deepspeed.py", line 407, in _format_batch_size_and_grad_accum_config
    batch_size = self.lightning_module.train_dataloader().batch_sampler.batch_size
AttributeError: 'NoneType' object has no attribute 'batch_sampler'
```
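The last frame suggests the plugin only reaches for `train_dataloader()` to infer `train_micro_batch_size_per_gpu`, and here only `trainer.test` is being called so no training dataloader exists. A possible workaround, assuming the 1.3-era `DeepSpeedPlugin` accepts an explicit `config` dict so the batch size never has to be inferred (the batch size and stage below are illustrative):

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins import DeepSpeedPlugin

# state the micro batch size explicitly so the plugin does not have to
# read it from a (missing) training dataloader
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {"stage": 3},
}
trainer = pl.Trainer(gpus=4, precision=16, plugins=DeepSpeedPlugin(config=ds_config))
trainer.test(model=pl_module, verbose=False)  # pl_module: the module from the traceback
```

Alternatively, defining a `train_dataloader()` on the LightningModule (even a dummy one) would give the plugin a real `batch_sampler` to read from.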
-
🚀 Feature

Motivation

In my case, I have a simplified large model in which `self.classifier` is so large that it must be placed on another GPU. However, if I simply set `gpus=2` in `pl.Trainer`, it copies the whole model onto both GPUs (and both raise CUDA out of memory) rather than splitting it across the two GPUs.

Pitch

An easy way to manually split one model across different devices, like in the tutorial above.
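For comparison, the kind of manual split being asked for is straightforward in plain PyTorch (this follows the standard model-parallel pattern; `Bert()` and the layer sizes are placeholders taken from the discussion), but there is no equally direct switch in `pl.Trainer`:

```python
import torch
import torch.nn as nn

class ManuallySplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        # backbone on the first GPU, oversized classifier head on the second
        self.bert = Bert().to("cuda:0")  # Bert() is a placeholder backbone
        self.classifier = nn.Linear(768, 6_000_000).to("cuda:1")

    def forward(self, x):
        emb = self.bert(x.to("cuda:0"))
        # hand the activations across devices explicitly
        return self.classifier(emb.to("cuda:1"))
```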
Alternatives
Additional context