Closed
Labels
bug (Something isn't working), example, good first issue (Good for newcomers), loops (Related to the Loop API)
Description
🐛 Bug
When using the cross-validation loop from the example pl_examples/loop_examples/kfold.py with the ddp_spawn strategy, the program receives a SIGABRT and crashes.
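For readers unfamiliar with the example: a K-fold loop runs the fit loop once per fold, each time holding out a different slice of the data for validation. The sketch below only illustrates that index partitioning in plain Python. It is not the code from pl_examples, and `kfold_indices` is a hypothetical helper name introduced here.

```python
from typing import List, Tuple


def kfold_indices(num_samples: int, num_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, val_indices) for each fold, without shuffling.

    Hypothetical helper for illustration only; not part of pytorch_lightning.
    """
    if not 2 <= num_folds <= num_samples:
        raise ValueError("num_folds must be between 2 and num_samples")
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    base, extra = divmod(num_samples, num_folds)
    splits = []
    start = 0
    for fold in range(num_folds):
        stop = start + base + (1 if fold < extra else 0)
        val = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, num_samples))
        splits.append((train, val))
        start = stop
    return splits


if __name__ == "__main__":
    # Five folds, matching the KFoldLoop(5, ...) call in the reproduction script.
    for fold, (train, val) in enumerate(kfold_indices(10, 5)):
        print(f"fold {fold}: {len(train)} train samples, val indices {val}")
```

Every sample lands in exactly one validation fold, so five folds mean five separate fit runs inside a single trainer.fit call.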
To Reproduce
import os

from pytorch_lightning import seed_everything, Trainer
from pl_examples.loop_examples.kfold import KFoldLoop, LitImageClassifier, MNISTKFoldDataModule


def run():
    seed_everything(42)
    model = LitImageClassifier()
    datamodule = MNISTKFoldDataModule()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=1,
        limit_val_batches=1,
        limit_test_batches=1,
        num_sanity_val_steps=0,
        max_epochs=1,
        enable_model_summary=False,
        strategy="ddp_spawn",
    )
    internal_fit_loop = trainer.fit_loop
    trainer.fit_loop = KFoldLoop(5, export_path="./")
    trainer.fit_loop.connect(internal_fit_loop)
    trainer.fit(model, datamodule=datamodule)


if __name__ == "__main__":
    run()

Expected behavior
Training completes without raising an exception.
Environment
- CUDA:
    - GPU:
        - Tesla P100-PCIE-16GB
        - Tesla P100-PCIE-16GB
        - Tesla P100-PCIE-16GB
        - Tesla P100-PCIE-16GB
        - Tesla P100-PCIE-16GB
        - Tesla P100-PCIE-16GB
        - Tesla P100-PCIE-16GB
        - Tesla P100-PCIE-16GB
    - available: True
    - version: 10.2
- Packages:
    - numpy: 1.20.2
    - pyTorch_debug: False
    - pyTorch_version: 1.9.0
    - pytorch-lightning: 1.6.0rc0
    - tqdm: 4.60.0
- System:
    - OS: Linux
    - architecture:
        - 64bit
    - processor:
    - python: 3.8.9
    - version: #1 SMP Fri Jan 14 13:59:45 UTC 2022
Additional context
I am working on a fix.
cc @Borda @carmocca @justusschock @ananthsub @ninginthecloud @rohitgr7