Closed
Labels
accelerator: tpu · bug · help wanted · waiting on author
Description
🐛 Bug
After #2016 was fixed by PR #2033, the code runs perfectly on a single TPU core (including a specific core index), but it no longer works with 8 TPU cores. After training completes, it raises: RuntimeError: Cannot replicate if number of devices (1) is different from 8.
To Reproduce
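The original notebook isn't reproduced here; below is a minimal sketch of a setup that triggers this error, assuming pytorch-lightning master and torch_xla nightly on an 8-core TPU. The BoringModel-style module, dataset, and all hyperparameters are illustrative, not taken from the issue:

```python
# Hypothetical minimal reproduction; model and data are placeholders.
import torch
from torch.utils.data import DataLoader, Dataset
import pytorch_lightning as pl


class RandomDataset(Dataset):
    """Random tensors standing in for real training data."""

    def __init__(self, size, length):
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


class BoringModel(pl.LightningModule):
    """Single linear layer, just enough to exercise the training loop."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


model = BoringModel()
train_loader = DataLoader(RandomDataset(32, 64), batch_size=2)

# tpu_cores=1 or tpu_cores=[5] (a specific core) works fine.
# tpu_cores=8 fails after training completes with:
# RuntimeError: Cannot replicate if number of devices (1) is different from 8
trainer = pl.Trainer(tpu_cores=8, max_epochs=1)
trainer.fit(model, train_loader)
```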
Expected behavior
Training should run on 8 TPU cores without error, just as it does on a single core.
Environment
- pytorch/xla: nightly
- pytorch-lightning: master
- PyTorch Version (e.g., 1.0): 1.5
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.7