Closed
Labels: accelerator: tpu, bug, help wanted, priority: 0
Description
🐛 Bug
When I run trainer.test(model) on a pre-trained model using a Colab TPU instance, the following exception is thrown.
NB: trainer.fit(model) works.
Stack trace
Traceback (most recent call last):
File "run_pl_ged.py", line 217, in <module>
trainer.test(model)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 958, in test
self.fit(model)
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/trainer/trainer.py", line 777, in fit
xmp.spawn(self.tpu_train, args=(model,), nprocs=self.num_tpu_cores, start_method=start_method)
File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 182, in spawn
start_method=start_method)
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 116, in _start_fn
_setup_replication()
File "/usr/local/lib/python3.6/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 109, in _setup_replication
xm.set_replication(str(device), [str(device)])
File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 194, in set_replication
replication_devices = xla_replication_devices(devices)
File "/usr/local/lib/python3.6/dist-packages/torch_xla/core/xla_model.py", line 181, in xla_replication_devices
.format(len(local_devices), len(kind_devices)))
RuntimeError: Cannot replicate if number of devices (1) is different from 8
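For context, the failing check lives in torch_xla's replication setup: each spawned worker must see all eight TPU cores it is asked to replicate across. One known way to trigger this exact error (a sketch under that assumption, not taken from this repro) is initializing an XLA device in the parent process before calling xmp.spawn(..., nprocs=8) with the fork start method, which leaves each forked child with a single visible device:
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

# Assumption: same failure mode as in this issue. Touching the TPU in the
# parent before forking workers leaves each child with one visible device,
# so 8-way replication fails during _setup_replication().
_ = xm.xla_device()  # initializes XLA in the parent process

def _mp_fn(index):
    pass  # never reached; replication setup raises first

xmp.spawn(_mp_fn, nprocs=8, start_method='fork')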
Code sample
import pytorch_lightning as pl
model = MyLightningModule()
trainer = pl.Trainer(num_tpu_cores=8)
model = model.load_from_checkpoint(checkpoint)
model.prepare_data() # See https://github.com/PyTorchLightning/pytorch-lightning/issues/1562
trainer.test(model)
Environment
Colab TPU instance with XLA 1.5
* CUDA:
- GPU:
- available: False
- version: None
* Packages:
- numpy: 1.18.3
- pyTorch_debug: False
- pyTorch_version: 1.5.0a0+ab660ae
- pytorch-lightning: 0.7.5
- tensorboard: 2.2.1
- tqdm: 4.38.0
* System:
- OS: Linux
- architecture: 64bit
- processor: x86_64
- python: 3.6.9
- version: #1 SMP Wed Feb 19 05:26:34 PST 2020
Possibly related: #1019
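A possible workaround in the meantime (a sketch, not verified against 0.7.5): run the test pass on a single TPU core, so no 8-way replication is needed.
# Hypothetical workaround: a fresh single-core Trainer sidesteps the
# 8-way replication that fails above. Unverified on 0.7.5.
test_trainer = pl.Trainer(num_tpu_cores=1)
test_trainer.test(model)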