Torchvision model state inconsistency in multi-gpu mode (distributed training) #4312

@masoncusack

🐛 Bug

I noticed that, when training on multiple GPUs with Lightning 0.10.0, the outputs of the torchvision model fasterrcnn_resnet50_fpn were those the torchvision documentation describes for inference (eval mode). However, when using a single GPU on Lightning 0.10, or multiple GPUs on Lightning 0.9, this behaviour did not occur: the model was in training mode as expected, and the outputs of a forward pass in my training loop matched those documented for that mode.

A manual additional call to .train() (roughly as sketched below) set things right for training, but I then started seeing errors in my eval process. This leads me to believe that something is interfering with the model state under circumstances that are identical except for the number of GPUs in use, as if eval mode were being activated by default.
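The workaround looked roughly like this; the self.model attribute name and the surrounding LightningModule structure are illustrative assumptions, not an exact copy of my code:

def training_step(self, batch, batch_idx):
    # Workaround sketch: force the wrapped detector back into training mode,
    # since something in multi-GPU mode appears to flip it to eval.
    self.model.train()
    images, targets = batch
    loss_dict = self.model(images, targets)  # dict of losses when in training mode
    return sum(loss_dict.values())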

To Reproduce

  • Run a forward pass for the fasterrcnn_resnet50_fpn torchvision model (I'm not sure whether this extends to other torchvision or wider PyTorch models), using a single GPU, with Lightning 0.10
  • Run a forward pass for the fasterrcnn_resnet50_fpn torchvision model, selecting multiple GPUs with Lightning 0.10
  • Notice that the model state appears to change between these two runs, and, in the case of this model, this changes the outputs of a forward pass in accordance with the torchvision documentation (linked above). A minimal sketch of such a setup follows below.
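The setup is roughly the following; the module, optimizer, and Trainer arguments are illustrative assumptions, and the dataloader is omitted:

import torch
import torchvision
import pytorch_lightning as pl

class Detector(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

    def training_step(self, batch, batch_idx):
        images, targets = batch
        print("Is the model in training mode?:", self.model.training)
        out = self.model(images, targets)  # loss dict in train mode, detections in eval mode
        return sum(out.values())           # only valid when the loss dict comes back

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.005)

# Single GPU: behaves as expected.
# trainer = pl.Trainer(gpus=1, max_epochs=1)
# Multi-GPU: model.training comes back False.
# trainer = pl.Trainer(gpus=2, distributed_backend='ddp', max_epochs=1)
# trainer.fit(Detector(), train_dataloader)

The only thing that differs between the two runs is the Trainer's GPU configuration.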

Expected behavior

E.g. for fasterrcnn_resnet50_fpn:

Expected output from a forward pass at train time (single GPU, Lightning 0.10.0):

{
    'loss_classifier': tensor(0.5534, device='cuda:0', grad_fn=<NllLossBackward>),
    'loss_box_reg': tensor(0.1873, device='cuda:0', grad_fn=<DivBackward0>),
    'loss_objectness': tensor(0.1856, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>),
    'loss_rpn_box_reg': tensor(0.0244, device='cuda:0', grad_fn=<DivBackward0>)
}

Actual output from a forward pass at train time (multi-GPU, Lightning 0.10.0):

{
    'boxes': tensor([[176.8907, 198.9963, 252.4170, 248.9736],
                     [392.0787, 185.2145, 426.4388, 229.0993],
                     [178.8200, 207.2800, 213.0467, 251.0375],
                     [421.2797, 180.9008, 523.6691, 237.5523],
                     [...]], device='cuda:1'),
    'labels': tensor([1, 1, 1, ...], device='cuda:1'),
    'scores': tensor([0.7446, 0.7164, 0.6861, 0.6852, 0.6635, 0.6364, 0.6237, 0.6221, 0.6205,
                      0.6201, 0.6193, 0.6137, 0.6082, 0.6064, 0.6020, 0.6018, 0.6008, 0.5992,
                      0.5986, 0.5983, 0.5903, 0.5840, 0.5731, 0.5713, 0.5696, 0.5636, 0.5627,
                      ...], device='cuda:1')
}

These are the outputs documented for inference (eval mode).
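For context, this is the standard behaviour of the torchvision detection API that the comparison above relies on (generic torchvision usage with dummy inputs, not code from my training loop):

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Dummy inputs just to show the two calling conventions.
images = [torch.rand(3, 300, 400)]
targets = [{"boxes": torch.tensor([[50.0, 50.0, 150.0, 150.0]]),
            "labels": torch.tensor([1])}]

# Training mode: images and targets go in, a dict of losses comes out.
model.train()
loss_dict = model(images, targets)

# Eval mode: only images go in, and a list of per-image dicts with
# 'boxes', 'labels', and 'scores' comes out.
model.eval()
with torch.no_grad():
    detections = model(images)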

And when printing model.training I got the following:

For Lightning 0.10.0, single GPU:

Is the model in training mode?: True

For Lightning 0.10.0, multi-GPU:

Is the model in training mode?: False

🤔

Environment

  • CUDA:
    • GPU:
      • Tesla V100-PCIE-16GB
      • Tesla V100-PCIE-16GB
    • available: True
    • version: 10.2
  • Packages:
    • numpy: 1.19.2
    • pyTorch_debug: False
    • pyTorch_version: 1.6.0
    • pytorch-lightning: 0.10.0
    • tqdm: 4.50.2
  • System:
    • OS: Linux
    • architecture:
      • 64bit
      • ELF
    • processor: x86_64
    • python: 3.8.5
    • version: #32~18.04.1-Ubuntu SMP Tue Oct 6 10:03:22 UTC 2020


Labels

bug (Something isn't working), help wanted (Open to be worked on), priority: 1 (Medium priority task), waiting on author (Waiting on user action, correction, or update)
