rework dataloader reset logic in Trainer #8435

awaelchli · 2021-07-15T21:33:23Z

🐛 Bug

The reset_{train,val,test}_dataloader methods in Trainer do not work as intended, which leads to silent errors and side effects.

The problematic lines of code are here: https://github.com/PyTorchLightning/pytorch-lightning/blob/176df202e4e1e5f5101914929f0f3a3608c41f94/pytorch_lightning/trainer/data_loading.py#L447
where a None check prevents attaching the new dataloader while the old one is still attached.
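
The failure mode can be sketched without Lightning at all. The toy class below is hypothetical: `ToyTrainer` and `attach` are illustrative names, not Lightning APIs, and the guard is only assumed to resemble the linked code. It shows how an attach step guarded by a None check silently keeps a stale loader.

```python
class ToyTrainer:
    """Hypothetical stand-in for the Trainer; this is not Lightning code."""

    def __init__(self):
        self.train_dataloader = None

    def attach(self, loader):
        # Assumed guard: only attach when nothing is attached yet.
        if self.train_dataloader is None:
            self.train_dataloader = loader


trainer = ToyTrainer()
first, second = ["batch-a"], ["batch-b"]

trainer.attach(first)
trainer.attach(second)  # silently ignored, no error is raised

# The first loader is still the one attached.
stale = trainer.train_dataloader is first
```

Because the second call neither attaches nor raises, the caller has no signal that training would continue on the old data.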

Please reproduce using the BoringModel

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, Trainer


class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    train_data_0 = DataLoader(RandomDataset(32, 128), batch_size=2)
    train_data_1 = DataLoader(RandomDataset(32, 64), batch_size=2)

    model = BoringModel()
    trainer = Trainer(max_epochs=1, weights_summary=None)
    trainer.fit(model, train_dataloaders=train_data_0)
    assert trainer.train_dataloader.loaders is train_data_0
    # trainer.train_dataloader = None

    trainer.fit_loop.max_epochs = 2
    # here, fit() does not reset the dataloader; the old one is still attached
    trainer.fit(model, train_dataloaders=train_data_1)
    
    # this assertion fails
    assert trainer.train_dataloader.loaders is train_data_1


if __name__ == '__main__':
    run()

Expected behavior

The assertion does not fail: the second fit() call correctly attaches the new dataloader.

Additional context

Reported here by user @sid-sundrani

Related to #6030


awaelchli · 2021-07-16T08:59:00Z

Looks like the bug was introduced in #7207 while trying to fix something else.

A simple fix could be to detach all loaders from the trainer (by setting them to None) when fit() etc. ends.
cc @ananthsub
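
A minimal sketch of that idea, using a hypothetical toy class rather than the real Trainer (`ToyTrainer`, `_attach`, and `batches_seen` are made-up names for illustration): the reference is dropped when the run ends, so the None-guarded attach succeeds on the next call.

```python
class ToyTrainer:
    """Hypothetical stand-in for the Trainer; this is not Lightning code."""

    def __init__(self):
        self.train_dataloader = None
        self.batches_seen = []

    def _attach(self, loader):
        # Same assumed guard as in the bug: attach only if nothing is set.
        if self.train_dataloader is None:
            self.train_dataloader = loader

    def fit(self, loader):
        self._attach(loader)
        try:
            for batch in self.train_dataloader:
                self.batches_seen.append(batch)
        finally:
            # Proposed fix: detach so the next run starts from a clean state,
            # even if the loop above raised.
            self.train_dataloader = None


trainer = ToyTrainer()
trainer.fit(["a0", "a1"])
trainer.fit(["b0"])  # the new loader is attached and consumed this time
```

One trade-off in this sketch: the loader can no longer be inspected on the trainer after fit() returns. Clearing the reference at the start of the next run instead would keep it inspectable in between.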

carmocca · 2022-03-01T14:30:30Z

Closing, working in master.

awaelchli added the bug (Something isn't working) and help wanted (Open to be worked on) labels Jul 15, 2021

This was referenced Jul 15, 2021

Use two separate dataloaders #3336

Closed

Trainer.validate(model, dataloader=dl) does not use passed dataloader #8369

Closed

awaelchli self-assigned this Jul 15, 2021

awaelchli added the priority: 0 (High priority task) label Jul 15, 2021

awaelchli added this to the v1.4.x milestone Jul 15, 2021

awaelchli mentioned this issue Jul 16, 2021

Clear dataloader references before attaching new dataloaders to Trainer #8442

Merged

11 tasks

awaelchli removed the priority: 0 (High priority task) label Jul 21, 2021

awaelchli added this to To be approved in Sprint Q3-6: 6 Sep - 17 Sep via automation Sep 9, 2021

tchaton added the let's do it! (approved to implement) label Sep 10, 2021

tchaton moved this from To be approved to To do in Sprint Q3-6: 6 Sep - 17 Sep Sep 10, 2021

tchaton added this to To do in Sprint Q3-7: 20 Sep - 1 Oct Sep 20, 2021

awaelchli modified the milestones: v1.4.x, 1.5.x Nov 3, 2021

carmocca closed this as completed Mar 1, 2022
