_check_eval_shuffling issues warning when sampler is a NumPy array #12648

@LucaButera

Description

🐛 Bug

The function pytorch_lightning.trainer.data_loading._check_eval_shuffling issues a warning whenever the sampler of a map-style Dataset is defined and is not a SequentialSampler.
The warning, however, says that the val_dataset has shuffle=True, which is not necessarily true.
For instance, if you use a plain list or NumPy array as the sampler, the warning is issued even though the data loader is completely deterministic.
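For context, the check behaves roughly like the sketch below: any sampler that is not a SequentialSampler triggers the shuffle warning. This is a simplified reconstruction of the described behavior, not the verbatim Lightning source.

from torch.utils.data import SequentialSampler


def _check_eval_shuffling_sketch(dataloader, mode):
    # Warns for any non-SequentialSampler, including fully deterministic
    # samplers such as a plain list or a NumPy array of indices.
    sampler = getattr(dataloader, "sampler", None)
    if sampler is not None and not isinstance(sampler, SequentialSampler):
        print(f"Your {mode} dataloader has shuffle=True, ...")  # misleading message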

To Reproduce

import os

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, Trainer


class RandomDataset(Dataset):
    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    # A plain list of indices is a fully deterministic sampler, yet it
    # triggers the shuffle=True warning.
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2, sampler=list(range(64)))
    test_data = DataLoader(RandomDataset(32, 64), batch_size=2)

    model = BoringModel()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=1,
        limit_val_batches=1,
        limit_test_batches=1,
        num_sanity_val_steps=0,
        max_epochs=1,
        enable_model_summary=False,
    )
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
    trainer.test(model, dataloaders=test_data)


if __name__ == "__main__":
    run()

Expected behavior

I see three possible solutions:

  • The warning is not issued if the sampler is an Iterable whose order is fixed (a sketch of this option follows the list).
  • The warning states more clearly what triggered it.
  • The warning notes that it can safely be ignored when a list/array is used as the sampler.
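A minimal sketch of the first option, assuming the check keeps its current location and signature (the helper name _sampler_is_deterministic is illustrative, not actual Lightning code):

from collections.abc import Sequence

import numpy as np
from torch.utils.data import SequentialSampler


def _sampler_is_deterministic(sampler) -> bool:
    # Plain sequences (lists, tuples) and NumPy arrays iterate in a fixed
    # order, so treating them as "shuffled" is a false positive.
    return isinstance(sampler, (Sequence, np.ndarray, SequentialSampler))


def _check_eval_shuffling(dataloader, mode):
    sampler = getattr(dataloader, "sampler", None)
    if sampler is not None and not _sampler_is_deterministic(sampler):
        # rank_zero_warn in Lightning; print keeps the sketch self-contained
        print(f"Your {mode} dataloader's sampler may shuffle the data.")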

Environment

  • PyTorch Lightning Version: 1.5.9
  • PyTorch Version: 1.10.2
  • Python version: 3.8.3
  • OS: macOS
  • How you installed PyTorch: poetry

cc @justusschock @awaelchli @ninginthecloud @rohitgr7 @carmocca @Borda @ananthsub @jjenniferdai
