
Several datamodules ignoring batch_size #334

Closed
hecoding opened this issue Nov 4, 2020 · 4 comments · Fixed by #331 or #344
Comments

@hecoding
Contributor

hecoding commented Nov 4, 2020

🐛 Bug

I realized MNISTDataModule was ignoring the batch_size parameter. I found a closed issue (#171) referring to this, but without a fix.

While fixing it myself (PR #331), I found that more datamodules have this problem too: MNISTDataModule, BinaryMNISTDataModule, FashionMNISTDataModule, SklearnDataModule, SSLImagenetDataModule.

I could take care of that. My question is: is this signature from MNISTDataModule still in use anywhere?

def train_dataloader(self, batch_size=64, transforms=None):

That is basically where the bug comes from. Datamodules that work fine, like CIFAR10DataModule, simply use:

def train_dataloader(self):

The same goes for val_dataloader and test_dataloader.
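As far as I can tell (just my reading, so take the snippet below as a sketch with made-up class name and defaults), the Trainer calls the hook with no arguments, so the method-level default wins and the batch_size given to the datamodule's __init__ never reaches the DataLoader:

import torch
from torch.utils.data import DataLoader, TensorDataset


class BuggyDataModule:
    """Simplified sketch of the failure mode, not the real pl_bolts code."""

    def __init__(self, batch_size=64):
        # The value passed here is stored but never read by train_dataloader below.
        self.batch_size = batch_size
        self.dataset_train = TensorDataset(torch.randn(256, 1, 28, 28),
                                           torch.zeros(256, dtype=torch.long))

    def train_dataloader(self, batch_size=32, transforms=None):
        # The Trainer calls dm.train_dataloader() with no arguments,
        # so this method-level default wins and self.batch_size is ignored.
        return DataLoader(self.dataset_train, batch_size=batch_size, shuffle=True)


dm = BuggyDataModule(batch_size=64)
print(next(iter(dm.train_dataloader()))[0].shape[0])  # prints 32, not 64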

To Reproduce

Steps to reproduce the behavior:

  1. Set up MNISTDataModule with any batch size other than 32.
  2. Run an experiment.
  3. Check that the batch sizes are 32 regardless.

Code sample

from pytorch_lightning.core.lightning import LightningModule
from pytorch_lightning import Trainer
from pl_bolts.datamodules.mnist_datamodule import MNISTDataModule

batch_size = 64


class SampleModel(LightningModule):
    def configure_optimizers(self):
        # No optimizer needed for this repro
        pass

    def training_step(self, batch, batch_idx):
        x, y = batch
        # Expect the batch size passed to the datamodule; fails because batches are 32
        assert x.shape[0] == batch_size
        print('fine')


model = SampleModel()
dm = MNISTDataModule(data_dir='~/Datasets/', batch_size=batch_size)

trainer = Trainer()
trainer.fit(model, dm)

Expected behavior

Batch sizes of 64.
fine is printed instead of an AssertionError being raised.

Environment

  • PyTorch Version (e.g., 1.0): 1.7
  • OS (e.g., Linux): Ubuntu 18.04
  • How you installed PyTorch (conda, pip, source): conda
  • Build command you used (if compiling from source):
  • Python version: 3.8
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information: pytorch-lightning-bolts 0.2.5

Additional context

@hecoding added the help wanted label Nov 4, 2020

github-actions bot commented Nov 4, 2020

Hi! Thanks for your contribution, great first issue!

@ananyahjha93
Contributor

@hecoding send in the PR.

The signature should be def train_dataloader(self):. The datamodule's __init__ should take the batch size parameter, not train_dataloader.
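i.e., roughly along these lines (a sketch only; the class name, attribute names, and defaults are illustrative, not necessarily what the actual fix in #331 does):

from pytorch_lightning import LightningDataModule
from torch.utils.data import DataLoader


class FixedMNISTDataModule(LightningDataModule):
    def __init__(self, data_dir='./', batch_size=32, num_workers=16):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size  # stored once, read by every dataloader
        self.num_workers = num_workers

    def train_dataloader(self):
        # self.dataset_train is assumed to be created in setup()
        return DataLoader(self.dataset_train, batch_size=self.batch_size,
                          num_workers=self.num_workers, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.dataset_val, batch_size=self.batch_size,
                          num_workers=self.num_workers)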

@hecoding
Contributor Author

hecoding commented Nov 5, 2020

Cool, I fixed the signatures too. @ananyahjha93, please have a look at the PR: #331.
Should I assign somebody for review myself? I'm a bit lost on that.

@hecoding
Contributor Author

hecoding commented Nov 5, 2020

I can fix the rest of the datamodules listed too. Let me know.
