make DataModule compatible with Python dataclass #8272

Closed · awaelchli opened this issue Jul 4, 2021 · 6 comments · Fixed by #9039

@awaelchli (Member) commented Jul 4, 2021

🚀 Feature

Support the following:

@dataclass
class MyDataModule(LightningDataModule):
    pass

Motivation

Reducing boilerplate code is at the core of Lightning's philosophy, so the LightningDataModule should be compatible with dataclasses.

Code sample

Here is an example. It currently does not work, because we have some internal attributes that don't play well with the dataclass.

import os
from dataclasses import dataclass

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, Trainer, LightningDataModule


class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


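# NOTE: this is the part that currently fails: the dataclass-generated
# __init__ never calls LightningDataModule.__init__()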
@dataclass
class BoringDataModule(LightningDataModule):

    batch_size: int = 2

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=self.batch_size)


class BoringModel(LightningModule):

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def run():
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=1,
        limit_val_batches=1,
        num_sanity_val_steps=0,
        max_epochs=1,
        weights_summary=None,
    )
    trainer.fit(model, datamodule=BoringDataModule())


if __name__ == '__main__':
    run()

Alternatives

#3792 introduces save_hyperparameters() for the datamodule. However, I believe the dataclass approach here is not in conflict with that; both could be useful at the same time.
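For comparison, a minimal sketch of the save_hyperparameters() approach, assuming the datamodule API from #3792 mirrors the existing LightningModule.save_hyperparameters(); RandomDataset is the dataset defined in the code sample above:

from torch.utils.data import DataLoader

from pytorch_lightning import LightningDataModule


class BoringDataModule(LightningDataModule):

    def __init__(self, batch_size: int = 2):
        super().__init__()
        # store the constructor arguments under self.hparams
        self.save_hyperparameters()

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=self.hparams.batch_size)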

@awaelchli added the feature, help wanted, and good first issue labels Jul 4, 2021
@awaelchli added this to the v1.5 milestone Jul 4, 2021
@awaelchli added the data handling label Jul 4, 2021
@QueshAnmak commented

Hi, I would like to work on this issue, could you please assign it to me?

@awaelchli (Member, Author) commented

Hey! Yes, you can take it if you want. I haven't really had the time to look into why it isn't working, so I don't know how difficult it will be.
Give it a try. We appreciate the help!

@QueshAnmak commented

Hey, is there any Slack or Discord server I could join to ask my questions?

@awaelchli (Member, Author) commented

Yes, feel free to join the PyTorch Lightning Slack: https://join.slack.com/t/pytorch-lightning/shared_invite/zt-pw5v393p-qRaDgEk24~EjiZNBpSQFgQ

@QueshAnmak commented

[Screenshot 2021-07-10 025106]
I set init to False (it is True by default), and this seems to fix the issue. I think the @property decorator on the attributes might be the cause of the issue.
[Screenshot 2021-07-10 025441]
How should I proceed?
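A minimal sketch of that workaround, reusing RandomDataset from the code sample above:

from dataclasses import dataclass

from torch.utils.data import DataLoader

from pytorch_lightning import LightningDataModule


# init=False suppresses the generated __init__, so the inherited
# LightningDataModule.__init__ still runs; the trade-off is that
# batch_size can no longer be passed to the constructor
@dataclass(init=False)
class BoringDataModule(LightningDataModule):

    batch_size: int = 2

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=self.batch_size)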

@awaelchli (Member, Author) commented

Oh yes, but actually we want dataclass to generate an __init__ for us; otherwise we don't get much value from a dataclass here.
But actually, all we have to do is this:

    def __post_init__(self):
        super().__init__()

because the __init__ generated by dataclass does not call super().__init__(), and that's required by the datamodule.
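Putting it together, a minimal sketch of the datamodule from the code sample with that fix applied (RandomDataset as defined above):

from dataclasses import dataclass

from torch.utils.data import DataLoader

from pytorch_lightning import LightningDataModule


@dataclass
class BoringDataModule(LightningDataModule):

    batch_size: int = 2

    def __post_init__(self):
        # dataclass calls __post_init__ at the end of its generated
        # __init__, so this restores the missing super().__init__() call
        super().__init__()

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=self.batch_size)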
