[DataModule] PyTorch datasets as DataModules out of the box #2749

Closed
InCogNiTo124 opened this issue Jul 29, 2020 · 9 comments
Labels: discussion (In a discussion stage), feature (Is an improvement or enhancement), help wanted (Open to be worked on)

Comments

@InCogNiTo124
Contributor

🚀 Feature

PyTorch already has datasets (MNIST, CIFAR, etc.). It would be very convenient to provide those datasets out of the box as DataModules.

Motivation

To reduce boilerplate. If I can avoid reimplementing or copy-pasting the same code again, I'd rather use an existing implementation. PyTorchLightning as a whole was built with this in mind, so this feels like a natural addition.

Pitch

To have the ability to write something along the lines of:

```python
import pytorch_lightning as pl
import pytorch_lightning.datasets as pld  # proposed module, does not exist yet

# implementation of model and trainer instantiation
trainer.fit(model, pld.MNIST())
```

Alternatives

Alternatively, it could be implemented in PyTorchLightning Bolts instead of here.

Additional context

None

InCogNiTo124 added the feature and help wanted labels on Jul 29, 2020
@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!

Borda added the discussion and Important labels on Jul 29, 2020
@Borda
Member

Borda commented Jul 29, 2020

I think it is a good suggestion, @PyTorchLightning/core-contributors

@williamFalcon
Contributor

Yes, I think this is what we have already started in Bolts!

Want to add the missing torchvision datasets to it?

@InCogNiTo124
Contributor Author

I'd like to, but I'm not sure I'll have the time if this is urgent. I could probably work through it slowly over 2-3 weeks, if that's not an issue :)

@williamFalcon
Contributor

No problem. Maybe create a GH issue for each dataset and do them one at a time?

@williamFalcon
Contributor

(GH issues in Bolts)
FYI @nateraw

@InCogNiTo124
Contributor Author

Makes sense. I'll open a separate issue per dataset in Bolts.

Also, what do you think about leaving this issue open until everything is implemented?

@edenlightning
Contributor

Not sure what the value is in having duplicate tickets, but we can leave this open until the new issues are opened in Bolts. Makes sense?

nateraw self-assigned this on Jul 29, 2020
nateraw changed the title from "PyTorch datasets as DataModules out of the box" to "[DataModule] PyTorch datasets as DataModules out of the box" on Jul 29, 2020
@nateraw
Contributor

nateraw commented Jul 29, 2020

@InCogNiTo124 let's move the discussion to the Bolts repo for now. We're building out support for all sorts of datasets there.

The datasets you mentioned aren't from torch, to my understanding. They're from torchvision, which isn't a requirement of this package. If we want to support torchvision or sklearn datasets directly in Lightning, we can do that in a future PR.

Thanks for the feedback on the new LightningDataModule - Looking forward to hearing your thoughts on the bolts datamodules we've built out 😄


5 participants