Dataset handling and refactoring #126

Closed
kadri-nizam opened this issue Jan 31, 2023 · 3 comments
@kadri-nizam
Contributor

kadri-nizam commented Jan 31, 2023

Is your feature request related to a problem or opportunity? Please describe.

  • GriddedDataset requires a device parameter as input, which can be cumbersome.
  • KFoldCrossValidator might benefit from inheriting from a PyTorch Dataset, which would make it easier to segment the boolean indices into different slices of that dataset.
  • For multi-execution block/multi-GPU workflows, we might want to explore using DataLoader.
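
As a rough illustration of the second and third points, here is a minimal sketch of k-fold splitting built on `torch.utils.data.Dataset`, `Subset`, and `DataLoader`. The class `ToyVisibilityDataset` and the helper `k_fold_subsets` are hypothetical names invented for this example, and the sketch partitions loose visibility samples at random rather than reproducing how KFoldCrossValidator currently segments gridded cells.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset


class ToyVisibilityDataset(Dataset):
    """Hypothetical map-style dataset of (u, v, visibility) samples."""

    def __init__(self, uu, vv, vis):
        self.uu, self.vv, self.vis = uu, vv, vis

    def __len__(self):
        return len(self.vis)

    def __getitem__(self, index):
        return self.uu[index], self.vv[index], self.vis[index]


def k_fold_subsets(dataset, k, seed=0):
    """Yield (train, test) Subset pairs from a random partition into k folds."""
    generator = torch.Generator().manual_seed(seed)
    permutation = torch.randperm(len(dataset), generator=generator)
    folds = torch.tensor_split(permutation, k)
    for i, test_idx in enumerate(folds):
        # Everything that is not in the held-out fold becomes training data.
        train_idx = torch.cat([fold for j, fold in enumerate(folds) if j != i])
        yield Subset(dataset, train_idx.tolist()), Subset(dataset, test_idx.tolist())


# Usage with random placeholder data (not real visibilities).
n = 10
dataset = ToyVisibilityDataset(
    torch.randn(n), torch.randn(n), torch.randn(n, dtype=torch.complex64)
)
splits = list(k_fold_subsets(dataset, k=5))
train, test = splits[0]
# A DataLoader can then batch any split.
loader = DataLoader(train, batch_size=4, shuffle=True)
```

Because each fold is an ordinary `Subset`, the same objects could be handed to a `DataLoader` in a multi-GPU workflow without extra indexing logic.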

Describe the solution you'd like
To stay idiomatic with PyTorch, we can implement a .to(device) method that encapsulates the process of transferring the required tensors to the correct device.
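
A minimal sketch of what such a method could look like, assuming a simplified stand-in class. `ToyGriddedDataset` and its attribute names (`vis_gridded`, `weight_gridded`, `mask`) are hypothetical and not necessarily the fields of the real GriddedDataset. The method mirrors `torch.Tensor.to` and returns `self` so that calls can be chained.

```python
import torch


class ToyGriddedDataset:
    """Hypothetical stand-in for a gridded dataset; not the project's actual class."""

    def __init__(self, vis_gridded, weight_gridded, mask):
        # Tensors stay wherever the caller created them; no device argument needed.
        self.vis_gridded = vis_gridded
        self.weight_gridded = weight_gridded
        self.mask = mask

    def to(self, device):
        """Move every tensor attribute to `device` and return self for chaining."""
        for name, value in list(vars(self).items()):
            if isinstance(value, torch.Tensor):
                setattr(self, name, value.to(device))
        return self


# Usage: build on the CPU, then move everything in one call.
device = "cuda" if torch.cuda.is_available() else "cpu"
dataset = ToyGriddedDataset(
    vis_gridded=torch.zeros(4, 4, dtype=torch.complex64),
    weight_gridded=torch.ones(4, 4),
    mask=torch.ones(4, 4, dtype=torch.bool),
).to(device)
```

An alternative with the same effect is to subclass `torch.nn.Module` and register the tensors with `register_buffer`, in which case `.to(device)` is inherited rather than written by hand.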

IC: Let's focus this issue specifically on the (potential) redesign of dataset-focused classes, like GriddedDataset or UVDataset and how inference or cross-validation loops will interact with them. Issue #154 will track progress related to the Gridder and methods to create a GriddedDataset in the first place.

@iancze
Collaborator

iancze commented Feb 16, 2023

#18 and #31 are relevant to the design of possible solutions that will also address spectral line datasets or image cubes with large numbers of channels. Closed issues #17 and #32 may provide additional things to think about in this context.

@iancze
Collaborator

iancze commented Feb 18, 2023

We have broken out the issues @kadri-nizam raised in the original post into more bite-sized issues.

Notes about splitting the imaging and averaging functions of the Gridder are now in #154, and have been implemented by #156, closing that issue.

Topics related to UVDataset are in #162; topics related to GriddedDataset are in #163. It probably makes more sense to try #162 first, since that will help us get a better grasp on the PyTorch dataset idioms.

@iancze
Collaborator

iancze commented Apr 4, 2023

Now that #163, #157, and #154 are implemented, this issue is sufficiently diffuse that I don't think it warrants being open anymore. Issues related to UVDataset are now in #162, whereas issues related to cross-validation are partially covered in places like #166, #182, #133, and #135 (and could probably be redeveloped/condensed).

iancze closed this as completed Apr 4, 2023