Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support DataLoaders with missing arguments in replace_sampler #8519

Merged
merged 13 commits into from Jul 26, 2021

Conversation

carmocca
Copy link
Member

@carmocca carmocca commented Jul 22, 2021

What does this PR do?

Passing a DataLoader that did not define all the vars(DataLoader) arguments produced an error. For example:

    dataloader = type(dataloader)(**dl_args)
TypeError: __init__() got an unexpected keyword argument 'generator'

(dataloader.generator exists but generator is not in this custom DataLoader __init__ signature)

This is not ideal as we should not error in this case if the existing attribute name is the default one. In this case, passing it should be optional.

Additionally, the error message is not clear about the problem.

This PR fixes it by rewriting the replace_sampler function and improving our coverage for this.

Additional improvements:

  • Setting *args, **kwargs with different names is now allowed.

Fixes #8518

Does your PR introduce any breaking changes? If yes, please list them.

No

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n/a] Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)
  • Did you list all the breaking changes introduced by this pull request?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

@carmocca carmocca added the bug Something isn't working label Jul 22, 2021
@carmocca carmocca added this to the v1.4.x milestone Jul 22, 2021
@carmocca carmocca self-assigned this Jul 22, 2021
@codecov
Copy link

codecov bot commented Jul 22, 2021

Codecov Report

Merging #8519 (5122c74) into master (d3ed472) will increase coverage by 0%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #8519   +/-   ##
======================================
  Coverage      92%     92%           
======================================
  Files         218     218           
  Lines       14397   14385   -12     
======================================
- Hits        13286   13277    -9     
+ Misses       1111    1108    -3     

@carmocca carmocca added the data handling Generic data-related topic label Jul 22, 2021
@carmocca carmocca changed the title [WIP] Support DataLoaders with missing arguments in replace_sampler Support DataLoaders with missing arguments in replace_sampler Jul 22, 2021
@carmocca carmocca marked this pull request as ready for review July 22, 2021 15:24
@carmocca carmocca modified the milestones: v1.4.x, v1.4 Jul 22, 2021
@mergify mergify bot added the ready PRs ready to be merged label Jul 23, 2021
Copy link
Contributor

@tchaton tchaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but I wonder if we can't do something simpler for modifying the DataLoader.

@tchaton tchaton merged commit 6dbdf43 into master Jul 26, 2021
@tchaton tchaton deleted the bugfix/8518 branch July 26, 2021 08:04
@tchaton
Copy link
Contributor

tchaton commented Jul 26, 2021

Hey @carmocca @awaelchli @justusschock,

What do you think of this approach ?

Currently, the custom DataLoader assumes the arguments are set as attributes which isn't correct to assume for all custom dataloaders.
With the approach below, they are available within the context.

def test_context_manager_dataloader(tmpdir):

    from contextlib import contextmanager
    from torch.utils.data.dataloader import DataLoader

    def get_all_subclasses(cls):
        subclass_list = []

        def recurse(cl):
            for subclass in cl.__subclasses__():
                subclass_list.append(subclass)
                recurse(subclass)

        recurse(cls)
        return set(subclass_list)

    def _enable_class(cls):
        cls._old_init = cls.__init__
        cls.__init__ = inject_fn(cls.__init__)

    def _disable_class(cls):
        cls.__init__ = cls._old_init
        del cls._old_init

    @contextmanager
    def replace_sampler():
        subclasses = get_all_subclasses(DataLoader)
        for subclass in subclasses:
            _enable_class(subclass)
        yield
        for subclass in subclasses:
            _disable_class(subclass)

    class TestDataLoader(DataLoader):

        def __init__(self, num_features: int, *args, **kwargs):
            super().__init__(range(num_features), *args, **kwargs)

    def inject_fn(f):
        @functools.wraps(f)
        def wrapper(self: DataLoader, *args, **kwargs):
            f(self, *args, **kwargs)
            self.__class__.__init__ = self.__class__._old_init
            kwargs["batch_size"] = 2
            self.dataloader = self.__class__(*args, **kwargs)
        return wrapper

    with replace_sampler():
        dl = TestDataLoader(10)

    assert dl.batch_size == 1
    assert dl.dataloader.batch_size == 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working data handling Generic data-related topic ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Generator not an optional argument
4 participants