Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazyload submodules python37+ #1007

Merged
merged 1 commit into from Jul 16, 2021
Merged

Conversation

orsonadams
Copy link
Contributor

@orsonadams orsonadams commented Feb 22, 2020

Summary of changes

Allow submodule lazy loading for python version3.7+
Addresses #771 using PEP: 562

Pull Request Checklist

  • Changes have tests
    Need help with testing
  • Authors have been added to AUTHORS.md
  • News fragment added in changelog.d. See CONTRIBUTING.md for details

This is fairly straightforward given the introduction of __getattr__ for modules. The challenge is testing the import functionality for two subsets of versions: >= 3.7 & < 3.7.

I added what I thought would be reasonable tests if I used tox. Trouble is import dateutil loads the submodules regardless of versions - ignoring __init__.py , Which when I think about it maybe is expected behavior.

How to test this bad boy?

@jbrockmendel
Copy link
Contributor

@ParseThis im guessing this speeds up import time if we only import e.g. dateutil.parser? can you ballpark the size of the improvement?

@pganssle
Copy link
Member

@jbrockmendel That's already how the library is designed:

>>> import dateutil
>>> dateutil.parser
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-2740e4456f59> in <module>
----> 1 dateutil.parser

AttributeError: module 'dateutil' has no attribute 'parser

The idea here is that in Python 3.7, this will work:

import dateutil
dateutil.parser.parse  # This imports `dateutil.parser` the first time you load it

But import dateutil by itself won't actually get any slower.

@orsonadams
Copy link
Contributor Author

orsonadams commented Feb 22, 2020

@jachen20 Yeah, its more about expectations. For instance, I would like the below to work.

import dateutil
dateutil.tz 

Instead of throwing an AttributeError as it would today. But accomplish this, while still not loading the other submodules.

@jbrockmendel
Copy link
Contributor

got it, thanks

Copy link
Member

@pganssle pganssle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import-based tests are some of the trickiest to come up with because it's easy to modify global state by accident and then have your tests rely on the order they were executed in.

I don't know of any way to tell pytest that it should run each of these tests in their own separate process with their own separate sys.modules variable.

I think for the moment we should abandon the concept of thread safety and use a pytest fixture to ensure that each of these import tests gets its own copy of sys.modules. I have not tried this, but here's what I'm thinking:

  @pytest.fixture(scope="function")
def clean_imports():
    """
    Fixture for providing a clean-ish environment for testing import behavior.

    No effort has been made to make this thread safe, as it directly modifies
    the sys.modules dictionary.
    """
    # Stash all the existing dateutil modules for later restoration
    du_modules = {mod_name: mod for mod_name, mod in sys.modules.items()
                  if mod_name.startswith("dateutil")}

    # Keep a list of what was in sys.modules before the test so we can delete
    # stuff outside of dateutil that was imported indirectly
    other_modules = {mod_name for mod_name in sys.modules
                     if mod_name not in du_modules}

    for mod_name in du_modules:
        del sys.modules[mod_name]

    yield

    # Delete anything that wasn't in the original list
    for mod_name in list(sys.modules):
        if mod_name not in other_modules:
            del sys.modules[mod_name]

    # Now restore the original dateutil modules we stashed
    for mod_name, mod in du_modules.items():
        sys.modules[mod_name] = mod

Then your test_lazy_import (and, in a later PR maybe, all the other import tests) just needs to take the clean_imports fixture and it should Just Work.


One other thing: we can do it as a separate PR if you want, but it is usually good to implement both __getattr__ and __dir__, so that dir(dateutil) reflects all the available modules.

I know this is a lot of comments for one fairly simple PR, thanks for doing this!

dateutil/__init__.py Outdated Show resolved Hide resolved
dateutil/__init__.py Outdated Show resolved Hide resolved
dateutil/test/test_imports.py Outdated Show resolved Hide resolved
dateutil/test/test_imports.py Outdated Show resolved Hide resolved
dateutil/test/test_imports.py Outdated Show resolved Hide resolved
dateutil/test/test_imports.py Outdated Show resolved Hide resolved
dateutil/test/test_imports.py Outdated Show resolved Hide resolved
dateutil/__init__.py Outdated Show resolved Hide resolved
@orsonadams
Copy link
Contributor Author

orsonadams commented Feb 25, 2020

Interesting, running tests for py2.7, this happens.

import dateutil, importlib
RuntimeWarning: Parent module 'dateutil.test' not found while handling absolute import
dateutil/test/test_imports.py:37: RuntimeWarning

Which I get, we're removing dateutil.test from submodules.bnBut why not throw this for Python 3.6, 3.7 tests? Although I would argue that since we're not testing the import on test that it shouldn't be a module we stash in the fixture.

This is fixed by checking to make sure the test submodule in deleted from sys.modules in the clean_import fixture.

du_modules = {mod_name: mod for mod_name, mod in sys.modules.items()               
                        if mod_name.startswith('dateutil')                                
                        and not 'dateutil.test' in mod_name} 

Instead of,

du_modules = {mod_name: mod for mod_name, mod in sys.modules.items()               
                        if mod_name.startswith('dateutil') }

@pganssle
Copy link
Member

@ParseThis In this case I think it's fair to call pytest.xfail in the fixture itself.

At some point soon-ish I will restructure the repository to put the tests in their own directory (not a subdirectory of dateutil), and that will solve the problem, and at the moment the only thing the fixture is being used for is a test that was going to xfail on Python 2.7 anyway, so NBD.

@pganssle
Copy link
Member

Oh wait, I just realized, this is a RuntimeWarning that's causing it to fail. These tests are already xfailing anyway.

I think you can just suppress the warning, either by using pytest.warns and asserting that the number of warnings is 1 for Python 2.7 and 0 for Python 3 or by conditionally defining a decorator like pytest.mark.filterwarnings, see: https://docs.pytest.org/en/latest/warnings.html

In the second case you'd do something like:

# Some comment about why this is OK
if six.PY2:
    filter_import_warning = pytest.mark.filterwarnings("RuntimeWarning")
else:
    def filter_import_warning(f):
        return f

...

@filter_import_warning
@pytest.mark.parametrize(...)
...

dateutil/__init__.py Outdated Show resolved Hide resolved
@pganssle pganssle added this to the 2.9.0 milestone Apr 24, 2020
@pganssle
Copy link
Member

@ParseThis I see this is still marked as a draft - do you know the current status here?

@orsonadams
Copy link
Contributor Author

orsonadams commented Apr 28, 2020

@pganssle ready to move to this to a proper PR. WIll have a look at the handling dir in a couple of days. I have an intermediate solution, would like it reviewed.

@orsonadams orsonadams marked this pull request as ready for review April 28, 2020 00:37
@orsonadams
Copy link
Contributor Author

@pganssle Curious about those windows-latest failures; seem like some cancellations on the coverage side reporting.

@pganssle pganssle force-pushed the lazyload-py37 branch 2 times, most recently from 7853ec5 to e532459 Compare May 24, 2021 14:26
This uses PEP 562 to implement lazy loading of submodules in dateutil
(dateutilGH-771).
@mariocj89 mariocj89 self-assigned this Jul 16, 2021
@mariocj89 mariocj89 changed the title WIP: should lazyload submodules python37+ Lazyload submodules python37+ Jul 16, 2021
@mariocj89 mariocj89 merged commit 28da62d into dateutil:master Jul 16, 2021
@mariocj89
Copy link
Member

Thanks a lot @orsonadams !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants