Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cached attribute/class imported in import_string(). #14815

Closed
wants to merge 3 commits into from
Closed

Cached attribute/class imported in import_string(). #14815

wants to merge 3 commits into from

Conversation

piaoxue1949
Copy link
Contributor

Cache loaded modules to solve the problem of module loading (lock) too slow when multiple threads are concurrent

@github-actions
Copy link

Hello @piaoxue1949! Thank you for your contribution 💪

As it's your first contribution be sure to check out the patch review checklist.

If you're fixing a ticket from Trac make sure to set the "Has patch" flag and include a link to this PR in the ticket!

If you have any design or process questions then you can ask in the Django forum.

Welcome aboard ⛵️!

Copy link
Contributor Author

@piaoxue1949 piaoxue1949 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1630405093(1)

@felixxm felixxm changed the title 缓存已加载的模块,解决多线程并发时加载模块(会取锁)过慢的问题 Cached attribute/class imported in import_string(). Sep 2, 2021
Copy link
Member

@felixxm felixxm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piaoxue1949 Thanks for this patch 👍 Please create a new ticket in Trac.

module = module_cache[module_path]
else:
module = import_module(module_path)
module_cache[module_path] = module
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be enough to use lru_cache instead, e.g.:

@functools.lru_cache(maxsize=128)
def import_string(dotted_path):
    ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is one Andrew suggested to me in django/asgiref#288 (which I need to get back to) which is small and elegant enough that it might be useful to Django-proper, possibly here in import_string and more generally where imports happen inside functions/methods (to avoid circular refs) in hot loops, and it looks like:

def cached_import(module_name, item_name):
    modules = sys.modules
    if module_name not in modules:
        importlib.import_module(module_name)
    return getattr(sys.modules[module_name], item_name)

A benefit of it would be not having another separate cache (LRU) with a finite hit-rate, and it still avoids going through the import machinery for hits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kezabelle Looks nice 🐎 Is there a reason to have a local modules variable and use it only in not in check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kezabelle This method is very good, but there are currently many Django-related modules that reference import_string during operation. If you add a cache_import, you must modify and adjust the module code that references import_string to improve performance.
1630577078(1)
But we can refer to this and modify the import_string function, replace module_cache with sys.modules.like this :
1630578829(1)

@felixxm Using lru_cache is a problem that has another separate cache (LRU) with a finite hit-rate (proposed by Kezabelle) .Especially if there are too many modules loaded, exceeding the number of lru, there may be a significant performance degradation problem, unless the number is set to unlimited.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kezabelle This method is very good, but there are currently many Django-related modules that reference import_string during operation. If you add a cache_import, you must modify and adjust the module code that references import_string to improve performance.

You can add a new hook and use it in import_string(), e.g.

def cached_import(module_name, item_name):
    modules = sys.modules
    if module_name not in modules:
        import_module(module_name)
    return getattr(sys.modules[module_name], item_name)


def import_string(dotted_path):
    """
    Import a dotted module path and return the attribute/class designated by the
    last name in the path. Raise ImportError if the import failed.
    """
    try:
        module_path, class_name = dotted_path.rsplit('.', 1)
    except ValueError as err:
        raise ImportError("%s doesn't look like a module path" % dotted_path) from err

    try:
        return cached_import(module_path, class_name)
    except AttributeError as err:
        raise ImportError('Module "%s" does not define a "%s" attribute/class' % (
            module_path, class_name)
        ) from err

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kezabelle Is there a reason to have a local modules variable and use it only in not in check?

Err, only because I suck :) Notionally it was to avoid LOAD_GLOBAL followed by LOAD_ATTR twice (once for the key check, once for the getattr) to try and make it As Fast As Possible (in the context of the aforementioned PR) ... but that only has value if you then use it in both places 🤦 Guess I'll have to re-run my benchmarks again when fixing that PR. Good catch ;)

FWIW, I'm not tied to it being a separate function (vs just inlining the if x in sys.modules) for the purposes of this, but as alluded to on that other thread, I can see there being hot paths in Django which do hit the import machinery every time, which might be a good reason to introduce such a function. However, I've not actually bench'd any of those places, so I can't say for certain that it's of any further conseqence. Mostly I just wanted to surface the peeking into sys.modules as an option, whatever the implementation ends up being.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#14830 This is the newly proposed optimized version of sys.modules, but why turn it off ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#14830 This is the newly proposed optimized version of sys.modules, but why turn it off ...

It's exactly the same as this PR. We don't need multiple PRs with the same proposition.

@felixxm
Copy link
Member

felixxm commented Sep 9, 2021

@piaoxue1949 Thanks 👍 Welcome aboard ⛵

I moved the optimization to a separate hook as proposed by Keryn, and opened #14850 because I couldn't push edits here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants