-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cached attribute/class imported in import_string(). #14815
Conversation
Hello @piaoxue1949! Thank you for your contribution 💪 As it's your first contribution be sure to check out the patch review checklist. If you're fixing a ticket from Trac make sure to set the "Has patch" flag and include a link to this PR in the ticket! If you have any design or process questions then you can ask in the Django forum. Welcome aboard ⛵️! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@piaoxue1949 Thanks for this patch 👍 Please create a new ticket in Trac.
django/utils/module_loading.py
Outdated
module = module_cache[module_path] | ||
else: | ||
module = import_module(module_path) | ||
module_cache[module_path] = module |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be enough to use lru_cache
instead, e.g.:
@functools.lru_cache(maxsize=128)
def import_string(dotted_path):
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is one Andrew suggested to me in django/asgiref#288 (which I need to get back to) which is small and elegant enough that it might be useful to Django-proper, possibly here in import_string
and more generally where imports happen inside functions/methods (to avoid circular refs) in hot loops, and it looks like:
def cached_import(module_name, item_name):
modules = sys.modules
if module_name not in modules:
importlib.import_module(module_name)
return getattr(sys.modules[module_name], item_name)
A benefit of it would be not having another separate cache (LRU) with a finite hit-rate, and it still avoids going through the import machinery for hits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kezabelle Looks nice 🐎 Is there a reason to have a local modules
variable and use it only in not in
check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kezabelle This method is very good, but there are currently many Django-related modules that reference import_string during operation. If you add a cache_import, you must modify and adjust the module code that references import_string to improve performance.
But we can refer to this and modify the import_string function, replace module_cache with sys.modules.like this :
@felixxm Using lru_cache is a problem that has another separate cache (LRU) with a finite hit-rate (proposed by Kezabelle) .Especially if there are too many modules loaded, exceeding the number of lru, there may be a significant performance degradation problem, unless the number is set to unlimited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kezabelle This method is very good, but there are currently many Django-related modules that reference import_string during operation. If you add a cache_import, you must modify and adjust the module code that references import_string to improve performance.
You can add a new hook and use it in import_string()
, e.g.
def cached_import(module_name, item_name):
modules = sys.modules
if module_name not in modules:
import_module(module_name)
return getattr(sys.modules[module_name], item_name)
def import_string(dotted_path):
"""
Import a dotted module path and return the attribute/class designated by the
last name in the path. Raise ImportError if the import failed.
"""
try:
module_path, class_name = dotted_path.rsplit('.', 1)
except ValueError as err:
raise ImportError("%s doesn't look like a module path" % dotted_path) from err
try:
return cached_import(module_path, class_name)
except AttributeError as err:
raise ImportError('Module "%s" does not define a "%s" attribute/class' % (
module_path, class_name)
) from err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kezabelle Is there a reason to have a local
modules
variable and use it only innot in
check?
Err, only because I suck :) Notionally it was to avoid LOAD_GLOBAL
followed by LOAD_ATTR
twice (once for the key check, once for the getattr) to try and make it As Fast As Possible (in the context of the aforementioned PR) ... but that only has value if you then use it in both places 🤦 Guess I'll have to re-run my benchmarks again when fixing that PR. Good catch ;)
FWIW, I'm not tied to it being a separate function (vs just inlining the if x in sys.modules
) for the purposes of this, but as alluded to on that other thread, I can see there being hot paths in Django which do hit the import machinery every time, which might be a good reason to introduce such a function. However, I've not actually bench'd any of those places, so I can't say for certain that it's of any further conseqence. Mostly I just wanted to surface the peeking into sys.modules
as an option, whatever the implementation ends up being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#14830 This is the newly proposed optimized version of sys.modules, but why turn it off ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#14830 This is the newly proposed optimized version of sys.modules, but why turn it off ...
It's exactly the same as this PR. We don't need multiple PRs with the same proposition.
@piaoxue1949 Thanks 👍 Welcome aboard ⛵ I moved the optimization to a separate hook as proposed by Keryn, and opened #14850 because I couldn't push edits here. |
Cache loaded modules to solve the problem of module loading (lock) too slow when multiple threads are concurrent