
Cache entrypoints in group #3622

Merged: 2 commits into aiidateam:develop from cache-entrypoints-in-group (Dec 10, 2019)

Conversation

@ltalirz (Member) commented Dec 9, 2019

Entry point loading is already cached at the reentry level, but getting
all entry points within a group can still take a significant amount of
time.
This commit introduces a simple cache at the AiiDA level. An alternative
would be to add a cache at the reentry level.

To provide some context: The timings on a query for 300 Dict nodes are
as follows:

  • No cache: ~110 ms
  • Cache: 67 ms
  • Cache at load_node_class level: 58 ms
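
For reference, a rough sketch of how one might reproduce this kind of timing in an IPython shell (the profile contents are an assumption; it needs at least 300 stored Dict nodes):

    # Hedged sketch: timing a query for 300 Dict nodes, as in the numbers above.
    from aiida.orm import QueryBuilder, Dict

    qb = QueryBuilder().append(Dict, project='*').limit(300)
    %timeit qb.all()  # IPython magic, as used later in this thread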

@muhrin I would say the savings are significant and probably worth reaping. Of course, one could also try to do this on the reentry side...
Let me know. If you think we should keep the cache here, I'll add the proper docstrings etc.

@@ -50,6 +50,28 @@ class EntryPointFormat(enum.Enum):
MINIMAL = 3


class EntryPointCache():
@greschd (Member) commented:
I think this might be equivalent to adding the @functools.lru_cache(maxsize=None) decorator to the get_entry_points function.
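
For concreteness, a minimal sketch of that suggestion (the function body is an assumed wrapper around the reentry manager, not a copy of the actual AiiDA source):

    # Sketch: memoize get_entry_points per group with functools.lru_cache.
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def get_entry_points(group):
        """Return all entry points registered under the given group."""
        # ENTRYPOINT_MANAGER is assumed to be reentry's entry-point manager.
        return list(ENTRYPOINT_MANAGER.iter_entry_points(group=group))

Note that the cached list is shared between callers, so mutating the return value would be unsafe; that caveat applies equally to a hand-rolled cache.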

@muhrin (Contributor) commented:

Good point @greschd; yes, this would be a better way to do it.

@ltalirz (Member, Author) commented:

Thanks, that seems much more elegant.

@ltalirz force-pushed the cache-entrypoints-in-group branch 2 times, most recently from 60c9114 to 3f0a43b on December 9, 2019, 20:48
@ltalirz (Member, Author) commented Dec 9, 2019

The lru_cache seems to be a bit slower for some reason:

In [4]: %timeit qb2.all()  # with project='*'
10 loops, best of 5: 82 ms per loop

There is a C implementation of the cache: https://pypi.org/project/fastcache/
Anyhow, I don't think it really matters that much here: we get some speedup from the lru_cache, which adds only two lines of code. I'd be fine with that.
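
If the decorator overhead ever became a concern, the C implementation mentioned above is advertised as a drop-in replacement (hedged sketch; same assumed function and ENTRYPOINT_MANAGER as before):

    # Sketch: fastcache's clru_cache mimics functools.lru_cache's interface.
    from fastcache import clru_cache

    @clru_cache(maxsize=None)
    def get_entry_points(group):
        return list(ENTRYPOINT_MANAGER.iter_entry_points(group=group))

Either variant exposes .cache_info() and .cache_clear(), which is handy for inspecting hit rates or invalidating the cache if the set of installed plugins changes during a session.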

@muhrin (Contributor) commented Dec 9, 2019

So you want to go with the lru_cache? If not, there is a small change I would make to the PR as it stands.

@muhrin (Contributor) left a review comment:

Looks great

@ltalirz merged commit fcfbf3a into aiidateam:develop on Dec 10, 2019