Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate entrypoint processing on importlib.metadata #6591

Merged
merged 1 commit into from
Mar 29, 2022

Conversation

mih
Copy link
Member

@mih mih commented Mar 25, 2022

This approach is now an offiical part of the standard lib (since 3.10), and is available for earlier version in a provisional module and also a separate package.

This change introduces datalad.support.entrypoints with iter_entrypoints() that is now used as a uniform implementation.

As previously shown in #6584 this brings significant performance improvements for internal processing. Moreover, they translate to outward-facing speed gains. On a system with 10 extensions installed a datalad -h now completes in ~300ms, whereas current mainline after the CLI update needs ~650ms, and the latest release 0.15.6 needs ~1200ms -- so overall a 4x speed-up.

Importantly, the new iter_entrypoints() helper does not return complex entrypoint instances anymore, which previously contaminated DataLad code with the internals of pkg_resources. This is also the reason for one occurrence of a pkg_resources-based entrypoint iteration being left in the code (in search and its tests). It specifically relies on these
internals and even mocks them in the tests.

This code is likely to change entirely soon, and is also slow enough to not make the require effort to switch it over felt by users.

This change implies no necessary modifications of any extensions.

Changelog

馃挮 Enhancements and new features

馃彔 Internal

  • Entrypoint processing for extensions and metadata extractors has been consolidated on a uniform helper that is about twice as fast as the previous implementations. Fixes iter_entrypoints helpers聽#6585

This approach is now an offiical part of the standard lib (since 3.10),
and is available for earlier version in a provisional module and
also a separate package.

This change introduces `datalad.support.entrypoints` with
`iter_entrypoints()` that is now used as a uniform implementation.

As previously shown in datalad#6584
this brings significant performance improvements for internal
processing. Moreover, they translate to outward-facing speed gains.
On a system with 10 extensions installed a `datalad -h` now completes
in ~300ms, whereas current mainline after the CLI update needs ~650ms,
and the latest release 0.15.6 needs ~1200ms -- so overall a 4x speed-up.

Importantly, the new `iter_entrypoints()` helper does not return complex
entrypoint instances anymore, which previously contaminated DataLad code
with the internals of `pkg_resources`. This is also the reason for one
occurrence of a `pkg_resources`-based entrypoint iteration being left in
the code (in `search` and its tests). It specifically relies on these
internals and even mocks them in the tests.

This code is likely to change entirely soon, and is also slow enough to
not make the require effort to switch it over felt by users.

This change implies no necessary modifications of any extensions.

Fixes datalad#6584
Fixes datalad#6585
@mih mih added performance Improve performance of an existing feature semver-performance Changes only improve performance, no impact on version team-core core API/commands (https://github.com/datalad/datalad/issues/6365) labels Mar 25, 2022
@codeclimate
Copy link

codeclimate bot commented Mar 25, 2022

Code Climate has analyzed commit 5f015ec and detected 4 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 2
Security 2

View more on Code Climate.

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wholeheartedly support this PR: faster, less code duplication etc. I first time mentioned that deprecated being failing, but it seems to not be the first time if I look at https://github.com/datalad/datalad/actions/workflows/test_extensions.yml -- started to fail some time within past 2 days -- run 2 days ago was green and then run on this (and then other PRs) PR 8 hours ago was red :-/ And failure doesn't look related to this PR changes.

I am not 100% certain though on what @mih you wanted to say by

This code is likely to change entirely soon, and is also slow enough to not make the require effort to switch it over felt by users.

Also it seems that there is no test coverage at all for the load=True case which is odd - we do not send out coverage from extensions testing runs or how else could we make it cover that (crucial piece of) code?

# 3.10 is when it was no longer provisional
from importlib_metadata import entry_points
else:
from importlib.metadata import entry_points
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is not covered, which is odd since we do have 3.10 run in travis

yield ep.name, ep.module, ep.load
continue

try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no test/tests matrix run which triggers this with load=True

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how much these coverage reports can be trusted. This code is exercises by both the construction of the Python API (datalad.api) and the CLI.

The latter is chronically undertested -- something that this PR won't fix. I started using a seemingly attractive pattern for such tests in #6555 for testing the proposed credentials command.

@yarikoptic
Copy link
Member

ok, still a bit concerned about coverage, but let's proceed -- will be tested in real life ;) Thanks @mih !

@yarikoptic yarikoptic merged commit 8bf53dd into datalad:master Mar 29, 2022
@mih mih deleted the bf-6584 branch April 15, 2022 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Improve performance of an existing feature semver-performance Changes only improve performance, no impact on version team-core core API/commands (https://github.com/datalad/datalad/issues/6365)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

iter_entrypoints helpers Entrypoint discovery comparison: switch to importlib!!
2 participants