Make XMP extractor faster (fixes gh-3328) #3329

Merged
merged 2 commits into from Apr 14, 2019
@mih mih commented Apr 13, 2019

Massive performance boost from no longer testing EVERY file in a dataset for XMP metadata. Instead, a pre-filter based on file name extensions decides which files are tested at all.

This halves(!) the time to aggregate metadata for openfmri (where pretty much all datasets have this extractor enabled).
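The idea of an extension-based pre-filter can be illustrated with a minimal sketch. This is not the code from this PR: the function names and the extension set below are hypothetical and chosen only for illustration. The point is that a cheap check on the file name decides whether a file is handed to the expensive XMP parser at all.

```python
import os

# Hypothetical set of extensions assumed to be worth testing for XMP.
# The actual list used by the extractor is not stated in this PR text.
XMP_CANDIDATE_EXTENSIONS = frozenset(
    {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".mp3", ".mp4"}
)


def may_contain_xmp(path, extensions=XMP_CANDIDATE_EXTENSIONS):
    """Cheap test on the file name only; the file is never opened."""
    ext = os.path.splitext(path)[1].lower()
    return ext in extensions


def prefilter(paths, extensions=XMP_CANDIDATE_EXTENSIONS):
    """Yield only the paths that should be passed on to the XMP parser."""
    for path in paths:
        if may_contain_xmp(path, extensions):
            yield path


if __name__ == "__main__":
    files = ["sub-01/anat/T1w.nii.gz", "docs/protocol.PDF", "README"]
    # Only the surviving paths would be opened and parsed for XMP.
    print(list(prefilter(files)))
```

The trade-off of such a filter is that XMP metadata embedded in a file with an unlisted extension is never found, so the extension set has to cover the formats that matter for the datasets at hand.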

Timing in datalad/datalad-revolution#84 (comment)

@mih mih changed the title Enh xmp Make XMP extractor faster Apr 13, 2019
@mih mih changed the title Make XMP extractor faster Make XMP extractor faster (fixes gh-3328) Apr 13, 2019
@codecov codecov bot commented Apr 13, 2019

Codecov Report

Merging #3329 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3329      +/-   ##
==========================================
+ Coverage   91.14%   91.15%   +0.01%     
==========================================
  Files         263      263              
  Lines       34246    34250       +4     
==========================================
+ Hits        31212    31222      +10     
+ Misses       3034     3028       -6
Impacted Files                        Coverage Δ
datalad/metadata/extractors/xmp.py    92.59% <100%> (-1.41%) ⬇️
datalad/downloaders/http.py           86.5% <0%> (+2.77%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 94aadb2...42d1389.

@mih mih merged commit bd1e7b6 into datalad:master Apr 14, 2019
5 checks passed
@mih mih deleted the enh-xmp branch Apr 14, 2019