Make XMP extractor faster (fixes gh-3328) #3329

Merged
merged 2 commits into from
Apr 14, 2019

mih
Member

@mih mih commented Apr 13, 2019

Massive performance boost from no longer testing EVERY file in a dataset for XMP metadata; a pre-filter based on file name extension now selects the candidate files instead.

This halves(!) the time to aggregate metadata for openfmri (where pretty much all datasets have this extractor enabled).
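
For readers unfamiliar with the approach, here is a minimal sketch of an extension-based pre-filter. It is not the code from this PR: the extension set and the function names are hypothetical, chosen only to illustrate the idea of skipping files that are unlikely to carry XMP before any expensive inspection happens.

```python
from pathlib import Path

# Hypothetical set of extensions for formats that commonly embed XMP.
# The actual list used by the extractor may differ.
XMP_CANDIDATE_EXTENSIONS = frozenset(
    {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".mp3", ".mp4"}
)


def might_contain_xmp(path, extensions=XMP_CANDIDATE_EXTENSIONS):
    """Cheap check: does the file name end in a candidate extension?

    Only the last suffix is compared, case-insensitively, so
    'scan.nii.gz' is judged by '.gz' and 'plot.PDF' by '.pdf'.
    """
    return Path(path).suffix.lower() in extensions


def prefilter(paths, extensions=XMP_CANDIDATE_EXTENSIONS):
    """Return only the paths worth handing to the (slow) XMP parser."""
    return [p for p in paths if might_contain_xmp(p, extensions)]


if __name__ == "__main__":
    files = [
        "sub-01/anat/sub-01_T1w.nii.gz",
        "figures/plot.PDF",
        "README",
        "stimuli/face01.jpg",
    ]
    # Only the PDF and the JPEG would be opened and inspected.
    print(prefilter(files))
```

The trade-off is that a file carrying XMP under an unlisted extension is skipped, so the extension set has to cover the formats that matter for the datasets at hand.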

Timing in datalad/datalad-revolution#84 (comment)

@mih mih added performance Improve performance of an existing feature TERRIFIC! labels Apr 13, 2019
@mih mih changed the title Enh xmp Make XMP extractor faster Apr 13, 2019
@mih mih changed the title Make XMP extractor faster Make XMP extractor faster (fixes gh-3328) Apr 13, 2019
@codecov

codecov bot commented Apr 13, 2019

Codecov Report

Merging #3329 into master will increase coverage by 0.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #3329      +/-   ##
==========================================
+ Coverage   91.14%   91.15%   +0.01%     
==========================================
  Files         263      263              
  Lines       34246    34250       +4     
==========================================
+ Hits        31212    31222      +10     
+ Misses       3034     3028       -6

| Impacted Files | Coverage Δ |
|---|---|
| datalad/metadata/extractors/xmp.py | 92.59% <100%> (-1.41%) ⬇️ |
| datalad/downloaders/http.py | 86.5% <0%> (+2.77%) ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 94aadb2...42d1389.

@mih mih merged commit bd1e7b6 into datalad:master Apr 14, 2019
@mih mih deleted the enh-xmp branch April 14, 2019 09:50