Use --include=* or --anything instead of --copies 0 to speed up get_content_annexinfo #7230
4603ccc to 710b692
The macOS failure is due to an outdated mac build image.
…t_annexinfo
Apparently `--copies 0` could result in up to a 10x (or more) penalty when running find or findref.
--include=* was recommended by Joey in https://git-annex.branchable.com/todo/add_--all___40__or_alike__41___to_find_and_findref/
Closes datalad#7038
…support
Since there has been no release between those dates, I think such a comparison
- should be safe
- would allow us to immediately take advantage of this OPT while testing datalad/git-annex against datalad master.
Here are some timings of all 3 possible options:
    ❯ pwd
    /home/yoh/datalad/dandi/dandisets/000026
    ❯ time git annex find --copies 0 | wc -l
    20575
    git annex find --copies 0  12.67s user 1.17s system 120% cpu 11.513 total
    wc -l  0.01s user 0.10s system 0% cpu 11.513 total
    ❯ time git annex find --include='*' | wc -l
    20575
    git annex find --include='*'  1.18s user 0.14s system 134% cpu 0.984 total
    wc -l  0.01s user 0.02s system 3% cpu 0.984 total
    ❯ time git annex find --anything | wc -l
    20575
    git annex find --anything  0.71s user 0.18s system 157% cpu 0.567 total
    wc -l  0.02s user 0.01s system 5% cpu 0.566 total
So --anything is almost twice as fast as --include=*, so worth it.
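For anyone who wants to repeat the comparison on another dataset, here is a rough sketch of a timing helper. It is not part of this PR: `time_command`, `compare_find_options`, and `CANDIDATES` are made-up names, and it measures wall-clock time only, so it will not reproduce the user/system breakdown that the shell's `time` printed in the timings above.

```python
import subprocess
import time

# The three matching options compared in the timings above.
CANDIDATES = (["--copies", "0"], ["--include=*"], ["--anything"])


def time_command(cmd, cwd=None):
    """Run cmd and return (elapsed_seconds, number_of_stdout_lines)."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd, cwd=cwd, capture_output=True, text=True, check=True
    )
    elapsed = time.perf_counter() - start
    return elapsed, len(proc.stdout.splitlines())


def compare_find_options(repo_path):
    """Time `git annex find` with each candidate option inside repo_path.

    Requires git-annex on PATH; --anything only exists in recent builds,
    so older installations will raise CalledProcessError for that entry.
    """
    results = {}
    for opts in CANDIDATES:
        cmd = ["git", "annex", "find", *opts]
        results[" ".join(opts)] = time_command(cmd, cwd=repo_path)
    return results
```

Calling `compare_find_options("/path/to/dataset")` returns a dict mapping each option string to `(seconds, file_count)`; the file counts should agree across options if they really select the same set of files.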
710b692 to a57f461
Great -- thanks! I rebased. The main point is that "it works", so we should proceed with this PR. I also added now support for new I think we are ready, taking out of the draft -- your feedback is very welcome! I hope it makes the cut for 0.18.0.
Codecov Report
Base: 88.73% // Head: 88.73% // Decreases project coverage by -0.01%
Additional details and impacted files
@@ Coverage Diff @@
## master #7230 +/- ##
==========================================
- Coverage 88.73% 88.73% -0.01%
==========================================
Files 325 325
Lines 44124 44184 +60
Branches 5867 5880 +13
==========================================
+ Hits 39154 39205 +51
- Misses 4955 4964 +9
Partials 15 15
Hm, the crippled one freaked out: which has no "track" in our issue tracker, besides that it used to fail and was disabled on Windows: #5126. Rerunning that job now.
bpoldrack
left a comment
I'm a bit confused. The PR title and the diff don't quite fit - there's no --largerthan.
Changed approach or incomplete PR, @yarikoptic ?
bpoldrack
left a comment
Other than that confusion, looks good to me!
So, if you feel no changelog required (which is fine with me), I'm ready to merge this.
Thank you @bpoldrack. Indeed, I changed the approach, and it does need a changelog.
Code Climate has analyzed commit ef96e8a and detected 0 issues on this pull request.
PR released in |
Apparently `--copies 0` could result in up to 10 (or more) penalty of running find or findref.
TODO
- `--anything` which was added in 10.20221212-17-g0b2dd374d on Dec 20 (so we can compare against `10.20221220` version) -- but yet to wait for datalad/git-annex get that build
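
The version check described in the TODO could look roughly like the sketch below. This is an illustration only, not the code from this PR's diff: `parse_annex_version`, `match_all_option`, and `ANYTHING_MIN_VERSION` are hypothetical names, and the threshold simply follows the `10.20221220` comparison suggested above.

```python
import re

# Assumption taken from the PR text: treat builds versioned 10.20221220
# or later as having --anything available.
ANYTHING_MIN_VERSION = "10.20221220"


def parse_annex_version(version):
    """Turn e.g. '10.20221212-17-g0b2dd374d' into (10, 20221212).

    Only the leading '<major>.<date>' part is used; any '-N-g<sha>'
    suffix is ignored.
    """
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized git-annex version: {version!r}")
    return int(m.group(1)), int(m.group(2))


def match_all_option(version):
    """Pick the cheapest 'match every file' option for this git-annex."""
    if parse_annex_version(version) >= parse_annex_version(ANYTHING_MIN_VERSION):
        return ["--anything"]
    # Fall back to the option recommended for builds without --anything.
    return ["--include=*"]
```

Note that a development build reporting `10.20221212-17-g0b2dd374d` falls back to `--include=*` under this comparison even though it already contains `--anything`; that is the conservative side of comparing against the `10.20221220` version string.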