Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST: Enable filter.annex.process on windows to speed things up #6245

Merged
merged 1 commit into from
Nov 25, 2021

Conversation

mih
Copy link
Member

@mih mih commented Nov 25, 2021

See https://git-annex.branchable.com/bugs/Windows__58___substantial_per-file_cost_for___96__add__96__/ and #5994 for the saga.

Let's see what effect it would have.

Pre:

image

Post:

image

Pretty interesting. A substantial improvement (16% overall), but not evenly across jobs (between 5% and 32%)... No idea what is happening here.

The job that saw the massive speed-up is running:

local/
test_add_archive_content.py
test_add_readme.py
test_addurls.py
test_check_dates.py
test_clean.py
test_copy_file.py
test_download_url.py
test_export_archive.py
test_no_annex.py
test_remove.py
test_rerun_merges.py
test_rerun.py
test_run_procedure.py
test_subdataset.py
test_unlock.py
test_wtf.py

support/
test_annexrepo.py
test_ansi_colors.py
test_cache.py
test_captured_exception.py
test_cookies.py
test_digests.py
test_due_utils.py
test_external_versions.py
test_fileinfo.py
test_gitrepo.py
test_globbedpaths.py
test_json_py.py
test_locking.py
test_network.py
test_parallel.py
test_path.py
test_repodates.py
test_repo_save.py
test_sshconnector.py
test_sshrun.py
test_stats.py
test_status.py
test_vcr_.py

ui/
test_base.py
test_dialog.py

@mih mih added the semver-tests Changes only affect tests, no impact on version label Nov 25, 2021
@mih
Copy link
Member Author

mih commented Nov 25, 2021

Hmm, initially no effect. It seems running git config --global in the appveyor setup has no effect at all. I guess, this is because we internally replace the config effective for the tests.

I will replace with --system for now, but if that works, it would make sense to customize the actual test config in setup_package()

@mih mih force-pushed the tst-filterprocess branch 2 times, most recently from 7020333 to 624a8c9 Compare November 25, 2021 08:24
@codecov
Copy link

codecov bot commented Nov 25, 2021

Codecov Report

Merging #6245 (f6f0f83) into master (2aa0e24) will decrease coverage by 54.25%.
The diff coverage is n/a.

❗ Current head f6f0f83 differs from pull request most recent head 164e2f5. Consider uploading reports for the commit 164e2f5 to get more accurate results
Impacted file tree graph

@@             Coverage Diff             @@
##           master    #6245       +/-   ##
===========================================
- Coverage   89.62%   35.36%   -54.26%     
===========================================
  Files         323      323               
  Lines       41979    41961       -18     
===========================================
- Hits        37624    14841    -22783     
- Misses       4355    27120    +22765     
Impacted Files Coverage Δ
datalad/version.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/wtf.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/addurls.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/tests/test_api.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/tests/test_cmd.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/interface/clean.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/interface/rerun.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/__init__.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/plugin/no_annex.py 0.00% <0.00%> (-100.00%) ⬇️
datalad/support/digests.py 0.00% <0.00%> (-100.00%) ⬇️
... and 280 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2aa0e24...164e2f5. Read the comment docs.

@mih
Copy link
Member Author

mih commented Nov 25, 2021

I am taking this out of draft-mode. It is clearly beneficial. Also it is unclear how exactly it is manifesting itself.

@mih mih marked this pull request as ready for review November 25, 2021 10:36
@mih
Copy link
Member Author

mih commented Nov 25, 2021

I will rerun the appveyor jobs the get an idea of stability

@mih
Copy link
Member Author

mih commented Nov 25, 2021

Runtimes replicate

image

I double-checked that the comparison is to master as of yesterday. So there is really something to it.

I will now change the setup across all platforms.

@mih mih force-pushed the tst-filterprocess branch 2 times, most recently from 6dd2fe0 to 5b13493 Compare November 25, 2021 12:07
@mih
Copy link
Member Author

mih commented Nov 25, 2021

Ubuntu's git-annex is too old for this (8.20210903), but the Mac tests remain more or less, where they were (<5%). So setting this variable is not doing global magic to our tests, but specifically boosts the windows/adjusted mode performance. I will now revert the change back to appveyor-win only.

image

@mih mih merged commit a9f423a into datalad:master Nov 25, 2021
@mih mih deleted the tst-filterprocess branch November 25, 2021 20:09
@adswa
Copy link
Member

adswa commented Nov 26, 2021

The huge speed up in WinP39a2 is impressive, and just for the record, I can replicate it. On my Windows computer I see substantial speed ups with the config compared to without the config in support and local

Command With config Without config Speed-up by a factor of
support 829 / 834 sec 2500 / 1890 sec 2.2-3
local 931 / 1115/ 804 sec 1999 / 1740 sec 1.7-2.2
ui 2.6 / 2.6 sec 2.6 / 2.5 sec -

@mih
Copy link
Member Author

mih commented Nov 26, 2021

That is impressive. Also interesting is the range from 75min down to 30 min. Much more impact than on appveyor, it seems the per-process cost is much more expensive on that machine than on appveyor, but the overall performance is higher!

@adswa
Copy link
Member

adswa commented Nov 26, 2021

Another anecdote: I ran the complete test suite under windows in 106 minutes on this machine just now. It would not be an exaggeration to say that this easily took a full night on the previous system and without the new configuration

@yarikoptic
Copy link
Member

great to see the speedup. My only worry is that since such setting is not default on windows machines in the wild, it might result in us testing in a scenarios even more different than in the wild. I guess we could at least keep not tuned setup within datalad/git-annex , but would need to see #6237 resolved to get it green on master.

@mih
Copy link
Member Author

mih commented Nov 29, 2021

I am in favor of datalad setting this automatically on windows, unless it is explicitely set otherwise.... But not yet. First we need to get some more experience with it.

@adswa
Copy link
Member

adswa commented Nov 29, 2021

I was thinking to add it as an optional configuration into the handbooks installation instructions for now.

@mih
Copy link
Member Author

mih commented Nov 29, 2021

It would be hard to justify a default slowdown of 3x without any downsides.

@adswa
Copy link
Member

adswa commented Nov 29, 2021

It would be hard to justify a default slowdown of 3x without any downsides.

Is this a "yes" to include it in the handbook?

@mih
Copy link
Member Author

mih commented Nov 29, 2021

Sorry, yes, inclusion in the handbook is definitely useful. But on top of that we should aim for an optimized default. It is planned for git-annex anyways (v9 or late v8). We might want to jump earlier.

@adswa
Copy link
Member

adswa commented Nov 29, 2021

Sorry, yes, inclusion in the handbook is definitely useful.

datalad-handbook/book#791 done

But on top of that we should aim for an optimized default. It is planned for git-annex anyways (v9 or late v8). We might want to jump earlier.

I agree

@da5nsy
Copy link

da5nsy commented Jul 24, 2023

@mih

It is planned for git-annex anyways (v9 or late v8)

Did this happen? (I don't know the system well enough to know where to look)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-tests Changes only affect tests, no impact on version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants