-
Notifications
You must be signed in to change notification settings - Fork 120
Make SSH remote annex bundle sensing configurable and off by default #6533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously, this feature was enabled unconditionally for any use of
`datalad sshrun`. We use this command as `GIT_SSH_COMMAND` and also
configure it to be used by git-annex. Consequently, it is used for
pretty much any SSH-related functionality.
This change introduces a new config switch to enable/disable this
feature, and the defaults of the underlying methods have been adjusted
to honor this switch.
As can be seen below, this has the potential to increase remote
execution performance substantially (here 500ms or 20+% for a call to
`uname`. The observed cost is per each call to `sshrun`.
```
===> multitime results
1: datalad -c datalad.ssh.try-use-annex-bundled-git=true sshrun judac.fz-juelich.de uname
Mean Std.Dev. Min Median Max
real 2.530 0.126 2.367 2.526 2.703
user 0.266 0.015 0.253 0.263 0.294
sys 0.081 0.008 0.071 0.081 0.094
===> multitime results
1: datalad -c datalad.ssh.try-use-annex-bundled-git=false sshrun judac.fz-juelich.de uname
Mean Std.Dev. Min Median Max
real 1.899 0.024 1.869 1.905 1.932
user 0.271 0.013 0.249 0.274 0.288
sys 0.064 0.018 0.041 0.064 0.093
```
The default of this config switch is OFF (in contrast to the previous
default behavior). Rational: In general it is nohow guaranteed that
- there will be a git-annex installation on the remote end there
- a potentially existing bundled Git is "better" (could be missing
system-provided security fixes)
Fixes datalad#6530
|
Code Climate has analyzed commit e7f800a and detected 1 issue on this pull request. Here's the issue category breakdown:
View more on Code Climate. |
|
nice speed up. Overall I am for the change unless we again encounter too many cases where setting needs to be set (then I would vote to revert to opt-out). filed related #6536 which could be done after. sample of failing tests on travis -- might need that config used, heh======================================================================
FAIL: datalad.support.tests.test_annexrepo.test_annex_ssh
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
return func(*args, **kwargs)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 307, in _wrap_skip_nomultiplex_ssh
return func(*args, **kwargs)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
return t(*(arg + (filename,)), **kw)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/support/tests/test_annexrepo.py", line 1229, in test_annex_ssh
ok_(exists(socket_1))
AssertionError: None
======================================================================
FAIL: datalad.support.tests.test_gitrepo.test_GitRepo_ssh_fetch
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
return func(*args, **kwargs)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 307, in _wrap_skip_nomultiplex_ssh
return func(*args, **kwargs)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
return t(*(arg + (filename,)), **kw)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
return t(*(arg + (filename,)), **kw)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/support/tests/test_gitrepo.py", line 598, in test_GitRepo_ssh_fetch
assert_in(socket_path, list(map(str, ssh_manager._connections)))
AssertionError: '/home/travis/.cache/datalad/sockets/c09a210a' not found in ['/home/travis/.cache/datalad/sockets/30e8eaf7', '/home/travis/.cache/datalad/sockets/c194b5b0', '/home/travis/.cache/datalad/sockets/5cc59571']
======================================================================
FAIL: datalad.support.tests.test_gitrepo.test_GitRepo_ssh_push
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
return func(*args, **kwargs)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 307, in _wrap_skip_nomultiplex_ssh
return func(*args, **kwargs)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
return t(*(arg + (filename,)), **kw)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
return t(*(arg + (filename,)), **kw)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/support/tests/test_gitrepo.py", line 637, in test_GitRepo_ssh_push
assert_in(socket_path, list(map(str, ssh_manager._connections)))
AssertionError: '/home/travis/.cache/datalad/sockets/c09a210a' not found in ['/home/travis/.cache/datalad/sockets/30e8eaf7', '/home/travis/.cache/datalad/sockets/c194b5b0', '/home/travis/.cache/datalad/sockets/5cc59571']
======================================================================
FAIL: datalad.support.tests.test_sshconnector.test_ssh_custom_identity_file
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 214, in _wrap_skip_if_on_windows
return func(*args, **kwargs)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
return func(*args, **kwargs)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/support/tests/test_sshconnector.py", line 307, in test_ssh_custom_identity_file
ok_(exists(expected_socket))
AssertionError: None
-------------------- >> begin captured logging << --------------------
datalad.support.sshconnector: DEBUG: Opening MultiplexSSHConnection(ctrl_path=<<PosixPath('/ho++39 chars++c0')>>, sshri=SSHRI(hostname='datalad-test')) by calling ['ssh', '-i', '/tmp/dl-test-ssh-id', '-fN', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=15m', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/b6bbb9c0', 'datalad-test']
datalad.support.sshconnector: DEBUG: MultiplexSSHConnection(ctrl_path=<<PosixPath('/ho++39 chars++c0')>>, sshri=SSHRI(hostname='datalad-test')) is used to run ['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/b6bbb9c0', 'datalad-test', 'echo blah']
datalad.runner.runner: DEBUG: Run ['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/b6bbb9c0', 'datalad-test', 'echo blah'] (cwd=None)
datalad.runner.runner: DEBUG: Finished ['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/b6bbb9c0', 'datalad-test', 'echo blah'] with status 0
--------------------- >> end captured logging << ---------------------
======================================================================
FAIL: datalad.support.tests.test_sshconnector.test_ssh_manager_close
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
return func(*args, **kwargs)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 307, in _wrap_skip_nomultiplex_ssh
return func(*args, **kwargs)
File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/support/tests/test_sshconnector.py", line 166, in test_ssh_manager_close
ok_(exists(opj(str(manager.socket_dir),
AssertionError: None
|
|
This is all green, minus the "traditional" Mac failures. So, importantly, no changes to any CI setup were necessary. |
Previously, this feature was enabled unconditionally for any use of
datalad sshrun. We use this command asGIT_SSH_COMMANDand alsoconfigure it to be used by git-annex. Consequently, it is used for
pretty much any SSH-related functionality.
This change introduces a new config switch to enable/disable this
feature, and the defaults of the underlying methods have been adjusted
to honor this switch.
As can be seen below, this has the potential to increase remote
execution performance substantially (here 500ms or 20+% for a call to
uname. The observed cost is per each call tosshrun.The default of this config switch is OFF (in contrast to the previous
default behavior). Rational: In general it is nohow guaranteed that
system-provided security fixes)
Another reason is that in the case of non-multiplexed connections, even the auth-step is duplicated for sensing. See #6524 (comment)
Changelog
💫 Enhancements and new features
datalad.ssh.try-use-annex-bundled-git=yes|nocan be used to influence the default remote git-annex bundle sensing for SSH connections. This was previously done unconditionally for any call todatalad sshrun(which is also used for any SSH-related Git or git-annex functionality triggered by DataLad-internal processing) and could incur a substantial per-call runtime cost. The new default is to not perform this sensing, because for, e.g., use as GIT_SSH_COMMAND there is no expectation to have a remote git-annex installation, and even with an existing git-annex/Git bundle on the remote, it is not certain that the bundled Git version is to be preferred over any other Git installation in a user's PATH. Fixes Relevance of SSHuse_remote_annex_bundle#6530