Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SSH remote annex bundle sensing configurable and off by default #6533

Merged
merged 1 commit into from
Mar 11, 2022

Conversation

mih
Copy link
Member

@mih mih commented Mar 10, 2022

Previously, this feature was enabled unconditionally for any use of
datalad sshrun. We use this command as GIT_SSH_COMMAND and also
configure it to be used by git-annex. Consequently, it is used for
pretty much any SSH-related functionality.

This change introduces a new config switch to enable/disable this
feature, and the defaults of the underlying methods have been adjusted
to honor this switch.

As can be seen below, this has the potential to increase remote
execution performance substantially (here 500ms or 20+% for a call to
uname. The observed cost is per each call to sshrun.

===> multitime results
1: datalad -c datalad.ssh.try-use-annex-bundled-git=true sshrun judac.fz-juelich.de uname
            Mean        Std.Dev.    Min         Median      Max
real        2.530       0.126       2.367       2.526       2.703
user        0.266       0.015       0.253       0.263       0.294
sys         0.081       0.008       0.071       0.081       0.094

===> multitime results
1: datalad -c datalad.ssh.try-use-annex-bundled-git=false sshrun judac.fz-juelich.de uname
            Mean        Std.Dev.    Min         Median      Max
real        1.899       0.024       1.869       1.905       1.932
user        0.271       0.013       0.249       0.274       0.288
sys         0.064       0.018       0.041       0.064       0.093

The default of this config switch is OFF (in contrast to the previous
default behavior). Rational: In general it is nohow guaranteed that

  • there will be a git-annex installation on the remote end there
  • a potentially existing bundled Git is "better" (could be missing
    system-provided security fixes)

Another reason is that in the case of non-multiplexed connections, even the auth-step is duplicated for sensing. See #6524 (comment)

Changelog

馃挮 Enhancements and new features

  • A new configuration setting datalad.ssh.try-use-annex-bundled-git=yes|no can be used to influence the default remote git-annex bundle sensing for SSH connections. This was previously done unconditionally for any call to datalad sshrun (which is also used for any SSH-related Git or git-annex functionality triggered by DataLad-internal processing) and could incur a substantial per-call runtime cost. The new default is to not perform this sensing, because for, e.g., use as GIT_SSH_COMMAND there is no expectation to have a remote git-annex installation, and even with an existing git-annex/Git bundle on the remote, it is not certain that the bundled Git version is to be preferred over any other Git installation in a user's PATH. Fixes Relevance of SSH use_remote_annex_bundle聽#6530

Previously, this feature was enabled unconditionally for any use of
`datalad sshrun`. We use this command as `GIT_SSH_COMMAND` and also
configure it to be used by git-annex. Consequently, it is used for
pretty much any SSH-related functionality.

This change introduces a new config switch to enable/disable this
feature, and the defaults of the underlying methods have been adjusted
to honor this switch.

As can be seen below, this has the potential to increase remote
execution performance substantially (here 500ms or 20+% for a call to
`uname`. The observed cost is per each call to `sshrun`.

```
===> multitime results
1: datalad -c datalad.ssh.try-use-annex-bundled-git=true sshrun judac.fz-juelich.de uname
            Mean        Std.Dev.    Min         Median      Max
real        2.530       0.126       2.367       2.526       2.703
user        0.266       0.015       0.253       0.263       0.294
sys         0.081       0.008       0.071       0.081       0.094

===> multitime results
1: datalad -c datalad.ssh.try-use-annex-bundled-git=false sshrun judac.fz-juelich.de uname
            Mean        Std.Dev.    Min         Median      Max
real        1.899       0.024       1.869       1.905       1.932
user        0.271       0.013       0.249       0.274       0.288
sys         0.064       0.018       0.041       0.064       0.093
```

The default of this config switch is OFF (in contrast to the previous
default behavior). Rational: In general it is nohow guaranteed that

- there will be a git-annex installation on the remote end there
- a potentially existing bundled Git is "better" (could be missing
  system-provided security fixes)

Fixes datalad#6530
@codeclimate
Copy link

codeclimate bot commented Mar 10, 2022

Code Climate has analyzed commit e7f800a and detected 1 issue on this pull request.

Here's the issue category breakdown:

Category Count
Security 1

View more on Code Climate.

@mih mih added team-core core API/commands (https://github.com/datalad/datalad/issues/6365) semver-minor Increment the minor version when merged labels Mar 10, 2022
@mih
Copy link
Member Author

mih commented Mar 10, 2022

Need to rework after #6532 is merged. This will simplify going the last mile by quite a bit.

... Actually a simple rebase on #6532 already fixes things, so I will just wait for that merge to happen.

@yarikoptic
Copy link
Member

nice speed up. Overall I am for the change unless we again encounter too many cases where setting needs to be set (then I would vote to revert to opt-out). filed related #6536 which could be done after.

sample of failing tests on travis -- might need that config used, heh
======================================================================
FAIL: datalad.support.tests.test_annexrepo.test_annex_ssh
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
    return func(*args, **kwargs)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 307, in _wrap_skip_nomultiplex_ssh
    return func(*args, **kwargs)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/support/tests/test_annexrepo.py", line 1229, in test_annex_ssh
    ok_(exists(socket_1))
AssertionError: None
======================================================================
FAIL: datalad.support.tests.test_gitrepo.test_GitRepo_ssh_fetch
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
    return func(*args, **kwargs)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 307, in _wrap_skip_nomultiplex_ssh
    return func(*args, **kwargs)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/support/tests/test_gitrepo.py", line 598, in test_GitRepo_ssh_fetch
    assert_in(socket_path, list(map(str, ssh_manager._connections)))
AssertionError: '/home/travis/.cache/datalad/sockets/c09a210a' not found in ['/home/travis/.cache/datalad/sockets/30e8eaf7', '/home/travis/.cache/datalad/sockets/c194b5b0', '/home/travis/.cache/datalad/sockets/5cc59571']
======================================================================
FAIL: datalad.support.tests.test_gitrepo.test_GitRepo_ssh_push
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
    return func(*args, **kwargs)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 307, in _wrap_skip_nomultiplex_ssh
    return func(*args, **kwargs)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 874, in _wrap_with_tempfile
    return t(*(arg + (filename,)), **kw)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/support/tests/test_gitrepo.py", line 637, in test_GitRepo_ssh_push
    assert_in(socket_path, list(map(str, ssh_manager._connections)))
AssertionError: '/home/travis/.cache/datalad/sockets/c09a210a' not found in ['/home/travis/.cache/datalad/sockets/30e8eaf7', '/home/travis/.cache/datalad/sockets/c194b5b0', '/home/travis/.cache/datalad/sockets/5cc59571']
======================================================================
FAIL: datalad.support.tests.test_sshconnector.test_ssh_custom_identity_file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 214, in _wrap_skip_if_on_windows
    return func(*args, **kwargs)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
    return func(*args, **kwargs)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/support/tests/test_sshconnector.py", line 307, in test_ssh_custom_identity_file
    ok_(exists(expected_socket))
AssertionError: None
-------------------- >> begin captured logging << --------------------
datalad.support.sshconnector: DEBUG: Opening MultiplexSSHConnection(ctrl_path=<<PosixPath('/ho++39 chars++c0')>>, sshri=SSHRI(hostname='datalad-test')) by calling ['ssh', '-i', '/tmp/dl-test-ssh-id', '-fN', '-o', 'ControlMaster=auto', '-o', 'ControlPersist=15m', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/b6bbb9c0', 'datalad-test']
datalad.support.sshconnector: DEBUG: MultiplexSSHConnection(ctrl_path=<<PosixPath('/ho++39 chars++c0')>>, sshri=SSHRI(hostname='datalad-test')) is used to run ['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/b6bbb9c0', 'datalad-test', 'echo blah']
datalad.runner.runner: DEBUG: Run ['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/b6bbb9c0', 'datalad-test', 'echo blah'] (cwd=None)
datalad.runner.runner: DEBUG: Finished ['ssh', '-o', 'ControlPath=/home/travis/.cache/datalad/sockets/b6bbb9c0', 'datalad-test', 'echo blah'] with status 0
--------------------- >> end captured logging << ---------------------
======================================================================
FAIL: datalad.support.tests.test_sshconnector.test_ssh_manager_close
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 288, in _wrap_skip_ssh
    return func(*args, **kwargs)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/tests/utils.py", line 307, in _wrap_skip_nomultiplex_ssh
    return func(*args, **kwargs)
  File "/tmp/dl-miniconda-f_dn50d1/lib/python3.9/site-packages/datalad/support/tests/test_sshconnector.py", line 166, in test_ssh_manager_close
    ok_(exists(opj(str(manager.socket_dir),
AssertionError: None

@mih
Copy link
Member Author

mih commented Mar 11, 2022

This is all green, minus the "traditional" Mac failures. So, importantly, no changes to any CI setup were necessary.

@mih mih merged commit eeb2ccc into datalad:master Mar 11, 2022
@mih mih deleted the bf-6530 branch March 11, 2022 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver-minor Increment the minor version when merged team-core core API/commands (https://github.com/datalad/datalad/issues/6365)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Relevance of SSH use_remote_annex_bundle
2 participants