New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NF: Make multiplexed SSH connections an optional feature #5042
Conversation
Make room for a representation of a non-multiplexed SSH connection. Anticipates a leaner SSHManager too, which does not need to keep track of single-process SSH connections. Also streamline the handling of SSH args. Instead of keeping individual ones, let the constructor(s) lump them into categories "for opening" and "for any call". This seems to be sufficient. Ping datalad#2575
Off on windows, due to current platform limitations, and on anywhere else, by default.
RF MultiplexSSHConnection some more to make this new class absolutely minimal.
233409b
to
37142ea
Compare
Implemented via a central switch datalad.ssh.multiplex-connections.
37142ea
to
5df4fc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Didn't try yet, but from looking through the code - LGTM!
I guess currently missing things like not actually changing PATH for remote execution but directly calling git/annex and replacing Runner better be part of a follow up to keep changes conceivable. |
Codecov Report
@@ Coverage Diff @@
## maint #5042 +/- ##
==========================================
- Coverage 89.97% 89.88% -0.09%
==========================================
Files 292 292
Lines 40742 40799 +57
==========================================
+ Hits 36656 36674 +18
- Misses 4086 4125 +39
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would have been valuable to get at least some basic smoke test explicitly for each (where supported) of the managers... and/(or at least) I would have also made one of the SSH runs of travis to set multiplex-connections to 0 so we have a run through with that manager even on linux.
Left various comments/suggestions, didn't try to run yet
datalad/interface/common_cfg.py
Outdated
'ui': ('question', { | ||
'title': "Whether to use a single shared connection for multiple SSH processes aiming at the same target."}), | ||
'destination': 'global', | ||
'default': False if on_windows else True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'default': False if on_windows else True, | |
'default': not on_windows, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to tune.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will do now and will add travis run for non-multiplexed.
@@ -268,6 +269,13 @@ | |||
'destination': 'global', | |||
'default': None, | |||
}, | |||
'datalad.ssh.multiplex-connections': { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be could also be useful for #4705 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And others, that is the reason why this is not implemented as a windows-specific fix.
return False | ||
|
||
def open(self): | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no docstring here, only in the "original" implementation where it says
bool
True when SSH reports success opening the connection, False when
a ControlMaster for an open connection already exists.
so -- does it mean that we the ControlMaster always exists there? should docs be tuned up or should we introduce returning None
if there is no ControlMaster
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, but it works this way.
datalad/support/sshconnector.py
Outdated
"""Interface for an SSHManager | ||
""" | ||
def ensure_initialized(self): | ||
"""Assures that manager is initialized""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Assures that manager is initialized""" | |
"""Ensures that manager is initialized""" |
;)
------- | ||
BaseSSHConnection | ||
""" | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we start using abc and @abstractmethod
more? not that I foresee other subclasses to come but IMHO getting consistent use of abc
for interface/base classes would be of advantage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with them. But maybe not in the context of an urgent fix
datalad/support/sshconnector.py
Outdated
# by itself | ||
self.open() | ||
@property | ||
def _runner(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unlikely anyone relies on it but especially since PR is against maint
-- what is the reason for making runner
protected ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It made no sense to expose it when building the base class. I you feel like this should be split, I am OK with that.
"""Assures that manager is initialized - knows socket_dir, previous connections | ||
""" | ||
if self._socket_dir is not None: | ||
return | ||
from ..config import ConfigManager | ||
cfg = ConfigManager() | ||
from datalad import cfg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a note to myself: didn't check the logs etc, but may be that original code was on purpose to avoid circular imports? or probably be indeed no longer the case since ensure_initialized
is commented out in the constructor... (code coverage annotation is silent for me for this PR ATM for some reason :-( )
@@ -604,6 +796,11 @@ def close(self, allow_fail=True, ctrl_path=None): | |||
self._connections = dict() | |||
|
|||
|
|||
# retain backward compat with 0.13.4 and earlier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to move that check of cfg from __init__
here and bind those two accordingly on the cfg
setting right here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be stripped also. I don't know of any consumer.
# we presently do not support any interactive authentication | ||
# at the time of process execution | ||
'-o', 'PasswordAuthentication=no', | ||
'-o', 'KbdInteractiveAuthentication=no', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I know that it is "inefficient", "annoying", and might be just infeasible in proper real-life cases etc but why to disable completely ? Especially for maint
... I wonder if may be worth adding a dedicated config option, e.g. datalad.ssh.allow-password
. We could even set it to False right away since it would be hit only on Windows (by default)... That still would allow where/when really needed for password based authentication, even though demanding serial execution, multiple entries, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have discussed that in the call this week. It is not possible for most calls, because stdout/err is consumed by some kind of handler. I am sure somehow this could be made possible. But I don't know how, and I have no plans to implement it, in this context of this urgent fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
|
||
def close(self): | ||
# we perform blocking execution, we should not return from __call__ until | ||
# the connection is already closed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by this comment in this context of pass
content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no open connections (unless a remote process is still running), hence no need to close any, ever. Feel free to remove, if it confuses.
Thanks for the review @yarikoptic Feel free (anyone) to make any changes that you see fit, and still keep it working ;-) I will not be able to get back at this any time soon. Sorry. |
Re travis run comment: That was suggested in the top-comment, and there is a link to another issue suggesting a separate run is a good idea. But again, I think getting this out ASAP has higher prior than most of these aspects. |
On "urgency": FWIW trying it out in Win7 VM with anaconda and python 3.8.5. With or without this PR, if I do have an ssh key setup, I can `datalad clone`, just getting those mux error messages printed out(datalad-dev) c:\tmp>datalad clone smaug.datalad.org:/tmp/testds2
Clone attempt: 0%| | 0.00/1.00 [00:00<?, ? Candidate locations/s]mm_receive_fd: no message header
mux_master_process_new_session: failed to receive fd 0 from slave
mm_receive_fd: no message header
mux_master_process_new_session: failed to receive fd 0 from slave
[INFO] Detected a filesystem without fifo support.
[INFO] Disabling ssh connection caching.
[INFO] Detected a crippled filesystem.
[INFO] Scanning for unlocked files (this may take some time)
[INFO] Entering an adjusted branch where files are unlocked as this filesystem does not support locked files.
[INFO] Switched to branch 'adjusted/master(unlocked)'
mm_receive_fd: no message header
mux_master_process_new_session: failed to receive fd 0 from slave
install(ok): c:\tmp\testds2 (dataset) |
To minimize diff, and retain public API within maint
For consistency in naming with SSHManager and the fact that even with Multiplex someone might have just a single process, so imho NoMultiplex is more pertinent
So if we introduce some logging etc, it is just inherited
Ok, no urgency then. Win7 VMs and Python 3.8.5 for all ;-) And sunglasses. |
I took the invitation and pushed some changes, including to make tests work whenever a non-multiplex ssh is used. |
59df029
to
bb0ca3c
Compare
…as @skip_nomultiplex_ssh
test_ssh_custom_identity_file relies on captured log messages to see if -i is passed to the actual ssh invocation. With multiplexed ssh it would have obtained that log message from "open" which does nothing for non-multiplexed. Instead of adding a dummy log message to open of non-multiplexed, I decided that it might be useful to just log actual ssh command ran for any of them, and what connection had ran it. So I added another log message. Alternative fix could ha been: Used deeper to actually invoke ssh Runner logs only at level 5. Making swallow_logs swallow at level 5 works too, but I like log messages, so went for solution above Also, while at the code I spotted that lgr.debug for open was interpolating the log string, so fixed that as well
bb0ca3c
to
8717da1
Compare
Thx @yarikoptic ! |
FTR -- a few tests were still failing but seems unrelated appveyor - test_create.test_saving_prior======================================================================
ERROR: datalad.core.local.tests.test_create.test_saving_prior
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Miniconda35\envs\test-environment\lib\site-packages\nose\case.py", line 198, in runTest
self.test(*self.arg)
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\tests\utils.py", line 573, in _wrap_with_tree
return t(*(arg + (d,)), **kw)
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\core\local\tests\test_create.py", line 372, in test_saving_prior
ds1 = create(topdir, force=True)
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\interface\utils.py", line 494, in eval_func
return return_func(generator_func)(*args, **kwargs)
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\interface\utils.py", line 482, in return_func
results = list(results)
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\interface\utils.py", line 413, in generator_func
allkwargs):
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\interface\utils.py", line 552, in _process_results
for res in results:
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\core\local\create.py", line 406, in __call__
fake_dates=fake_dates
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\support\repo.py", line 151, in __call__
instance = type.__call__(cls, *new_args, **new_kwargs)
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\support\annexrepo.py", line 276, in __init__
self._init(version=version, description=description)
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\support\annexrepo.py", line 1281, in _init
annex_options=opts)
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\support\annexrepo.py", line 1104, in _run_annex_command
**kwargs)
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\cmd.py", line 144, in run_gitcommand_on_file_list_chunks
results.append(func(cmd, *args, **kwargs))
File "C:\Miniconda35\envs\test-environment\lib\site-packages\datalad\cmd.py", line 509, in run
**results,
datalad.support.exceptions.CommandError: CommandError: '"git" "annex" "init" "-c" "annex.dotfiles=true" "--version" "6"' failed with exitcode 1 under C:\Users\appveyor\AppData\Local\Temp\1\datalad_temp_tree_jqqixz23
-------------------- >> begin captured logging << --------------------
asyncio: DEBUG: Using proactor: IocpProactor
asyncio: DEBUG: taking long time to close proactor
--------------------- >> end captured logging << ---------------------
travis - nfs run test_archives.test_decompress_file('strip',)======================================================================
ERROR: datalad.tests.test_archives.test_decompress_file('strip',)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/nose/case.py", line 198, in runTest
self.test(*self.arg)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/utils.py", line 573, in _wrap_with_tree
return t(*(arg + (d,)), **kw)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/tests/test_archives.py", line 79, in check_decompress_file
leading_directories=leading_directories)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/support/archives.py", line 99, in decompress_file
rmdir(widow_dir)
File "/home/travis/virtualenv/python3.5.6/lib/python3.5/site-packages/datalad/utils.py", line 468, in rmdir
os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/tmp/nfsmount/datalad-rbhc9bsf/simple-extracted/ |;&%b5{}\'"ΔЙקم๗あ .dbtc ' |
The non-trivial conflicts are between master's a575224 (RF: SSH connector code uses WitlessRunner instead of Runner, 2020-07-16) and maint's c728888 (Merge pull request datalad#5042 from mih/enh-sshnomultiplex, 2020-10-16). Resolve these by taking the changes from c728888 and then reintegrating the Runner->WitlessRunner changes.
A new switch
datalad.ssh.multiplex-connections
is introduced (off by default on windows).A leaner manager and a different SSHConnection object are used in the non-multiplexed mode.
Manager and Connection classes have been RF'ed substantially.
Also streamlined the handling of SSH args. Instead of keeping individual ones, let the constructor(s) lump them into categories "for opening" and "for any call". This seems to be sufficient.
ATM this is targeting
maint
to have a chance to get SSH cloning working on windows prior 0.14.0 (e.g. #5031). but the necessary changes are substantial. So this will have to be more than a "fix".TODO
Related separate out SSH test builds #4166
Fixes #2575
Fixes #5031