New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr/cephadm: replace execnet and remoto with asyncssh #42051
Conversation
Ok, form my POV, this looks ok as it is. If we can get all the various checks to pass and get it through teuthology, this is ready. |
I think we need someone to start working on the asyncssh packaging for EL8 as this package doesn't exist at the moment. Debian and Ubuntu already have it [1][2] [1] https://packages.debian.org/search?keywords=python3-asyncssh |
i just created a copr repo at https://copr.fedorainfracloud.org/coprs/tchaikov/python3-asyncssh/, which offers
to integrate this package into the ceph image, we might need to create a change like ceph/ceph-container#1821. in the long run, probably a better way to address this dependency is probably to maintain these packages in EPEL8. |
30e19ce
to
ca2b23a
Compare
jenkins test make check |
make check failures appear related to #42425
|
jenkins test make check |
Thanks for taking care of this @tchaikov I'm preparing a PR for ceph-container. I just have a question about the python3-libnacl dependency. Currently the dependency isn't installed automatically when installing python3-asyncssh. Is this expected ? # dnf install python3-asyncssh
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 0:00:09 ago on Thu Jul 22 15:44:49 2021.
Dependencies resolved.
==============================================================================================================================================================================================================================================
Package Architecture Version Repository Size
==============================================================================================================================================================================================================================================
Installing:
python3-asyncssh noarch 2.7.0-1.el8 copr:copr.fedorainfracloud.org:tchaikov:python3-asyncssh 476 k
Installing dependencies:
python3-cffi x86_64 1.11.5-5.el8 baseos 237 k
python3-cryptography x86_64 3.2.1-4.el8 baseos 559 k
python3-ply noarch 3.9-9.el8 baseos 111 k
python3-pycparser noarch 2.14-14.el8 baseos 109 k
Transaction Summary
==============================================================================================================================================================================================================================================
Install 5 Packages
Total download size: 1.5 M
Installed size: 7.2 M
Is this ok [y/N]: n
Operation aborted. I can see that libnacl is optional and only required for some scenario [1] which I think we're not using. Am I wrong ? @melissa-kun-li are we using anything specific from [1] here ? If we need to install this dependency explicitly then this should be done at package level (spec file and debian/control) and we need to update the requirements.txt file with something like [1] https://asyncssh.readthedocs.io/en/latest/#optional-extras |
yes, it is expected. see https://src.fedoraproject.org/rpms/python-asyncssh/blob/rawhide/f/python-asyncssh.spec#_35 . it depends on if we want to support the Ed25519 key signing algorithm. although it's a recommended key signing algorithm nowadays, it might not be that popular. as but, if user uses Ed25519 for SSH key signing,
|
I might be wrong, but it seems openssl 1.1.1 already support ed25519 keys and libnacl is only required (as per the asyncssh doc) when using openssl prior 1.1.1b. CentOS 8 uses 1.1.1g https://asyncssh.readthedocs.io/en/stable/api.html#key-exchange-algorithms |
@dsavineau thank you for reading the document! indeed. if we have the luxury of using 1.1.1b and up, seems we are able to have access to the support of all the pub key types. so in our use case, where CentOS8 is the target platform, |
thanks @tchaikov and @dsavineau !
i don't think we need anything from the asyncssh optional extras. i think |
0c1e9e8
to
028d165
Compare
…Centos8 Disable GSSAPIAuthentication on the RHEL8 and Centos8 testnodes for this PR: ceph/ceph#42051 Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
… asyncssh connection errors Create asyncssh connection object in async `_remote_connection` function and cache in `self.cons` Create a handler for asyncssh log redirection and output ssh log if a connection error occurs Disable asyncssh logger from propagating because the asyncssh info messages are verbose Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
_execute_command will run commands over ssh using the asyncssh `run` method: https://asyncssh.readthedocs.io/en/latest/api.html#asyncssh.SSHClientConnection.run _check_execute_command will check the output of _execute_command and raise OrchestratorError if command fails on the remote host. All commands run over ssh are prepended with sudo in `_execute_command` and shell-escaped with shlex quote. If the cached ssh connection is closed or broken, the connection object will be removed from the cache, added to the `offline_hosts`, and an OrchestratorError will be raised. On the next call, the connection object will attempt to be recreated. Exceptions involving asyncssh methods should be handled otherwise errors like TypeError: __init__() missing 1 required positional argument: 'reason' could occur due to the asyncssh error interacting with `raise_if_exception` Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
…on remote host _write_remote_file uses _check_execute_command in ssh.py which calls _execute_command which uses shlex quote. Thus, any commands with an int will need to be transformed into a str because shlex quote does not take int objects Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
… results of the async functions with get_result The EventLoopThread class starts a thread and an event loop which runs forever. Coroutines are scheduled on the event loop by the `get_result` method which uses `run_coroutine_threadsafe` to return a concurrent.futures.Future, and ultimately the result with .result() Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
…s in ssh.py Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
…ve.py with write_remote_file in ssh.py remove remotes.py because it is specific to execnet/remoto. _write_remote_file in ssh.py now fulfills the function of write_file in remotes.py and the old _write_remote_file in serve.py Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
…k_execute_command in _run_cephadm remove _get_connection from module.py and _remote_connection in serve.py, replacing with _remote_connection in ssh.py. also, replace remoto.process.check with _execute_command and _check_execute_command in ssh.py Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
…binary_conn _deploy_cephadm_binary can be simplified to call write_remote_file in ssh.py, and _deploy_cephadm_binary_conn is no longer needed Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
add `event_loop` and `tkey` object to with_cephadm_module, and create MockEventLoopThread in fixtures.py to test async functions of ssh.py. rewrite test_offline to be compatible with asyncssh Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
remove test_stale_connections because it applies to remoto Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
028d165
to
791d8be
Compare
Looks like asyncssh was attempting GSSAPI auth and failing somewhere, causing the two CancelledErrors for GSSAPI keyex and GSSAPI-with-mic : https://gist.github.com/melissa-kun-li/fd62b90c8c15e153bf7d9fed20d69f50. So I disabled it from attempting GSSAPI auth in the asyncssh.connect() call ( |
review was stale. Thank you Kefu!
asyncssh is headed to EPEL 8 in https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-9e9ea1117b |
This PR will replace
execnet
andremoto
withasyncssh
implemented in thessh
module. Theasyncssh
debug log will be redirected to the raised OrchestratorError if an ssh connection error occurs with the remote host.remotes.py
is also removed and replaced with_write_remote_file
inssh.py
A thread is started from the
EventLoopThread
class inssh.py
, which also starts the event loop for running the async functions_remote_connection
,_execute_command
,_check_execute_command
, and_write_remote_file
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li li.melissa.kun@gmail.com
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox