Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/cephadm: replace execnet and remoto with asyncssh #42051

Merged
merged 14 commits into from Aug 23, 2021

Conversation

melissa-kun-li
Copy link
Contributor

@melissa-kun-li melissa-kun-li commented Jun 28, 2021

This PR will replace execnet and remoto with asyncssh implemented in the ssh module. The asyncssh debug log will be redirected to the raised OrchestratorError if an ssh connection error occurs with the remote host. remotes.py is also removed and replaced with _write_remote_file in ssh.py

A thread is started from the EventLoopThread class in ssh.py, which also starts the event loop for running the async functions _remote_connection, _execute_command, _check_execute_command, and _write_remote_file

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li li.melissa.kun@gmail.com

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

src/pybind/mgr/cephadm/module.py Show resolved Hide resolved
src/pybind/mgr/cephadm/ssh.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/module.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/module.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/serve.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/serve.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/serve.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/ssh.py Show resolved Hide resolved
src/pybind/mgr/cephadm/ssh.py Outdated Show resolved Hide resolved
src/pybind/mgr/cephadm/ssh.py Outdated Show resolved Hide resolved
@sebastian-philipp
Copy link
Contributor

Ok, form my POV, this looks ok as it is. If we can get all the various checks to pass and get it through teuthology, this is ready.

@dsavineau
Copy link
Contributor

I think we need someone to start working on the asyncssh packaging for EL8 as this package doesn't exist at the moment.

Debian and Ubuntu already have it [1][2]

[1] https://packages.debian.org/search?keywords=python3-asyncssh
[2] https://packages.ubuntu.com/search?keywords=python3-asyncssh

@tchaikov tchaikov self-assigned this Jul 19, 2021
@tchaikov
Copy link
Contributor

I think we need someone to start working on the asyncssh packaging for EL8 as this package doesn't exist at the moment.

Debian and Ubuntu already have it [1][2]

[1] https://packages.debian.org/search?keywords=python3-asyncssh
[2] https://packages.ubuntu.com/search?keywords=python3-asyncssh

i just created a copr repo at https://copr.fedorainfracloud.org/coprs/tchaikov/python3-asyncssh/, which offers python3-libnacl and python3-asyncssh. python3-libnacl is a build-time and run-time dependency of python3-asyncssh. to install python3-asyncssh, we need to

  1. sudo dnf -y copr enable tchaikov/python3-asyncssh epel-8-$(arch)
  2. sudo dnf -y install python3-asyncssh

to integrate this package into the ceph image, we might need to create a change like ceph/ceph-container#1821.

in the long run, probably a better way to address this dependency is probably to maintain these packages in EPEL8.

@melissa-kun-li melissa-kun-li force-pushed the asyncssh branch 2 times, most recently from 30e19ce to ca2b23a Compare July 21, 2021 05:11
@melissa-kun-li
Copy link
Contributor Author

jenkins test make check

@tchaikov tchaikov removed their assignment Jul 21, 2021
@mgfritch
Copy link
Contributor

make check failures appear related to #42425

../src/test/rgw/test_rgw_dmclock_scheduler.cc:425: Failure
Value of: context.stopped()
  Actual: false
Expected: true

@melissa-kun-li
Copy link
Contributor Author

jenkins test make check

@melissa-kun-li melissa-kun-li changed the title [WIP] mgr/cephadm: replace execnet and remoto with asyncssh mgr/cephadm: replace execnet and remoto with asyncssh Jul 21, 2021
@dsavineau
Copy link
Contributor

dsavineau commented Jul 22, 2021

i just created a copr repo at https://copr.fedorainfracloud.org/coprs/tchaikov/python3-asyncssh/, which offers python3-libnacl and python3-asyncssh. python3-libnacl is a build-time and run-time dependency of python3-asyncssh. to install python3-asyncssh, we need to

  1. sudo dnf -y copr enable tchaikov/python3-asyncssh epel-8-$(arch)
  2. sudo dnf -y install python3-asyncssh

to integrate this package into the ceph image, we might need to create a change like ceph/ceph-container#1821.

in the long run, probably a better way to address this dependency is probably to maintain these packages in EPEL8.

Thanks for taking care of this @tchaikov

I'm preparing a PR for ceph-container.

I just have a question about the python3-libnacl dependency. Currently the dependency isn't installed automatically when installing python3-asyncssh. Is this expected ?

# dnf install python3-asyncssh
Failed to set locale, defaulting to C.UTF-8
Last metadata expiration check: 0:00:09 ago on Thu Jul 22 15:44:49 2021.
Dependencies resolved.
==============================================================================================================================================================================================================================================
 Package                                               Architecture                            Version                                        Repository                                                                                 Size
==============================================================================================================================================================================================================================================
Installing:
 python3-asyncssh                                      noarch                                  2.7.0-1.el8                                    copr:copr.fedorainfracloud.org:tchaikov:python3-asyncssh                                  476 k
Installing dependencies:
 python3-cffi                                          x86_64                                  1.11.5-5.el8                                   baseos                                                                                    237 k
 python3-cryptography                                  x86_64                                  3.2.1-4.el8                                    baseos                                                                                    559 k
 python3-ply                                           noarch                                  3.9-9.el8                                      baseos                                                                                    111 k
 python3-pycparser                                     noarch                                  2.14-14.el8                                    baseos                                                                                    109 k

Transaction Summary
==============================================================================================================================================================================================================================================
Install  5 Packages

Total download size: 1.5 M
Installed size: 7.2 M
Is this ok [y/N]: n
Operation aborted.

I can see that libnacl is optional and only required for some scenario [1] which I think we're not using. Am I wrong ?

@melissa-kun-li are we using anything specific from [1] here ?

If we need to install this dependency explicitly then this should be done at package level (spec file and debian/control) and we need to update the requirements.txt file with something like asynssh[libnacl]

[1] https://asyncssh.readthedocs.io/en/latest/#optional-extras

@tchaikov
Copy link
Contributor

i just created a copr repo at https://copr.fedorainfracloud.org/coprs/tchaikov/python3-asyncssh/, which offers python3-libnacl and python3-asyncssh. python3-libnacl is a build-time and run-time dependency of python3-asyncssh. to install python3-asyncssh, we need to

  1. sudo dnf -y copr enable tchaikov/python3-asyncssh epel-8-$(arch)
  2. sudo dnf -y install python3-asyncssh

to integrate this package into the ceph image, we might need to create a change like ceph/ceph-container#1821.
in the long run, probably a better way to address this dependency is probably to maintain these packages in EPEL8.

Thanks for taking care of this @tchaikov

I'm preparing a PR for ceph-container.

I just have a question about the python3-libnacl dependency. Currently the dependency isn't installed automatically when installing python3-asyncssh. Is this expected ?

yes, it is expected. see https://src.fedoraproject.org/rpms/python-asyncssh/blob/rawhide/f/python-asyncssh.spec#_35 .

it depends on if we want to support the Ed25519 key signing algorithm. although it's a recommended key signing algorithm nowadays, it might not be that popular. as ssh-keygen generates an RSA key by default.

but, if user uses Ed25519 for SSH key signing, python3-asyncssh alone does not work. if we want to be safe, i'd suggest install python3-libnacl as well.

I can see that libnacl is optional and only required for some scenario [1] which I think we're not using. Am I wrong ?

@melissa-kun-li are we using anything specific from [1] here ?

If we need to install this dependency explicitly then this should be done at package level (spec file and debian/control) and we need to update the requirements.txt file with something like asynssh[libnacl]

[1] https://asyncssh.readthedocs.io/en/latest/#optional-extras

@dsavineau
Copy link
Contributor

it depends on if we want to support the Ed25519 key signing algorithm. although it's a recommended key signing algorithm nowadays, it might not be that popular. as ssh-keygen generates an RSA key by default.

but, if user uses Ed25519 for SSH key signing, python3-asyncssh alone does not work. if we want to be safe, i'd suggest install python3-libnacl as well.

I might be wrong, but it seems openssl 1.1.1 already support ed25519 keys and libnacl is only required (as per the asyncssh doc) when using openssl prior 1.1.1b. CentOS 8 uses 1.1.1g

https://asyncssh.readthedocs.io/en/stable/api.html#key-exchange-algorithms

@tchaikov
Copy link
Contributor

@dsavineau thank you for reading the document! indeed. if we have the luxury of using 1.1.1b and up, seems we are able to have access to the support of all the pub key types. so in our use case, where CentOS8 is the target platform, python3-asyncssh alone would suffice.

@melissa-kun-li
Copy link
Contributor Author

thanks @tchaikov and @dsavineau !

@melissa-kun-li are we using anything specific from [1] here ?

If we need to install this dependency explicitly then this should be done at package level (spec file and debian/control) and we need to update the requirements.txt file with something like asynssh[libnacl]

[1] https://asyncssh.readthedocs.io/en/latest/#optional-extras

i don't think we need anything from the asyncssh optional extras. i think python3-bcrypt is already installed in the ceph container

ceph.spec.in Show resolved Hide resolved
… asyncssh connection errors

Create asyncssh connection object in async `_remote_connection` function and cache in `self.cons`
Create a handler for asyncssh log redirection and output ssh log if a connection error occurs
Disable asyncssh logger from propagating because the asyncssh info messages are verbose

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
_execute_command will run commands over ssh using the asyncssh `run` method: https://asyncssh.readthedocs.io/en/latest/api.html#asyncssh.SSHClientConnection.run
_check_execute_command will check the output of _execute_command and raise OrchestratorError if command fails on the remote host.
All commands run over ssh are prepended with sudo in `_execute_command` and shell-escaped with shlex quote.
If the cached ssh connection is closed or broken, the connection object will be removed from the cache, added to the `offline_hosts`, and an OrchestratorError will be raised. On the next call, the connection object will attempt to be recreated.
Exceptions involving asyncssh methods should be handled otherwise errors like TypeError: __init__() missing 1 required positional argument: 'reason' could occur due to the asyncssh error interacting with `raise_if_exception`

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
…on remote host

_write_remote_file uses _check_execute_command in ssh.py which calls _execute_command which uses shlex quote. Thus, any commands with an int will need to be transformed into a str because shlex quote does not take int objects

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
… results of the async functions with get_result

The EventLoopThread class starts a thread and an event loop which runs forever. Coroutines are scheduled on the event loop by the `get_result` method which uses `run_coroutine_threadsafe` to return a concurrent.futures.Future, and ultimately the result with .result()

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
…s in ssh.py

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
…ve.py with write_remote_file in ssh.py

remove remotes.py because it is specific to execnet/remoto.
_write_remote_file in ssh.py now fulfills the function of write_file in remotes.py and the old _write_remote_file in serve.py

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
…k_execute_command in _run_cephadm

remove _get_connection from module.py and _remote_connection in serve.py, replacing with _remote_connection in ssh.py.
also, replace remoto.process.check with _execute_command and _check_execute_command in ssh.py

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
…binary_conn

_deploy_cephadm_binary can be simplified to call write_remote_file in ssh.py, and _deploy_cephadm_binary_conn is no longer needed

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
add `event_loop` and `tkey` object to with_cephadm_module, and create MockEventLoopThread in fixtures.py to test async functions of ssh.py.
rewrite test_offline to be compatible with asyncssh

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
remove test_stale_connections because it applies to remoto

Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
@melissa-kun-li
Copy link
Contributor Author

https://pulpito.ceph.com/adking-2021-08-20_22:03:22-rados:cephadm-wip-adk-testing-2021-08-20-1531-distro-basic-smithi/

Looks like asyncssh was attempting GSSAPI auth and failing somewhere, causing the two CancelledErrors for GSSAPI keyex and GSSAPI-with-mic : https://gist.github.com/melissa-kun-li/fd62b90c8c15e153bf7d9fed20d69f50. So I disabled it from attempting GSSAPI auth in the asyncssh.connect() call ( preferred_auth=['publickey'] ) and now it looks like tests are passing for Centos 8.3. In this case, users can still have GSSAPIAuthentication yes in their sshd_config or ssh_config and use GSSAPI for other things on their machine, asyncssh / cephadm will just not attempt it. We don't need GSSAPI auth because we're using publickey

@sebastian-philipp sebastian-philipp dismissed tchaikov’s stale review August 23, 2021 12:28

review was stale. Thank you Kefu!

@sebastian-philipp sebastian-philipp merged commit fb7cb4b into ceph:master Aug 23, 2021
@ktdreyer
Copy link
Member

ktdreyer commented Sep 3, 2021

asyncssh is headed to EPEL 8 in https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-9e9ea1117b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
9 participants