mgr/orchestrator: use deepcopy for copying exceptions #32881

Merged
merged 3 commits into from Feb 6, 2020

tchaikov
Contributor

@tchaikov tchaikov commented Jan 26, 2020

mgr/orchestrator: use deepcopy for copying exceptions

since rexec module has been removed in python3, we cannot use it anymore.

Fixes: https://tracker.ceph.com/issues/43657
Signed-off-by: Kefu Chai <kchai@redhat.com>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard backend
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox

@tchaikov tchaikov changed the title from "mgr/orchestrator: load relative module also" to "mgr/orchestrator: use deepcopy for copying exceptions" Jan 29, 2020
@tchaikov
Contributor Author

@liewegas that's a different issue. i updated the PR to address it.

Contributor

@sebastian-philipp sebastian-philipp left a comment

Doesn't work. See my comment to https://tracker.ceph.com/issues/43913

@liewegas
Member

see also https://tracker.ceph.com/issues/43913 ?

@tchaikov
Contributor Author

tchaikov commented Jan 30, 2020

Doesn't work. See my comment to https://tracker.ceph.com/issues/43913

i think you are talking about a different issue which is not directly related to my fix.

@tchaikov
Contributor Author

@liewegas i replied in the tracker.

@tchaikov
Contributor Author

tchaikov commented Jan 31, 2020

2020-01-30T18:15:15.870 INFO:tasks.ceph.mgr.x.smithi042.stderr:Warning: Permanently added 'smithi042.front.sepia.ceph.com,172.21.15.42' (ECDSA) to the list of known hosts.
2020-01-30T18:15:15.925 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.932 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.939 INFO:tasks.ceph.mgr.x.smithi042.stderr:root@smithi042.front.sepia.ceph.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
2020-01-30T18:15:16.819 INFO:tasks.ceph.mgr.x.smithi042.stderr:2020-01-30T18:15:16.815+0000 7efe14962700 -1 mgr handle_command module 'orchestrator_cli' command handler threw exception: -F /tmp/cephadm-conf-3ttqeyxn root@smithi042.front.sepia.ceph.com
2020-01-30T18:15:16.821 INFO:tasks.ceph.mgr.x.smithi042.stderr:2020-01-30T18:15:16.816+0000 7efe14962700 -1 mgr.server reply reply (22) Invalid argument Traceback (most recent call last):
2020-01-30T18:15:16.821 INFO:tasks.ceph.mgr.x.smithi042.stderr:  File "/usr/share/ceph/mgr/mgr_module.py", line 1069, in _handle_command
2020-01-30T18:15:16.821 INFO:tasks.ceph.mgr.x.smithi042.stderr:    return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
2020-01-30T18:15:16.821 INFO:tasks.ceph.mgr.x.smithi042.stderr:  File "/usr/share/ceph/mgr/mgr_module.py", line 309, in call
2020-01-30T18:15:16.821 INFO:tasks.ceph.mgr.x.smithi042.stderr:    return self.func(mgr, **kwargs)
2020-01-30T18:15:16.821 INFO:tasks.ceph.mgr.x.smithi042.stderr:  File "/usr/share/ceph/mgr/orchestrator.py", line 141, in wrapper
2020-01-30T18:15:16.822 INFO:tasks.ceph.mgr.x.smithi042.stderr:    return func(*args, **kwargs)
2020-01-30T18:15:16.822 INFO:tasks.ceph.mgr.x.smithi042.stderr:  File "/usr/share/ceph/mgr/orchestrator_cli/module.py", line 164, in _add_host
2020-01-30T18:15:16.822 INFO:tasks.ceph.mgr.x.smithi042.stderr:    orchestrator.raise_if_exception(completion)
2020-01-30T18:15:16.822 INFO:tasks.ceph.mgr.x.smithi042.stderr:  File "/usr/share/ceph/mgr/orchestrator.py", line 638, in raise_if_exception
2020-01-30T18:15:16.822 INFO:tasks.ceph.mgr.x.smithi042.stderr:    raise e
2020-01-30T18:15:16.822 INFO:tasks.ceph.mgr.x.smithi042.stderr:execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-3ttqeyxn root@smithi042.front.sepia.ceph.com

mgr.x was not able to establish an ssh connection to smithi042. see http://pulpito.ceph.com/kchai-2020-01-31_03:20:11-rados-wip-sage-testing-2020-01-29-1034-distro-basic-smithi/

@tchaikov
Contributor Author

so, when logged in as "ubuntu", one cannot jump to another test node as "root".

kchai@teuthology:~$ ssh ubuntu@smithi089
Warning: Permanently added 'smithi089,172.21.15.89' (ECDSA) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Wed Dec 11 14:50:52 2019 from 172.21.0.51
[ubuntu@smithi089 ~]$ ssh root@smithi063
Warning: Permanently added 'smithi063,172.21.15.63' (ECDSA) to the list of known hosts.
root@smithi063's password:

[ubuntu@smithi089 ~]$ ssh ubuntu@smithi063
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Wed Dec 11 14:50:52 2019 from 172.21.0.51

one way to fix this is to push the pub key to the host to be added by "ceph orchestrator host add".
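
To reproduce the check in the transcript above without an interactive password prompt, something like the following sketch could be used. It is only an illustration: `build_ssh_probe` and `can_login` are hypothetical helper names, not part of ceph or teuthology, and the sketch assumes an OpenSSH client is on the PATH. `BatchMode=yes` makes ssh fail instead of asking for a password, so a missing pubkey shows up as a non-zero exit status.

```python
import subprocess


def build_ssh_probe(user, host, timeout=5):
    """Build an ssh command line that only succeeds with key-based login.

    Hypothetical helper for illustration; not part of ceph or teuthology.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never prompt for a password
        "-o", f"ConnectTimeout={timeout}",  # give up quickly on unreachable hosts
        f"{user}@{host}",
        "true",                             # trivial remote command
    ]


def can_login(user, host):
    """Return True if a non-interactive login as user@host succeeds."""
    result = subprocess.run(build_ssh_probe(user, host), capture_output=True)
    return result.returncode == 0
```

From the "ubuntu" account on smithi089, `can_login("root", "smithi063")` would be expected to return False until the pubkey has been pushed to that host, while `can_login("ubuntu", "smithi063")` should succeed.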

@sebastian-philipp
Contributor

so, when logged in as "ubuntu", one cannot jump to another test node as "root".

kchai@teuthology:~$ ssh ubuntu@smithi089
Warning: Permanently added 'smithi089,172.21.15.89' (ECDSA) to the list of known hosts.
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Wed Dec 11 14:50:52 2019 from 172.21.0.51
[ubuntu@smithi089 ~]$ ssh root@smithi063
Warning: Permanently added 'smithi063,172.21.15.63' (ECDSA) to the list of known hosts.
root@smithi063's password:

[ubuntu@smithi089 ~]$ ssh ubuntu@smithi063
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Wed Dec 11 14:50:52 2019 from 172.21.0.51

one way to fix this is to push the pub key to the host to be added by "ceph orchestrator host add".

So, this seems to be the root cause for the host_ls error?

@tchaikov
Contributor Author

tchaikov commented Feb 3, 2020

yes. the default user of cephadm is root. and actually qa/tasks/cephadm.py does push the pubkey to all the managed hosts before testing cephadm.

@tchaikov tchaikov force-pushed the wip-43657 branch 2 times, most recently from abb64ea to 988fc9a Compare February 5, 2020 09:39
@sebastian-philipp
Contributor

sebastian-philipp commented Feb 5, 2020

My impression is that the deepcopy change doesn't actually fix anything (mainly because this code is executed only in the error case). What about extracting the other two commits into a new PR and then running this through QA?

@tchaikov
Contributor Author

tchaikov commented Feb 5, 2020

[20:02:37]  <kefu>	hi SebastianW deepcopy does address the failure to "my_cls = getattr(sys.modules[r_cls.__module__], r_cls.__name__)"
[20:03:05]  <kefu>	because "execnet.gateway_bootstrap" does not exist in sys.modules.
[20:03:34]  <kefu>	if you take a closer look at the description of https://tracker.ceph.com/issues/43657.
[20:03:41]  <kefu>	you will see what i am referencing.
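
A minimal sketch of the difference described in the chat log above, under stated assumptions: `RemoteError` is a made-up stand-in for an exception class whose `__module__` is not registered in `sys.modules` (as reported for `execnet.gateway_bootstrap`), and `copy_by_lookup` / `copy_by_deepcopy` are hypothetical names, not the actual functions in `orchestrator.py`. It only illustrates why a `sys.modules` lookup can fail where `copy.deepcopy` does not.

```python
import copy
import sys

# Stand-in exception class whose defining module is deliberately absent
# from sys.modules. The module name is invented for this illustration.
RemoteError = type(
    "HostNotFound", (Exception,), {"__module__": "fake_remote.bootstrap"}
)


def copy_by_lookup(exc):
    """Re-create the exception by resolving its class through sys.modules."""
    r_cls = type(exc)
    my_cls = getattr(sys.modules[r_cls.__module__], r_cls.__name__)
    return my_cls(*exc.args)


def copy_by_deepcopy(exc):
    """Copy the exception without resolving its class by name."""
    return copy.deepcopy(exc)


original = RemoteError("root@smithi042.front.sepia.ceph.com")

try:
    copy_by_lookup(original)
    lookup_failed = False
except KeyError:
    # "fake_remote.bootstrap" is not a key in sys.modules
    lookup_failed = True

clone = copy_by_deepcopy(original)
```

`copy.deepcopy` rebuilds the object from the class object it already holds (via `__reduce_ex__`), so it never needs the module to be importable by name.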

Contributor

@sebastian-philipp sebastian-philipp left a comment

the failed test has been removed in the latest changeset.

yay!

@tchaikov
Contributor Author

tchaikov commented Feb 5, 2020

@sebastian-philipp updated the commit message.

@tchaikov
Contributor Author

tchaikov commented Feb 5, 2020

jenkins test make check

since rexec module has been removed in python3, we cannot use it
anymore.

Fixes: https://tracker.ceph.com/issues/43657
Signed-off-by: Kefu Chai <kchai@redhat.com>

this test will end with a failure like

```
2020-01-30T18:15:15.870 INFO:tasks.ceph.mgr.x.smithi042.stderr:Warning: Permanently added 'smithi042.front.sepia.ceph.com,172.21.15.42' (ECDSA) to the list of known hosts.
2020-01-30T18:15:15.925 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.932 INFO:tasks.ceph.mgr.x.smithi042.stderr:Permission denied, please try again.
2020-01-30T18:15:15.939 INFO:tasks.ceph.mgr.x.smithi042.stderr:root@smithi042.front.sepia.ceph.com: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
```

because mgr is not able to establish an ssh connection to that host with "root".
please note, the teuthology worker is acting using the "ubuntu" account on the
test node, and by default, "root" does not have its pubkey. and actually
`qa/tasks/cephadm.py` does push the pubkey to all the managed hosts before
testing cephadm.

since `qa/tasks/cephadm.py` is a better test for cephadm, let's just
drop this one.

as suites/rados/cephadm already covers cephadm

Signed-off-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
@tchaikov tchaikov merged commit 9805fee into ceph:master Feb 6, 2020
@tchaikov tchaikov deleted the wip-43657 branch February 6, 2020 08:05
3 participants