
quincy: mon,auth,cephadm: support auth key rotation #48093

Merged
merged 27 commits into from Feb 8, 2023


rzarzynski
Contributor

This is the quincy backport of the PR #43655.

Backport ticket: https://tracker.ceph.com/issues/57541.

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

liewegas and others added 27 commits September 14, 2022 16:15
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit fa8ad55)
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 6139bb4)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit d54c49d)
Add commands to create, clear, or commit pending_key.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 9ed2162)
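As a rough illustration of the pending_key lifecycle this commit adds commands for, the sketch below models one auth entity with a current key and an optional pending key. `AuthEntry`, `new_secret`, and the method names are hypothetical. They do not mirror the monitor's real data structures or the new commands' names, and the key format is not Ceph's actual encoding.

```python
import base64
import os
from dataclasses import dataclass
from typing import Optional


def new_secret() -> str:
    # Illustrative only: 16 random bytes, base64-encoded (not Ceph's exact key encoding).
    return base64.b64encode(os.urandom(16)).decode()


@dataclass
class AuthEntry:
    """Hypothetical model of one auth entity with a current and a pending key."""
    key: str
    pending_key: Optional[str] = None

    def create_pending(self) -> str:
        # Create a pending key only if none exists (get-or-create, a choice of this sketch).
        # The current key stays valid while the rotation is in progress.
        if self.pending_key is None:
            self.pending_key = new_secret()
        return self.pending_key

    def clear_pending(self) -> None:
        # Abandon the rotation; the current key is untouched.
        self.pending_key = None

    def commit_pending(self) -> None:
        # Promote the pending key to be the current key.
        if self.pending_key is None:
            raise ValueError("no pending key to commit")
        self.key, self.pending_key = self.pending_key, None


entry = AuthEntry(key=new_secret())
pending = entry.create_pending()  # the old key remains the current key for now
entry.commit_pending()            # entry.key is now the former pending key
```

Keeping the old key valid until the commit step is what lets a rotation be staged, distributed, and then either committed or cleared.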
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit cb8c7f6)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit c3562e9)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 39da18b)
Only the async_call got this before.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 07ad8df)
Rotate the live auth key for a running daemon without restarting.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 5cf7944)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 9fc4dc1)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit be9020a)
Also, leave out the caps.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 34ba1a5)
This writes the key to the osd_key in the block device label.

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 0bf78de)
These messages are distracting.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit b723bd0)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 68abdc2)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 2eedae9)
Caveats:
- only works with osd, mds, mgr so far
- sometimes we have to restart the daemon

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 8ca919f)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 4916fd2)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 48f8c8a)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit ae45f1e)
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 772e426)
This is mostly pointless, *except* that after a key rotation it ensures
that the new key is used immediately (and the pending_key is committed by
the mon).

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 84c4562)
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 99d3a59)
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit c3bed0a)
The issue was introduced in commit 98b89120321059397798170f7ae2bf7c64e4f4b2.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit 708e5e8)
This is a fixup for: mgr/cephadm: add daemon 'rotate-key' action

The way we rotate the mgr's secret requires:

1) writing the new pending key to the mgr's keyring file,
2) restarting the mgr via the `mgr fail` mon command.

Unfortunately, we might be doing the first step incorrectly.
`_create_daemon()` is a coroutine (Python's `async def`),
but we don't `await` it. If I understand the documentation
correctly, calling it this way has no effect, so the mgr is
restarted with the old key.

  "Note that simply calling a coroutine will not schedule
  it to be executed"

See: https://docs.python.org/3/library/asyncio-task.html#id1

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit e9b9641)
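The failure mode described in this commit can be reproduced in miniature. In the sketch below, `FakeMgr`, `write_keyring` (a stand-in for the async `_create_daemon()`), `rotate_buggy`, and `rotate_fixed` are all hypothetical, and `FakeMgr.fail()` only models the effect of `mgr fail`: the daemon restarts and re-reads its keyring.

```python
import asyncio


class FakeMgr:
    """Hypothetical stand-in for a mgr daemon: a keyring on disk plus the key it runs with."""

    def __init__(self, key: str) -> None:
        self.keyring_file = key  # key currently written to the keyring
        self.running_key = key   # key the running daemon authenticates with

    def fail(self) -> None:
        # Models `mgr fail`: the daemon restarts and re-reads its keyring.
        self.running_key = self.keyring_file


async def write_keyring(mgr: FakeMgr, new_key: str) -> None:
    # Hypothetical stand-in for the async _create_daemon() that rewrites the keyring.
    await asyncio.sleep(0)
    mgr.keyring_file = new_key


async def rotate_buggy(mgr: FakeMgr, new_key: str) -> None:
    coro = write_keyring(mgr, new_key)  # only creates a coroutine object; its body never runs
    coro.close()  # the real bug just drops it, and Python later warns "never awaited"
    mgr.fail()    # restart happens with the old keyring


async def rotate_fixed(mgr: FakeMgr, new_key: str) -> None:
    await write_keyring(mgr, new_key)  # step 1 completes...
    mgr.fail()                         # ...before step 2 restarts the daemon


m1 = FakeMgr("old")
asyncio.run(rotate_buggy(m1, "new"))  # m1.running_key stays "old"
m2 = FakeMgr("old")
asyncio.run(rotate_fixed(m2, "new"))  # m2.running_key becomes "new"
```

Awaiting the write (or, from synchronous code, submitting the coroutine to a running event loop and waiting for its result) is what guarantees the keyring is updated before the restart.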
This test needs to be updated to account for the
new keyring information being introduced.

Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit b2085f0)
@rzarzynski rzarzynski requested review from a team as code owners September 14, 2022 16:49
@adk3798
Contributor

adk3798 commented Dec 7, 2022

https://pulpito.ceph.com/adking-2022-12-07_15:13:51-orch:cephadm-wip-adk3-testing-2022-12-05-1317-quincy-distro-default-smithi/

  • 2 failures in test_cephadm task due to a master -> main change not having been backported yet (backport is now open but was not
  • 1 odd failure where test_nfs timed out, seemingly because of an issue with deployment of an mds service
2022-12-07T00:00:14.679 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]: 2022-12-07T00:00:14.397+0000 7fe452257700 -1 log_channel(cephadm) log [ERR] : Failed to apply mds.a spec MDSSpec.from_json(yaml.safe_load('''service_type: mds
2022-12-07T00:00:14.679 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]: service_id: a
2022-12-07T00:00:14.680 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]: service_name: mds.a
2022-12-07T00:00:14.680 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]: placement:
2022-12-07T00:00:14.680 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:   count: 2
2022-12-07T00:00:14.680 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]: ''')): cephadm exited with an error code: 1, stderr: ERROR: Daemon not found: mds.a.smithi055.ystyid. See `cephadm ls`
2022-12-07T00:00:14.681 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]: Traceback (most recent call last):
2022-12-07T00:00:14.681 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 508, in _apply_all_services
2022-12-07T00:00:14.681 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:     if self._apply_service(spec):
2022-12-07T00:00:14.681 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 837, in _apply_service
2022-12-07T00:00:14.682 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:     self._remove_daemon(d.name(), d.hostname)
2022-12-07T00:00:14.682 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1245, in _remove_daemon
2022-12-07T00:00:14.682 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:     host, name, 'rm-daemon', args))
2022-12-07T00:00:14.683 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:   File "/usr/share/ceph/mgr/cephadm/module.py", line 605, in wait_async
2022-12-07T00:00:14.683 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:     return self.event_loop.get_result(coro)
2022-12-07T00:00:14.683 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 48, in get_result
2022-12-07T00:00:14.683 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:     return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
2022-12-07T00:00:14.684 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:   File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in result
2022-12-07T00:00:14.684 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:     return self.__get_result()
2022-12-07T00:00:14.684 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:   File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2022-12-07T00:00:14.684 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:     raise self._exception
2022-12-07T00:00:14.685 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1391, in _run_cephadm
2022-12-07T00:00:14.685 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]:     f'cephadm exited with an error code: {code}, stderr: {err}')
2022-12-07T00:00:14.685 INFO:journalctl@ceph.mgr.a.smithi055.stdout:Dec 07 00:00:14 smithi055 conmon[119979]: orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr: ERROR: Daemon not found: mds.a.smithi055.ystyid. See `cephadm ls`

The issue was not reproducible: the other two test_nfs tests in the run, plus 5 reruns of the exact test that failed, all passed (https://pulpito.ceph.com/adking-2022-12-07_15:13:51-orch:cephadm-wip-adk3-testing-2022-12-05-1317-quincy-distro-default-smithi/).

Overall, I don't think anything here should block merging. The first two failures are expected, and the last one looks unrelated to the PRs in the run (and obviously doesn't happen in most runs). I'll keep watching in case I see it again.
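The traceback above passes through `wait_async()` and `get_result()`, which hand a coroutine to an event loop running in another thread via `asyncio.run_coroutine_threadsafe()` and then block on the returned future. The sketch below assumes that structure (it is modelled on the traceback lines, not copied from cephadm); `EventLoopThread`, `run_cephadm`, and the local `OrchestratorError` are stand-ins. It shows why an error raised inside the coroutine surfaces through `concurrent/futures/_base.py`: `Future.result()` re-raises the coroutine's exception in the calling thread.

```python
import asyncio
import threading
from typing import Any, Coroutine


class OrchestratorError(Exception):
    """Local stand-in for orchestrator._interface.OrchestratorError."""


class EventLoopThread:
    """Hypothetical helper: an asyncio loop running forever in a daemon thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def get_result(self, coro: Coroutine[Any, Any, Any]) -> Any:
        # Schedule the coroutine on the loop thread and block until it finishes.
        # If the coroutine raises, result() re-raises that exception here.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    def stop(self) -> None:
        # Ask the loop to stop, wait for its thread to exit, then release the loop.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def run_cephadm(code: int, err: str) -> str:
    # Hypothetical stand-in for serve.py's _run_cephadm: fail on a non-zero exit code.
    if code != 0:
        raise OrchestratorError(f'cephadm exited with an error code: {code}, stderr: {err}')
    return 'ok'


runner = EventLoopThread()
print(runner.get_result(run_cephadm(0, '')))
try:
    runner.get_result(run_cephadm(1, 'ERROR: Daemon not found'))
except OrchestratorError as e:
    print(f'caught in calling thread: {e}')
runner.stop()
```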

@adk3798
Contributor

adk3798 commented Dec 7, 2022

jenkins retest this please

@adk3798
Contributor

adk3798 commented Dec 7, 2022

@yuriw can you include this PR next time you do a RADOS run on quincy? The orch run looked good, so just needs RADOS now. Thanks.

@rzarzynski
Contributor Author

jenkins retest this please

@ljflores
Contributor

ljflores commented Feb 6, 2023

Rados suite review: https://pulpito.ceph.com/?branch=wip-yuri7-testing-2023-01-30-1510-quincy

Failures, unrelated:
1. https://tracker.ceph.com/issues/58585
2. https://tracker.ceph.com/issues/58146
3. https://tracker.ceph.com/issues/58560
4. https://tracker.ceph.com/issues/58265

Details:
1. rook: failed to pull kubelet image - Ceph - Orchestrator
2. test_cephadm.sh: Error: Error initializing source docker://quay.ceph.io/ceph-ci/ceph:master - Ceph - Orchestrator
3. test_envlibrados_for_rocksdb.sh failed to subscribe to repo - Ceph
4. TestClsRbd.group_snap_list_max_read failure during upgrade/parallel tests - Ceph - RBD

@ljflores ljflores merged commit f693ee2 into ceph:quincy Feb 8, 2023
9 of 12 checks passed