
mgr/cephadm: nvme-of follow up work #52691

Merged: 8 commits merged into ceph:main on Aug 15, 2023
Conversation

@adk3798 (Contributor) commented Jul 28, 2023

The first piece of the nvme-of deployment work in cephadm was already merged into main. This PR is a follow-up that fixes a handful of issues found in the original implementation.

This depends on having an nvme-of container with the changes being proposed in ceph/ceph-nvmeof#166. For now, I have a temporary commit that points the default nvme-of image to an image on my personal quay with the changes present. That should be removed before merging (the signed-off-by check will fail as long as that commit is present).

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

adk3798 requested a review from a team as a code owner on July 28, 2023 at 21:35
@phlogistonjohn (Contributor) left a comment

Generally LGTM; happy to approve once the PR is out of WIP state. A few minor feedback items follow.

Review threads on src/pybind/mgr/cephadm/services/nvmeof.py (outdated, resolved) and src/pybind/mgr/cephadm/serve.py (resolved).
- cephadm.shell:
    host.a:
      - ceph osd pool create foo
      - rbd pool init foo
Contributor commented:

Just curious, what is the difference between the above and ceph osd pool application enable foo rbd?

adk3798 (Contributor Author) replied:

Very little, if anything at all. One "initializes" a pool for rbd use and the other lets you "enable" the use of rbd on the pool (from their descriptions). I think we can use them interchangeably in this context.
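
For reference, here is a minimal python-rados sketch, assuming a reachable cluster and the pool foo from the snippet above; it issues the application-enable variant through librados, which per the discussion marks the pool for rbd use in essentially the same way as rbd pool init foo. This is illustrative only, not code from the PR.

    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        # Same effect as the CLI `ceph osd pool application enable foo rbd`
        ret, outbuf, outs = cluster.mon_command(json.dumps({
            'prefix': 'osd pool application enable',
            'pool': 'foo',
            'app': 'rbd',
        }), b'')
        assert ret == 0, outs
    finally:
        cluster.shutdown()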

@phlogistonjohn (Contributor) commented:

The current changes LGTM, modulo the temporary container image hack. Once there's a usable upstream build with the change and you drop that temporary commit, please ping me for final approval!

Commit messages:

This is going to be used as the rados_id
to be set when connecting to the cluster using
the keyring we generate for the nvmeof daemon.

The python librados library defaults the name
to "client.admin" and so if we don't provide
a name or rados_id, we'll only be able to
use nvmeof with the "client.admin" keyring.

Signed-off-by: Adam King <adking@redhat.com>
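
To illustrate the point about rados_id, a minimal python-rados sketch; the id "nvmeof.foo" and the keyring path below are hypothetical stand-ins for whatever cephadm generates for the daemon.

    import rados

    # With no name/rados_id, python librados authenticates as client.admin:
    #   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')

    # Passing rados_id makes librados use the daemon's own identity instead
    # ("client." is prepended automatically, giving client.nvmeof.foo):
    cluster = rados.Rados(
        conffile='/etc/ceph/ceph.conf',
        rados_id='nvmeof.foo',  # hypothetical daemon id
        conf={'keyring': '/etc/ceph/ceph.client.nvmeof.foo.keyring'},
    )
    cluster.connect()
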
Before, we were just using the client.admin keyring
as a temporary workaround while we figured out
how to get the keyring to work. We should swap
over to using the keyring we actually generated
for the nvmeof daemon.

Signed-off-by: Adam King <adking@redhat.com>
The ok-to-stop function works for certain daemons
by checking if there are at least a certain number
(typically 1) of daemons actually running,
and saying it's not ok-to-stop if that won't
be true after the removals. This case breaks down
when all the daemons are in error state, making
it so cephadm will refuse to remove a set of
daemons that aren't even working because they're
not "ok to stop". Since ok-to-stop works in a
yes-or-no fashion, for a case like this where we
want to be willing to remove a certain subset
(or potentially all) of the currently deployed
daemons, it's easier to keep this logic as part
of applying the service.

Signed-off-by: Adam King <adking@redhat.com>
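
An illustrative sketch of the yes/no counting check the commit above describes; this is not cephadm's actual code, and the Daemon class and status values are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class Daemon:
        name: str
        status: str  # 'running' or 'error'

    def ok_to_stop(daemons, to_remove, min_running=1):
        # Count daemons that would still be running after the removals
        running = {d.name for d in daemons if d.status == 'running'}
        remaining = running - {d.name for d in to_remove}
        return len(remaining) >= min_running

    # If every daemon is in error state, zero are "running", so the check
    # refuses the removal even though nothing working would be lost:
    broken = [Daemon('nvmeof.a', 'error'), Daemon('nvmeof.b', 'error')]
    print(ok_to_stop(broken, broken))  # False
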
Similar to what is done for iscsi, a basic deployment
test to make sure we can deploy the daemon and
it comes up in running state with no issues.

Signed-off-by: Adam King <adking@redhat.com>
Rather than giving full admin privileges,
try to be a bit more strict by limiting it
to profile rbd mon caps and full OSD
privileges for rbd tagged pools. I also wanted
to include an OSD cap like

allow all pool="*" object_prefix "nvmeof.state"

but this caused a failure in the nvme-of daemon

RADOS permission error (Failed to operate write op for oid nvmeof.None.state)

Signed-off-by: Adam King <adking@redhat.com>
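
For context, a hedged sketch of the cap profile this commit describes; the exact strings cephadm generates may differ, and the entity name is hypothetical.

    # Caps for a generated entity like client.nvmeof.foo (hypothetical name):
    caps = {
        'mon': 'profile rbd',
        # full OSD privileges, restricted to pools tagged with the
        # rbd application:
        'osd': 'allow all tag rbd *=*',
    }
    # The stricter OSD cap mentioned above, which broke the daemon:
    #   allow all pool="*" object_prefix "nvmeof.state"
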
This is the IP the nvmeof daemon will bind
to, so it should be the IP of the host we're
deploying the nvmeof daemon on, not the IP
of the active mgr.

Signed-off-by: Adam King <adking@redhat.com>
"igw_id" was leftover from the nvmeof implementation
being taken heavily from the iscsi implementation. "igw"
means nothing in this context, so we can change the name.

Signed-off-by: Adam King <adking@redhat.com>
0.0.2 includes a patch that allows the nvmeof
daemon to use non-admin keyrings, so we should
use it over 0.0.1.

Signed-off-by: Adam King <adking@redhat.com>
@adk3798 (Contributor Author) commented Aug 8, 2023

> The current changes LGTM, modulo the temporary container image hack. Once there's a usable upstream build with the change and you drop that temporary commit, please ping me for final approval!

This is no longer pointing towards a temp image. An official nvmeof image, 0.0.2, was made with the necessary changes (ceph/ceph-nvmeof#172) and we now point to that.

@phlogistonjohn (Contributor) left a comment

LGTM

@adk3798 (Contributor Author) commented Aug 14, 2023

jenkins retest this please

@adk3798 (Contributor Author) commented Aug 15, 2023

https://pulpito.ceph.com/adking-2023-08-15_13:44:09-orch:cephadm-wip-adk-testing-2023-08-14-1902-distro-default-smithi/

7 failed jobs and 3 dead jobs (as of writing this, 1 is dead and 2 are still running, but the 2 running jobs will time out)

  • 4 failures were mds_upgrade_sequence upgrading from reef -> main. The upgrade from reef doesn't work yet, as we need to implement something that can pull the latest reef build of the cephadm binary.
  • 2 failures were https://tracker.ceph.com/issues/62084, a known issue.
  • 1 failure was https://tracker.ceph.com/issues/59704, a known issue.
  • 2 dead jobs were the mds upgrade sequence starting from quincy. They time out doing a umount command. These were broken the same way before when upgrading from pacific, and we wanted to see if upgrading from quincy would somehow fix it, but it hasn't.
  • 1 dead job was an infra/teuthology issue: Error reimaging machines: Expected smithi172's OS to be centos 8 but found rhel 8.6

Overall, nothing related to the mds upgrade sequence or jaeger-tracing can be merged yet, but there is nothing here to block merging most of the other things in the run.

adk3798 merged commit 307cc75 into ceph:main on Aug 15, 2023 (10 of 12 checks passed).
adk3798 mentioned this pull request on Aug 15, 2023 (14 tasks).