
mgr/cephadm: stable nfs-ganesha ranks; nfs + ingress support #41007

Merged (46 commits) on May 25, 2021

Conversation

liewegas (Member) commented Apr 23, 2021:

  • Assign each ganesha daemon a stable rank id (0, 1, ...).
  • Ensure that no more than one daemon at a time is running on a single rank. For now, this relies on revoking the old daemon's auth key. In the future, we can additionally fence it, but that infrastructure isn't in tree yet.
  • Add NFS support for ingress: haproxy with consistent hashing.
  • Update mgr/nfs to optionally deploy ingress (when a VIP is provided).
  • Update the mgr/nfs cluster ls output to include the stable endpoint separately from the backend server instances.
  • Make everything default to the nfs-ganesha pool and a namespace matching the service_id.
  • Update mgr/nfs commands to use rm instead of delete, for consistency with the rest of the CLI.
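To make the rank bookkeeping concrete, here is a minimal sketch of the kind of structure the first two bullets imply (names and layout are assumptions for illustration, not the PR's actual code):

    from typing import Dict, Optional

    # Hypothetical layout: rank -> generation -> daemon_id (None until the
    # daemon for that generation exists, or after it has been removed).
    RankMap = Dict[int, Dict[int, Optional[str]]]

    def next_free_rank(rank_map: RankMap, num_ranks: int) -> Optional[int]:
        """Return the lowest rank (0, 1, ...) with no daemon recorded yet."""
        for rank in range(num_ranks):
            if not rank_map.get(rank):
                return rank
        return None

    def bump_generation(rank_map: RankMap, rank: int) -> int:
        """Start a new generation for a rank; the previous generation's daemon
        has its auth key revoked, so at most one daemon per rank stays live."""
        gens = rank_map.setdefault(rank, {})
        gen = max(gens) + 1 if gens else 0
        gens[gen] = None  # filled in once the replacement daemon is created
        return gen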

jtlayton (Contributor) replied:

> @jtlayton Is this the right thing? 49e15c7#diff-b13a6b9b90f10d54533fcaa8b7d9db7f8d24ae39bb52cd611c28c9c39cb429ecR86
>
> This goes into the template https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/templates/services/nfs/ganesha.conf.j2

Yeah, looks good. As long as you have unique nodeids for the daemons that are running, and their successors get the same name, you should be fine.
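For illustration only (the naming scheme below is an assumption, not necessarily what the template renders): deriving the nodeid from the stable rank rather than from the per-instance daemon name means that a replacement daemon for a given rank presents the same identity to the grace database as its predecessor.

    def nodeid_for(service_id: str, rank: int) -> str:
        # Hypothetical scheme: tie the nodeid to the rank, not to the unique
        # daemon instance, so successors reuse the same recovery state.
        return f'{service_id}.{rank}'

    # Every generation of rank 0 in cluster 'mynfs' gets the same nodeid:
    assert nodeid_for('mynfs', 0) == 'mynfs.0'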

liewegas force-pushed the cephadm-ha-nfs branch 2 times, most recently from 6ca36c6 to 5dc6ca2 on April 26, 2021 19:53
mgfritch (Contributor) left a comment:

A couple of suggestions, but looks good overall!

Resolved review threads on src/pybind/mgr/cephadm/services/nfs.py (outdated), src/pybind/mgr/cephadm/schedule.py, and src/pybind/mgr/cephadm/inventory.py.
@@ -943,6 +943,7 @@ def to_json(self) -> dict:
        out: Dict[str, Any] = OrderedDict()
        out['daemon_type'] = self.daemon_type
        out['daemon_id'] = self.daemon_id
+       out['service_name'] = self._service_name
A contributor commented:
I'd remove osdspec_affinity a few lines below by now.

Resolved review threads on src/pybind/mgr/cephadm/inventory.py.
Comment on lines +52 to +62
    def assign_rank(self, rank: int, gen: int) -> 'DaemonPlacement':
        return DaemonPlacement(
            self.daemon_type,
            self.hostname,
            self.network,
            self.name,
            self.ip,
            self.ports,
            rank,
            gen,
        )
A contributor commented:
Is this the best place to put that code? It would be great to have the rank assignment in the NFSService class.
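For context, a hedged usage sketch of the method quoted above (the constructor arguments are placeholders): assign_rank() returns a new, ranked copy rather than mutating the placement, so the generic scheduler's output can be decorated with rank information afterwards.

    # Assumed call pattern; the argument values are made up. DaemonPlacement
    # is defined in src/pybind/mgr/cephadm/schedule.py.
    unranked = DaemonPlacement('nfs', 'host1', '', '', '', [2049])
    ranked = unranked.assign_rank(0, 3)   # rank 0, generation 3
    # 'unranked' is left untouched; only 'ranked' carries the rank/gen pair.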

Comment on lines +275 to +280
d.rank is not None, # ranked first, then non-ranked
d.rank, # low ranks
0 - (d.rank_generation or 0), # newer generations first
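As a standalone illustration of the generation term in that key (not the PR's code): negating the generation makes an ordinary ascending sort put newer generations first.

    # Made-up (daemon, generation) pairs, sorted the same way as above.
    daemons = [('nfs.foo.0.0', 0), ('nfs.foo.0.2', 2), ('nfs.foo.0.1', 1)]
    ordered = sorted(daemons, key=lambda d: 0 - d[1])
    # -> [('nfs.foo.0.2', 2), ('nfs.foo.0.1', 1), ('nfs.foo.0.0', 0)]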
A contributor commented:
Feels wrong to me. We're putting a lot of per-service logic into the scheduler here. Can we keep it generic in the scheduler?

expected: List[str]
expected_add: List[str]
expected_remove: List[DaemonDescription]


@pytest.mark.parametrize("service_type,placement,hosts,daemons,expected,expected_add,expected_remove",
@pytest.mark.parametrize("service_type,placement,hosts,daemons,rank_map,post_rank_map,expected,expected_add,expected_remove",
A contributor commented:
I have a bad feeling about that. This is going to make the test matrix explode.

liewegas (Member, author) replied:

For the non-ranked test cases, these are just None, None. I thought about making a separate test, but that would just duplicate the test code... or we could put it in a helper that just passes None, None for the older test cases.
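A hedged sketch of what that looks like (parameter values are placeholders, not the real test fixtures): pre-existing, non-ranked cases keep their old tuples and simply pass None for the two new slots.

    import pytest

    @pytest.mark.parametrize(
        "service_type,placement,hosts,daemons,rank_map,post_rank_map,expected",
        [
            # old, non-ranked case: the two new slots are just None, None
            ('mgr', 'count:1', ['host1'], [], None, None, ['host1']),
            # ranked case: rank_map in, expected post-scheduling rank_map out
            ('nfs', 'count:2', ['host1', 'host2'], [],
             {}, {0: {0: None}, 1: {0: None}}, ['host1', 'host2']),
        ])
    def test_placement(service_type, placement, hosts, daemons,
                       rank_map, post_rank_map, expected):
        ...  # placeholder body; the real test drives the scheduler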

The contributor replied:

I'm concerned that we essentially have to test

(different number of hosts) * (different existing placement on those hosts) * (different networks) * (different filter_new_host functions) * (allow_colo = yes and allow_colo = no) * (primary_daemon_type) * (per_host_daemon_type) * (different values for the rank_map).

And that scares me.

The github-actions bot added the core label on Apr 30, 2021.
liewegas changed the title from "mgr/cephadm: stable nfs-ganesha ranks" to "mgr/cephadm: stable nfs-ganesha ranks; nfs + ingress support" on Apr 30, 2021.
The github-actions bot added the tests label on May 2, 2021.
varshar16 (Contributor) left a comment:

Please update the pool name in the nfs tests here: https://github.com/ceph/ceph/blob/master/qa/tasks/cephfs/test_nfs.py

liewegas (Member, author) commented May 4, 2021:

@varshar16 I went back to nfs-ganesha for the default pool name... still undecided whether we should change this to .nfs-ganesha or .nfs.

varshar16 (Contributor) left a comment:

If we are going ahead with option 1, then please update the doc and create a tracker ticket for the new update interface.

liewegas force-pushed the cephadm-ha-nfs branch 2 times, most recently from d32996b to c651dd6 on May 25, 2021 13:31
This command is very awkward to implement unless all service spec fields
are always required.  That will soon mean both the placement *and*
virtual_ip (if any), making it much less convenient for a human to use.

Instead, let them update yaml, or adjust the nfs and/or ingress specs
directly.  I don't think this command is needed.

Signed-off-by: Sage Weil <sage@newdream.net>

For 'nfs cluster create', optionally take a virtual_ip to deploy ingress.

Signed-off-by: Sage Weil <sage@newdream.net>

- include the virtual_ip and port at top level
- move backend server list into a sub-item
- include (haproxy) monitoring port

Signed-off-by: Sage Weil <sage@newdream.net>
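A hedged sketch of the resulting shape (field names and values are illustrative, not copied from the PR):

    cluster_info = {
        'virtual_ip': '10.0.0.100',   # stable ingress endpoint, at top level
        'port': 2049,
        'monitor_port': 9049,         # haproxy monitoring port (made-up value)
        'backend': [                  # per-daemon servers moved into a sub-item
            {'hostname': 'host1', 'ip': '10.0.0.11', 'port': 12049},
            {'hostname': 'host2', 'ip': '10.0.0.12', 'port': 12049},
        ],
    }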
- leave off pool/ns, since they should almost never be necessary.
- add port

Signed-off-by: Sage Weil <sage@newdream.net>

Apply nfs default pool (currently 'nfs-ganesha'), and default the
namespace to the service_id.

There is no practical reason for users to ever need to change this, and
requiring them to provide this information at config/apply time just
complicates life.

Signed-off-by: Sage Weil <sage@newdream.net>
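A minimal sketch of that defaulting (the helper and constant names are assumptions for illustration):

    NFS_DEFAULT_POOL = 'nfs-ganesha'   # default pool at the time of this PR

    def rados_config_location(pool: str, namespace: str, service_id: str):
        # Fall back to the default pool, and use the service_id as the RADOS
        # namespace, when the spec leaves these fields empty.
        return (pool or NFS_DEFAULT_POOL, namespace or service_id)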
haproxy's container image tells docker|podman to send SIGUSR1 for a "clean"
shutdown.  For NFS, the connections never close, so we will always hit the
podman|docker 10s timeout and get a SIGKILL.  That, in turn, causes haproxy
to exit with 143, and puts the systemd unit in a failed state.

This highlights a general problem(?) with stopping containers: if they don't
stop quickly, we'll end up in this error state.  We don't directly
address that here.

Avoid this problem by always stopping containers with SIGTERM.  In the
haproxy case, that means an immediate shutdown (no graceful drain of
open connections).  In theory we could do this only for haproxy with
NFS, but we can easily imagine RGW connections that don't close in 10s
either, and we don't want containers exiting in error state--we just
want the proxy to stop quickly.

Signed-off-by: Sage Weil <sage@newdream.net>
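One way to express this (a hedged sketch; whether the PR does it exactly this way is an assumption) is to override the image's stop signal when the container is created, since both docker run and podman run accept a --stop-signal flag:

    def container_run_args(engine: str, image: str, name: str) -> list:
        # Build a container run command that forces SIGTERM on stop instead of
        # the SIGUSR1 the haproxy image declares.
        return [
            engine, 'run', '--rm', '--name', name,
            '--stop-signal=SIGTERM',
            image,
        ]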
It is likely that the rook/k8s variation of ingress will not take a
virtual_ip argument.  We want to make sure that ingress yes/no can be
specified independently of the virtual_ip.

Signed-off-by: Sage Weil <sage@newdream.net>

Still missing a full client mount test, though!

Signed-off-by: Sage Weil <sage@newdream.net>
liewegas (Member, author) replied:

> If we are going ahead with option 1, then please update the doc and create a tracker ticket for the new update interface.

https://tracker.ceph.com/issues/50972
#41534

varshar16 (Contributor) left a comment:

Thanks @liewegas for addressing my comments.
