mgr/cephadm: stable nfs-ganesha ranks; nfs + ingress support #41007
Conversation
@jtlayton Is this the right thing? 49e15c7#diff-b13a6b9b90f10d54533fcaa8b7d9db7f8d24ae39bb52cd611c28c9c39cb429ecR86 This goes into the template https://github.com/ceph/ceph/blob/master/src/pybind/mgr/cephadm/templates/services/nfs/ganesha.conf.j2
Yeah, looks good. As long as you have unique nodeids for the daemons that are running, and their successors get the same name, you should be fine.
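A hedged sketch of the idea being discussed here (not the actual cephadm/template code): the ganesha nodeid is derived from the service name plus the stable rank rather than from the daemon's unique id, so a replacement daemon for a given rank reuses the same grace-db entry as its predecessor.

```python
# Illustrative only; the exact nodeid format used by the template is an assumption.
def stable_nodeid(service_name: str, rank: int) -> str:
    return f'{service_name}.{rank}'

print(stable_nodeid('nfs.foo', 0))  # nfs.foo.0 -- same value for any successor of rank 0
```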
Force-pushed from 6ca36c6 to 5dc6ca2.
A couple of suggestions, but it looks good overall!
@@ -943,6 +943,7 @@ def to_json(self) -> dict:
         out: Dict[str, Any] = OrderedDict()
         out['daemon_type'] = self.daemon_type
         out['daemon_id'] = self.daemon_id
+        out['service_name'] = self._service_name
I'd remove osdspec_affinity a few lines below by now.
    def assign_rank(self, rank: int, gen: int) -> 'DaemonPlacement':
        return DaemonPlacement(
            self.daemon_type,
            self.hostname,
            self.network,
            self.name,
            self.ip,
            self.ports,
            rank,
            gen,
        )
Is this the best place to put that code? It would be great to have the rank assignment in the NFSService class.
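For reference, a minimal sketch of the copy-style pattern the method above follows, using a simplified stand-in for DaemonPlacement (the real class has more fields): assign_rank() returns a new placement with rank/generation filled in rather than mutating the existing object.

```python
from typing import NamedTuple, Optional

# Simplified, hypothetical stand-in for DaemonPlacement -- illustration only.
class Placement(NamedTuple):
    daemon_type: str
    hostname: str
    rank: Optional[int] = None
    rank_generation: Optional[int] = None

    def assign_rank(self, rank: int, gen: int) -> 'Placement':
        # Return a copy with the rank and generation set.
        return self._replace(rank=rank, rank_generation=gen)

p = Placement('nfs', 'host1').assign_rank(0, 1)
print(p.rank, p.rank_generation)  # 0 1
```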
            d.rank is not None,            # ranked first, then non-ranked
            d.rank,                        # low ranks
            0 - (d.rank_generation or 0),  # newer generations first
This feels wrong to me. We're putting a lot of per-service logic into the scheduler here. Can we keep the scheduler generic?
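Purely to illustrate the composite-tuple-key technique the fragment above relies on, here is a small standalone sketch. The Daemon type, the data, and the direction of the boolean term are assumptions chosen so that a plain ascending sort puts ranked daemons first; the scheduler may apply its key differently.

```python
from typing import List, NamedTuple, Optional

class Daemon(NamedTuple):
    name: str
    rank: Optional[int] = None
    rank_generation: Optional[int] = None

daemons: List[Daemon] = [
    Daemon('a', rank=1, rank_generation=0),
    Daemon('b', rank=0, rank_generation=2),
    Daemon('c'),  # unranked
    Daemon('d', rank=0, rank_generation=1),
]

# Ranked daemons ahead of unranked ones, low ranks first,
# newer generations first within a rank.
daemons.sort(key=lambda d: (
    d.rank is None,                  # False (ranked) sorts before True (unranked)
    d.rank if d.rank is not None else 0,
    -(d.rank_generation or 0),       # larger generation => smaller key => earlier
))
print([d.name for d in daemons])     # ['b', 'd', 'a', 'c']
```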
     expected: List[str]
     expected_add: List[str]
     expected_remove: List[DaemonDescription]

-@pytest.mark.parametrize("service_type,placement,hosts,daemons,expected,expected_add,expected_remove",
+@pytest.mark.parametrize("service_type,placement,hosts,daemons,rank_map,post_rank_map,expected,expected_add,expected_remove",
I have a bad feeling about that. This is going to make the test matrix explode.
For the non-ranked test cases, these are just None, None. I thought about making a separate test, but that would just duplicate the test code... or we could put it in a helper that just passes None, None for the older test cases.
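A sketch of the helper idea mentioned above (the name and exact argument order are assumptions based on the parametrize string in the diff, not code from the PR):

```python
# Hypothetical helper: keep the pre-existing, non-ranked parametrize entries in
# their old shape and splice in None for the two new rank-map slots, so only
# the ranked test cases have to spell out rank_map/post_rank_map.
def with_no_ranks(case: tuple) -> tuple:
    (service_type, placement, hosts, daemons,
     expected, expected_add, expected_remove) = case
    return (service_type, placement, hosts, daemons,
            None, None,  # rank_map, post_rank_map
            expected, expected_add, expected_remove)
```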
I'm concerned that we essentially have to test
(different number of hosts) * (different existing placement on those hosts) * (different networks) * (different filter_new_host functions) * (allow_colo = yes and allow_colo = no) * (primary_daemon_type) * (per_host_daemon_type) * (different values for the rank_map).
And that scares me.
Force-pushed from 80fd72a to b10a60d.
Please update the pool name in the nfs tests here: https://github.com/ceph/ceph/blob/master/qa/tasks/cephfs/test_nfs.py
@varshar16 I went back to
Force-pushed from d32996b to c651dd6.
This command is very awkward to implement unless all service spec fields are always required. That will soon mean both the placement *and* virtual_ip (if any), making it much less useful for a human. Instead, let them update the yaml, or adjust the nfs and/or ingress specs directly. I don't think this command is needed.
Signed-off-by: Sage Weil <sage@newdream.net>
For 'nfs cluster create', optionally take a virtual_ip to deploy ingress.
Signed-off-by: Sage Weil <sage@newdream.net>
- include the virtual_ip and port at top level
- move backend server list into a sub-item
- include (haproxy) monitoring port
Signed-off-by: Sage Weil <sage@newdream.net>
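A rough sketch of the shape this describes, written as a Python dict; all field names and values here are illustrative guesses, not the exact 'nfs cluster info' output.

```python
# Hypothetical structure only -- not the literal output format.
info = {
    'virtual_ip': '10.0.0.10',   # now at top level
    'port': 2049,
    'monitor_port': 9049,        # haproxy monitoring port (name assumed)
    'backend': [                 # backend server list moved into a sub-item
        {'hostname': 'host1', 'ip': '192.168.0.1', 'port': 12049},
        {'hostname': 'host2', 'ip': '192.168.0.2', 'port': 12049},
    ],
}
```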
- leave off pool/ns, since they should almost never be necessary.
- add port
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Apply the nfs default pool (currently 'nfs-ganesha'), and default the namespace to the service_id. There is no practical reason for users to ever need to change this, and requiring them to provide this information at config/apply time just complicates life.
Signed-off-by: Sage Weil <sage@newdream.net>
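A minimal sketch of the defaulting described above; the constant, function, and attribute names are assumptions, not the exact cephadm code.

```python
DEFAULT_NFS_POOL = 'nfs-ganesha'  # current default pool name per the commit message

def effective_pool_and_namespace(spec) -> tuple:
    # Fall back to the default pool, and to the service_id as namespace,
    # when the spec does not set them explicitly.
    pool = getattr(spec, 'pool', None) or DEFAULT_NFS_POOL
    namespace = getattr(spec, 'namespace', None) or spec.service_id
    return pool, namespace
```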
haproxy's container image tells docker|podman to send SIGUSR1 for a "clean" shutdown. For NFS, the connections never close, so we will always hit the podman|docker 10s timeout and get a SIGKILL. That, in turn, causes haproxy to exit with 143, and puts the systemd unit in a failed state.
This highlights a general problem(?) with stopping containers: if they don't do it quickly then we'll end up in this error state. We don't directly address that here.
Avoid this problem by always stopping containers with SIGTERM. In the haproxy case, that means an immediate shutdown (no graceful drain of open connections). In theory we could do this only for haproxy with NFS, but we can easily imagine RGW connections that don't close in 10s either, and we don't want containers exiting in error state--we just want the proxy to stop quickly.
Signed-off-by: Sage Weil <sage@newdream.net>
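An illustration of the mechanism only (not necessarily how cephadm implements the change): container engines let you override an image's STOPSIGNAL at run time, so the stop path sends SIGTERM immediately instead of SIGUSR1 followed by a 10s SIGKILL.

```python
import subprocess

# Hypothetical invocation: run haproxy with the stop signal overridden to
# SIGTERM so 'podman stop' terminates it immediately (no graceful drain).
subprocess.run([
    'podman', 'run', '-d',
    '--stop-signal', 'SIGTERM',
    '--name', 'haproxy-nfs-demo',   # made-up container name
    'docker.io/library/haproxy:2.3',
], check=True)
```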
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
It is likely that the rook/k8s variation of ingress will not take a virtual_ip argument. We want to make sure that ingress yes/no can be specified independently of the virtual_ip.
Signed-off-by: Sage Weil <sage@newdream.net>
Still missing a full client mount test, though!
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Thanks @liewegas for addressing my comments.
- default nfs-ganesha pool and namespace matching the service_id
- rm instead of delete, for consistency with the rest of the CLI