Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

find_namespace_and_bdev_name #449

Closed
baum opened this issue Feb 19, 2024 · 9 comments · Fixed by #787
Closed

find_namespace_and_bdev_name #449

baum opened this issue Feb 19, 2024 · 9 comments · Fixed by #787
Assignees

Comments

@baum
Copy link
Collaborator

baum commented Feb 19, 2024

Background

find_namespace_and_bdev_name is used to associate between existing NVME namespace and its SPDK bdev. Currently implemented in terms of SPDK JSON RPC nvmf_get_subsystems sometimes holding omap_lock. The function is used in the following contexts:

  1. namespace_change_load_balancing_group_safe ( with omap lock )
  2. namespace_get_io_stats ( with rpc lock )
  3. namespace_set_qos_limits_safe ( with rpc lock )
  4. namespace_resize ( gRPC function with NO rpc lock )
  5. namespace_delete_safe ( with OMAP lock )

Suggestion

  1. remove find_namespace_and_bdev_name function, implement in terms of subsystem_nsid_bdev dictionary
  2. namespace_change_load_balancing_group could be removed from gRPC protobuf interface, CLI could be implemented by list the namespace, remove the namespace and recreate it with a new anagrp.
  3. namespace methods could use nsid as identifier which should be defined mandatory, dropping optional uuid (protobuf messages: namespace_resize_req, namespace_get_io_stats_req, namespace_set_qos_req and namespace_delete_req)
@caroav
Copy link
Collaborator

caroav commented Feb 19, 2024

@baum I have few comments:

  1. About "subsystem_nsid_bdev" in grpc.py. Isn't it strange that we are not using the local state for such things? Can't we refer to the local state as our single source of truth? Why adding another place to manage that?
  2. namespace_change_load_balancing_group - you are right that we could use ns delete and then ns add with different grp id. But that is less convenient to the end user. He also need to see that the uuid will be the same, so it will not confuse the hosts. I prefer to leave this CLI method.
  3. I tend to agree that uuid could be dropped from the interface.

@baum
Copy link
Collaborator Author

baum commented Feb 20, 2024

@baum I have few comments:

  1. About "subsystem_nsid_bdev" in grpc.py. Isn't it strange that we are not using the local state for such things? Can't we refer to the local state as our single source of truth? Why adding another place to manage that?

subsystem_nsid_bdev dictionary represents the SPDK runtime state, as oppose to the OMAP prescribed/desired configuration. The invariant is that subsystem_nsid_bdev is updated when SPDK RPC confirmed runtime state change, so we're sure that the runtime namespace - bdev association and the underlying bdev actually exist.

My intention is code simplification:

bdev = self.subsystem_nsid_bdev[nqn][nsid]

seems easier than find_namespace_and_bdev_name

WDYT?

  1. namespace_change_load_balancing_group - you are right that we could use ns delete and then ns add with different grp id. But that is less convenient to the end user. He also need to see that the uuid will be the same, so it will not confuse the hosts. I prefer to leave this CLI method.

👍

  1. I tend to agree that uuid could be dropped from the interface.

👍

@caroav
Copy link
Collaborator

caroav commented Feb 21, 2024

I think we can do it. Currently I see that the find_namespace_and_bdev_name is also getting uuid as an argument. If we do this change, it means that the lookup will always be by uuid?

@gbregman do you see any reason not to make this change? If we already have the information cached (after recent changes), then we can save these extra calls?

@gbregman
Copy link
Contributor

@caroav , I don't see any reason to do this change.

@caroav
Copy link
Collaborator

caroav commented Feb 21, 2024

@gbregman every call to find_ns method is a call to the SPDK get_subsystems, while this change saves this call. That's the reason that I see.

@gbregman gbregman self-assigned this Feb 22, 2024
@gbregman
Copy link
Contributor

The call in change_load_balancing group is just in case the user didn't pass the namespace NSID and wanted to use UUID instead. We can drop it and force the usage of NSID but I don't see the real benefit from this. In all other places we use this function we need the bdev details as we need to pass them to SPDK. This is not something we currently have in the local state. I don't think it's worth it to write new code to take care of that locally as these CLIs are not common and not time critical. Calling the SPDK is not such a big deal as it uses a local socket and doesn't go out to the network. Holding the OMAP lock is less of a problem then holding the RPC lock and we do that when we call get_subsystems every 2 seconds for the monitor client. This is a much bigger issue.

@caroav
Copy link
Collaborator

caroav commented Jul 10, 2024

Why holding the OMAP lock is less than holding the RPC lock?

@gbregman
Copy link
Contributor

The only place we call the find function with the OMAP lock taken is in namespace delete. The get stats, get/set QOS and namespace resize do not change the OMAP so they don't take the OMAP lock. In principle we can also call find in change load balancing group but this is only if the user chose to call the command without NSID.

@caroav
Copy link
Collaborator

caroav commented Jul 10, 2024

@baum @gbregman IMHO the important thing here is mainly to take care about namespace_get_io_stats which is being called very frequently. The other functions mentioned above are called very rarely IMO.

gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Jul 31, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Jul 31, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Jul 31, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Aug 1, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Aug 5, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Aug 5, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Aug 5, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Aug 5, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Aug 5, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Aug 5, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
gbregman added a commit to gbregman/ceph-nvmeof that referenced this issue Aug 6, 2024
Fixes ceph#449

Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
3 participants