feature request: RPC call to trigger a re-discovery of a deleted raid1 bdev #3341
Comments
I've been wondering how to add this functionality, and I think that a good solution would be to introduce two new RPCs. With this, we could easily enable two other functionalities:
Would raid_bdev_stop keep a volume in a "claimed" state? Also, if I detach a remote NVMe that is part of this stopped raid, is there a risk that it would remove the base_bdev at the same time? (I remember this happened when I removed the subsystem; maybe it won't do that if it's just an NVMe detach.) Until now, I was always making sure the raid was removed before safely removing the attached base_bdevs.
If this would allow disassembling all the base_bdevs of a raid down to the last one, so that one of the base_bdevs can eventually be exposed over NVMe-oF to assemble the raid on another node, then it should work for my use cases. However, let's imagine the raid1 gets assembled on another node (over NVMe-oF) and a new base_bdev is added; then, for some reason, we move the raid assembly back to the previous node, which already has knowledge of the raid1 as it existed before. Could that create an issue where it does not know about/discover the new base_bdev? Regarding the other functionalities it may allow:
I'm not sure I understand, but I think the situation you described is comparable to this:
That's because the superblock has a sequence number, incremented on every update, and the superblock version with the highest sequence number is considered the correct one. There are tests in
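A minimal sketch of that selection rule, in Python rather than SPDK's actual C code, with illustrative field names (seq_number, uuid) that simplify the real on-disk superblock layout:

```python
# Illustrative sketch of the rule described above: among the superblock
# copies read from a raid's base bdevs, the highest sequence number wins.
# Field names are simplifications, not SPDK's actual on-disk layout.

from dataclasses import dataclass

@dataclass
class RaidSuperblock:
    uuid: str        # identifies the raid this base bdev belongs to
    seq_number: int  # incremented on every superblock update

def pick_authoritative(copies: list[RaidSuperblock]) -> RaidSuperblock:
    """Return the superblock copy considered correct for one raid."""
    assert copies and all(c.uuid == copies[0].uuid for c in copies)
    return max(copies, key=lambda c: c.seq_number)

# A base bdev detached before the last update carries a stale, lower
# sequence number, so its view of the raid loses on re-assembly:
stale = RaidSuperblock(uuid="raid1_vol", seq_number=7)
fresh = RaidSuperblock(uuid="raid1_vol", seq_number=9)
assert pick_authoritative([stale, fresh]) is fresh
```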
It's not possible now; there is no direct way to force a raid to start without all the expected base bdevs. We definitely need to be able to do this somehow.
Yes, exactly. I'm not sure yet if this should be the default behavior or an option for the new RPC.
If the superblock with the highest sequence number is always the one used, that should be good then! Thanks
I don't handle this situation in my CSI yet, but I will probably be happy to have it. In case we lose a node for some reason, I could start the raid and replace the lost replicas. That means I would need to be able to remove the lost replicas with
Something magical would be that, once a superblock is removed, being able to create a new raid1 and keep the existing data. That would be amazing for cloning purposes; I don't have an immediate application for this yet, but I like the idea that it could be possible if the need arises. Maybe I should keep that for a future feature request! The previous point of being able to start a raid with missing replicas seems more important :p.
Suggestion
If we have a raid1 bdev composed of the following base_bdevs on host 1:
- a local base_bdev
- a remote base_bdev (attached over NVMe-oF)
If I remove the remote base_bdev from the raid (bdev_raid_remove_base_bdev) and then delete the raid on host 1, the raid can't be re-discovered without restarting SPDK and running bdev_examine. We may be missing an RPC call to force a rediscovery when the raid1 has only one disk left.
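For reference, a hedged sketch of that sequence driven over SPDK's JSON-RPC socket; the RPC methods (bdev_raid_create, bdev_raid_remove_base_bdev, bdev_raid_delete, bdev_examine) are existing SPDK RPCs, while the bdev names and socket path are assumptions for illustration:

```python
# Reproduction sketch of the sequence described above, sent over the
# spdk_tgt JSON-RPC Unix socket. Bdev names are hypothetical.

import json
import socket

SOCK = "/var/tmp/spdk.sock"  # default spdk_tgt RPC socket

def rpc(method: str, params: dict):
    """Send one JSON-RPC request to spdk_tgt and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(SOCK)
        s.sendall(json.dumps({"jsonrpc": "2.0", "id": 1,
                              "method": method, "params": params}).encode())
        return json.loads(s.recv(65536))

# raid1 over a local lvol and a remote base_bdev (names are hypothetical)
rpc("bdev_raid_create", {"name": "raid1_vol", "raid_level": "raid1",
                         "base_bdevs": ["local_lvol", "remote_nvme0n1"],
                         "superblock": True})

rpc("bdev_raid_remove_base_bdev", {"name": "remote_nvme0n1"})
rpc("bdev_raid_delete", {"name": "raid1_vol"})

# The surviving base bdev still holds a valid superblock, but asking the
# bdev layer to examine it again does not bring the raid back without a
# restart of spdk_tgt:
rpc("bdev_examine", {"name": "local_lvol"})  # raid1_vol is not re-created
```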
The current "workaround", if the raid1 is having only one remaining device, we need to attach it to an other spdk host (nvmeof) and expose it from there to trigger the discover and use it without restarting the spdk node it is currently hosted in.
The reason we perform such an operation is in the context of a Kubernetes CSI: when a volume/raid is unpublished (not used by any Pod anymore), we delete the raid and detach all the remote lvols that are part of the replication while the PVC/volume is not in use. We may have situations where the raid1 has only one base_bdev temporarily, or even on purpose. Example:
In both use cases, I could make my Kubernetes CSI smarter and not remove the raid1 if only one base_bdev is currently active. But I still think that having an RPC call to trigger the raid discovery would be handy if the raid1 is deleted by mistake; it would avoid restarting SPDK and impacting other volumes/usage/traffic in such a situation.
Current Behavior
Once a raid1 bdev has been deleted, there is no existing RPC call that allows rediscovering the raid1 without restarting spdk_tgt.
Possible Solution
An RPC call that re-triggers discovery of the raid and makes the raid bdev available again.
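A sketch of what such a call could look like; the method name bdev_raid_rediscover and its parameter are purely hypothetical, not an existing SPDK RPC (rpc() is the helper from the reproduction sketch above):

```python
# Hypothetical RPC, sketched for discussion only; neither the method name
# nor the parameter exists in SPDK today. rpc() is the JSON-RPC helper
# defined in the reproduction sketch earlier in this issue.
rpc("bdev_raid_rediscover", {"base_bdev": "local_lvol"})
# Expected effect: re-run raid superblock examination on the given base
# bdev, as spdk_tgt does at startup, and re-create the raid1 it belongs to.
```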
Context (Environment including OS version, SPDK version, etc.)
Reference discussion: #3306 (comment)
SPDK version: https://review.spdk.io/gerrit/c/spdk/spdk/+/22641
Thank you!