tridentctl suspend backend #558
Comments
@scaleoutsean can you elaborate on the options that you would want to pause when a backend is suspended?
Can you also provide some context on scenarios where this would be important?
@balaramesh I refrained from attempting to specify what should be paused as I don't know what (if anything) makes sense. The main use case is storage array failure or scheduled downtime (floor, power, rack, equipment maintenance), but I think it could be used for unplanned storage downtime as well. Based on that, my ask would be to pause volume creation/deletion, but maybe others have some ideas or use cases that I don't. A big On/Off button (all three items) seems even easier to understand and use, but I wonder whether, in the case of unplanned downtime, flipping the downed backend to Off would mean active pods would be stuck (unable to detach)? Pausing volume create/delete seems safe for both use cases and should succeed whether the backend management and data endpoints are reachable or not. I hope we could apply advanced Trident features (virtual pools, topo-aware CSI, etc.) to differentiate between multiple homogeneous backends, but that would seem complex. For example, it may be challenging or even unsupported to remove arrays from Trident configuration, and one storage backend can serve data via multiple "AZs" (on-prem or in the cloud), which would make the selection and maintenance of Trident configuration more demanding. I think there's a use case for users who would prefer a simpler way (backend On/Off), but I'd like to consider using advanced Trident configuration options if that were viable.
One potential use case I see is around tech refresh/migration. Assume you already know that a certain backend system will be removed soon. You'd prefer not to have any new volumes created on it; in fact, you might be in the process of migrating everything off that backend, and Trident would interfere by constantly adding new volumes. Suspending volume creation only would be the ask for this use case, e.g. Trident simply wouldn't consider this backend as suitable during the provisioning process. I think this would already work today by deleting the backend, as Trident keeps it around as long as there are still volumes on it. Deletion/snapshots/clones for existing volumes would continue to work. IIRC, for (un)planned downtime Trident should already set the backend to offline once it finds out that it is unreachable. I don't think there is a periodic health check for backends? That might be useful to have (with appropriate retries and timeouts). As long as a volume already exists, I'd say that attach/detach, snapshots, etc. should continue to work. Anything else would be very confusing for the user.
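To make the "suspend volume creation only" idea concrete, here is a minimal sketch in Python. It is not Trident code: the `Backend` class, the `UserState` enum, and both helper functions are hypothetical names invented for illustration. It only shows the behavior described above, where a suspended backend is skipped during provisioning while operations on volumes that already exist keep working.

```python
from dataclasses import dataclass, field
from enum import Enum


class UserState(Enum):
    # Hypothetical states, chosen for this sketch only.
    NORMAL = "normal"
    SUSPENDED = "suspended"


@dataclass
class Backend:
    name: str
    online: bool = True
    user_state: UserState = UserState.NORMAL
    volumes: set = field(default_factory=set)


def eligible_for_provisioning(backends):
    """Return the backends that may receive new volumes.

    A backend is skipped if it is offline or has been suspended by the user.
    """
    return [b for b in backends if b.online and b.user_state is UserState.NORMAL]


def can_operate_on_existing(backend, volume):
    """Attach/detach, snapshot, and clone stay allowed for existing volumes.

    Suspension is deliberately not checked here; only reachability and
    the existence of the volume matter.
    """
    return backend.online and volume in backend.volumes


# Usage: the array being retired is suspended, the new one is not.
old_array = Backend("old-array", user_state=UserState.SUSPENDED, volumes={"pvc-1"})
new_array = Backend("new-array")
candidates = eligible_for_provisioning([old_array, new_array])
```

With this split, `candidates` contains only `new-array`, while `can_operate_on_existing(old_array, "pvc-1")` still returns `True`.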
Regarding backend health-checks, that's related and of interest to me in the context of this request. I'm not sure what exactly is checked (data, mgmt, both, neither?) so I'm waiting for this issue to make progress. Depending on how it plays out, maybe it'd impact our preferences for "suspend" behavior.
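As a rough illustration of a health check "with appropriate retries and timeouts", the sketch below retries a probe a fixed number of times before declaring a backend unreachable. Everything here is an assumption made for the example: `check_backend` and `probe` are hypothetical, the probe is expected to enforce its own timeout, and nothing is implied about what Trident actually checks (data, mgmt, or both).

```python
import time


def check_backend(probe, retries=3, delay_s=0.0):
    """Return True if probe() succeeds within `retries` attempts.

    `probe` is any zero-argument callable that returns True when the
    backend answers and returns False or raises OSError when it does not.
    """
    for attempt in range(1, retries + 1):
        try:
            if probe():
                return True
        except OSError:
            pass  # A connection error counts as one failed attempt.
        if attempt < retries:
            time.sleep(delay_s)  # Wait before the next attempt.
    return False


# Usage: a fake probe that fails twice, then succeeds.
attempts = []


def flaky_probe():
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError("management endpoint unreachable")
    return True


reachable = check_backend(flaky_probe, retries=3)
```

A caller could run this on a timer and mark the backend offline only when `check_backend` returns `False`, which avoids flapping on a single dropped request.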
This issue is fixed with 9a541c4 and is included in Trident release 23.10.
Describe the solution you'd like
Currently there's no convenient way to pause the ability to create new PVs on a backend.
A workaround is to remove, and later re-add, the backend that needs maintenance or scheduled downtime. That, however, may require a large-scale volume import.
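To show why the workaround scales poorly, the following Python sketch builds the list of `tridentctl` commands it would take: one delete, one create, and one import per volume. The function name, the backend name, the file paths, and the `trident` namespace are placeholders assumed for illustration; check the exact arguments against `tridentctl --help` for your release before running anything.

```python
def workaround_commands(backend, backend_file, volumes, namespace="trident"):
    """Build the delete + create + import command sequence.

    `volumes` maps each storage volume name to the path of the PVC
    definition file used for its import (all hypothetical values).
    """
    commands = [
        f"tridentctl delete backend {backend} -n {namespace}",
        f"tridentctl create backend -f {backend_file} -n {namespace}",
    ]
    # One import per volume: this is the step that grows with volume count.
    commands += [
        f"tridentctl import volume {backend} {vol} -f {pvc_file} -n {namespace}"
        for vol, pvc_file in volumes.items()
    ]
    return commands


# Usage with two placeholder volumes.
plan = workaround_commands(
    "backend-a",
    "backend-a.json",
    {"vol_a": "pvc-a.yaml", "vol_b": "pvc-b.yaml"},
)
```

For N volumes the plan has N + 2 commands, so a backend with hundreds of volumes turns a short maintenance window into hundreds of import operations.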
Describe alternatives you've considered
delete backend + create backend + import volume, which can translate to many operations in the last step.
Additional context
One concern I'd have would be about the behavior of existing volumes. If that is tricky to solve, how about making suspend backend possible only when there are no Bound volumes on the backend? The ability to suspend a backend would still be valuable because existing backendUUID and PVC IDs would remain the same.