
tridentctl suspend backend #558

Closed
scaleoutsean opened this issue Mar 27, 2021 · 5 comments
Comments

@scaleoutsean
Contributor

Describe the solution you'd like

Currently there's no convenient way to pause the ability to create new PVs on a backend.

A workaround is to remove, and later re-add, the backend that needs maintenance or scheduled downtime. That, however, may require a large-scale volume import.

Describe alternatives you've considered

delete backend + create backend + import volume, which can translate to many operations in the last step.
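
A rough sketch of that workaround with today's tridentctl commands (backend/volume names and file paths below are placeholders, and it assumes Trident is installed in the trident namespace):

```sh
# Take the backend out of service before maintenance; existing PVs keep
# working, but no new volumes are provisioned on it once it is deleted.
tridentctl delete backend ontap-san-01 -n trident

# After maintenance, re-create the backend from its config file...
tridentctl create backend -f backend-ontap-san-01.json -n trident

# ...and re-import each volume that Trident should manage again.
# This is the step that can turn into a very large number of operations.
tridentctl import volume ontap-san-01 pvc_0123456789abcdef -f pvc.yaml -n trident
```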

Additional context

One concern I'd have is the behavior of existing volumes. If that is tricky to solve, how about making suspend backend possible only when there are no Bound volumes on the backend? The ability to suspend a backend would still be valuable because the existing backendUUID and PVC IDs would remain the same.

@scaleoutsean changed the title from "tridentctl backend suspend" to "tridentctl suspend backend" on Mar 27, 2021
@balaramesh
Contributor

@scaleoutsean can you elaborate on the operations that you would want to pause when a backend is suspended?

  • Volume creation/deletion?
  • Resize/snapshots?
  • Volume detach/attach?

Can you also provide some context on scenarios where this would be important?

@scaleoutsean
Contributor Author

@balaramesh I refrained from attempting to specify what should be paused as I don't know what (if anything) makes sense.

The main use case is scheduled downtime (floor, power, rack, or equipment maintenance), but I think it could be used for unplanned storage downtime (e.g. an array failure) as well.

Based on that, my ask would be to pause volume creation/deletion, but maybe others have ideas or use cases that I don't. A big On/Off button (all three items) seems even easier to understand and use, but I wonder whether, in the case of unplanned downtime, flipping the downed backend to Off would leave active pods stuck (unable to detach). Pausing volume create/delete seems safe for both use cases and should succeed whether the backend's management and data endpoints are reachable or not.

I hope we could apply advanced Trident features (virtual pools, topology-aware CSI, etc.) to differentiate between multiple homogeneous backends, but that would seem complex. For example, it may be challenging or even unsupported to remove arrays from the Trident configuration, and one storage backend can serve data via multiple "AZs" (on-prem or in the cloud), which would make the selection and maintenance of the Trident configuration more demanding.

I think there's a use case for users who would prefer a simpler way (backend On/Off), but I'd like to consider using advanced Trident configuration options if that was viable.
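
For what it's worth, a minimal sketch of the virtual-pool idea mentioned above: give the pools on backends that should stay eligible a label, and point a StorageClass selector at it. The provisioner name is the standard Trident CSI one; the StorageClass name and the label key/value are made up for illustration:

```sh
# Sketch: a StorageClass that only selects virtual pools carrying a
# "maintenance=false" label; removing that label from a backend's pools
# would take the backend out of rotation for new volumes.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: sc-not-in-maintenance
provisioner: csi.trident.netapp.io
parameters:
  selector: "maintenance=false"
EOF
```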

@wonderland

One potential use case I see is around tech refresh/migration. Assume you already know that a certain backend system will be removed soon. You'd prefer not to have any new volumes created on it; in fact, you might be in the process of migrating everything off that backend, and Trident would interfere by constantly adding new volumes. Suspending volume creation only would be the ask for this use case, i.e. Trident simply wouldn't consider this backend as suitable during the provisioning process. I think this would already work today by deleting the backend, as Trident keeps it around as long as there are still volumes on it. Deletion/snapshots/clones for existing volumes would continue to work.

IIRC, for (un)planned downtime Trident should already set the backend to offline once it finds out that it is unreachable. I don't think there is a periodic health check for backends? That might be useful to have (with appropriate retries and timeouts).

As long as a volume already exists, I'd say that attach/detach, snapshots, etc. should continue to work. Everything else would be very confusing for the user.
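
As a point of reference, the state Trident currently reports for a backend can be inspected with tridentctl; a quick check, assuming the trident namespace, a hypothetical backend name, and the JSON field layout I've seen from tridentctl (worth verifying on your version):

```sh
# Show the backend's reported state (e.g. "online", "offline", "failed").
tridentctl get backend ontap-san-01 -n trident -o json | jq '.items[0].state'
```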

@scaleoutsean
Contributor Author

Regarding backend health-checks, that's related and of interest to me in the context of this request. I'm not sure what exactly is checked (data, mgmt, both, neither?) so I'm waiting for this issue to make progress. Depending on how it plays out, maybe it'd impact our preferences for "suspend" behavior.

@mravi-na
Contributor

This issue is fixed by commit 9a541c4, and the fix is included in Trident release 23.10.
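
For anyone landing here later: as I understand the fix, Trident 23.10 adds a user-settable backend state through tridentctl update backend state. A rough usage sketch (flag and state names are taken from the 23.10 docs as I read them, so please double-check against --help on your version; the backend name is a placeholder):

```sh
# Suspend a backend so it is no longer considered for new provisioning.
tridentctl update backend state ontap-san-01 --user-state suspended -n trident

# Put it back into normal rotation after maintenance.
tridentctl update backend state ontap-san-01 --user-state normal -n trident
```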
