Design review: Aggregated Local Storage and Host Reboots #144
Comments
Hi Rob, this looks really good. Here are my thoughts.
As briefly discussed yesterday, it would be great if we could also consider the case where local disks are mirrored to hosts that aren't part of the pool. This is because the storage cluster size is likely to often exceed the pool size. (Unrelated to cross-pool SRs.)
Related to the above: I wonder whether a host may be able to share its local storage without having a PBD. Would an SM or a XAPI plugin be an alternative?
Remotely related: assuming the RAID rebuild can be a lengthy process, at some point we are likely to want to notify and update users on progress. Could we maintain a xe-task to track the progress of the RAID rebuild? Will we need something similar to mpathalert anyway? Hope this helps.
Perhaps describe this as "disk contents in the SR are still accessible"? It would be good to have a Task somewhere which could contain progress (if we are able to determine it, but it would be good to be "ready" just in case). It would be nice if the API calls Host.reboot and Host.shutdown could explicitly fail with an error message like `MIRROR_REBUILD_IN_PROGRESS`. The exception could include the rebuild Task as an argument. This would catch the case where the user uses "xe host-reboot" rather than XenCenter (IIRC "xe" doesn't use the "allowed-operations"). I think this might address Robert's comment about local disks being shared with unknown hosts. I suspect that xapi will need to talk to the Melio API directly (or indirectly via some other CLI tool) to discover whether disks are in use and whether arrays are rebuilding. We may wish to expose this information via special
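To make the suggestion above concrete, here is a minimal sketch of a pre-reboot guard that fails with `MIRROR_REBUILD_IN_PROGRESS` and passes the rebuild Task reference along. It is an illustration only, not xapi code: `RebuildTask`, `MirrorRebuildInProgress` and `check_host_can_restart` are hypothetical names, and how the rebuild state would actually be obtained from the storage backend is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RebuildTask:
    """Hypothetical stand-in for a Task that tracks a mirror rebuild."""
    ref: str          # opaque task reference
    progress: float   # fraction complete, 0.0 to 1.0


class MirrorRebuildInProgress(Exception):
    """Hypothetical error carrying the rebuild Task reference as its argument."""

    def __init__(self, task_ref: str):
        super().__init__("MIRROR_REBUILD_IN_PROGRESS", task_ref)
        self.task_ref = task_ref


def check_host_can_restart(rebuild: Optional[RebuildTask]) -> None:
    """Raise if a rebuild is still running; return normally otherwise.

    Assumption: the caller has already asked the storage backend whether
    a rebuild exists and passes None when there is none.
    """
    if rebuild is not None and rebuild.progress < 1.0:
        raise MirrorRebuildInProgress(rebuild.ref)
```

Because the check would sit inside the reboot and shutdown calls themselves rather than in a client-side "allowed-operations" test, a CLI caller would hit the same error as a GUI caller and could use the task reference to report progress.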
To make it "safe", I think that xapi needs to be able to ask the storage backend:
If the cluster is equal to the pool, then xapi can do point 2 without asking the storage backend, which will simplify things. I think that, for the moment, it is best to assume that the storage cluster is equal to the XS pool, to avoid making things too complicated (while still keeping in mind that we may change this in future). I'll add the idea of using a Task and the new error message that @djs55 suggested in an update of the doc.
Thanks Rob.
I wonder whether the question should instead be, "is the cluster completely healthy, with at least two copies of each block?" That's because there are other cases, such as disk failures.
If limiting storage clusters to XS pools saves us a lot of complexity, we could make this choice, but we'd need to communicate this decision to the wider team due to implications on planned deployment scenarios. How much complexity do we save?
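The "at least two copies of each block" question could be sketched as a simple predicate. This is a rough illustration under stated assumptions: `cluster_allows_reboot` and the `healthy_copies` map (block id to the set of hosts holding a healthy copy) are hypothetical, and a real storage backend would more likely report this per array or per volume than per block.

```python
from typing import Dict, Set


def cluster_allows_reboot(healthy_copies: Dict[str, Set[str]],
                          min_copies: int = 2) -> bool:
    """Return True only if every block has at least min_copies healthy copies.

    healthy_copies maps a block id to the hosts that hold a healthy copy.
    A copy lost to a disk failure simply does not appear in the set, so
    disk failures and in-progress rebuilds are handled the same way.
    """
    return all(len(hosts) >= min_copies for hosts in healthy_copies.values())
```

Phrasing the check in terms of healthy copies rather than "is a rebuild running" means it does not depend on which hosts are pool members, which may matter if the storage cluster is later allowed to exceed the pool.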