
The ontap drivers should have more intelligent volume placement #64

Open · arndt-netapp opened this issue Oct 17, 2017 · 6 comments

@arndt-netapp (Contributor)

The placement of FlexVols by the ontap-nas-economy driver needs to be more intelligent. The following suggestions are a starting point, based on customer feedback from a large-scale environment:

  1. Allow Trident to be configured so that it continues to provision qtrees in a FlexVol only until the underlying aggregate is X percent full or Y percent oversubscribed (see the backend sketch after this list).

  2. If multiple aggregates are defined, prefer the aggregate with the most free space and the lowest oversubscription.

  3. While points 1 and 2 are more important in the short term, at some point it would also be desirable to have provisioning take node headroom into account.
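A minimal sketch of how such limits might surface in an ontap-nas-economy backend definition. `limitAggregateUsage` is the parameter discussed later in this thread; `limitOversubscriptionRatio` is hypothetical, and the LIF, SVM, and credential values are placeholders:

```json
{
  "version": 1,
  "storageDriverName": "ontap-nas-economy",
  "managementLIF": "10.0.0.1",
  "dataLIF": "10.0.0.2",
  "svm": "svm_trident",
  "username": "vsadmin",
  "password": "secret",
  "limitAggregateUsage": "80%",
  "limitOversubscriptionRatio": "3.0"
}
```

The intent is that Trident would stop cutting new qtrees into a FlexVol once the backing aggregate crosses either threshold, rather than filling it to capacity.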

@clintonk (Contributor)

To be clear, most of these would apply to all ONTAP drivers. The Trident scheduler is definitely a rich target environment.

@clintonk clintonk changed the title The ontap-nas-economy driver should have more intelligent volume placement The ontap drivers should have more intelligent volume placement Oct 30, 2017
@acsulli commented Apr 5, 2018

I'll +1 this request on behalf of the many interactions I've had, and add some additional items. I'm often asked for several capabilities around storage pool selection logic:

  1. Being able to exclude storage pools using the storage class definition, e.g. "I want all flash (media=ssd), except for AFF aggr1 and SF QoS policy Bronze." Currently the only mechanism to accomplish this is the inverse: every storage pool except the one(s) to be excluded is specified in the storage class (see the StorageClass sketch after this list).

  2. A mechanism to stop provisioning new PVC requests against a storage pool when the underlying storage device (e.g. an ONTAP aggregate) reaches an arbitrary level of "full". This can/should be based on both the actual capacity remaining and an oversubscription ratio/percentage.

  3. Allow the specification of an arbitrary capacity limit for a particular backend. For example, with ONTAP, regardless of the actual size of an aggregate, Trident is only allowed to consume X GiB of that capacity. The same principle would apply to SolidFire, though the paradigm could be extended beyond GiB to IOPS (QoS minimums, specifically).

  4. Leverage AppDM as a backend.

  5. Leverage Service Level Manager as a backend.

  6. Incorporate ONTAP performance capacity as a storage pool selection metric.
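On item 1, a hypothetical sketch of what exclusion semantics could look like in a Trident StorageClass, contrasted with today's inverse workaround. The `excludeStoragePools` attribute is the ask, not an existing parameter here; backend and pool names are placeholders:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: all-flash-except
provisioner: netapp.io/trident
parameters:
  media: "ssd"
  # Today's workaround: enumerate every pool you DO want.
  storagePools: "ontapnas:aggr2,aggr3;solidfire:Gold,Silver"
  # The ask: exclude specific pools instead (hypothetical attribute).
  excludeStoragePools: "ontapnas:aggr1;solidfire:Bronze"
```

In practice only one of the two parameters would be used; they are shown together for contrast.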

@hendrikland

Well, if we're talking about placement logic, I'll add a few more items to consider (based on actual customer requirements):

  • Number of volumes on a node (since ONTAP has logical limits that you don't want to exceed)
  • Node which holds the LIF for the SVM (to avoid indirect data access, for best performance)
  • Provisioned IOPS (via the adaptive QoS concept) on the node/aggregate

These would be combined with the points already mentioned by others, then weighted and sorted according to the customer's specific requirements.

In the end, we'd either need a customizable rule engine in Trident, or a flexible mechanism to attach other tools. If we go with the latter, I'd vote for WFA in addition to the AppDM and NSLM already mentioned, since (today) it is the only tool flexible enough to build a custom solution that takes all of the above items into consideration.
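For illustration, a back-of-the-envelope sketch (in Go, Trident's implementation language) of what such a weighted rule engine could look like. All type names, metrics, and weights are hypothetical, not Trident's actual API:

```go
// Hypothetical weighted scoring for storage pool selection. Each metric is
// normalized to the range 0..1 and weighted per the customer's requirements;
// the highest-scoring pool wins.
package main

import (
	"fmt"
	"sort"
)

type Pool struct {
	Name           string
	FreeSpaceRatio float64 // free capacity / total capacity
	VolumeHeadroom float64 // 1 - (volume count / ONTAP per-node volume limit)
	LIFLocality    float64 // 1 if the SVM's LIF lives on this node, else 0
	IOPSHeadroom   float64 // 1 - (provisioned IOPS / adaptive QoS ceiling)
}

type Weights struct {
	FreeSpace, Volumes, Locality, IOPS float64
}

func score(p Pool, w Weights) float64 {
	return w.FreeSpace*p.FreeSpaceRatio +
		w.Volumes*p.VolumeHeadroom +
		w.Locality*p.LIFLocality +
		w.IOPS*p.IOPSHeadroom
}

func main() {
	// Example weighting: free space matters most for this customer.
	w := Weights{FreeSpace: 0.4, Volumes: 0.2, Locality: 0.2, IOPS: 0.2}
	pools := []Pool{
		{"node1:aggr1", 0.70, 0.90, 1, 0.50},
		{"node2:aggr2", 0.85, 0.40, 0, 0.80},
	}
	sort.Slice(pools, func(i, j int) bool { return score(pools[i], w) > score(pools[j], w) })
	fmt.Printf("selected pool: %s (score %.2f)\n", pools[0].Name, score(pools[0], w))
}
```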

@CalvinHartwell

+1 for this, as I'm hitting issues when using Trident and Cloud Manager together: it's still not picking the correct pool, which is automatically generated for me.

@Errant-Dutchman

+1, same request: we have a 12-node cluster, so the point about the node which holds the LIF for the SVM (to avoid indirect data access for best performance) applies to us, as does leveraging full volume provisioning as mentioned before.

@YvosOnTheHub

Also +1.

Another idea: the limitAggregateUsage parameter looks at the global usage of the aggregate, not only the capacity managed by Trident. If the aggregate is shared among different workloads, which is often the case, this parameter should be a bit more precise.

An example (sketched in code below):

  • the limit is set to 20%
  • the aggregate is already 50% full from other workloads

In this case, Trident won't be able to create a volume. If the parameter looked only at the space occupied by Trident, the creation would succeed.
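A tiny sketch of the arithmetic behind the two interpretations, using the numbers above (all values illustrative):

```go
// Contrast of the current (global) and proposed (Trident-only) readings of
// limitAggregateUsage, using the 20% limit / 50%-full example above.
package main

import "fmt"

func main() {
	const (
		aggregateSize  = 100.0 // TiB, illustrative
		otherWorkloads = 50.0  // TiB used by non-Trident workloads
		tridentUsed    = 5.0   // TiB already provisioned by Trident
		limit          = 0.20  // limitAggregateUsage = 20%
		request        = 2.0   // TiB requested for the new volume
	)

	// Current behavior: the limit applies to total aggregate usage.
	global := (otherWorkloads + tridentUsed + request) / aggregateSize
	fmt.Printf("global usage %.0f%% <= 20%%? %v\n", global*100, global <= limit) // 57% -> false

	// Proposed behavior: the limit applies only to Trident-managed capacity.
	tridentOnly := (tridentUsed + request) / aggregateSize
	fmt.Printf("Trident usage %.0f%% <= 20%%? %v\n", tridentOnly*100, tridentOnly <= limit) // 7% -> true
}
```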
