
Proposal & Discussion: Enhanced Volume Functionality for Backends #1553

Closed

wallnerryan opened this issue Jun 9, 2015 · 7 comments

@wallnerryan
Contributor

Today Flocker allows users to create datasets with only the bare minimum of context about what a "volume" is. Future versions should offer enhanced functionality that lets backends interpret context about a volume when a request to manipulate it (create, destroy, list, etc.) is made. Such context could convey to the backend implementation enhanced features and capabilities such as:

(Not all-inclusive)

Min / Max IOPS
Min / Max Throughput
Min / Max Bandwidth
Protection/Failure Domains
Storage Pools (HDD, SSD, DAS, Flash, etc.)
Read Cache
Consistency Groups
Snapshots
Provisioned Size*
Thin/Thick Provisioning 
Compression?
Deduplication?
Encryption?

*max_size in flocker translates to size in drivers, could use some renaming though.

As an example, in the dataset API we can see that datasets can be configured with the primary node id, maximum_size, and metadata.

    @app.route("/configuration/datasets", methods=['POST'])
    @user_documentation(
        """
        Create a new dataset.
        """,
        header=u"Create new dataset",
        examples=[
            u"create dataset",
            u"create dataset with dataset_id",
            u"create dataset with duplicate dataset_id",
            u"create dataset with maximum_size",
            u"create dataset with metadata",
        ]
    )
    @structured(
        inputSchema={
            '$ref':
            '/v1/endpoints.json#/definitions/configuration_datasets_create'
        },
        outputSchema={
            '$ref':
            '/v1/endpoints.json#/definitions/configuration_datasets'},
        schema_store=SCHEMAS
    )
    def create_dataset_configuration(self, primary, dataset_id=None,
                                     maximum_size=None, metadata=None):
        """
        Create a new dataset in the cluster configuration.

        :param unicode primary: The UUID of the node on which the primary
            manifestation of the dataset will be created.

        :param unicode dataset_id: A unique identifier to assign to the
            dataset.  This is a string giving a UUID (per RFC 4122).  If no
            value is given, one will be generated and returned in the response.
            This is not for easy human use.  For human-friendly identifiers,
            use items in ``metadata``.

        :param maximum_size: Either the maximum number of bytes the dataset
            will be capable of storing or ``None`` to make the dataset size
            unlimited. This may be optional or required depending on the
            dataset backend.

        :param dict metadata: A small collection of unicode key/value pairs to
            associate with the dataset.  These items are not interpreted.  They
            are only stored and made available for later retrieval.  Use this
            for things like human-friendly dataset naming, ownership
            information, etc.

        :return: A ``dict`` describing the dataset which has been added to the
            cluster configuration or giving error information if this is not
            possible.
        """

The dataset or “volume” is configured, but with little context. For volumes to take advantage of everything the backend has to offer, it would be worth mapping out the plumbing and user stories for how this would work.

For instance, if I am a user who wants four containers, each running with a backend volume, so I can run a Cassandra cluster on Flocker, what would I be looking for as advanced features?

- Storage Pools (I want all Cassandra nodes to run on DAS SSD, or at the very least SSD)
- Consistency Groups (I want my volumes in a consistency group so that backup/recovery and snapshots can recover the whole Cassandra cluster, instead of relying on per-volume state)
- IOPS (I want my backend to be limited to a certain amount of IOPS for my application)
- Protection Domains (I want volumes, including snapshots, spread out across protection domains so my volumes are resilient to node and disk failures)

Today the IBlockDeviceAPI create method gives the backend very little from which to infer what the user may want:

    def create_volume(dataset_id, size):

The backend will likely be configured statically with failure domains or storage pools, or the backend may only support thin provisioning; this should ultimately be configurable per volume at provisioning time.

One option is to let the metadata dict carry more information about the volume at creation time; backends could look through this dictionary for applicable options.

e.g.

{
    "primary": "<uuid>",
    "metadata": {
        "name": "demo",
        "owner": "alice",
        "extra_provisiontype": "thinProvision",
        "extra_protectiondomain": "yourProtectionDomain",
        "extra_storagepool": "ssdPool"
    }
}
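
A minimal sketch of that idea (all names here are hypothetical, not Flocker's actual API) — a backend driver filters the "extra_" hints out of the metadata dict and honours only the ones it understands:

    # Hypothetical sketch: extract "extra_" provisioning hints from
    # dataset metadata, leaving ordinary metadata (name, owner) untouched.
    EXTRA_PREFIX = u"extra_"

    def extract_provisioning_hints(metadata):
        metadata = metadata or {}
        return dict(
            (key[len(EXTRA_PREFIX):], value)
            for key, value in metadata.items()
            if key.startswith(EXTRA_PREFIX)
        )

    class ExampleBlockDeviceAPI(object):
        """Hypothetical backend; not Flocker's real IBlockDeviceAPI."""

        def create_volume(self, dataset_id, size, metadata=None):
            hints = extract_provisioning_hints(metadata)
            # Each backend honours only the hints it understands.
            provision_type = hints.get(u"provisiontype", u"thickProvision")
            pool = hints.get(u"storagepool")
            # ... call the backend's own management API with these hints ...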

For consistency groups, drivers would need to be aware that all volumes in a group should be created before snapshotting the group. Alternatively, the value could be a group name, e.g. "extra_consistency_group": "myConsistencyGroup", instead of a list of actual volume names.

{
    "primary": "<uuid>",
    "metadata": {
        "name": "demo",
        "owner": "alice",
        "extra_consistency_group": [
            "volumeName1",
            "volumeName2",
            "volumeName3"
        ]
    }
}
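
A rough sketch of that group-aware behaviour (list_volume_names and snapshot_group are stand-ins for whatever a real backend client exposes):

    # Hypothetical sketch: only snapshot a consistency group once every
    # member volume exists.
    def snapshot_consistency_group(backend, group_name, member_names):
        existing = set(backend.list_volume_names())
        missing = set(member_names) - existing
        if missing:
            # A crash-consistent image of a partial group is not useful;
            # refuse (or retry later) rather than snapshot half a cluster.
            raise ValueError("Cannot snapshot group %r; missing volumes: %s"
                             % (group_name, ", ".join(sorted(missing))))
        return backend.snapshot_group(group_name, member_names)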

Or, using EBS as an example, create a volume from a snapshot:

{
    "primary": "<uuid>",
    "metadata": {
        "name": "demo",
        "owner": "alice",
        "extra_maxiops": 4000
        "extra_from_snapshot": "mySnapshot001"
    }
}
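
For illustration only, this is roughly what an EBS-backed driver could do with those two hints via boto's EC2 API (boto 2.x signatures; the region, zone, size, and snapshot id are placeholders, and "mySnapshot001" would first have to be resolved to an EBS snapshot id, e.g. via tags):

    # Illustrative only: map extra_maxiops / extra_from_snapshot onto EBS.
    from boto.ec2 import connect_to_region

    conn = connect_to_region("us-east-1")
    volume = conn.create_volume(
        size=200,                  # GiB, from the dataset's maximum_size
        zone="us-east-1a",
        snapshot="snap-0123abcd",  # resolved from "extra_from_snapshot"
        volume_type="io1",         # provisioned-IOPS volume type
        iops=4000,                 # from "extra_maxiops"
    )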

Today's view of Flocker: [diagram]

Proposed view of Flocker with enhanced volume features: [diagram]

An example of how such features can be used is Extra Specs in OpenStack. See the examples below for reference.

https://github.com/emc-openstack/vnxe-cinder-driver#thickthin-provisioning
http://docs.openstack.org/juno/config-reference/content/emc-vnx-direct-driver.html#emc-vnx-direct-provisioning

Other examples of more features: http://docs.openstack.org/juno/config-reference/content/ontap-cluster-extraspecs.html

Using metadata in the dataset object, as above, would allow Flocker to pass this information down to the IBlockDeviceDriver implementation, which could filter through the given information and provision or manipulate storage with these extra features in mind.

Other features, comments, and discussion points:

  • Snapshots are an important protection paradigm for DR and crash consistency; it would be good for Flocker to allow backends to take advantage of these scenarios.
  • Building on the Cassandra case, consistency groups could be important when looking at logs and sstables, which can each have a volume to themselves but be part of the same consistency group. This would allow for consistent backups and enable DR solutions by grouping volumes together.
  • Although dedup and encryption are typically associated with colder storage, it may be worth exploring what these options have to offer, if anything.
  • Multi-pathing.
  • How this maps into the Flocker user experience: e.g. how the flocker-cli / REST API would interact, and which features could be just extra metadata vs. which features Flocker would need to "bake in" or be more aware of (e.g. snapshots and consistency groups).
  • Some options are generic (IOPS limits) and some are more specific (read cache); these should be abstracted so users can define a bare-minimum config to create a volume, and only add to the config when specific options are needed.

One thing I wanted to make note of: OpenStack Cinder has seen similar demands from customers for its drivers, so this may represent a decent history of what users may want first from such drivers.

Main additions in OpenStack that came first:
  • Backend reports to platform (e.g. the backend announces its "capabilities")
  • Backup (ability to call backups on volumes, snapshots, and consistency groups)
  • Replication (local vs. remote)
  • Multi-attach (use cases are live migration, Gluster, Oracle RAC; e.g. the FS knows about shared volumes)
  • Multi-path (as vk mentioned, failover and multi-path R/W were use cases)
  • Consistency groups (mentioned above, also used for log vs. db consistency, e.g. Cassandra)
(Future support: Liberty and beyond)
  • Thin provisioning (assumes things are always thick?)
  • Compression

@lukemarsden @robhaswell @itamarst @vaibhavkhanduja @kumars109

(More info on the original small discussion here: https://gist.github.com/wallnerryan/51dcaf54abb53e9b15f0)

@myechuri
Contributor

myechuri commented Jun 9, 2015

@wallnerryan : Thanks a lot for the informative note, and initiating discussion!

Instead of exposing every feature of the backend to the end Flocker user as metadata, how about we consider having storage profiles? That way, the end user can select from a list of {gold/silver/bronze/other} profiles at creation time, and the storage driver can map each profile to a subset of the features exposed by the backend.

Thanks!

@wallnerryan
Contributor Author

Thanks @myechuri. I had a few conversations with @robhaswell and @lukemarsden, along with some colleagues listed in the proposal. Metadata was just one option we were thinking of, because it is close to extra_specs in OpenStack Cinder and is already exposed by Flocker.

I like the idea of a storage profile. Would storage profiles be generic enough to be used across all backend types, or would these be storage profiles per backend? I realize you mention the "driver would match the profile to a subset of features"; would this be best-effort? And if the subset is small, does the user get notified before the dataset is provisioned that it might not be "exactly" what they were looking for?

Thanks for the discussion; looking forward to any input on the subject. The above proposal is meant to be a starting point to help.

@myechuri
Contributor

would storage profiles be generic enough to be used across all backend types?

Yes. The idea of a storage profile is to abstract away selection of every little knob exposed by the backend. The user would not have the time to make intelligent choices for their app's desired performance with so many exposed knobs.

Or would these be storage profiles per backend?

No. If we choose to expose {bronze, silver, gold} profiles, silver might mean {IOPS X, deduplication, compression} on EBS, and {IOPS Y, RAID 1} on Cinder. We could expose an API to query what each storage profile translates to on a given backend.
The motivation for using a profile is to pre-select the most common configurations and give the user fewer, coarser-grained choices.
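
For illustration, with made-up feature names and values, the translation table and that query API could look something like this:

    # Illustrative only: per-backend translations of the coarse profiles,
    # mirroring the EBS/Cinder example above.
    PROFILE_TRANSLATIONS = {
        "ebs": {
            "silver": {"iops": 3000, "deduplication": True, "compression": True},
            "gold": {"iops": 10000, "deduplication": True, "compression": True},
        },
        "cinder": {
            "silver": {"iops": 2000, "raid": 1},
            "gold": {"iops": 8000, "raid": 10},
        },
    }

    def describe_profile(backend_name, profile):
        """Query what a given profile translates to on a given backend."""
        try:
            return PROFILE_TRANSLATIONS[backend_name][profile]
        except KeyError:
            raise LookupError("Backend %r has no profile %r"
                              % (backend_name, profile))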

I realize you mention the "driver would match the profile to a subset of features", would this be best-effort?

No, the user would like guaranteed SLAs. By "subset of features", I meant that each profile translates to a subset of the features available from the backend. It need not be a strict subset: for example, silver and gold could include all available features, with silver offering lower settings than gold.

Thanks for the productive discussion!

@robhaswell
Contributor

Thanks @wallnerryan it's fascinating to get your insight on this.

One thought I had: users are typically protective of their data (surprise!) and therefore can be particular about the details of its storage. Do you think a "best effort" approach from a storage driver is appropriate, or should there be a class of fatal failures? E.g. (one possible split is sketched after the list):

  • User requested SSD, we only have NVRAM: This is probably OK to accept.
  • User specified a Consistency Group, driver does not support this: This could be a fatal failure - this sort of deficiency should not be ignored.
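
A minimal sketch of that split; the categorisation itself is hypothetical and is exactly what is up for discussion here:

    # Hypothetical split: "soft" hints may be silently upgraded or
    # approximated, "hard" hints abort provisioning if unsupported.
    SOFT_HINTS = frozenset(["storagepool", "maxiops"])
    HARD_HINTS = frozenset(["consistency_group", "protectiondomain"])

    def check_hints(requested, supported):
        """Return tolerated misses; raise on fatal ones."""
        unsupported = set(requested) - set(supported)
        fatal = unsupported & HARD_HINTS
        if fatal:
            raise ValueError("Backend cannot honour: %s"
                             % ", ".join(sorted(fatal)))
        return sorted(unsupported & SOFT_HINTS)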

@wallnerryan
Contributor Author

@robhaswell thanks for the comments. Yeah, this is definitely a "thought in progress". The first thought around best effort was for backends that didn't support, say, a list of features you wanted for your volumes; the backend could respond with "here's what I can give you" and the user could then accept, or resend the request with a token or something to accept it.

Your examples above make sense, but I wonder how you choose which ones are "probably OK to accept", as this is more of a per-workload opinion.

If not using an analytic approach, then being able to define a profile as @myechuri suggests could be useful, and in this case the same best-effort exchange could happen at profile creation (roughly sketched after the list):

  1. user tries to define a profile "gold"
  2. backend can support the features in "gold" --> profile created
  3. or backend cannot support the features --> profile not created
  4. user specifies the profile when creating a dataset
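
Roughly, with a hypothetical capability query on the backend:

    # Rough sketch of the flow above; backend.capabilities() and the
    # returned profile record are hypothetical.
    def create_profile(backend, name, required_features):
        supported = set(backend.capabilities())   # steps 2/3: ask the backend
        missing = set(required_features) - supported
        if missing:
            # step 3: refuse to create the profile rather than silently degrade
            raise ValueError("Profile %r not created; backend lacks: %s"
                             % (name, ", ".join(sorted(missing))))
        # step 2: profile created; step 4 happens later, at dataset creation
        return {"name": name, "features": sorted(required_features)}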

Again, there may be more ways to think about this; these are just suggestions based on the discussion so far. Thanks!

@wallnerryan
Contributor Author

moby/moby#14242 supports opts, which will be used for such features.

@wallnerryan
Contributor Author

Progress can be tracked here: https://clusterhq.atlassian.net/browse/FLOC-3264. Closing this as it's old and we've more or less addressed it with profiles.
