Proposal: Docker Storage API #11090

Closed
hoskeri opened this issue Feb 28, 2015 · 20 comments
Labels
kind/feature

Comments

@hoskeri

hoskeri commented Feb 28, 2015

Docker Storage API - Proposal

Motivation

  1. Data volumes cannot be described fully by docker/containers.
  2. Image storage for containers is not configurable.
  3. There is currently no infrastructure to support distributed
    storage volumes from within docker.
  4. There is no way to manage the life cycle of data, image, and metadata volumes.
  5. There is no way to manage shared volumes across a cluster of docker hosts.

Volumes as first-class objects in Docker.

We achieve these goals by promoting Volumes and Storage to
first-class entities in docker, on the same level as containers.

This would also allow containers to describe the volumes that must be
attached to them, and to store the container configuration in a way that
permits the creation of highly available containers.

Another feature would be the ability to discover the capabilities of various
storage backends and automatically provision and configure storage volumes
based on QoS specs for containers.

Use Case Example.

Let's take the example of a common pattern: a multi-tiered database application.

Such an application may consist of one or more database servers, a number of
application processes, HTTP servers, caching servers, load balancers, management, etc.

The deployment of such an application imposes different levels of performance,
availability and reliability requirements. It is useful to be able to specify
what each volume and each container type needs, and for the available storage
to be matched and provisioned automatically.

Thus user-facing terms like 'high' and 'low' can be translated by docker and/or
the volume drivers into concrete metrics such as IOPS (see the sketch after the
examples below).

Examples of such containers and their storage needs:

        :database-master:
          build: .
          ports:
           - "8000:8000"
          volumes:
            name: DataVolume
            size: 10G
            type: block
            path: /dev/sdb
            qos-spec:
                availability: high
                performance: high

        :database-replica:
          build: .
          ports:
           - "3307:3307"
          volumes:
            name: DataVolumeReplica1
            size: 10G
            type: block
            path: /dev/sdb
            qos-spec:
                availability: medium
                performance: high

        :caching-server:
          build: .
          ports:
           - "3307:3307"
          volumes:
            name: VarnishCache
            size: 2G
            type: fs
            path: /var/lib/varnish-cache
            qos-spec:
                availability: low
                performance: medium

        :application-worker:
          build: .
          ports:
           - "3307:3307"
          volumes:
            name: ScratchVolume1
            size: 1G
            type: fs
            path: /var/lib/app-scratch
            qos-spec:
                availability: low
                performance: low

        :load-balancer:
          build: .
          ports:
           - "3307:3307"
          volumes:
            name: LoggingVolume1
            size: 1G
            type: fs
            path: /var/lib/balancer
            read-only: true
            qos-spec:
                availability: low
                performance: low
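
For illustration, here is one way the qos-spec terms in the examples above
might translate into the concrete PolicySpec defined under
'Interface/API definitions.' below. This is a minimal sketch; the thresholds
and zone names are illustrative assumptions, not part of the proposal.

    // Hypothetical qos-spec translation; all values are illustrative only.
    func specForQoS(performance, availability string) PolicySpec {
        spec := PolicySpec{}
        switch performance {
        case "high":
            spec.MinIOPS, spec.MaxIOPS = 5000, 50000
        case "medium":
            spec.MinIOPS, spec.MaxIOPS = 500, 5000
        default: // "low" or unspecified
            spec.MaxIOPS = 500
        }
        switch availability {
        case "high": // replicate to three zones
            spec.AvailabilityZone = []string{"zone-a", "zone-b", "zone-c"}
        case "medium":
            spec.AvailabilityZone = []string{"zone-a", "zone-b"}
        default:
            spec.AvailabilityZone = []string{"zone-a"}
        }
        return spec
    }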

Design Overview.

We create a new type of driver interface called the 'storagedriver'.

Just like the existing 'graphdriver' and 'execdriver' interfaces,
this will allow the existence of multiple implementations of
the interface - referred to below as 'drivers'.

The central new entity introduced by this interface is called the
'Volume'. Each driver allows the creation, manipulation and destruction
of these volumes.

'Volumes' are meant to supersede and formalize the existing concept of
data volumes. They can also be used to represent individual layers in a
container; thus any snapshot capability offered by the storage can be used
to implement the filesystem layering docker needs to work.
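
For example, image layering could map onto the proposed interface roughly as
follows. This is a non-normative sketch using the Driver, Snapshot and Clone
definitions from the next section:

    // Sketch: build a new writable layer on top of a base layer using the
    // snapshot/clone primitives proposed below.
    func newLayer(d Driver, base *VolumeID, name string) (*VolumeID, error) {
        // Freeze the base layer as a read-only point-in-time snapshot...
        snap, err := d.Snapshot(base, name+"-ro", PolicySpec{}, nil)
        if err != nil {
            return nil, err
        }
        // ...then fork a writable clone from it to hold the new layer.
        return d.Clone(snap, name, PolicySpec{}, nil)
    }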

We also create definitions of service levels and performance criteria
that can serve as input to the storage drivers to manage their resources.

Interface/API definitions.

    package driver

    // VolumeID is an opaque identifier for a volume. The concrete
    // representation is left open; a string is assumed here so the
    // sketch compiles.
    type VolumeID string

    // AccessType describes the kind of access point returned by Attach:
    // block device file, local directory, nfs mountpoint, iscsi volume, etc.
    type AccessType int

    // Custom metadata for volumes.
    type VolumeTags map[string]string

    const (
        METADATA = iota
        DATA
    )

    type storageDriverInfo struct {
        driver Driver
    }

    // List of available storage drivers.
    var storageDrivers []storageDriverInfo

    // Register adds a driver to the list of available storage drivers.
    func Register(d Driver) (err error) {
        storageDrivers = append(storageDrivers, storageDriverInfo{driver: d})
        return nil
    }

    type PolicySpec struct {
        AvailabilityZone []string // List of availability zones to replicate to.
                                  // Also specifies the number of copies.
        MinIOPS          uint64   // Min IOPS expected.
        MaxIOPS          uint64   // Max IOPS allowed.
        MinLatency       uint64   // Min latency allowed.
        MaxLatency       uint64   // Max latency expected.
        Burst            uint64   // Number denoting 'burstiness'.
        MinThroughput    uint64   // Guaranteed throughput.
        MaxThroughput    uint64   // Max throughput allowed.
        MinResv          uint64   // Minimum bytes to reserve.
        MaxResv          uint64   // Max bytes reserved; same as the capacity/size
                                  // of the volume. MinResv == 0 implies thin
                                  // provisioning; MinResv == MaxResv implies
                                  // thick provisioning.
        Dedup            bool     // Dedup enabled.
        Compression      string   // Compression type, if enabled.
        Encryption       string   // Encryption type, if enabled.
        RPO              uint64   // Recovery Point Objective - represented by date/time.
        RTO              uint64   // Recovery Time Objective - represented by date/time.
    }


    // Interface to be implemented by each of the drivers.
    type Driver interface {
        // Create a volume given a name, a policy specification, and
        // optionally a source snapshot to fork from. tags is a set of
        // arbitrary key/value pairs used to associate metadata.
        Create(name string, snapshot VolumeID, spec PolicySpec, tags VolumeTags) (v *VolumeID, err error)

        // Get the runtime structure representing a volume.
        Get(v *VolumeID) (vol *Volume, err error)

        // Update the PolicySpec and/or other parameters.
        Update(v *VolumeID, spec PolicySpec) (err error)

        // Delete this volume. lazy implies that the delete is kept
        // pending until the last mount is detached, instead of
        // failing if the volume is still in use.
        Delete(v *VolumeID, lazy bool) (err error)

        // Clone a volume from another volume or snapshot.
        // This creates a writable copy of the volume.
        Clone(v *VolumeID, name string, spec PolicySpec, tags VolumeTags) (clone *VolumeID, err error)

        // Create a read-only point-in-time snapshot.
        Snapshot(v *VolumeID, name string, spec PolicySpec, tags VolumeTags) (snap *VolumeID, err error)

        // Get a list of all child snapshots or clones created
        // from this volume.
        EnumerateSnapshots(v *VolumeID) (snapshots []VolumeID, err error)

        // Stream the snapshot diff between two volumes/snapshots to a third volume.
        SnapshotDiff(v *VolumeID, other *VolumeID, dest *VolumeID) (err error)

        // Restore the contents of this volume from a previous snapshot.
        Revert(v *VolumeID, to VolumeID) (err error)

        // Ask the driver to set up the volume and return the access point -
        // e.g. block device file, local directory, nfs mountpoint, iscsi volume.
        Attach(v *VolumeID) (id string, t AccessType, err error)

        // Ask the driver to destroy the access point
        // and free resources.
        Detach(v *VolumeID, lazy bool) (err error)

        // General lookup of all volumes provided by this driver;
        // the search could be by ID, name, policy, snapshot-create-time, etc.
        Search(query string) (result []VolumeID, err error)
    }

    // Representation of the runtime status
    // of the volume and any of its access points,
    // associated containers, hosts, etc.
    type Volume struct {
        // Runtime details.
        Name  string
        Spec  PolicySpec
        Stats string
        // Alarms, hosts attached, policy compliance status, etc.
        Status string
    }
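
To illustrate the caller side, a hypothetical sketch of how the daemon might
use a registered driver to provision and attach a volume for a container
(all names and values here are illustrative, not part of the interface):

    // Hypothetical usage of the Driver interface defined above.
    func provisionFor(d Driver, container string) (accessPoint string, err error) {
        spec := PolicySpec{
            MinIOPS: 500,      // e.g. a translated "performance" qos-spec
            MaxResv: 10 << 30, // 10G capacity
        }
        tags := VolumeTags{"container": container}
        // An empty snapshot ID means no source to fork from: start empty.
        id, err := d.Create("DataVolume", "", spec, tags)
        if err != nil {
            return "", err
        }
        // Attach returns the access point: block device file, directory, etc.
        accessPoint, _, err = d.Attach(id)
        return accessPoint, err
    }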

Future work.

  • Allow delegation to an out-of-process daemon to permit hot-plugging of drivers.
@LaynePeng

I like this proposal, +1

@thaJeztah
Member

For reference, another proposal was also created recently: Proposal: Storage Interfaces for Docker #11020

@thaJeztah
Member

Also, @hoskeri could you change the title and have it start with "Proposal: "?

@jessfraz jessfraz added Proposal, kind/feature labels Mar 2, 2015
@cpuguy83
Member

cpuguy83 commented Mar 2, 2015

I feel like this is covering a couple of things: 1) volume management, and 2) enhanced volume functionality.

Please see #8363 and the subsequent #8484 implementing volume management.
Would be great to have discussions on volume management centered around those.

On the enhanced volume functionality, I feel like adding complexity to Docker and making it a data management platform is not the right way to go... what we can do is provide a standard API that calls out to other tools when a volume is requested.

@hoskeri hoskeri changed the title from "Docker Storage API" to "Proposal: Docker Storage API" Mar 2, 2015
@hoskeri
Author

hoskeri commented Mar 2, 2015

@thaJeztah Done!

@ahanwadi

ahanwadi commented Mar 2, 2015

@cpuguy83 - The CLI in proposal #8363 looks reasonable, and we can have the API mirror it.
However, docker currently does not provide a way to utilize a specific storage provider on a per-container basis.
We would definitely need a way to:

  1. Be able to specify the storage provider/driver when creating a volume.
  2. Be able to create a volume from an existing volume (as a clone or snapshot), since you may not always start with an empty volume.

With the above two, the modified volume create CLI could look like this:
docker volume create --name xyz --size 10G --read-write --provider springpath --from source-volume

The API proposed addresses these additional areas as well.
The following items in the proposed API could be moved out into a separate interface or even outside of docker:

  1. attach/detach - provide a way to manage access to a volume from any docker host
  2. revert, enumerate snapshots - provide management of snapshot tree
    We could definitely move 2. outside of docker.
    We could also move 1. into a separate interface and add some provider management (list providers, register provider, etc.) functionality.

Without a way for docker to manage the relationship between volumes and providers, docker will not be able to use multiple storage providers or allow volumes to be shared between docker hosts.
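
For illustration, that CLI could plausibly map onto the proposed Driver
interface along these lines (a sketch; beyond the flag names shown above,
everything here is hypothetical):

    // Hypothetical backing for:
    //   docker volume create --name xyz --size 10G --provider springpath --from source-volume
    // The --provider flag would select which registered driver d to dispatch to.
    func volumeCreate(d Driver, name string, size uint64, from VolumeID) (*VolumeID, error) {
        // --size becomes the capacity; equal Min/MaxResv implies thick
        // provisioning per the PolicySpec comments in the proposal.
        spec := PolicySpec{MinResv: size, MaxResv: size}
        // --from supplies an existing volume/snapshot to fork from;
        // the zero value means start with a fresh, empty volume.
        return d.Create(name, from, spec, nil)
    }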

@thaJeztah
Member

From a UX perspective, here are some proposals for being able to specify a driver/options when creating volumes: #9803, #9250 (and others: #7249 (comment)).

(IMO, it's time to collect all those proposals and have a design, both from a UX and technical perspective)

@ahanwadi

ahanwadi commented Mar 3, 2015

@thaJeztah - Thanks for the references. So, essentially, with remote volumes docker will mainly be connecting to existing volumes (so 'docker volume create' is more like attach or mount).
Seeking a few clarifications:

  1. How do we then handle the case of creating a new empty volume, as specified in the Dockerfile with a VOLUME entry? Will such a volume always be provisioned using some default/local provider/driver? Or will we always force the user to specify a pre-provisioned volume during docker run?
  2. Currently, docker relies on devicemapper snapshots (when the devicemapper storage backend is in use): https://github.com/docker/docker/tree/master/daemon/graphdriver/devmapper. For images, docker needs to continue to drive the creation of snapshots or layers. How can that be achieved without exposing clone/snapshot management for remote storage backends?

@thockin
Contributor

thockin commented Mar 4, 2015

It's not clear to me how these are really meant to work in an orchestrated system. Suppose, as per your example, I get DataVolume for my database-master. Then that node dies - how do I get my data back?

@ahanwadi

ahanwadi commented Mar 4, 2015

@thockin - The way this would work is that docker (or any other orchestration layer) will store the reference to the volume (volumeID) and the provider details in its configuration. If the volume was attached to any container(s), the volumeID will also be referenced (along with details such as the mount point or dev path) by the container's configuration.

All of this can be achieved if we use the same common interface to manage shared volumes:

  1. Data volumes
  2. Configuration volume for each container - can be looked up by container ID
  3. Orchestrator/docker configuration - volumes, networks, various providers and other configuration information (independent of how they are attached to specific containers) - can be looked up using search with a well-known volume type
  4. Orchestrator and docker's runtime details - i.e. which hosts are running which containers, heartbeats from hosts, etc. - can be looked up using search with a well-known volume type and host ID

So if a host crashes, the container can be restarted on another host, provided that host has access to the docker configuration (3 above). It can then look up the container configuration (2 above) using the container ID, look up and attach all of its volumes, and then restart/resume the container.
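
Concretely, that recovery path might look like the following sketch against
the proposed interface (the search-query format here is an assumption):

    // Hypothetical recovery sketch: bring a container back on a new host.
    func recoverContainer(d Driver, containerID string) error {
        // 1. Find the volumes referenced by the container's configuration.
        ids, err := d.Search("container=" + containerID)
        if err != nil {
            return err
        }
        // 2. Re-attach each volume on this host.
        for i := range ids {
            if _, _, err := d.Attach(&ids[i]); err != nil {
                return err
            }
        }
        // 3. Restart/resume the container (outside the storage API's scope).
        return nil
    }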

@LaynePeng

Hi @hoskeri, I am very interested in your proposal. Has any work started? Perhaps we can discuss it, or even work together.

@thockin
Contributor

thockin commented Mar 4, 2015

@ahanwadi I'm a concrete sort of guy - can we bring this down to examples of possible implementation strategies? Where does a user who is running a container ask for "I want 10 GB of HA read-write block storage with at least 500 IOPs"?

Now, once that container is running, imagine the machine hosting it dies. How does a user ask for "I want THE SAME storage I was using before"?

@cpuguy83
Member

cpuguy83 commented Mar 4, 2015

@ahanwadi So my thoughts here are this:

A volume is necessarily a filesystem that is either actually part of the host's fs, or has been mounted as an fs to the host.
I say necessarily here because this is what will end up in the container anyway, and for Docker to handle provisioning block devices, snapshotting/cloning data, etc. seems a bit out of scope.
It is mounted to the host (and not directly in the container) because some things can only be mounted to a system one time (zfs).

So, using the host path of the volume as the API, we can create an extension system that is simple to implement but powerful to use.
Consider these two commands:

docker run -v /foo busybox
docker run -v /foo:/bar busybox

In the first command, we are requesting a volume for the container with no specific host path, which would be (and indeed is today) presented to the volume subsystem as an empty string.

In the 2nd command we explicitly requested /foo on the host to be used as the host path.

In both cases this would be sent to an extension, along with the container ID, to deal with (or ignore). The extension would return either exactly what it got or a modified path. If ignored, Docker would handle it as it does today.
Since we are also providing the container ID, the container configuration can be looked up to determine actual storage requirements for that specific container/image.

This, by the way, is already a working POC.

While data must be addressable beyond the life of any single container, data is for and created by a container.
The above POC is about keeping volume creation/attaching within the scope of the existing docker container commands/API, but keeping the knowledge of the actual underlying infrastructure outside of Docker.
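
A minimal sketch of the hook's shape (illustrative only; the name and
signature here are simplified, not the exact POC code):

    // Path-based extension hook, roughly as described above.
    type VolumeExtension interface {
        // Called with the requested host path ("" when unspecified, as in
        // `docker run -v /foo`) and the container ID. Returns the path the
        // daemon should bind-mount, either unchanged or modified; if the
        // extension ignores the request, Docker handles it as today.
        HostPath(requested, containerID string) (hostPath string, err error)
    }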

@hoskeri
Author

hoskeri commented Mar 4, 2015

@LaynePeng We are playing with some code to get an implementation going. One of the other goals of this issue is to document the requirements a storage system has of a container orchestrator/implementor, and also to get some feedback on what the docker folks feel the interaction should look like.

@LaynePeng

@hoskeri thanks for your quick reply. Is there any code we can access currently? I like this proposal, but I have one suggestion: Docker core tends to stay focused and lean, so would it be more reasonable to consider this as "adding an additional facility to a container" and put this enhancement in the docker/compose project? Then, in the docker core project, we can keep using "-v" to mount the fs or block device into a container.

@kunalkushwaha
Contributor

Hi @hoskeri I also liked the proposal. +1

@cpuguy83 since you already have a working POC: https://github.com/cpuguy83/docker/blob/volume_drivers/volumes/volumedriver/host/driver.go

Do you plan to bring a consolidated proposal from #11090, #11020, and the UX proposals?

@ahanwadi

@cpuguy83 I understand your concern about docker having to provision block storage.
Docker can call out to storage drivers when provisioning volumes, and it is up to the storage provider to provision the storage (block, filesystem, etc.).
Once a volume is created, the docker run/start CLI/API can request a specific volume to be attached to the container.
We believe docker should be flexible in terms of storage access interfaces, such as POSIX filesystem, block device, etc.
Please note that docker uses snapshots/clones to create/manage image layers. Generalizing this to data volumes would be very helpful.

@ahanwadi

@cpuguy83 Maybe I am missing something; I am trying to understand your suggestion about using the container ID in the extension to look up container-specific configuration. The container ID won't be known to the extension(s) in advance, so how can they look up container-specific volume(s)? This is similar to the issue that we are having with networking: the netns is based on the container ID, and hence there is no way to configure the netns beforehand for a container. Many (including us) ended up having to use pipework to get around that.
At a minimum, docker will have to accept (from the CLI/API) and store volume details (such as type, policy/qos spec, etc.) along with the container before it calls out to the extension(s), so that the storage extension can look up those details when provisioning storage for the container.
Or support an alternate workflow:

  1. User creates a container (w/o starting it)
  2. User provisions storage using out of band mechanism and associates it with the container ID
  3. User starts the container
  4. Docker calls out to the extension, which looks up the storage for the container ID and the path, and returns the new path. Docker then bind-mounts the new path into the container and starts the container.
The above workflow is too complex, requires a wrapper script or manual intervention (step 2 above), and goes against the simplicity that made docker so popular.
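
One minimal shape for the volume details docker would need to accept and
store per container before calling the extension (a sketch; the field names
are hypothetical):

    // Hypothetical per-container volume record persisted by docker before
    // it calls out to the storage extension(s).
    type VolumeRequest struct {
        Name     string            // e.g. "DataVolume"
        Type     string            // "block" or "fs"
        Size     uint64            // requested capacity in bytes
        QoS      map[string]string // e.g. {"availability": "high"}
        ReadOnly bool
    }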

@jessfraz jessfraz added kind/feature and removed kind/feature, kind/proposal labels Sep 8, 2015
@calavera
Contributor

calavera commented Mar 3, 2016

We already have a storage API with a plugin system. Closing this as fixed.

@calavera calavera closed this as completed Mar 3, 2016