
Persistent data volumes #111

Closed · JeremyGrosser opened this issue Mar 20, 2013 · 43 comments
Labels: area/volumes, kind/feature

@JeremyGrosser (Contributor)

A lot of applications store nontrivial amounts of data on disk and need it to be persisted outside the scope of the aufs filesystem.

Proposal:

docker run base --bind /path/outside/container:/mnt/path/inside/container /path/to/crazydb

Bonus points if you can use variable substitution in the bind path names, e.g.

docker run base --bind '/mnt/$Id/mail:/var/spool/mail' /usr/sbin/postfix

Presumably this feature would manifest itself in config.json somewhat like this:

"Mountpoint": {
  "Bind": [
    {"OutsidePath": "/path/outside/container",
     "InsidePath": "/path/inside/container"}
  ]
}
@shykes closed this as completed Mar 21, 2013
@shykes reopened this Mar 26, 2013
@shykes mentioned this issue Mar 26, 2013
@sa2ajj (Contributor) commented Mar 26, 2013

Would it be possible to have some sort of --fstab option that would result in adding lxc.mount.entry entries to the container's config file?

@sa2ajj (Contributor) commented Mar 26, 2013

Actually, there are two options here:

  • copy the given fstab verbatim and use the lxc.mount = option
  • translate the content of the file into corresponding lxc.mount.entry lines
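
For illustration, the second option might turn an fstab bind entry into a line like this in the container's LXC config (paths are hypothetical; LXC takes the destination relative to the container's rootfs):

lxc.mount.entry = /srv/data var/lib/data none bind 0 0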

@shykes (Contributor) commented Mar 26, 2013

The key principle to keep in mind is that we want to minimize how much the container's execution environment depends on the host's.


@jpetazzo (Contributor)

  1. Short term.
    The command-line binding proposed by @synack would work great. That would be an easy way to persist data, with minimal "out of docker" instrumentation. My personal taste would be to reverse the args, e.g. dotcloud run base -volume /path/in/container=/path/on/host, but that's just me.
  2. Mid term.
    I don't know what we want in the config.json file. FTR, on the current dotCloud platform (which uses the cloudlets format), this is split into two parts: manifest and config. The manifest is the conceptual equivalent of a class definition. It says "to run this, I need one tcp port for SSH, and another for SQL; and also, /var/lib/mysql should be a persistent volume". The config is the instantiated version, so it tells exactly which port was allocated, which volume was bound, etc. (see the sketch after this list).
    It looks like we might have port information in the image json file (to mention "hey, that image exposes a service on port 5432, so by default, dotcloud run should automatically add -p 5432 unless overridden").
    If that's so, it would make sense to also mention which paths are supposed to be volumes, if only for mere introspection purposes.
    Then, if we implement container tagging, it would integrate very neatly to provide persistent data storage, i.e. by default, you get a tmpfs on each volume, but if the container is tagged, then volume foo is bound from e.g. /var/lib/docker/volumes/<containertag>/foo.
  3. Long term.
    I believe that storage providers will be an important feature. It's too early to discuss that in detail, I guess; but the idea would be to allow docker to interface with storage systems like LVM, btrfs, iSCSI, NFS, glusterfs, ceph... The scheme used by Xen 3 for network and block devices is not perfect, but it's a good source of inspiration (TL;DR: it lets you specify that e.g. /dev/xvdk should be myiscsi:foobar, and it will offload to a myiscsi script the task of locating foobar and making it available, whatever that means; so it is fairly extensible without touching the core). Of course docker wouldn't implement all those interfaces, but would provide something that makes it easy for everyone to hook up whatever they need in the system.
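
To make the manifest/config split in point 2 concrete, here is a minimal sketch (every field name and value is invented for illustration; this is not an actual dotCloud or docker format). The manifest declares what the container class needs:

"Manifest": {"Ports": ["ssh/tcp", "sql/tcp"], "Volumes": ["/var/lib/mysql"]}

and the config records what was actually allocated for one instance:

"Config": {"Ports": {"ssh/tcp": 49153, "sql/tcp": 49154},
           "Volumes": {"/var/lib/mysql": "/var/lib/docker/volumes/mydb/var-lib-mysql"}}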

@sa2ajj (Contributor) commented Mar 27, 2013

(Just for the record) I realized one thing: the bound directory should somehow be excluded from what is being tracked as "changes". I am not sure if a straightforward implementation would work right away.

@jpetazzo (Contributor)

That will actually work out of the box—because docker tracks changes by checking the AUFS layer, and a bind mount wouldn't show up in the layer.
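
As a minimal illustration (assuming docker's change-tracking command of the time, docker diff, behaves as described above), a bind-mounted path simply never appears in the change list:

$ docker diff $CONTAINER_ID    # lists files added/changed/deleted in the AUFS layer; bind mounts won't show up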

@tadev commented Mar 29, 2013

+1 want

@titanous (Contributor)

👍 I want to see if I can get Ceph running in docker so that I can get docker running on Ceph.

@sa2ajj mentioned this issue Apr 8, 2013
@ghost assigned creack Apr 8, 2013
@shykes (Contributor) commented Apr 8, 2013

Updated the title for clarity.

@shykes (Contributor) commented Apr 8, 2013

So, here's the current plan.

1. Creating data volumes

At container creation, parts of a container's filesystem can be mounted as separate data volumes. Volumes are defined with the -v flag.

For example:

$ docker run -v /var/lib/postgres -v /var/log postgres /usr/bin/postgres

In this example, a new container is created from the 'postgres' image. At the same time, docker creates 2 new data volumes: one will be mapped to the container at /var/lib/postgres, the other at /var/log.

2 important notes:

  1. Volumes don't have top-level names. At no point does the user provide a name, nor is one given to them. Volumes are identified by the path at which they are mounted inside their container.

  2. The user doesn't choose the source of the volume. Docker only mounts volumes it created itself, in the same way that it only runs containers that it created itself. That is by design.

2. Sharing data volumes

Instead of creating its own volumes, a container can share another container's volumes. For example:

$ docker run --volumes-from $OTHER_CONTAINER_ID postgres /usr/local/bin/postgres-backup

In this example, a new container is created from the 'postgres' image. At the same time, docker will re-use the 2 data volumes created in the previous example. One volume will be mounted at /var/lib/postgres in both containers, and the other at /var/log in both containers.

3. Under the hood

Docker stores volumes in /var/lib/docker/volumes. Each volume receives a globally unique ID at creation, and is stored at /var/lib/docker/volumes/ID.

At creation, volumes are attached to a single container - the source of truth for this mapping will be the container's configuration.

Mounting a volume consists of calling "mount --bind" from the volume's directory to the appropriate sub-directory of the container mountpoint. This may be done by Docker itself, or farmed out to lxc (which supports mount-binding) if possible.
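
As a rough sketch of what that bind step amounts to (the volume ID, container ID, and rootfs layout below are illustrative placeholders, not the actual on-disk layout):

$ mount --bind /var/lib/docker/volumes/$VOLUME_ID /var/lib/docker/containers/$CONTAINER_ID/rootfs/var/lib/postgres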

4. Backups, transfers and other volume operations

Volumes sometimes need to be backed up, transferred between hosts, synchronized, etc. These operations typically are application-specific or site-specific, e.g. rsync vs. S3 upload vs. replication vs...

Rather than attempting to implement all these scenarios directly, Docker will allow for custom implementations using an extension mechanism.

5. Custom volume handlers

Docker allows for arbitrary code to be executed against a container's volumes, to implement any custom action: backup, transfer, synchronization across hosts, etc.

Here's an example:

$ DB=$(docker run -d -v /var/lib/postgres -v /var/log postgres /usr/bin/postgres)

$ BACKUP_JOB=$(docker run -d --volumes-from $DB shykes/backuper /usr/local/bin/backup-postgres --s3creds=$S3CREDS)

$ docker wait $BACKUP_JOB

Congratulations, you just implemented a custom volume handler, using Docker's built-in ability to 1) execute arbitrary code and 2) share volumes between containers.

@shykes (Contributor) commented Apr 8, 2013

One aspect of the spec which is not yet determined: specifying read-only mounts. Any preference on the best way to extend the syntax?
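
For example, one hypothetical option (nothing here is settled) would be a :ro suffix on the volume path:

$ docker run -v /var/lib/postgres -v /var/log:ro postgres /usr/bin/postgres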

@glasser (Contributor) commented Apr 8, 2013

Can you specify --volumes-from more than once?

@shykes (Contributor) commented Apr 8, 2013

@glasser I didn't consider it. One obvious problem is that 2 containers might each have a volume mounted on the same path - in which case the 2 volumes would conflict.

I'm guessing you have a specific use case in mind? :)

@glasser (Contributor) commented Apr 8, 2013

Sure, but that should be something that can be statically checked by docker while building the container, right?

And yes :)

@neomantra (Contributor)

Another use case for exposing the host file system is for communication via Unix Domain Sockets and mqueues, which use files as the connection point. Or maybe also exposing serial ports in /dev?

The original proposal at the top of this thread would allow this; however, I don't think the "data volumes" spec covers it, since it only deals with container-to-container bridging and extraction.

This concept is definitely opposed to @shykes' comment regarding repeatability on different hosts. Similarly, what I tried to do here with pinning CPUs (#439) is host-specific, albeit repeatable only if different hosts support all the same key/values. But people who do this would be using some way of configuring/maintaining it all (like picking service endpoint filepaths/host:ports, or making sure all processes aren't pinned to the same CPU).
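
As a concrete illustration, the --bind syntax from the original proposal (socket path and command hypothetical) would cover the Unix socket case:

$ docker run base --bind /var/run/mysqld/mysqld.sock:/var/run/mysqld/mysqld.sock /usr/bin/myapp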

@jpetazzo (Contributor)

I had an interesting discussion with @mpetazzoni yesterday about a rather contrived use-case: using containerization to provide isolated environments for remote shell access on a multi-user server.

Specifically, the concern was to operate the local MTA in a separate container, but still deliver mails to e.g. ~/Maildir (per-user).

The challenge here is "partial volume sharing", i.e. the "shell" containers should be given access only to /home/$USERNAME (for a unique $USERNAME), while the MTA should be able to access /home/$USERNAME/Maildir (for all users of the system).
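
With host bind mounts (again, the hypothetical --bind syntax from the top of this thread), that split could look roughly like this:

$ docker run base --bind /home/alice:/home/alice /bin/bash    # alice's shell container sees only her home
$ docker run base --bind /home/alice/Maildir:/home/alice/Maildir --bind /home/bob/Maildir:/home/bob/Maildir /usr/sbin/postfix    # the MTA sees only the Maildirs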


@shykes (Contributor) commented Apr 24, 2013

So, I now have enough data points (i.e. people yelling at me) to acknowledge the need for "choosing your own adventure" when it comes to custom runtime configuration: cpu pinning, external mountpoints, etc.

So, to quote @brianm, we need "escape hatches" for the experts without ruining the experience for everyone.

To get back to external mountpoints for volumes, I am convinced, and I have a design in mind for the escape hatch; stay tuned.


@shykes (Contributor) commented Apr 24, 2013

...and the design will solve your MTA example, Jérôme, as well as all those listed in this issue.


@creack (Contributor) commented May 7, 2013

Closed by #376

@creack closed this as completed May 7, 2013
@niclashoyer

Does pull request #376 solve the use cases mentioned here? If so, is there any documentation on how to use it?

runcom pushed a commit to runcom/docker that referenced this issue Apr 20, 2016
ijc pushed a commit to ijc/moby that referenced this issue Feb 3, 2017
@thaJeztah added the kind/feature and area/volumes labels Mar 4, 2024