
Persistent data volumes #111

Closed · JeremyGrosser opened this issue Mar 20, 2013 · 43 comments
Labels: area/volumes, kind/feature

@JeremyGrosser (Contributor)

A lot of applications store nontrivial amounts of data on disk and need it to be persisted outside the scope of the aufs filesystem.

Proposal:

docker run base --bind /path/outside/container:/mnt/path/inside/container /path/to/crazydb

Bonus points if you can use variable substitution in the bind path names, e.g.

docker run base --bind '/mnt/$Id/mail:/var/spool/mail' /usr/sbin/postfix

Presumably this feature would manifest itself in config.json somewhat like this:

"Mountpoint": {
  "Bind": [
    {"OutsidePath": "/path/outside/container",
     "InsidePath": "/path/inside/container"}
  ]
}
@shykes closed this as completed Mar 21, 2013
@shykes reopened this Mar 26, 2013
@shykes mentioned this issue Mar 26, 2013
@sa2ajj (Contributor) commented Mar 26, 2013

Would it be possible to have some sort of --fstab option that would result in adding lxc.mount.entry entries to the container's config file?

@sa2ajj (Contributor) commented Mar 26, 2013

Actually, there are two options here:

  • copy the given fstab verbatim and use the lxc.mount = option
  • translate the content of the file into corresponding lxc.mount.entry lines
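
For illustration, the second option might turn an fstab bind entry into a line like this in the container's LXC config (paths are hypothetical; LXC takes the destination relative to the container's rootfs):

lxc.mount.entry = /srv/data var/lib/data none bind 0 0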

@shykes (Contributor) commented Mar 26, 2013

The key principle to keep in mind is that we want to minimize how much the container's execution environment depends on the host's.


@jpetazzo (Contributor)

  1. Short term.
    The command-line binding proposed by @synack would work great. That would be an easy way to persist data, with minimal "out of docker" instrumentation. My personal taste would be to reverse the args, e.g. dotcloud run base -volume /path/in/container=/path/on/host, but that's just me.
  2. Mid term.
    I don't know what we want in the config.json file. FTR, on the current dotCloud platform (which uses the cloudlets format), this is split into two parts: manifest and config. The manifest is the conceptual equivalent of a class definition. It says "to run this, I need one tcp port for SSH, and another for SQL; and also, /var/lib/mysql should be a persistent volume". The config is the instantiated version, so it tells exactly which port was allocated, which volume was bound, etc. (see the sketch after this list).
    It looks like we might have port information in the image json file (to mention "hey, that image exposes a service on port 5432, so by default, dotcloud run should automatically add -p 5432 unless overridden").
    If that's so, it would make sense to also mention which paths are supposed to be volumes, if only for mere introspection purposes.
    Then, if we implement container tagging, it would integrate very neatly to provide persistent data storage, i.e. by default, you get a tmpfs on each volume, but if the container is tagged, then volume foo is bound from e.g. /var/lib/docker/volumes/<containertag>/foo.
  3. Long term.
    I believe that storage providers will be an important feature. It's too early to discuss that in detail, I guess; but the idea would be to allow docker to interface with storage systems like LVM, btrfs, iSCSI, NFS, glusterfs, ceph... The scheme used by Xen 3 for network and block devices is not perfect, but it's a good source of inspiration (TL;DR: it lets you specify that e.g. /dev/xvdk should be myiscsi:foobar, and it will offload to a myiscsi script the task of locating foobar and making it available, whatever that means; so it is fairly extensible without touching the core). Of course docker wouldn't implement all those interfaces, but would provide something that makes it easy for everyone to hook up whatever they need in the system.
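
To make the manifest/config split in point 2 concrete, here is a minimal sketch (every field name and value is invented for illustration; this is not an actual dotCloud or docker format). The manifest declares what the container class needs:

"Manifest": {"Ports": ["ssh/tcp", "sql/tcp"], "Volumes": ["/var/lib/mysql"]}

and the config records what was actually allocated for one instance:

"Config": {"Ports": {"ssh/tcp": 49153, "sql/tcp": 49154},
           "Volumes": {"/var/lib/mysql": "/var/lib/docker/volumes/mydb/var-lib-mysql"}}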

@sa2ajj (Contributor) commented Mar 27, 2013

(Just for the record) I realized one thing: the bound directory should somehow be excluded from what is being tracked as "changes". I am not sure if a straightforward implementation would work right away.

@jpetazzo (Contributor)

That will actually work out of the box—because docker tracks changes by checking the AUFS layer, and a bind mount wouldn't show up in the layer.
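
As a minimal illustration (assuming docker's change-tracking command of the time, docker diff, behaves as described above), a bind-mounted path simply never appears in the change list:

$ docker diff $CONTAINER_ID    # lists files added/changed/deleted in the AUFS layer; bind mounts won't show up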

@tadev commented Mar 29, 2013

+1 want

@titanous (Contributor)

👍 I want to see if I can get Ceph running in docker so that I can get docker running on Ceph.

@sa2ajj mentioned this issue Apr 8, 2013
@ghost assigned creack Apr 8, 2013
@shykes (Contributor) commented Apr 8, 2013

Updated the title for clarity.

@shykes (Contributor) commented Apr 8, 2013

So, here's the current plan.

1. Creating data volumes

At container creation, parts of a container's filesystem can be mounted as separate data volumes. Volumes are defined with the -v flag.

For example:

$ docker run -v /var/lib/postgres -v /var/log postgres /usr/bin/postgres

In this example, a new container is created from the 'postgres' image. At the same time, docker creates 2 new data volumes: one will be mapped to the container at /var/lib/postgres, the other at /var/log.

2 important notes:

  1. Volumes don't have top-level names. At no point does the user provide a name, nor is one given to them. Volumes are identified by the path at which they are mounted inside their container.

  2. The user doesn't choose the source of the volume. Docker only mounts volumes it created itself, in the same way that it only runs containers that it created itself. That is by design.

2. Sharing data volumes

Instead of creating its own volumes, a container can share another container's volumes. For example:

$ docker run --volumes-from $OTHER_CONTAINER_ID postgres /usr/local/bin/postgres-backup

In this example, a new container is created from the 'postgres' image. At the same time, docker will re-use the 2 data volumes created in the previous example. One volume will be mounted at /var/lib/postgres in both containers, and the other at /var/log in both containers.

3. Under the hood

Docker stores volumes in /var/lib/docker/volumes. Each volume receives a globally unique ID at creation, and is stored at /var/lib/docker/volumes/ID.

At creation, volumes are attached to a single container - the source of truth for this mapping will be the container's configuration.

Mounting a volume consists of calling "mount --bind" from the volume's directory to the appropriate sub-directory of the container mountpoint. This may be done by Docker itself, or farmed out to lxc (which supports mount-binding) if possible.
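
As a rough sketch of what that bind step amounts to (the volume ID, container ID, and rootfs layout below are illustrative placeholders, not the actual on-disk layout):

$ mount --bind /var/lib/docker/volumes/$VOLUME_ID /var/lib/docker/containers/$CONTAINER_ID/rootfs/var/lib/postgres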

4. Backups, transfers and other volume operations

Volumes sometimes need to be backed up, transferred between hosts, synchronized, etc. These operations typically are application-specific or site-specific, e.g. rsync vs. S3 upload vs. replication vs...

Rather than attempting to implement all these scenarios directly, Docker will allow for custom implementations using an extension mechanism.

5. Custom volume handlers

Docker allows for arbitrary code to be executed against a container's volumes, to implement any custom action: backup, transfer, synchronization across hosts, etc.

Here's an example:

$ DB=$(docker run -d -v /var/lib/postgres -v /var/log postgres /usr/bin/postgres)

$ BACKUP_JOB=$(docker run -d --volumes-from $DB shykes/backuper /usr/local/bin/backup-postgres --s3creds=$S3CREDS)

$ docker wait $BACKUP_JOB

Congratulations, you just implemented a custom volume handler, using Docker's built-in ability to 1) execute arbitrary code and 2) share volumes between containers.

@shykes (Contributor) commented Apr 8, 2013

One aspect of the spec which is not yet determined: specifying read-only mounts. Any preference on the best way to extend the syntax?
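
For example, one hypothetical option (nothing here is settled) would be a :ro suffix on the volume path:

$ docker run -v /var/lib/postgres -v /var/log:ro postgres /usr/bin/postgres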

@glasser (Contributor) commented Apr 8, 2013

Can you specify --volumes-from more than once?

@shykes (Contributor) commented Apr 8, 2013

@glasser I didn't consider it. One obvious problem is that 2 containers might each have a volume mounted on the same path - in which case the 2 volumes would conflict.

I'm guessing you have a specific use case in mind? :)

@glasser (Contributor) commented Apr 8, 2013

Sure, but that should be something that can be statically checked by docker while building the container, right?

And yes :)

@neomantra (Contributor)

Another use case for exposing the host file system is for communication via Unix Domain Sockets and mqueues, which use files as the connection point. Or maybe also exposing serial ports in /dev?

The original proposal at the top of this thread would allow this; however, I don't think the "data volumes" spec covers it, since it only deals with container-to-container bridging and extraction.

This concept is definitely opposed to @shykes' comment regarding repeatability on different hosts. Similarly, what I tried to do here with pinning CPUs (#439) is host-specific, albeit repeatable only if different hosts support all the same key/values. But people who do this would be using some way of configuring/maintaining it all (like picking service endpoint filepaths/host:ports, or making sure all processes aren't pinned to the same CPU).
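
As a concrete illustration, the --bind syntax from the original proposal (socket path and command hypothetical) would cover the Unix socket case:

$ docker run base --bind /var/run/mysqld/mysqld.sock:/var/run/mysqld/mysqld.sock /usr/bin/myapp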

@jpetazzo (Contributor)

I had an interesting discussion with @mpetazzoni yesterday about a rather contrived use-case: using containerization to provide isolated environments for remote shell access on a multi-user server.

Specifically, the concern was to operate the local MTA in a separate container, but still deliver mails to e.g. ~/Maildir (per-user).

The challenge here is "partial volume sharing", i.e. the "shell" containers should be given access only to /home/$USERNAME (for a unique $USERNAME), while the MTA should be able to access /home/$USERNAME/Maildir (for all users of the system).
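
With host bind mounts (again, the hypothetical --bind syntax from the top of this thread), that split could look roughly like this:

$ docker run base --bind /home/alice:/home/alice /bin/bash    # alice's shell container sees only her home
$ docker run base --bind /home/alice/Maildir:/home/alice/Maildir --bind /home/bob/Maildir:/home/bob/Maildir /usr/sbin/postfix    # the MTA sees only the Maildirs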


@shykes (Contributor) commented Apr 24, 2013

So, I now have enough data points (i.e. people yelling at me) to acknowledge the need for "choosing your own adventure" when it comes to custom runtime configuration: cpu pinning, external mountpoints, etc.

So, to quote @brianm, we need "escape hatches" for the experts without ruining the experience for everyone.

To get back to external mountpoints for volumes, I am convinced, and I have a design in mind for the escape hatch; stay tuned.


@shykes (Contributor) commented Apr 24, 2013

...and the design will solve your MTA example, Jérôme, as well as all those listed in this issue.


@creack (Contributor) commented May 7, 2013

Closed by #376

@creack closed this as completed May 7, 2013
@niclashoyer

Does pull request #376 solve the use cases mentioned here? If so, is there any documentation on how to use it?

runcom pushed a commit to runcom/docker that referenced this issue Apr 20, 2016
ijc pushed a commit to ijc/moby that referenced this issue Feb 3, 2017
@thaJeztah added the kind/feature and area/volumes labels Mar 4, 2024