Data-only containers obsolete with docker 1.9.0? #17798

Closed
jonaskello opened this issue Nov 8, 2015 · 38 comments

Comments

@jonaskello

This is a question regarding best-practice, not sure this is the right place to ask but here it goes anyway:

So in docker 1.9.0 we can create named volumes. This means I could create a container with a named volume, then remove the container completely, and then re-create it again with the same named volume and the data would be retained. I think this was at least one (if not the only) purpose of data-only containers. So my question is if having data-only containers is still considered best-practice? Or can we now skip data-only containers completely and only use named volumes?
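A minimal sketch of the behaviour described above, assuming a Docker version with named volumes (1.9.0 or later). The function name, the volume name `appdata`, and the `busybox` image are placeholders chosen for illustration and do not come from this thread:

```shell
# Sketch: data written to a named volume outlives the container that wrote it.
# demo_named_volume and the default volume name "appdata" are hypothetical.
demo_named_volume() {
    vol="${1:-appdata}"
    # First container writes a file into the named volume and is removed (--rm).
    docker run --rm -v "$vol":/data busybox sh -c 'echo hello > /data/greeting'
    # A brand-new container that mounts the same volume name reads the file back.
    docker run --rm -v "$vol":/data busybox cat /data/greeting
}
```

Calling `demo_named_volume mydata` runs both containers in turn; the volume itself stays around until it is removed with `docker volume rm mydata`.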

@thaJeztah
Member

Yes, named volumes should be able to replace data-only containers in most (if not all) cases.

@cpuguy83 any ideas of cases where data-only containers still make sense?

We may have to further improve the docs around this

@cpuguy83
Member

cpuguy83 commented Nov 8, 2015

Yep, no reason I can see to use data-only containers.

@jonaskello
Author

Ok, so no more data-only containers :-) Thanks for verifying this!

@runcom
Member

runcom commented Nov 8, 2015

Do we have docs regarding data-only containers? If so we could leave this open until we fix those
Ping @thaJeztah

@trkoch

trkoch commented Nov 12, 2015

@runcom Yes, e.g. https://docs.docker.com/engine/userguide/dockervolumes/#creating-and-mounting-a-data-volume-container. I suggest referring to named volumes instead, to prevent future confusion.

@duglin
Contributor

duglin commented Nov 12, 2015

Never really used data-only containers, but wouldn't there still be a need for them if you wanted a way to move data (not apps) between clouds? At that point the container becomes the portable-filesystem artifact.

@thaJeztah
Member

@duglin actually not, because the data-only container is only used to reference the volume through --volumes-from, so they're also not portable

@duglin
Contributor

duglin commented Nov 12, 2015

But if I use the VOLUME Dockerfile instruction and pre-populate that volume during the build process, won't that data be available whenever and wherever I deploy that image?

@trkoch

trkoch commented Nov 12, 2015

Maybe a remaining use case is pre-seeded volumes from a data-only container (see #14242 (comment) for subtle differences between anonymous and named volumes).

@BrianAdams

Another use case that keeps data-only containers relevant is container affinity: you can use --volumes-from=dependency to make sure your container runs on the same node as the data container. At the moment there does not appear to be a filter for volume affinity.

@quinncomendant

@runcom @trkoch @thaJeztah The section Creating and mounting a data volume container still says, "…it’s best to create a named Data Volume Container…". If this is no longer the case, can y'all make a ticket to get the docs updated? I'm certainly confused by this. ;P

@cpuguy83
Member

Already done: #20465

@quinncomendant

@cpuguy83 Great, thanks!

@carsten-ulrich-saitow-ag

Hi, I use that feature for nginx and php-fpm, which run in two different containers. Because nginx has a link to php-fpm, I cannot use volumes_from inside the php-fpm container, as that would create a circular reference. My solution is a container that holds only the PHP source code; both the nginx and the php-fpm containers mount it with volumes_from.

@thaJeztah
Member

@carsten-ulrich-saitow-ag volumes-from is still supported (as is the possibility of using data-only containers); it's just that in many cases you don't need a data-only container per se, because volumes can now be managed on their own (without the need for a container to be attached to them).

Also, with the new networking, you can reach containers by name (or still provide a --link to set an alias). For example

docker network create myapp
docker run --net=myapp -v phpsource:/var/www --name php php-fpm
docker run --net=myapp -v phpsource:/usr/share/nginx/html --name web nginx

Both containers share the "phpsource" volume (which is propagated with the content of the php-fpm containers the first time it's run). The nginx container can connect to the php container using php as the hostname.

@quinncomendant

@carsten-ulrich-saitow-ag And the named volume phpsource would be created (just once) using:

docker volume create --name phpsource

Oh, wait, @thaJeztah, when you say the volume is “propagated with the content of the php-fpm containers the first time it's run” do you mean it's not necessary to run docker volume create first?

@thaJeztah
Member

@quinncomendant correct, you can either create the volume first (docker volume create --name phpsource), or provide a name when starting the container (as in my example). If a volume with that name does not yet exist, it is created automatically.

It is important to do it in the right order, because the files inside the container are only copied to the volume by the first container that uses it (while the volume is still empty).

@damnhandy

If you need to apply permissions to a named volume, you're kind of SOL at the moment. When you create a volume with docker volume create, it is owned by root. If your container process runs under a UID other than root (e.g. Jenkins), you're kind of stuck. PR #20262 should fix this, but that's not available today.

@cpuguy83
Member

cpuguy83 commented Mar 7, 2016

@damnhandy It is owned by whoever owns the data in the container at the path you mount it to (assuming it's the first time it's been mounted).

@robvelor

What about a data only container when scaling in a swarm? For example scaling the same container to multiple host machines sharing the same volume? Is this the only use case or is scaling to the same host with the volume recommended?

@thaJeztah
Member

@robvelor the default ("local") volume driver is indeed local to a host (although swarm will create a volume with the same name on each host). You can, however, use a different driver/plugin; some plugins allow a volume to be shared or replicated on each host; you can find some plugins here; https://docs.docker.com/engine/extend/plugins/#finding-a-plugin

@robvelor

@thaJeztah Yes, I forgot to add that I am using rexray to persist in the cloud (AWS-EBS) but the volume can only be mounted to one host machine, hence my question. Any thoughts on this? Maybe I need to use a different plugin to achieve multi-host volumes between containers.

@thaJeztah
Member

@robvelor hm, possibly yes; you could ask rex-ray what the options are with their plugin. Be aware that it may also depend on the application that's writing to the volume: does the app support concurrent processes writing to the "same" volume?

@sergeyklay

@thaJeztah Sorry for the stupid question, but I haven't fully understood this.

You said

docker run --net=myapp -v phpsource:/var/www --name php php-fpm
docker run --net=myapp -v phpsource:/usr/share/nginx/html --name web nginx

you can either create the volume first (docker volume create --name phpsource), or provide a name when starting the container (as in my example). If a volume with that name does not yet exist, it is created automatically.

But how do I work with these volumes on the host machine? For example, to develop an application and store it in the named volume.

Sorry if the question is too stupid, but I actually don't quite understand 😐
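One possible way to get files from the host into a named volume is a throwaway container that bind-mounts the host directory next to the volume and copies across. This is only a sketch under assumptions: the function name `seed_volume`, the mount points `/from` and `/to`, and the `busybox` image are hypothetical and not something suggested in this thread:

```shell
# Sketch: copy a host directory into a named volume via a one-off container.
# seed_volume, /from and /to are illustrative names; busybox is an assumed image.
seed_volume() {
    src="$1"   # absolute path of a directory on the host
    vol="$2"   # named volume (created automatically if it does not exist)
    # Bind-mount the host directory read-only and copy its contents into the volume.
    docker run --rm -v "$src":/from:ro -v "$vol":/to busybox cp -a /from/. /to/
}
```

For example, `seed_volume "$PWD/src" phpsource` would fill the `phpsource` volume used earlier. For a live edit-and-reload workflow, a bind mount of the source directory is probably the simpler choice.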

@derqnaque

Stand-alone volumes usually have root,root access until changed by a container with root access. If your container runs software as a normal user (e.g. jenkins, ...) there is no way to change the permissions of a volume mounted at runtime to normal user access for the container. A data-container can be used to run a single chown CMD as root on the volume before it is used by the normal-user container. AFAIK there is no way to do this on a named-volume without a root container. (See also long discussions at e.g. #2259, #7198)

@cpuguy83
Member

cpuguy83 commented Apr 7, 2016

@derqnaque stand-alone volumes (as of 1.10) work the same as anonymous volumes (e.g. docker run -v /foo); that is, they inherit data and permissions from the container image (the first one that attaches to them).
In 1.11 you can supply mount options for the volume to use, so if the underlying filesystem you are mounting supports uid/gid, you can specify those, e.g. docker volume create --opt type=bindfs --opt o=uid=1000,gid=1000

@cpuguy83
Member

cpuguy83 commented Apr 7, 2016

Also in 1.11, you can supply nocopy when attaching a volume to make sure it doesn't copy data to the volume (and set perms, etc) if you are sharing that volume with multiple containers.
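As a rough illustration of the `nocopy` mode mentioned above (Docker 1.11 or later), the option goes after the container path in the `-v` argument. The wrapper function and its argument names are hypothetical:

```shell
# Sketch: attach a named volume without copying image data into it.
# run_without_copy is a hypothetical helper; arguments are volume, path, image.
run_without_copy() {
    vol="$1"; target="$2"; image="$3"
    # ":nocopy" disables the initial copy of the image's files at $target.
    docker run -d -v "$vol":"$target":nocopy "$image"
}
```

With the earlier nginx/php-fpm example this could be `run_without_copy phpsource /usr/share/nginx/html nginx`, so that only the php-fpm container ever seeds the volume.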

@derqnaque

@cpuguy83: The host volume might be something I don't have host access to (e.g. in a cloud setting where I am not the host). Mount options might work if I know the filesystem that is in use on the host. And I think that ext4, for example, does not provide the uid and gid options.

The data-container solution still seems a lot easier and more portable to me, since all I need to know is the chown command. Of course I still need the root access in the data container. But this is only run once and the normal user of the software-container does not get the root access.

The nocopy option is interesting. And I sure want to get rid of data-containers. Maybe using it on a named volume and naming it a run-once-to-prepare-the-volume-container solves the problem.
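The "run once to prepare the volume" idea could look roughly like the sketch below: a short-lived container running as root changes ownership of a named volume, after which the application container can run as a normal user. The helper name, the `/data` mount point, and the `busybox` image are assumptions for illustration only:

```shell
# Sketch: one-off root container that chowns a named volume, then exits.
# chown_volume is hypothetical; busybox runs as root by default.
chown_volume() {
    vol="$1"     # named volume to prepare
    owner="$2"   # uid:gid the application container will run as
    docker run --rm -v "$vol":/data busybox chown -R "$owner" /data
}
```

For instance, `chown_volume jenkins_home 1000:1000` before starting a container that runs as UID 1000. The volume name and UID here are examples, not values taken from this thread.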

@cpuguy83
Member

cpuguy83 commented Apr 7, 2016

@derqnaque A named volume will act exactly like a -v /foo volume as of 1.10 so there should be no added benefits in a data-only container other than grouping volumes together with --volumes-from

@nioncode

Is there a way to create a named volume that stores the data of all VOLUMEs defined by the Dockerfile?
I don't really care which folders those are, the Dockerfile should already declare everything that is configuration (e.g. Jenkins' home) as VOLUME. I then want to capture everything outside of the application container, like I currently can do with a data-only container, so that I can upgrade the application independently from the data.

So, is there something like a -v all-data:$ALL_VOLUMES that can map to multiple folders?

@thaJeztah
Member

@nioncode no; you can't "nest" volumes, you'll have to assign a named volume for each folder, or (if you don't assign a named volume), docker creates "anonymous" / "unnamed" volumes for each volume that's declared in the Dockerfile
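Since each declared VOLUME needs its own named volume, one workaround is to generate the `-v` flags from the image metadata. The sketch below is not a Docker feature, just a small script; both function names and the naming scheme (`<prefix>` plus the path with slashes turned into dashes) are made up for illustration:

```shell
# Sketch: build one "-v name:path" flag per VOLUME declared by an image.
# declared_volumes and volume_flags are hypothetical helper names.
declared_volumes() {
    # Prints each declared volume path of image $1 on its own line.
    docker image inspect \
        --format '{{range $path, $unused := .Config.Volumes}}{{println $path}}{{end}}' "$1"
}

volume_flags() {
    prefix="$1"
    # Reads volume paths from stdin, one per line.
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        # /data/vol1 becomes <prefix>-data-vol1
        name="$prefix$(printf '%s' "$path" | tr '/' '-')"
        printf '%s %s:%s\n' -v "$name" "$path"
    done
}
```

A run command could then be assembled as `docker run -d $(declared_volumes xyz | volume_flags xyz) xyz`, relying on word splitting of the unquoted substitution. When the Dockerfile gains a new VOLUME, the generated flags pick it up without editing the command.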

@lsgd

lsgd commented Jun 16, 2016

@thaJeztah Is it also possible to automatically create data volumes with mount points? (Or do I just mix up auto-creation things?)

Do you also know how to use data volume containers with docker-compose?
Or do I have to manually create them beforehand?

@thaJeztah
Member

@lukas-schulze if you mean bind-mounting a directory, that's a different thing: a bind-mounted directory (-v /some/path/on/host:/path/in/container) doesn't copy the data from inside the container to the volume, so may have a different effect.

If you want to easily access the volume data from your host, you could consider using a volume plugin for that: https://docs.docker.com/engine/extend/plugins/#volume-plugins. For example, the "local persist" plugin allows you to specify a custom path where volumes are stored.

aripalo added a commit to aripalo/docker-chat-demo that referenced this issue Sep 8, 2016
Define node_modules as named module so it will be automatically attached in one-off `docker-compose run` containers allowing `npm install` to have an effect without the need to rebuild containers.

This requires compose file version 2 syntax: https://docs.docker.com/compose/compose-file/#/version-2 (which I think requires Docker Engine 1.10.0 or newer)

For persisting with named modules, see:
- brikis98/docker-osx-dev#168 (comment)
- moby/moby#17798

@arvenil

arvenil commented Feb 1, 2017

What's the best alternative to something like this:

docker create \
-v /var \
-v /bin \
-v /any/other/path \
--name data-xyz xyz /bin/true

docker run -d -p 80:80 --volumes-from data-xyz --name xyz xyz

In other words, data only container with multiple paths?

So far, best I could achieve is this:

docker run -d -p 80:80 \
-v data-xyz-var:/var \
-v data-xyz-bin:/bin \
-v data-xyz-any-other-path:/any/other/path \
--name xyz xyz

In other words for each directory I need separate volume?

@thaJeztah
Member

@arvenil as a replacement for "data-containers", that generally looks good to me. Without knowing your exact use case, however, a few things to consider:

  • do those paths belong in the same volume, or not; i.e. is the data "tied" together, or separate? If not tied together, having separate volumes for each directory may make sense
  • if these are actual examples; /var/ and /bin/ look very broad to use as a volume;
    • are all paths inside that actually intended to have a lifecycle independent of the container? /var/ contains a lot of directories; most of those are probably not related to the actual data for your container (/var/tmp, /var/run ?). Try to limit to the actual data you want to persist
    • /bin looks odd as well: if your container's binary/executable is in there, how do you upgrade the binary? If it's not data, it may not have to be in a volume, which allows you to upgrade the container to update the binary.

@arvenil

arvenil commented Feb 2, 2017

@thaJeztah hah, bad choice of examples on my side :) I shouldn't have used /var /bin :) Sorry. I meant something more generic like /xyz /abc /qwe. I think what I'm looking for is a Group Volume, or a root / volume (yes, I'm making up those names). Right now if I create volume data-xyz:/some/path, it creates a volume/dir and puts all the contents of /some/path under it:

mkdir /xyz
cd /xyz
cp -r /some/path/. .

I would rather have a volume that keeps the paths, so I could have multiple paths in one volume

mkdir /xyz
cd /xyz
mkdir -p some/path
cp -r /some/path/. ./some/path

Maybe something like this: docker run -d -p 80:80 -v data-var/var:/var

@nioncode

nioncode commented Aug 8, 2018

@thaJeztah Are there any plans to support grouping volumes or should we create volumes one by one for each declared VOLUME in a Dockerfile?

For example, you start with a simple Dockerfile that just has a single VOLUME /data/vol1, so I start the container with -v vol1:/data/vol1. If then the Dockerfile gets updated to have a second volume /data/vol2, I have to change my run command to -v vol1:/data/vol1 -v vol2:/data/vol2.

It would be nice to have a -v all-data:$ALL_VOLUMES that transparently captures all declared VOLUMEs of the Dockerfile and puts them at their absolute path inside the all-data volume.
