host-mounted volumes in Dockerfile's VOLUME #3156

Closed
wking opened this issue Dec 10, 2013 · 24 comments
Labels
area/builder area/volumes kind/enhancement Enhancements are not bugs or new features but can improve usability or performance.

Comments

@wking

wking commented Dec 10, 2013

In #1124, there was an explicit decision to not allow host-mounted volumes via the VOLUME command. However, I think I have a good use case for supporting host-mounts via VOLUME. Gentoo stores its package repository under /usr/portage. I'm trying to set up a workflow like:

$ cat Dockerfile
FROM gentoo
VOLUME ["/usr/portage:/usr/portage:ro", "/usr/portage/distfiles:/usr/portage/distfiles:rw"]
RUN emerge sys-process/vixie-cron app-admin/syslog-ng app-admin/logrotate
RUN rc-update add syslog-ng default
RUN rc-update add vixie-cron default
$ docker import - gentoo < stage3-amd64-20131205.tar.bz2
$ docker build -t gentoo-syslog .

However, VOLUME doesn't support that, and docker build … doesn't support the -v flag. The end goal is to get a production image that doesn't include /usr/portage, since the package repository just bloats the production container. I can think of a few workarounds including NFS-mounting /usr/portage during the build or using:

$ docker run -v /usr/portage:/usr/portage:ro -v /usr/portage/distfiles:/usr/portage/distfiles:rw gentoo emerge …
$ docker commit …

for each of the build steps, but native Dockerfile support would be nice.
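
The run-then-commit workaround above could be scripted roughly as follows. This is only a sketch under stated assumptions: build_step, run, DRY_RUN, STEP_NAME and PORTAGE_MOUNTS are hypothetical names, not part of Docker, and the script uses the double-dash long flags (--name) that current Docker releases expect. It prints the commands by default, so nothing runs until DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the "docker run + docker commit per build step" workaround.
# build_step, run, DRY_RUN, STEP_NAME and PORTAGE_MOUNTS are hypothetical names.

# Bind mounts shared by every step (host path:container path:mode).
PORTAGE_MOUNTS="-v /usr/portage:/usr/portage:ro -v /usr/portage/distfiles:/usr/portage/distfiles:rw"

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_step <source-image> <target-image> <command...>
# Runs the command in a container with the Portage tree bind-mounted,
# commits the container as <target-image>, then removes the container.
build_step() {
    src="$1"; dst="$2"; shift 2
    name="${STEP_NAME:-build-step}"
    # PORTAGE_MOUNTS is intentionally unquoted so it splits into words.
    run docker run --name "$name" $PORTAGE_MOUNTS "$src" "$@" &&
    run docker commit "$name" "$dst" &&
    run docker rm "$name"
}
```

For example, `build_step gentoo gentoo-syslog emerge app-admin/syslog-ng` prints the three docker commands for one step. Because bind-mounted content is not part of a commit, /usr/portage stays out of the resulting image; note that docker commit also records the step's command as the image's default command.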

@SvenDowideit
Contributor

this might be an interesting candidate for an example of a volume-container

   # start the volume container that holds the portage tree
   $ docker run -d -v /usr/portage -name portage-vol busybox
   # run emerge with those volumes mounted, then commit the result
   $ docker run -volumes-from portage-vol -v /usr/portage/distfiles -name emerge gentoo emerge …
   $ docker commit emerge image-name
   $ docker rm emerge

but I agree, it's a shame that docker build doesn't share the -v param

@tianon
Member

tianon commented Dec 11, 2013

+1 - I'm not sure what the best compromise is, but have run into this quandary myself for sure

When I first created tianon/gentoo, I realized I didn't want to encourage people to be doing emerge --sync in their images (since that's a really great way to get banned by the portage mirrors), and thus never included a copy of the portage tree, even more so because it would always be stale in a matter of days and just further encourage bad emerge --sync habits.

This is exactly why I haven't updated tianon/gentoo in a while, and haven't given any more serious thought to creating a standard "gentoo" base image out of the stage3s: I couldn't come up with a reasonable compromise that wouldn't be "just for Gentoo" and wouldn't invite other people to do bad things with it, like the old EXPOSE syntax allowed.

@wking
Author

wking commented Dec 11, 2013

On Wed, Dec 11, 2013 at 07:08:51AM -0800, Tianon Gravi wrote:

> I'm not sure what the best compromise is, …

Do we need to compromise? Extending the -v syntax to VOLUME or
build just gives people the power they already have with RUN in
the build stage.

> When I first created tianon/gentoo, I realized I didn't want to
> encourage people to be doing emerge --sync in their images (since
> that's a really great way to get banned by the portage mirrors), and
> thus never included a copy of the portage tree, even more so because
> it would always be stale in a matter of days and just further
> encourage bad emerge --sync habits.

Gentoo makes it easy to setup a local Portage mirror, from which you
can sync all your images. I don't think that's a good approach for
Docker though, because layering Portage syncs in storage is going to
be really ugly. For me, it makes more sense to just rebuild your
target container from scratch whenever you want to update Portage.
One more reason for the Portage tree to not be in the image itself ;).

For what it's worth, my build script is here:
https://github.com/wking/dockerfile

@shykes
Contributor

shykes commented Dec 11, 2013

Builds must be host-independent. If you want to use a particular portage tree as part of the build, simply add it to your container. You could use a Gentoo base image, and extend it as needed.

Whatever gain you think you'll get from mount-binding whatever is on the host for a build, it will pale in comparison to destroying the entire premise of why containers are useful.

@shykes shykes closed this as completed Dec 11, 2013
@wking
Author

wking commented Dec 11, 2013

On Wed, Dec 11, 2013 at 10:19:21AM -0800, Solomon Hykes wrote:

> Builds must be host-independent.

What I'm suggesting is as host independent as downloading packages
from a mirror before installing. A host-mounted volume is just an
easy way to use the local host as the mirror, which means:

  • faster builds for me
  • less bloat in the container

I'm not sure who you're protecting by avoiding host-mounted volumes
during the build. Surely, folks who use them in their Dockerfile
will mention that when they distribute the Dockerfile [1]? Is there
a build service that converts Dockerfiles to images, and wants to
limit external dependencies to the public network?

@tianon
Member

tianon commented Dec 11, 2013

Yes, see the "Trusted Builds" feature on index.docker.io. :)

@wking
Author

wking commented Dec 11, 2013

On Wed, Dec 11, 2013 at 01:19:32PM -0800, Tianon Gravi wrote:

> Yes, see the "Trusted Builds" feature on index.docker.io. :)

Ah, the lack of host-mounted volumes makes more sense in this context
;). If host-mounting is out, can we come up with a different solution
for linking to mirrored content during the build? For example, with a
portage-snapshot image at portage-snapshot, you could build with:

FROM gentoo
VOLUMES-FROM portage-snapshot
RUN emerge sys-process/vixie-cron app-admin/syslog-ng app-admin/logrotate
RUN rc-update add syslog-ng default
RUN rc-update add vixie-cron default

To volume-mount from an existing trusted build. That way the
external-mirror-to-build-host copy only needs to happen once (building
portage-snapshot) instead of every time someone runs emerge --sync
in a trusted-build-Dockerfile. It would also let you lock in a
particular snapshot portage-snapshot:20131211, or leave it floating
at portage-snapshot:latest.

I don't know how the trusted build processing works, maybe from
Docker's perspective a mirror sync in each Dockerfile would be as
efficient as a volumes-from mount?

@tianon
Member

tianon commented Dec 11, 2013

I really like the idea of pulling the portage data from a separate image, and definitely agree that something is necessary for Gentoo as an image base to be even remotely possible. It's not only bad etiquette to emerge --sync more than once per day to a particular mirror, but many of them will outright autoban you for doing so, because it's a reasonably taxing operation (open rsync on a fairly large directory of files).

It also takes long enough that I think it's unreasonable to do in each Dockerfile, because unlike "apt-get update", it effectively has to download a build/installation script for every package available in the tree (that's the nature of the beast).

@wking
Author

wking commented Dec 11, 2013

On Wed, Dec 11, 2013 at 01:43:18PM -0800, W. Trevor King wrote:

> It would also let you lock in a particular snapshot
> portage-snapshot:20131211, or leave it floating at
> portage-snapshot:latest.

This may not actually work with Trusted Builds, where tags look like a
'configure-once' thing [1]. It would be nice if you could set up a
Trusted Build to only build tagged releases, and to tag the resulting
images with the same tag as the GitHub release. As it stands, you'd
need to bump the tag in the Trusted Build UI by hand before pushing
your new release…

I don't think that's a deal-breaker for the Portage-mirror use case
though.

@wking
Author

wking commented Dec 14, 2013

On Wed, Dec 11, 2013 at 01:43:18PM -0800, W. Trevor King wrote:

> If host-mounting is out, can we come up with a different solution
> for linking to mirrored content during the build? For example, with
> a portage-snapshot image at portage-snapshot, you could build
> with:
>
> FROM gentoo
> VOLUMES-FROM portage-snapshot
> RUN emerge sys-process/vixie-cron app-admin/syslog-ng app-admin/logrotate
> RUN rc-update add syslog-ng default
> RUN rc-update add vixie-cron default
>
> To volume-mount from an existing trusted build.

I've added ${NAMESPACE}/portage image creation to my Dockerfile repo
[1], and I can successfully emerge files after -volumes-from mounting
containers based on that image. That just leaves Dockerfile support
for VOLUMES-FROM to get this working. I expect mounting buildtime
tools and data like this without polluting the target image will be
useful beyond Gentoo, so VOLUMES-FROM may have better odds of landing
in Docker ;).

I think usage should mirror FROM:

VOLUMES-FROM <image>[:<tag>]

This would add the volume image names to the built-image's metadata
(like ENV), and subsequent:

docker run <image>[:<tag>]

invocations would look up VOLUMES-FROM images in the metadata and spin
them up in temporary containers. The temporary containers would be
volume-mounted on the target container, and killed and removed after
the target container closed.
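
Until something like VOLUMES-FROM exists, the lifecycle described above (temporary volume container, target container with its volumes mounted, cleanup afterwards) can be approximated by hand. The sketch below is an assumption-laden illustration, not a Docker feature: volumes_from_run, run, DRY_RUN and VOL_NAME are hypothetical names, the volume image is assumed to declare /usr/portage as a VOLUME, and the double-dash flags of current Docker releases are used. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: emulate the proposed VOLUMES-FROM behaviour with existing commands.
# volumes_from_run, run, DRY_RUN and VOL_NAME are hypothetical names.

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# volumes_from_run <volume-image> <target-image> <command...>
# Creates a temporary container from <volume-image>, runs the command in
# <target-image> with that container's volumes mounted, then removes the
# temporary container together with its volumes.
volumes_from_run() {
    vol_image="$1"; target="$2"; shift 2
    vol_name="${VOL_NAME:-volumes-from-tmp}"
    # /bin/true is a placeholder command; the container is never started.
    run docker create --name "$vol_name" "$vol_image" /bin/true &&
    run docker run --rm --volumes-from "$vol_name" "$target" "$@"
    status=$?
    # Always try to clean up the temporary volume container.
    run docker rm -v "$vol_name"
    return "$status"
}
```

For example, `volumes_from_run portage-snapshot:20131211 gentoo emerge sys-process/vixie-cron` prints the create, run and rm commands for one pinned snapshot.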

There's no UNSET-ENV equivalent to pattern on, but I think we would
want an UNMOUNT-VOLUMES-FROM so Dockerfiles could clean up after
themselves or shuffle toolchains for different parts of a multi-stage
build. An UNMOUNT-VOLUMES-FROM command would just remove the matching
VOLUMES-FROM entry from the metadata (although it would still be in the
image's history).

An obvious extension of this would be, “I want to spin up a server and
bind to its EXPOSEd ports for my build”. Perhaps we want a
generalized:

ADD-SERVER {-volumes-from|-link } <image>[:<tag>]
REMOVE-SERVER {-volumes-from|-link } <image>[:<tag>]

but I'd be happy with just VOLUMES-FROM ;). Thoughts?

@d11wtq

d11wtq commented Feb 16, 2014

👍 I would like to use Gentoo as a base image but definitely don't want > 1GB of Portage tree data to be in any of the layers once the image has been built. You could have some nice, compact containers if it weren't for the gigantic portage tree having to appear in the image during the install.

@d11wtq

d11wtq commented Mar 7, 2014

Here's another suggestion. You'd incur some bandwidth overhead having to emerge-webrsync each time an image is built, but you'd avoid publishing /usr/portage into the image.

FROM       gentoo
MAINTAINER Bob

EXCLUDE /usr/portage

RUN emerge-webrsync
RUN emerge sys-devel/llvm
RUN emerge sys-devel/clang --autounmask-write
RUN etc-update --automode -5
RUN emerge sys-devel/clang

@wking
Author

wking commented Mar 7, 2014

On Thu, Mar 06, 2014 at 04:51:42PM -0800, d11wtq wrote:

> Here's another suggestion. You'd incur some bandwidth overhead
> having to emerge-webrsync each time an image is built, but you'd
> avoid publishing /usr/portage into the image.

That's a good idea. With a package-cache container [1,2] that
bandwidth would all be between containers, so no bogging down physical
wires. However, there's currently no way to pin emerge-webrsync to a
particular date, and I don't want my build results tied to whatever
the snapshot of the day happens to be. I want them pinned to a fixed
snapshot. I've got that snapshot in a local tarball, so I could ADD
it in each Dockerfile, but that's a lot of tarball unpacking to avoid
supporting a quick bind mount ;).

@fatherlinux

I want something like a -v option for builds because I would like to bind mount the Puppet client and manifests at build time. The end result would be a pristine container image that would be configured with Puppet (et al.) but would contain none of the bloat.

I get why you guys don't want to put it in the Dockerfile, but why exclude it from the build command? These are two different things. People share Dockerfiles, but I am not sure they need to share ALL build hosts?

I see the build command as something that IS specific to the build host, which is fine... I am sure I can imagine other use cases where a pristine docker image would be really nice...

Best Regards
Scott M

@hjwp

hjwp commented May 7, 2015

+1.

@PatrickSteiner

+1

@zrml

zrml commented Jun 21, 2015

+1 to @fatherlinux's suggestion. Much needed here for installing from large exploded .tar.gz tarballs...

@cpuguy83
Member

@fatherlinux @zrml Can you open an issue with your use-case rather than a possible solution?

@selurvedu

+1.

@fatherlinux

I will as soon as I get a chance...

On Sunday, June 21, 2015 1:55:30 PM, Brian Goff wrote:

> @fatherlinux @zrml Can you open an issue with your use-case rather
> than a possible solution?

@fatherlinux

Done, created issue: #14251

@kaithar

kaithar commented Dec 30, 2015

As a thought on this... for a short-term workaround, maybe this?

FROM portage-snapshot:latest
ADD stage3.tar.bz2 /
RUN emerge whatever
RUN rm -r /usr/portage/*

then volume over /usr/portage in docker create. It does have the downside of having an image hanging around for any images based off it though.
Would be interesting to see how much emerge --sync would alter the container size on a per day or per week basis though, then the duplication wouldn't be quite as bad and would only need building fresh on a longer cycle.

Definitely need the -v though for ephemeral stuff that shouldn't be in the images. Beyond the case of large filesets, adding and then removing something creates a file system layer, so it not only bloats the image but also incurs the same problems as committing secrets to a VCS.
I'm trying to think of a use case of this... a central config management, something like salt maybe, where the client has to authenticate and you don't want the client side private key in the image. Or maybe a build artefact that you want to sign or encrypt but obviously don't want to include a private key for that in the image.

I can appreciate that these kinds of environment-specific but always-required files do run against the idea of universal Dockerfiles, but ADD already does that.

@zerthimon

I have the following use case:

I need to have a working ssh to run commands from the Dockerfile during an image build. The ssh server I'm connecting to uses ssh public key authentication.
There is an ssh-agent running on the host that has all the necessary private keys added.
I'd like to have the ssh-agent's socket available in the build container during the build so the ssh commands in the Dockerfile would work.
I don't want to have ssh private key and config file in context dirs and I don't want to ADD it into the image just so I can run ssh commands.

Just sharing my pain.
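
Outside of docker build, the run-then-commit workaround from earlier in this thread can forward the host's agent socket for a single step. The following is a minimal sketch, assuming the host exports SSH_AUTH_SOCK (as ssh-agent normally arranges) and that the image contains an ssh client; ssh_step, run, DRY_RUN, STEP_NAME and the /ssh-agent mount point are hypothetical choices, and the double-dash flags of current Docker releases are used. Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: forward the host ssh-agent socket into one run-then-commit step.
# ssh_step, run, DRY_RUN, STEP_NAME and /ssh-agent are hypothetical choices.

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ssh_step <source-image> <target-image> <command...>
# Runs the command with the agent socket bind-mounted, commits the
# container as <target-image>, then removes the container.
ssh_step() {
    src="$1"; dst="$2"; shift 2
    name="${STEP_NAME:-ssh-build-step}"
    # Fail early if no agent socket is exported on the host.
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "SSH_AUTH_SOCK is not set" >&2
        return 1
    fi
    run docker run --name "$name" \
        -v "$SSH_AUTH_SOCK:/ssh-agent" -e SSH_AUTH_SOCK=/ssh-agent \
        "$src" "$@" &&
    run docker commit "$name" "$dst" &&
    run docker rm "$name"
}
```

The private keys never enter the container's filesystem, since only the socket is bind-mounted and bind mounts are not part of a commit. The committed image may still carry the SSH_AUTH_SOCK environment variable and an empty mount point.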

@kbaegis

kbaegis commented Feb 23, 2018

+1

@thaJeztah thaJeztah added area/builder kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. area/volumes labels Jun 22, 2024