Why is there volume for data in the first place? #255

Open
etki opened this issue Jan 11, 2017 · 24 comments
Labels
question: Usability question, not directly related to an error with the image
Request: Request for image modification or feature

Comments

@etki

etki commented Jan 11, 2017

Hi.
My inquiry may seem strange, but I really don't get it.
Why does the MySQL Dockerfile contain a VOLUME directive? From my perspective, there are more cons than pros:

  • (+) Users who did not specify a host mount on startup have a chance to recover their data.
  • (-) An anonymous volume is terribly hard to find once its container is gone and you have literally hundreds of them.
  • (-) Volumes consume free space at dramatic speed because there is no GC. They were designed not to be garbage collected, which means they should not be created automagically, because that, in turn, means they would require garbage collection.
  • (-) Users who persist their data will almost certainly use a regular host mount, which renders the volume useless.
  • (-) Users who run tons of CI builds a day, and thought containers would finally let them forget about data bloat, are highly annoyed when they discover the consequences.
  • (-) You can add a volume later, but you literally can't cancel a declared volume. And if you mount it to your host, and that host is remote, you have no chance of cleaning up the volume at the end of the build.

I know I'll get a huge 'watch it, kid!' in the next second, but shouldn't it be dropped? I really don't see any benefits that outweigh the drawbacks it brings.

@ltangvald
Collaborator

Volumes will also be faster than the container's internal storage, but I agree that anonymous volumes clutter up the disk very quickly. Starting the container without specifying any volumes is something I, at least, mostly do for quick testing.
@tianon @yosifkit
What are your thoughts on this? Volume cleanup has become simpler with the "docker volume" commands, but do we need /var/lib/mysql to be a volume by default?

@yosifkit
Member

yosifkit commented Jan 12, 2017

I just use a series of aliases and periodically clean up stopped containers, unused volumes, and dangling images.

alias dclean='docker ps -aq | xargs --no-run-if-empty docker rm'
alias dcleanvol="docker volume ls | awk '/^local/ { print \$2 }' | xargs --no-run-if-empty docker volume rm"
alias ddangling='docker images --filter dangling=true -q | sort -u | xargs --no-run-if-empty docker rmi'

With the upcoming Docker 1.13.0 there will be built-in commands for this: docker container prune, docker volume prune, and docker image prune.

Edit: If you do docker rm -v mysql-container, it will also clean up the anonymous volumes associated with the stopped container you are deleting. This happens automatically with docker run -it --rm.
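
The dcleanvol alias above removes every unused local volume, whether it was named by a user or not. A narrower cleanup can be sketched as follows, assuming that anonymous volumes are the ones whose name is a 64-character lowercase hex ID (the form Docker generates when no name is given). The function name filter_anonymous_volumes is hypothetical, and this is only a sketch under that assumption, not a vetted tool.

```shell
#!/bin/sh
# Keep only names that look like anonymous volumes: a 64-character
# lowercase hex ID. Named volumes carry a user-chosen name instead.
# Assumption: nobody deliberately named a volume with 64 hex characters.
filter_anonymous_volumes() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '^[0-9a-f]{64}$' || true
}

# Usage (needs a Docker daemon, so it is left commented out):
#   docker volume ls -q --filter dangling=true \
#     | filter_anonymous_volumes \
#     | xargs --no-run-if-empty docker volume rm
```

Restricting the input to dangling volumes means volumes still referenced by a container are never passed to docker volume rm.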

@etki
Author

etki commented Jan 17, 2017

@yosifkit this is a fine workaround for local (and probably swarm, which I haven't worked with) containers, but as soon as Nomad, Kubernetes, or another orchestrator is involved, things go bad. Sometimes you don't have easy automation for the node itself at all.

@shitalm

shitalm commented Feb 8, 2017

Agree with etki. The biggest problem for us is that we can't save data in the image itself. Why not leave it to the users to decide whether they want to use volumes or not? They can always add one later, but as etki mentioned, there's no way to remove it.

@schwamster

We have the same problem as shitalm: we would like to add data to the image itself so we can, for example, prepare test/demo data or deliver static, read-only data. It is tedious to have to copy the Dockerfiles, remove the VOLUME instruction, and build the image ourselves... I could also live with a separate image under its own tag, such as 5.7-no-volume.

@tianon
Member

tianon commented May 9, 2017 via email

@etki
Author

etki commented May 10, 2017

@tianon it won't. The data will be stored in another place, but an untracked directory will still be created on the host, consuming an inode. This is not as bad, but it is still something that has zero positive effect.

@armpogart

As far as I know, the Docker team now recommends not declaring volumes in base images (or, in this case, official images). In my opinion it would be better to document the usage of the volume but not declare it in the image, since how the data is stored depends on the end user's case (local vs. production).

@ltangvald
Collaborator

@tianon, @yosifkit
Should we just remove it? The only downside I see is that people doing very basic testing (as most proper use will generally map the data directory anyway) would experience worse performance?

@ltangvald
Collaborator

Actually, there is one other effect: https://docs.docker.com/engine/reference/builder/#notes-about-specifying-volumes
When installing the standard Debian packages (5.7 and older), /var/lib/mysql will be populated with a database as part of package installation. Since /var/lib/mysql is declared a volume, that database will then be discarded.

If we just drop the VOLUME statement, the database will still be there, and no database initialization will be performed for basic testing of the image. I don't think this would require anything more than clearing out the directory after installing, though.

@yosifkit
Member

I feel like removing the volume would break many users that rely on the volume when using docker-compose. Compose tries hard to keep the volume between restarts of the container to persist the data and these users would suddenly see new deployments unable to survive a re-creation. My opinion is that if it is such a problem to have a volume defined, then docker needs to provide an unvolume/"don't use any defined volumes" (via Dockerfile and docker run) and users should set automatic volume and image deletion if space/inode usage is a problem.

I would think that most users would rather their database data preserved by default rather than discovering that their data has been automatically deleted when they did a docker-compose restart (or docker stack deploy) after bumping a database version number.

There is not a good alternative for telling the user where persistent data lives. Labels are not standardized and many users skip over the Docker Hub documentation.

Being able to build an image that ships with a database already initialized is still possible and the automatic volume would be left empty.

FROM mysql:5.7
CMD ["--datadir=/sql"]
# assuming ./sql-datadir contains an already initialized database
# copy the directory itself: a trailing /* would flatten the per-schema subdirectories
COPY ./sql-datadir/ /sql/
# on startup the entrypoint script will detect the already initialized database and start right up
# leaving /var/lib/mysql empty

or.... without having to use a different data directory:

FROM mysql:5.7
# ./sql-datadir contains a database dump of *.sql files
COPY ./sql-datadir/* /docker-entrypoint-initdb.d/
# initdb logic will restore the database via the sql files in alphanumeric order on first container start
# users will have to `docker rm -vf sql-container` when a new image is pulled with a new database dump

@ltangvald, as for the automatic population of /var/lib/mysql/ by the apt package, that is already deleted as soon as it is created (since the volume is declared later).

@bflad

bflad commented Jul 6, 2017

Would it be simple to tag the image twice for both use cases? e.g. do everything the same sans VOLUME in the Dockerfile and tag it something like #-no-volume (naming is hard) then simply have another Dockerfile do the below and tag with the existing tags:

FROM mysql:#-no-volume
VOLUME ["/var/lib/mysql"]

Image behavior stays the same for existing tags while we allow the other use case for those who want it.

@ltangvald
Collaborator

@yosifkit I hadn't considered the compose use case.
I agree this would probably be too big a behavior change for the existing images.

@bflad In general I don't think we want more files to maintain (though it's simple enough), but when/if we get a template system in place (discussed in issue #289) this might be an option.

@codycraven

codycraven commented Dec 26, 2017

@yosifkit I agree with you regarding an "UNVOLUME" command, however I don't see Docker implementing that anytime in the near future.

Until that occurs we're basically stuck telling educated Docker users that they need to copy the Dockerfile from the MySQL image they want and create their own image with the VOLUME line commented out or deleted. That either prevents them from automatically receiving potential security updates or means writing a script to automate the process (which makes me uneasy, but I have seriously considered it...).
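
As a rough illustration of what such automation might look like, the sketch below drops VOLUME instructions from a Dockerfile read on stdin. The function name strip_volume_instructions, the file name upstream.Dockerfile, and the tag in the usage comment are hypothetical; the build context would still need every file the upstream Dockerfile copies in, such as the entrypoint script.

```shell
#!/bin/sh
# Drop every single-line VOLUME instruction from a Dockerfile read on stdin.
# Instruction names are case-insensitive, hence -i.
# Limitation: a VOLUME instruction continued across lines with a trailing
# backslash would only lose its first line.
strip_volume_instructions() {
    # "|| true" keeps the exit status at 0 if every line is filtered out.
    grep -viE '^[[:space:]]*VOLUME([[:space:]]|$)' || true
}

# Usage (file name and tag are hypothetical placeholders):
#   strip_volume_instructions < upstream.Dockerfile > Dockerfile
#   docker build -t my-mysql:5.7-no-volume .
```

A line-based filter like this is deliberately simple; it does not parse the Dockerfile, so the output is worth a quick review before building.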

I'm a heavy user of Compose (doing a lot of local-dev with Docker) and would have been perfectly fine seeing the documentation on Docker Hub stating that I need to define a volume in my run command or docker-compose service.

I know you stated that many users skip over the Docker Hub documentation, but the image is already relatively useless if you don't scroll down to read the section regarding environment variables. The Compose/Stack documentation appears before that section, which could certainly include a sample Volume definition with a comment above it, something like:

# Use root/example as user/password credentials
version: '3.1'

services:

  db:
    image: mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: example
    # Use a volume to support persistent storage on container restart.
    volumes:
      - data-volume:/var/lib/mysql

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080

volumes:
  data-volume:

I'd be happy to write a suggestion for the "Where to Store Data" section as well, if that's a hangup.

@thaJeztah

> however I don't see Docker implementing that anytime in the near future.

If someone wants to work on that, it may be implemented, see moby/moby#3465 (comment) and moby/moby#3465 (comment)

Nobody has offered to work on it so far, though.

@ufoscout

ufoscout commented Mar 22, 2018

The request for Docker to support an "UNSET" feature exists only to help people cope with bad images.
In addition, it is a workaround that will force everyone to create custom images to unset something that should never have been set.
Declaring an anonymous volume is clearly a bad practice that is discouraged everywhere. In my company, we use lots of different database Docker images, and the MySQL ones are the only ones with this annoying problem.

About this sentence:

> I feel like removing the volume would break many users that rely on the volume when using docker-compose.

it is completely wrong.

In every company and project I have ever worked, when it is desired to persist data between docker restarts, either you don't delete the container or you explicitly mount a volume. I have never seen someone relying on (or being in love with) anonymous volumes in the real world.

If you don't want to break the (frustrating) behavior of this image, you should really adopt another tag and offer both the alternatives.
Anyway, from my point of view, the default tags (e.g. "5.7") should offer the behavior that everybody expects, which is without the volume; then you can extend the default image adding the VOLUME option and offer another specific tag (e.g. "5.7-persistent" or whatever). Obviously, this should be clearly reported and highlighted in the documentation.

wglambert added the question label Apr 24, 2018
wglambert added the Request label May 3, 2018

@t-hofmann

I would second the request to remove the volume definition from the Dockerfile.
Dockerfiles should merely define how an image is built (build-time configuration) and not how a container is run (runtime configuration), and I consider the definition of volumes, and also ports, to be runtime configuration.

As a user of docker-compose I see build-time and runtime configuration nicely separated: the docker-compose.yml refers to the build environment (including the Dockerfile) for the underlying image, and it lets me define the runtime configuration of the actual container (including volumes and ports).

@alexlatchford

Just hit this issue as well. It took several hours of a junior dev's time before we found the underlying cause, because we didn't expect a volume to be included by default, so it came as a big surprise. A no-volume tag would suffice for me as well. I understand the compatibility concerns, but I guess we're stuck with a forked Dockerfile for now.

@swen128

swen128 commented Mar 26, 2021

I'm confused about this part:

> I feel like removing the volume would break many users that rely on the volume when using docker-compose. Compose tries hard to keep the volume between restarts of the container to persist the data and these users would suddenly see new deployments unable to survive a re-creation.

Running docker-compose up after docker-compose down creates a new anonymous volume. While the older volume is preserved on the host's disk, it is not reused by the re-created container unless explicitly specified.

How is docker-compose relevant to the problem? Could someone give me an example use case that would be affected by the removal of the VOLUME instruction?

@tianon
Member

tianon commented Mar 26, 2021

$ cat Dockerfile
FROM bash
VOLUME /foo
$ cat docker-compose.yml
version: '3.8'
services:
  bash:
    build: .
    tty: true

$ docker-compose build
...
$ docker-compose up -d
Starting tmp_bash_1 ... done
$ docker-compose exec bash touch /foo/bar
$ docker-compose exec bash ls /foo
bar
$ docker-compose up -d --force-recreate
Recreating tmp_bash_1 ...
$ docker-compose exec bash ls /foo
bar

(Docker Compose works extra hard to keep even anonymous unspecified volumes around and attached to the appropriate container.)

@lucasbasquerotto

lucasbasquerotto commented Oct 19, 2021

@tianon Couldn't you just add an explicit anonymous volume to the docker command or docker-compose file, like docker run -v /foo image, or in the docker-compose file:

volumes:
  - /foo

The above has the benefit of defining the volume explicitly, so that people are not caught by surprise by lots of anonymous volumes that shouldn't have been created, and so that persisted data that should not have been persisted in the first place is not reused. It also avoids hacks like moving the MySQL data directory to another place, which doesn't stop the volume from being created anyway.

Furthermore, as @ufoscout said:

> In every company and project I have ever worked, when it is desired to persist data between docker restarts, either you don't delete the container or you explicitly mount a volume. I have never seen someone relying on (or being in love with) anonymous volumes in the real world.

Last year Oracle removed the volume from their official image, and I'm not aware of it impacting people negatively (although the mysql image is probably more widely used):

oracle/docker-images#640 (comment)

