
The utility of creating a volume per node #1529

Closed

etoews opened this issue Dec 10, 2015 · 11 comments

Comments

@etoews
Contributor

etoews commented Dec 10, 2015

While researching issue #1528, I started to wonder about the practical utility of Swarm creating a volume per node when doing a docker volume create. I realize that a volume per node is the expected behavior, but what use case does that cover?

When I'm using Swarm, I want a place for my stuff. Doing a docker volume create seems ideal: it gives me a named volume where I can store persistent data (e.g. #1528). I can reference that volume later from other containers without worrying about orphaning it if any associated container gets removed. This seems to me to be the primary use case for volumes.

But when Swarm creates a volume per node using the default driver, it seems to eliminate that use case. Now I have X volumes (depending on how many nodes I have) and I have no idea where my stuff is. It could be in any one of those volumes, but I have no way of knowing which one, much less a way to reference that one directly even if I did.
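For concreteness, a minimal sketch of the behavior, assuming a three-node cluster (all addresses here are hypothetical):

```
# the create goes through the Swarm manager...
$ docker -H tcp://manager:3375 volume create --name mydata

# ...but every engine ends up with its own independent local volume
$ docker -H tcp://node-1:2375 volume ls    # shows local "mydata"
$ docker -H tcp://node-2:2375 volume ls    # shows local "mydata"
$ docker -H tcp://node-3:2375 volume ls    # shows local "mydata"
```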

It seems to me we'll need more control over docker volume create in order to be able to effectively utilize volumes in Swarm.

Thoughts?

@kacole2

kacole2 commented Dec 10, 2015

👍 for bringing this to light again. I don't see a use case for this either.

If you use a volume driver to talk to a persistent datastore "outside" of the host itself, the volume-per-node behavior means you end up creating multiple volumes on the storage endpoint.

We've found the only way around this is to run docker volume create --driver xyz against a single host in the Swarm cluster, with the CLI pointed at that engine rather than the Swarm master. After the volume is created, we can use another host that is pointed at the Swarm master to run the container and mount the volume. Here's a sample of the workflow: Lab IV: Persist a container's state and data
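A rough sketch of that workaround (the xyz driver name and the addresses are hypothetical):

```
# step 1: talk to one engine directly, bypassing the Swarm master
$ docker -H tcp://node-1:2375 volume create --driver xyz --name mydata

# step 2: schedule the container through the Swarm master and mount it
$ docker -H tcp://manager:3375 run -d -v mydata:/data myapp
```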

If this is the expected behavior from the machine's perspective, that's understandable. But I wouldn't consider it expected behavior from a user's perspective.

@clintkitson

In theory the create operation is supposed to be idempotent: no matter how many times the same volume name is requested, the backend storage platform is not affected. The reality is that if you have 30 hosts all requesting to create the same volume at once, you get a race to see which one actually creates it.

Different storage drivers may need multi-step workflows to create a volume. In the case of EC2, the workflow involves creating a volume and then assigning metadata. Idempotency is violated here: the create may proceed on all hosts because no volume with that name exists yet, but when it comes time to set the metadata, every host after the first one fails.
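To make the EC2 case concrete, a rough sketch of the two-step workflow using the AWS CLI for illustration (the size, zone, and volume ID are made up):

```
# every host checks for a volume carrying the requested name...
$ aws ec2 describe-volumes --filters Name=tag:Name,Values=mydata
# ...finds nothing, so every host proceeds to create one
$ aws ec2 create-volume --size 8 --availability-zone us-east-1a
# the name only exists once the metadata (a tag) is attached afterwards
$ aws ec2 create-tags --resources vol-0abc1234 --tags Key=Name,Value=mydata
# all N hosts can pass the existence check before any of them has tagged
# a volume, so the "idempotent" create actually runs N times
```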

cc @cpuguy83

@cpuguy83
Contributor

The reasoning for this is to make sure a container started with docker run specifying a named volume can be scheduled to any node without weird side effects (like expecting a rex-ray volume but actually getting a local volume due to implicit creation).

btw, I'm inclined to disable implicit volume creation on docker run when a volume name is specified. For instance, if a user does docker run -v somename:/foo and somename doesn't exist, error out instead of implicitly creating a volume named somename.
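For reference, the implicit behavior in question looks roughly like this today (output abbreviated):

```
# "somename" does not exist, yet the run succeeds and quietly creates it
$ docker run -v somename:/foo busybox true
$ docker volume ls
DRIVER              VOLUME NAME
local               somename
```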

@kacole2

kacole2 commented Dec 12, 2015

Personally, I would prefer that a volume is NOT created unless specified through the docker volume create command. One spelling mistake and you end up with a container and a volume that need to be killed and re-created.

@etoews
Contributor Author

etoews commented Dec 14, 2015

@cpuguy83 But let's look just a bit beyond that initial docker run.

It's a very common use case to separate your data volume from your service container. That way you can kill service version X, upgrade the service to version X+1, and still connect to the exact same data volume.
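The pattern looks something like this (image and volume names are hypothetical; the :1 and :2 tags stand in for versions X and X+1):

```
$ docker volume create --name appdata
$ docker run -d --name app -v appdata:/var/lib/app myservice:1   # version X
$ docker rm -f app                                               # retire X
$ docker run -d --name app -v appdata:/var/lib/app myservice:2   # X+1, same data
```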

However, if there are N volumes and you cannot uniquely identify them, it's impossible to connect service version X+1 to the exact same data volume that version X was connected to.

Also, I just confirmed that adding a new node to a Swarm cluster does not create existing volumes on it: a volume created earlier with docker volume create --name test does not appear on the new node. So even the initial docker run can still fail if the container gets scheduled to the node without that volume.
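This is easy to verify by querying the new node's engine directly (addresses are hypothetical):

```
# the volume was created via the Swarm manager before node-new joined
$ docker volume create --name test

# the freshly joined node has no trace of it
$ docker -H tcp://node-new:2375 volume ls
DRIVER              VOLUME NAME
# (empty: a container scheduled here with -v test:/data would get a
#  brand-new implicit local volume instead of the original data)
```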

@cpuguy83
Contributor

@everett-toews This is up to the volume driver being used.
Using the local driver you are of course going to be limited to local uses.
Working on enhancing the engine to actually query all registered drivers for volumes instead of assuming the volumes were created locally.
Here's the PR: moby/moby#16534

@glyph

glyph commented Dec 17, 2015

👍 This behavior was hugely confusing to me. I'm just coming up to speed on Swarm, but I still don't quite understand all the implications of --volumes-from, and I was hoping that docker volume create would make this simpler. Apparently not :).

@BrianAdams

Agree that this is confusing. Having the local driver create a volume local to the requesting node makes sense. It seems a reasonable limitation that you cannot schedule containers with local volumes on other nodes unless a volume with the same name has been created there by some other workflow.

Also agree that I would prefer an explicit volume create and a volume affinity flag instead of implicitly creating the volume during docker run.

@itzg

itzg commented Jan 19, 2016

As a long-time Docker user who tended not to use data containers because they felt clunky, I was pleased to see docker volume and to see that it "worked" on Swarm. Before @everett-toews pointed me here, I assumed that:

  • An explicit docker volume create was needed for every referenced named volume
  • When using the local volume driver, that would imply a dependency affinity like with --volumes-from=dependency, but in this case a node affinity rather than a container affinity

and those two assumed constraints seemed fine to me. What I gather from the discussion above is that neither assumption is correct, but would enforcing them help pin down the behavior?
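For what it's worth, classic Swarm already expresses scheduling hints as filter expressions, so the assumed node affinity might look something like this (container, node, and volume names are hypothetical):

```
# existing container affinity, analogous to --volumes-from
$ docker run -d -e affinity:container==db --volumes-from db myapp

# the assumed node affinity for a local named volume
$ docker run -d -e constraint:node==node-1 -v mydata:/data myapp
```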

@gittycat

@itzg Yes, data containers are clunky, but the alternative (duplicated volumes) is a can of worms: you can't tell where your data is or what state of sync it's in.

@nishanttotla
Contributor

Closing due to lack of activity. Please reopen if you wish to continue discussing it.
