
Feature request: volume mounting #2190

Open
vdauwera opened this issue Apr 21, 2017 · 14 comments
Labels
Needs Triage Ticket needs further investigation and refinement prior to moving to milestones

Comments

@vdauwera
Contributor

To address the type of use case described in http://gatkforums.broadinstitute.org/gatk/discussion/comment/38188#Comment_38188

@katevoss katevoss removed their assignment May 13, 2017
@katevoss

@vdauwera can you summarize the use case in the forum?

@vdauwera
Contributor Author

I don't have a good handle on the details but it looked like @ChrisL understood it well.

@CarlosBorroto

CarlosBorroto commented Oct 6, 2017

Found this issue while looking for a solution for mounting a Docker volume.

In my case, I would like to run Ensembl VEP with Cromwell/WDL. Using VEP in cache/offline mode has many advantages, among them much better performance. Running VEP in cache mode requires a large set of files to be installed locally, and downloading them with the provided INSTALL.pl every time would be very inefficient. For now I plan to tar everything together, then download and untar it from a Google bucket every time the task runs. However, it would be much better if I could mount a Docker volume into the container running the task.

The way I see it, I would define a snapshot in the runtime section of the task definition, along with the mount point (docker run -v *:{mount point}) where that snapshot should be made available as a Docker volume. In the background, Cromwell would provision a disk from the snapshot, mount it on the VM, and run the container with the appropriate docker run -v /path/to/disk:/requested/mount/point arguments.
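A minimal sketch of what those background steps could look like. Everything here is illustrative, not a real Cromwell feature: the snapshot name, paths, and image are made up, and the script only prints the commands it describes rather than executing them.

```shell
#!/bin/sh
# Hypothetical sketch of the provisioning steps proposed above.
# All names (snapshot, paths, image) are illustrative; the script
# only echoes the commands instead of running them.

SNAPSHOT="my-vep-snapshot"            # snapshot declared in the task's runtime section
CONTAINER_MOUNT="/opt/vep/.vep"       # mount point requested for the container
HOST_DIR="/mnt/disks/${SNAPSHOT}"     # where the provisioned disk would land on the VM

# 1. Cromwell provisions a persistent disk from the snapshot:
echo "gcloud compute disks create ${SNAPSHOT}-disk --source-snapshot=${SNAPSHOT}"

# 2. ...attaches and mounts it on the worker VM (details elided), then
# 3. ...bind-mounts it read-only into the task container:
echo "docker run -v ${HOST_DIR}:${CONTAINER_MOUNT}:ro my-vep-image vep --offline"
```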

Hope this helps define the issue.

Thanks for considering raising the priority of this.

@lbergelson
Member

We have a very similar use case. We'd like to run a different annotator that depends on a massive pile of data sources (~20 GB). We want an easy way to package different sets of test files and make them available to people using our Docker image, without having to build a 20 GB image.

@Selonka

Selonka commented Feb 28, 2018

Hi, same problem here as @CarlosBorroto! Just wanted to bump the issue.

@vinash85

Hi, same problem here. This would be a great addition to Cromwell. Thanks!

@jason-weirather

jason-weirather commented Mar 14, 2018

Hello @vdauwera, I have a similar use case in Cromwell that I think this could cover. Specifically, we hope to be able to mount a type=tmpfs volume. This creates a RAM disk, which we use to unpack data sets containing tens of thousands of files very quickly.

Google describes how to do this in their docs
https://cloud.google.com/compute/docs/containers/configuring-options-to-run-containers#mounting_tmpfs_file_system_as_a_data_volume

We have had success with this on our Slurm Cromwell backend by launching the Docker container through the submit script ourselves and passing docker run the mount parameter:

${'--mount type=tmpfs,destination='+mount_tmpfs}

It would be great if declaring a tmpfs mount point were also supported by Cromwell for Google Cloud submissions. Thanks!
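For reference, a sketch of the docker invocation the WDL fragment above expands to. The mount_tmpfs value and image name are made up, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch of the docker command the WDL interpolation above expands to.
# The mount_tmpfs value and image name are illustrative; the command is
# echoed, not executed.

mount_tmpfs="/ramdisk"
TMPFS_FLAG="--mount type=tmpfs,destination=${mount_tmpfs}"

# A tmpfs mount lives in RAM, so unpacking tens of thousands of small
# files there avoids disk I/O entirely:
echo "docker run ${TMPFS_FLAG} my-image tar -xf archive.tar -C ${mount_tmpfs}"
```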

@dinvlad
Contributor

dinvlad commented Aug 17, 2018

+1 on tmpfs. Currently, we have to create a directory under /dev/ and rely on the assumption that it gets mounted by default as a tmpfs with half of the available RAM (at least on GCP). This is obviously not ideal, and delocalization of such files is problematic as well.

Our use case is exactly the same: unpacking/processing tens or hundreds of thousands of small files (in a BCL). Doing so on any "normal" disk is much slower than in RAM.
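The /dev workaround described above might look roughly like this inside a task's command block. The scratch path, archive name, and downstream tool are all illustrative, and the script only prints the steps rather than executing them.

```shell
#!/bin/sh
# Sketch of the /dev workaround: on many Linux setups (including GCP VMs),
# /dev is tmpfs-backed, so a directory created there lives in RAM.
# Path, archive, and tool names are illustrative; commands are echoed only.

RAM_SCRATCH="/dev/bcl_scratch"

echo "mkdir -p ${RAM_SCRATCH}"                # relies on /dev being a tmpfs
echo "tar -xf run.tar -C ${RAM_SCRATCH}"      # unpack thousands of small files in RAM
echo "process_bcl ${RAM_SCRATCH}"             # process, then delocalize only the results
```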

@armedgorillas

Hi -- I know this is an old issue, but has there been any further discussion on how to mount persistent disks? We're using PAPIv2 as the backend, and we'd like to expose reference databases (stored as filesystems) to our docker containers via a mounted volume.

@gemmalam gemmalam added the Needs Triage Ticket needs further investigation and refinement prior to moving to milestones label Mar 28, 2019
@Selonka

Selonka commented Mar 28, 2019

Hi @armedgorillas,

You can bypass this by calling the Docker container from the task itself. I wrote about it on the GATK forum a while ago; take a look at:
https://gatkforums.broadinstitute.org/gatk/discussion/comment/50056#Comment_50056

Greetings Selonka
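As a sketch, the workaround amounts to a task whose command section invokes docker itself, bind-mounting the reference data from the host. The image, database path, and tool names are all illustrative, the command is printed rather than run, and (as noted further down in this thread) this only works where the task can actually reach the Docker daemon.

```shell
#!/bin/sh
# Sketch of the workaround: the task's command section calls docker directly,
# bind-mounting a host directory, instead of letting Cromwell manage the
# container. Image, paths, and tool are illustrative; the command is echoed only.

DB_DIR="/refdata/annotation_db"   # large reference database on the host

echo "docker run --rm -v ${DB_DIR}:/db:ro my-annotator annotate --db /db input.vcf"
```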

@armedgorillas

Thanks @Selonka! That looks like a nifty workaround.

Does this work with the Pipelines API backend, or just with a local backend?

@dinvlad
Contributor

dinvlad commented Mar 29, 2019

I don’t think it works for the PAPI backend, because one needs to mount docker.sock into the container to be able to invoke Docker commands. I’m honestly a little surprised it even works locally.

@antonkulaga
Contributor

So, no progress on this issue in years?

@valentynbez

+1, similar need here: our workflow runs in a Docker container and depends on massive databases that must be available for the app to finish.
