Merge pull request #97 from broadinstitute/gvda_ga4gh_tools_phase1
GA4GH Tool Execution Challenge Phase 1
Showing 6 changed files with 106 additions and 0 deletions.
- +76 −0 scripts/ga4gh/tool_execution_challenge_phase1/README.txt
- +3 −0 scripts/ga4gh/tool_execution_challenge_phase1/empty.options.json
- +3 −0 scripts/ga4gh/tool_execution_challenge_phase1/ga4ghMd5.json
- +22 −0 scripts/ga4gh/tool_execution_challenge_phase1/ga4ghMd5.wdl
- +1 −0 scripts/ga4gh/tool_execution_challenge_phase1/md5sum.input
- +1 −0 scripts/ga4gh/tool_execution_challenge_phase1/md5sum.txt
| @@ -0,0 +1,76 @@ | ||
| +This document describes the protocol that was used to submit the entry "Cromwell via Google Genomics Pipelines API wdl_runner" in the GA4GH Tool Execution Challenge, Phase 1 (https://www.synapse.org/#!Synapse:syn8080305). | ||
| + | ||
| +This protocol is based on the "Broad Institute GATK on Google Genomics" tutorial at https://cloud.google.com/genomics/v1alpha2/gatk, which demonstrates how to run a WDL pipeline through the Google Genomics Pipelines API wdl_runner. | ||
| + | ||
| +To replicate this work, replace the GCS bucket paths with your own and set the Google project and local paths to match your setup. The protocol below reads the project and the local paths from environment variables. | ||
| + | ||
| + | ||
| +### Obtain challenge materials | ||
| + | ||
| +1. Get the raw WDL from https://raw.githubusercontent.com/briandoconnor/dockstore-tool-md5sum/master/Dockstore.wdl and save it as ga4ghMd5.wdl | ||
| + | ||
| +2. Get the JSON content {"ga4ghMd5.inputFile": "md5sum.input"} and save it as ga4ghMd5.json | ||
| + | ||
| +3. Get the input text "this is the test file that will be used when calculating an md5sum" and save it as md5sum.input | ||
| + | ||
| + | ||
| +### Prepare for execution | ||
| + | ||
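| +4. Add the files to the WDL repository under scripts/ga4gh/tool_execution_challenge_phase1. The execution command reads the WDL and JSON from local copies and also expects a workflow options file, so add an empty one (empty.options.json). | ||
| + | ||
| +5. Upload the files to GCS at gs://ga4gh-tool-execution-challenge/phase1. The job reads the input file from GCS, so ga4ghMd5.json must reference it by its gs:// path (gs://ga4gh-tool-execution-challenge/phase1/md5sum.input) rather than by the local file name. | ||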
| + | ||
| + | ||
| +### Execute through Google Genomics Pipelines API | ||
| + | ||
| +6. Run the following command: | ||
| + | ||
| +```` | ||
| +gcloud alpha genomics pipelines run \ | ||
| + --pipeline-file workflows/wdl_pipeline.yaml \ | ||
| + --zones us-central1-f \ | ||
| + --logging gs://ga4gh-tool-execution-challenge/phase1/logs \ | ||
| + --inputs-from-file WDL=${WDL_SCR}/ga4gh/tool_execution_challenge_phase1/ga4ghMd5.wdl \ | ||
| + --inputs-from-file WORKFLOW_INPUTS=${WDL_SCR}/ga4gh/tool_execution_challenge_phase1/ga4ghMd5.json \ | ||
| + --inputs-from-file WORKFLOW_OPTIONS=${WDL_SCR}/ga4gh/tool_execution_challenge_phase1/empty.options.json \ | ||
| + --inputs WORKSPACE=gs://ga4gh-tool-execution-challenge/phase1/work \ | ||
| + --inputs OUTPUTS=gs://ga4gh-tool-execution-challenge/phase1/out \ | ||
| + --project $P_OUTREACH | ||
| +```` | ||
| + | ||
| +where ${WDL_SCR} is a local path to the WDL scripts directory and $P_OUTREACH is the Google project to run the job in. | ||
| + | ||
| +This returns `Running [operations/EOeigKSoKxi10d7o7tjQmHYg7qv-spECKg9wcm9kdWN0aW9uUXVldWU].` | ||
| + | ||
| +The status of the job can be monitored by running the following command: | ||
| + | ||
| +```` | ||
| +gcloud alpha genomics operations describe EOeigKSoKxi10d7o7tjQmHYg7qv-spECKg9wcm9kdWN0aW9uUXVldWU \ | ||
| + --format='yaml(done, error, metadata.events)' | ||
| +```` | ||
| + | ||
| +Once the job has completed successfully, the command returns: | ||
| + | ||
| +```` | ||
| +done: true | ||
| +metadata: | ||
| + events: | ||
| + - description: start | ||
| + startTime: '2017-02-28T11:35:14.458414638Z' | ||
| + - description: pulling-image | ||
| + startTime: '2017-02-28T11:35:14.458467173Z' | ||
| + - description: localizing-files | ||
| + startTime: '2017-02-28T11:35:56.281473416Z' | ||
| + - description: running-docker | ||
| + startTime: '2017-02-28T11:35:56.281499402Z' | ||
| + - description: delocalizing-files | ||
| + startTime: '2017-02-28T11:37:21.569738442Z' | ||
| + - description: ok | ||
| + startTime: '2017-02-28T11:37:23.273628610Z' | ||
| +```` | ||
| + | ||
| +The output file md5sum.txt is written to gs://ga4gh-tool-execution-challenge/phase1/out and contains the MD5 digest of the input file, 00579a00e3e7fa0674428ac7049423e2. | ||
| + | ||
| + | ||
| + |
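The README monitors the job by running the `describe` command by hand until the output shows `done: true`. A minimal sketch of automating that check follows. The helper name `wait_until_done`, the polling interval, the retry limit, and the `OP_ID` variable are all assumptions made for illustration and are not part of the commit. The helper only looks for the `done: true` line that appears in the sample output above, so it does not distinguish success from failure; the `error` field still needs to be inspected afterwards.

```shell
# Poll a status command until its YAML output contains a "done: true" line.
# Hypothetical helper, not part of the commit.
# Usage: wait_until_done INTERVAL_SECONDS MAX_TRIES COMMAND [ARGS...]
wait_until_done() {
  interval="$1"
  max_tries="$2"
  shift 2
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    # Run the status command and look for the completion marker.
    if "$@" | grep -q '^done: true'; then
      return 0
    fi
    tries=$((tries + 1))
    sleep "$interval"
  done
  # Gave up: the marker never appeared within MAX_TRIES attempts.
  return 1
}

# Example call (OP_ID is a placeholder for the operation ID returned by the run command):
# wait_until_done 30 40 gcloud alpha genomics operations describe "$OP_ID" --format='yaml(done)'
```

Passing the status command as arguments keeps the loop independent of gcloud, so it can be exercised with a stub command before being pointed at a real operation.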
| @@ -0,0 +1,3 @@ | ||
| +{ | ||
| + | ||
| +} |
| @@ -0,0 +1,3 @@ | ||
| +{ | ||
| + "ga4ghMd5.inputFile": "gs://ga4gh-tool-execution-challenge/phase1/md5sum.input" | ||
| +} |
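The README asks anyone replicating the run to substitute their own GCS bucket paths, and the inputs JSON above is one of the places where the bucket is hard-coded. As a sketch, the file could be generated from a bucket prefix instead of being edited by hand. The helper name `write_inputs_json` and the example bucket are hypothetical; only the key `ga4ghMd5.inputFile` and the file name `md5sum.input` come from the commit.

```shell
# Write an inputs JSON whose inputFile points at md5sum.input under a GCS prefix.
# Hypothetical helper, not part of the commit.
# Usage: write_inputs_json GCS_PREFIX OUTPUT_FILE
write_inputs_json() {
  printf '{\n  "ga4ghMd5.inputFile": "%s/md5sum.input"\n}\n' "$1" > "$2"
}

# Example call (the bucket name is a placeholder):
# write_inputs_json gs://my-bucket/phase1 ga4ghMd5.json
```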
| @@ -0,0 +1,22 @@ | ||
| +task md5 { | ||
| + File inputFile | ||
| + | ||
| + command { | ||
| + /bin/my_md5sum ${inputFile} | ||
| + } | ||
| + | ||
| + output { | ||
| + File value = "md5sum.txt" | ||
| + } | ||
| + | ||
| + runtime { | ||
| + docker: "quay.io/briandoconnor/dockstore-tool-md5sum:1.0.2" | ||
| + cpu: 1 | ||
| + memory: "512 MB" | ||
| + } | ||
| +} | ||
| + | ||
| +workflow ga4ghMd5 { | ||
| + File inputFile | ||
| + call md5 { input: inputFile=inputFile } | ||
| +} |
| @@ -0,0 +1 @@ | ||
| +this is the test file that will be used when calculating an md5sum |
| @@ -0,0 +1 @@ | ||
| +00579a00e3e7fa0674428ac7049423e2 |
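A reproduced run can be checked by comparing the digest in md5sum.txt with one computed locally from md5sum.input. The sketch below assumes GNU `md5sum` is available and that md5sum.txt holds only the bare hex digest, as the committed file does. The helper name `check_md5` is hypothetical. Whether a local digest matches also depends on the input file being byte-identical to the uploaded one, including any trailing newline.

```shell
# Compare a file's MD5 digest with an expected hex string.
# Hypothetical helper, not part of the commit; assumes GNU md5sum.
# Usage: check_md5 FILE EXPECTED_HEX
check_md5() {
  # md5sum prints "<digest>  <file>"; keep only the digest.
  actual=$(md5sum "$1" | cut -d ' ' -f 1)
  if [ "$actual" = "$2" ]; then
    echo "match: $actual"
  else
    echo "mismatch: got $actual, expected $2"
    return 1
  fi
}

# Example call, after copying md5sum.txt from the job's output location:
# check_md5 md5sum.input "$(cat md5sum.txt)"
```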
0 comments on commit a8b6234