Cannot retrieve the latest commit at this time.
| .. | |||
| Failed to load latest commit information. | |||
|
|
README.txt | ||
|
|
empty.options.json | ||
|
|
ga4ghMd5.json | ||
|
|
ga4ghMd5.wdl | ||
|
|
md5sum.input | ||
|
|
md5sum.txt | ||
README.txt
This document describes the protocol that was used to submit the entry "Cromwell via Google Genomics Pipelines API wdl_runner" in the GA4GH Tool Execution Challenge, Phase 1 (https://www.synapse.org/#!Synapse:syn8080305). This protocol is based on the "Broad Institute GATK on Google Genomics" tutorial at https://cloud.google.com/genomics/v1alpha2/gatk, which demonstrates how to run a WDL pipeline through the Google Genomics Pipelines API wdl_runner. To replicate this work, change the GCS bucket paths and set the Google Project and local paths (which are set using environment variables in the protocol below) as appropriate. ### Obtain challenge materials 1. Get the raw WDL from https://raw.githubusercontent.com/briandoconnor/dockstore-tool-md5sum/master/Dockstore.wdl and save to file ga4ghMd5.wdl 2. Get the JSON content {"ga4ghMd5.inputFile": "md5sum.input"} and save to file ga4ghMd5.json 3. Get the input text "this is the test file that will be used when calculating an md5sum" and save to file md5sum.input ### Prepare for execution 4. Add files to WDL repository under scripts/ga4gh/tool_execution_challenge_phase1 (need local copy of WDL and JSON to invoke in the execution command, plus a dummy workflow options file). 5. Upload files to GCS at gs://ga4gh-tool-execution-challenge/phase1 (need copy of input file on GCS). ### Execute through Google Genomics Pipelines API 6. Run the following command: ```` gcloud alpha genomics pipelines run \ --pipeline-file workflows/wdl_pipeline.yaml \ --zones us-central1-f \ --logging gs://ga4gh-tool-execution-challenge/phase1/logs \ --inputs-from-file WDL=${WDL_SCR}/ga4gh/tool_execution_challenge_phase1/ga4ghMd5.wdl \ --inputs-from-file WORKFLOW_INPUTS=${WDL_SCR}/ga4gh/tool_execution_challenge_phase1/ga4ghMd5.json \ --inputs-from-file WORKFLOW_OPTIONS=${WDL_SCR}/ga4gh/tool_execution_challenge_phase1/empty.options.json \ --inputs WORKSPACE=gs://ga4gh-tool-execution-challenge/phase1/work \ --inputs OUTPUTS=gs://ga4gh-tool-execution-challenge/phase1/out \ --project $P_OUTREACH ```` where ${WDL_SCR} is a local path to the WDL scripts directory. This returns `Running [operations/EOeigKSoKxi10d7o7tjQmHYg7qv-spECKg9wcm9kdWN0aW9uUXVldWU].` The status of the job can be monitored by running the following command: ```` gcloud alpha genomics operations describe EOeigKSoKxi10d7o7tjQmHYg7qv-spECKg9wcm9kdWN0aW9uUXVldWU \ --format='yaml(done, error, metadata.events)' ```` When the job has completed successfully, this is what is returned: ```` done: true metadata: events: - description: start startTime: '2017-02-28T11:35:14.458414638Z' - description: pulling-image startTime: '2017-02-28T11:35:14.458467173Z' - description: localizing-files startTime: '2017-02-28T11:35:56.281473416Z' - description: running-docker startTime: '2017-02-28T11:35:56.281499402Z' - description: delocalizing-files startTime: '2017-02-28T11:37:21.569738442Z' - description: ok startTime: '2017-02-28T11:37:23.273628610Z' ```` The output file md5sum.txt is written at gs://ga4gh-tool-execution-challenge/phase1/out and contains the result, 00579a00e3e7fa0674428ac7049423e2.