Nested scatter blocks #838

Closed

stekaz opened this issue May 17, 2016 · 16 comments

stekaz commented May 17, 2016

It would be great if Cromwell supported nested scatter blocks. For example:

workflow wf {

  Array[Array[String]] array

  scatter (arr in array) {
    scatter (i in arr) {
      call foo {
        input: str=i
      }
    }
  }
}

sooheelee commented Sep 19, 2016

I want to

(i) scatter over scatters, as requested by @skazakoff above (aka "nested scatter"), and also

(ii) re-array and scatter again (what I'm calling criss-cross scatter)

As I show in this diagram:

[Diagram: (i) nested scatter, scattering again within each scattered element; (ii) criss-cross scatter, re-arraying the results and scattering again]

My attempt to ask WDL/Cromwell to do the first scatter-of-scatter above gives the following error:

[2016-09-02 16:30:43,699] [error] WorkflowActor [e1b18d33]: Call failed to initialize: Failed to start call: Nested Scatters are not supported (yet).

It was unclear how I would go about criss-cross scattering.

My current workaround is to run two separate WDL workflows. Between the two workflows, I use a simple Python script to organize the files for the second scatter. The first WDL workflow runs per sample, while the second acts on all the outputs of the multiple runs of the first workflow (multiple samples).


Example of workflows that I want to run within a single WDL workflow

These are cloud-run scripts that I would like to combine into a single WDL script using the nested-scatter and criss-cross-scatter features. Currently, I run the first script per sample (1) for multiple samples, then run a helper script to organize all the outputs (1.5), and finally run the second WDL script across the multi-sample outputs per genomic interval (2).

I would like to be able to scatter the samples, scatter per interval, then run a differently organized (criss-cross) scatter across all the samples per interval.
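
For illustration, if nested scatters plus the standard-library transpose() function were available, the overall shape could be expressed roughly as follows. This is a minimal sketch, not the actual workflows below; CallVariants and JointGenotype are hypothetical tasks standing in for the real ones:

workflow CrissCross {
  Array[File] sample_bams
  Array[String] intervals

  # (i) Nested scatter: per sample, then per interval within each sample
  scatter (bam in sample_bams) {
    scatter (interval in intervals) {
      call CallVariants { input: bam = bam, interval = interval }
    }
  }

  # CallVariants.gvcf has type Array[Array[File]], indexed [sample][interval].
  # transpose() flips it to [interval][sample] -- the criss-cross regrouping.
  Array[Array[File]] per_interval_gvcfs = transpose(CallVariants.gvcf)

  # (ii) Criss-cross scatter: per interval, across all samples
  scatter (gvcfs in per_interval_gvcfs) {
    call JointGenotype { input: gvcfs = gvcfs }
  }
}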

(1) First WDL workflow

UltimateScatterHaplotypeCaller_cloud_quicktest.wdl

# ScatterHaplotypeCaller.wdl #############################################################
# Must use GATK v3.6, especially with hg38 where contig names have colons,
# as a bug in prior versions of GATK strips this off.
# Each BAM file represents a single different sample.
# Option to include an additional file to HaplotypeCaller run. If an intervals list is
# supplied here, then the intersection of this will be used against the scattered
# intervals.
#########################################################################################

# TASK DEFINITIONS ######################################################################

# Call variants on a single sample with HaplotypeCaller to produce a GVCF
# Here allowing scattering over contigs from one interval list,
# and allowing application of a second OPTIONAL intervals list.
# The intersection of the two is used instead of a union so we can be more discerning
# of processed regions and so runs can finish in shorter times.
# Make the application of additional options, e.g. read filters, optional.

task InvertIntervals {
    File? intersect_intervals_file
    Int disk_size
    Int preemptible_tries

    command {
    java -Xmx2g -jar /usr/gitc/picard.jar IntervalListTools \
      I=${intersect_intervals_file} \
      INVERT=true \
      O=inverted.interval_list
    }
    runtime {
      docker: "broadinstitute/genomes-in-the-cloud:2.2.3-1469027018"
      memory: "3 GB"
      disks: "local-disk " + disk_size + " HDD"
      preemptible: preemptible_tries
    }
    output {
      File exclude_intervals = "inverted.interval_list"
    }
}

task HaplotypeCaller {
    File input_bam
    File bam_index
    String sample_basename
    String starting_contig
    File ref_dict
    File ref_fasta
    File ref_fasta_index
    Float? contamination
    String? additional_options
    String? optional_XL
    File? exclude_intervals
    Array [String] scatter_intervals_group
    Int disk_size
    Int preemptible_tries

  command {
    java -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx800m \
      -jar /usr/gitc/GATK36.jar \
      -T HaplotypeCaller \
      -R ${ref_fasta} \
      -o ${sample_basename}.${starting_contig}.vcf.gz \
      -I ${input_bam} \
      -L ${sep=" -L " scatter_intervals_group} \
      -ERC GVCF \
      -variant_index_parameter 128000 \
      -variant_index_type LINEAR \
      -contamination ${default=0 contamination} \
      ${additional_options} \
      ${optional_XL} ${exclude_intervals}
  }
  runtime {
    docker: "broadinstitute/genomes-in-the-cloud:2.2.3-1469027018"
    memory: "10 GB"
    cpu: "1"
    disks: "local-disk " + disk_size + " HDD"
    preemptible: preemptible_tries
  }
  output {
    File output_gvcf = "${sample_basename}.${starting_contig}.vcf.gz"
    File output_gvcf_index = "${sample_basename}.${starting_contig}.vcf.gz.tbi"
  }
}

# WORKFLOW DEFINITION ###################################################################
workflow ScatterHaplotypeCaller {
    File input_bam
    File input_bam_index
    File ref_dict
    File ref_fasta
    File ref_fasta_index
    Float? contamination
    String? additional_options
    String? optional_XL
    File? intersect_intervals_file
    String bam_directory_path
    Int agg_preemptible_tries
    Int agg_small_disk
    Array [Array [String]] scatter_intervals

    # Create inverted intervals that will be skipped by HaplotypeCaller
    call InvertIntervals {
      input:
        intersect_intervals_file = intersect_intervals_file,
        disk_size = agg_small_disk,
        preemptible_tries = agg_preemptible_tries
    }

    # Get sample_basename
    # The script defines the output sample_basename automatically from the BAM basename
    String sub_strip_path = bam_directory_path
    String sub_strip_extension = "\\.bam"

    # Call variants in parallel over grouped calling intervals
    scatter (group in scatter_intervals) {

      # Generate GVCF by interval
      call HaplotypeCaller {
        input:
          input_bam = input_bam,
          bam_index = input_bam_index,
          scatter_intervals_group = group,
          sample_basename = sub(sub(input_bam, sub_strip_path, ""), sub_strip_extension, ""),
          starting_contig = group[0],
          ref_dict = ref_dict,
          ref_fasta = ref_fasta,
          ref_fasta_index = ref_fasta_index,
          contamination = contamination,
          additional_options = additional_options,
          optional_XL = optional_XL,
          exclude_intervals = InvertIntervals.exclude_intervals,
          disk_size = agg_small_disk,
          preemptible_tries = agg_preemptible_tries
      }
    }

    # Outputs that will be retained when execution is complete
    output {
      HaplotypeCaller.*
    }
}

UltimateScatterHaplotypeCaller_cloud_quicktest.json

{
  "ScatterHaplotypeCaller.input_bam": "gs://shlee-dev/simulated_bam/papa_hg37_snaut.bam",
  "ScatterHaplotypeCaller.input_bam_index": "gs://shlee-dev/simulated_bam/papa_hg37_snaut.bai",
  "ScatterHaplotypeCaller.bam_directory_path": "gs://shlee-dev/simulated_bam/",
  "ScatterHaplotypeCaller.ref_fasta": "gs://shlee-dev/ref/b37/human_g1k_v37.fasta",
  "ScatterHaplotypeCaller.ref_fasta_index": "gs://shlee-dev/ref/b37/human_g1k_v37.fasta.fai",
  "ScatterHaplotypeCaller.ref_dict": "gs://shlee-dev/ref/b37/human_g1k_v37.dict",
  "ScatterHaplotypeCaller.contamination": "0",
  "ScatterHaplotypeCaller.additional_options": "--max_alternate_alleles 3 -U ALLOW_SEQ_DICT_INCOMPATIBILITY",
  "ScatterHaplotypeCaller.intersect_intervals_file": "gs://shlee-dev/intervals_lists/b37/exome_calling_regions.v1.interval_list",
  "ScatterHaplotypeCaller.optional_XL": "-XL",
  "ScatterHaplotypeCaller.agg_small_disk": 200,
  "ScatterHaplotypeCaller.agg_medium_disk": 300,
  "ScatterHaplotypeCaller.agg_large_disk": 400,
  "ScatterHaplotypeCaller.agg_preemptible_tries": 3,
  "ScatterHaplotypeCaller.flowcell_small_disk": 200,
  "ScatterHaplotypeCaller.flowcell_medium_disk": 300,
  "ScatterHaplotypeCaller.preemptible_tries": 3,
  "ScatterHaplotypeCaller.scatter_intervals": [
["1"],
["2"],
["3"],
["4"],
["5"],
["6"],
["7"],
["8"],
["9"],
["10"],
["11"],
["12","13"],
["14","15"],
["16","17"],
["18","19","20","21"],
["22","X"],
["Y","MT","GL000207.1","GL000226.1","GL000229.1","GL000231.1","GL000210.1","GL000239.1","GL000235.1","GL000201.1","GL000247.1","GL000245.1","GL000197.1","GL000203.1","GL000246.1","GL000249.1","GL000196.1","GL000248.1","GL000244.1","GL000238.1","GL000202.1","GL000234.1","GL000232.1","GL000206.1","GL000240.1","GL000236.1","GL000241.1","GL000243.1","GL000242.1","GL000230.1","GL000237.1","GL000233.1","GL000204.1","GL000198.1","GL000208.1","GL000191.1","GL000227.1","GL000228.1","GL000214.1","GL000221.1","GL000209.1","GL000218.1","GL000220.1","GL000213.1","GL000211.1","GL000199.1","GL000217.1","GL000216.1","GL000215.1","GL000205.1","GL000219.1","GL000224.1","GL000223.1","GL000195.1","GL000212.1","GL000222.1","GL000200.1","GL000193.1","GL000194.1","GL000225.1","GL000192.1"]
]
}

(1.5) Helper commands and script to organize inputs for second workflow

First, generate a text file of inputs by listing the files with gsutil ls and globbing

E.g. for a different set of cloud files:

wmd2a-330:genotype_cohort shlee$ gsutil ls gs://shlee-dev/platinum/HC_GVCFs/NA12???/*.vcf* > ls_15samples.txt

Then, run the text file of file paths through the following Python script

I copy-paste the output for the samples into the INPUTS JSON file for the second WDL script.

#!/usr/bin/env python
import json
import re

# Organizes a list of files, one file per line, into
# (i) a json-formatted list of indices ending in .vcf.gz.tbi and
# (ii) a json-formatted array of files grouped by the shard string
#      preceding the .vcf.gz (or .g.vcf.gz) extension,
#      e.g. 'chrM' in file.chrM.vcf.gz.

with open('ls_cat_all.txt', 'r') as all_files:
    files_list = all_files.read().splitlines()

vcf_list = [path for path in files_list if path.endswith(".vcf.gz")]
index_list = [path for path in files_list if path.endswith(".vcf.gz.tbi")]

print('\n1. LIST OF INDICES:\n')
print(json.dumps(index_list))

# Group the GVCFs by the shard field just before the extension.
shard_dict = {}
for path in vcf_list:
    base = re.sub(r'\.(?:g\.)?vcf\.gz$', '', path)  # drop the extension
    shard = base.rsplit('.', 1)[-1]                 # field just before it
    shard_dict.setdefault(shard, []).append(path)

shard_array = list(shard_dict.values())

print('\n2. ARRAY OF GVCFS:\n')
print(json.dumps(shard_array))
print('\nEND\n')

(2) Second WDL workflow

GenotypeScatteredGVCFs_cloud_quicktest.wdl

# GenotypeScatteredGVCFs.wdl #############################################################
#
#########################################################################################

# TASK DEFINITIONS ######################################################################

task UnzipGVCFs {
    Array [File] group_gz_gvcfs
    File ref_dict
    Int disk_size
    Int preemptible_tries

    command <<<
    cp ${sep=' ' group_gz_gvcfs} .
    ls -1 *.vcf.gz | xargs -P 4 -n 1 -I {} \
    gunzip {}
    ls -1 *.vcf | xargs -P 4 -n 1 -I {} \
    java -Xmx2g -jar /usr/gitc/picard.jar SortVcf \
        I={} \
        O=sort_{} \
        SEQUENCE_DICTIONARY=${ref_dict}
    >>>
    runtime {
      docker: "broadinstitute/genomes-in-the-cloud:2.2.3-1469027018"
      memory: "3 GB"
      disks: "local-disk " + disk_size + " HDD"
      preemptible: preemptible_tries
    }
    output {
      Array [File] unzipped_GVCFs = glob("sort_*.vcf")
      Array [File] gvcf_indices = glob("sort_*.vcf.idx")
    }
}

task GenotypeGVCFs {
    Array [File] scattered_gvcfs
    Array [File] GVCF_indices
    String vcf_basename
    File ref_dict
    File ref_fasta
    File ref_fasta_index
    String? additional_options
    Int disk_size
    Int preemptible_tries

    command {
    java -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx8000m \
        -jar /usr/gitc/GATK36.jar \
        -T GenotypeGVCFs \
        -R ${ref_fasta} \
        --variant ${sep=' --variant ' scattered_gvcfs} \
        -o ${vcf_basename}.gz ${additional_options}
    }
    runtime {
      docker: "broadinstitute/genomes-in-the-cloud:2.2.3-1469027018"
      memory: "10 GB"
      cpu: "1"
      disks: "local-disk " + disk_size + " HDD"
      preemptible: preemptible_tries
    }
    output {
      File genotyped_vcf = "${vcf_basename}.gz"
      File genotyped_index = "${vcf_basename}.gz.tbi"
    }
}

# Combine multiple VCFs from scattered GenotypeGVCFs runs
task MergeVCFs {
    File ref_dict
    Array [File] input_vcfs
    Array [File] input_vcfs_indices
    String cohort_vcf_name
    Int disk_size
    Int preemptible_tries

    command {
    java -Xmx2g -jar /usr/gitc/picard.jar \
    MergeVcfs \
    INPUT=${sep=' INPUT=' input_vcfs} \
    OUTPUT=${cohort_vcf_name}.vcf.gz
    }
    runtime {
      docker: "broadinstitute/genomes-in-the-cloud:2.2.3-1469027018"
      memory: "3 GB"
      disks: "local-disk " + disk_size + " HDD"
      preemptible: preemptible_tries
    }
    output {
      File output_vcf = "${cohort_vcf_name}.vcf.gz"
      File output_vcf_index = "${cohort_vcf_name}.vcf.gz.tbi"
    }
}

# WORKFLOW DEFINITION ###################################################################
workflow GenotypeScatteredGVCFs {
    File ref_dict
    File ref_fasta
    File ref_fasta_index
    Array [Array [File]] gvcf_groups
    String? additional_options
    String cohort_vcf_name
    Int agg_preemptible_tries
    Int agg_small_disk

    # Scatter intervals
    scatter (group in gvcf_groups) {

      # Get vcf_basename
      # The script defines the output vcf_basename automatically from the vcf
      # String sub_strip_path = "gs://.*/"
      String sub_strip_path = "^.+/[^\\.]+\\."

      # GenotypeGVCFs takes vcf + idx files so we need to uncompress
      call UnzipGVCFs {
        input:
          group_gz_gvcfs = group,
          ref_dict = ref_dict,
          disk_size = agg_small_disk,
          preemptible_tries = agg_preemptible_tries
      }

      # Joint genotype in parallel over intervals
      call GenotypeGVCFs {
        input:
          scattered_gvcfs = UnzipGVCFs.unzipped_GVCFs,
          GVCF_indices = UnzipGVCFs.gvcf_indices,
          vcf_basename = sub(UnzipGVCFs.unzipped_GVCFs[0], sub_strip_path, ""),
          ref_dict = ref_dict,
          ref_fasta = ref_fasta,
          ref_fasta_index = ref_fasta_index,
          additional_options = additional_options,
          disk_size = agg_small_disk,
          preemptible_tries = agg_preemptible_tries
      }
    }

    # Merge the per-interval VCFs into a single cohort VCF file
    call MergeVCFs {
      input:
        ref_dict = ref_dict,
        input_vcfs = GenotypeGVCFs.genotyped_vcf,
        input_vcfs_indices = GenotypeGVCFs.genotyped_index,
        cohort_vcf_name = cohort_vcf_name,
        disk_size = agg_small_disk,
        preemptible_tries = agg_preemptible_tries
    }

    # Outputs that will be retained when execution is complete
    output {
      MergeVCFs.*
    }
}

GenotypeScatteredGVCFs_cloud_quicktest.json

{
  "GenotypeScatteredGVCFs.ref_dict": "gs://shlee-dev/ref/b37/human_g1k_v37.dict",
  "GenotypeScatteredGVCFs.agg_small_disk": 200,
  "GenotypeScatteredGVCFs.ref_fasta": "gs://shlee-dev/ref/b37/human_g1k_v37.fasta",
  "GenotypeScatteredGVCFs.cohort_vcf_name": "simulated_trio_callset",
  "GenotypeScatteredGVCFs.ref_fasta_index": "gs://shlee-dev/ref/b37/human_g1k_v37.fasta.fai",
  "GenotypeScatteredGVCFs.gvcf_groups": [["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.11.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.11.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.11.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.10.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.10.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.10.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.12.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.12.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.12.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.14.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.14.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.14.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.22.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.22.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.22.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.16.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.16.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.16.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.18.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.18.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.18.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.1.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.1.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.1.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.Y.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.Y.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.Y.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.3.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.3.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.3.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.2.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.2.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.2.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.5.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.5.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.5.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.4.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.4.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.4.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.7.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.7.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.7.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.6.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.6.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.6.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.9.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.9.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.9.vcf.gz"], 
["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.8.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.8.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.8.vcf.gz"]],
  "GenotypeScatteredGVCFs.additional_options": "-stand_call_conf 0 -stand_emit_conf 0 -U ALLOW_SEQ_DICT_INCOMPATIBILITY",
  "GenotypeScatteredGVCFs.agg_preemptible_tries": 3
}

If there is some other way to run all of the above within a single cloud command instance, and new WDL/Cromwell features are not the best approach, please do let me know.

@knoblett

Was this ever implemented? Or are we no longer planning on doing this feature?

A user expressed interest in this feature here.

@cjllanwarne

Oops, this ticket shouldn't have been closed by that PR. Looks like a typo.

@cjllanwarne reopened this Nov 15, 2016
@knoblett

Ah, okay! Thank you for the update.


LeeTL1220 commented Dec 2, 2016

DSDE Methods would like this too. (Important for M2 development/evaluations)

@cjllanwarne

@kcibul I propose closing this ticket since I don't think it'll be implemented in the foreseeable future.

There are actually two ways that this functionality already exists:

  • You can use the cross(arrayA, arrayB) function to get the array of all pairs (a, b) with a ∈ arrayA and b ∈ arrayB. This single array can then be scattered over as normal, e.g.:
workflow foo {
  # Inputs:
  Array[Int] arrayA
  Array[String] arrayB

  # Cross them together:
  Array[Pair[Int, String]] crossed = cross(arrayA, arrayB)
  
  scatter (p in crossed) {
    Int a = p.left
    String b = p.right

    # Do the work with a and b.
  }
}
  • If that isn't good enough (e.g. you need to do something to a before you scatter over the bs), you can put the inner scatter into a sub-workflow and Cromwell will be able to run it just fine.
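
For reference, a minimal sketch of that sub-workflow approach applied to the original example. This is illustrative only, assuming a trivial foo task and the later draft-2 syntax with declared outputs; file and workflow names are hypothetical:

# inner.wdl -- wraps the inner scatter in its own workflow
task foo {
  String str
  command { echo ${str} }
  output { String out = read_string(stdout()) }
}

workflow inner {
  Array[String] arr
  scatter (i in arr) {
    call foo { input: str = i }
  }
  output { Array[String] outs = foo.out }
}

# outer.wdl -- the outer scatter calls the sub-workflow
import "inner.wdl" as sub

workflow outer {
  Array[Array[String]] array
  scatter (arr in array) {
    call sub.inner { input: arr = arr }
  }
  output { Array[Array[String]] all_outs = inner.outs }
}

Cromwell runs each inner call as a sub-workflow, so the net result is the same Array[Array[String]] you would get from a true nested scatter.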

@sooheelee

@cjllanwarne Which version of Cromwell supports this feature? I'm writing another similar WDL workflow.

@cjllanwarne

@sooheelee the cross method and sub-workflows were introduced in Cromwell 23.

@katevoss

I believe this is now complete in Cromwell 23; let me know if there is more to be done here.


alongalor commented Sep 4, 2017

If that isn't good enough (e.g. you need to do something to a before you scatter over the bs), you can put the inner scatter into a sub-workflow and Cromwell will be able to run it just fine.

This is what I have implemented to get around this, but it adds significant complexity to the code. I agree that support for a nested scatter option is an excellent idea and would save pipeline writers a lot of time.

@alongalor

I believe this is now complete

By "this" do you mean nested scatter blocks or cross and subworkflows?

@katevoss

@alongalor Subworkflows were completed in Cromwell 23, see the Changelog for more.


Horneth commented Sep 12, 2017

@katevoss I think the confusion is that nested scatters (which this ticket is about) are not complete. Subworkflows, which can be used to obtain an equivalent result to nested scatters, are complete.
If you meant to close this because "nested scatters won't be done anytime soon and there is a (slightly painful but working) workaround for it", that's also totally fine :)


rdali commented Jun 24, 2020

Have nested scatters seen the light yet?

@yfarjoun reopened this Sep 20, 2021
@yfarjoun

I have managed to run nested scatters on Terra in Cromwell 67. So...that's a yes!
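
For anyone landing here later: the original example now runs as written on recent Cromwell. In WDL 1.0 form, a minimal sketch with a trivial, hypothetical foo task:

version 1.0

workflow wf {
  input {
    Array[Array[String]] array
  }
  scatter (arr in array) {
    scatter (i in arr) {
      call foo { input: str = i }
    }
  }
  # Outputs of a doubly-nested scatter come back doubly-arrayed.
  output {
    Array[Array[String]] outs = foo.out
  }
}

task foo {
  input {
    String str
  }
  command { echo ~{str} }
  output { String out = read_string(stdout()) }
}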

@yfarjoun

🥳
