Nested scatter blocks #838

Closed

stekaz opened this issue May 17, 2016 · 16 comments

stekaz commented May 17, 2016

It would be great if Cromwell supported nested scatter blocks. For example:

workflow wf {

  Array[Array[String]] array

  scatter (arr in array) {
    scatter (i in arr) {
      call foo {
        input: str=i
      }
    }
  }
}

sooheelee commented Sep 19, 2016

I want to

(i) scatter over scatters, as requested by @skazakoff above (aka "nested scatter"), and also

(ii) re-array and scatter again (what I'm calling criss-cross scatter)

As I show in this diagram:

[Diagram: (i) nested scatter, scattering again within each scattered element; (ii) criss-cross scatter, re-arraying the results and scattering again]

My attempt to ask WDL/Cromwell to do the first scatter-of-scatter above gives the following error:

[2016-09-02 16:30:43,699] [error] WorkflowActor [e1b18d33]: Call failed to initialize: Failed to start call: Nested Scatters are not supported (yet).

It was unclear how I would go about criss-cross scattering.

My current workaround is to run two separate WDL workflows. Between the two workflows, I use a simple Python script to organize the files for the second scatter. The first WDL workflow runs per sample, while the second acts on all the outputs of the multiple runs of the first workflow (multiple samples).


Example of workflows that I want to run within a single WDL workflow

These are cloud-run scripts that I would like to combine into a single WDL script using the nested-scatter and criss-cross-scatter features. Currently, I run the first script per sample (1) for multiple samples, then run a helper script to organize all the outputs (1.5), and finally run the second WDL script across the multi-sample outputs per genomic interval (2).

I would like to be able to scatter the samples, scatter per interval, then run a differently organized (criss-cross) scatter across all the samples per interval.
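
For illustration, if nested scatters plus the standard-library transpose() function were available, the overall shape could be expressed roughly as follows. This is a minimal sketch, not the actual workflows below; CallVariants and JointGenotype are hypothetical tasks standing in for the real ones:

workflow CrissCross {
  Array[File] sample_bams
  Array[String] intervals

  # (i) Nested scatter: per sample, then per interval within each sample
  scatter (bam in sample_bams) {
    scatter (interval in intervals) {
      call CallVariants { input: bam = bam, interval = interval }
    }
  }

  # CallVariants.gvcf has type Array[Array[File]], indexed [sample][interval].
  # transpose() flips it to [interval][sample] -- the criss-cross regrouping.
  Array[Array[File]] per_interval_gvcfs = transpose(CallVariants.gvcf)

  # (ii) Criss-cross scatter: per interval, across all samples
  scatter (gvcfs in per_interval_gvcfs) {
    call JointGenotype { input: gvcfs = gvcfs }
  }
}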

(1) First WDL workflow

UltimateScatterHaplotypeCaller_cloud_quicktest.wdl

# ScatterHaplotypeCaller.wdl #############################################################
# Must use GATK v3.6, especially with hg38 where contig names have colons,
# as a bug in prior versions of GATK strips this off.
# Each BAM file represents a single different sample.
# Option to include an additional file to HaplotypeCaller run. If an intervals list is
# supplied here, then the intersection of this will be used against the scattered
# intervals.
#########################################################################################

# TASK DEFINITIONS ######################################################################

# Call variants on a single sample with HaplotypeCaller to produce a GVCF
# Here allowing scattering over contigs from one interval list,
# and allowing application of a second OPTIONAL intervals list.
# The intersection of the two is used instead of a union so we can be more discerning
# of processed regions and so runs can finish in shorter times.
# Make the application of additional options, e.g. read filters, optional.

task InvertIntervals {
    File? intersect_intervals_file
    Int disk_size
    Int preemptible_tries

    command {
    java -Xmx2g -jar /usr/gitc/picard.jar IntervalListTools \
      I=${intersect_intervals_file} \
      INVERT=true \
      O=inverted.interval_list
    }
    runtime {
      docker: "broadinstitute/genomes-in-the-cloud:2.2.3-1469027018"
      memory: "3 GB"
      disks: "local-disk " + disk_size + " HDD"
      preemptible: preemptible_tries
    }
    output {
      File exclude_intervals = "inverted.interval_list"
    }
}

task HaplotypeCaller {
    File input_bam
    File bam_index
    String sample_basename
    String starting_contig
    File ref_dict
    File ref_fasta
    File ref_fasta_index
    Float? contamination
    String? additional_options
    String? optional_XL
    File? exclude_intervals
    Array [String] scatter_intervals_group
    Int disk_size
    Int preemptible_tries

  command {
    java -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx800m \
      -jar /usr/gitc/GATK36.jar \
      -T HaplotypeCaller \
      -R ${ref_fasta} \
      -o ${sample_basename}.${starting_contig}.vcf.gz \
      -I ${input_bam} \
      -L ${sep=" -L " scatter_intervals_group} \
      -ERC GVCF \
      -variant_index_parameter 128000 \
      -variant_index_type LINEAR \
      -contamination ${default=0 contamination} \
      ${additional_options} \
      ${optional_XL} ${exclude_intervals}
  }
  runtime {
    docker: "broadinstitute/genomes-in-the-cloud:2.2.3-1469027018"
    memory: "10 GB"
    cpu: "1"
    disks: "local-disk " + disk_size + " HDD"
    preemptible: preemptible_tries
  }
  output {
    File output_gvcf = "${sample_basename}.${starting_contig}.vcf.gz"
    File output_gvcf_index = "${sample_basename}.${starting_contig}.vcf.gz.tbi"
  }
}

# WORKFLOW DEFINITION ###################################################################
workflow ScatterHaplotypeCaller {
    File input_bam
    File input_bam_index
    File ref_dict
    File ref_fasta
    File ref_fasta_index
    Float? contamination
    String? additional_options
    String? optional_XL
    File? intersect_intervals_file
    String bam_directory_path
    Int agg_preemptible_tries
    Int agg_small_disk
    Array [Array [String]] scatter_intervals

    # Create inverted intervals that will be skipped by HaplotypeCaller
    call InvertIntervals {
      input:
        intersect_intervals_file = intersect_intervals_file,
        disk_size = agg_small_disk,
        preemptible_tries = agg_preemptible_tries
    }

    # Get sample_basename
    # The script defines the output sample_basename automatically from the BAM basename
    String sub_strip_path = bam_directory_path
    String sub_strip_extension = "\\.bam"

    # Call variants in parallel over grouped calling intervals
    scatter (group in scatter_intervals) {

      # Generate GVCF by interval
      call HaplotypeCaller {
        input:
          input_bam = input_bam,
          bam_index = input_bam_index,
          scatter_intervals_group = group,
          sample_basename = sub(sub(input_bam, sub_strip_path, ""), sub_strip_extension, ""),
          starting_contig = group[0],
          ref_dict = ref_dict,
          ref_fasta = ref_fasta,
          ref_fasta_index = ref_fasta_index,
          contamination = contamination,
          additional_options = additional_options,
          optional_XL = optional_XL,
          exclude_intervals = InvertIntervals.exclude_intervals,
          disk_size = agg_small_disk,
          preemptible_tries = agg_preemptible_tries
      }
    }

    # Outputs that will be retained when execution is complete
    output {
      HaplotypeCaller.*
    }
}

UltimateScatterHaplotypeCaller_cloud_quicktest.json

{
  "ScatterHaplotypeCaller.input_bam": "gs://shlee-dev/simulated_bam/papa_hg37_snaut.bam",
  "ScatterHaplotypeCaller.input_bam_index": "gs://shlee-dev/simulated_bam/papa_hg37_snaut.bai",
  "ScatterHaplotypeCaller.bam_directory_path": "gs://shlee-dev/simulated_bam/",
  "ScatterHaplotypeCaller.ref_fasta": "gs://shlee-dev/ref/b37/human_g1k_v37.fasta",
  "ScatterHaplotypeCaller.ref_fasta_index": "gs://shlee-dev/ref/b37/human_g1k_v37.fasta.fai",
  "ScatterHaplotypeCaller.ref_dict": "gs://shlee-dev/ref/b37/human_g1k_v37.dict",
  "ScatterHaplotypeCaller.contamination": "0",
  "ScatterHaplotypeCaller.additional_options": "--max_alternate_alleles 3 -U ALLOW_SEQ_DICT_INCOMPATIBILITY",
  "ScatterHaplotypeCaller.intersect_intervals_file": "gs://shlee-dev/intervals_lists/b37/exome_calling_regions.v1.interval_list",
  "ScatterHaplotypeCaller.optional_XL": "-XL",
  "ScatterHaplotypeCaller.agg_small_disk": 200,
  "ScatterHaplotypeCaller.agg_medium_disk": 300,
  "ScatterHaplotypeCaller.agg_large_disk": 400,
  "ScatterHaplotypeCaller.agg_preemptible_tries": 3,
  "ScatterHaplotypeCaller.flowcell_small_disk": 200,
  "ScatterHaplotypeCaller.flowcell_medium_disk": 300,
  "ScatterHaplotypeCaller.preemptible_tries": 3,
  "ScatterHaplotypeCaller.scatter_intervals": [
["1"],
["2"],
["3"],
["4"],
["5"],
["6"],
["7"],
["8"],
["9"],
["10"],
["11"],
["12","13"],
["14","15"],
["16","17"],
["18","19","20","21"],
["22","X"],
["Y","MT","GL000207.1","GL000226.1","GL000229.1","GL000231.1","GL000210.1","GL000239.1","GL000235.1","GL000201.1","GL000247.1","GL000245.1","GL000197.1","GL000203.1","GL000246.1","GL000249.1","GL000196.1","GL000248.1","GL000244.1","GL000238.1","GL000202.1","GL000234.1","GL000232.1","GL000206.1","GL000240.1","GL000236.1","GL000241.1","GL000243.1","GL000242.1","GL000230.1","GL000237.1","GL000233.1","GL000204.1","GL000198.1","GL000208.1","GL000191.1","GL000227.1","GL000228.1","GL000214.1","GL000221.1","GL000209.1","GL000218.1","GL000220.1","GL000213.1","GL000211.1","GL000199.1","GL000217.1","GL000216.1","GL000215.1","GL000205.1","GL000219.1","GL000224.1","GL000223.1","GL000195.1","GL000212.1","GL000222.1","GL000200.1","GL000193.1","GL000194.1","GL000225.1","GL000192.1"]
]
}

(1.5) Helper commands and script to organize inputs for second workflow

First, generate a text file of inputs by listing the files with gsutil ls and globbing

E.g. for a different set of cloud files:

wmd2a-330:genotype_cohort shlee$ gsutil ls gs://shlee-dev/platinum/HC_GVCFs/NA12???/*.vcf* > ls_15samples.txt

Then, run the text file of file paths through the following Python script

I copy-paste the output for the samples into the INPUTS JSON file for the second WDL script.

#!/usr/bin/env python
import json
import re

# Organizes a list of files, one file per line, into
# (i) a json-formatted list of indices ending in .vcf.gz.tbi and
# (ii) a json-formatted array of files grouped by the shard string
#      preceding the .vcf.gz (or .g.vcf.gz) extension,
#      e.g. 'chrM' in file.chrM.vcf.gz.

with open('ls_cat_all.txt', 'r') as all_files:
    files_list = all_files.read().splitlines()

vcf_list = [path for path in files_list if path.endswith(".vcf.gz")]
index_list = [path for path in files_list if path.endswith(".vcf.gz.tbi")]

print('\n1. LIST OF INDICES:\n')
print(json.dumps(index_list))

# Group the GVCFs by the shard field just before the extension.
shard_dict = {}
for path in vcf_list:
    base = re.sub(r'\.(?:g\.)?vcf\.gz$', '', path)  # drop the extension
    shard = base.rsplit('.', 1)[-1]                 # field just before it
    shard_dict.setdefault(shard, []).append(path)

shard_array = list(shard_dict.values())

print('\n2. ARRAY OF GVCFS:\n')
print(json.dumps(shard_array))
print('\nEND\n')

(2) Second WDL workflow

GenotypeScatteredGVCFs_cloud_quicktest.wdl

# GenotypeScatteredGVCFs.wdl #############################################################
#
#########################################################################################

# TASK DEFINITIONS ######################################################################

task UnzipGVCFs {
    Array [File] group_gz_gvcfs
    File ref_dict
    Int disk_size
    Int preemptible_tries

    command <<<
    cp ${sep=' ' group_gz_gvcfs} .
    ls -1 *.vcf.gz | xargs -P 4 -n 1 -I {} \
    gunzip {}
    ls -1 *.vcf | xargs -P 4 -n 1 -I {} \
    java -Xmx2g -jar /usr/gitc/picard.jar SortVcf \
        I={} \
        O=sort_{} \
        SEQUENCE_DICTIONARY=${ref_dict}
    >>>
    runtime {
      docker: "broadinstitute/genomes-in-the-cloud:2.2.3-1469027018"
      memory: "3 GB"
      disks: "local-disk " + disk_size + " HDD"
      preemptible: preemptible_tries
    }
    output {
      Array [File] unzipped_GVCFs = glob("sort_*.vcf")
      Array [File] gvcf_indices = glob("sort_*.vcf.idx")
    }
}

task GenotypeGVCFs {
    Array [File] scattered_gvcfs
    Array [File] GVCF_indices
    String vcf_basename
    File ref_dict
    File ref_fasta
    File ref_fasta_index
    String? additional_options
    Int disk_size
    Int preemptible_tries

    command {
    java -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx8000m \
        -jar /usr/gitc/GATK36.jar \
        -T GenotypeGVCFs \
        -R ${ref_fasta} \
        --variant ${sep=' --variant ' scattered_gvcfs} \
        -o ${vcf_basename}.gz ${additional_options}
    }
    runtime {
      docker: "broadinstitute/genomes-in-the-cloud:2.2.3-1469027018"
      memory: "10 GB"
      cpu: "1"
      disks: "local-disk " + disk_size + " HDD"
      preemptible: preemptible_tries
    }
    output {
      File genotyped_vcf = "${vcf_basename}.gz"
      File genotyped_index = "${vcf_basename}.gz.tbi"
    }
}

# Combine multiple VCFs from scattered GenotypeGVCFs runs
task MergeVCFs {
    File ref_dict
    Array [File] input_vcfs
    Array [File] input_vcfs_indices
    String cohort_vcf_name
    Int disk_size
    Int preemptible_tries

    command {
    java -Xmx2g -jar /usr/gitc/picard.jar \
    MergeVcfs \
    INPUT=${sep=' INPUT=' input_vcfs} \
    OUTPUT=${cohort_vcf_name}.vcf.gz
    }
    runtime {
      docker: "broadinstitute/genomes-in-the-cloud:2.2.3-1469027018"
      memory: "3 GB"
      disks: "local-disk " + disk_size + " HDD"
      preemptible: preemptible_tries
    }
    output {
      File output_vcf = "${cohort_vcf_name}.vcf.gz"
      File output_vcf_index = "${cohort_vcf_name}.vcf.gz.tbi"
    }
}

# WORKFLOW DEFINITION ###################################################################
workflow GenotypeScatteredGVCFs {
    File ref_dict
    File ref_fasta
    File ref_fasta_index
    Array [Array [File]] gvcf_groups
    String? additional_options
    String cohort_vcf_name
    Int agg_preemptible_tries
    Int agg_small_disk

    # Scatter intervals
    scatter (group in gvcf_groups) {

      # Get vcf_basename
      # The script defines the output vcf_basename automatically from the vcf
      # String sub_strip_path = "gs://.*/"
      String sub_strip_path = "^.+/[^\\.]+\\."

      # GenotypeGVCFs takes vcf + idx files so we need to uncompress
      call UnzipGVCFs {
        input:
          group_gz_gvcfs = group,
          ref_dict = ref_dict,
          disk_size = agg_small_disk,
          preemptible_tries = agg_preemptible_tries
      }

      # Joint genotype in parallel over intervals
      call GenotypeGVCFs {
        input:
          scattered_gvcfs = UnzipGVCFs.unzipped_GVCFs,
          GVCF_indices = UnzipGVCFs.gvcf_indices,
          vcf_basename = sub(UnzipGVCFs.unzipped_GVCFs[0], sub_strip_path, ""),
          ref_dict = ref_dict,
          ref_fasta = ref_fasta,
          ref_fasta_index = ref_fasta_index,
          additional_options = additional_options,
          disk_size = agg_small_disk,
          preemptible_tries = agg_preemptible_tries
      }
    }

    # Merge the per-interval VCFs into a single cohort VCF file
    call MergeVCFs {
      input:
        ref_dict = ref_dict,
        input_vcfs = GenotypeGVCFs.genotyped_vcf,
        input_vcfs_indices = GenotypeGVCFs.genotyped_index,
        cohort_vcf_name = cohort_vcf_name,
        disk_size = agg_small_disk,
        preemptible_tries = agg_preemptible_tries
    }

    # Outputs that will be retained when execution is complete
    output {
      MergeVCFs.*
    }
}

GenotypeScatteredGVCFs_cloud_quicktest.json

{
  "GenotypeScatteredGVCFs.ref_dict": "gs://shlee-dev/ref/b37/human_g1k_v37.dict",
  "GenotypeScatteredGVCFs.agg_small_disk": 200,
  "GenotypeScatteredGVCFs.ref_fasta": "gs://shlee-dev/ref/b37/human_g1k_v37.fasta",
  "GenotypeScatteredGVCFs.cohort_vcf_name": "simulated_trio_callset",
  "GenotypeScatteredGVCFs.ref_fasta_index": "gs://shlee-dev/ref/b37/human_g1k_v37.fasta.fai",
  "GenotypeScatteredGVCFs.gvcf_groups": [["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.11.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.11.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.11.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.10.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.10.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.10.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.12.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.12.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.12.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.14.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.14.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.14.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.22.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.22.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.22.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.16.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.16.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.16.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.18.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.18.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.18.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.1.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.1.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.1.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.Y.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.Y.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.Y.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.3.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.3.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.3.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.2.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.2.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.2.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.5.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.5.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.5.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.4.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.4.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.4.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.7.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.7.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.7.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.6.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.6.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.6.vcf.gz"], ["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.9.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.9.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.9.vcf.gz"], 
["gs://shlee-dev/quicktest/scattered_GVCFs/altalt/altalt_snaut.8.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/paalt/paalt_snaut.8.vcf.gz", "gs://shlee-dev/quicktest/scattered_GVCFs/papa/papa_snaut.8.vcf.gz"]],
  "GenotypeScatteredGVCFs.additional_options": "-stand_call_conf 0 -stand_emit_conf 0 -U ALLOW_SEQ_DICT_INCOMPATIBILITY",
  "GenotypeScatteredGVCFs.agg_preemptible_tries": 3
}

If there is some other way to run all of the above within a single cloud command instance, and new WDL/Cromwell features are not the best approach, please do let me know.

@knoblett

Was this ever implemented? Or are we no longer planning on doing this feature?

A user expressed interest in this feature here.

@cjllanwarne

Oops, this ticket shouldn't have been closed by that PR. Looks like a typo.

@cjllanwarne reopened this Nov 15, 2016
@knoblett

Ah, okay! Thank you for the update.


LeeTL1220 commented Dec 2, 2016

DSDE Methods would like this too. (Important for M2 development/evaluations)

@cjllanwarne

@kcibul I propose closing this ticket since I don't think it'll be implemented in the foreseeable future.

There are actually two ways that this functionality already exists:

  • You can use the cross(arrayA, arrayB) function to get the array of all pairs (a, b) with a ∈ arrayA and b ∈ arrayB. This single array can then be scattered over as normal, e.g.:
workflow foo {
  # Inputs:
  Array[Int] arrayA
  Array[String] arrayB

  # Cross them together:
  Array[Pair[Int, String]] crossed = cross(arrayA, arrayB)
  
  scatter (p in crossed) {
    Int a = p.left
    String b = p.right

    # Do the work with a and b.
  }
}
  • If that isn't good enough (e.g. you need to do something to a before you scatter over the bs), you can put the inner scatter into a sub-workflow and Cromwell will be able to run it just fine.
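
For reference, a minimal sketch of that sub-workflow approach applied to the original example. This is illustrative only, assuming a trivial foo task and the later draft-2 syntax with declared outputs; file and workflow names are hypothetical:

# inner.wdl -- wraps the inner scatter in its own workflow
task foo {
  String str
  command { echo ${str} }
  output { String out = read_string(stdout()) }
}

workflow inner {
  Array[String] arr
  scatter (i in arr) {
    call foo { input: str = i }
  }
  output { Array[String] outs = foo.out }
}

# outer.wdl -- the outer scatter calls the sub-workflow
import "inner.wdl" as sub

workflow outer {
  Array[Array[String]] array
  scatter (arr in array) {
    call sub.inner { input: arr = arr }
  }
  output { Array[Array[String]] all_outs = inner.outs }
}

Cromwell runs each inner call as a sub-workflow, so the net result is the same Array[Array[String]] you would get from a true nested scatter.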

@sooheelee

@cjllanwarne Which version of Cromwell supports this feature? I'm writing another similar WDL workflow.

@cjllanwarne

@sooheelee the cross method and sub-workflows were introduced in Cromwell 23.

@katevoss

I believe this is now complete in Cromwell 23; let me know if there is more to be done here.


alongalor commented Sep 4, 2017

If that isn't good enough (e.g. you need to do something to a before you scatter over the bs), you can put the inner scatter into a sub-workflow and Cromwell will be able to run it just fine.

This is what I have implemented to get around this, but it adds significant complexity to the code. I agree that support for a nested scatter option is an excellent idea and would save pipeline writers a lot of time.

@alongalor

I believe this is now complete

By "this" do you mean nested scatter blocks or cross and subworkflows?

@katevoss

@alongalor Subworkflows were completed in Cromwell 23, see the Changelog for more.


Horneth commented Sep 12, 2017

@katevoss I think the confusion is that nested scatters (which this ticket is about) are not complete. Subworkflows, which can be used to obtain an equivalent result to nested scatters, are complete.
If you meant to close this because "nested scatters won't be done anytime soon and there is a (slightly painful but working) workaround for it", that's also totally fine :)


rdali commented Jun 24, 2020

Have nested scatters seen the light yet?

@yfarjoun reopened this Sep 20, 2021
@yfarjoun

I have managed to run nested scatters on Terra in Cromwell 67. So...that's a yes!
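
For anyone landing here later: the original example now runs as written on recent Cromwell. In WDL 1.0 form, a minimal sketch with a trivial, hypothetical foo task:

version 1.0

workflow wf {
  input {
    Array[Array[String]] array
  }
  scatter (arr in array) {
    scatter (i in arr) {
      call foo { input: str = i }
    }
  }
  # Outputs of a doubly-nested scatter come back doubly-arrayed.
  output {
    Array[Array[String]] outs = foo.out
  }
}

task foo {
  input {
    String str
  }
  command { echo ~{str} }
  output { String out = read_string(stdout()) }
}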

@yfarjoun

🥳
