Error on gatk-framework CombineVariants while squaring-off #679

Closed
inti opened this issue Dec 3, 2014 · 3 comments

@inti

inti commented Dec 3, 2014

Hi,
I am getting this error while squaring off:

' returned non-zero exit status 1
Traceback (most recent call last):
  File "/home/shared/app/bcbio/tool/bin/bcbio_nextgen.py", line 216, in <module>
    main(**kwargs)
  File "/home/shared/app/bcbio/tool/bin/bcbio_nextgen.py", line 42, in main
    run_main(**kwargs)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 37, in run_main
    fc_dir, run_info_yaml)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 81, in _run_toplevel
    for xs in pipeline.run(config, run_info_yaml, parallel, dirs, samples):
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 174, in run
    samples = joint.square_off(samples, run_parallel)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/joint.py", line 94, in square_off
    "vrn_file", ["region", "sam_ref", "config"])
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/split.py", line 32, in grouped_parallel_split_combine
    final_output = parallel_fn(parallel_name, split_args)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 84, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 653, in __call__
    self.dispatch(function, args, kwargs)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 400, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 138, in __init__
    self.results = func(*args, **kwargs)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 142, in wrapper
    return apply(f, *args, **kwargs)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 147, in square_batch_region
    return joint.square_batch_region(*args)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/joint.py", line 123, in square_batch_region
    _square_batch_bcbio_variation(data, region, bam_files, vrn_files, out_file, "square")
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/joint.py", line 173, in _square_batch_bcbio_variation
    do.run(cmd, "%s in region: %s" % (cmd, bamprep.region_to_gatk(region)))
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/home/shared/app/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'bcbio-variation-recall square -Xms250m -Xmx2g -XX:+UseSerialGC -c 1 -r groupXVII:1-2314682 --caller freebayes /media/TeraData/ipedroso/ANALYSES/Gasterosteus_aculeatus_RADseq/Hohenlohe_2010_radseq/work2/joint/freebayes-joint/1/groupXVII/1-groupXVII_0_2314682.vcf.gz /home/shared/app/bcbio/genomes/Gasterosteus_aculeatus/G_aculeatus_v1/seq/G_aculeatus_v1.fa /media/TeraData/ipedroso/ANALYSES/Gasterosteus_aculeatus_RADseq/Hohenlohe_2010_radseq/work2/joint/freebayes-joint/1/groupXVII/1-groupXVII_0_2314682.vcf-inputs.txt
                     ##### ERROR ------------------------------------------------------------------------------------------
                       bcbio.run.itx/check-run       itx.clj:  151
           bcbio.variation.ensemble.prep/fn/fn      prep.clj:   81
              bcbio.variation.ensemble.prep/fn      prep.clj:   81
                   clojure.lang.MultiFn.invoke  MultiFn.java:  249
       bcbio.variation.recall.square/by-region    square.clj:  208
 bcbio.variation.recall.square/combine-vcfs/fn    square.clj:  273
bcbio.variation.recall.merge/prep-by-region/fn     merge.clj:  138
                           clojure.core/map/fn      core.clj: 2487
                     clojure.lang.LazySeq.sval  LazySeq.java:   42
                      clojure.lang.LazySeq.seq  LazySeq.java:   60
                           clojure.lang.RT.seq       RT.java:  484
                              clojure.core/seq      core.clj:  133
                           clojure.core/map/fn      core.clj: 2479
                     clojure.lang.LazySeq.sval  LazySeq.java:   42
                      clojure.lang.LazySeq.seq  LazySeq.java:   60
                        clojure.lang.Cons.next     Cons.java:   39
                        clojure.lang.RT.length       RT.java: 1646
                    clojure.lang.RT.seqToArray       RT.java: 1587
                  clojure.lang.LazySeq.toArray  LazySeq.java:  140
                       clojure.lang.RT.toArray       RT.java: 1565
                         clojure.core/to-array      core.clj:  333
                             clojure.core/sort      core.clj: 2753
                          clojure.core/sort-by      core.clj: 2769
                          clojure.core/sort-by      core.clj: 2767
   bcbio.variation.recall.merge/prep-by-region     merge.clj:  140
    bcbio.variation.recall.square/combine-vcfs    square.clj:  270
           bcbio.variation.recall.square/-main    square.clj:  320
                   clojure.lang.RestFn.applyTo   RestFn.java:  137
                            clojure.core/apply      core.clj:  617
             bcbio.variation.recall.main/-main      main.clj:   33
                   clojure.lang.RestFn.applyTo   RestFn.java:  137
              bcbio.variation.recall.main.main              :

2014-12-03 16:43:39 jimi ERROR [bcbio.variation.recall.main] -
java.lang.Exception: Shell command failed: gatk-framework -Xms250m -Xmx2g -XX:+UseSerialGC -T CombineVariants -R /home/shared/app/bcbio/genomes/Gasterosteus_aculeatus/G_aculeatus_v1/seq/G_aculeatus_v1.fa -L groupXVII:260959-261389 --out /media/TeraData/ipedroso/ANALYSES/Gasterosteus_aculeatus_RADseq/Hohenlohe_2010_radseq/work2/joint/freebayes-joint/1/groupXVII/union/txtmp7610835335328262845/union-groupXVII_260958_261389.vcf.gz --minimalVCF --sites_only --suppressCommandLineHeader --setKey null -U LENIENT_VCF_PROCESSING --logging_level ERROR --variant /media/TeraData/ipedroso/ANALYSES/Gasterosteus_aculeatus_RADseq/Hohenlohe_2010_radseq/work2/joint/freebayes-joint/1/groupXVII/merge/Bear_Paw_CAAC_SRR034310/Bear_Paw_CAAC_SRR034310-passonly-Bear_Paw_CAAC_SRR034310-groupXVII_1_2314682.vcf.gz --variant /media/TeraData/ipedroso/ANALYSES/Gasterosteus_aculeatus_RADseq/Hohenlohe_2010_radseq/work2/joint/freebayes-joint/1/groupXVII/merge/Bear_Paw_CACA_SRR034310/Bear_Paw_CACA_SRR034310-passonly-Bear_Paw_CACA_SRR034310-groupXVII_1_2314682.vcf.gz
                     ##### ERROR ------------------------------------------------------------------------------------------
                     ##### ERROR MESSAGE: java.util.ArrayList cannot be cast to java.lang.String
                     ##### ERROR
                     ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
                     ##### ERROR Visit our website and forum for extensive documentation and answers to 
                     ##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
                     ##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
                     ##### ERROR
                     ##### ERROR A GATK RUNTIME ERROR has occurred (version 3.2-18-g478145d):
                     ##### ERROR ------------------------------------------------------------------------------------------
                        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)
                        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
                        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
                        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
                        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
                        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
                        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
                        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
                        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
                        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
                        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
                        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
                        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
                        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:117)
                        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:309)
                        at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.simpleMerge(GATKVariantContextUtils.java:946)
                        at htsjdk.variant.variantcontext.VariantContext.getAttributeAsInt(VariantContext.java:703)
                        at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsInt(CommonInfo.java:242)
                     java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
                     ##### ERROR stack trace 
                     ##### ERROR ------------------------------------------------------------------------------------------
                       bcbio.run.itx/check-run       itx.clj:  151
           bcbio.variation.ensemble.prep/fn/fn      prep.clj:   81
              bcbio.variation.ensemble.prep/fn      prep.clj:   81
                   clojure.lang.MultiFn.invoke  MultiFn.java:  249
       bcbio.variation.recall.square/by-region    square.clj:  208
 bcbio.variation.recall.square/combine-vcfs/fn    square.clj:  273
bcbio.variation.recall.merge/prep-by-region/fn     merge.clj:  138
                           clojure.core/map/fn      core.clj: 2487
                     clojure.lang.LazySeq.sval  LazySeq.java:   42
                      clojure.lang.LazySeq.seq  LazySeq.java:   60
                           clojure.lang.RT.seq       RT.java:  484
                              clojure.core/seq      core.clj:  133
                           clojure.core/map/fn      core.clj: 2479
                     clojure.lang.LazySeq.sval  LazySeq.java:   42
                      clojure.lang.LazySeq.seq  LazySeq.java:   60
                        clojure.lang.Cons.next     Cons.java:   39
                        clojure.lang.RT.length       RT.java: 1646
                    clojure.lang.RT.seqToArray       RT.java: 1587
                  clojure.lang.LazySeq.toArray  LazySeq.java:  140
                       clojure.lang.RT.toArray       RT.java: 1565
                         clojure.core/to-array      core.clj:  333
                             clojure.core/sort      core.clj: 2753
                          clojure.core/sort-by      core.clj: 2769
                          clojure.core/sort-by      core.clj: 2767
   bcbio.variation.recall.merge/prep-by-region     merge.clj:  140
    bcbio.variation.recall.square/combine-vcfs    square.clj:  270
           bcbio.variation.recall.square/-main    square.clj:  320
                   clojure.lang.RestFn.applyTo   RestFn.java:  137
                            clojure.core/apply      core.clj:  617
             bcbio.variation.recall.main/-main      main.clj:   33
                   clojure.lang.RestFn.applyTo   RestFn.java:  137
              bcbio.variation.recall.main.main              :

' returned non-zero exit status 1

I am not sure what it means, but I think it can be reproduced with:

gatk-framework -Xms250m -Xmx2g -XX:+UseSerialGC -T CombineVariants -R /home/shared/app/bcbio/genomes/Gasterosteus_aculeatus/G_aculeatus_v1/seq/G_aculeatus_v1.fa -L groupXVII:260959-261389 --out  union-groupXVII_260958_261389.vcf.gz --minimalVCF --sites_only --suppressCommandLineHeader --setKey null -U LENIENT_VCF_PROCESSING --logging_level ERROR --variant /media/TeraData/ipedroso/ANALYSES/Gasterosteus_aculeatus_RADseq/Hohenlohe_2010_radseq/work2/joint/freebayes-joint/1/groupXVII/merge/Bear_Paw_CAAC_SRR034310/Bear_Paw_CAAC_SRR034310-passonly-Bear_Paw_CAAC_SRR034310-groupXVII_1_2314682.vcf.gz --variant /media/TeraData/ipedroso/ANALYSES/Gasterosteus_aculeatus_RADseq/Hohenlohe_2010_radseq/work2/joint/freebayes-joint/1/groupXVII/merge/Bear_Paw_CACA_SRR034310/Bear_Paw_CACA_SRR034310-passonly-Bear_Paw_CACA_SRR034310-groupXVII_1_2314682.vcf.gz
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
        at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsInt(CommonInfo.java:242)
        at htsjdk.variant.variantcontext.VariantContext.getAttributeAsInt(VariantContext.java:703)
        at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.simpleMerge(GATKVariantContextUtils.java:946)
        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:309)
        at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:117)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
        at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
        at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
        at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314)
        at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.2-18-g478145d):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.util.ArrayList cannot be cast to java.lang.String
##### ERROR ------------------------------------------------------------------------------------------

Any help or comments are welcome.
Thanks in advance.

@chapmanb
Member

chapmanb commented Dec 4, 2014

Inti;
Thanks for the report. This looks like the same underlying issue as #675. My guess is that something is mismatched between the combined header and the actual attributes. Would you be able to send along the VCF files? I can try to pinpoint the cause and add a fix. Thanks again.
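
One way to look for this kind of header/attribute mismatch before handing files to CombineVariants is sketched below. This is only an illustration, not what bcbio or GATK does internally: the function names `open_vcf` and `find_info_count_mismatches` are made up for this example, and the check only covers INFO fields whose header declares a fixed integer `Number`. It reports records where the count of comma-separated values differs from the declared count, which is the sort of multi-valued attribute that could plausibly trigger an `ArrayList cannot be cast to String` error.

```python
import gzip
import re

# Matches header lines such as: ##INFO=<ID=DP,Number=1,Type=Integer,...>
INFO_HEADER = re.compile(r'##INFO=<ID=([^,]+),Number=([^,]+),')


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def find_info_count_mismatches(lines):
    """Return (chrom, pos, key, declared, observed) for INFO values whose
    number of comma-separated entries differs from the header's Number.

    Only fixed integer counts are checked; Number=A, G, R and '.' depend
    on the alleles or are unbounded, so they are skipped in this sketch.
    """
    declared = {}
    mismatches = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO"):
            match = INFO_HEADER.match(line)
            if match and match.group(2).isdigit():
                declared[match.group(1)] = int(match.group(2))
            continue
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, info = fields[0], fields[1], fields[7]
        for entry in info.split(";"):
            if "=" not in entry:
                continue  # flag-style entry without a value
            key, value = entry.split("=", 1)
            expected = declared.get(key)
            if not expected:
                continue  # undeclared key, or Number=0 flag
            observed = len(value.split(","))
            if observed != expected:
                mismatches.append((chrom, int(pos), key, expected, observed))
    return mismatches
```

To run it on one of the per-sample inputs, something like `with open_vcf(path) as handle: print(find_info_count_mismatches(handle))` should list the offending positions, if any.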

@chapmanb
Member

chapmanb commented Dec 5, 2014

Inti;
Thanks much for sending those files along. It looks like the issue is that vcfallelicprimitives is producing invalid VCF output for MNPs with complex multiple alleles. We'd tried to resolve this earlier with #674 but vcffixup does not do the right thing in other cases for resolving these, resulting in the invalid VCF.

At this point we'll need to do more work to be able to pass INFO and FORMAT fields when splitting MNPs, so we will drop this capability, introduced in 0.8.3.

Practically the best thing to do is re-run the FreeBayes calls to get clean output, which should then feed into the downstream merge steps correctly.

Thanks again for the help tracking down the underlying issue.

@inti
Author

inti commented Dec 9, 2014

The pipeline is running fine now. Cheers!
